[egit-dev] Re: patch/GetTextTest.testGetText_Convert() question
Meinrad Recheis <meinrad.recheis@xxxxxxxxx> wrote:
> I have a question about a test case that is failing on Windows *both*
> in Java (see your CI server) and in C#. Of course we cannot
> completely exclude the possibility of a porting error. I am also not
> an expert on encoding, so I'd like to ask you something:
>
> If one changes the line
>
> exp = exp.replace("\303\205ngstr\303\266m", "\u00c5ngstr\u00f6m");
>
> to
>
> exp = exp.replace("\u00C3\u0085ngstr\u00C3\u00B6m", "\u00c5ngstr\u00f6m");
>
> which is just a different representation of the same string (isn't
> it?), then the test passes in C# on Windows. However, making the same
> change to the corresponding line in testGetText_DiffCc() makes that
> test fail in C# on Windows. Because of this strange behavior I am not
> sure whether the fix I found really is a fix or is just masking the
> real bug (which I suspect).
It does seem like your change should have no effect, so I'm just as
confused about why the test passes once you make it. I would
blame it on the compiler not supporting the escape we are using,
but both the Java and the C# compilers are having an issue here,
so it must be our test case.
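For reference, a quick Java check of the claim in the quoted question: the octal escapes `\303`, `\205` and `\266` denote U+00C3, U+0085 and U+00B6, so both literals should compile to the same ten chars. This is a standalone sketch (the class name `EscapeCheck` is made up for illustration) and does not exercise the test itself. Note that C# has no octal escape sequences, so a C# port cannot use the first spelling verbatim.

```java
// Standalone check: the octal-escape literal from the original test and the
// Unicode-escape literal from the proposed change denote the same string.
class EscapeCheck {
    // Java octal escapes: \303 is 0xC3, \205 is 0x85, \266 is 0xB6.
    static final String OCTAL = "\303\205ngstr\303\266m";

    // The same code points written with Unicode escapes.
    static final String UNICODE = "\u00C3\u0085ngstr\u00C3\u00B6m";

    public static void main(String[] args) {
        System.out.println(OCTAL.equals(UNICODE)); // true
        System.out.println(OCTAL.length());        // 10
    }
}
```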
> Right now I cannot say for sure *what exactly* the expected result
> is. The problem could be a mistake in the test, in the system, in the
> patch code, or in some combination of these.
>
> Would it be possible to ask the original author of the test (judging
> from the copyright, someone from Google) to rewrite it so that the
> intent of the test case is unmistakably clear?
I think the original author may have been me. The comment above it
tries to explain:
// Read the original file as ISO-8859-1 and fix up the one place
// where we changed the character encoding. That makes the exp
// string match what we really expect to get back.
The point of the test is that the patch contents are expected to
be in UTF-8 encoding, but we originally parsed them as ISO-8859-1,
so each byte of a multi-byte UTF-8 sequence is currently a separate char.
The getScriptText(Charset, Charset) method is supposed to transcode
the content into the 2nd charset, thus turning the multi-byte UTF-8
sequences back into the correct characters.
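The mis-decoding and the fix-up can be reproduced outside JGit. The following is a minimal sketch, not the actual getScriptText implementation: `TranscodeSketch` and its `transcode` helper are hypothetical, and the approach assumes the first decode used ISO-8859-1, which maps every byte to exactly one char and back, so re-encoding with it recovers the original bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TranscodeSketch {
    // Hypothetical helper: undo a decode done with the wrong charset by
    // re-encoding the chars with it and decoding the bytes with the right one.
    // Reliable when `wrong` is ISO-8859-1, which maps each byte to one char and back.
    static String transcode(String text, Charset wrong, Charset right) {
        return new String(text.getBytes(wrong), right);
    }

    public static void main(String[] args) {
        // "Angstrom" spelled with A-ring (U+00C5) and o-umlaut (U+00F6).
        String expected = "\u00c5ngstr\u00f6m";
        byte[] utf8 = expected.getBytes(StandardCharsets.UTF_8);

        // Parsing UTF-8 bytes as ISO-8859-1 turns every byte into its own char,
        // so each 2-byte sequence becomes 2 chars.
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length()); // 10 instead of 8

        String fixed = transcode(misread, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(fixed.equals(expected)); // true
    }
}
```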
That exp.replace call is trying to perform that fixup *without*
going through the same code path, so we can compare the two strings
and validate that the result is correct.
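To illustrate the two independent paths, here is a sketch that builds the expected string with a literal replace, in the spirit of the test's exp.replace call, and compares it against a generic transcode of the same bytes. The sample line in `ORIGINAL` and all names are hypothetical; the real test reads its input from a patch file.

```java
import java.nio.charset.StandardCharsets;

class FixupComparison {
    // Hypothetical patch content containing the word with two non-ASCII chars.
    static final String ORIGINAL = "+Greetings from \u00c5ngstr\u00f6m\n";

    // Path 1: read as ISO-8859-1, then patch the one known word by hand.
    static String expectedViaReplace(byte[] raw) {
        String exp = new String(raw, StandardCharsets.ISO_8859_1);
        return exp.replace("\u00C3\u0085ngstr\u00C3\u00B6m", "\u00c5ngstr\u00f6m");
    }

    // Path 2: a generic transcode of the whole text (stand-in for getScriptText).
    static String actualViaTranscode(byte[] raw) {
        String misread = new String(raw, StandardCharsets.ISO_8859_1);
        return new String(misread.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = ORIGINAL.getBytes(StandardCharsets.UTF_8);
        // The two paths share no code, so agreement is meaningful evidence.
        System.out.println(expectedViaReplace(raw).equals(actualViaTranscode(raw))); // true
    }
}
```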
> I think
> asserting raw byte sequences for equality instead of Unicode strings
> would make it clear enough. That would probably make it possible to
> fix the issue on Windows both in Java and in C#.
I'm not sure how to assert a raw byte sequence here. The test is
about decoding a byte sequence into a character sequence, given
a guess about the character encoding. If we convert back to a
byte sequence, can we still assert that the intermediate character
sequence was correct?
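One way to see the difficulty: ISO-8859-1 maps every byte value to one char and back, so a byte-level round trip succeeds even when the intermediate string is wrong. In this sketch (hypothetical names, same sample word), both the correct UTF-8 decoding and the ISO-8859-1 mis-decoding reproduce the original bytes, and only a comparison of the characters tells them apart.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripCheck {
    // Decode raw bytes with the given charset, then encode the chars back.
    static byte[] roundTrip(byte[] raw, Charset cs) {
        return new String(raw, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        byte[] raw = "\u00c5ngstr\u00f6m".getBytes(StandardCharsets.UTF_8);

        String right = new String(raw, StandardCharsets.UTF_8);      // 8 chars
        String wrong = new String(raw, StandardCharsets.ISO_8859_1); // 10 chars

        // Both decodings survive a byte round trip, so a byte-level assertion
        // alone would also pass for the wrong intermediate string.
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.UTF_8)));      // true
        System.out.println(Arrays.equals(raw, roundTrip(raw, StandardCharsets.ISO_8859_1))); // true

        // Only a check on the characters themselves tells the two apart.
        System.out.println(right.equals(wrong)); // false
    }
}
```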
--
Shawn.