[egit-dev] Re: patch/GetTextTest.testGetText_Convert() question
Meinrad Recheis <meinrad.recheis@xxxxxxxxx> wrote:
> I got a question about a test case that is failing on Windows *both*
> in Java (see your CI server) and in C#. Of course we can not
> completely exclude the possibility of a porting error. I am also not
> an expert on encoding so I'd like to ask you something:
>
> If one changes the line
>
> exp = exp.replace("\303\205ngstr\303\266m", "\u00c5ngstr\u00f6m");
>
> to
>
> exp = exp.replace("\u00C3\u0085ngstr\u00C3\u00B6m", "\u00c5ngstr\u00f6m");
>
> which is just a different representation of the same string (isn't
> it?) then the test passes in C# on Windows. However, when doing this
> with the same line in testGetText_DiffCc() then the latter fails in C#
> on Windows. Because of this strange behavior I am not sure if the fix
> I found really is a fix or is just masking the real bug (which I
> suspect).

It does seem like your change should have no effect. So I'm equally confused about why it would work when you change it. I would blame it on the compiler not supporting the escape we are using, but both the Java and the C# compilers are having an issue here, so it must be our test case.

> Right now it is not possible for me to say *what exactly* is the
> expected result for sure. The problem could be a mistake in the test
> or in the system or in the patch code or in both.
>
> Would it be possible to request, that the original author of the test
> (from the copyright it must be some guy from Google) rewrites it in
> order to make the intent of the test case unmistakably clear?

I think the original author may have been me. The comment above it tries to explain:

  // Read the original file as ISO-8859-1 and fix up the one place
  // where we changed the character encoding. That makes the exp
  // string match what we really expect to get back.

The point of the test is that the patch contents are expected to be in UTF-8 encoding, but we originally parsed them in ISO-8859-1, so multi-byte UTF-8 sequences are currently separate chars. The getScriptText(Charset, Charset) method is supposed to perform a transcoding of the content into the 2nd charset, thus fixing the multi-byte UTF-8 sequences to be correct. That exp.replace call is trying to perform that fixup *without* going through the same code path, so we can compare the two strings and validate the result is correct.

> I think
> asserting raw byte sequences for equality instead of unicode strings
> would make it clear enough. That would probably make it possible to
> fix the issue on Windows both in java and in C#.

I'm not sure how to assert a raw byte sequence here. The test is about decoding a byte sequence into a character sequence, given a guess about the character encoding. If we convert back to a byte sequence, can we still assert that the intermediate character sequence was correct?

-- 
Shawn.
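
For readers following the encoding argument, here is a minimal Java sketch of the fixup described above. It is not the actual test or the getScriptText implementation: the class name TranscodeSketch and the helper latin1ToUtf8 are hypothetical, and the sketch assumes the text was decoded as ISO-8859-1 when its bytes were really UTF-8.

```java
import java.nio.charset.StandardCharsets;

class TranscodeSketch {
    // Hypothetical helper: undo a wrong ISO-8859-1 decode by recovering the
    // original bytes and decoding them again as UTF-8.
    static String latin1ToUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The two spellings from the thread: Java octal escapes versus Unicode escapes.
        String octal = "\303\205ngstr\303\266m";
        String unicode = "\u00C3\u0085ngstr\u00C3\u00B6m";
        System.out.println(octal.equals(unicode));

        // Transcoding the mis-decoded text should give the string the test expects.
        String expected = "\u00c5ngstr\u00f6m";
        System.out.println(latin1ToUtf8(octal).equals(expected));
    }
}
```

In Java both lines print true, which matches the observation that the two literals are the same string. C# string literals have no octal escape form, so a port cannot reuse the first spelling verbatim and has to write the literal some other way.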