Guys,
Consider the following block of code:
import com.ibm.icu.lang.UCharacter; // ICU4J

// Report every defined code point on which the JDK and ICU4J disagree.
for (int codePoint = 0; codePoint <= Character.MAX_CODE_POINT; ++codePoint)
{
    if (Character.isDefined(codePoint) &&
        Character.isWhitespace(codePoint) != UCharacter.isWhitespace(codePoint))
    {
        System.err.println("Character and UCharacter disagree on codePoint=" + codePoint);
        System.err.println(" Character.isWhitespace(" + codePoint +
            ") == " + Character.isWhitespace(codePoint));
        System.err.println(" UCharacter.isWhitespace(" + codePoint +
            ") == " + UCharacter.isWhitespace(codePoint));
    }
}
It produces the following trace:
Character and UCharacter disagree on codePoint=8199
Character.isWhitespace(8199) == false
UCharacter.isWhitespace(8199) == true
Character and UCharacter disagree on codePoint=8203
Character.isWhitespace(8203) == true
UCharacter.isWhitespace(8203) == false
It's a bit disconcerting that they disagree. What characters are
these? Why is there disagreement on these two specifically? Is one of
them more correct, and why? What are the implications of getting these
characters "wrong" in a particular application, if there is a right or
wrong?
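To see what the two code points in the trace are, a small sketch like the following asks the JDK for each character's name and general category. The class and method names (DescribeCodePoints, category, describe) are hypothetical, the category mapping covers only the few categories relevant here, and Character.getName requires Java 7 or later. The result for 8203 depends on the JDK: U+200B ZERO WIDTH SPACE was moved from category Zs to Cf in Unicode 4.0.1, so a JDK bundling older Unicode data (Java 5 and 6 use Unicode 4.0) treats it as a space separator, while newer JDKs do not.

```java
public final class DescribeCodePoints {
    private DescribeCodePoints() {}

    // Maps the handful of general categories relevant here to their Unicode abbreviations.
    static String category(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.SPACE_SEPARATOR:     return "Zs";
            case Character.LINE_SEPARATOR:      return "Zl";
            case Character.PARAGRAPH_SEPARATOR: return "Zp";
            case Character.FORMAT:              return "Cf";
            default:                            return "other";
        }
    }

    // Name, category and Java whitespace status, all taken from this JDK's Unicode data.
    static String describe(int codePoint) {
        return String.format("U+%04X %s [%s] Character.isWhitespace=%b",
                codePoint, Character.getName(codePoint), category(codePoint),
                Character.isWhitespace(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe(8199));
        System.out.println(describe(8203));
    }
}
```

On any JDK, 8199 is U+2007 FIGURE SPACE in category Zs; the category reported for 8203 tells you which Unicode version your JDK carries.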
Here's how they are documented:
UCharacter.isWhitespace
Determines if the specified code point is a white space character.
A code point is considered to be a whitespace character if and only if
it satisfies one of the following criteria:
- It is a Unicode space separator (category "Zs"), but is not a
no-break space (\u00A0 or \u202F or \uFEFF).
- It is a Unicode line separator (category "Zl").
- It is a Unicode paragraph separator (category "Zp").
- It is \u0009, HORIZONTAL TABULATION.
- It is \u000A, LINE FEED.
- It is \u000B, VERTICAL TABULATION.
- It is \u000C, FORM FEED.
- It is \u000D, CARRIAGE RETURN.
- It is \u001C, FILE SEPARATOR.
- It is \u001D, GROUP SEPARATOR.
- It is \u001E, RECORD SEPARATOR.
- It is \u001F, UNIT SEPARATOR.
This API tries to synch to the semantics of the Java API,
java.lang.Character.isWhitespace().
Character.isWhitespace
Determines if the specified character (Unicode code point) is white
space according to Java. A character is a Java whitespace character if
and only if it satisfies one of the following criteria:
- It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR,
or PARAGRAPH_SEPARATOR) but is not also a non-breaking space
('\u00A0', '\u2007', '\u202F').
- It is '\u0009', HORIZONTAL TABULATION.
- It is '\u000A', LINE FEED.
- It is '\u000B', VERTICAL TABULATION.
- It is '\u000C', FORM FEED.
- It is '\u000D', CARRIAGE RETURN.
- It is '\u001C', FILE SEPARATOR.
- It is '\u001D', GROUP SEPARATOR.
- It is '\u001E', RECORD SEPARATOR.
- It is '\u001F', UNIT SEPARATOR.
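Comparing the two documented rule sets directly suggests where the 8199 result comes from. The sketch below (DocumentedWhitespaceDiff and its methods are hypothetical names) applies the same rule shape twice, once with each API's documented no-break exclusion list, using this JDK's Unicode data for both, and collects the code points where the results differ. Because U+2007 FIGURE SPACE is a space separator that only Java excludes, and U+FEFF is not a space separator at all, the sweep reports only U+2007. U+200B can never show up here, which suggests the 8203 disagreement comes from the JDK and ICU4J shipping different Unicode data for that character rather than from the documented rules.

```java
import java.util.ArrayList;
import java.util.List;

public final class DocumentedWhitespaceDiff {
    private DocumentedWhitespaceDiff() {}

    // No-break exclusions exactly as listed in each API's documentation.
    private static final int[] JAVA_NO_BREAK = {0x00A0, 0x2007, 0x202F};
    private static final int[] ICU_NO_BREAK  = {0x00A0, 0x202F, 0xFEFF};

    private static boolean contains(int[] set, int codePoint) {
        for (int c : set) {
            if (c == codePoint) return true;
        }
        return false;
    }

    // Shared rule shape: Zs minus the no-break list, all of Zl and Zp,
    // plus U+0009..U+000D and U+001C..U+001F.
    static boolean documentedWhitespace(int codePoint, int[] noBreak) {
        int type = Character.getType(codePoint);
        if (type == Character.SPACE_SEPARATOR) {
            return !contains(noBreak, codePoint);
        }
        if (type == Character.LINE_SEPARATOR || type == Character.PARAGRAPH_SEPARATOR) {
            return true;
        }
        return (codePoint >= 0x0009 && codePoint <= 0x000D)
                || (codePoint >= 0x001C && codePoint <= 0x001F);
    }

    // Code points on which the two documented rule sets disagree, using one Unicode data set.
    public static List<Integer> disagreements() {
        List<Integer> result = new ArrayList<>();
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (documentedWhitespace(cp, JAVA_NO_BREAK) != documentedWhitespace(cp, ICU_NO_BREAK)) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int cp : disagreements()) {
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```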
My favorite comment is "This API tries to synch to the semantics of
the Java API, java.lang.Character.isWhitespace()." Maybe it
could try harder! :-P
Regards,
Ed
Thomas Hallgren wrote:
Hi Igor,
You can safely fall back to using Character.isWhitespace(). As far as
I know, there is in fact no difference between UCharacter and
Character in that particular method. They both fall back on Unicode and
a special set of Java rules.
Also, to my knowledge, the NLS support provided by the Subversion
libraries you build on is rudimentary. I doubt very much that
it goes beyond what's provided by the standard Java platform, which would
make the use of ICU4J completely redundant for all your bundles. Does
Subversion support a Hebrew Calendar?
Regards,
Thomas Hallgren
Igor V. Burilo wrote:
Hello All,
We reviewed Subversive's UI
plugin (org.eclipse.team.svn.ui) and found that it also uses the UCharacter
class, which is the only class that makes it impossible to use
com.ibm.icu.base. We use only the UCharacter.isWhitespace method. Is
there any workaround to replace its usage so that we don't depend on the
com.ibm.icu plugin? Perhaps we can use java.lang.Character instead? Is it
acceptable for Subversive to have a direct dependency on the com.ibm.icu plugin?
Best regards,
Burilo Igor
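If the only ICU4J call is UCharacter.isWhitespace, one option is a small helper built on java.lang.Character alone. The sketch below (IcuFreeWhitespace is a hypothetical name) follows the UCharacter documentation quoted earlier in this thread rather than any particular ICU4J release. Because it takes general categories from the JDK, code points such as U+200B still follow the JDK's Unicode version. If Java's own semantics are acceptable, calling Character.isWhitespace directly is simpler; the only code point on which the two documented rule sets differ in effect is U+2007 FIGURE SPACE.

```java
public final class IcuFreeWhitespace {
    private IcuFreeWhitespace() {}

    // Approximates the documented UCharacter.isWhitespace rules without ICU4J.
    public static boolean isWhitespace(int codePoint) {
        int type = Character.getType(codePoint);
        if (type == Character.SPACE_SEPARATOR) {
            // ICU's documented no-break exclusions. Unlike java.lang.Character,
            // U+2007 FIGURE SPACE is NOT excluded here.
            return codePoint != 0x00A0 && codePoint != 0x202F && codePoint != 0xFEFF;
        }
        if (type == Character.LINE_SEPARATOR || type == Character.PARAGRAPH_SEPARATOR) {
            return true;
        }
        // TAB, LF, VT, FF, CR and the information separators U+001C..U+001F.
        return (codePoint >= 0x0009 && codePoint <= 0x000D)
                || (codePoint >= 0x001C && codePoint <= 0x001F);
    }

    public static void main(String[] args) {
        System.out.println(isWhitespace(0x2007));           // true
        System.out.println(Character.isWhitespace(0x2007)); // false
    }
}
```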
John Arthorne wrote:
I agree you should be able to
avoid using ICU4J in headless applications, if you don't need to build
locale-specific representations of dates, times, etc. We don't use
ICU4J at all in non-UI bundles in Equinox, and the Platform core
bundles also don't use it for the most part. Maybe the answer for you
is to avoid the problematic classes mentioned on http://wiki.eclipse.org/ICU4J
altogether, which would remove any need for ICU4J in Buckminster. Is
the problem for you that you have dependencies that in turn pull in the
dependency on ICU4J?
Yes, that's the problem. One example is the Subversive adapter. In
response to the new requirement, Subversive now has a direct bundle
requirement on the com.ibm.icu bundle. It also has some code that
uses the UCharacter class, which makes it impossible to use the
com.ibm.icu.base bundle even if we wanted to. I have already submitted a
patch to the Subversive project that rectifies the problem, and
hopefully they will accept it.
<grumpyMode>
I file that under yet another time-consuming effort made to satisfy the
new requirements that nobody was asking for.
</grumpyMode>
Regards,
Thomas Hallgren
_______________________________________________
cross-project-issues-dev mailing list
cross-project-issues-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/cross-project-issues-dev