Eclipse Community Forums: Java Development Tools (JDT)

Topic: Character.isIdentifierIgnorable (Compile Class with invalid identifier)

Character.isIdentifierIgnorable [message #1826447]

Fri, 24 April 2020 21:10

Eclipse User

class Z {
class _\u0001 {}
class _\u0001\u0001 {}
}
Compiles without warnings or errors using Eclipse 2020-03 (ecj) and OpenJDK 8.

Using javac from the command line: 2 warnings, 1 error.

javac -version
javac 1.8.0_252

javac Z.java
Z.java:2: warning: '_' used as an identifier
class _\u0001{}
^
(use of '_' as an identifier might not be supported in releases after Java SE 8)
Z.java:3: warning: '_' used as an identifier
class _\u0001\u0001{}
^
(use of '_' as an identifier might not be supported in releases after Java SE 8)
Z.java:3: error: class Z._ is already defined in class Z
class _\u0001\u0001{}
^
1 error
2 warnings
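The test case hinges on how java.lang.Character classifies U+0001. A minimal sketch that simply asks the library (the class IgnorableCheck and its helper methods are hypothetical names for illustration):

```java
// Sketch: query java.lang.Character about the code points used in class Z.
public class IgnorableCheck {

    // True if cp may appear after the first character of a Java identifier.
    public static boolean isPart(int cp) {
        return Character.isJavaIdentifierPart(cp);
    }

    // True if cp is reported as ignorable by Character.isIdentifierIgnorable(int).
    public static boolean isIgnorable(int cp) {
        return Character.isIdentifierIgnorable(cp);
    }

    public static void main(String[] args) {
        int[] codePoints = {'_', 0x0001};
        for (int cp : codePoints) {
            // Prints the start/part/ignorable flags for each code point.
            System.out.printf("U+%04X start=%b part=%b ignorable=%b%n",
                    cp, Character.isJavaIdentifierStart(cp), isPart(cp), isIgnorable(cp));
        }
    }
}
```

For U+0001 this reports part=true and ignorable=true, while '_' is a start character that is not ignorable. That combination is why a compiler that drops ignorable characters sees both nested classes as plain "_".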

Re: Character.isIdentifierIgnorable [message #1826471 is a reply to message #1826447]

Sat, 25 April 2020 08:37

Eclipse User

There seems to be disagreement about characters for which isJavaIdentifierPart() answers true and isIdentifierIgnorable() is also true.
In similar situations (adjusted so that javac accepts the code), I can see that javac simply drops any \u0001 character in a name. My feeling is that both compilers are wrong:
- "_\u0001" should indeed be treated as equal to "_" during comparison
- \u0001 is still part of the name and should be respected when generating code.
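The two-part position above can be sketched as a name type that compares by a key with ignorable characters removed, but keeps the original spelling for code generation. This only illustrates the proposal; it is not how ecj or javac actually represent names, and IdentifierName is a hypothetical type.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical name type: equality ignores ignorable characters,
// but the raw spelling is kept so a code generator could still emit it.
public final class IdentifierName {
    private final String raw;  // spelling as written in the source
    private final String key;  // spelling with ignorable code points removed

    public IdentifierName(String raw) {
        this.raw = raw;
        this.key = stripIgnorable(raw);
    }

    // Drops every code point for which Character.isIdentifierIgnorable returns true.
    public static String stripIgnorable(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints()
         .filter(cp -> !Character.isIdentifierIgnorable(cp))
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public String raw() { return raw; }
    public String key() { return key; }

    @Override
    public boolean equals(Object o) {
        return o instanceof IdentifierName && key.equals(((IdentifierName) o).key);
    }

    @Override
    public int hashCode() {
        return key.hashCode();
    }

    public static void main(String[] args) {
        IdentifierName a = new IdentifierName("_\u0001");
        IdentifierName b = new IdentifierName("_\u0001\u0001");
        Set<IdentifierName> declared = new HashSet<>();
        declared.add(a);
        // Prints false: b collides with a, like javac's "already defined" error.
        System.out.println(declared.add(b));
        // Prints "2 3": the raw spellings still differ.
        System.out.println(a.raw().length() + " " + b.raw().length());
    }
}
```

The design choice is that lookup and duplicate detection go through the key, while anything written to a class file would use raw().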

Cf. https://bugs.eclipse.org/bugs/show_bug.cgi?id=547817 , https://bugs.eclipse.org/bugs/show_bug.cgi?id=547601
Feel free to join the discussion in either of these bugs.

Re: Character.isIdentifierIgnorable [message #1826477 is a reply to message #1826471]

Sat, 25 April 2020 11:52

Eclipse User

Thank you, Stephan, for the links. Interesting reading. Sorry to read that the compiler differences have led to an annoying real-world bug. It has caused me to expand my reading: Unicode UAX #31 and the Java Virtual Machine Specification for Java SE 14. The latter, in section 4.2.3, Module and Package Names, prompts me to comment on your comment:

"- \u0001 is still part of the name and should be respected when generating code."

From JVMS 14:

"Module names may be drawn from the entire Unicode codespace, subject to the following constraints:
• A module name must not contain any code point in the range '\u0000' to '\u001F' inclusive." ...
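The quoted constraint is easy to check mechanically, which makes the contrast with identifier rules concrete. A minimal sketch covering only the single rule quoted here (the other JVMS module-name rules are deliberately not modeled; ModuleNameCheck is a hypothetical helper):

```java
// Sketch of one JVMS 14 section 4.2.3 rule only: a module name must not
// contain any code point in the range U+0000 to U+001F inclusive.
// Other module-name rules in the JVMS are not modeled here.
public class ModuleNameCheck {

    public static boolean violatesControlRule(String moduleName) {
        // Code points are never negative, so only the upper bound needs checking.
        return moduleName.codePoints().anyMatch(cp -> cp <= 0x001F);
    }

    public static void main(String[] args) {
        // U+0001 counts as a Java identifier part...
        System.out.println(Character.isJavaIdentifierPart(0x0001)); // true
        // ...but this rule would reject it in a module name.
        System.out.println(violatesControlRule("z.mod"));           // false
        System.out.println(violatesControlRule("z.mod\u0001"));     // true
    }
}
```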

Sounds almost like a bug in the JLS to me.

Re: Character.isIdentifierIgnorable [message #1826479 is a reply to message #1826477]

Sat, 25 April 2020 12:04

Eclipse User

"Sounds almost like a bug in the JLS to me."

Although I can't say that I have fathomed the names vs binary names divide yet!

Re: Character.isIdentifierIgnorable [message #1826488 is a reply to message #1826479]

Sat, 25 April 2020 14:38

Eclipse User

Let's continue in one of the corresponding bugs, which contain more details than my quick answer above was based on.

Apart from questions of correctness, is there any practical reason why one would want to use _\u0001 as a name in Java?

Re: Character.isIdentifierIgnorable [message #1826491 is a reply to message #1826488]

Sat, 25 April 2020 15:52

Eclipse User

The whole situation is a mess.

So, yes: the practical point of _\u0001 as a name is to show how differently the compilers implement JLS 3.8.

According to http://www.unicode.org/reports/tr31/#A1

A Farsi speaker wishing to give a method the Farsi equivalent of the English name "isLetter()" would be frustrated by a compiler that drops ignorable characters: he would find his method named the Farsi equivalent of "isNames()" when compiling with javac, and "isLetter()" when compiling with ecj. The way I read the spec, he cannot have his method named "isLetter()" at all.

Re: Character.isIdentifierIgnorable [message #1826493 is a reply to message #1826491]

Sat, 25 April 2020 17:28

Eclipse User

I'm not a Unicode expert, but \u200C being ignorable indicates that, in Java, names that differ only in this character are considered equal. Isn't that what the Javadoc of Character.isIdentifierIgnorable() says?

Re: Character.isIdentifierIgnorable [message #1826496 is a reply to message #1826493]

Sat, 25 April 2020 18:07

Eclipse User

Nor I, but that is the way I'm reading it.

According to the UAX above, the Farsi word spelled
NOON + ALEF + MEEM + HEH + ALEF + FARSI YEH
translates to "names".

It is a different word from the one spelled
NOON + ALEF + MEEM + HEH + ZWNJ + ALEF + FARSI YEH
which means "a letter".
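The pair can be written with Unicode escapes to see what the "ignore ignorable characters" rule does to it. A minimal sketch, assuming the standard code points for these letters (NOON U+0646, ALEF U+0627, MEEM U+0645, HEH U+0647, FARSI YEH U+06CC, ZWNJ U+200C); ZwnjExample and its members are hypothetical names:

```java
// Sketch: the UAX #31 Farsi pair, written with escapes.
public class ZwnjExample {

    public static final int ZWNJ = 0x200C;

    // NOON + ALEF + MEEM + HEH + ALEF + FARSI YEH ("names")
    public static final String NAMES = "\u0646\u0627\u0645\u0647\u0627\u06CC";
    // NOON + ALEF + MEEM + HEH + ZWNJ + ALEF + FARSI YEH ("a letter")
    public static final String A_LETTER = "\u0646\u0627\u0645\u0647\u200C\u0627\u06CC";

    // Drops every code point for which Character.isIdentifierIgnorable returns true.
    public static String stripIgnorable(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints()
         .filter(cp -> !Character.isIdentifierIgnorable(cp))
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // ZWNJ has general category Cf (FORMAT), so Java treats it as ignorable.
        System.out.println(Character.isIdentifierIgnorable(ZWNJ));                  // true
        // The raw strings are different words...
        System.out.println(NAMES.equals(A_LETTER));                                  // false
        // ...but they collapse once ignorable characters are removed.
        System.out.println(stripIgnorable(NAMES).equals(stripIgnorable(A_LETTER))); // true
    }
}
```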

It seems to me that at least one motivation for the concept of Unicode in identifiers is to allow speakers of diverse languages to express understandable identifiers.

Certainly, two implementations interpreting the spec differently is a concern, even if the JVM runs the output of both, which I take to be a backward-compatibility issue.

Re: Character.isIdentifierIgnorable [message #1826497 is a reply to message #1826496]

Sat, 25 April 2020 18:18

Eclipse User

Let's please differentiate:

isIdentifierIgnorable() is specified and implemented as part of the Java system libraries. If that conflicts with practical issues in certain languages, then that specification (and its implementation) should be changed, which would be a task for Oracle.

In one of the linked bugs someone argued why ecj might be at fault. Whether and how ecj is to be changed is discussed in bugzilla, not here. That would, however, go towards ignoring *more*, not less.

Re: Character.isIdentifierIgnorable [message #1826500 is a reply to message #1826497]

Sat, 25 April 2020 18:42

Eclipse User

isIdentifierIgnorable() is specified and implemented as part of the Java system libraries.

Yes.

If that conflicts with practical issues in certain languages, then that specification (and its implementation) should be changed

Yes, no, maybe.

We, I, are/am not discussing which characters are specified in Character.isIdentifierIgnorable(int).
The issue is what an implementation does with these ignorable characters, whatever they may be.

Prior to the JLS for Java SE 9, the relevant paragraph read:
Two identifiers are the same only if they are identical, that is, have the same Unicode character for each letter or digit. Identifiers that have the same external appearance may yet be different.

Since then it reads:
Two identifiers are the same only if, after ignoring characters that are ignorable, the identifiers have the same Unicode character for each letter or digit. An ignorable character is a character for which the method Character.isIdentifierIgnorable(int) returns true. Identifiers that have the same external appearance may yet be different.
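Read literally, the two wordings give different answers for the names in class Z. A minimal sketch of each rule as a string comparison; IdentifierRules and its methods are hypothetical names, and this is one reading of the quoted text, not how javac or ecj implement it:

```java
// Sketch contrasting the two JLS wordings quoted above.
public class IdentifierRules {

    // Pre-Java SE 9 wording: same only if identical, character for character.
    public static boolean samePreSe9(String a, String b) {
        return a.equals(b);
    }

    // Java SE 9+ wording: compare after dropping every code point for which
    // Character.isIdentifierIgnorable(int) returns true.
    public static boolean sameSe9(String a, String b) {
        return stripIgnorable(a).equals(stripIgnorable(b));
    }

    public static String stripIgnorable(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints()
         .filter(cp -> !Character.isIdentifierIgnorable(cp))
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String first = "_\u0001";
        String second = "_\u0001\u0001";
        System.out.println(samePreSe9(first, second)); // false
        System.out.println(sameSe9(first, second));    // true
    }
}
```

Under the older wording the two nested classes in Z would be distinct; under the newer one they are the same identifier, which agrees with the javac error in the first post.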

in one of the linked bugs someone argued why ecj might be at fault.
I tend to agree that openjdk impl is more in line with my reading.

Whether and how ecj is to be changed is discussed in bugzilla, not here.
Now you're sounding like a dev. :-)

That would, however, go towards ignoring *more*, not less.
I don't understand this sentence.

Looking at your comment #13 in bug 547817, you seem to agree with my assessment above:
"I tend to agree that openjdk impl is more in line with my reading."

[Updated on: Sat, 25 April 2020 18:56] by Moderator
