Matching Unicode category in terminal rules [message #651440]
Sun, 30 January 2011 17:12
Eclipse User
Hi all,
my DSL accepts all Unicode letters (i.e. characters with the Alphabetic property) in IDs. How could I formulate that as a lexer rule without enumerating all the relevant character ranges spread throughout the Unicode planes? I know regular expressions aren't possible in terminal rules, but I'd like to achieve what the \p{Alpha} regular expression does.
Thanks in advance,
thSoft
Re: Matching Unicode category in terminal rules [message #651468 is a reply to message #651443]
Mon, 31 January 2011 04:24
Eclipse User
In Xtext 2.0, we now support Unicode escapes in STRINGs, and thereby in
terminal rules and keywords. We also ship a value converter that checks
for valid characters using Character helper methods.
I don't quite get your problem: there are no characters above \uffff in
16-bit Unicode, and AFAIK, even the last ones are invalid.
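To illustrate the idea of checking characters with the Character helper methods, here is a minimal sketch in plain Java. It is not the value converter that ships with Xtext; the class name UnicodeIdCheck and the method isValidId are made up for this example. It assumes an ID is valid when it is non-empty and every code point is a Unicode letter according to Character.isLetter(int). On Java 7 or later, Character.isAlphabetic(int) may be closer to the Alphabetic property asked about above.

```java
// Hypothetical helper, not part of Xtext: validates an ID by code point.
final class UnicodeIdCheck {
    private UnicodeIdCheck() {}

    /** Returns true if text is non-empty and consists only of Unicode letters. */
    static boolean isValidId(String text) {
        if (text == null || text.isEmpty()) {
            return false;
        }
        int i = 0;
        while (i < text.length()) {
            // codePointAt combines a surrogate pair into one code point,
            // so characters outside the 16-bit range are handled as well.
            int codePoint = text.codePointAt(i);
            if (!Character.isLetter(codePoint)) {
                return false;
            }
            // Advance by 1 or 2 chars depending on the code point's width.
            i += Character.charCount(codePoint);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidId("Gr\u00f6\u00dfe")); // prints "true"
        System.out.println(isValidId("a1"));              // prints "false"
    }
}
```

A check like this could be called from a value converter after the lexer has accepted a deliberately broad ID token, which avoids listing every letter range in the grammar.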
On 31.01.11 00:36, Dennis Harmath wrote:
> Hmm, my DSL's specification wasn't correct: it accepts everything above
> \uA1, so simply
> terminal ID: ("a".."z" | "A".."Z" | "¡".."ᅵ")+;
> does the trick. Unfortunately, it seems characters above \uFFFF aren't
> supported, the generator signals the following error:
>
> error(100): ../org.elysium/src-gen/org/elysium/parser/antlr/lexer/InternalLilyPond.g:205:40: syntax error: antlr:
> ../org.elysium/src-gen/org/elysium/parser/antlr/lexer/InternalLilyPond.g:205:40: expecting CHAR_LITERAL, found ''\uD800\uDC00''
>
> But I think this is not a practically significant issue. :)
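The \uD800\uDC00 in the quoted error is how UTF-16 encodes the code point U+10000: as two 16-bit units (a surrogate pair) instead of a single character literal. The following small Java sketch shows that mapping; the class name SurrogateDemo and the method describe are invented for illustration only.

```java
// Hypothetical demo class: shows the UTF-16 code units of a code point.
final class SurrogateDemo {
    private SurrogateDemo() {}

    /** Formats each UTF-16 code unit of the code point as a backslash-u escape. */
    static String describe(int codePoint) {
        // toChars returns one char for the 16-bit range, two for higher planes.
        char[] units = Character.toChars(codePoint);
        StringBuilder sb = new StringBuilder();
        for (char unit : units) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10000 is split into the units D800 and DC00.
        System.out.println(describe(0x10000));
        // U+00A1 fits in a single unit.
        System.out.println(describe(0x00A1));
    }
}
```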
--
Need professional support for Eclipse Modeling?
Go visit: http://xtext.itemis.com