Re: Matching Unicode category in terminal rules [message #651468 is a reply to message #651443]
Mon, 31 January 2011 09:24
Jan Koehnlein Messages: 760 Registered: July 2009 Location: Hamburg
Senior Member
In Xtext 2.0, we now support Unicode escapes in STRINGs, and thereby in
terminal rules and keywords. We also ship a value converter that checks
for valid characters using the helper methods of Java's Character class.
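To illustrate the idea of such a converter, here is a minimal Java sketch. It is not Xtext's actual implementation: the class UnicodeEscapeDecoder is hypothetical, and using Character.isDefined as the validity check is an assumption about which Character helper one might pick.

```java
/**
 * Hypothetical sketch, NOT Xtext's actual value converter: decodes
 * backslash-u escapes in the body of a string literal and rejects
 * code units that java.lang.Character reports as undefined.
 * Simplification: other backslash escapes are not handled.
 */
final class UnicodeEscapeDecoder {

    /** Decodes backslash-u escapes; all other characters are copied unchanged. */
    static String decode(String literal) {
        StringBuilder out = new StringBuilder(literal.length());
        int i = 0;
        while (i < literal.length()) {
            char c = literal.charAt(i);
            boolean isEscape = c == '\\' && i + 1 < literal.length() && literal.charAt(i + 1) == 'u';
            if (!isEscape) {
                out.append(c);
                i++;
                continue;
            }
            if (i + 6 > literal.length()) {
                throw new IllegalArgumentException("Truncated unicode escape at index " + i);
            }
            int value = 0;
            for (int k = i + 2; k < i + 6; k++) {
                char h = literal.charAt(k);
                // Accept ASCII hex digits only (Character.digit alone would also accept e.g. fullwidth digits).
                int digit = h < 128 ? Character.digit(h, 16) : -1;
                if (digit < 0) {
                    throw new IllegalArgumentException("Invalid hex digit '" + h + "' at index " + k);
                }
                value = value * 16 + digit;
            }
            // Character helper check: U+FFFE and U+FFFF, for example, are not defined.
            if (!Character.isDefined(value)) {
                throw new IllegalArgumentException(
                        String.format("Undefined character U+%04X at index %d", value, i));
            }
            out.append((char) value);
            i += 6;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("caf\\u00E9")); // "caf" followed by U+00E9
        try {
            decode("\\uFFFF");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Undefined character U+FFFF at index 0
        }
    }
}
```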
I don't quite get your problem: there are no characters above \uffff in
16-bit Unicode, and AFAIK even the last ones are invalid.
On 31.01.11 00:36, Dennis Harmath wrote:
> Hmm, my DSL's specification wasn't correct: it accepts everything above
> \uA1, so simply
> terminal ID: ("a".."z" | "A".."Z" | "¡".."ᅵ")+;
> does the trick. Unfortunately, it seems characters above \uFFFF aren't
> supported; the generator reports the following error:
>
> error(100): ../org.elysium/src-gen/org/elysium/parser/antlr/lexer/InternalLilyPond.g:205:40: syntax error: antlr:
> ../org.elysium/src-gen/org/elysium/parser/antlr/lexer/InternalLilyPond.g:205:40: expecting CHAR_LITERAL, found ''\uD800\uDC00''
>
> But I think this is not a practically significant issue. :)
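For context on the error quoted above: Java works with 16-bit char values, so a code point above U+FFFF, such as U+10000, can only be represented as a surrogate pair of two code units. Judging by the error message, that pair ('\uD800\uDC00') ended up in the generated lexer grammar where a single CHAR_LITERAL was expected. The sketch below uses only standard java.lang.Character methods; the class and method names are made up for illustration.

```java
/**
 * Illustration with standard java.lang.Character methods: code points above
 * U+FFFF do not fit in one 16-bit char and are stored as a surrogate pair.
 * The class and method names are made up for this example.
 */
final class SurrogateDemo {

    /** Formats every UTF-16 code unit of a code point as a backslash-u escape. */
    static String toUtf16Escapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int cp = 0x10000; // first code point outside the Basic Multilingual Plane
        System.out.println(Character.isBmpCodePoint(cp)); // false
        System.out.println(Character.charCount(cp));      // 2 (needs two chars)
        System.out.println(toUtf16Escapes(cp));           // code units D800 and DC00, as in the ANTLR error
        System.out.println(Character.isDefined(0xFFFF));  // false: U+FFFF is a noncharacter
    }
}
```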
--
Need professional support for Eclipse Modeling?
Go visit: http://xtext.itemis.com
---
Get professional support from the Xtext committers at www.typefox.io
Re: Matching Unicode category in terminal rules [message #1854324 is a reply to message #651468]
Sat, 13 August 2022 21:01
Mirko Raner Messages: 125 Registered: July 2009 Location: New York City, NY
Senior Member
I think the OP's question was meant to be about avoiding code like this.
I am struggling with a similar issue where certain identifiers and operators need to accept characters from entire Unicode character categories (like Letter, Number, Math Symbol, etc.).
Currently, it appears that the only way to accommodate this in Xtext is by listing each individual character (or character range) in the category (as shown in the linked source example). Most regex implementations, on the other hand, have long supported shorthand forms for identifying groups of characters by their Unicode category (e.g., \p{Lu} for uppercase letters, or \p{N} for numbers).
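For comparison, this is what the category shorthand looks like in Java's own java.util.regex.Pattern, which accepts \p{L}, \p{N}, \p{Sm} and similar category names. The sketch is plain Java regex, not Xtext grammar syntax, and the identifier and operator rules are made-up examples of the kind of constraints described above.

```java
import java.util.regex.Pattern;

final class CategoryRegexDemo {

    // Identifier: one Unicode letter (\p{L}) followed by letters or numbers (\p{N}).
    static final Pattern IDENTIFIER = Pattern.compile("\\p{L}[\\p{L}\\p{N}]*");

    // Operator: one or more characters from the Unicode category Math Symbol (\p{Sm}).
    static final Pattern OPERATOR = Pattern.compile("\\p{Sm}+");

    static boolean isIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches();
    }

    static boolean isOperator(String s) {
        return OPERATOR.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("Gr\u00F6\u00DFe1")); // true: all letters plus a digit
        System.out.println(isIdentifier("1abc"));             // false: starts with a number
        System.out.println(isOperator("\u2264"));             // true: LESS-THAN OR EQUAL TO is Sm
        System.out.println(isOperator("-"));                  // false: HYPHEN-MINUS is Pd, not Sm
    }
}
```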
Would a shorthand syntax that refers to character groups by their Unicode category be considered for a future version of Xtext? (I might be interested in contributing code for this.)
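Until such a shorthand exists, one workaround might be to generate the explicit range list instead of typing it. The sketch below is hypothetical (CategoryRangeGenerator is not an Xtext feature or API): it scans the Basic Multilingual Plane with Character.getType and emits one "from".."to" alternative per contiguous run, using the Unicode escapes that the 2011 reply says Xtext 2.0 accepts. Code points above U+FFFF are skipped, given the CHAR_LITERAL error quoted earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not an Xtext feature: expands a Unicode general
 * category into explicit Xtext terminal-rule alternatives ("from".."to").
 * Only BMP code points (up to U+FFFF) are considered.
 */
final class CategoryRangeGenerator {

    /** Returns [start, end] pairs of BMP code points whose Character.getType equals category. */
    static List<int[]> ranges(int category) {
        List<int[]> result = new ArrayList<>();
        int start = -1;
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            boolean inCategory = Character.getType(cp) == category;
            if (inCategory && start < 0) {
                start = cp;                             // open a new range
            } else if (!inCategory && start >= 0) {
                result.add(new int[] {start, cp - 1});  // close the current range
                start = -1;
            }
        }
        if (start >= 0) {
            result.add(new int[] {start, 0xFFFF});      // range running to the end of the BMP
        }
        return result;
    }

    /** Formats one range as an Xtext alternative written with backslash-u escapes. */
    static String toXtext(int[] range) {
        String from = String.format("\"\\u%04X\"", range[0]);
        if (range[0] == range[1]) {
            return from;
        }
        return from + ".." + String.format("\"\\u%04X\"", range[1]);
    }

    /** Joins all ranges of a category into a parenthesized list of alternatives. */
    static String terminalBody(int category) {
        List<String> parts = new ArrayList<>();
        for (int[] r : ranges(category)) {
            parts.add(toXtext(r));
        }
        return "(" + String.join(" | ", parts) + ")";
    }

    public static void main(String[] args) {
        // Character.MATH_SYMBOL corresponds to the Unicode category Sm.
        System.out.println("terminal MATH_SYMBOL: " + terminalBody(Character.MATH_SYMBOL) + "+;");
    }
}
```

The generated list reflects the Unicode version of the JDK that runs the generator, so it would need to be regenerated when that changes.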