Re: Catch Unquoted String [message #798618 is a reply to message #798574]
Tue, 14 February 2012 22:42
Henrik Lindberg
Registered: July 2009
You can solve this by redefining the ID terminal to contain a wider
range of characters. If you need to restrict the set of characters in
some places, you can add validation rules.
A second way to do this is to define a datatype rule that consists of a
combination of ID and EXT_ID. EXT_ID would be a terminal similar to ID,
but matching only the additional characters.
UnquotedString : (ID | EXT_ID)+ ;
A more elegant solution would be to use an external lexer with a
terminal that allows all characters that are valid in the unquoted
string. Before returning a token, the lexer checks whether the text is
also a valid, more restrictive ID, and if so returns an ID token
instead. (You cannot do this with the Xtext grammar alone.)
terminal UNQUOTEDSTRING : [ ...long list of char ranges] ;
terminal ID : ... standard ID ...
(If an external lexer is not used, the ID token would never be found, as
UNQUOTEDSTRING has higher precedence, but you need to declare both
terminals in the grammar to make it possible to return these tokens.)
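To illustrate the check the external lexer performs before returning a token, here is a minimal Java sketch. It is not Xtext API: the class name, the classify helper, and both regular expressions are assumptions made for illustration. The ID pattern mirrors the usual Xtext ID terminal, and the UNQUOTEDSTRING pattern is only a stand-in for the long list of char ranges.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Xtext: shows only the token-type decision.
final class UnquotedTokenSketch {
    // Assumed shape of the standard ID: optional '^', then a letter or '_',
    // followed by letters, digits or '_'.
    private static final Pattern ID =
            Pattern.compile("\\^?[A-Za-z_][A-Za-z_0-9]*");

    // Assumed stand-in for the char ranges: anything except whitespace and quotes.
    private static final Pattern UNQUOTED = Pattern.compile("[^\\s\"']+");

    /** Returns the token type to report for an already scanned lexeme. */
    static String classify(String lexeme) {
        if (!UNQUOTED.matcher(lexeme).matches()) {
            throw new IllegalArgumentException("not an unquoted string: " + lexeme);
        }
        // Prefer the more restrictive ID whenever the whole lexeme qualifies.
        return ID.matcher(lexeme).matches() ? "ID" : "UNQUOTEDSTRING";
    }
}
```

With these patterns, classify("name") reports ID, while a lexeme containing an accented letter, such as "señor", reports UNQUOTEDSTRING.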
The external lexer approach will be slightly more efficient, as the
parse tree will be smaller. (The external lexer gains the most when
every other character is a non-ASCII char, since the datatype rule then
needs one token per character.)
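The difference in token count can be sketched in Java as follows. This is a simplification under stated assumptions: the class and method names are made up, every ASCII letter, digit or '_' is treated as an ID character, and everything else as an EXT_ID character.

```java
// Hypothetical illustration, not Xtext code: counts tokens per approach.
final class TokenCountSketch {
    // Simplifying assumption: these characters belong to ID, all others to EXT_ID.
    private static boolean isIdChar(char c) {
        return c == '_'
                || (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9');
    }

    /** Tokens consumed by a rule like (ID | EXT_ID)+: one per run of same-class chars. */
    static int datatypeRuleTokens(String s) {
        int tokens = 0;
        for (int i = 0; i < s.length(); i++) {
            // A new token starts at the first char and at every class change.
            if (i == 0 || isIdChar(s.charAt(i)) != isIdChar(s.charAt(i - 1))) {
                tokens++;
            }
        }
        return tokens;
    }

    /** The external lexer returns the whole unquoted string as a single token. */
    static int externalLexerTokens(String s) {
        return s.isEmpty() ? 0 : 1;
    }
}
```

For a string that alternates between ASCII and non-ASCII characters, datatypeRuleTokens grows with the length of the string, while externalLexerTokens stays at one.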
Hope that helps.
On 2012-14-02 22:22, stefan bosshard wrote:
> I am trying to write a DSL for an existing 'language'. The primary aim
> is to get a parser. I got that going with one exception. The 'language'
> allows for strings to be unquoted as long as they do not contain spaces.
> Both string varieties coexist. I cannot use the ID terminal because
> some strings contain non-ASCII characters from French, Spanish, etc.
> Having tried a great many approaches I find myself at a loss. Can you
> suggest how I should go about this problem?
> Thanks a lot in advance