Eclipse Community Forums: TMF (Xtext) » Context sensitive lexing

Context sensitive lexing [message #1188843]

Fri, 15 November 2013 21:11

Andrew Gacek

I'm writing an Xtext grammar for a language that has built-in regular expression syntax. For example, the user can write

regex_match(/ab+c/, x)

My problem is how to parse the regular expression itself, so that I can warn the user about syntax errors inside it. Naively, a parser rule for a regex might look something like

Regex:
  '/' RegexBody '/'
;

RegexBody:
  RegexBody '?'
| RegexBody '*'
| REGEX_TOKEN
;

terminal REGEX_TOKEN:
  ~('?' | '*' | '/')
;

Ignoring the LL(*) issues (RegexBody is left-recursive), the problem is that REGEX_TOKEN overlaps with ID. Indeed, something like /ab+c/ is lexed as '/' 'ab' '+' 'c' '/'.
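To make the overlap concrete, here is a minimal sketch in Python (not Xtext or ANTLR) of a context-free lexer. The token names and patterns are assumptions that only stand in for a generic ID terminal and a catch-all for everything else; they are not the terminals Xtext actually generates.

```python
import re

# Hypothetical context-free token spec. ID is tried first and matches greedily,
# standing in for a generic ID terminal that wins over single-character tokens.
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("WS", r"\s+"),
    ("OTHER", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def lex_context_free(text):
    """Return (kind, lexeme) pairs, skipping whitespace."""
    return [
        (m.lastgroup, m.group())
        for m in MASTER.finditer(text)
        if m.lastgroup != "WS"
    ]


print(lex_context_free("/ab+c/"))
# [('OTHER', '/'), ('ID', 'ab'), ('OTHER', '+'), ('ID', 'c'), ('OTHER', '/')]
```

Because the lexer has no notion of being "inside" a regex, the letters between the slashes are grouped into ID tokens before the parser ever sees them.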

What I would really like is a way to make the lexer context-sensitive so that between the '/' characters it would treat characters differently. Is there a good way to do this in Xtext? Or more generally, a good way to handle this kind of nested sublanguage of regular expressions within a more general language?
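The behavior I have in mind could be sketched roughly as follows, again in plain Python rather than Xtext. This is only an illustration of mode switching under simplifying assumptions: every '/' toggles regex mode, there is no division operator, and escaped slashes inside the regex are not handled. The function and token names are hypothetical.

```python
import re

ID = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def lex_with_modes(text):
    """Hand-written lexer that switches rules between '/' delimiters."""
    tokens = []
    i = 0
    in_regex = False  # current lexer mode
    while i < len(text):
        ch = text[i]
        if ch == "/":
            # Assumption: every '/' opens or closes a regex literal.
            tokens.append(("SLASH", ch))
            in_regex = not in_regex
            i += 1
        elif in_regex:
            # Inside a regex, every character is its own token.
            kind = "REGEX_OP" if ch in "?*" else "REGEX_CHAR"
            tokens.append((kind, ch))
            i += 1
        elif ch.isspace():
            i += 1
        else:
            m = ID.match(text, i)
            if m:
                tokens.append(("ID", m.group()))
                i = m.end()
            else:
                tokens.append(("PUNCT", ch))
                i += 1
    return tokens


for token in lex_with_modes("regex_match(/ab+c/, x)"):
    print(token)
```

Here the characters a, b, + and c each come out as REGEX_CHAR tokens, while regex_match and x outside the slashes are still lexed as ID.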

Thanks,
Andrew

[Updated on: Fri, 15 November 2013 21:12]
