Eclipse Community Forums: TMF (Xtext) » How to read a "terminal" up to next MATCHING parenthesis?


How to read a "terminal" up to next MATCHING parenthesis? [message #710202]

Thu, 04 August 2011 20:06

Ferdinando Villa

Messages: 1
Registered: August 2011

Junior Member

Hello,

in the DSL I'm developing, I'd like to support "expressions" that would look like regular strings to the language, delimited by unambiguous characters like open/close square brackets, to be returned as strings and parsed outside the grammar. The obvious choice would be something like

terminal EXPR: '[' -> ']';

but the contents of the expression may contain nested pairs of [], so this would stop at the first closing bracket. What I need is evidently not a plain terminal, but I'd rather not write a whole parser either: just an extended lexer rule in Java that reads the input up to the matching closing bracket. I spent quite a bit of time with the docs without finding a way, although I'm pretty sure this is possible.

Advice please? Thanks so much in advance.

ferdinando


Re: How to read a "terminal" up to next MATCHING parenthesis? [message #710238 is a reply to message #710202]

Thu, 04 August 2011 20:59

Caner

Messages: 98
Registered: July 2011

Member

Hi,
maybe you need something like this:

myID : ID (',' | ExtID | ID)* ;

terminal ID : '^'?('a'..'z'|'A'..'Z'|'_'|'.') ('a'..'z'|'A'..'Z'|'_'|'0'..'9'|'.')* ;

terminal ExtID : ('%'|'['|']'|'('|')'|'.'|' '|','|'|' )+ ;
BR
caner

[Updated on: Thu, 04 August 2011 21:00]


(no subject) [message #710334 is a reply to message #710202]

Thu, 04 August 2011 23:15

Henrik Lindberg

Messages: 2509
Registered: July 2009

Senior Member

There are three options (as I see it):
1) define the grammar using a clever set of terminals and data rules in
your .xtext grammar
2) override the generated lexer class, and do special processing for a
small selection of tokens
3) Use an external lexer

More details...
1) Sounds like you already ruled this out (it quickly gets messy with
overlapping terminals and whitespace handling). A pro is that if you do
specify the grammar, you get a lot for free, such as code completion
inside the special section.

2) This approach is not recommended for more than a couple of special
tokens. You can override the entry point to the lexer and thus take
alternate action on certain tokens, preventing it from taking the
generated actions. You then simply use guice to bind your implementation
instead of the generated one. (IIRC I use this approach in the Eclipse
b3 project, as I was not aware of the external lexer option at the
time.) As a starting point you can look at what is generated for a small
sample grammar, as that makes it straightforward to figure out how it works.

3) You can use an "external lexer" with Xtext. That means that you can
replace the generated lexer with one generated from ANTLR source. In
ANTLR you can provide your own logic with the rules. It is not as
difficult as it sounds. I use this approach in Cloudsmith/Geppetto (at
github) as there are several difficult-to-parse features in the puppet
language grammar (regular expressions and expression interpolation in
strings). The only tricky thing is the requirement to (manually) keep
the token enumerator values in sync. Integrating the external lexer with
MWE is easy as it is a supported option.
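
Whichever of 2) or 3) you pick, the custom logic itself is small: the scan for the matching bracket is just a depth counter. Here is a minimal sketch in plain Java, independent of Xtext and ANTLR. The class and method names (BracketScanner, findMatchingClose, readExpression) are made up for illustration and are not part of any library, and the sketch assumes that brackets are the only delimiters that matter (no string literals or escapes inside the expression).

```java
/** Hypothetical helper for illustration; not part of Xtext or ANTLR. */
final class BracketScanner {
    private BracketScanner() {}

    /**
     * Returns the index of the ']' that matches the '[' at position start,
     * or -1 if the input ends before the brackets balance.
     */
    static int findMatchingClose(CharSequence input, int start) {
        if (start < 0 || start >= input.length() || input.charAt(start) != '[') {
            throw new IllegalArgumentException("expected '[' at index " + start);
        }
        int depth = 0;
        for (int i = start; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '[') {
                depth++;                       // entering a nested pair
            } else if (c == ']' && --depth == 0) {
                return i;                      // closed the outermost pair
            }
        }
        return -1;                             // unbalanced input
    }

    /** Returns the whole bracketed text including delimiters, or null if unbalanced. */
    static String readExpression(CharSequence input, int start) {
        int end = findMatchingClose(input, start);
        return end < 0 ? null : input.subSequence(start, end + 1).toString();
    }
}
```

Inside a lexer you would run the same loop over the character stream instead of a CharSequence and emit everything consumed as a single token. How exactly that is hooked in depends on the generated lexer, so check the generated code for your own grammar.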

Take a look at
https://github.com/cloudsmith/geppetto/tree/master/org.cloudsmith.geppetto.pp.dsl/src/org/cloudsmith/geppetto/pp/dsl/lexer
package

- The PPOverridingLexer is of type 2, but only sets up a
"lastSignificantToken" to be used in the lexer.
- PPLexer.g is the external lexer in ANTLR.

Hope that helps.
Regards
- henrik

