Eclipse Community Forums
Comma Separated List [message #790909] Sun, 05 February 2012 01:58
Eric Springer
I've simplified my problem to the point it sounds a bit silly, but if you can help me solve this, I can solve my real problem.


Imagine a comma separated list:

aaa,bbb,ccc,ddd


Ok, that's easy to parse:

Parser: vals+=ID (',' vals+=ID)*


But my grammar allows one or more separating commas, so this is valid:

aaa,bbb,,,ccc,,ddd


Ok, still no problems:
Parser: vals+=ID (','+ vals+=ID)*



And my grammar has to allow 1 or more commas before the list, so this is valid:
,,,aaa,bbb,,,ccc,,ddd



Still, no problems:
Parser: ','* vals+=ID (','+ vals+=ID)*



Now, for the tricky part: my grammar allows 1 or more commas after this list. So this is valid:

,,,aaa,bbb,,,ccc,,ddd,,



Now, I can left-factor and get something like this:


Parser: ','* (val=ID rest=ParserRest)?;

ParserRest: ','+ (val=ID rest=ParserRest)?;



Which seems to work, but gives me an ugly "linked-list" sort of structure of vals. Is there a better way to do this, and is it possible to use the "tree rewrite" operations in fixing this up?
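If the left-factored rules are kept, the val/rest chain can be flattened into a plain list once parsing is done. Below is a minimal sketch in Python, not Xtext: the `Node` class is a hypothetical stand-in for the objects that Parser and ParserRest would produce, and it is not the API Xtext actually generates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for one Parser/ParserRest object: a value plus the rest of the chain."""
    val: Optional[str] = None
    rest: Optional["Node"] = None


def flatten(node: Optional[Node]) -> List[str]:
    """Walk the val/rest chain iteratively and collect the values in order."""
    vals: List[str] = []
    while node is not None:
        if node.val is not None:
            vals.append(node.val)
        node = node.rest
    return vals


# Chain the left-factored rules would build for ",,,aaa,bbb,,,ccc,,ddd,,".
# The last ParserRest matches only the trailing commas, so it carries no value
# (assumed here to be an empty node; flatten also copes with it being absent).
chain = Node("aaa", Node("bbb", Node("ccc", Node("ddd", Node()))))
print(flatten(chain))
```

The loop skips nodes without a value, so it makes no difference whether the trailing-comma step shows up as an empty node or as no node at all.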

[Updated on: Sun, 05 February 2012 02:25]


Re: Comma Separated List [message #791027 is a reply to message #790909] Sun, 05 February 2012 06:33
Ed Willink
Hi

For the problem as you described it,

,,,aaa,bbb,,,ccc,,ddd,,

could be parsed by

Parser: ','* vals+=ID (','+ vals+=ID)* ','*

but I suspect that you want a sequence of Parser, which is where you
will hit problems. Do the ','s (new-lines) belong with the preceding or
the subsequent Parser? It is ambiguous and the tool cannot decide.
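
To make the ambiguity concrete, here is a minimal sketch in Python (not Xtext) that counts how many ways a token sequence can be cut into consecutive matches of the rule above. The regular expression mirrors the rule with every ID abbreviated to `i`; the name `count_splits` is hypothetical and exists only for this illustration.

```python
import re
from functools import lru_cache

# One match of  ','* ID (','+ ID)* ','*  with every ID abbreviated to "i".
ONE_LIST = re.compile(r",*i(,+i)*,*")


def count_splits(tokens: str) -> int:
    """Count the ways `tokens` can be cut into consecutive lists."""

    @lru_cache(maxsize=None)
    def ways(start: int) -> int:
        if start == len(tokens):
            return 1  # everything consumed: one complete split
        total = 0
        for end in range(start + 1, len(tokens) + 1):
            # Try every prefix that is a complete list, then split the remainder.
            if ONE_LIST.fullmatch(tokens, start, end):
                total += ways(end)
        return total

    return ways(0)


print(count_splits("i"))     # 1
print(count_splits("i,i"))   # 3: [i][,i]  [i,][i]  [i,i]
print(count_splits("i,,i"))  # 4
```

A single value has exactly one reading, but as soon as two values are involved every comma between them can be claimed by the list on its left or the list on its right, which is the choice the parser generator cannot make on its own.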

You may be able to achieve something with syntactic predicates to
favour one direction of resolution, but without understanding why you
want to parse the new-lines at all, I cannot suggest a resolution. It
would seem much easier to leave the new-lines as whitespace and rescue
the surrounding characters from the tokens once parsing has completed.

Consider the Java

int a; // This is a
// This is some text
int b; // This is b

Does "This is some text" elaborate "a" or "b"? Who knows? A simple
choice is that comments end at the end of the line containing the
associated token, so "This is some text" goes with "b". Perhaps your
new-lines need a similar plausible policy.
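
The comment policy described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `attach_comments` is hypothetical, every non-comment line is treated as one declaration, and `//` is assumed never to occur inside a string literal.

```python
from typing import Dict, List


def attach_comments(lines: List[str]) -> Dict[str, List[str]]:
    """Map each declaration to its comments.

    A comment sharing a line with a declaration belongs to that declaration.
    A comment on a line of its own is held back and given to the next declaration.
    """
    attached: Dict[str, List[str]] = {}
    pending: List[str] = []  # standalone comments waiting for the next declaration
    for line in lines:
        code, sep, comment = line.partition("//")
        code, comment = code.strip(), comment.strip()
        if not code:
            if sep:
                pending.append(comment)
            continue
        attached[code] = pending + ([comment] if sep else [])
        pending = []
    # Standalone comments after the last declaration are dropped in this sketch.
    return attached


source = [
    "int a; // This is a",
    "// This is some text",
    "int b; // This is b",
]
print(attach_comments(source))
```

For the three lines above, "This is some text" ends up in the list for `int b;`, ahead of "This is b". The same kind of fixed rule could decide which list a run of new-lines belongs to.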

Regards

Ed Willink


On 05/02/2012 01:58, Eric Springer wrote:
> I've simplified my problem to the point it sounds a bit silly, but if
> you can help me solve this, I can solve my real problem.
>
>
> Imagine a comma separated list:
>
> aaa,bbb,ccc,ddd
>
> Ok, that's easy to parse:
>
> Parser: vals+=ID (',' vals+=ID)*
>
> But my grammar allows 1 or more separating comma, so this is valid:
>
> aaa,bbb,,,ccc,,ddd
>
> Ok, still no problems:
> Parser: vals+=ID (','+ vals+=ID)*
>
>
> And my grammar has to allow 1 or more commas before the list, so this
> is valid:
> ,,,aaa,bbb,,,ccc,,ddd
>
>
> Still, no problems:
> Parser: ','* vals+=ID (','+ vals+=ID)*
>
>
> Now, for the tricky part: my grammar allows 1 or more commas after
> this list. So this is valid:
>
> ,,,aaa,bbb,,,ccc,,ddd,,
>
>
> But I'm totally stumped how to do this in XText or a non-backtracking
> LL parser. Any help would be appreciated.
>
>
> [For those wondering, in my real grammar they're not comma's but new
> lines :d]
>
>
Re: Comma Separated List [message #791070 is a reply to message #791027] Sun, 05 February 2012 08:05
Eric Springer
Thank you Edward! You answered my question:

Parser: ','* vals+=ID (','+ vals+=ID)* ','*

is exactly what I was looking for.

It turns out I'm just brain-damaged and didn't try the obvious solution. I seem to have deluded myself into thinking I knew how an LL(1) parser worked, and believed that a rule like that would be ambiguous and in need of left-factoring. I'll need to re-read that Wikipedia article. I guess a little knowledge can be dangerous.
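
For what it's worth, the worry about LL(1) is not unfounded. After an ID, a single comma of lookahead cannot tell "another value follows" apart from "these are trailing commas". The flat rule still works because Xtext generates its parser with ANTLR 3, whose LL(*) lookahead can scan past the whole run of commas before deciding. A hand-written recogniser gets the same effect by consuming the commas first and deciding afterwards. The sketch below is plain Python with hypothetical helper names (`tokenize`, `parse_list`), and is not what Xtext generates.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split the input into ',' tokens and identifier tokens (anything else is skipped)."""
    return re.findall(r",|[A-Za-z_]\w*", text)


def parse_list(tokens: List[str]) -> List[str]:
    """Recognise  ','* ID (','+ ID)* ','*  and return the IDs, or raise ValueError."""
    pos = 0
    n = len(tokens)

    def skip_commas() -> int:
        """Consume a run of commas and report how many were consumed."""
        nonlocal pos
        start = pos
        while pos < n and tokens[pos] == ",":
            pos += 1
        return pos - start

    skip_commas()  # leading ','*
    if pos == n:
        raise ValueError("expected at least one value")
    vals = [tokens[pos]]
    pos += 1
    while pos < n:
        if skip_commas() == 0:
            raise ValueError("expected ',' between values")
        if pos == n:
            break  # the commas just consumed were the trailing ','*
        # Only now, after the whole comma run, do we know another value follows.
        vals.append(tokens[pos])
        pos += 1
    return vals


print(parse_list(tokenize(",,,aaa,bbb,,,ccc,,ddd,,")))
```

Deferring the decision until the commas have been consumed is the same idea as the left-factored grammar, only without the nested result structure.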

---

[For what it's worth, this is for having something like the "implicit semicolon" featured by a language like JavaScript]