Comma Separated List [message #790909] |
Sat, 04 February 2012 20:58  |
Eclipse User |
I've simplified my problem to the point it sounds a bit silly, but if you can help me solve this, I can solve my real problem.
Imagine a comma-separated list:
aaa,bbb,ccc,ddd
Ok, that's easy to parse:
Parser: vals+=ID (',' vals+=ID)*
But my grammar allows 1 or more separating commas, so this is valid:
aaa,bbb,,,ccc,,ddd
Ok, still no problems:
Parser: vals+=ID (','+ vals+=ID)*
And my grammar has to allow 1 or more commas before the list, so this is valid:
,,,aaa,bbb,,,ccc,,ddd
Still, no problems:
Parser: ','* vals+=ID (','+ vals+=ID)*
Now, for the tricky part: my grammar allows 1 or more commas after this list. So this is valid:
,,,aaa,bbb,,,ccc,,ddd,,
Now, I can left-factor and get something like this:
Parser: ','* (val=ID rest=ParserRest)?;
ParserRest: ','+ (val=ID rest=ParserRest)?;
This seems to work, but it gives me an ugly "linked-list" sort of structure of vals. Is there a better way to do this, and is it possible to use the "tree rewrite" operations to fix this up?
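Independent of any grammar-level fix, the chained structure can be flattened after parsing. The following is only a minimal Python sketch of that idea: `Node` is a hypothetical stand-in for the objects built by Parser and ParserRest (Xtext generates EMF model classes, not this class), and `flatten` is an illustrative helper name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for one Parser / ParserRest object."""
    val: Optional[str] = None
    rest: Optional["Node"] = None


def flatten(node: Optional[Node]) -> List[str]:
    """Walk the rest-chain and collect every val that is set."""
    vals = []
    while node is not None:
        if node.val is not None:
            vals.append(node.val)
        node = node.rest
    return vals


# Chain for ",,,aaa,bbb,,,ccc,,ddd,,": the last ParserRest matched only commas,
# so it has neither a val nor a rest.
chain = Node("aaa", Node("bbb", Node("ccc", Node("ddd", Node()))))
print(flatten(chain))  # ['aaa', 'bbb', 'ccc', 'ddd']
```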
[Updated on: Sat, 04 February 2012 21:25] by Moderator
Re: Comma Separated List [message #791027 is a reply to message #790909] |
Sun, 05 February 2012 01:33   |
Eclipse User |
Hi
For the problem as you described it,
,,,aaa,bbb,,,ccc,,ddd,,
could be parsed by
Parser: ','* vals+=ID (','+ vals+=ID)* ','*
but I suspect that you want a sequence of Parser, which is where you
will hit problems. Do the ','s (new-lines) belong with the preceding or
the subsequent Parser? It is ambiguous and the tool cannot decide.
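The ambiguity can be made concrete by counting how many ways a token string splits into consecutive Parser matches. This is only a rough Python sketch under assumptions: the enclosing rule is taken to be a plain sequence of one or more Parser, each ID is abbreviated to the letter "i", and `count_splits` is a hypothetical helper name.

```python
import re

# One Parser over a token string: "i" stands for an ID token, "," for a separator.
PARSER = re.compile(r",*i(,+i)*,*")


def count_splits(tokens: str) -> int:
    """Count the ways to cut tokens into one or more consecutive Parser matches."""
    ways = [0] * (len(tokens) + 1)
    ways[0] = 1  # the empty prefix can be produced in exactly one way
    for end in range(1, len(tokens) + 1):
        for start in range(end):
            if ways[start] and PARSER.fullmatch(tokens[start:end]):
                ways[end] += ways[start]
    return ways[len(tokens)]


print(count_splits("i"))     # 1: a single list is unambiguous
print(count_splits("i,,i"))  # 4: one list, or two lists with the commas split 0/2, 1/1 or 2/0
```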
You may be able to achieve something with syntactic predicates to
favour one direction of resolution, but without understanding why you
want to parse the new-lines at all, I cannot suggest a resolution. It
would seem much easier to leave the new-lines as whitespace and rescue
the surrounding characters from the tokens once parsing has completed.
Consider this Java:
int a; // This is a
// This is some text
int b; // This is b
Does "This is some text" elaborate "a" or "b"? Who knows? A simple
choice is that comments end at the end of the line containing the
associated token, so "This is some text" goes with "b". Perhaps your
new-lines need a similar plausible policy.
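As a rough illustration of such a policy, here is a Python sketch that applies it to the three lines above after the fact. It assumes a trailing comment belongs to the declaration on its own line and a comment-only line belongs to the next declaration; `attach_comments` is a hypothetical helper, and the line splitting is deliberately naive (it would be fooled by "//" inside a string literal).

```python
from typing import Dict, List


def attach_comments(lines: List[str]) -> Dict[str, List[str]]:
    """Attach each // comment to a declared name using the assumed policy."""
    attached = {}
    pending = []  # comment-only lines waiting for the next declaration
    for line in lines:
        code, sep, comment = line.partition("//")
        code, comment = code.strip(), comment.strip()
        if not code:
            if sep:
                pending.append(comment)
            continue
        name = code.rstrip(";").split()[-1]  # "int a;" -> "a"
        attached[name] = pending + ([comment] if sep else [])
        pending = []
    return attached


source = [
    "int a; // This is a",
    "// This is some text",
    "int b; // This is b",
]
print(attach_comments(source))
# {'a': ['This is a'], 'b': ['This is some text', 'This is b']}
```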
Regards
Ed Willink
On 05/02/2012 01:58, Eric Springer wrote:
> I've simplified my problem to the point it sounds a bit silly, but if
> you can help me solve this, I can solve my real problem.
>
>
> Imagine a comma separated list:
>
> aaa,bbb,ccc,ddd
>
> Ok, that's easy to parse:
>
> Parser: vals+=ID (',' vals+=ID)*
>
> But my grammar allows 1 or more separating comma, so this is valid:
>
> aaa,bbb,,,ccc,,ddd
>
> Ok, still no problems:
> Parser: vals+=ID (','+ vals+=ID)*
>
>
> And my grammar has to allow 1 or more commas before the list, so this
> is valid:
> ,,,aaa,bbb,,,ccc,,ddd
>
>
> Still, no problems:
> Parser: ','* vals+=ID (','+ vals+=ID)*
>
>
> Now, for the tricky part: my grammar allows 1 or more commas after
> this list. So this is valid:
>
> ,,,aaa,bbb,,,ccc,,ddd,,
>
>
> But I'm totally stumped how to do this in XText or a non-backtracking
> LL parser. Any help would be appreciated.
>
>
> [For those wondering, in my real grammar they're not comma's but new
> lines :d]
>
>
Re: Comma Separated List [message #791070 is a reply to message #791027] |
Sun, 05 February 2012 03:05  |
Eclipse User |
Thank you, Edward! You answered my question.
Parser: ','* vals+=ID (','+ vals+=ID)* ','*
is exactly what I was looking for.
It turns out I'm just brain-damaged and didn't try the obvious solution. I seem to have deluded myself into thinking I knew how an LL(1) parser worked, and believed something like that would be ambiguous and in need of left-factoring. I'll need to re-read that Wikipedia article; I guess a little knowledge can be dangerous.
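For illustration, a hand-written recognizer for this rule might look like the Python sketch below. It is not what Xtext generates; the regex tokenizer and the name `parse_vals` are assumptions made for the example. It consumes each run of commas first and only then checks whether an ID follows, so every input has a single parse and no backtracking is needed.

```python
import re
from typing import List

# Assumed token set: identifiers and commas only.
TOKEN = re.compile(r"[A-Za-z_]\w*|,")


def parse_vals(text: str) -> List[str]:
    """Recognize  ','* ID (','+ ID)* ','*  and return the IDs (the vals list)."""
    tokens = TOKEN.findall(text)
    if "".join(tokens) != text:
        raise ValueError("unexpected character in input")
    pos = 0

    def skip_commas() -> int:
        """Consume a run of commas and return how many were consumed."""
        nonlocal pos
        start = pos
        while pos < len(tokens) and tokens[pos] == ",":
            pos += 1
        return pos - start

    skip_commas()                  # leading ','*
    if pos == len(tokens):
        raise ValueError("expected at least one ID")
    vals = [tokens[pos]]           # first vals+=ID
    pos += 1
    while pos < len(tokens):
        if skip_commas() == 0:
            raise ValueError("expected ',' between IDs")
        if pos < len(tokens):      # an ID follows: another (','+ ID) round
            vals.append(tokens[pos])
            pos += 1
        # otherwise the commas just consumed were the trailing ','*
    return vals


print(parse_vals(",,,aaa,bbb,,,ccc,,ddd,,"))  # ['aaa', 'bbb', 'ccc', 'ddd']
```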
---
[For what it's worth, this is for having something like an "implicit semicolon", as featured by a language like JavaScript.]