Eclipse Community Forums » Modeling » TMF (Xtext)
A rule that consumes everything until certain char/string? [message #712810] Sun, 07 August 2011 19:31
Robin
Hi,

I was wondering whether one could create a parser rule that assigns all characters up to a certain character or a certain string to a variable?

I tried to do that with the '.' (wildcard) operator, but if you loop it, you obviously consume everything.

Here's what I tried so far:
Form :
	'(' name=ANY_OTHERS ')';

terminal ANY_OTHERS :
	(ANY_OTHER)*;


Of course I would like to do something like:
name=ID

But in my case, special characters can occur that are not matched by the terminal ID rule.
So I want to assign everything to 'name' until a ')' occurs.

Does anybody have an idea?

Regards, Robin


(no subject) [message #712845 is a reply to message #712810] Sun, 07 August 2011 20:10
Henrik Lindberg
How about:

Form : '(' name = (ID|STRING|INT|ANY_OTHER)* ')' ;

If you also need to be able to have keywords and punctuation included,
you need to list them as well.

If you need white space, you need to unhide WS:

Form hidden(ML_COMMENT, SL_COMMENT) : name = (ID | STRING | INT | ANY_OTHER | WS)* ;

Remember: ANY_OTHER is a token for things that are none of the other
tokens (keywords, punctuation, terminals).

Hope that helps.
- henrik

Re: (no subject) [message #713708 is a reply to message #712845] Mon, 08 August 2011 21:01
Robin
Thanks, Henrik! That's a good idea. But I get an error saying:

  Cannot find type for 'ID | STRING | INT | ANY_OTHER'.

Also, I don't know whether that will work in the end, because really ANY character can occur at that position.
I thought maybe I could define a Unicode range:
terminal ANY_EXCEPT_CLOSING_PARENTHESIS :
	( '\u0000'..'\u0028')* | ('\u0030'..'\uFFFF')*;

But this won't work either. :/ What am I doing wrong?

Regards, Robin
Re: (no subject) [message #713732 is a reply to message #713708] Mon, 08 August 2011 22:56
Henrik Lindberg
I need to see your entire grammar, or at least all terminals in the order
they are declared, to give better advice.

On 8/8/11 11:01 PM, Robin wrote:
> Thanks, Henrik! That's a good idea. But I get an error saying
>> Cannot find type for 'ID | STRING | INT | ANY_OTHER'.
>
Maybe you need to put it in a separate rule. Do they have different
types declared?

It is best if the terminals all return string. If you want conversion,
write a separate rule that converts to the wanted type, e.g. IntLiteral.
I think Xtext infers the string type automatically when the alternatives
form a data type rule of their own, but maybe not when they are used
directly in an assignment.

So, try:
Form : name = FlexibleName ;
FlexibleName : (ID | STRING | INT | ANY_OTHER)* ;
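A rule like FlexibleName, which contains no assignments, acts as a data type rule: it yields a single string rather than an object. That is presumably why moving the alternatives out of the assignment avoids the type error. The Java sketch below is a simplified model, not the parser Xtext generates. It assumes that the lexer has already produced the tokens, that WS is hidden (the default), and that the rule's value is the concatenation of the texts of the visible tokens it consumes. The `Token` record and the `flexibleName` method are hypothetical names.

```java
import java.util.List;

public class DataTypeRuleSketch {
    // Hypothetical token model: a token type name plus the matched text.
    record Token(String type, String text) {}

    // Token types the modeled rule accepts: (ID | STRING | INT | ANY_OTHER)*
    static final List<String> ALTERNATIVES = List.of("ID", "STRING", "INT", "ANY_OTHER");

    // Models FlexibleName with the default hidden WS: hidden tokens are skipped,
    // the texts of consumed tokens are concatenated into one string, and the
    // loop stops at the first visible token that is not an alternative.
    static String flexibleName(List<Token> tokens) {
        StringBuilder value = new StringBuilder();
        for (Token t : tokens) {
            if (t.type().equals("WS")) {
                continue; // hidden: never seen by the parser rule
            }
            if (!ALTERNATIVES.contains(t.type())) {
                break; // e.g. the '),' keyword ends the name
            }
            value.append(t.text());
        }
        return value.toString();
    }

    public static void main(String[] args) {
        // Tokens for the input "foo $42),"
        List<Token> tokens = List.of(
            new Token("ID", "foo"), new Token("WS", " "),
            new Token("ANY_OTHER", "$"), new Token("INT", "42"),
            new Token("KEYWORD", "),"));
        System.out.println(flexibleName(tokens)); // prints foo$42
    }
}
```

In this model, the space between `foo` and `$` is lost from the value because hidden tokens never reach the rule.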

> Also, I don't know if that will work eventually. Because at that
> position really ANY character can occur.

Including a closing parenthesis?

> I thought maybe I could define
> a unicode range:
>
> terminal ANY_EXCEPT_CLOSING_PARENTHESIS :
> ( '\u0000'..'\u0028')* | ('\u0030'..'\uFFFF')*;
>
> But this won't work either. :/ What am I doing wrong?
>
> Regards, Robin

Are you on Xtext 1.0? I think Unicode escapes and character ranges are
supported in 2.0. Still, with ANY_OTHER : . ; you will get anything that
is not matched by any other rule, which basically means all Unicode
characters (except possibly 0, which I am not sure about).
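Separately from the Xtext version question, the quoted range excludes more than ')'. The closing parenthesis is U+0029, so resuming at U+0030 also drops U+002A through U+002F. A range that excluded only ')' would resume at '\u002A'. Two more details: as written, each starred alternative matches a run drawn from one range only, and the rule can match the empty string. The Java check below does nothing but the character arithmetic. It lists the ASCII characters that fall outside the two quoted ranges and does not model Xtext at all.

```java
public class RangeCheck {
    // Mirrors the two ranges in the quoted terminal: U+0000..U+0028 and U+0030..U+FFFF.
    // Every Java char lies in 0..0xFFFF, so only the upper bound of the first range
    // and the lower bound of the second one matter.
    static boolean inQuotedRanges(char c) {
        return c <= 0x28 || c >= 0x30;
    }

    // Collects the ASCII characters the terminal cannot match.
    static String excludedAscii() {
        StringBuilder excluded = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            if (!inQuotedRanges(c)) {
                excluded.append(c);
            }
        }
        return excluded.toString();
    }

    public static void main(String[] args) {
        System.out.println(excludedAscii()); // prints )*+,-./
    }
}
```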

Remember: the ANY_OTHER rule at the end can only match what preceding
rules have not matched. The lexer is a cookie cutter that operates
without knowledge of the grammar. Just because a parser rule seems to be
"calling" a terminal rule does not mean that only that terminal is
considered; by that point, the lexer has already done its work, and the
grammar gets a stream of tokens from the lexer (i.e. 'terminals').
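To make the cookie-cutter point concrete, here is a deliberately simplified Java model of a lexer. It is not the ANTLR lexer that Xtext generates. Assumptions: the keywords are those of Robin's final Form rule ('(' and '),'), and ID/INT/WS are rough regex versions of the default terminals, with STRING and comments omitted. The longest match wins, keywords win ties, and ANY_OTHER takes a single leftover character. Lexing `(a(b),` shows that the lexer knows nothing about where a name starts or ends. The inner '(' still becomes the keyword token, so a FlexibleName built from ID | STRING | INT | ANY_OTHER cannot contain it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CookieCutterLexer {
    record Token(String type, String text) {}
    record Rule(String name, Pattern pattern) {}

    // Keywords of the grammar (here: those of the final Form rule).
    static final List<String> KEYWORDS = List.of("(", "),");

    // Rough stand-ins for the default ID, INT and WS terminals (STRING and
    // comments omitted). ANY_OTHER is handled as the fallback below.
    static final List<Rule> TERMINALS = List.of(
        new Rule("ID", Pattern.compile("\\^?[a-zA-Z_][a-zA-Z_0-9]*")),
        new Rule("INT", Pattern.compile("[0-9]+")),
        new Rule("WS", Pattern.compile("[ \\t\\r\\n]+")));

    // Cuts the input into tokens without any knowledge of the parser rules:
    // longest match wins, keywords win ties, leftovers become ANY_OTHER.
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestLen = 0;
            for (String kw : KEYWORDS) {
                if (input.startsWith(kw, pos) && kw.length() > bestLen) {
                    bestType = "'" + kw + "'";
                    bestLen = kw.length();
                }
            }
            for (Rule rule : TERMINALS) {
                Matcher m = rule.pattern().matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() - pos > bestLen) { // strictly longer only
                    bestType = rule.name();
                    bestLen = m.end() - pos;
                }
            }
            if (bestType == null) { // nothing else matched: one ANY_OTHER character
                bestType = "ANY_OTHER";
                bestLen = 1;
            }
            tokens.add(new Token(bestType, input.substring(pos, pos + bestLen)));
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("(a$b),").stream().map(Token::type).toList());
        // prints ['(', ID, ANY_OTHER, ID, '),']
        System.out.println(lex("(a(b),").stream().map(Token::type).toList());
        // prints ['(', ID, '(', ID, '),']
    }
}
```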

Thus there is no difference whether you use ANY_OTHER : . ; or define a
character range that matches all possible characters. Either way, you
have to place this terminal last (or else the *only* token your grammar
will ever see is your ANY_OTHER :) ).
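The placement advice can be checked against another simplified Java model. It assumes that the lexer picks the longest match and, on a tie, the rule listed first. That is only an approximation of how ANTLR-generated lexers behave in simple cases. Under this model, a single-character catch-all listed first steals every single-character match (a one-letter ID, a single space), while longer IDs still win on length. A catch-all that matches runs of characters, like the starred range tried earlier in the thread, swallows the whole input wherever it is placed. The rule names and patterns are illustrative stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleOrderSketch {
    record Rule(String name, Pattern pattern) {}

    static final Rule ID = new Rule("ID", Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*"));
    static final Rule WS = new Rule("WS", Pattern.compile("[ \\t]+"));
    // Like "terminal ANY_OTHER: . ;" (exactly one character of anything)
    static final Rule ANY_OTHER = new Rule("ANY_OTHER", Pattern.compile(".", Pattern.DOTALL));
    // Like a starred range meaning "any run of characters except ')'"
    static final Rule ANY_RUN = new Rule("ANY_RUN", Pattern.compile("[^)]+"));

    // Longest match wins; on a tie, the rule listed first wins.
    static List<String> lexTypes(String input, List<Rule> rules) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestLen = 0;
            for (Rule r : rules) {
                Matcher m = r.pattern().matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() - pos > bestLen) { // ties keep the earlier rule
                    best = r;
                    bestLen = m.end() - pos;
                }
            }
            if (best == null) {
                throw new IllegalStateException("no rule matches at position " + pos);
            }
            types.add(best.name());
            pos += bestLen;
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(lexTypes("x foo", List.of(ID, WS, ANY_OTHER))); // [ID, WS, ID]
        System.out.println(lexTypes("x foo", List.of(ANY_OTHER, ID, WS))); // [ANY_OTHER, ANY_OTHER, ID]
        System.out.println(lexTypes("x foo", List.of(ID, WS, ANY_RUN)));   // [ANY_RUN]
    }
}
```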

- henrik
Re: (no subject) [message #713836 is a reply to message #713732] Tue, 09 August 2011 07:08
Robin
Thank you so much, Henrik. It works now! :) Though I am not sure how. :D

I finally used:
Form :
	'(' name=FlexibleName '),';


FlexibleName :
	(ID | STRING | INT | ANY_OTHER)* ;


I am wondering, though, why ANY_OTHER does not match the '),' that is used in the Form rule. Hmm... Is it because the Form rule comes first, as you mentioned? Does this mean that the preceding rules create some kind of look-ahead mechanism for the lexer?

I am using Xtext 1.0.2, btw (sorry, I forgot to mention the version).

Regards, Robin
Re: (no subject) [message #713957 is a reply to message #713836] Tue, 09 August 2011 13:42
Alexander Nittka
Hi,

ANY_OTHER matches only stuff not covered by any other terminal rule or keyword. '(' and '),' are keywords in your Form rule.

Alex


Re: (no subject) [message #713993 is a reply to message #713836] Tue, 09 August 2011 13:41
Henrik Lindberg
A suggestion: although '),' works, you probably want to write ')' ','
instead, so that whitespace and comments are allowed between them.
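The effect of splitting the keyword can be modeled in a few lines of Java. This is a simplified sketch, not Xtext's lexer. It assumes that keywords are matched longest-first, that a run of spaces is hidden WS and is dropped, and that any other character becomes a single ANY_OTHER token. When only the combined keyword '),' is declared, the input `) ,` never produces that keyword: ')' and ',' fall through to ANY_OTHER, so the Form rule cannot end there. When ')' and ',' are declared separately, the parser sees ')' ',' with the whitespace between them skipped.

```java
import java.util.ArrayList;
import java.util.List;

public class KeywordSplitSketch {
    // Returns the visible token stream: the longest matching keyword, else a run of
    // spaces (hidden WS, therefore dropped), else a single ANY_OTHER character.
    static List<String> visibleTokens(String input, List<String> keywords) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String kw = null;
            for (String k : keywords) {
                if (input.startsWith(k, pos) && (kw == null || k.length() > kw.length())) {
                    kw = k;
                }
            }
            if (kw != null) {
                out.add("'" + kw + "'");
                pos += kw.length();
            } else if (input.charAt(pos) == ' ') {
                while (pos < input.length() && input.charAt(pos) == ' ') {
                    pos++; // hidden WS: the parser never sees it
                }
            } else {
                out.add("ANY_OTHER '" + input.charAt(pos) + "'");
                pos++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> combined = List.of("(", "),");
        List<String> separate = List.of("(", ")", ",");
        System.out.println(visibleTokens(") ,", combined)); // [ANY_OTHER ')', ANY_OTHER ',']
        System.out.println(visibleTokens(") ,", separate)); // [')', ',']
    }
}
```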

You also want to use hidden() on FlexibleName to prevent spaces and
comments from being intermixed with the ID | STRING ... tokens. You don't
need it on Form, as FlexibleName does not contain spaces or comments as it
is now defined.

Regarding tokens: *everything* you declare as keywords and punctuation
is delivered as separate tokens. If you need them anywhere, you have to
specify them; put differently, ANY_OTHER will never contain the
other tokens ('(' ')' '),' in this example).

If you use a bunch of punctuation and keywords in your grammar you can
do something like:

KEYWORDS : "if" | "else" | "while" | ... ;
PUNCTUATION : '.' | ',' | '-' | '_' | ... ;
FlexibleName hidden() : (PUNCTUATION | KEYWORDS | ID | STRING | ...)* ;
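Here is a minimal Java sketch of what hidden() changes for such a rule, under several assumptions. The token list is given directly, with no lexer. The hypothetical token types PUNCTUATION and KEYWORD stand in for the KEYWORDS and PUNCTUATION rules above. The rule's value is the concatenation of the consumed token texts. Because hidden() declares no hidden tokens, WS reaches the rule like any other token, and since it is not one of the alternatives, it ends the name.

```java
import java.util.List;
import java.util.Set;

public class FlexibleNameSketch {
    record Token(String type, String text) {}

    // Hypothetical model of
    //   FlexibleName hidden() : (PUNCTUATION | KEYWORDS | ID | STRING | ...)* ;
    // With hidden(), WS is an ordinary token inside this rule; it is not among
    // the alternatives, so the first WS token ends the name.
    static final Set<String> ALLOWED = Set.of("PUNCTUATION", "KEYWORD", "ID", "STRING", "INT");

    static String parseFlexibleName(List<Token> tokens) {
        StringBuilder name = new StringBuilder();
        for (Token t : tokens) {
            if (!ALLOWED.contains(t.type())) {
                break; // WS (now visible) or any other token stops the loop
            }
            name.append(t.text());
        }
        return name.toString();
    }

    public static void main(String[] args) {
        // Tokens for "a.b-if": punctuation and keywords become part of the name.
        List<Token> compact = List.of(
            new Token("ID", "a"), new Token("PUNCTUATION", "."), new Token("ID", "b"),
            new Token("PUNCTUATION", "-"), new Token("KEYWORD", "if"));
        // Tokens for "a .b": the visible WS token ends the name after "a".
        List<Token> spaced = List.of(
            new Token("ID", "a"), new Token("WS", " "),
            new Token("PUNCTUATION", "."), new Token("ID", "b"));
        System.out.println(parseFlexibleName(compact)); // prints a.b-if
        System.out.println(parseFlexibleName(spaced));  // prints a
    }
}
```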

Remember, the Lexer is a scout sent out to report what animals are
walking towards the camp. Given only the terminal rules, the scout
reports "I see 'a duck', 'a bear', 'a tiger', ..."

- henrik


Re: A rule that consumes everything until certain char/string? [message #714006 is a reply to message #712810] Tue, 09 August 2011 15:08
Robin
@Alexander, @Henrik: thanks so much for pointing all that out. :)

I'm still a bit unsure about everything, but things are getting clearer... I think I will have to start some more threads when I hit more problems. :D

Regards, Robin
