Eclipse Community Forums: TMF (Xtext) » parsing arguments without separating whitespace


parsing arguments without separating whitespace [message #1090904]

Tue, 20 August 2013 16:27

Eclipse User

Hello,

I am trying to build an Xtext grammar for parsing smali code.

Method names are normal IDs, simple data types are just single letters "I" for Integer, "D" for double, etc.

Method signatures are written like "add(II)I" which denotes a method with name "add" taking two integers and returning an integer.

Another valid signature would be "III(III)I" - a method with name "III" taking three integers ("III") and returning an integer.

The most natural grammar would be:

  ID'('Type*')'Type

Unfortunately the lexer always tags strings like "III" as an ID, even when Type* would be correct.

What is the right way to build such a grammar?
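To make the symptom concrete, here is a minimal Python sketch of a longest-match tokenizer. It is not Xtext or ANTLR code; the rule table and the name `tokenize` are made up for illustration, and it assumes the lexer picks the longest match and breaks ties in favour of the rule listed first.

```python
import re

# Hypothetical token rules mirroring the post: ID is the usual identifier
# pattern, TYPE is a single primitive letter.
TOKEN_RULES = [
    ("ID", re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("TYPE", re.compile(r"[IDF]")),
    ("LPAREN", re.compile(r"\(")),
    ("RPAREN", re.compile(r"\)")),
]


def tokenize(text):
    """Longest-match tokenizer; on a tie the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in TOKEN_RULES:
            match = pattern.match(text, pos)
            # Strictly longer matches replace the current best, so an
            # earlier rule keeps the token when lengths are equal.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"cannot tokenize at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

With these assumed rules, `tokenize("add(II)I")` yields `ID "add"`, `LPAREN`, `ID "II"`, `RPAREN`, `ID "I"`: the lexer never emits a `TYPE` token, because it has no idea that it is inside an argument list.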

Re: parsing arguments without separating whitespace [message #1091687 is a reply to message #1090904]

Wed, 21 August 2013 17:21

Eclipse User

Hi

Did you try to turn Type into a terminal
and ID into a datatype rule?

Re: parsing arguments without separating whitespace [message #1091975 is a reply to message #1091687]

Thu, 22 August 2013 03:03

Eclipse User

Alright I tried the following:

grammar language.Name with org.eclipse.xtext.common.Terminals

generate name "http://www.Name.language"

Signature: '-' method=Id2'(' types+=Type* ')' return=Type;

Type: PRIMITIVE|Class;
terminal PRIMITIVE: 'I'|'D'|'F';
terminal NONPRIM: 'A'..'C'|'E'|'G'..'H'|'J'..'Z'|'a'..'z';
terminal NUMBER:'0'..'9';
//terminal ID : '^'?('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
Id2 : '^'?(PRIMITIVE|NONPRIM|'_')(PRIMITIVE|NONPRIM|NUMBER|'_')*;
//terminal CLASS: 'L' ('a'..'z'|'A'..'Z'|'0'..'9'|'/'|'$')+ ';';
Class:'L'Id2('/'Id2)*';';

Multiple problems:
- Id2 now expects whitespace between the single letters.
- ID is still lurking around. (Is it possible to deactivate an imported terminal rule?)
- As far as I understood the documentation, some Xtext magic requires a terminal named ID.

=> It feels like this is not the intended way for Xtext.

Re: parsing arguments without separating whitespace [message #1091979 is a reply to message #1091975]

Thu, 22 August 2013 03:07

Eclipse User

Hi, I was thinking of not extending the Terminals grammar anymore.


Re: parsing arguments without separating whitespace [message #1091990 is a reply to message #1091979]

Thu, 22 August 2013 03:26

Eclipse User

Sorry, I do not understand what you mean. (I am new to xtext. I only read the documentation and currently work on my first meaningful grammar.)

To define my own Id, I need some way to refer to "a single digit" or "any letter primitive or not", therefore I need some terminal rule, right?

If I do not yet understand some important concept, I'd be happy if you could give me a pointer in the right direction.

Re: parsing arguments without separating whitespace [message #1091997 is a reply to message #1091990]

Thu, 22 August 2013 03:33

Eclipse User

Change the grammar declaration to

grammar language.Name

and copy what you need from the Terminals grammar into yours (excluding ID).

Re: parsing arguments without separating whitespace [message #1092286 is a reply to message #1091687]

Thu, 22 August 2013 11:14

Eclipse User

Christian Dietrich wrote on Wed, 21 August 2013 17:21

did you try to turn Type into a terminal
and ID into a datatype rule?

Turning ID into a datatype rule means that an ID is no longer allowed to contain any keywords.

For example, if "end" is a keyword anywhere in your language, something like "SendMessage" is no longer a valid ID.
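A rough Python sketch of why this happens (again not Xtext code; `KEYWORDS`, `lex_chars_and_keywords` and `is_identifier` are hypothetical names). It assumes the lexer emits a keyword token wherever the keyword text appears and single-character tokens otherwise, and that the datatype rule for ID accepts only the single-character tokens.

```python
KEYWORDS = ["end"]  # hypothetical keyword set for this illustration


def lex_chars_and_keywords(text):
    """Emit a KEYWORD token wherever a keyword starts, else one CHAR token."""
    tokens = []
    pos = 0
    while pos < len(text):
        for keyword in KEYWORDS:
            if text.startswith(keyword, pos):
                tokens.append(("KEYWORD", keyword))
                pos += len(keyword)
                break
        else:
            # No keyword starts here, so emit a single-character token.
            tokens.append(("CHAR", text[pos]))
            pos += 1
    return tokens


def is_identifier(tokens):
    """A datatype-rule ID built only from CHAR tokens rejects any KEYWORD."""
    return bool(tokens) and all(kind == "CHAR" for kind, _ in tokens)
```

Under these assumptions, "SendMessage" lexes as `S`, keyword `end`, `M`, `e`, ... and is rejected, while "SetMessage" consists only of single-character tokens and is accepted.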

----------

I found another strange thing with this grammar:

grammar language.Name
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate name "http://www.Name.language"

Syntax: 'xxxx' id=ID;

ID : '^'?(PRIMITIVE|NONPRIM|'_')(PRIMITIVE|NONPRIM|NUMBER|'_')*;

terminal PRIMITIVE: 'B'|'D'|'F'|'I'|'J'|'V'|'Z';
terminal NONPRIM: 'A'..'Z'|'a'..'z';
terminal NUMBER:'0'..'9';

Depending on the input you get different results:

* "xxxx_aaaa_" is perfectly valid
* "xxxx_xxaa_" gives an error because the second keyword is incomplete (wtf?!)
* "xxxx_xxxx_" is parsed as Keyword-ID-Keyword-ID - which is no valid Syntax.

----------

=> Looks like a datatype rule for ID is not a good idea...

[Updated on: Thu, 22 August 2013 11:16] by Moderator

Re: parsing arguments without separating whitespace [message #1092290 is a reply to message #1092286]

Thu, 22 August 2013 11:18

Eclipse User

Hi,

as I said, you have to introduce ID as a datatype rule (as you did).

If xxxx is a keyword, you would have to escape it with ^, I think.

Re: parsing arguments without separating whitespace [message #1092293 is a reply to message #1092290]

Thu, 22 August 2013 11:23

Eclipse User

Or to change the grammar

(PRIMITIVE|NONPRIM|'_'|'xxxx')

Re: parsing arguments without separating whitespace [message #1092303 is a reply to message #1092293]

Thu, 22 August 2013 11:40

Eclipse User

Christian, don't be offended, but you are really not helpful at all.

Please first think and test before writing your suggestions. Both ideas obviously do not work.

Does anyone else have an idea how to solve the original problem?

Re: parsing arguments without separating whitespace [message #1092311 is a reply to message #1092303]

Thu, 22 August 2013 11:51

Eclipse User

Sorry, I don't have the time to test.
Both mechanisms are standard ways to solve the "Xtext finds a keyword" problem.

Also, your grammar seems to contain no whitespace at all (is this really intended?).
The parser is eager to eat everything up unless you write something like:

Syntax: FOURX id=ID;

ID : '^'?(PRIMITIVE|NONPRIM|'_'| 'x')(PRIMITIVE|NONPRIM|NUMBER|'_'| 'x')*;

FOURX : 'x' 'x' 'x' 'x';

terminal PRIMITIVE: 'B'|'D'|'F'|'I'|'J'|'V'|'Z';
terminal NONPRIM: 'A'..'Z'|'a'..'z';
terminal NUMBER:'0'..'9';

Re: parsing arguments without separating whitespace [message #1094809 is a reply to message #1091975]

Mon, 26 August 2013 04:10

Eclipse User

Michael Schnupp wrote on Thu, 22 August 2013 09:03

- ID is still lurking around. (Is it possible to deactivate an imported terminal rule?)

This might just be the key.
You have to not import the "Terminals" grammar, as Christian had indicated.

grammar org.xtext.example.sandbox.Sandbox  // with org.eclipse.xtext.common.Terminals

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

generate sandbox "http://www.xtext.org/example/sandbox/Sandbox"

Signature: '-'? method=Id2'(' types+=Type* ')' return=Type;

Type: PRIMITIVE|Class;
terminal PRIMITIVE: 'I'|'D'|'F';
terminal NONPRIM: 'A'..'C'|'E'|'G'..'H'|'J'..'Z'|'a'..'z';
terminal NUMBER:'0'..'9';
//terminal ID : '^'?('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
Id2 : '^'?(PRIMITIVE|NONPRIM|'_')(PRIMITIVE|NONPRIM|NUMBER|'_')*;
//terminal CLASS: 'L' ('a'..'z'|'A'..'Z'|'0'..'9'|'/'|'$')+ ';';
Class:'L'Id2('/'Id2)*';';

Works perfectly fine for me - III(III)I is parsed without error.

Re: parsing arguments without separating whitespace [message #1095138 is a reply to message #1094809]

Mon, 26 August 2013 13:28

Eclipse User

Claudio Heeg wrote on Mon, 26 August 2013 04:10

Works perfectly fine for me - III(III)I is parsed without error.

Yes, it works fine, but it will break horribly once your grammar contains any keywords. (See my xxxx example above.)

[Updated on: Mon, 26 August 2013 13:30] by Moderator

Re: parsing arguments without separating whitespace [message #1095552 is a reply to message #1095138]

Tue, 27 August 2013 03:55

Eclipse User

Michael Schnupp wrote on Mon, 26 August 2013 19:28

Yes, it works fine, but it will break horribly once your grammar contains any keywords. (See my xxxx example above.)

I'm sorry, but I can't quite follow.
You do want to have 'xxxx' (in this case) as a keyword somewhere in your language but also as a possible name of a function?

If that's the case, I believe you'd have to explicitly allow 'xxxx' on places where it may appear.
Sorry if I misunderstood the problem.

Re: parsing arguments without separating whitespace [message #1095604 is a reply to message #1095552]

Tue, 27 August 2013 05:22

Eclipse User

Claudio Heeg wrote on Tue, 27 August 2013 03:55

I'm sorry, but I can't quite follow.
You do want to have 'xxxx' (in this case) as a keyword somewhere in your language but also as a possible name of a function?

No, I want "end" as a keyword and "sendMessage" as a valid identifier(e.g. method name).

Similarly, in the xxxx-Example I want xxxx as a keyword and "_xxxx_" as a valid identifier.

[Updated on: Tue, 27 August 2013 05:23] by Moderator

Re: parsing arguments without separating whitespace [message #1095662 is a reply to message #1095604]

Tue, 27 August 2013 07:03

Eclipse User

I see.
So the problem once again seems to be the greediness of the lexer: if ID is a terminal, the "II" in the argument list should be two PRIMITIVEs, but the lexer puts as much as possible into one token and so produces a single ID instead of two separate PRIMITIVEs.

Is it possible for you to restrict Identifiers not to begin with a "PRIMITIVE", or can that not be changed within the language itself?
Otherwise I'm afraid I'm at a loss here and hope someone more knowledgeable will come along.

[Updated on: Tue, 27 August 2013 07:03] by Moderator

Re: parsing arguments without separating whitespace [message #1095675 is a reply to message #1095662]

Tue, 27 August 2013 07:27

Eclipse User

Yes, this is exactly the problem.

And yes, I already tried to solve the problem by cheating: I defined ID to only start with a lowercase letter. This pretty much solves all problems, as long as there are no identifiers starting with a capital letter.

Unfortunately some identifiers (especially CONSTANTS) really do start with capital letters, and the language is an existing one, hence I cannot change its definition.
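For illustration, the lowercase workaround and its limit can be sketched in Python like this (a simplified stand-in for the real lexer; the pattern and the name `tokenize_lowercase_ids` are made up). Because ID and TYPE can no longer start with the same character, the alternatives never compete for the same input.

```python
import re

# Hypothetical token pattern: ID may only start with a lowercase letter or
# underscore, so it can never swallow the uppercase primitive letters.
TOKEN = re.compile(
    r"(?P<ID>[a-z_][A-Za-z_0-9]*)"
    r"|(?P<TYPE>[IDF])"
    r"|(?P<LPAREN>\()"
    r"|(?P<RPAREN>\))"
)


def tokenize_lowercase_ids(text):
    """Tokenize with the restricted ID rule; fail on anything unmatched."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(
                f"no token matches at position {pos}: {text[pos]!r}"
            )
        # lastgroup is the name of the alternative that matched.
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

In this sketch "add(II)I" comes out as `ID`, `LPAREN`, `TYPE`, `TYPE`, `RPAREN`, `TYPE` as desired, but a name such as "MAX(II)I" raises an error at the first character, which is the CONSTANTS problem described above.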

BTW: Two really good ideas just arrived on Stack Overflow.

Re: parsing arguments without separating whitespace [message #1095684 is a reply to message #1095675]

Tue, 27 August 2013 07:41

Eclipse User

As for the different fragments Sebastian is talking about, here's where and how you use them in a different context.
http://zarnekow.blogspot.de/2010/06/new-in-xtext-case-insensitive-languages.html
(Especially the part about replacing the existing generators below the code snippet.)

Still, even with those Generator fragments, it doesn't seem to work.

[Updated on: Tue, 27 August 2013 07:43] by Moderator

Re: parsing arguments without separating whitespace [message #1096020 is a reply to message #1090904]

Tue, 27 August 2013 17:36

Eclipse User

On 2013-21-08 17:41, Michael Schnupp wrote:
> Hello,
>
> I try to build an xtext grammar for parsing smali code.
>
> Method names are normal IDs, simple data types are just single letters
> "I" for Integer, "D" for double, etc.
>
> Method signatures are written like "add(II)I" which denotes a method
> with name "add" taking two integers and returning an integer.
>
> Another valid signature would be "III(III)I" - a method with name "III"
> taking three Integer "III".
>
> The most natural grammar would be:
>
> ID'('Type*')'Type
>
> Unfortunately the lexer always tags strings like "III" as an ID, even
> when Type* would be correct.
>
> What is the right way to build such a grammar?

I have followed the conversation that followed this post, and it seems
that however you try to work around the issues there is no way it is
completely right.

I was in a similar situation with the Puppet Language. The only way I
know how to solve this in a good way is to use an external lexer where
you have full control over the lexing. I.e. define the grammar in a
natural way, and solve all the problems in an external ANTLR-based lexer
where you have full control. (Xtext supports this).

It takes a bit of setup, but works really well otherwise. I would not be
able to handle the Puppet language without it.
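As a rough illustration of what "full control over the lexing" buys you, here is a small hand-written, context-aware lexer in Python. This is only a sketch of the idea and is not taken from the Geppetto sources; the name `lex_signature`, the token names and the primitive set are assumptions, and the identifier check is simplified. The lexer remembers whether it has passed the opening parenthesis and only then treats single letters as types.

```python
PRIMITIVES = set("IDF")  # primitive letters used in the grammars of this thread


def lex_signature(text):
    """Tokenize 'name(types)returnType' using context, not longest match."""
    tokens = []
    in_types = False  # becomes True once the opening parenthesis is seen
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch == "(":
            tokens.append(("LPAREN", ch))
            in_types = True
            pos += 1
        elif ch == ")":
            tokens.append(("RPAREN", ch))
            pos += 1
        elif in_types:
            if ch in PRIMITIVES:
                tokens.append(("TYPE", ch))
                pos += 1
            elif ch == "L":
                # Class type: everything up to and including the ';'.
                end = text.find(";", pos)
                if end < 0:
                    raise ValueError("unterminated class type")
                tokens.append(("CLASS", text[pos:end + 1]))
                pos = end + 1
            else:
                raise ValueError(f"unexpected character {ch!r} in type list")
        else:
            # Before the parenthesis: read the whole method name greedily.
            end = pos
            while end < len(text) and (text[end].isalnum() or text[end] == "_"):
                end += 1
            if end == pos:
                raise ValueError(f"unexpected character {ch!r} in method name")
            tokens.append(("ID", text[pos:end]))
            pos = end
    return tokens
```

Here `lex_signature("III(III)I")` gives one `ID "III"`, then `LPAREN`, three `TYPE "I"` tokens, `RPAREN` and a final `TYPE "I"`. A real external lexer would apply the same kind of state tracking inside the ANTLR lexer.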

My implementation is in cloudsmith / geppetto @ github

Regards
- henrik

Re: parsing arguments without separating whitespace [message #1096442 is a reply to message #1096020]

Wed, 28 August 2013 07:56

Eclipse User

Hi,

I also tried to solve a lexer problem by using a datatype rule. It was a dead end because of similar side effects. My solution at the moment is a lexer with semantic predicates.

Maybe a semantic predicate for the token "Type" can help here, too. It depends on the grammar whether it is possible to find a predicate that decides whether it is ID or Type.

Semantic predicates are not supported by the Xtext-generated lexer, so a custom lexer is required. I did not want to build a custom lexer, so I just tweak the Xtext-generated lexer each time within the MWE2 workflow (see http://www.eclipse.org/forums/index.php/mv/msg/494581/1073856/#msg_1073856 ; it is just quick and dirty and can obviously be improved a lot).

[Updated on: Wed, 28 August 2013 07:56] by Moderator
