Eclipse Community Forums

parsing arguments without separating whitespace [message #1090904] Tue, 20 August 2013 20:27
Michael Schnupp
Hello,

I am trying to build an Xtext grammar for parsing smali code.

Method names are normal IDs, simple data types are just single letters "I" for Integer, "D" for double, etc.

Method signatures are written like "add(II)I" which denotes a method with name "add" taking two integers and returning an integer.

Another valid signature would be "III(III)I" - a method with name "III" taking three integers ("III") and returning an integer.

The most natural grammar would be:

  ID'('Type*')'Type


Unfortunately the lexer always tags strings like "III" as an ID, even when Type* would be correct.

What is the right way to build such a grammar?
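
To make the problem concrete, here is a minimal Python sketch (not Xtext code) of a longest-match lexer. The rule names mirror the question, but the regular expressions and the tie-breaking order are simplifying assumptions, not what the generated lexer actually contains. It shows that even when the type rule is listed first, "II" still comes out as a single ID, because the longer match wins.

```python
import re

# Token rules in priority order. The names follow the thread (ID, Type);
# the patterns are simplified assumptions for illustration only.
TOKEN_RULES = [
    ("TYPE", re.compile(r"[IDF]")),
    ("ID", re.compile(r"\^?[A-Za-z_][A-Za-z_0-9]*")),
    ("LPAREN", re.compile(r"\(")),
    ("RPAREN", re.compile(r"\)")),
]


def lex_longest_match(text):
    """Tokenize by always taking the longest match; ties go to the earlier rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (rule name, end offset) of the best candidate so far
        for name, pattern in TOKEN_RULES:
            match = pattern.match(text, pos)
            if match and (best is None or match.end() > best[1]):
                best = (name, match.end())
        if best is None:
            raise ValueError(f"no token matches at position {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


signature_tokens = lex_longest_match("add(II)I")
print(signature_tokens)
```

The two argument letters are glued together into one ID token, so a parser rule such as Type* never sees two separate types.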
Re: parsing arguments without separating whitespace [message #1091687 is a reply to message #1090904] Wed, 21 August 2013 21:21
Christian Dietrich
Hi,

did you try turning Type into a terminal
and ID into a datatype rule?


Twitter : @chrdietrich
Blog : https://www.dietrich-it.de
Re: parsing arguments without separating whitespace [message #1091975 is a reply to message #1091687] Thu, 22 August 2013 07:03
Michael Schnupp
Alright, I tried the following:

grammar language.Name with org.eclipse.xtext.common.Terminals

generate name "http://www.Name.language"

Signature: '-' method=Id2'(' types+=Type* ')' return=Type;

Type: PRIMITIVE|Class;
terminal PRIMITIVE: 'I'|'D'|'F';
terminal NONPRIM: 'A'..'C'|'E'|'G'..'H'|'J'..'Z'|'a'..'z';
terminal NUMBER:'0'..'9';
//terminal ID : '^'?('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
Id2 : '^'?(PRIMITIVE|NONPRIM|'_')(PRIMITIVE|NONPRIM|NUMBER|'_')*;
//terminal CLASS: 'L' ('a'..'z'|'A'..'Z'|'0'..'9'|'/'|'$')+ ';';
Class:'L'Id2('/'Id2)*';';


Multiple problems:
- Id2 now expects whitespace between the single letters.
- ID is still lurking around. (Is it possible to deactivate an imported terminal rule?)
- As far as I understood the documentation, some Xtext magic requires a terminal named ID.

=> It feels like this is not the intended way to use Xtext.
Re: parsing arguments without separating whitespace [message #1091979 is a reply to message #1091975] Thu, 22 August 2013 07:07
Christian Dietrich
Hi, I was thinking of no longer extending the Terminals grammar.

--
Need training, onsite consulting or any other kind of help for Xtext?
Go visit http://xtext.itemis.com or send a mail to xtext at itemis dot de


Re: parsing arguments without separating whitespace [message #1091990 is a reply to message #1091979] Thu, 22 August 2013 07:26
Michael Schnupp
Sorry, I do not understand what you mean. (I am new to Xtext. I have only read the documentation and am currently working on my first meaningful grammar.)

To define my own Id, I need some way to refer to "a single digit" or "any letter primitive or not", therefore I need some terminal rule, right?

If I do not yet understand some important concept, I'd be happy if you could give me a pointer in the right direction.
Re: parsing arguments without separating whitespace [message #1091997 is a reply to message #1091990] Thu, 22 August 2013 07:33
Christian Dietrich
Change the first line to

grammar language.Name

(and copy what you need from the Terminals grammar into yours, excluding ID).


Re: parsing arguments without separating whitespace [message #1092286 is a reply to message #1091687] Thu, 22 August 2013 15:14
Michael Schnupp
Christian Dietrich wrote on Wed, 21 August 2013 17:21:
> did you try to turn Type into a terminal
> and ID into a datatype rule?


Turning ID into a datatype rule means that an ID is no longer allowed to contain any keyword. :(

For example, if "end" is a keyword anywhere in your language, something like "SendMessage" is no longer a valid ID.

----------

I found another strange thing with this grammar:
grammar language.Name
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate name "http://www.Name.language"

Syntax: 'xxxx' id=ID;

ID : '^'?(PRIMITIVE|NONPRIM|'_')(PRIMITIVE|NONPRIM|NUMBER|'_')*;

terminal PRIMITIVE: 'B'|'D'|'F'|'I'|'J'|'V'|'Z';
terminal NONPRIM: 'A'..'Z'|'a'..'z';
terminal NUMBER:'0'..'9';


Depending on the input you get different results:

* "xxxx_aaaa_" is perfectly valid
* "xxxx_xxaa_" gives an error because the second keyword is incomplete (wtf?!)
* "xxxx_xxxx_" is parsed as Keyword-ID-Keyword-ID - which is not a valid Syntax.

----------

=> Looks like a datatype rule for ID is not a good idea...
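
The keyword effect described in this post can be imitated with a small Python sketch. It assumes a plain longest-match lexer in which keyword literals compete with one-character terminals; the function name and token labels are made up for illustration. It does not model the lookahead of the generated ANTLR lexer, so it does not reproduce the "incomplete keyword" error for "xxxx_xxaa_".

```python
def lex_with_keywords(text, keywords):
    """Longest-match lexing where keywords compete with one-character terminals."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Prefer the longest keyword that starts at this position.
        hit = max(
            (kw for kw in keywords if text.startswith(kw, pos)),
            key=len,
            default=None,
        )
        if hit is not None:
            tokens.append(("KEYWORD", hit))
            pos += len(hit)
        else:
            # Otherwise emit a single character, like PRIMITIVE/NONPRIM/'_'.
            tokens.append(("CHAR", text[pos]))
            pos += 1
    return tokens


send_tokens = lex_with_keywords("SendMessage", ["end"])
xxxx_tokens = lex_with_keywords("xxxx_xxxx_", ["xxxx"])
print(send_tokens[:3])
print(xxxx_tokens)
```

In this model the keyword "end" is cut out of the middle of "SendMessage", and "xxxx_xxxx_" becomes keyword, "_", keyword, "_", which matches the Keyword-ID-Keyword-ID reading reported above.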

[Updated on: Thu, 22 August 2013 15:16]

Re: parsing arguments without separating whitespace [message #1092290 is a reply to message #1092286] Thu, 22 August 2013 15:18
Christian Dietrich
Hi,

as I said, you have to introduce ID as a datatype rule (as you did).

If xxxx is a keyword, you would have to escape it with ^, I think.


Re: parsing arguments without separating whitespace [message #1092293 is a reply to message #1092290] Thu, 22 August 2013 15:23
Christian Dietrich
Or change the grammar to allow the keyword explicitly:

(PRIMITIVE|NONPRIM|'_'|'xxxx')


Re: parsing arguments without separating whitespace [message #1092303 is a reply to message #1092293] Thu, 22 August 2013 15:40
Michael Schnupp
Christian, don't be offended, but you are really not helpful at all.

Please first think and test before writing your suggestions. Both ideas obviously do not work.

Does anyone else have an idea how to solve the original problem?
Re: parsing arguments without separating whitespace [message #1092311 is a reply to message #1092303] Thu, 22 August 2013 15:51
Christian Dietrich
Sorry, I don't have the time to test.
Both mechanisms are standard ways to solve the "Xtext finds a keyword" problem.

Also, your grammar seems to contain no whitespace at all (is this really wanted?).
Since the parser is eager to eat everything up, you could try this instead:
Syntax: FOURX id=ID;

ID : '^'?(PRIMITIVE|NONPRIM|'_'| 'x')(PRIMITIVE|NONPRIM|NUMBER|'_'| 'x')*;

FOURX : 'x' 'x' 'x' 'x';

terminal PRIMITIVE: 'B'|'D'|'F'|'I'|'J'|'V'|'Z';
terminal NONPRIM: 'A'..'Z'|'a'..'z';
terminal NUMBER:'0'..'9';



Re: parsing arguments without separating whitespace [message #1094809 is a reply to message #1091975] Mon, 26 August 2013 08:10
Claudio Heeg
Michael Schnupp wrote on Thu, 22 August 2013 09:03:
> - ID is still lurking around. (Is it possible to deactivate an imported terminal rule?)

This might just be the key.
You have to stop importing the "Terminals" grammar, as Christian indicated.

grammar org.xtext.example.sandbox.Sandbox  // with org.eclipse.xtext.common.Terminals

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

generate sandbox "http://www.xtext.org/example/sandbox/Sandbox"

Signature: '-'? method=Id2'(' types+=Type* ')' return=Type;

Type: PRIMITIVE|Class;
terminal PRIMITIVE: 'I'|'D'|'F';
terminal NONPRIM: 'A'..'C'|'E'|'G'..'H'|'J'..'Z'|'a'..'z';
terminal NUMBER:'0'..'9';
//terminal ID : '^'?('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
Id2 : '^'?(PRIMITIVE|NONPRIM|'_')(PRIMITIVE|NONPRIM|NUMBER|'_')*;
//terminal CLASS: 'L' ('a'..'z'|'A'..'Z'|'0'..'9'|'/'|'$')+ ';';
Class:'L'Id2('/'Id2)*';';


Works perfectly fine for me - III(III)I is parsed without error.
Re: parsing arguments without separating whitespace [message #1095138 is a reply to message #1094809] Mon, 26 August 2013 17:28
Michael Schnupp
Claudio Heeg wrote on Mon, 26 August 2013 04:10:
> Works perfectly fine for me - III(III)I is parsed without error.


Yes, it works fine, but it will break horribly once your grammar contains any keywords. (See my xxxx example above.)

[Updated on: Mon, 26 August 2013 17:30]

Re: parsing arguments without separating whitespace [message #1095552 is a reply to message #1095138] Tue, 27 August 2013 07:55
Claudio Heeg
Michael Schnupp wrote on Mon, 26 August 2013 19:28:
> Yes, it works fine, but it will break horribly once your grammar contains any keywords. (See my xxxx example above.)

I'm sorry, but I can't quite follow.
Do you want to have 'xxxx' (in this case) as a keyword somewhere in your language but also as a possible name of a function?

If that's the case, I believe you'd have to explicitly allow 'xxxx' in the places where it may appear.
Sorry if I misunderstood the problem.
Re: parsing arguments without separating whitespace [message #1095604 is a reply to message #1095552] Tue, 27 August 2013 09:22
Michael Schnupp
Claudio Heeg wrote on Tue, 27 August 2013 03:55:
> I'm sorry, but I can't quite follow.
> You do want to have 'xxxx' (in this case) as a keyword somewhere in your language but also as a possible name of a function?


No, I want "end" as a keyword and "sendMessage" as a valid identifier (e.g. a method name).

Similarly, in the xxxx example I want xxxx as a keyword and "_xxxx_" as a valid identifier.

[Updated on: Tue, 27 August 2013 09:23]

Re: parsing arguments without separating whitespace [message #1095662 is a reply to message #1095604] Tue, 27 August 2013 11:03
Claudio Heeg
I see.
So the problem once again seems to be the greediness of the lexer: "II" should be seen as method arguments, but (if ID is a terminal) the lexer puts as much as possible into one token, making it an ID instead of two separate PRIMITIVEs.

Is it possible for you to restrict Identifiers not to begin with a "PRIMITIVE", or can that not be changed within the language itself?
Otherwise I'm afraid I'm at a loss here and hope someone more knowledgeable will come along.

[Updated on: Tue, 27 August 2013 11:03]

Re: parsing arguments without separating whitespace [message #1095675 is a reply to message #1095662] Tue, 27 August 2013 11:27
Michael Schnupp
Yes, this is exactly the problem. ;)

And yes, I already tried to solve the problem by cheating: I defined the ID to only start with a lowercase letter. This pretty much solves all problems - as long as there are no identifiers starting with a capital letter.

Unfortunately some identifiers (especially CONSTANTS) really do start with capital letters, and the language is an existing one, hence I cannot change its definition.
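
The lowercase-first workaround and its limits can be shown with a short Python sketch. The combined regular expression below is an assumption made for illustration (it is not taken from any grammar in this thread). Identifiers may only begin with a lowercase letter or underscore, so capital letters at the start of a token can only be types.

```python
import re

# One alternative per token kind; ID must start with a lowercase letter or '_'.
TOKEN = re.compile(r"(?P<ID>[a-z_][A-Za-z0-9_]*)|(?P<TYPE>[IDF])|(?P<PUNCT>[()])")


def lex_lowercase_ids(text):
    """Tokenize with IDs restricted to a lowercase (or underscore) first character."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"no token matches at position {pos}")
        # lastgroup is the name of the alternative that matched.
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


ok_tokens = lex_lowercase_ids("add(II)I")
name_as_types = lex_lowercase_ids("III(III)I")
print(ok_tokens)
print(name_as_types[:3])
```

"add(II)I" now lexes as intended, but the method name in "III(III)I" is split into three type tokens, and a name such as "MAX" cannot be lexed at all under these rules, which is the limitation described above.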

BTW: Two really good ideas just arrived at Stack Overflow.
Re: parsing arguments without separating whitespace [message #1095684 is a reply to message #1095675] Tue, 27 August 2013 11:41
Claudio Heeg
As for the different fragments Sebastian is talking about, here's where and how you use them in a different context.
http://zarnekow.blogspot.de/2010/06/new-in-xtext-case-insensitive-languages.html
(Especially the part about replacing the existing generators below the code snippet.)

Still, even with those Generator fragments, it doesn't seem to work.

[Updated on: Tue, 27 August 2013 11:43]

Re: parsing arguments without separating whitespace [message #1096020 is a reply to message #1090904] Tue, 27 August 2013 21:36
Henrik Lindberg
On 2013-21-08 17:41, Michael Schnupp wrote:
> Hello,
>
> I try to build an xtext grammar for parsing smali code.
>
> Method names are normal IDs, simple data types are just single letters
> "I" for Integer, "D" for double, etc.
>
> Method signatures are written like "add(II)I" which denotes a method
> with name "add" taking two integers and returning an integer.
>
> Another valid signature would be "III(III)I" - a method with name "III"
> taking three Integer "III".
>
> The most natural grammar would be:
>
> ID'('Type*')'Type
>
> Unfortunately the lexer always tags strings like "III" as an ID, even
> when Type* would be correct.
>
> What is the right way to build such a grammar?

I have followed the conversation after this post, and it seems
that however you try to work around the issues, there is no way to get
it completely right.

I was in a similar situation with the Puppet Language. The only way I
know how to solve this in a good way is to use an external lexer where
you have full control over the lexing. I.e. define the grammar in a
natural way, and solve all the problems in an external ANTLR based lexer
where you have full control. (Xtext supports this).

It takes a bit of setup, but works really well otherwise. I would not be
able to handle the Puppet language without it.
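
The idea of a lexer with full control can be sketched in plain Python. This is only an analogy: it is not ANTLR, not Xtext's API, and not the geppetto implementation. The function name, token labels, and the class-type pattern are assumptions; the primitive letters are the ones used in the PRIMITIVE terminal earlier in the thread. The scanner reads the method name first and then treats every following letter as a type.

```python
import re

PRIMITIVES = set("BDFIJVZ")  # letters from the thread's PRIMITIVE terminal
NAME = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")
CLASS_TYPE = re.compile(r"L[A-Za-z0-9_/$]+;")  # assumed shape of a class type


def lex_signature(text):
    """Tokenize 'name(types)return' by reading the name first, then only types."""
    match = NAME.match(text)
    if match is None:
        raise ValueError("expected a method name")
    tokens = [("ID", match.group())]
    pos = match.end()
    while pos < len(text):
        ch = text[pos]
        if ch in "()":
            tokens.append((ch, ch))
            pos += 1
        elif ch in PRIMITIVES:
            # After the name, a primitive letter is always its own token.
            tokens.append(("TYPE", ch))
            pos += 1
        else:
            match = CLASS_TYPE.match(text, pos)
            if match is None:
                raise ValueError(f"unexpected character {ch!r} at position {pos}")
            tokens.append(("TYPE", match.group()))
            pos = match.end()
    return tokens


iii_tokens = lex_signature("III(III)I")
class_tokens = lex_signature("get(Ljava/lang/String;I)V")
print(iii_tokens)
print(class_tokens)
```

Because the scanner knows where it is in the signature, "III" can be an ID in one place and three types in another, which a context-free longest-match lexer cannot express.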

My implementation is in cloudsmith / geppetto @ github

Regards
- henrik
Re: parsing arguments without separating whitespace [message #1096442 is a reply to message #1096020] Wed, 28 August 2013 11:56
Jens Kuenzer
Hi,

I also tried to solve a lexer problem by using a datatype rule. It was a dead end, because of similar side effects. My solution at the moment is a lexer with semantic predicates.

Maybe a semantic predicate for the token "Type" can help here, too. It depends on the grammar whether it is possible to find a predicate that decides whether it is ID or Type.
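
A rough Python analogue of a predicate-gated lexer is sketched below. It is not ANTLR syntax, and the class name, rule table, and the predicate itself are assumptions chosen for a signature lexed in isolation. Each rule carries a predicate that decides whether the rule may fire; Type is allowed only inside parentheses or after the closing one, and ID only elsewhere.

```python
import re


class PredicatedLexer:
    """Longest-match lexer whose rules can be switched off by a predicate."""

    def __init__(self):
        self.depth = 0            # parenthesis nesting seen so far
        self.after_close = False  # True once a ')' has been consumed
        # (token name, pattern, predicate deciding whether the rule may fire)
        self.rules = [
            ("TYPE", re.compile(r"[IDF]"), self.expects_type),
            ("ID", re.compile(r"[A-Za-z_][A-Za-z_0-9]*"), self.expects_name),
            ("LPAREN", re.compile(r"\("), self.always),
            ("RPAREN", re.compile(r"\)"), self.always),
        ]

    def expects_type(self):
        return self.depth > 0 or self.after_close

    def expects_name(self):
        return not self.expects_type()

    def always(self):
        return True

    def tokenize(self, text):
        self.depth, self.after_close = 0, False  # reset state for each input
        tokens, pos = [], 0
        while pos < len(text):
            best = None
            for name, pattern, allowed in self.rules:
                match = pattern.match(text, pos) if allowed() else None
                if match and (best is None or match.end() > best[1]):
                    best = (name, match.end())
            if best is None:
                raise ValueError(f"no rule applies at position {pos}")
            name, end = best
            tokens.append((name, text[pos:end]))
            if name == "LPAREN":
                self.depth += 1
            elif name == "RPAREN":
                self.depth -= 1
                self.after_close = True
            pos = end
        return tokens


predicated_tokens = PredicatedLexer().tokenize("III(III)I")
print(predicated_tokens)
```

Whether such a predicate exists for a full language depends on the grammar; here it only works because the position relative to the parentheses is enough to tell a name from a type.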

Semantic predicates are not supported by the Xtext-generated lexer, so a custom lexer is required. I didn't want to build a custom lexer, so I just tweak the Xtext-generated lexer each time within the mwe2 workflow (see http://www.eclipse.org/forums/index.php/mv/msg/494581/1073856/#msg_1073856 ; it is just quick and dirty and can obviously be improved a lot).

[Updated on: Wed, 28 August 2013 11:56]
