Eclipse Community Forums
Xtext generated parser reports errors - why? [message #701723] Mon, 25 July 2011 13:42
FJ Stöver
Messages: 25
Registered: July 2011
Location: Aachen, Germany
Junior Member
Dear colleagues,

As an Xtext beginner I'm stuck with a parsing problem.

My (simplified) grammar (without 'with org.eclipse.xtext.common.Terminals', using 'import ".../emf/2002/Ecore" as ecore' instead) is:

MyDsl:
FourLettersFullStop = FourLettersFullStop
OneToManyLettersElement = OneToManyLettersElement;
FourLettersFullStop:
FourLetters = FourLetters '.' '\r\n';
FourLetters:
A A A A;
OneToManyLettersElement:
'XY ' OneToManyLetters += OneToManyLetters;
OneToManyLetters:
A+;
terminal A: // Uppercase alphabetic letter
('A'..'Z');

Parsing 'ABYY.\r\nXY ARBITRARYTEXT' works fine. However, when parsing 'ABXY.\r\nXY ARBITRARYTEXT' (note the 'XY' in 'ABXY'), my generated parser reports errors.

How can I tell the parser not to try matching the 'XY' in 'ABXY' with the 'OneToManyLettersElement' - and why should I need to?

Any hints are welcome - thanks for your cooperation.

Kind regards Franz-Josef
Re: Xtext generated parser reports errors - why? [message #701901 is a reply to message #701723] Mon, 25 July 2011 18:15
Meinte Boersma
Messages: 434
Registered: July 2009
Location: Leiden, Netherlands
Senior Member
Input is first tokenized and then parsed, so 'ABXY' ends up as Token(A, 'A'), Token(A, 'B'), Token('XY '), and now your parser has no option but to try to shoehorn that last token into something.

If you want "ABXY" to be parsed as one token, you need a terminal rule to do so.
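To make the idea concrete, here is a minimal, untested sketch of such a terminal rule. The name FOUR_LETTERS is hypothetical. It matches exactly four uppercase letters, so 'ABXY' would reach the parser as a single token before the lexer ever starts a token at the 'X'.

```
// Hypothetical terminal: exactly four uppercase letters as ONE token.
terminal FOUR_LETTERS:
	('A'..'Z') ('A'..'Z') ('A'..'Z') ('A'..'Z');

// FourLettersFullStop would then reference the terminal directly.
FourLettersFullStop:
	FourLetters=FOUR_LETTERS '.' '\r\n';
```

Because the lexer has no context and prefers the longest match, a terminal like this would presumably also chop longer letter runs such as 'ARBITRARYTEXT' into four-letter tokens, so every rule that expects letters would have to accept FOUR_LETTERS as well.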


Re: Xtext generated parser reports errors - why? [message #701911 is a reply to message #701723] Mon, 25 July 2011 18:21
Alexander Nittka
Messages: 1193
Registered: July 2009
Senior Member
Hi,

It is not a matter of the parser (which does have context) but of the lexer (which does not).
The lexer chops the input into tokens based on the terminal rules and the keywords. It tries to make tokens as long as possible (and it does not care whether *you* want the XY to be read as two A tokens).

Is it really necessary to solve the FourLetters problem at the grammar level? Couldn't you have something like
Rule: fourLetters=ID '.'
oneToManyLetters=OneToMany;
OneToMany: 'XY' value=ID;

and have a validation on Rule that checks that fourLetters really has length 4? That way you get meaningful error messages and at the same time make things easier for the lexer/parser. You'll also improve the memory footprint of the model.

Alex
Re: Xtext generated parser reports errors - why? [message #702335 is a reply to message #701911] Tue, 26 July 2011 08:27
FJ Stöver
Messages: 25
Registered: July 2011
Location: Aachen, Germany
Junior Member
Hi Alexander, hi Meinte,

Thank you for your contributions.

@Meinte: Parsing "ABXY" with a terminal rule is not an option, because any combination of 4 arbitrary letters is possible.
@Alexander: I don't have 'ID', because I don't use the built-in terminals from 'org.eclipse.xtext.common.Terminals'

What I learned: the lexer/tokenizer causes the problem. It looks not only for terminal rules (which I knew) but also for keywords (like the 'XY ') - which I hadn't expected.


With all due respect, as a newbie: I wonder whether this is an (architectural) bug in Xtext.

I think the grammar clearly describes that 'FourLetters' consists of 4 arbitrary letters; if the lexer/tokenizer decides to look for 'XY ' in that element, that seems like the wrong decision to me.


By the way, to give an idea of my application domain: I'm investigating Xtext to parse messages of the 'Airline teletype system' (http...://en.wikipedia.org/wiki/Airline_teletype_system), which is widely used but was defined nearly 90 years ago, long before computers came to life. For instance, when your flight is delayed, such a message is generated. The messages are well defined, but their inventors didn't know much about Xtext - just like me :-)

kind regards Franz-Josef
Re: Xtext generated parser reports errors - why? [message #702352 is a reply to message #702335] Tue, 26 July 2011 08:56
Alexander Nittka
Messages: 1193
Registered: July 2009
Senior Member
Hi,

It is not a bug. Keywords are very important "terminals" (one could say the most important ones in a structured language). They deserve a separate token, as the parser desperately looks for keyword tokens in order to choose the correct path ;-) If you really want to tokenise on a one-letter basis, you might consider splitting the 'XY ' keyword into 'X' 'Y' ' '. You then have to adapt your grammar by introducing a datatype rule for A:
ALetter: A | 'X' | 'Y';
and use ALetter in FourLetters etc. The reason is that the keyword 'X' and the terminal A overlap: both match 'X', but the keyword has the higher priority, so you must explicitly allow for that keyword.
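Putting these suggestions together, the simplified grammar from the first post might look roughly like this. It is an untested sketch: the rule and feature names are kept from the original grammar, ALetter is the datatype rule proposed above, and the same header (Ecore imported as ecore, no common Terminals) is assumed so that the datatype rules can return ecore::EString.

```
MyDsl:
	FourLettersFullStop=FourLettersFullStop
	OneToManyLettersElement=OneToManyLettersElement;

FourLettersFullStop:
	FourLetters=FourLetters '.' '\r\n';

// Datatype rule: wherever a letter is expected, accept terminal A
// as well as the one-letter keywords 'X' and 'Y', which win over A.
ALetter:
	A | 'X' | 'Y';

FourLetters:
	ALetter ALetter ALetter ALetter;

OneToManyLettersElement:
	// The former 'XY ' keyword, split into one-character keywords.
	'X' 'Y' ' ' OneToManyLetters+=OneToManyLetters;

OneToManyLetters:
	ALetter+;

// Uppercase alphabetic letter.
terminal A:
	('A'..'Z');
```

With these rules the lexer should emit 'ABXY.' as A, A, 'X', 'Y', '.', and FourLetters accepts all four letter tokens because each of them matches ALetter.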

Alex
Re: Xtext generated parser reports errors - why? [message #702465 is a reply to message #702352] Tue, 26 July 2011 12:07
FJ Stöver
Messages: 25
Registered: July 2011
Location: Aachen, Germany
Junior Member
Hello Alex,

Your advice on the one-letter-tokenizer really made my day!

Thanks a lot.

Franz-Josef