Eclipse Community Forums: TMF (Xtext) » Xtext generated parser reports errors


Xtext generated parser reports errors - why? [message #701723]

Mon, 25 July 2011 09:42

Eclipse User

Dear colleagues,

As an Xtext beginner I'm stuck with a parsing problem.

My (simplified) grammar (without 'with org.eclipse.xtext.common.Terminals', using 'import ".../emf/2002/Ecore" as ecore' instead) is:

MyDsl:
    FourLettersFullStop = FourLettersFullStop
    OneToManyLettersElement = OneToManyLettersElement;
FourLettersFullStop:
    FourLetters = FourLetters '.' '\r\n';
FourLetters:
    A A A A;
OneToManyLettersElement:
    'XY ' OneToManyLetters += OneToManyLetters;
OneToManyLetters:
    A+;
terminal A: // Uppercase alphabetic letter
    ('A'..'Z');

Parsing 'ABYY.\r\nXY ARBITRARYTEXT' works fine. However, when parsing 'ABXY.\r\nXY ARBITRARYTEXT' (note the 'XY' in 'ABXY'), my generated parser reports errors.

How can I tell the parser not to try matching the 'XY' in 'ABXY' with the 'OneToManyLettersElement' rule, and why should I need to?

Any hints are welcome. Thanks for your help.

Kind regards Franz-Josef

Re: Xtext generated parser reports errors - why? [message #701901 is a reply to message #701723]

Mon, 25 July 2011 14:15

Eclipse User

Input is first tokenized and then parsed, so 'ABXY' ends up as Token(A, 'A'), Token(A, 'B'), Token('XY '), and now your parser has no option but to try to shoehorn that last token into something.

If you want "ABXY" to be parsed as one token, you need a terminal rule to do so.

Re: Xtext generated parser reports errors - why? [message #701911 is a reply to message #701723]

Mon, 25 July 2011 14:21

Eclipse User

Hi,

It is not a matter of the parser (which does have context) but of the lexer (which does not).
The lexer chops the input into tokens based on the terminal rules and the keywords. It tries to make tokens as long as possible (and it does not care if *you* want the XY to be read as two A-tokens).
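To make the lexer's behaviour concrete, here is a minimal Python sketch (not Xtext or ANTLR code) of a context-free, longest-match lexer in which a keyword beats a terminal rule when both match the same number of characters. The token kinds and the `tokenize` function are hypothetical, and the ANTLR-generated lexer behind Xtext makes its own prediction decisions, so treat this only as an approximation. What it shows is that the same characters always produce the same tokens, whatever rule the parser happens to be in.

```python
import re
from typing import List, Tuple

# Hypothetical model of a context-free lexer (not Xtext/ANTLR code).
# Terminal rules are regular expressions; keywords are literal strings.
TERMINALS = {"A": re.compile(r"[A-Z]")}   # terminal A: ('A'..'Z');
KEYWORDS = ["XY ", ".", "\r\n"]           # keywords used in the grammar

def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split text into (kind, value) tokens.

    The longest match wins; on a tie, a keyword beats a terminal rule.
    The lexer never asks which grammar rule the parser is currently in.
    """
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        best = None  # (length, priority, kind, value); priority 1 = keyword
        for kw in KEYWORDS:
            if text.startswith(kw, pos):
                cand = (len(kw), 1, "KEYWORD", kw)
                if best is None or cand[:2] > best[:2]:
                    best = cand
        for name, pattern in TERMINALS.items():
            m = pattern.match(text, pos)
            if m and m.end() > pos:
                cand = (m.end() - pos, 0, name, m.group())
                if best is None or cand[:2] > best[:2]:
                    best = cand
        if best is None:
            raise SyntaxError(f"no token matches at offset {pos}: {text[pos]!r}")
        tokens.append((best[2], best[3]))
        pos += best[0]
    return tokens

print(tokenize("ABXY ARB"))
# [('A', 'A'), ('A', 'B'), ('KEYWORD', 'XY '), ('A', 'A'), ('A', 'R'), ('A', 'B')]
```

If four letters ending in 'XY' are followed by a space, the sketch emits the 'XY ' keyword even though a parser inside FourLetters would want two more A tokens.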

Is it really necessary to solve the FourLetters problem at the grammar level? Couldn't you have something like
Rule: fourLetters=ID '.'
    oneToManyLetters=OneToMany;
OneToMany: 'XY' value=ID;

and have a validation on Rule that checks that fourLetters really has length 4? That way you can provide meaningful error messages and at the same time make things easier for the lexer/parser. You'll also reduce the memory footprint of the model.
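With `fourLetters=ID`, longest match makes 'ABXY' a single ID token, so the 'XY' keyword never gets a chance inside it, and the length rule moves into a validation. Below is a minimal Python sketch of that check's logic only; `RuleNode` and `check_four_letters` are hypothetical names standing in for the generated model class and the validation method. In Xtext itself the check would be a Java method annotated with `@Check` in the DSL's validator.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RuleNode:
    """Hypothetical stand-in for the model object created by
    Rule: fourLetters=ID '.' oneToManyLetters=OneToMany;"""
    four_letters: str

def check_four_letters(node: RuleNode) -> List[str]:
    """Return error messages for node.four_letters; an empty list means valid."""
    value = node.four_letters
    errors: List[str] = []
    if len(value) != 4:
        errors.append(f"expected exactly 4 letters but found {len(value)}: '{value}'")
    # Mirror the original terminal A: uppercase 'A'..'Z' only.
    if not all("A" <= ch <= "Z" for ch in value):
        errors.append(f"only uppercase letters A-Z are allowed: '{value}'")
    return errors

print(check_four_letters(RuleNode("ABXY")))   # []
print(check_four_letters(RuleNode("ABXYZ")))
```

Because the check runs on the model rather than in the lexer, it can report a targeted message about the letter count instead of a generic syntax error.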

Alex

Re: Xtext generated parser reports errors - why? [message #702335 is a reply to message #701911]

Tue, 26 July 2011 04:27

Eclipse User

Hi Alexander, hi Meinte,

Thank you for your contributions.

@Meinte: Parsing "ABXY" with a terminal rule is not an option, because any combination of four arbitrary letters is possible.
@Alexander: I don't have 'ID', because I don't use the built-in terminals from 'org.eclipse.xtext.common.Terminals'.

What I learned: the lexer/tokenizer causes the problem. It looks not only for terminal rules (which I knew) but also for keywords (like 'XY '), which I hadn't expected.

With all due respect, as a newbie: could this be an (architectural) bug in Xtext?

I think the grammar clearly states that 'FourLetters' consists of four arbitrary letters; if the lexer/tokenizer decides to look for 'XY ' inside that element, that seems like the wrong decision to me.

By the way, to give you an idea of my application domain: I'm investigating Xtext for parsing messages of the 'Airline teletype system' (http...://en.wikipedia.org/wiki/Airline_teletype_system), which is widely used but was defined nearly 90 years ago, long before computers existed. In fact, when your flight is late, such a message is generated. The messages are well defined, but their inventors didn't know much about Xtext, just like me :-)

Kind regards Franz-Josef

Re: Xtext generated parser reports errors - why? [message #702352 is a reply to message #702335]

Tue, 26 July 2011 04:56

Eclipse User

Hi,

It is not a bug. Keywords are very important "terminals" (arguably the most important ones in a structured language). They deserve their own tokens, because the parser desperately looks for keyword tokens in order to choose the correct path ;-)

If you really want to tokenise on a one-letter basis, you might consider splitting the 'XY ' keyword into 'X' 'Y' ' '. You then have to adapt your grammar by introducing a datatype rule for A:
ALetter: A | 'X' | 'Y';
and use ALetter in FourLetters etc. The reason is that the keyword 'X' and the terminal A overlap (both match 'X', but the keyword has the higher priority, so you must explicitly allow for that keyword).
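A minimal Python sketch of the one-letter approach, assuming the keyword 'XY ' is split into 'X' 'Y' ' ' and that ALetter is used in both FourLetters and OneToManyLetters (the "etc."). It models a longest-match lexer in which keywords win ties, plus a small recursive-descent parser whose methods mirror the adapted grammar rules quoted in the comments; all names are illustrative, not generated Xtext code. X and Y always leave the lexer as keyword tokens, and ALetter is what lets the parser accept them wherever a letter is expected.

```python
import re
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (kind, text); kind is "KW" for keywords, "A" for terminal A

# Keywords after splitting 'XY ' into 'X' 'Y' ' ', plus '.' and the line break.
KEYWORDS = ["X", "Y", " ", ".", "\r\n"]
LETTER = re.compile(r"[A-Z]")  # terminal A: ('A'..'Z');

def tokenize(text: str) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    while pos < len(text):
        kw = max((k for k in KEYWORDS if text.startswith(k, pos)), key=len, default=None)
        m = LETTER.match(text, pos)
        # Longest match wins; on a tie the keyword beats terminal A.
        if kw is not None and (m is None or len(kw) >= len(m.group())):
            tokens.append(("KW", kw))
            pos += len(kw)
        elif m is not None:
            tokens.append(("A", m.group()))
            pos += len(m.group())
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
    return tokens

class Parser:
    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.i = 0

    def peek(self) -> Optional[Token]:
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def expect_keyword(self, text: str) -> None:
        if self.peek() != ("KW", text):
            raise SyntaxError(f"expected {text!r} at token {self.i}, got {self.peek()}")
        self.i += 1

    def at_letter(self) -> bool:
        # ALetter: A | 'X' | 'Y';
        tok = self.peek()
        return tok is not None and (tok[0] == "A" or tok in (("KW", "X"), ("KW", "Y")))

    def a_letter(self) -> str:
        if not self.at_letter():
            raise SyntaxError(f"expected a letter at token {self.i}, got {self.peek()}")
        text = self.tokens[self.i][1]
        self.i += 1
        return text

    def my_dsl(self) -> Tuple[str, str]:
        # FourLettersFullStop: FourLetters=FourLetters '.' '\r\n';
        # FourLetters: ALetter ALetter ALetter ALetter;
        four = "".join(self.a_letter() for _ in range(4))
        self.expect_keyword(".")
        self.expect_keyword("\r\n")
        # OneToManyLettersElement: 'X' 'Y' ' ' OneToManyLetters+=OneToManyLetters;
        # OneToManyLetters: ALetter+;
        for kw in ("X", "Y", " "):
            self.expect_keyword(kw)
        rest = self.a_letter()
        while self.at_letter():
            rest += self.a_letter()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing token {self.peek()}")
        return four, rest

def parse(text: str) -> Tuple[str, str]:
    return Parser(tokenize(text)).my_dsl()

print(parse("ABXY.\r\nXY ARBITRARYTEXT"))  # ('ABXY', 'ARBITRARYTEXT')
```

The input that failed in the original question now parses, because every place that expects a letter goes through ALetter and therefore also accepts the 'X' and 'Y' keyword tokens.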

Alex

Re: Xtext generated parser reports errors - why? [message #702465 is a reply to message #702352]

Tue, 26 July 2011 08:07

Eclipse User

Hello Alex,

Your advice on the one-letter tokenizer really made my day!

Thanks a lot.

Franz-Josef
