Newbie question -> How to let "anything" be accepted? [message #772457] Thu, 29 December 2011 20:45
wind surf
I have a grammar like this.
In the DoStatement, it should be possible to type anything (the red line). How do I implement this? Why doesn't .* work? It's not like regexp?

grammar xxx with xxxx

generate myDsl xxxxxxxxx
Model:
(statements+=Statement)+;

Statement:
{Statement}
ML_COMMENT | VARIABLE | PunchStatement | DoStatement;

DoStatement:
') DO'
') (.)*'
') UNTIL'
;



PunchStatement:
') PUNCH GET LISTE ' name=ID ';'
(') ' TOKEN)*
') END PUNCH';

TOKEN:
'TEST1' | 'TEST2' | 'TEST3';

VARIABLE:
ID '=' STRING ';';
Re: Newbie question -> How to let "anything" be accepted? [message #772481 is a reply to message #772457] Thu, 29 December 2011 22:13
Henrik Lindberg
On 2011-29-12 21:45, wind surf wrote:
> It's not like regexp?

No, "it" is not like regexp.

Xtext (and most other technologies for parsing text) divides the problem
into two separate parts: lexing and parsing.

The lexer turns the source text into tokens (with attached text), and
the parser checks whether the sequence of tokens is a valid sequence of
tokens in the target language. Lexing looks only at the text; it has no
knowledge of syntactic elements that have already been seen.
To use an analogy: the lexer turns letters into words and punctuation,
and the parser turns those into sentences. (Later, validation and
linking are used to check that the sentences have meaning / are valid.)

DEAD FISH RIDES BICYCLES DIAGONALLY.

Regular-expression-like rules (like .*) belong in the lexing category,
and they are defined in Xtext via the keyword 'terminal'. You must,
however, be very careful: the terminal rules cannot overlap, because the
first rule that matches the input determines which token the lexer will
emit. (In your case, if you were to put the rule .* first, your grammar
would only ever see that particular token, because it shadows
everything else.)

FISH FISH FISH FISH FISH...
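
To make the ordering point concrete, here is a minimal sketch in Xtext
terminal syntax. The rules are modeled on the ones shipped in
org.eclipse.xtext.common.Terminals, abbreviated for illustration (the
shipped ID rule, for instance, also allows a leading '^'). Treat it as
an illustration of the layout, not as a drop-in fragment.

```xtext
// Specific terminals come first ...
terminal ID: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
terminal WS: (' ' | '\t' | '\r' | '\n')+;

// ... and the single-character catch-all comes last, so it only picks up
// characters that none of the rules above could match.
terminal ANY_OTHER: .;
```

A greedy catch-all placed ahead of ID would leave nothing for ID to
match, which is the shadowing described above.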

Again a lexer analogy: a boat comes in to shore with the daily catch.
As boxes are unloaded, terminal workers (pun intended) label the crates
with tokens based on their content: COD, SHRIMP, LOBSTER, FISH.
Occasionally the content of a crate is separated into more than one:
SHRIMP WITH TWO LOBSTERS becomes SHRIMP, LOBSTER, LOBSTER. Workers can
remember what they have just sent off, and may also look ahead into the
coming crates before they make a decision (I am holding on to a crate
with only one COD, and if the next one is also COD, I will place the one
I have in that crate), and so on.

I noticed you asked other questions about defining a language consisting
of HTML with JavaScript and a new language intermixed.
This is a challenging task to take on. You have three languages that
seem to have very little in common. If your goal is to support one of
them and the rest is "noise", it is less challenging than supporting all
three with everything expected per language (coloring, understanding
syntax, linking, errors/warnings, quick fixes, code completion, etc.).

FISH FISH FISH do x FISH FISH do y ...

The first problem is to sort out how to distinguish between the
languages in the source text. The easiest way is to do this at the
lexical level, and for that you need to write an external lexer, since
the lexer generated by default by Xtext does not support "modes". In
order to do this, you need to learn how to write an ANTLR lexer.

IN THE BIN EIN BERLINER WAS FOUND.
(Where do the English and the German end and start?)

IN <!-- c --> THE <!-- o --> /* BIN */ <!-- m --> EIN <!-- m
--> BER<!-- e -->LINER<!-- n -->WAS<!-- s --> FOUND
<!-- . -->.

Is /* BIN */ a comment or not? What is a comment in one language (and
eaten whole by the lexer) is not a comment in another: <!-- --> is a
comment in HTML, but not in most other languages, and vice versa with
/* */. Without "modes" you cannot handle this in the lexer, and all of
your rules end up being defined as grammar rules (the result is a huge,
very slow parser where almost none of the built-in support is of any
help to you).

If you really want to support JavaScript as one language, you now have
an even greater challenge, as it is truly a horribly complex language to
parse due to its many oddball semantics. If all you want is a bit of
syntax coloring of keywords, it may work just fine. (Google for any
JavaScript bashing site for examples of weird stuff in JavaScript.)

I have implemented support for two major languages using Xtext (and have
used other parser technologies before), and I would still consider
implementing HTML + embedded JavaScript + one more embedded language
using Xtext (or any other similar technology) to be a major undertaking.

If you really want to do this, start with something that allows you to
familiarize yourself with the concepts and the technology. You need to
know this to be able to ask concrete questions.

Finally, to answer your question "How can anything be accepted".

Fish: . ;

eh, sorry:

ANY: . ;

Which is already placed last among the terminals in the default
terminals (where it is named ANY_OTHER). And if you followed the
reasoning above, you know it cannot be placed first, since you would
then *only* get ANY tokens. Also, if you defined it as:

ANY: .* ;

Then it would get everything in the file from "that point onwards"
delivered as one ANY token.
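
For the DO ... UNTIL case in the question, one possible direction is a
terminal that is bounded at both ends, using the same "until" operator
(->) that the default ML_COMMENT rule uses for '/*' -> '*/'. The sketch
below is only an illustration under assumptions: the grammar name, the
namespace URI, the DO_BLOCK terminal and the text feature are all
hypothetical, and the rule has not been checked against the other
keywords of the original grammar, several of which also start with ')'.

```xtext
// Hypothetical minimal grammar; names and URI are placeholders.
grammar org.example.dodsl.DoDsl with org.eclipse.xtext.common.Terminals

generate doDsl "http://www.example.org/dodsl/DoDsl"

Model:
    statements+=DoStatement+;

DoStatement:
    // The whole block, delimiters included, arrives as one token.
    text=DO_BLOCK;

// Matches from ') DO' up to and including the first ') UNTIL',
// regardless of what characters appear in between.
terminal DO_BLOCK:
    ') DO' -> ') UNTIL';
```

Because the body is swallowed by the lexer as a single token, the parser
sees none of its content; if the text between DO and UNTIL needs
structure of its own, this approach is not enough.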

Hope that helps.

Regards
- henrik
