Eclipse Community Forums: TMF (Xtext) » [xtext] is lexical analysis over syntactic analysis in xtext?

[xtext] is lexical analysis over syntactic analysis in xtext? [message #660252]

Thu, 17 March 2011 10:35

Eclipse User

Hello,

I have noticed that Xtext first identifies lexical tokens and only then
tries to combine them according to the grammar's syntactic rules.
I think that this approach is bad.

Does identifying lexemes really happen before syntactic analysis?

Example:
We had defined the token FOR:
terminal FOR: "for";

The problem was that, in the following example, Xtext recognizes the "for"
at the start of the word as the terminal instead of reading the whole word:
////////// parsed file:start
....
format
....
////////// parsed file:end

Xtext marked the "for" in the word "format" as a lexical token (a terminal symbol).
This is wrong, IMHO.

So we had to do the following:
ForKeyword: F O R;

terminal F: ('f' | 'F');

terminal O: ('o' | 'O');

terminal R: ('r' | 'R');

But parsing with such a grammar is slower, I would say.

BR,
Jan

Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660276 is a reply to message #660252]

Thu, 17 March 2011 11:50

Eclipse User

Context-free lexing (a priori lexing) is a property of the ANTLR
parser generator we use in the backend of Xtext. It is a tradeoff for a
lot of beautiful things we get from using ANTLR, such as error recovery.

As a result, you must be careful about what kind of terminal rules you
define and in which order.

An easy workaround for your example should be to switch from a terminal
rule to a datatype rule (by leaving out the 'terminal' keyword). That
way, it won't be the lexer that decides whether it's a token or an
identifier. 'for' is a keyword in your language anyway, isn't it?
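
A minimal sketch of that workaround, assuming a hypothetical grammar called
org.example.MyDsl that inherits the ID terminal from
org.eclipse.xtext.common.Terminals. The rule names Model, Statement, ForLoop
and Call are made up for illustration; only ForKeyword comes from the
original post:

```xtext
grammar org.example.MyDsl with org.eclipse.xtext.common.Terminals

generate myDsl "http://www.example.org/MyDsl"

Model:
    statements+=Statement*;

Statement:
    ForLoop | Call;

// Datatype rule: the former terminal rule without the 'terminal' keyword.
// 'for' is now an ordinary keyword of the grammar.
ForKeyword:
    'for';

ForLoop:
    ForKeyword variable=ID ';';

// Any other word, such as "format", is matched by the inherited ID terminal.
Call:
    name=ID ';';
```

With a grammar along these lines, an input like `format;` should be read as
a Call and `for x;` as a ForLoop, because the lexer can tell the keyword and
the longer identifier apart.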

--
Need professional support for Eclipse Modeling?
Go visit: http://xtext.itemis.com

Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660372 is a reply to message #660252]

Fri, 18 March 2011 03:37

Eclipse User

Hi,

In addition to Jan's reply, note that Xtext has a switch for case-insensitive languages. You should find information about that in the documentation.

Alex

Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660406 is a reply to message #660276]

Fri, 18 March 2011 06:12

Eclipse User

Yes, 'for' is our keyword.
Actually, we are going to implement an editor for extended JavaScript.
What is your opinion? Is it possible with Xtext? I noticed that JavaScript's
syntax includes automatic semicolon insertion.

Thanks,
Jan

Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660408 is a reply to message #660406]

Fri, 18 March 2011 06:24

Eclipse User

Not sure, because I don't know extended JavaScript too well.
The challenge in such projects is usually to get the grammar right and
free of ambiguities. You might have to enable backtracking or use
syntactic predicates (Xtext2 only).

Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660409 is a reply to message #660408]

Fri, 18 March 2011 06:36

Eclipse User

Basically it's JSON + JavaScript.

Is there any documentation about "syntactic predicates"?

Thanks,

Jan

Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660453 is a reply to message #660409]

Fri, 18 March 2011 09:59

Eclipse User

Syntactic predicates are new in Xtext 2.0 and the documentation is not
yet finished. But there's a thread "syntatic predicates in Xtext 2.0" in
this newsgroup.
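
As a rough illustration (not taken from that thread), a syntactic predicate
in Xtext 2 is written as `=>` in front of a grammar element and tells the
parser which alternative to prefer when the input is ambiguous. The sketch
below uses the classic dangling-else case; the grammar name and all rule and
feature names are hypothetical:

```xtext
grammar org.example.Conditions with org.eclipse.xtext.common.Terminals

generate conditions "http://www.example.org/Conditions"

Expression:
    IfExpression | Literal;

// Without the predicate, a nested 'if' makes it ambiguous which 'if' an
// 'else' belongs to. The '=>' tells the parser to take the 'else' branch
// whenever it sees the keyword, so it binds to the innermost 'if'.
IfExpression:
    'if' condition=Expression 'then' thenBranch=Expression
    (=> 'else' elseBranch=Expression)?;

Literal:
    value=INT;
```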

Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660509 is a reply to message #660409]

Fri, 18 March 2011 13:20

Eclipse User

I don't think syntactic predicates alone are enough to specify a JS
parser; you probably also need semantic predicates (which are not
supported in Xtext). JS is a *bitch* to parse if you aim to correctly
cover the entire language.

I suggest you get the ANTLR book and look at some JS parser samples
written for ANTLR, so you know what sort of challenges you will encounter
before you start.

Regards
- henrik
