Eclipse Community Forums
[xtext] is lexical analysis over syntactic analysis in xtext? [message #660252] Thu, 17 March 2011 14:35
hanys
Hello,

I have noticed that Xtext first tries to identify lexical tokens and only then
tries to combine them according to the syntactic rules of the grammar.
I think this approach is bad.

Does identifying lexemes really happen before syntactic analysis?

Example:
We had defined a token FOR:
terminal FOR: "for";

But the problem was that, in the following example, Xtext identifies "for"
as a terminal instead of treating the whole word as one token:
////////// parsed file:start
....
format
....
////////// parsed file:end


Xtext marked "for" inside the word "format" as a LEXICAL TOKEN (a terminal symbol).
This is wrong, IMHO.

And we had to do the following:
ForKeyword: F O R;

terminal F: ('f' | 'F');
terminal O: ('o' | 'O');
terminal R: ('r' | 'R');

But parsing with such a grammar is slower, I would say.

BR,
Jan
Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660276 is a reply to message #660252] Thu, 17 March 2011 15:50
Jan Koehnlein
Context-free lexing (a priori lexing) is a property of the ANTLR
parser generator we use in the backend of Xtext. It is a tradeoff for a
lot of beautiful things we get from using ANTLR, such as error recovery.

As a result, you must be careful about which terminal rules you define
and in which order.

An easy workaround for your example should be to switch from a terminal
rule to a datatype rule (by leaving out the 'terminal' keyword). That
way, it won't be the lexer that decides whether it's a token or an
identifier. 'for' is a keyword in your language anyway, isn't it?
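
A minimal sketch of that workaround, assuming the grammar inherits the common terminals so that ID is available. The grammar name, the namespace URI and the ForStatement and Call rules are hypothetical and only there to make the fragment complete:

```xtext
// Sketch only: all names below are made up for illustration.
grammar org.example.MyDsl with org.eclipse.xtext.common.Terminals

generate myDsl "http://www.example.org/MyDsl"

Model:
    statements+=Statement*;

Statement:
    ForStatement | Call;

// Datatype rule: same body as the old terminal rule, but without the
// 'terminal' keyword, so 'for' becomes an ordinary keyword of the grammar.
ForKeyword:
    'for';

ForStatement:
    ForKeyword variable=ID;

// With ID in scope, input such as "format" is expected to arrive here as a
// single ID token rather than as 'for' followed by "mat".
Call:
    name=ID '(' ')';
```

Writing 'for' inline in ForStatement instead of going through ForKeyword should behave the same way; the separate rule is kept here only to mirror the original terminal.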



--
Need professional support for Eclipse Modeling?
Go visit: http://xtext.itemis.com


---
Get professional support from the Xtext committers at www.typefox.io
Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660372 is a reply to message #660252] Fri, 18 March 2011 07:37
Alexander Nittka
Hi,

In addition to Jan's reply, note that Xtext has a switch for case-insensitive languages. You should find information about it in the documentation.

Alex
Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660406 is a reply to message #660276] Fri, 18 March 2011 10:12
hanys
Yes, 'for' is our keyword.
Actually, we are going to implement an editor for extended JavaScript.
What is your opinion: is it possible with Xtext? I noticed that JavaScript's
syntax includes automatic semicolon insertion.

Thanks,
Jan
Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660408 is a reply to message #660406] Fri, 18 March 2011 10:24
Jan Koehnlein
Not sure, because I don't know extended JavaScript too well.
The challenge in such projects is usually to get the grammar right and
free of ambiguities. You might have to enable backtracking or use
syntactic predicates (Xtext2 only).

Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660409 is a reply to message #660408] Fri, 18 March 2011 10:36
hanys
Basically it's JSON + JavaScript.

Is there any documentation about "syntactic predicates"?

Thanks,
Jan
Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660453 is a reply to message #660409] Fri, 18 March 2011 13:59
Jan Koehnlein
Syntactic predicates are new in Xtext 2.0 and the documentation is not
yet finished. But there's a thread "syntatic predicates in Xtext 2.0" in
this newsgroup.
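
For orientation, a rough sketch of what a syntactic predicate looks like, assuming the `=>` notation that Xtext 2.0 uses for it. The classic case is the dangling else; the grammar name and all rule and feature names below are hypothetical:

```xtext
// Sketch only: all names below are made up for illustration.
grammar org.example.IfDsl with org.eclipse.xtext.common.Terminals

generate ifDsl "http://www.example.org/IfDsl"

Statement:
    IfStatement | Assignment;

// The '=>' tells the parser: if an 'else' follows, take this branch right
// away, so the else binds to the innermost 'if' without backtracking.
IfStatement:
    'if' '(' condition=ID ')' thenBranch=Statement
    (=> 'else' elseBranch=Statement)?;

Assignment:
    target=ID '=' value=INT ';';
```

Without the predicate, the optional else part would be reported as ambiguous for nested if statements; the predicate resolves that one decision locally instead of enabling backtracking for the whole grammar.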

Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660509 is a reply to message #660409] Fri, 18 March 2011 17:20
Henrik Lindberg
I don't think syntactic predicates are enough to specify a JS
parser; you probably also need semantic predicates (which are not
supported in Xtext). JS is a *bitch* to parse if you aim to correctly
cover the entire language.

I suggest you get the ANTLR book and look at some JS parser samples
written for ANTLR, so you know what sort of challenges you will encounter
before you start.

Regards
- henrik
