Eclipse Community Forums
RFC: Indentation based grammar [message #556885] Fri, 03 September 2010 03:27
Ralf Ebert
I just tried to tackle the "parse indentation" issue separate from my
markup language, starting with a grammar for simple hierarchical todo
lists like:

----
Buy milk
	walk to the store
	find milk
	put milk in cart
	pay for milk
		walk to pay desk
		hand over money
Drink milk
	Enjoy reduced risk of heart disease
----

The code is here:
http://code.google.com/a/eclipselabs.org/p/todotext/
http://github.com/ralfebert/org.eclipselabs.todotext

This can be parsed by this grammar:

----
grammar org.eclipselabs.todotext.Todos

import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate todos "http://www.eclipselabs.org/todotext/Todos"

TodoList: {TodoList}
	(todos += Todo (NL todos += Todo)*)?;

Todo:
	name=String (NL INDENT subtasks = TodoList DEDENT)?;

String returns ecore::EString:
	CHAR+;

terminal NL: ('\r'|'\n')+ '\t'*;
// Placeholder definitions: INDENT and DEDENT are meant to be produced by the custom TokenSource
terminal INDENT : '{';
terminal DEDENT : '}';
terminal CHAR: .;
----

All that's needed seems to be a custom TokenSource that yields the
INDENT and DEDENT tokens before and after NL.
AbstractSplittingTokenSource seems perfect for this, see:
http://github.com/ralfebert/org.eclipselabs.todotext/commit/8cf8bc

This makes the folding work for such a simplified grammar. Seems to be
too easy to be true :)
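
A minimal sketch of the splitting idea in plain Java, for readers who want to see the token order the grammar expects. It is not the code from the linked commit and does not use AbstractSplittingTokenSource; the class name, the TEXT(...) notation and the blank-line handling are assumptions made for illustration. Closing DEDENT tokens are emitted before the line break and opening INDENT tokens after it, which is what the Todo rule above needs.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the NL / INDENT / DEDENT splitting; all names here are hypothetical. */
final class IndentSketch {

    /** Turns tab-indented text into a flat list of token names. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int currentDepth = 0;
        boolean firstLine = true;
        for (String line : input.split("\r?\n")) {
            if (line.trim().isEmpty()) {
                continue; // simplification: blank lines are skipped
            }
            // Depth = number of leading tabs, like the '\t'* tail of the NL terminal.
            int depth = 0;
            while (depth < line.length() && line.charAt(depth) == '\t') {
                depth++;
            }
            // Closing tokens go before the line break, opening tokens after it.
            for (int d = currentDepth; d > depth; d--) tokens.add("DEDENT");
            if (!firstLine) tokens.add("NL");
            for (int d = currentDepth; d < depth; d++) tokens.add("INDENT");
            tokens.add("TEXT(" + line.substring(depth) + ")");
            currentDepth = depth;
            firstLine = false;
        }
        // Close every level that is still open at the end of the input.
        for (int d = currentDepth; d > 0; d--) tokens.add("DEDENT");
        return tokens;
    }

    public static void main(String[] args) {
        // prints [TEXT(1), NL, INDENT, TEXT(1a), DEDENT, NL, TEXT(2)]
        System.out.println(tokenize("1\n\t1a\n2"));
    }
}
```

In a real token source the INDENT and DEDENT tokens would be zero-length tokens inserted around the NL token rather than strings in a list.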

Is there anything I missed that will not work or will become
inconvenient with this approach?
(@Sebastian, you mentioned you already had support for indentation-sensitive
grammars working; I guess you meant AbstractSplittingTokenSource?)

Ralf
Re: RFC: Indentation based grammar [message #556917 is a reply to message #556885] Fri, 03 September 2010 07:36
Sven Efftinge
Hi Ralf,

yes, this is how it is meant to work.

Sven

--
Need professional support for Xtext or other Eclipse Modeling technologies?
Go to: http://xtext.itemis.com
Twitter : @svenefftinge
Blog : http://blog.efftinge.de
Stale marker problem [message #557109 is a reply to message #556885] Sat, 04 September 2010 14:17
Ralf Ebert
I'm seeing stale markers for the todotext language; I guess something
trips over the fake indent/dedent tokens. I will change the grammar to
remove the rule which causes the markers anyway, so I don't particularly
care about this, but I thought I'd report it because it's easy to
reproduce. If I should file a Bugzilla entry for this, just tell me ;)

Steps to reproduce:

git clone http://github.com/ralfebert/org.eclipselabs.todotext.git
git checkout 78a70c7

Generate language, run, create a .todos document like this:

1
	1a
2

(that is, "1\n\t1a\n2"). Place the cursor after "1a" and press Enter. The
editor will complain about:

required (...)+ loop did not match anything at input ''

This marker is correct, the language does not allow empty todos. Now
type a character. The Problem will go away, but the red error marker in
the document stays, even after saving:

http://www.ralfebert.de/dump/xtext_todos_stale_marker.png

If the document is reopened, it goes away.

Reproducible on Eclipse SDK 3.6.1 M20100902-1717 + Xtext SDK
1.0.1.v201008311940.

Ralf
TokenIterator [message #557131 is a reply to message #556885] Sat, 04 September 2010 19:52
Ralf Ebert
For testing my IndentTokenSource I needed an Iterator<Token> that
takes a token source as input (for conveniently pulling all tokens into
a list). I found one buried in XtextDamagerRepairer#TokenIterator (using
CommonToken instead of Token). Is such a class available somewhere else,
and if not, could such an iterator be provided as a utility class
somewhere?

Ralf
Lexing URIs in text / support for fragment rules? [message #557158 is a reply to message #556885] Sun, 05 September 2010 14:22
Ralf Ebert
I just tried to extend the .todos language to find URIs in text (without
a special syntax to mark the begin/end of the URI), one feature I also
want to have in my markup language. And so I happily defined:

terminal URI: ('a'..'z')+ ':'
('a'..'z'|'A'..'Z'|'0'..'9'|
':'|'/'|'?'|'#'|'['|']'|'@'|
'!'|'$'|'&'|'\''|'('|')'|'*'|'+'|','|';'|'='|
'-'|'.'|'_'|'~'|
'%')+;

terminal OTHER: .;

This works. Almost. Except: if the lexer runs into something that is only
almost a URI (e.g. 'htt'), it ends up with one large token which is not a
URI. It will not split that up into many OTHER tokens, but instead emits
one single INVALID_TYPE token. Defining OTHER as ".+" is apparently not an
option either, because that would be greedy and eat up the whole document.

The only way I found to fix this is to define an additional "fallback"
terminal that can catch this case:

terminal URISH: ('a'..'z'|'A'..'Z'|'0'..'9'|
':'|'/'|'?'|'#'|'['|']'|'@'|
'!'|'$'|'&'|'\''|'('|')'|'*'|'+'|','|';'|'='|
'-'|'.'|'_'|'~'|
'%')+;

But this is ugly, especially because I need to repeat half the rule.

- Is there a better design that could be used for this?

- If not, from reading the 'ANTLR Reference' I guess a 'fragment' rule
could be used to re-use parts of rules without defining tokens. Is this
supported / can I sneak such a thing into the lexer grammar?

Ralf
Re: Lexing URIs in text / support for fragment rules? [message #557165 is a reply to message #557158] Sun, 05 September 2010 16:00
Alexander Nittka
Hi,

you could consider defining URIs not as a terminal rule but rather as a
datatype rule (analogous to the QualifiedName rule).

URI hidden(): ID ':' OtherStuff;

The advantage is that you don't run into the problem of overlapping
terminal rules. Also, it is easier to provide better error messages
(rather than "unexpected token x, expecting rule
HOWSHOULDTHEUSERKNOWABOUTTHEGRAMMAR" you can check the syntax in the
validator and write "note that a URI has to have the form xyz").

In a way datatype rules allow for reusing "fragments" (the terminal rules).
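
To illustrate the validator idea, here is a minimal sketch of the kind of check a validation rule could delegate to. It uses only java.net.URI from the JDK; the class name, method name and message wording are hypothetical, and the wiring into an Xtext validator is deliberately left out.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Plain-Java helper a validation rule could call; names and messages are hypothetical. */
final class UriCheckSketch {

    /** Returns null if the text parses as a URI with a scheme, otherwise a message for the user. */
    static String check(String text) {
        try {
            URI uri = new URI(text);
            if (uri.getScheme() == null) {
                return "Note that a URI has to have the form scheme:rest, but got '" + text + "'";
            }
            return null; // looks fine
        } catch (URISyntaxException e) {
            // getReason() explains what the JDK parser disliked about the input.
            return "Not a valid URI: " + e.getReason();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("mailto:ralf@ralfebert.de"));
        System.out.println(check("htt"));
    }
}
```

The point of doing it this way is that the message is written for the user of the language instead of being derived from rule names in the grammar.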


Alex


Need training, onsite consulting or any other kind of help for Xtext?
Go visit http://xtext.itemis.com or send a mail to xtext@itemis.de
Re: TokenIterator [message #557169 is a reply to message #557131] Sun, 05 September 2010 16:22
Sven Efftinge
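AbstractIterator from Google Collections could serve as a convenient basis:

import java.util.Iterator;
import org.antlr.runtime.Token;
import org.antlr.runtime.TokenSource;
import com.google.common.collect.AbstractIterator;

final TokenSource source = ...;
Iterator<Token> iterator = new AbstractIterator<Token>() {
	@Override
	protected Token computeNext() {
		Token nextToken = source.nextToken();
		// Compare by type rather than identity, in case the source creates its own EOF token
		if (nextToken.getType() == Token.EOF)
			return endOfData();
		return nextToken;
	}
};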
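
For tests that should not depend on ANTLR or Google Collections, the same pull-until-EOF idea can be sketched with the JDK alone. Tok, TokSource, fixedSource and drainTexts below are hypothetical stand-ins invented for this example; they only mimic the nextToken() contract and are not part of any library.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Dependency-free sketch; Tok and TokSource are stand-ins, not ANTLR types. */
final class TokenIteratorSketch {

    static final int EOF = -1;

    static final class Tok {
        final int type;
        final String text;
        Tok(int type, String text) { this.type = type; this.text = text; }
    }

    interface TokSource {
        Tok nextToken();
    }

    /** Wraps a pull-style source in an Iterator that ends at the first EOF token. */
    static Iterator<Tok> iterate(final TokSource source) {
        return new Iterator<Tok>() {
            private Tok next;     // one-token lookahead, null until fetched
            private boolean done; // set once EOF has been seen

            @Override public boolean hasNext() {
                if (next == null && !done) {
                    Tok t = source.nextToken();
                    if (t.type == EOF) done = true; else next = t;
                }
                return next != null;
            }

            @Override public Tok next() {
                if (!hasNext()) throw new NoSuchElementException();
                Tok t = next;
                next = null;
                return t;
            }
        };
    }

    /** Pulls all tokens into a list of their texts, which is handy in assertions. */
    static List<String> drainTexts(TokSource source) {
        List<String> texts = new ArrayList<>();
        for (Iterator<Tok> it = iterate(source); it.hasNext();) {
            texts.add(it.next().text);
        }
        return texts;
    }

    /** Test helper: a source that yields the given texts and then EOF forever. */
    static TokSource fixedSource(final String... texts) {
        return new TokSource() {
            private int index;
            @Override public Tok nextToken() {
                return index < texts.length ? new Tok(1, texts[index++]) : new Tok(EOF, "");
            }
        };
    }
}
```

A test can then compare drainTexts(fixedSource("a", "b")) against the expected list of token texts.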

Re: Lexing URIs in text / support for fragment rules? [message #557170 is a reply to message #557158] Sun, 05 September 2010 16:24
Sven Efftinge
I'd try to avoid doing this at the lexer level.
Wouldn't it be enough to validate the syntax in a validation rule,
or does the parser make any decisions based on whether it's a URI or not?

Re: Lexing URIs in text / support for fragment rules? [message #557173 is a reply to message #557170] Sun, 05 September 2010 17:19
Ralf Ebert
> I'd try to avoid doing this at the lexer level.
> Wouldn't it be enough to validate the syntax in a validation rule,
> or does the parser make any decisions based on whether it's a URI or not?

I want to have these separately in the model for further interpretation
and to treat them specially in the editor (syntax highlighting, context menu
items, ...). I already implemented the highlighting; see the screenshot at:

http://code.google.com/a/eclipselabs.org/p/todotext/

I put it at the lexer level because I didn't know how to define parser
rules that could match such input within free text (where everything that
isn't handled at the lexer level ends up as OTHER+). Also, in a textual
language, IMHO these things stand out and amount to separate tokens.

For example, I want to parse

"Send mailto:ralf@ralfebert.de about visiting http://www.google.de/"

to

Text
- PlainText[Send ]
- Link[mailto:ralf@ralfebert.de]
- PlainText[ about visiting ]
- Link[http://www.google.de/]

This is what

http://github.com/ralfebert/org.eclipselabs.todotext/blob/d30b80/plugins/org.eclipselabs.todotext/src/org/eclipselabs/todotext/Todos.xtext

does, but it is ugly. Could such a result be achieved in other, simpler ways?
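
For comparison, the desired segmentation can be sketched outside of Xtext with a regular expression that mirrors the URI terminal (lowercase scheme, colon, then one or more of the allowed characters). This is only an illustration of the target result, not a replacement for the lexer rules; the class name and the PlainText[...] / Link[...] output notation are assumptions taken from the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Splits free text into PlainText and Link segments; a sketch, not the Xtext lexer. */
final class LinkSegmentsSketch {

    // Same shape as the URI terminal: ('a'..'z')+ ':' followed by the allowed characters.
    private static final Pattern URI_PATTERN =
            Pattern.compile("[a-z]+:[a-zA-Z0-9:/?#\\[\\]@!$&'()*+,;=\\-._~%]+");

    static List<String> segments(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = URI_PATTERN.matcher(text);
        int last = 0;
        while (m.find()) {
            // Everything between two matches is plain text.
            if (m.start() > last) {
                result.add("PlainText[" + text.substring(last, m.start()) + "]");
            }
            result.add("Link[" + m.group() + "]");
            last = m.end();
        }
        if (last < text.length()) {
            result.add("PlainText[" + text.substring(last) + "]");
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : segments(
                "Send mailto:ralf@ralfebert.de about visiting http://www.google.de/")) {
            System.out.println("- " + s);
        }
    }
}
```

Unlike the lexer, the regex matcher backtracks, so an almost-URI such as 'htt' simply stays part of the surrounding plain text.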

Ralf
Re: Lexing URIs in text / support for fragment rules? [message #557174 is a reply to message #557165] Sun, 05 September 2010 17:45
Ralf Ebert
Hi Alex,

thanks; I tried this, but this yields a problem. If I define a datatype
rule like this:

URI:
ID':'OtherStuff;

This will be in conflict with my PlainText rule for matching arbitrary
text. (I'm parsing arbitrary text that can contain URLs, see

http://code.google.com/a/eclipselabs.org/p/todotext/
http://github.com/ralfebert/org.eclipselabs.todotext/blob/d30b80/plugins/org.eclipselabs.todotext/src/org/eclipselabs/todotext/Todos.xtext

to get an idea)

Now, how does one define a rule that describes "all texts that are not
URIs"? This was the point where I stopped fiddling with the parser and
started to use the lexer :)

Also, when I use the ':' character in a parser rule, it will define an
implicit token ':', so all colons get special treatment.

Probably my wishes regarding the language are too special and quirky
again :)

Ralf
Re: Stale marker problem [message #558433 is a reply to message #557109] Sun, 12 September 2010 19:28
Ralf Ebert
> This marker is correct, the language does not allow empty todos. Now
> type a character. The Problem will go away, but the red error marker in
> the document stays, even after saving:
>
> http://www.ralfebert.de/dump/xtext_todos_stale_marker.png

After all, I started to care about this problem. I debugged it: what
happens is that PartialParsingHelper doesn't include the empty IN/DEDENT
tokens when calculating potential replaceRegions. Therefore, the syntax
error, which happens to be on such a token, is not removed. IMHO it
wouldn't hurt to include empty tokens here. I will take it as a
challenge and come up with a test and bugfix for this. Would it be
OK to add a tab-indented language to the test languages for reproducing it?
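
As a purely illustrative aside, the following toy shows how a strict range-overlap test can miss a zero-length token that sits exactly on the boundary of a changed region, while an inclusive test catches it. This is not Xtext's PartialParsingHelper code and makes no claim about how it is implemented; the class and method names are hypothetical.

```java
/** Toy range checks; hypothetical names, unrelated to the actual Xtext implementation. */
final class EmptyTokenOverlapSketch {

    /**
     * Strict overlap of [tokOffset, tokOffset + tokLength) and [regionOffset, regionOffset + regionLength).
     * A zero-length token located exactly at the start or end of the region is not reported.
     */
    static boolean overlapsStrict(int tokOffset, int tokLength, int regionOffset, int regionLength) {
        return tokOffset < regionOffset + regionLength && regionOffset < tokOffset + tokLength;
    }

    /** Inclusive variant: tokens that merely touch the region boundary count as well. */
    static boolean overlapsInclusive(int tokOffset, int tokLength, int regionOffset, int regionLength) {
        return tokOffset <= regionOffset + regionLength && regionOffset <= tokOffset + tokLength;
    }

    public static void main(String[] args) {
        // Zero-length token at offset 5, edited region starts at 5 and is 1 character long.
        System.out.println(overlapsStrict(5, 0, 5, 1));    // false
        System.out.println(overlapsInclusive(5, 0, 5, 1)); // true
    }
}
```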

Ralf
Re: Stale marker problem [message #646799 is a reply to message #558433] Fri, 31 December 2010 11:11
Rune Kaagaard
Hi

I'm also trying to get indentation to work for a programming language I'm working on. I have tried 5 different techniques but can't make it work at all. A little HOWTO on making indentation work with the latest version of Xtext would be a lifesaver!!!

Regards
Rune Kaagaard
Re: Stale marker problem [message #1037166 is a reply to message #558433] Tue, 09 April 2013 08:31
junior developer
Hi,

I would like to access the todotext example. How can I get it? I am working on an indentation-based grammar, and the example would be helpful for me. Please help me :(


Best Regards,

[Updated on: Tue, 09 April 2013 08:33]
