Whitespace without newlines [message #1059120]
Thu, 16 May 2013 16:23
Brad Riching Messages: 16 Registered: May 2012
Junior Member
Hello Xtext enthusiasts,
My language is newline-sensitive in places, which has forced me to define the whitespace terminal with only spaces and tabs:
terminal WS:
(' '|'\t')+;
Throughout the rest of my grammar I look for one of three possible alternatives that end in a newline character. I created a datatype rule for an end-of-line (EOL).
EOL:
SL_COMMENT | LINEWRAP | NEWLINE;
terminal LINEWRAP:
('\\' '\r'? '\n');
terminal SL_COMMENT:
'//' !('\r'|'\n')* ('\r'? '\n');
terminal NEWLINE:
('\r'? '\n');
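To make the three end-of-line alternatives concrete, here is a minimal Java sketch that approximates each terminal with a `java.util.regex` pattern and reports which one a given line ending matches. The class name `EolTerminals` and the method `classify` are hypothetical, and the regular expressions only approximate what the generated ANTLR lexer would do.

```java
import java.util.regex.Pattern;

class EolTerminals {
    // Optional carriage return followed by a line feed, as in ('\r'? '\n').
    static final String NL = "\\r?\\n";

    // Regex approximations of the three terminals (assumption: not the generated lexer).
    static final Pattern LINEWRAP = Pattern.compile("\\\\" + NL);
    static final Pattern SL_COMMENT = Pattern.compile("//[^\\r\\n]*" + NL);
    static final Pattern NEWLINE = Pattern.compile(NL);

    /** Returns the name of the terminal that matches the whole input, or "NONE". */
    static String classify(String s) {
        if (LINEWRAP.matcher(s).matches()) return "LINEWRAP";
        if (SL_COMMENT.matcher(s).matches()) return "SL_COMMENT";
        if (NEWLINE.matcher(s).matches()) return "NEWLINE";
        return "NONE";
    }

    public static void main(String[] args) {
        System.out.println(classify("\\\n"));          // backslash + LF
        System.out.println(classify("// a comment\n")); // comment up to and including LF
        System.out.println(classify("\r\n"));           // CR LF
    }
}
```

All three alternatives consume the terminating line feed, which is why the EOL rule can stand in for "end of line" everywhere in the grammar.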
I hide my whitespace rule throughout the grammar, so all spaces and tabs are hidden. A simplified example of this is as follows:
grammar XXXXX
hidden(WS)
generate wirelist "http://XXXX/Wirelist"
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
Model: {Model}
(blocks+=Block)* EOL*;
Block:
EOL*
( Block1
| Block2
);
Block1:
'BLOCK1' EOL* 'BEGIN' EOL*
(
('PARAM1' ',' p1val1=ID ',' p1val2=Number ',' p1val3=ID EOL+) &
('PARAM2' ',' p2val1=ID EOL+) &
('PARAM3' ',' p3val1=ID EOL+)
) 'END';
Block2:
'BLOCK2' ...
As you can see, I use my EOL rule wherever at least one newline must end the line before anything else can follow. In this sense the language is truly whitespace-sensitive, since at least one newline must be detected after every parameter. A valid model for this might be:
BLOCK1
BEGIN
PARAM1, abc, 123, def
PARAM3, ghi
PARAM2, jkl
END
whereas an invalid model might be:
BLOCK1
BEGIN
PARAM1, abc, 123, def PARAM3, ghi
PARAM2, jkl
END
The good news is that the editor seems to be working nicely. The bad news is that every time I run the formatter, it inserts additional newlines throughout the document, so the text grows with each invocation. Formatting the valid model above once yields:
BLOCK1
BEGIN
PARAM1, abc, 123, def
PARAM3, ghi
PARAM2, jkl
END
Formatting again yields:
BLOCK1
BEGIN
PARAM1, abc, 123, def
PARAM3, ghi
PARAM2, jkl
END
My guess is that, because I have made newline tokens separate from whitespace tokens, the formatter API is getting confused. I have looked at IHiddenTokenHelper and DefaultHiddenTokenHelper in the hope of overriding their behavior, but have not had any success yet. I've set breakpoints in the setLinewrap() routine, but can't figure out exactly what is happening. Is there something else I have to do to tell Xtext that I want it to consider all WS and EOL tokens when formatting?
Thanks in advance!
Re: Whitespace without newlines [message #1059189 is a reply to message #1059120]
Fri, 17 May 2013 04:00
Alexander Nittka Messages: 1085 Registered: July 2009
Senior Member
Hi,
this question may be beside the point, but couldn't you enforce correct line breaks through validation and the formatter? That should not be too difficult, and it gives the user better error messages.
It is not always the best idea to put every language restriction into the grammar.
Alex
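The validation idea can be sketched in plain Java. In a real Xtext validator the line numbers would presumably come from the node model (for example `NodeModelUtils.getNode(obj).getStartLine()`); here they are passed in directly so that the sketch runs without Xtext on the classpath. The names `LineBreakCheck`, `Stmt`, and `check` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LineBreakCheck {
    /** A parsed statement with the 1-based lines on which it starts and ends (stand-in for a node-model lookup). */
    static final class Stmt {
        final String name;
        final int startLine;
        final int endLine;

        Stmt(String name, int startLine, int endLine) {
            this.name = name;
            this.startLine = startLine;
            this.endLine = endLine;
        }
    }

    /** Returns one message for every statement that starts on a line the previous statement still occupies. */
    static List<String> check(List<Stmt> stmts) {
        List<String> errors = new ArrayList<>();
        for (int i = 1; i < stmts.size(); i++) {
            Stmt prev = stmts.get(i - 1);
            Stmt cur = stmts.get(i);
            if (cur.startLine <= prev.endLine) {
                errors.add(cur.name + " must start on a new line (line " + cur.startLine + ")");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // The invalid model from the first post: PARAM1 and PARAM3 share line 3.
        List<Stmt> invalid = List.of(
                new Stmt("PARAM1", 3, 3),
                new Stmt("PARAM3", 3, 3),
                new Stmt("PARAM2", 4, 4));
        System.out.println(check(invalid)); // prints [PARAM3 must start on a new line (line 3)]
    }
}
```

With the newline requirement moved out of the grammar, the parser accepts both models and the check above produces a targeted message instead of a generic syntax error.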
Need training, onsite consulting or any other kind of help for Xtext?
Go visit http://xtext.itemis.com or send a mail to xtext@itemis.de
Re: Whitespace without newlines [message #1060291 is a reply to message #1060269]
Thu, 23 May 2013 19:12
Henrik Lindberg Messages: 2431 Registered: July 2009
Senior Member
On 2013-23-05 11:38, Brad Riching wrote:
> Thanks Alexander for your reply.
>
> Turns out I will have to make some compromises with the language in
> order to make the grammar less restrictive (i.e. get rid of that EOL in
> all of my rules) and use NodeModelUtils inside the validator to check
> for line breaks at a higher level. The reason is that the previous
> implementors of the language did not use an actual lexer, and defined
> character sets on an item-by-item basis. The line breaks are
> essentially all I have to distinguish some potential clashes in valid
> input that would otherwise cause the ANTLR lexer to bail. This problem
> is exacerbated by the possible inclusion of some keywords as
> identifiers. To implement the full language, the ANTLR grammar will
> only compile properly if I explicitly sprinkle newline tokens throughout
> my xtext grammar. It sure makes for a real mess, I know.
>
> But what you've written has helped me rethink my approach to involving a
> less restrictive grammar at the expense of defining only a subset of the
> language in the hopes that it will still work for the majority of our
> use cases.
>
> Thanks again,
> Brad
You can write an external lexer that uses lookahead to find the
newlines and then emits the correct token.
That is what I had to do in cloudsmith/geppetto on GitHub to work
around ambiguities. The grammar (in Xtext) then sees only tokens that
are completely unambiguous, and it sees NL as normal whitespace
that can be ignored.
Regards
- henrik
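A minimal Java sketch of such a context-sensitive lexer follows. It is not the Geppetto lexer and does not use the Xtext or ANTLR APIs. It assumes one hypothetical disambiguation rule: a word is a keyword only if it is the first token on its line and the lookahead finds a comma, a newline, or the end of input after it. Anywhere else the same word is emitted as an ID. Newlines are emitted as ordinary WS tokens that a parser could hide. Comments and line continuations are left out for brevity, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class LookaheadLexer {
    enum Type { KEYWORD, ID, NUMBER, COMMA, WS }

    static final class Token {
        final Type type;
        final String text;

        Token(Type type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "(" + text.replace("\r", "\\r").replace("\n", "\\n") + ")";
        }
    }

    // Keyword set taken from the example grammar in the first post.
    static final Set<String> KEYWORDS =
            Set.of("BLOCK1", "BEGIN", "END", "PARAM1", "PARAM2", "PARAM3");

    static List<Token> lex(String in) {
        List<Token> out = new ArrayList<>();
        boolean lineStart = true; // true until a visible token is seen on the current line
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                // Spaces, tabs and newlines all end up in one hidden WS token.
                while (i < in.length() && Character.isWhitespace(in.charAt(i))) {
                    if (in.charAt(i) == '\n') lineStart = true;
                    i++;
                }
                out.add(new Token(Type.WS, in.substring(start, i)));
            } else if (c == ',') {
                i++;
                out.add(new Token(Type.COMMA, ","));
                lineStart = false;
            } else if (Character.isLetterOrDigit(c)) {
                while (i < in.length() && Character.isLetterOrDigit(in.charAt(i))) i++;
                String word = in.substring(start, i);
                Type type;
                if (lineStart && KEYWORDS.contains(word) && endsStatementHead(in, i)) {
                    type = Type.KEYWORD;
                } else if (word.chars().allMatch(Character::isDigit)) {
                    type = Type.NUMBER;
                } else {
                    type = Type.ID;
                }
                out.add(new Token(type, word));
                lineStart = false;
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at offset " + i);
            }
        }
        return out;
    }

    /** Lookahead: skips spaces and tabs, then expects a comma, a line break, or the end of input. */
    static boolean endsStatementHead(String in, int i) {
        while (i < in.length() && (in.charAt(i) == ' ' || in.charAt(i) == '\t')) i++;
        if (i == in.length()) return true;
        char next = in.charAt(i);
        return next == ',' || next == '\n' || next == '\r';
    }

    public static void main(String[] args) {
        // END appears once as a parameter value and once as the closing keyword.
        lex("PARAM1, END, 123\nEND").forEach(System.out::println);
    }
}
```

In the `main` example, the `END` in second position comes out as an ID because it is not the first token on its line, while the `END` on the last line comes out as a KEYWORD. The parser therefore never has to decide between the two readings.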