Eclipse Community Forums: TMF (Xtext) » Whitespace without newlines


Whitespace without newlines [message #1059120]

Thu, 16 May 2013 20:23

Brad Riching

Messages: 20
Registered: May 2012

Junior Member

Hello Xtext enthusiasts,

Parts of my language are sensitive to newlines, which has forced me to define the whitespace rule with spaces and tabs only:

terminal WS: 
	(' '|'\t')+;

Throughout the rest of the grammar I expect one of three alternatives, each ending in a newline character, so I created an end-of-line (EOL) datatype rule:

EOL:
	SL_COMMENT | LINEWRAP | NEWLINE;

terminal LINEWRAP:
	('\\' '\r'? '\n');
	
terminal SL_COMMENT: 
	'//' !('\r'|'\n')* ('\r'? '\n');
	
terminal NEWLINE:
	('\r'? '\n');
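Spaces and tabs disappear as hidden WS, but every line end still reaches the parser (and the formatter) as a visible token. The plain-Java sketch below shows this. It is a hypothetical, hand-written approximation of the four terminals above, not the ANTLR lexer Xtext generates. Everything that is not WS or an EOL alternative is lumped into a catch-all OTHER kind. It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written approximation of the WS, LINEWRAP, SL_COMMENT and NEWLINE
// terminals, for illustration only; this is NOT the lexer Xtext generates.
// Anything else is lumped into OTHER (hypothetical catch-all kind).
public class EolTokenizer {

    enum Kind { LINEWRAP, SL_COMMENT, NEWLINE, OTHER }

    record Token(Kind kind, String text) {}

    // Length of a '\r'? '\n' sequence starting at index i, or 0 if none.
    static int newlineLength(String s, int i) {
        if (s.startsWith("\r\n", i)) return 2;
        if (s.startsWith("\n", i)) return 1;
        return 0;
    }

    // True if WS, a line end, a comment or a line wrap starts at index j.
    static boolean startsSeparator(String s, int j) {
        char c = s.charAt(j);
        return " \t\r\n".indexOf(c) >= 0 || s.startsWith("//", j)
                || (c == '\\' && newlineLength(s, j + 1) > 0);
    }

    static List<Token> tokenize(String s) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int nl;
            if (c == ' ' || c == '\t') {
                i++;                                      // WS: consumed, never emitted (hidden)
            } else if (c == '\\' && (nl = newlineLength(s, i + 1)) > 0) {
                out.add(new Token(Kind.LINEWRAP, s.substring(i, i + 1 + nl)));
                i += 1 + nl;                              // LINEWRAP: '\\' '\r'? '\n'
            } else if (s.startsWith("//", i)) {
                int j = i + 2;
                while (j < s.length() && newlineLength(s, j) == 0) j++;
                j += newlineLength(s, j);                 // SL_COMMENT includes its line end
                out.add(new Token(Kind.SL_COMMENT, s.substring(i, j)));
                i = j;
            } else if ((nl = newlineLength(s, i)) > 0) {
                out.add(new Token(Kind.NEWLINE, s.substring(i, i + nl)));
                i += nl;                                  // NEWLINE: '\r'? '\n'
            } else {
                int j = i + 1;                            // always consume at least one char
                while (j < s.length() && !startsSeparator(s, j)) j++;
                out.add(new Token(Kind.OTHER, s.substring(i, j)));
                i = j;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Spaces and tabs vanish; each line end survives as a visible token.
        for (Token t : tokenize("  PARAM3, ghi // note\n  PARAM2, jkl\n")) {
            System.out.println(t.kind() + " [" + t.text().replace("\n", "\\n") + "]");
        }
    }
}
```

For the sample input this yields two OTHER tokens, an SL_COMMENT that includes its `\n`, two more OTHER tokens and a final NEWLINE. None of the WS runs produce a token.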

I declare WS as hidden for the whole grammar, so spaces and tabs are hidden everywhere, while newlines are not. A simplified example:

grammar XXXXX

hidden(WS)

generate wirelist "http://XXXX/Wirelist"

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

Model: {Model}
	(blocks+=Block)* EOL*;

Block:  
	EOL*
	(	Block1
	|	Block2
	);

Block1:
	'BLOCK1' EOL* 'BEGIN' EOL*
	(
		('PARAM1' ',' p1val1=ID ',' p1val2=Number ',' p1val3=ID EOL+) &
		('PARAM2' ',' p2val1=ID EOL+) &
		('PARAM3' ',' p3val1=ID EOL+)
	) 'END';

Block2:
	'BLOCK2' ...

As you can see, I use the EOL rule wherever at least one newline must end the line before anything else can follow. In this sense the language really is whitespace-sensitive: every parameter must be followed by at least one newline. A valid model might be:

BLOCK1
BEGIN
  PARAM1, abc, 123, def
  PARAM3, ghi
  PARAM2, jkl
END

whereas an invalid model might be:

BLOCK1
BEGIN
  PARAM1, abc, 123, def PARAM3, ghi
  PARAM2, jkl
END

The good news is that the editor seems to work nicely. The bad news is that each time I run the formatter, it inserts more blank lines everywhere, so the text keeps growing. Formatting the valid model above once yields:


BLOCK1

BEGIN

  PARAM1, abc, 123, def

  PARAM3, ghi

  PARAM2, jkl

END

Formatting again yields:


BLOCK1


BEGIN


  PARAM1, abc, 123, def


  PARAM3, ghi


  PARAM2, jkl


END
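Whatever the root cause turns out to be, the symptom is a formatter that is not idempotent: formatting its own output changes the text again. The toy Java sketch below shows a simple regression check for that property. `naiveFormat` is a hypothetical stand-in that mimics the growth observed above (each run of line breaks gains one more per pass), and `eolAwareFormat` is a hypothetical stand-in for the desired behavior, where an existing newline already satisfies the requested line wrap. Neither uses the Xtext formatting API.

```java
import java.util.function.UnaryOperator;

// Toy idempotence check; the two "formatters" are hypothetical stand-ins,
// not Xtext's formatter. Only '\n' line endings are handled.
public class IdempotenceCheck {

    // Mimics the observed bug: a line wrap is added while the existing
    // newline is kept, so each run of newlines grows by one per pass.
    static String naiveFormat(String text) {
        return text.replaceAll("\n+", "$0\n");
    }

    // Mimics the desired behavior: an existing newline already satisfies the
    // requested line wrap, so each run of newlines becomes exactly one.
    static String eolAwareFormat(String text) {
        return text.replaceAll("\n+", "\n");
    }

    // A formatter is idempotent if formatting its own output changes nothing.
    static boolean isIdempotent(UnaryOperator<String> format, String input) {
        String once = format.apply(input);
        return once.equals(format.apply(once));
    }

    public static void main(String[] args) {
        String model = "BLOCK1\nBEGIN\n  PARAM1, abc, 123, def\n  PARAM3, ghi\n  PARAM2, jkl\nEND\n";
        System.out.println(isIdempotent(IdempotenceCheck::naiveFormat, model));    // false
        System.out.println(isIdempotent(IdempotenceCheck::eolAwareFormat, model)); // true
    }
}
```

Running the same kind of check against the real formatter on sample models catches this regression without stepping through setLinewrap() in a debugger.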

My guess is that the formatter gets confused because I made newline tokens separate from whitespace tokens. I have looked at IHiddenTokenHelper and DefaultHiddenTokenHelper, hoping to override their behavior, but have had no success so far. I've set breakpoints in the setLinewrap() routine, but can't figure out exactly what is happening. Is there something else I have to do to tell Xtext to take all WS and EOL tokens into account when formatting?

Thanks in advance!


Re: Whitespace without newlines [message #1059189 is a reply to message #1059120]

Fri, 17 May 2013 08:00

Alexander Nittka

Messages: 1193
Registered: July 2009

Senior Member

Hi,

this may be beside the point, but couldn't you enforce correct line breaks through validation and the formatter instead? That should not be too difficult, and it gives the user better error messages.
It is not always the best idea to put every language restriction into the grammar.
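A minimal sketch of such a check in plain Java, assuming the grammar no longer contains EOL and that each parameter's start line is known. In an Xtext validator that line would presumably come from the node model (for example via `NodeModelUtils.getNode(eObject)` and the node's `getStartLine()`); here the lines are passed in directly so the example runs without Xtext. The `Param` record and the `check` method are hypothetical names, and the parameters are assumed to be listed in document order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical validation helper: the grammar no longer requires EOL after
// each parameter, so "one parameter per line" is checked here instead.
public class OneParamPerLineCheck {

    // A parsed parameter and the 1-based line it starts on. In Xtext this
    // would come from the node model; here it is supplied directly.
    record Param(String keyword, int startLine) {}

    // Reports every parameter that starts on the same line as the one before
    // it. Assumes params are given in document order.
    static List<String> check(List<Param> params) {
        List<String> errors = new ArrayList<>();
        for (int i = 1; i < params.size(); i++) {
            Param prev = params.get(i - 1);
            Param cur = params.get(i);
            if (cur.startLine() == prev.startLine()) {
                errors.add(cur.keyword() + " must start on a new line (line " + cur.startLine() + ")");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // The invalid model from the first post: PARAM3 follows PARAM1 on line 3.
        List<Param> invalid = List.of(
                new Param("PARAM1", 3), new Param("PARAM3", 3), new Param("PARAM2", 4));
        System.out.println(check(invalid)); // [PARAM3 must start on a new line (line 3)]
    }
}
```

Because the rule lives in the validator, the message can name the offending parameter, whereas a grammar-level EOL only produces a generic syntax error.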

Alex

Need training, onsite consulting or any other kind of help for Xtext?
Go visit http://xtext.itemis.com or send a mail to xtext@itemis.de


Re: Whitespace without newlines [message #1060269 is a reply to message #1059189]

Thu, 23 May 2013 18:38

Brad Riching

Messages: 20
Registered: May 2012

Junior Member

Thanks Alexander for your reply.

It turns out I will have to make some compromises with the language: make the grammar less restrictive (i.e. remove the EOL from all of my rules) and use NodeModelUtils inside the validator to check for line breaks at a higher level. The reason is that the previous implementors of the language did not use a real lexer and defined character sets on an item-by-item basis. The line breaks are essentially all I have to distinguish some potentially clashing inputs that are valid but would otherwise cause the ANTLR lexer to bail. The problem is made worse by the fact that some keywords may also be used as identifiers. To implement the full language, the ANTLR grammar only compiles if I explicitly sprinkle newline tokens throughout my Xtext grammar. It makes for a real mess, I know.

But what you've written has helped me rethink my approach: a less restrictive grammar, at the expense of defining only a subset of the language, in the hope that it will still cover the majority of our use cases.

Thanks again,
Brad


Re: Whitespace without newlines [message #1060291 is a reply to message #1060269]

Thu, 23 May 2013 23:12

Henrik Lindberg

Messages: 2509
Registered: July 2009

Senior Member

On 2013-05-23 11:38, Brad Riching wrote:
> [...] The line breaks are
> essentially all I have to distinguish some potential clashes in valid
> input that would otherwise cause the ANTLR lexer to bail. [...]

You can write an external lexer that does the lookahead to find the
newlines and then emits the correct token.

That is what I had to do in cloudsmith / geppetto on GitHub to work
around ambiguities. The grammar (in Xtext) then only sees tokens that
are completely unambiguous, and it also sees NL as normal whitespace
that can be ignored.
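A minimal sketch of that idea in plain Java, assuming a toy rule that is not taken from geppetto or from Brad's language: a reserved word is emitted as a KEYWORD only when it is the first word on a line, and as an ID anywhere else. The lexer uses the line ends to make that decision and then drops them, so the parser never receives a newline token. The keyword set and all names are hypothetical, numbers are not distinguished from identifiers, and this simplified version checks a word's position relative to the newline instead of implementing general lookahead. Wiring a custom lexer into Xtext is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical "newline-aware" lexer: line ends are used to disambiguate,
// then discarded, so the grammar can treat them as ordinary whitespace.
public class NewlineAwareLexer {

    enum Kind { KEYWORD, ID, COMMA }

    record Token(Kind kind, String text) {}

    // Hypothetical keyword set; these words may also appear as identifiers.
    static final Set<String> KEYWORDS = Set.of("BLOCK1", "BEGIN", "END", "PARAM1", "PARAM2", "PARAM3");

    // A comma, or a run of characters that are neither whitespace nor commas.
    static final Pattern WORD_OR_COMMA = Pattern.compile(",|[^\\s,]+");

    static List<Token> lex(String input) {
        List<Token> out = new ArrayList<>();
        // Line ends are consumed here to decide which token to emit; no
        // newline token is passed on to the parser.
        for (String line : input.split("\\r?\\n")) {
            Matcher m = WORD_OR_COMMA.matcher(line);
            boolean firstOnLine = true;
            while (m.find()) {
                String text = m.group();
                if (text.equals(",")) {
                    out.add(new Token(Kind.COMMA, text));
                } else if (firstOnLine && KEYWORDS.contains(text)) {
                    out.add(new Token(Kind.KEYWORD, text)); // keyword only at the start of a line
                } else {
                    out.add(new Token(Kind.ID, text));      // the same word elsewhere is an ID
                }
                firstOnLine = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "END" after a comma is an identifier; "END" alone on a line is the keyword.
        for (Token t : lex("PARAM3, END\nEND\n")) {
            System.out.println(t.kind() + " " + t.text());
        }
    }
}
```

The parser then sees `KEYWORD PARAM3`, `COMMA`, `ID END`, `KEYWORD END` and needs no EOL rule at all, which is the point of moving the newline logic out of the grammar.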

Regards
- henrik


Re: Whitespace without newlines [message #1792872 is a reply to message #1060291]

Wed, 25 July 2018 10:04

Devin Xin

Messages: 15
Registered: March 2017

Junior Member

Hi Brad,

I ran into the same problem in my project today.
Which approach did you end up using to solve it?
Could you please share it with me?
Thanks.

Best Regards,
Devin
