Eclipse Community Forums
Whitespace without newlines [message #1059120] Thu, 16 May 2013 20:23
Brad Riching
Hello Xtext enthusiasts,

My language is newline-sensitive in places, which has forced me to define the whitespace rule with only spaces and tabs:

terminal WS: 
	(' '|'\t')+;


Throughout the rest of my grammar I need to match any of three alternatives that end in a newline character, so I created a datatype rule for end-of-line (EOL):

EOL:
	SL_COMMENT | LINEWRAP | NEWLINE;

terminal LINEWRAP:
	('\\' '\r'? '\n');
	
terminal SL_COMMENT: 
	'//' !('\r'|'\n')* ('\r'? '\n');
	
terminal NEWLINE:
	('\r'? '\n');
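
For experimenting with the three EOL alternatives outside of Xtext, here is a small plain-Java sketch. The regular expressions are my own hand translation of the terminal rules above, not something Xtext generates, and `EolKind`, `classify` and `matchesAt` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EolKind {
    // Assumed regex translations of the SL_COMMENT, LINEWRAP and NEWLINE terminals.
    static final Pattern SL_COMMENT = Pattern.compile("//[^\\r\\n]*\\r?\\n");
    static final Pattern LINEWRAP = Pattern.compile("\\\\\\r?\\n");
    static final Pattern NEWLINE = Pattern.compile("\\r?\\n");

    /** Returns the name of the EOL alternative that matches at offset, or null if none does. */
    static String classify(String text, int offset) {
        if (matchesAt(SL_COMMENT, text, offset)) return "SL_COMMENT";
        if (matchesAt(LINEWRAP, text, offset)) return "LINEWRAP";
        if (matchesAt(NEWLINE, text, offset)) return "NEWLINE";
        return null;
    }

    /** True if the pattern matches starting exactly at offset. */
    static boolean matchesAt(Pattern p, String text, int offset) {
        Matcher m = p.matcher(text);
        m.region(offset, text.length());
        return m.lookingAt();
    }
}
```

The three alternatives start with different characters ('/', '\' and a line ending), so the order of the checks does not matter here.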



I hide my whitespace rule throughout the grammar, so all spaces and tabs are hidden. A simplified example of this is as follows:

grammar XXXXX

hidden(WS)

generate wirelist "http://XXXX/Wirelist"

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

Model: {Model}
	(blocks+=Block)* EOL*;

Block:  
	EOL*
	(	Block1
	|	Block2
	);

Block1:
	'BLOCK1' EOL* 'BEGIN' EOL*
	(
		('PARAM1' ',' p1val1=ID ',' p1val2=Number ',' p1val3=ID EOL+) &
		('PARAM2' ',' p2val1=ID EOL+) &
		('PARAM3' ',' p3val1=ID EOL+)
	) 'END';

Block2:
	'BLOCK2' ...



As you can see, I use my EOL rule wherever at least one newline character must end the line before anything else can happen. In this sense the language is truly whitespace-sensitive, since at least one newline must follow every parameter. A valid model might be:

BLOCK1
BEGIN
  PARAM1, abc, 123, def
  PARAM3, ghi
  PARAM2, jkl
END


whereas an invalid model might be:

BLOCK1
BEGIN
  PARAM1, abc, 123, def PARAM3, ghi
  PARAM2, jkl
END


The good news is that the editor seems to be working nicely. The bad news is that every time I run the formatter it inserts additional newlines throughout, so the text grows with each invocation. Formatting the valid model above once yields:


BLOCK1

BEGIN

  PARAM1, abc, 123, def

  PARAM3, ghi

  PARAM2, jkl

END


Formatting again yields:

BLOCK1


BEGIN


  PARAM1, abc, 123, def


  PARAM3, ghi


  PARAM2, jkl


END



My guess is that because I have made newline tokens separate from whitespace tokens, the formatter API is getting confused. I have looked at IHiddenTokenHelper and DefaultHiddenTokenHelper in the hope of overriding their behavior, but have not had any success yet. I've set breakpoints in the setLinewrap() routine, but can't figure out exactly what is happening. Is there something else I have to do to tell Xtext to consider all WS and EOL tokens when formatting?

Thanks in advance!
Re: Whitespace without newlines [message #1059189 is a reply to message #1059120] Fri, 17 May 2013 08:00
Alexander Nittka
Hi,

This question may be beside the point, but couldn't you enforce correct line breaks through validation and the formatter? That should not be too difficult, and it gives the user better error messages.
It is not always the best idea to put every language restriction into the grammar.

Alex


Re: Whitespace without newlines [message #1060269 is a reply to message #1059189] Thu, 23 May 2013 18:38
Brad Riching
Thanks Alexander for your reply.

It turns out I will have to make some compromises with the language in order to make the grammar less restrictive (i.e. get rid of that EOL in all of my rules) and use NodeModelUtils inside the validator to check for line breaks at a higher level. The reason is that the previous implementors of the language did not use an actual lexer, and defined character sets on an item-by-item basis. The line breaks are essentially all I have to distinguish some potential clashes in valid input that would otherwise cause the ANTLR lexer to bail. This problem is exacerbated by the possible use of some keywords as identifiers. To implement the full language, the ANTLR grammar will only compile properly if I explicitly sprinkle newline tokens throughout my Xtext grammar. It makes for a real mess, I know.

But what you've written has helped me rethink my approach: I will use a less restrictive grammar, at the expense of defining only a subset of the language, in the hope that it will still work for the majority of our use cases.
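
A rough sketch of that validator-side check, in plain Java so that it runs without Xtext on the classpath. The `Span` class and the `missingLineBreaks` method are hypothetical names, not Xtext API. In a real validator I would expect the line numbers to come from the parse-tree node of each statement (for example via `NodeModelUtils.getNode(...)` and the node's `getStartLine()` / `getEndLine()`), but treat that mapping as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class LineBreakCheck {
    /** Hypothetical stand-in for what a parse-tree node reports: 1-based first and last line. */
    static final class Span {
        final String name;
        final int startLine;
        final int endLine;

        Span(String name, int startLine, int endLine) {
            this.name = name;
            this.startLine = startLine;
            this.endLine = endLine;
        }
    }

    /** Returns one message per statement that starts on the line where the previous one ended. */
    static List<String> missingLineBreaks(List<Span> statements) {
        List<String> errors = new ArrayList<>();
        for (int i = 1; i < statements.size(); i++) {
            Span prev = statements.get(i - 1);
            Span cur = statements.get(i);
            // No line break between the two statements: report the later one.
            if (cur.startLine <= prev.endLine) {
                errors.add(cur.name + " must start on a new line (line " + cur.startLine + ")");
            }
        }
        return errors;
    }
}
```

With the grammar relaxed so that newlines are hidden, a check like this would flag `PARAM1, abc, 123, def PARAM3, ghi` while accepting the one-statement-per-line layout.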

Thanks again,
Brad
Re: Whitespace without newlines [message #1060291 is a reply to message #1060269] Thu, 23 May 2013 23:12
Henrik Lindberg
You can write an external lexer that does the lookahead to find the
newlines and then emit the correct token.

That is what I have to do in cloudsmith/geppetto (on GitHub) to work
around ambiguities. The grammar (in Xtext) then only sees tokens that
are completely unambiguous, and it also sees NL as normal whitespace
that can be ignored.
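
The idea of a lexer that uses line breaks to classify tokens and then drops them can be sketched in plain Java. This is not Geppetto's lexer and not Xtext or ANTLR API: `LineAwareLexer`, the keyword set, and the rule "a keyword only counts as a keyword when it is the first token on its line" are all assumptions made for illustration. The sketch tracks the line context as state instead of scanning ahead; a real lexer may need actual lookahead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LineAwareLexer {
    // Hypothetical keyword set, taken from the example grammar in this thread.
    static final Set<String> KEYWORDS = new HashSet<>(
            Arrays.asList("BLOCK1", "BEGIN", "END", "PARAM1", "PARAM2", "PARAM3"));

    /** Tokenizes input into strings of the form "TYPE:text"; newlines are consumed, never emitted. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        boolean atLineStart = true; // true until the first token of the current line is emitted
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\n') {
                atLineStart = true; // the line break resets the context and is then dropped
                i++;
            } else if (c == ' ' || c == '\t' || c == '\r') {
                i++; // plain whitespace
            } else if (c == ',') {
                tokens.add("COMMA:,");
                atLineStart = false;
                i++;
            } else if (Character.isLetterOrDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                    i++;
                }
                String word = input.substring(start, i);
                // Assumed rule: the same word is an ordinary ID anywhere else on the line.
                // For brevity, numbers are lumped into ID as well.
                boolean keyword = atLineStart && KEYWORDS.contains(word);
                tokens.add((keyword ? "KEYWORD:" : "ID:") + word);
                atLineStart = false;
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at offset " + i);
            }
        }
        return tokens;
    }
}
```

Because the token type is decided before the parser sees anything, the grammar no longer needs EOL in its rules, and a word such as END can still be used as an identifier in the middle of a line.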

Regards
- henrik
Re: Whitespace without newlines [message #1792872 is a reply to message #1060291] Wed, 25 July 2018 10:04
Devin Xin

Hi Brad,

I ran into this problem in my project today as well.
Which approach did you end up using to solve it?
Could you please share it with me?
Thanks.

Best Regards,
Devin
Re: Whitespace without newlines [message #1792874 is a reply to message #1792872] Wed, 25 July 2018 10:10
Christian Dietrich
Are you sure your use case is not one for whitespace-aware languages (like Python), which Xtext now supports? Have a look at the docs.
