Eclipse Community Forums: TMF (Xtext) » why the parser hides non breaking spaces

why the parser hides non breaking spaces (I want to show syntax errors for non-breaking spaces)


why the parser hides non breaking spaces [message #1075415]

Mon, 29 July 2013 11:37

paul lu

Messages: 43
Registered: April 2013

Member

Hi,
My grammar hides only the WS rule, which does not include the non-breaking space at all. Why do all non-breaking spaces seem to get hidden when parsing a source file of my grammar?
However, after I declared an unused terminal rule for '\u00A0', the parser does show errors for non-breaking spaces.


Re: why the parser hides non breaking spaces [message #1075501 is a reply to message #1075415]

Mon, 29 July 2013 14:38

Henrik Lindberg

Messages: 2509
Registered: July 2009

Senior Member

On 2013-29-07 13:37, paul lu wrote:
> Hi, my grammar hides only the WS rule, which does not include the
> non-breaking space at all. Why do all non-breaking spaces seem to get
> hidden when parsing a source file of my grammar? However, after I
> declared an unused terminal rule for '\u00A0', the parser does show
> errors for non-breaking spaces.

Your "unused" terminal rule will still deliver a different token from
the lexer. Which token it delivered before you added the rule is
impossible to say without looking at your (complete) grammar.

- henrik


Re: why the parser hides non breaking spaces [message #1075783 is a reply to message #1075501]

Tue, 30 July 2013 03:51

paul lu


Thanks,
I mistakenly thought the lexer hides non-breaking spaces whenever the grammar declares normal spaces as hidden. It does not, as I confirmed with a minimal grammar.
So far I have no clue why non-breaking spaces are hidden in my case. I'll see if I can post a simpler grammar that reproduces this.

Paul


Re: why the parser hides non breaking spaces [message #1076030 is a reply to message #1075783]

Tue, 30 July 2013 14:43

Henrik Lindberg


On 2013-30-07 5:51, paul lu wrote:
> Thanks, I mistakenly thought the lexer hides non-breaking spaces
> whenever the grammar declares normal spaces as hidden. It does not, as
> I confirmed with a minimal grammar. So far I have no clue why
> non-breaking spaces are hidden in my case. I'll see if I can post a
> simpler grammar that reproduces this.
> Paul

If a character is not covered by a lexer rule it is delivered to the
grammar as an ANY_OTHER token (or similar depending on how you declared
it in your grammar). Thus if you have some grammar rule that accepts
ANY_OTHER, the non breaking space may have ended up there.

It is very difficult to speculate without a complete grammar. Did you
use the default terminals? Do you have other overlapping lexer rules?

Regards
- henrik


Re: why the parser hides non breaking spaces [message #1082023 is a reply to message #1076030]

Thu, 08 August 2013 01:42

paul lu


Hi Henrik,
Exactly. I hide the '.' rule in the grammar, and in the token stream some tokens that are not meant to be matched by the other rules are assigned to this "Other" rule so that they are hidden. I can reproduce it with a simpler grammar:

grammar org.xtext.example.mydsl.MyDsl hidden(WS, ML_COMMENT, SL_COMMENT, Other)
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"

Model:
	greetings+=Greeting*;

Greeting:
	'Hello' name=ID '!';



terminal ID  		: '^'?('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
terminal INT returns ecore::EInt: ('0'..'9')+;
terminal STRING	: 
			'"' ( '\\' ('b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\') | !('\\'|'"') )* '"' |
			"'" ( '\\' ('b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\') | !('\\'|"'") )* "'"
		; 
terminal ML_COMMENT	: '/*' -> '*/';
terminal SL_COMMENT 	: '//' !('\n'|'\r')* ('\r'? '\n')?;

terminal WS			: (' '|'\t'|'\r'|'\n')+;

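// Catch-all fallback: any single character that no other terminal or keyword
// matches ends up here, including the non-breaking space (U+00A0).
terminal Other: .;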

When Other is hidden, "Hello X!" with leading non-breaking spaces parses without syntax errors. When it is not hidden, the same input fails to parse.

For now I work around this by defining a dedicated terminal rule for non-breaking spaces, but there could be more characters that none of the terminals cover...

Thanks!
Paul


Re: why the parser hides non breaking spaces [message #1082462 is a reply to message #1082023]

Thu, 08 August 2013 15:48

Henrik Lindberg


OK,
and what is it you really want? I can't quite figure that out.
Why not simply add the non-breaking space to the WS rule?

Having the rule '.' be hidden means the user can enter anything that is
otherwise unrecognized and it is interpreted as "white space",
e.g. xÅ=Ä1Ö+Ö1
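
To make that effect concrete, here is a toy model in plain Java. It is not the lexer Xtext generates, only a rough stand-in for the terminals of the MyDsl grammar above: the class name, the Kind enum and the visibleTokens method are invented for this sketch, and STRING, the comment rules and the optional '^' prefix of ID are left out. It assumes that hidden tokens simply never reach the parser.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Toy model only: approximates the MyDsl terminals, not Xtext's generated lexer.
class HiddenOtherDemo {

    enum Kind { WS, ID, INT, KEYWORD, OTHER }

    // Splits the input into tokens and returns only those the parser would see,
    // i.e. every token whose kind is not in the hidden set.
    static List<String> visibleTokens(String input, Set<Kind> hidden) {
        List<String> visible = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            char c = input.charAt(i++);
            Kind kind;
            if (isWs(c)) {
                while (i < input.length() && isWs(input.charAt(i))) i++;
                kind = Kind.WS;
            } else if (isIdStart(c)) {
                while (i < input.length() && isIdPart(input.charAt(i))) i++;
                // 'Hello' is a keyword in the grammar, everything else is an ID.
                kind = input.substring(start, i).equals("Hello") ? Kind.KEYWORD : Kind.ID;
            } else if (isDigit(c)) {
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                kind = Kind.INT;
            } else if (c == '!') {
                kind = Kind.KEYWORD;
            } else {
                // Catch-all '.' rule: one otherwise unmatched character per token.
                kind = Kind.OTHER;
            }
            if (!hidden.contains(kind)) {
                visible.add(kind + ":" + input.substring(start, i));
            }
        }
        return visible;
    }

    static boolean isWs(char c) { return c == ' ' || c == '\t' || c == '\r' || c == '\n'; }
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    static boolean isIdStart(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_'; }
    static boolean isIdPart(char c) { return isIdStart(c) || isDigit(c); }

    public static void main(String[] args) {
        String input = "\u00A0Hello X!"; // leading non-breaking space
        // Only WS hidden: the OTHER token for U+00A0 stays visible to the parser.
        System.out.println(visibleTokens(input, EnumSet.of(Kind.WS)));
        // WS and OTHER hidden: the parser sees only Hello, X and !.
        System.out.println(visibleTokens(input, EnumSet.of(Kind.WS, Kind.OTHER)));
    }
}
```

With only WS hidden, the non-breaking space shows up as a leading OTHER token that no parser rule expects, which matches the syntax error reported. With Other hidden as well, the same input is reduced to Hello, X and !, and xÅ=Ä1Ö+Ö1 is reduced to x, 1 and 1.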

- henrik



Re: why the parser hides non breaking spaces [message #1082744 is a reply to message #1082462]

Fri, 09 August 2013 01:28

paul lu


Hi,
The non-breaking space is not allowed in our language, but hiding "Other" unintentionally hides it as well. In our grammar, "Other" was originally introduced as a token type that we assign to tokens that should be hidden. This implements a preprocessor-like feature that makes the parser ignore certain blocks of code depending on the configuration.
But this obviously introduces problems...

- Paul



Re: why the parser hides non breaking spaces [message #1082795 is a reply to message #1082744]

Fri, 09 August 2013 03:30

Henrik Lindberg


It is far better to have a permissive grammar / lexer and instead
validate what is illegal. I do that with special spaces. That way you
can create markers for all the positions where there is a non-breaking
space and offer a quick fix to turn it into a regular space.
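
As a rough illustration of that approach, the sketch below is plain Java and deliberately leaves out the Xtext validator and quick-fix wiring. The class and method names are made up for this example, and treating every Unicode space character other than the ASCII space as a "special space" is an assumption about what should be flagged.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: the scanning and replacement logic a validator and quick fix could share.
class SpecialSpaceCheck {

    // Assumption: a "special space" is any Unicode space character except the
    // plain ASCII space. This covers the non-breaking space U+00A0.
    static boolean isSpecialSpace(char c) {
        return Character.isSpaceChar(c) && c != ' ';
    }

    // Zero-based offsets of all special spaces; each offset is a candidate marker position.
    static List<Integer> findSpecialSpaceOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (isSpecialSpace(text.charAt(i))) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    // What a quick fix would do: turn each special space into a regular space.
    static String replaceSpecialSpaces(String text) {
        StringBuilder fixed = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            fixed.append(isSpecialSpace(c) ? ' ' : c);
        }
        return fixed.toString();
    }

    public static void main(String[] args) {
        String source = "\u00A0Hello\u00A0X!";
        System.out.println(findSpecialSpaceOffsets(source)); // prints [0, 6]
        System.out.println(replaceSpecialSpaces(source));    // prints " Hello X!" without the quotes
    }
}
```

In an Xtext project the offsets would be reported as warnings or errors from a validator, so the grammar can stay permissive while the editor still marks every offending character.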

- henrik


