Why the parser hides non-breaking spaces (I want to show syntax errors for non-breaking spaces)
Re: why the parser hides non breaking spaces [message #1082023 is a reply to message #1076030]
Thu, 08 August 2013 01:42
paul lu
Hi Henrik,
Exactly. I hide the '.' rule in the grammar, and in the token stream, some tokens that are not meant to be consumed by the rules are assigned this "Other" type so that they are hidden. I can reproduce the problem with a simpler grammar:
grammar org.xtext.example.mydsl.MyDsl hidden(WS, ML_COMMENT, SL_COMMENT, Other)
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
Model:
greetings+=Greeting*;
Greeting:
'Hello' name=ID '!';
terminal ID : '^'?('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
terminal INT returns ecore::EInt: ('0'..'9')+;
terminal STRING :
'"' ( '\\' ('b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\') | !('\\'|'"') )* '"' |
"'" ( '\\' ('b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\') | !('\\'|"'") )* "'"
;
terminal ML_COMMENT : '/*' -> '*/';
terminal SL_COMMENT : '//' !('\n'|'\r')* ('\r'? '\n')?;
terminal WS : (' '|'\t'|'\r'|'\n')+;
terminal Other: .;
When Other is hidden, "Hello X!" with leading non-breaking spaces is parsed without syntax errors. When Other is not hidden, the same input fails to parse.
For now I have worked around this by defining a separate terminal rule for non-breaking spaces, but there may be more characters that none of the terminals cover...
Thanks!
Paul
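A minimal sketch of the workaround Paul mentions, under two assumptions: the rule name NBSP is hypothetical (the post does not name the rule), and the grammar is assumed to accept the '\u00A0' escape for U+00A0 in terminal string literals. The fragment replaces the last rule of the grammar above. NBSP is declared before Other and is not listed in hidden(...), so a stray non-breaking space should reach the parser as an unexpected token instead of being skipped.

```xtext
// Hypothetical rule name. It is deliberately NOT added to hidden(...),
// so the parser sees the token and can report a syntax error.
terminal NBSP : '\u00A0';

// The catch-all stays last. Both rules match exactly one character,
// so NBSP has to be declared first to take precedence for U+00A0.
terminal Other : .;
```

As the post points out, this covers only one character. Anything else without its own terminal rule still falls through to Other and remains hidden.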
Re: why the parser hides non breaking spaces [message #1082462 is a reply to message #1082023]
Thu, 08 August 2013 15:48
Henrik Lindberg
ok,
and what is it you really want? I can't quite figure that out.
Why not simply add the non-breaking space to the WS rule?
Hiding the '.' rule means the user can enter anything that is
otherwise unrecognized, and it is interpreted as "white space",
e.g. xÅ=Ä1Ö+Ö1
- henrik
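Henrik's suggestion could look like the following sketch, assuming the '\u00A0' escape for U+00A0 is accepted in terminal string literals. It changes only the WS rule from the grammar above.

```xtext
// WS from the grammar above, extended with U+00A0 (non-breaking space).
terminal WS : (' '|'\t'|'\r'|'\n'|'\u00A0')+;
```

With this rule, non-breaking spaces are lexed as ordinary hidden whitespace, so they are accepted silently rather than reported as syntax errors.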
Re: why the parser hides non breaking spaces [message #1082744 is a reply to message #1082462]
Fri, 09 August 2013 01:28
paul lu
Hi,
The non-breaking space is not allowed in our language, but hiding "Other" unintentionally hides non-breaking spaces as well. In our original grammar, "Other" serves as a rule whose type we assign to certain tokens so that they are hidden. We use it to implement a preprocessor-like function that makes the parser ignore some blocks of code based on configuration.
But this obviously introduces problems...
- Paul
Re: why the parser hides non breaking spaces [message #1082795 is a reply to message #1082744]
Fri, 09 August 2013 03:30
Henrik Lindberg
It is far better to have a permissive grammar / lexer and instead
validate what is illegal. I do that with special spaces. That way you
can create markers for all the positions where there is a non-breaking
space and offer a quick fix to turn it into a regular space.
- henrik