Eclipse Community Forums
Supporting unquoted string with spaces [message #1639487] Fri, 27 February 2015 17:10
Luis De Bello
Hi guys,

I am trying to support unquoted strings with spaces so I have built this grammar.

/*
* Document
*/
Document:
name=UNQUOTED_STRING;

/*
* Terminals
*/
terminal WS:
(' ' | '\t')+;

terminal EOL:
LINE_BREAK;

terminal UNQUOTED_STRING:
!NON_QUOTED_STRING_START -> NON_QUOTED_STRING_END;

terminal fragment LINE_BREAK:
('\r' | '\n');

terminal fragment NON_QUOTED_STRING_START:
'"'|"'"|'0'..'9'|'!'|'#'|'$'|'('|')'|'*'|'+'|','|'-'|'.'|'/'|':'|'<'|'='|'>'|'?'|'['|']'|'{'|'}'|'|'|'%'|'^'|'@'|'\r'|'\n'|' '|'\t';

terminal fragment NON_QUOTED_STRING_END:
('!'|'#'|'$'|'('|')'|'*'|','|'.'|'/'|':'|'<'|'='|'>'|'?'|'['|']'|'{'|'}'|'|'|'%'|'^'|'\r'|'\n');

And then I tested the lexer using the following code:

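// ANTLR 3 runtime classes; InternalMyDslLexer is the lexer generated by Xtext
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.Token;

InternalMyDslLexer lexer = new InternalMyDslLexer(new ANTLRStringStream("Data"));

// Print every token until the lexer reports end of input
Token token = lexer.nextToken();
while (token.getType() != Token.EOF) {
    System.out.println(token);
    token = lexer.nextToken();
}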

The lexer cannot tokenize my string in every case; whether it works depends on the input:

Inputs:
Data --> Does not work
Data with spaces --> Works ok
Data\n --> Works ok

Do you have any idea why my lexer is not able to parse the first input, which is a single word?

Thanks in advance.

Regards,
Luis
Re: Supporting unquoted string with spaces [message #1644143 is a reply to message #1639487] Sun, 01 March 2015 23:35
Luis De Bello
Hi guys,

I am replying to my own question; maybe this will be useful to others. I was able to support unquoted strings with spaces by using a custom lexer. Below are my terminals in Xtext and the relevant portion of my lexer grammar.

Xtext file:
terminal UNQUOTED_STRING:
!NON_QUOTED_STRING_START !(NON_QUOTED_STRING_END)*;

terminal fragment NON_QUOTED_STRING_START:
'"'|"'"|'0'..'9'|'!'|'#'|'$'|'('|')'|'*'|'+'|','|'-'|'.'|'/'|':'|'<'|'='|'>'|'?'|'['|']'|'{'|'}'|'|'|'%'|'^'|'@'|'\r'|'\n'|' '|'\t';

terminal fragment NON_QUOTED_STRING_END:
('!'|'#'|'$'|'('|')'|'*'|','|'.'|'/'|':'|'<'|'='|'>'|'?'|'['|']'|'{'|'}'|'|'|'%'|'^'|'\r'|'\n');
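To make it easier to see what the UNQUOTED_STRING terminal above accepts, here is a minimal plain-Java sketch of the same rule: one character that is not in NON_QUOTED_STRING_START, followed by any number of characters that are not in NON_QUOTED_STRING_END. The class and method names are hypothetical, and the sketch ignores everything else a real generated lexer does (competing terminals such as WS, keyword predicates, error recovery), so treat it as an illustration only.

```java
public class UnquotedStringScanner {
    // Characters that may not start an unquoted string (NON_QUOTED_STRING_START).
    private static final String START = "\"'0123456789!#$()*+,-./:<=>?[]{}|%^@\r\n \t";
    // Characters that terminate an unquoted string (NON_QUOTED_STRING_END).
    private static final String END = "!#$()*,./:<=>?[]{}|%^\r\n";

    /**
     * Returns the longest prefix of input matching !START !(END)*,
     * or null if the first character cannot start an unquoted string.
     */
    public static String scan(String input) {
        if (input.isEmpty() || START.indexOf(input.charAt(0)) >= 0) {
            return null;
        }
        int end = 1;
        // Spaces are not in END, so they stay inside the token.
        while (end < input.length() && END.indexOf(input.charAt(end)) < 0) {
            end++;
        }
        return input.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(scan("Data"));                   // prints "Data"
        System.out.println(scan("Data with spaces\nnext")); // prints "Data with spaces"
        System.out.println(scan("42 apples"));              // prints "null" (a digit cannot start the string)
    }
}
```

Because the second part of the rule is a plain repetition rather than an until expression, the token may end at the end of the input without a terminating character.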

Lexer grammar:
RULE_UNQUOTED_STRING : {!isKeyword()}?=> ~(RULE_NON_QUOTED_STRING_START) ({!isIsolatedKeyword()}?=> ~(RULE_NON_QUOTED_STRING_END))*;

isKeyword and isIsolatedKeyword are two methods I implemented to check for keywords using lookahead; their details will depend on each implementation.
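Since the bodies of these two methods are not shown, here is one possible shape for them as a self-contained Java sketch. Everything in it is an assumption made for illustration: the keyword list, the rule that a keyword must be followed by whitespace or end of input, and the rule that an "isolated" keyword must also be preceded by whitespace. The la(i) helper only mimics the LA(i) lookahead of an ANTLR 3 character stream; inside a real lexer you would call input.LA(i) instead.

```java
import java.util.Arrays;
import java.util.List;

public class KeywordLookahead {
    // Hypothetical keyword list; replace with the keywords of your language.
    private static final List<String> KEYWORDS = Arrays.asList("if", "else");
    private static final int EOF = -1;

    private final String text;
    private final int pos; // index of the next character to be consumed

    public KeywordLookahead(String text, int pos) {
        this.text = text;
        this.pos = pos;
    }

    /** i > 0: the i-th upcoming character (1-based); i < 0: a previous character; EOF if out of range. */
    int la(int i) {
        int index = i > 0 ? pos + i - 1 : pos + i;
        return (index >= 0 && index < text.length()) ? text.charAt(index) : EOF;
    }

    private static boolean isBoundary(int c) {
        return c == EOF || Character.isWhitespace(c);
    }

    private boolean matchesAhead(String keyword) {
        for (int i = 0; i < keyword.length(); i++) {
            if (la(i + 1) != keyword.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    /** True if a keyword starts at the current position and is followed by whitespace or end of input. */
    public boolean isKeyword() {
        for (String keyword : KEYWORDS) {
            if (matchesAhead(keyword) && isBoundary(la(keyword.length() + 1))) {
                return true;
            }
        }
        return false;
    }

    /** True if isKeyword() holds and the previous character is whitespace. */
    public boolean isIsolatedKeyword() {
        int previous = la(-1);
        return previous != EOF && Character.isWhitespace(previous) && isKeyword();
    }

    public static void main(String[] args) {
        System.out.println(new KeywordLookahead("Data else more", 5).isIsolatedKeyword()); // prints "true"
        System.out.println(new KeywordLookahead("Data elsewhere", 5).isIsolatedKeyword()); // prints "false"
    }
}
```

With checks like these, the first predicate keeps a keyword from being swallowed as the start of an unquoted string, and the second stops the string before a keyword that appears after a space.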

I hope this will be useful for others. Note that you will need to split the lexer and parser using a fragment, because by default Xtext generates the lexer and parser in the same file.

Regards,
Luis
Re: Supporting unquoted string with spaces [message #1644862 is a reply to message #1644143] Mon, 02 March 2015 07:53
Ed Willink
Hi

Seems interesting. I nearly replied to your original message suggesting
that using Xtext for a lexing problem was crazy, but you seem to have a
new way of using Xtext that I do not understand.

Please elaborate on "you will need to split lexer and parser using the a
fragment because text by default use a lexer/parser in the same file".
I'm only aware of grammar splitting by the grammar...with... daisy
chain. I'm not sure which fragment you refer to: both of the existing
AntlrGeneratorFragments, your custom fragment or ...

Regards

Ed Willink


Re: Supporting unquoted string with spaces [message #1647584 is a reply to message #1644862] Tue, 03 March 2015 14:05
Luis De Bello
Hi Ed,

What I meant is that you need to use a fragment provided by Xtext to split the lexer and parser grammars.

// Split lexer and parser generation; this is used to replace the default fragment "parser.antlr.XtextAntlrGeneratorFragment"
fragment = org.eclipse.xtext.generator.parser.antlr.ex.rt.AntlrGeneratorFragment {}

After splitting the grammar, you can start adding context to your lexer using one additional fragment:
// Uses the ANTLR tool to compile a custom lexer and also adds a binding in the runtime module to use that lexer
fragment = parser.antlr.ex.ExternalAntlrLexerFragment {
    // A grammar file with the .g extension is expected in this package (it should be stored in the src folder)
    lexerGrammar = "org.mule.tooling.dfl.parser.antlr.lexer.InternalDFLLexer"
    runtime = true
    antlrParam = "-lib"
    // This is the folder where the lexer will be created
    antlrParam = "${runtimeProject}/src-gen/org/mule/tooling/dfl/parser/antlr/lexer"
}

Now you have your own lexer grammar, and you can add predicates and context using lookahead (LA) techniques. The only issue with predicates is that Xtext only handles NoViableAltException in its recovery mode, so you will have to override the nextToken method or use an awful hack: replacing "FailedPredicateException" with "NoViableAltException". The hack works for me, but it is not a nice solution.

I hope I have made myself clear and that this is useful for you.

Regards,
Luis