Eclipse Community Forums
Regular Expression and newline [message #894237] Sun, 08 July 2012 03:02
paolo pinkel
Hello, I am quite new to Xtext and I am trying to implement this grammar:
File: ( Specification NewLine )* .
Specification:
    [ TokenName ':' ]
    Pattern
    [ '(' AuxiliaryScannerName ')' ]
    [ '[' TokenProcessorName ']' ] .
Pattern: RegularExpression / CannedSpecificationName .
TokenName: Identifier .
CannedSpecificationName: Identifier .
AuxiliaryScannerName: Identifier .
TokenProcessorName: Identifier .

(TokenProcessorName, AuxiliaryScannerName, CannedSpecificationName are predefined by the program)
I came across two main problems.
The first is regular expressions in Xtext: the production Pattern can be either a regular expression or a CannedSpecification (a predefined name).
However, I have not been able to implement a production for regular expressions (I don't need full support, but I want to be able to enter a regular expression).

The other problem is the NewLine within the File production. I haven't found a way to make a newline required in the generated editor :(

My Xtext file looks like this so far:
File:
	( specifications += Specification )+ ;

Specification:
	 ( tokenName = ID ':')? 
	 pattern=Pattern 
	 ( '(' auxScanner = AuxiliaryScannerName ')' )? 
	 ( '[' tokenScanner = TokenProcessorName ']' )?
;
Pattern:
	CannedExpr | RegExpr
;

RegExpr:
	'$' regex+ 
;
terminal regex:
    'a'..'z'|'A'..'Z'|'0'..'9'|'-'
    |','|'.'|'?'|'\''|':'|'\"'|'>'
    |'<'|'/'|'_'|'='|';'|'('|')'
    |'&'|'!'|'#'|'%'|'*'|'+'|'['|']'; 


terminal CannedExpr:
	'ADA_COMMENT' |  'ADA_IDENTIFIER' |  'AWK_COMMENT' | 'C_COMMENT' | 'C_CHAR_CONSTANT' | 'C_FLOAT'
    | 'C_IDENTIFIER' | 'C_INTEGER' | 'C_INT_DENOTATION' | 'C_STRING_LIT' | 'MODULA_INTEGER' | 'MODULA2_COMMENT' 
    | 'MODULA3_COMMENT' | 'MODULA2_CHARINT' | 'MODULA2_INTEGER' | 'MODULA2_LITERALDQ' | 'MODULA2_LITERALSQ' 
    | 'PASCAL_COMMENT' | 'PASCAL_IDENTIFIER' | 'PASCAL_INTEGER' | 'PASCAL_REAL' | 'PASCAL_STRING' | 'SPACES' | 'TAB' | 'NEW_LINE'
;

AuxiliaryScannerName:
	auxiliaryScannerName=('auxEOF'| 'coordAdjust'| 'auxNewLine' | 'auxTab' | 'auxEOL'| 'auxPascalString' | 'auxPascalComment'
     | 'auxNoEOL' | 'auxCString' | 'auxCChar' | 'auxCComment' | 'auxM2String'| 'auxM3Comment' | 'Ctext')
;


P.S. Sorry for my bad English.

I really appreciate your help!
Best of greetings
Re: Regular Expression and newline [message #894350 is a reply to message #894237] Sun, 08 July 2012 15:14
Jens Kuenzer
Hi,

The NewLine, i.e. '\n', is already part of the whitespace rule (WS). If you want to use NewLine in your language, you can define your own WS rule.
You can change your Xtext file header to not import the defaults like this (make sure to add the import of ecore):
grammar <your grammar name> hidden(<your whitespace and comment rules here>)
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
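
A minimal sketch of that idea, not a tested solution: the grammar name, the namespace URI and the rule names File, Specification and NL are placeholders chosen for illustration. It assumes that line breaks are taken out of the hidden WS rule and lexed by a separate, non-hidden terminal, so the parser can require them.

```xtext
// Sketch only: grammar name, namespace URI and rule names are placeholders.
grammar org.example.Lines hidden(WS)

import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate lines "http://www.example.org/lines"

// Every Specification has to be terminated by at least one line break.
File:
    {File} NL* (specifications+=Specification NL+)*;

Specification:
    name=ID;

terminal ID: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;

// Visible to the parser, because NL is not listed in hidden(...).
terminal NL: '\r'? '\n';

// The hidden whitespace rule no longer contains '\r' or '\n'.
terminal WS: (' '|'\t')+;
```

With this layout, spaces and tabs are still skipped anywhere, while a missing line break between two specifications becomes a syntax error. Note that the last specification in a file then also needs a trailing line break.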

Also note that the terminal rules CannedExpr and regex can match the same input, so regex might hide CannedExpr. I have not yet completely understood how an ANTLR lexer handles that. So it would be best to jail the regexp inside a string-like literal. That way the lexer can use the delimiter character to detect the regex without knowing the context.
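
As a rough illustration of the delimiter idea, here is a grammar fragment (to be placed inside a complete grammar). The backtick delimiter and the rule names are assumptions made for this sketch and are not part of the original GLA syntax.

```xtext
// Sketch only: the backtick delimiter is an assumption, not GLA syntax.
RegExpr:
    value=REGEX;

// Everything between two backticks; a backslash escapes the next character,
// so a literal backtick can be written as \` inside the expression.
terminal REGEX:
    '`' ( '\\' . | !('\\'|'`') )* '`';
```

Because the token always starts and ends with the delimiter, the lexer does not have to decide character by character whether it is inside a regular expression. The trade-off is that the surface syntax changes: users have to quote their expressions.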
Re: Regular Expression and newline [message #894375 is a reply to message #894237] Sun, 08 July 2012 22:38
Henrik Lindberg
If '$' is not used anywhere else in your grammar/terminals, it should
work if you do this:

terminal REGEX:
'$' (
'a'..'z'|'A'..'Z'|'0'..'9'|'-'
|','|'.'|'?'|'\''|':'|'\"'|'>'
|'<'|'/'|'_'|'='|';'|'('|')'
|'&'|'!'|'#'|'%'|'*'|'+'|'['|']')+
;

Some other notes:
You do not want a bunch of terminals for keywords. Do something like:

Pattern : CannedExpr | RegExp ;

RegExp : value = REGEX ;

CannedExpr:
value = ('ADA_COMMENT' | 'ADA_IDENTIFIER' | 'AWK_COMMENT'
| 'C_COMMENT' | ' ...
);

Now an instance of the abstract "Pattern" class will have a feature
"value" that is either the parsed REGEX text or one of the keywords
given in CannedExpr.

Hope that helps you.
Regards
- henrik

Re: Regular Expression and newline [message #894380 is a reply to message #894375] Mon, 09 July 2012 00:11
paolo pinkel
Thank you guys, very much. You really helped me a lot.
Re: Regular Expression and newline [message #897406 is a reply to message #894380] Tue, 24 July 2012 02:19
paolo pinkel
Sorry to bother you guys again, but I have another problem.
This is my new Xtext file. I am currently having issues with adding comments or having some random whitespace within such a gla file.
grammar de.sabram.upb.specs.Gla hidden(WS, ML_COMMENT)
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate gla "//www.sabram.de/upb/specs/Gla"


File:
	specifications+=Specification(('\n' |'\r' )specifications+=Specification)*;

Specification returns Specifications:
	 ( tokenName = ID ':')? 
	 pattern=Pattern
	 ('('auxiliaryScannerName=AuxiliaryScannerName')')? 
	 ('['tokenProcessorName=TokenProcessorName']')?
;
Pattern: CannedExpression | RegularExpression;

CannedExpression returns Pattern:
	value=('ADA_COMMENT' |  'ADA_IDENTIFIER' |  'AWK_COMMENT' | 'C_COMMENT' | 'C_CHAR_CONSTANT' | 'C_FLOAT'
    | 'C_IDENTIFIER' | 'C_INTEGER' | 'C_INT_DENOTATION' | 'C_STRING_LIT' | 'MODULA_INTEGER' | 'MODULA2_COMMENT' 
    | 'MODULA3_COMMENT' | 'MODULA2_CHARINT' | 'MODULA2_INTEGER' | 'MODULA2_LITERALDQ' | 'MODULA2_LITERALSQ' 
    | 'PASCAL_COMMENT' | 'PASCAL_IDENTIFIER' | 'PASCAL_INTEGER' | 'PASCAL_REAL' | 'PASCAL_STRING' | 'SPACES' | 'TAB' | 'NEW_LINE')	
;
RegularExpression returns Pattern: value=REGEX;	

AuxiliaryScannerName: 
	value=('auxEOF'| 'coordAdjust'| 'auxNewLine' | 'auxTab' | 'auxEOL'| 'auxPascalString' | 'auxPascalComment'
	| 'auxNoEOL' | 'auxCString' | 'auxCChar' | 'auxCComment' | 'auxM2String'| 'auxM3Comment' | 'Ctext')
;

TokenProcessorName: 
	value=('c_mkchar' | 'c_mkint' | 'c_mkstr' | 'EndOfText' | 'lexerr' | 'mkidn'| 'mkint' | 'mkstr' | 'modula_mkint')
;

/********************************************************************
 ******** TERMINALS
 ******** No rules below this line, only terminals
 ********************************************************************/
 
/*
 * RegularExpression terminal for Eli/GLA: a $ followed by the body of a regular expression.
 * The regular expression is split into separate parts to make it readable:
 * - all possible digits: REGEX_DIGIT
 * - all possible letters: REGEX_STRING
 * - all possible cardinality operators: REGEX_CARDINALITY
 * - all possible symbols within a regular expression: REGEX_SYMBOL
*/
terminal REGEX: '$' REGEX_BODY+;
terminal fragment REGEX_BODY: (REGEX_STRING|REGEX_DIGIT|REGEX_CARDINALITY|REGEX_SYMBOL);
terminal fragment REGEX_DIGIT: ('0'..'9');
terminal fragment REGEX_STRING: ('a'..'z'|'A'..'Z');
terminal fragment REGEX_CARDINALITY: ('*'|'+'|'-'|'?');
terminal fragment REGEX_SYMBOL: (','|'.'|'\''|':'|'\"'|'>'|'<'|'/'|'_'|'='|';'|'('|')'|'&'|'!'|'#'|'%'|'['|']'|'\\');
/*
 * TERMINALS OUT OF DEFAULT
 */										
terminal ID  		: '^'?('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
terminal INT returns ecore::EInt: ('0'..'9')+;
terminal STRING	: 
			'"' ( '\\' ('b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\') | !('\\'|'"') )* '"' |
			"'" ( '\\' ('b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\') | !('\\'|"'") )* "'"
		; 
terminal ML_COMMENT: '/*' -> '*/';
terminal WS			: (' '|'\t'|'\r'|'\n')+;
terminal ANY_OTHER: .;

The editor doesn't accept input like this:
/*
* this is a small gla Example file
* which simply defines 3 different tokens by using CannedExpressions
*/
Identifier: 	C_IDENTIFIER
Number:		C_INTEGER
      		C_COMMENT

My problem in this example is the multiple \t characters in front of C_COMMENT, or a random \t or single space after a line like Identifier: C_IDENTIFIER.
Am I using the WS rule wrong? I took hidden(WS, ML_COMMENT) from the Xtext default terminals and expected whitespace and multi-line comments to be ignored.

I really appreciate your help.
Best greetings, Paolo

Edit: Somehow I wasn't able to copy the whole code, because this forum doesn't accept "links" that are too long and don't point to eclipse. So the generate gla part is missing the "http:".

[Updated on: Tue, 24 July 2012 02:21]
