Fuzzy grammar advice (Grammar that skips the statement and expression level and just parses objects/classes/methods etc.)
Fuzzy grammar advice [message #1829942] Tue, 14 July 2020 14:01
Tomas Öberg
Messages: 14
Registered: February 2020
Junior Member
I'm wondering how best to implement a grammar that skips statements and expressions (and their descendants) and only parses the surface level of the code. In other words, I only want to parse classes/objects with their fields and method names, without dealing with the expressions in this fuzzy version of the parser.
The language I'm creating an Xtext-based language server (LS) for has its methods defined within square brackets, for example:
functionId[ param1 param2 paramN;

...stat/expr...<--- skip all of these

]


And the global function's syntax looks like:

[globalFunctionId param1 param2 paramN; 

...stat/expr... <--- skip all of these

] ;



My (flawed) solution so far has been to create a BRACKET_TO_BRACKET terminal:
terminal BRACKET_TO_BRACKET:
'[' -> ']' ';'
;

The drawback, of course, is that ']' can also appear inside a string literal, such as "[Text]", which ends the token early and breaks the parse halfway through the body.
Luckily, though, the language doesn't use brackets for array elements, so it is probably only strings that will require special handling.
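To illustrate what that string handling involves, here is a minimal, hypothetical Java sketch of the scan a customized lexer rule would have to perform: starting at '[', it stops at the first ']' that is not inside a string literal. It assumes strings are delimited by '"' with '\' as the escape character, and, since brackets are not used for array elements, it does not track nested brackets. The class and method names are made up for illustration.

```java
/**
 * Hypothetical helper: finds the ']' that closes a bracketed function body,
 * ignoring any ']' inside a double-quoted string literal.
 * Assumption: strings use '"' as delimiter and '\' as escape character.
 */
final class BracketBodyScanner {

    private BracketBodyScanner() {}

    /**
     * @param src     the source text
     * @param openIdx index of the opening '['
     * @return index of the closing ']', or -1 if the body is unterminated
     */
    static int findClosingBracket(String src, int openIdx) {
        if (openIdx < 0 || openIdx >= src.length() || src.charAt(openIdx) != '[') {
            throw new IllegalArgumentException("openIdx must point at '['");
        }
        boolean inString = false;
        for (int i = openIdx + 1; i < src.length(); i++) {
            char c = src.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++; // skip the escaped character, e.g. \" inside a string
                } else if (c == '"') {
                    inString = false; // string literal ends
                }
            } else if (c == '"') {
                inString = true; // string literal starts; ']' is ignored until it ends
            } else if (c == ']') {
                return i; // first ']' outside a string closes the body
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String src = "f[ a b; print \"[Text]\"; ]";
        int open = src.indexOf('[');
        int end = findClosingBracket(src, open);
        System.out.println(src.substring(open, end + 1));
        // prints: [ a b; print "[Text]"; ]
    }
}
```

In a custom lexer, the returned index would mark where the single opaque body token ends; the ']' inside "[Text]" no longer cuts the body short.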

I'm under the impression that this is hard to do in the grammar and that I might need to replace the lexer with my own variant, but I can't say for sure. It might just be that I haven't thought long enough about how to accomplish this yet. But IF I do need to override the lexer, I've noticed that the generated lexer declares all its rule methods as final, which makes it even harder to override the specific mBRACKET_TO_BRACKET method I want to customize. Where can the final modifier be turned off, in case there isn't a better way forward?

(Any suggestions whatsoever on how to accomplish the end goal here are most welcome.)

[Updated on: Tue, 14 July 2020 14:12]

Re: Fuzzy grammar advice [message #1829946 is a reply to message #1829942] Tue, 14 July 2020 15:13
Karsten Thoms
Messages: 739
Registered: July 2009
Location: Dortmund, Germany
Senior Member

Right: if the content can't be consumed as a single token by the standard lexer, you'll have to use a custom lexer.

There are a couple of articles and projects out there that you could draw inspiration from. Within Xtext's own code base, custom lexing is done for the Xtend language (org.eclipse.xtend.core).

Since Xtext 2.20, a new generator fragment, org.eclipse.xtext.xtext.generator.parser.antlr.ex.ExternalAntlrLexerFragment, provides the basic bindings needed to plug in a customized external lexer.
Re: Fuzzy grammar advice [message #1829972 is a reply to message #1829946] Wed, 15 July 2020 04:51
Ed Willink
Messages: 6874
Registered: July 2009
Senior Member
Hi

You may not need to resort to a custom lexer. Xtext provides a convenient interface that lets you adjust the token stream between the lexer and the parser, in your case to reorganize mis-lexed square brackets.

You might find some inspiration in:

https://git.eclipse.org/r/plugins/gitiles/ocl/org.eclipse.ocl/+/refs/heads/master/plugins/org.eclipse.ocl.xtext.base/src/org/eclipse/ocl/xtext/base/services/RetokenizingTokenSource.java

Regards

Ed Willink
Re: Fuzzy grammar advice [message #1829982 is a reply to message #1829972] Wed, 15 July 2020 08:53
Tomas Öberg
Messages: 14
Registered: February 2020
Junior Member
Thanks to you both! I think I'll try the retokenizer approach first and fall back to a custom lexer if that fails. Ed, do you know whether that approach would work equally well for a language whose function syntax is more like C/Java, i.e. blocks within curly braces { ... }? And are there any drawbacks to this method that I wouldn't have with a custom lexer?

[Updated on: Wed, 15 July 2020 09:36]

Re: Fuzzy grammar advice [message #1829983 is a reply to message #1829982] Wed, 15 July 2020 08:54
Tomas Öberg
Messages: 14
Registered: February 2020
Junior Member
(I guess what I'm most afraid of is messing up the line/character positions stored in the tokens themselves.)
Re: Fuzzy grammar advice [message #1829987 is a reply to message #1829983] Wed, 15 July 2020 09:54
Ed Willink
Messages: 6874
Registered: July 2009
Senior Member
Hi

I have no experience of custom lexers for Xtext, so I cannot comment on how easy that is. I share your reluctance to mess up the larger framework. The retokeniser changes very little, and my example should demonstrate how to initialize a replacement token with adequate context.

Braces and the like should not be a problem. The retokeniser assumes that the default lexer breaks the text up into sensible tokens, so presumably your nested expressions will be lexed as a sequence of identifier and punctuation tokens. The retokeniser just needs to recognise the start token (presumably easy) and the end token (perhaps harder if a string mis-lex occurred), then return a single opaque token for the expression, plus any residue tokens from the mis-lex, and then carry on as normal.

Regards

Ed Willink
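The retokenising idea described in this thread (recognise the start token, find the matching end token, return one opaque token, carry on) can be modelled roughly as below. This is a simplified, self-contained sketch, not Ed's RetokenizingTokenSource and not the ANTLR TokenSource API: Tok, Retokenizer and the type names LBRACE/RBRACE/BODY are hypothetical stand-ins. It assumes the default lexer already emits each string literal as one STRING token, so a '}' inside "..." never looks like a closing brace, and it tracks nesting depth so it also covers C/Java-style blocks. The merged token reuses the first token's start offset and line and the last token's stop offset, which is how positions stay consistent.

```java
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical model of a retokenizing token source: each run of tokens from an
 * opening delimiter to its matching closing delimiter becomes one opaque BODY token.
 */
final class Retokenizer implements Iterator<Retokenizer.Tok> {

    /** Hypothetical token: type name, text, inclusive start/stop char offsets, 1-based line. */
    static final class Tok {
        final String type;
        final String text;
        final int start;
        final int stop;
        final int line;

        Tok(String type, String text, int start, int stop, int line) {
            this.type = type;
            this.text = text;
            this.start = start;
            this.stop = stop;
            this.line = line;
        }
    }

    private final Iterator<Tok> in; // tokens from the unchanged default lexer
    private final String open;      // type of the opening delimiter, e.g. "LBRACE"
    private final String close;     // type of the closing delimiter, e.g. "RBRACE"
    private final String source;    // full document text, used to rebuild the body text

    Retokenizer(Iterator<Tok> in, String open, String close, String source) {
        this.in = in;
        this.open = open;
        this.close = close;
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return in.hasNext();
    }

    @Override
    public Tok next() {
        Tok first = in.next();
        if (!first.type.equals(open)) {
            return first; // tokens outside a body pass through unchanged
        }
        int depth = 1;
        Tok last = first;
        // Consume up to the matching close token, counting nested open/close pairs.
        while (depth > 0 && in.hasNext()) {
            last = in.next();
            if (last.type.equals(open)) {
                depth++;
            } else if (last.type.equals(close)) {
                depth--;
            }
        }
        // If input ends first, the body extends to the last token read.
        // Original offsets and line are reused so editor positions stay correct.
        String text = source.substring(first.start, last.stop + 1);
        return new Tok("BODY", text, first.start, last.stop, first.line);
    }

    public static void main(String[] args) {
        String src = "f { a \"}\" } g";
        List<Tok> toks = List.of(
                new Tok("ID", "f", 0, 0, 1),
                new Tok("LBRACE", "{", 2, 2, 1),
                new Tok("ID", "a", 4, 4, 1),
                new Tok("STRING", "\"}\"", 6, 8, 1),
                new Tok("RBRACE", "}", 10, 10, 1),
                new Tok("ID", "g", 12, 12, 1));
        Retokenizer rt = new Retokenizer(toks.iterator(), "LBRACE", "RBRACE", src);
        while (rt.hasNext()) {
            Tok t = rt.next();
            System.out.println(t.type + " " + t.text + " [" + t.start + ".." + t.stop + "]");
        }
        // prints:
        // ID f [0..0]
        // BODY { a "}" } [2..10]
        // ID g [12..12]
    }
}
```

In an actual Xtext language, equivalent logic would presumably sit in a token-source wrapper between the generated lexer and the parser, and BODY would need to be a token type the grammar knows about; for the bracket-based language, open and close would be the '[' and ']' token types instead.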