Eclipse Community Forums: TMF (Xtext) » XText grammar with variable token terminators & embedded grammars

(Does XText support all of the crazy grammatical constructs I'll need?)

XText grammar with variable token terminators & embedded grammars [message #1259151]

Thu, 27 February 2014 23:31

Eclipse User

I'm just starting out with XText and don't yet know whether it's the right tool for the job. I read through the grammar documentation, and it's unclear whether my desired syntax is supported. My input text looks something like the following:

# comment
a|b|c|
d|e|<<F|g
<?xml version="1.0"?>
<foo>
 <bar/>
 <!-- comment -->
 # not a comment
 not|a|record
</foo>
F
h|<<I|<<J
A,B,C,D
E,F,G,H
I,J,K,L
I
<malformed><xml/>
J
k|l

Essentially, I'm working with '|'-delimited files with support for embedded 'HERE' blocks (see en.wikipedia.org/wiki/Here_document). The '<<' sequence precedes an identifier that marks the end of the HERE value linked to that column. A HERE block is terminated only when the identifier occurs by itself on its own line.

The above snippet contains four records. Record #2 has one HERE value, while Record #3 has two. The fourth record ('k|l') should count as a record despite the unterminated XML in the prior record.
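
To make the intended record boundaries concrete, here is a minimal, language-neutral sketch in Python (not Xtext code). It rests on several assumptions that are not stated above: comment lines start with '#' in the first column, HERE bodies follow the record line in the order their '<<' markers appear, identifiers match `\w+`, and a trailing '|' yields an empty last column. The name `split_records` is hypothetical.

```python
import re

# Assumption: a HERE marker is a whole column of the form <<IDENTIFIER.
HERE_MARK = re.compile(r"<<(\w+)")


def split_records(text):
    """Split '|'-delimited text into records, replacing each HERE marker
    with the body of its HERE block."""
    lines = text.splitlines()
    records = []
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment outside any HERE block
        columns = line.split("|")
        for col, value in enumerate(columns):
            match = HERE_MARK.fullmatch(value)
            if not match:
                continue
            terminator = match.group(1)
            body = []
            # Consume lines until one consists of the terminator alone.
            while i < len(lines) and lines[i] != terminator:
                body.append(lines[i])
                i += 1
            i += 1  # skip the terminator line (or step past end of input)
            columns[col] = "\n".join(body)
        records.append(columns)
    return records
```

Run against the example input above, this should return four records, with the malformed XML kept as an uninterpreted string in the third one.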

To make things even more complicated, I'll want certain columns to be parsed with a different grammar. Notice that the above example contains XML text and also some CSV. Can XText handle embedded grammars? Would it be possible to select a specific grammar based on the value of a preceding column? In the two-record example below, I'd want to parse Column 5 as either an arithmetic expression or XML, depending on the value of Column 4.

last|col|is|arithmetic|1+3*7
last|col|is|xml|<<XML
<some><xml/></some>
XML
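
The column-dependent behaviour described above could be sketched as a lookup from the Column 4 value to a sub-parser. This is only an illustration in Python, not an Xtext feature: the names `SUB_PARSERS`, `eval_arithmetic` and `parse_record` are hypothetical, Python's own `ast` module stands in for a real arithmetic grammar (binary +, -, *, / on numeric literals only), and `xml.etree.ElementTree` stands in for the XML grammar. The HERE value is assumed to have been substituted into Column 5 already.

```python
import ast
import operator
import xml.etree.ElementTree as ET

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_arithmetic(text):
    """Evaluate binary +, -, *, / over numeric literals."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {text!r}")
    return walk(ast.parse(text, mode="eval"))


# Hypothetical mapping from the Column 4 value to the parser for Column 5.
SUB_PARSERS = {
    "arithmetic": eval_arithmetic,
    "xml": ET.fromstring,
}


def parse_record(columns):
    """Parse Column 5 with the sub-parser named by Column 4.
    Unknown kinds leave the payload as an opaque string."""
    kind, payload = columns[3], columns[4]
    parser = SUB_PARSERS.get(kind)
    return parser(payload) if parser else payload
```

A failure in one sub-parser (for example `ET.ParseError` on malformed XML) is raised per column, so the caller can report it without losing the surrounding record structure.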

This is looking pretty complicated so I wouldn't expect anyone to write a grammar for me. Rather, I'm hoping for guidance on what syntactic constructs I should look at or if I should even be using XText in the first place.

The attached image shows my first attempt at an Eclipse plugin built from scratch. Would love to do this with XText as things should be much cleaner that way.


Attachment: editor1.png
(Size: 68.96KB, Downloaded 876 times)

Re: XText grammar with variable token terminators & embedded grammars [message #1261753 is a reply to message #1259151]

Sun, 02 March 2014 19:16

Eclipse User

Hi,
It is mainly a lexical problem. You will need to write an external
lexer. Xtext does not have any specific features for handling multiple
grammars - you will need to construct one grammar that contains all
syntax you want to support. Depending on your use case you can get
around it by creating tokens that represent the other language, and then
color/validate them with a separate parser. You will have to build that
yourself. If you also want code completion etc. it starts to become
difficult. These "external grammar tokens" are opaque to the first
grammar, so your case where their syntax is determined by another token can
be supported. Basically, an "external grammar token" contains a string
in the external language.

It is probably just as difficult (or more so) with tools other than Xtext.

You will need to start by writing an external lexer. (You can look at
one in the project puppetlabs/geppetto @ github).

Handling heredoc is tricky since the grammar expects context-free
lexing, and there may be problems with text regions not being
continuous. We are going to support heredoc in the external lexer for
Geppetto as that is being added to the Puppet language it supports, but
we have not started yet, so I do not have more concrete advice atm.
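
To illustrate both difficulties, here is a rough sketch of a stateful lexer, written in Python rather than against the Xtext or Geppetto APIs. All names (`Token`, `lex`, the token kinds) are hypothetical, and the '#' comment and `\w+` identifier rules are assumptions. The `pending` list is the state that makes the lexing context-sensitive: what ends a HERE body depends on markers seen earlier. The recorded offsets show that a column's marker and its body occupy separate regions of the text.

```python
import re
from typing import List, NamedTuple

HERE_MARK = re.compile(r"<<(\w+)")


class Token(NamedTuple):
    kind: str   # COLUMN, HERE_START, HERE_BODY or HERE_END
    start: int  # offset of the first character
    end: int    # offset one past the last character
    text: str


def lex(text: str) -> List[Token]:
    tokens: List[Token] = []
    pending: List[str] = []  # terminators still awaited, in marker order
    pos = 0
    for line in text.splitlines(keepends=True):
        content = line.rstrip("\r\n")
        if pending:
            if content == pending[0]:
                pending.pop(0)
                tokens.append(Token("HERE_END", pos, pos + len(content), content))
            elif tokens[-1].kind == "HERE_BODY":
                # Extend the current opaque body token by one more line.
                prev = tokens[-1]
                tokens[-1] = Token("HERE_BODY", prev.start, pos + len(line), prev.text + line)
            else:
                tokens.append(Token("HERE_BODY", pos, pos + len(line), line))
        elif content and not content.startswith("#"):
            col = pos
            for value in content.split("|"):
                match = HERE_MARK.fullmatch(value)
                if match:
                    pending.append(match.group(1))
                kind = "HERE_START" if match else "COLUMN"
                tokens.append(Token(kind, col, col + len(value), value))
                col += len(value) + 1  # skip the '|' delimiter
        pos += len(line)
    return tokens
```

Each `HERE_BODY` token is opaque to the outer lexer; a separate parser could be run over its text and use `start` to map errors back to document offsets.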

Regards

- henrik

On 2014-28-02 5:56, Hollis Waite wrote:
> [full quote of message #1259151 trimmed]

Re: XText grammar with variable token terminators & embedded grammars [message #1262251 is a reply to message #1261753]

Mon, 03 March 2014 07:21

Eclipse User

Thanks for your response. Up to now, I've been using nested, custom RuleBasedScanner implementations for tokenization. It's reasonably straightforward to divide the document into records, then break records up into columns plus their attendant HERE values, and then deal with nested grammars as necessary. Unfortunately, all the back-and-forth evaluation is quite inefficient. Also, the hand-spun approach forfeits all of the features that XText provides for free (e.g. content assist, code outline, etc.).

It sounds like an XText solution may be marginally cleaner but that there will be some learning curve to deal with. I'll continue to evaluate my options of "build in XText", "build from scratch" and "pay someone else to do it." Your advice is a good start. For future readers of this thread, I include an image clarifying how my original example should be parsed. If/when you end up supporting heredoc in Geppetto, I encourage you to resurrect this thread with any new advice. If I'm able to devise an XText-based solution, I'll plan on doing the same.


Attachment: parse.png
(Size: 21.37KB, Downloaded 576 times)
