Eclipse Community Forums: TMF (Xtext) » The Story of cq-markup-xtext (pt. 1)

The Story of cq-markup-xtext (pt. 1) [message #556596]

Wed, 01 September 2010 20:59

Eclipse User

Created cq-markup-xtext

I had planned to implement a markup language parser in Xtext
as practice on my way to Xtext wizardry, so I'm glad that I
can now motivate myself not only by doing that, but also by
participating in a challenge:

http://www.codequarterly.com/code-challenges/markup/

I plan to work on this for about an hour a day, and if it
fully works by the deadline (October 15th, 2010), I will
submit it as an entry to the challenge. I think it will be
quite interesting to see different implementations of such a
parser. Also, if I end up with a result I like, I will
change and extend the syntax to Markdown (my wikitext format
of choice) and maintain it as a Java Markdown parser bundle
for everyone.

While doing this, I thought I'd write a bit about my
progress: things that worked out of the box, things that
didn't work quite right the first time, things that I found
too hard or too easy, behavior that is beyond my
understanding, things that I believe need to be improved,
and things that are fine as they are. I hope it's
interesting rather than boring. As always, I'm very
interested in hearing your feedback and suggestions.

btw, the code is located here:

http://github.com/ralfebert/cq-markup-xtext

Thanks to Holger Schill from itemis for the very helpful
Xtext coaching!

http://github.com/ralfebert/cq-markup-xtext/commit/ddbc54f

Provided target platform
plugins/com.codequarterly.markup.releng/markup.target

Eclipse, EMF, Xtext, JDOM and Jakarta Commons are needed for
this and are provided by the target platform in
com.codequarterly.markup.releng. If the target platform
doesn't work, reload it from the .target file, reload it
from the preferences dialog, restart Eclipse, rename the
.target file, use a fresh workspace, file a bug against p2,
don't ask me about it.

http://github.com/ralfebert/cq-markup-xtext/commit/4d87482

Created com.codequarterly.markup.tests bundle with test files

As a start, I slapped the test documents based on the parser
spec:

http://www.codequarterly.com/code-challenges/markup/markup-spec.html

into a bundle.

http://github.com/ralfebert/cq-markup-xtext/commit/7029166

Created Xtext project com.codequarterly.markup

I stumbled upon the generation of the plugin.xml_gen and the
apparent requirement to resolve xml_gen -> xml changes
myself, as I changed the file extension after creating the
new plug-in. Maybe the .xml_gen should be generated only if
different and contain a comment about its purpose.

http://github.com/ralfebert/cq-markup-xtext/commit/80ebcde

MarkupTests comparing MarkupParser output with xml files

Wrote a parameterized test "MarkupTests" (did you know about

http://junit.sourceforge.net/javadoc/org/junit/runners/Parameterized.html

?) that throws the .txt files at the Markup parser of my
dreams and compares the output with the .xml files.

Great, this yields 33 failing tests, so there's quite a bit
of work ahead of me.

http://github.com/ralfebert/cq-markup-xtext/commit/0c91cda
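
For readers who have not used the Parameterized runner: JUnit 4
expects a static method that returns a collection of Object[]
rows, one row per test instance. The sketch below shows only
that data-collection half, using the JDK alone. The class name
TestCases, the method name and the file name 01_empty are made
up for illustration; this is not the code from the commit above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TestCases {

    // Returns one row {txtName, xmlName} for every .txt file that has a
    // matching .xml file, sorted by name. In a real JUnit 4 test this list
    // would be returned from the static method annotated with @Parameters.
    static List<Object[]> parameters(List<String> fileNames) {
        Set<String> known = new HashSet<>(fileNames);
        List<String> sorted = new ArrayList<>(fileNames);
        Collections.sort(sorted);

        List<Object[]> rows = new ArrayList<>();
        for (String name : sorted) {
            if (!name.endsWith(".txt")) {
                continue; // only .txt files are parser input
            }
            String xml = name.substring(0, name.length() - ".txt".length()) + ".xml";
            if (known.contains(xml)) {
                rows.add(new Object[] { name, xml });
            }
        }
        return rows;
    }
}
```

Each row would then be passed to the test class constructor,
so every .txt/.xml pair shows up as its own red or green test.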

Empty document working, Markup/MarkupParser classes created

Let's see. An empty document is just return "". One green.

http://github.com/ralfebert/cq-markup-xtext/commit/9eecee5

Ignore whitespace when comparing markup output with example xmls

Done by formatting both documents from a DOM tree and
comparing the result. Being the nerd I am, I even asked if
this is allowed according to the challenge rules. I like
<tags /> a lot more than <tags/>.

http://github.com/ralfebert/cq-markup-xtext/commit/ac75ee6
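
The commit above does this with JDOM by pretty-printing both
documents. As a rough illustration of the same idea without any
extra library, here is a sketch that uses the JDK's built-in DOM
parser instead: it drops whitespace-only text nodes and then
compares the two trees. The class and method names are invented
for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlCompare {

    // Parses an XML string into a DOM tree; parse failures become unchecked.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + xml, e);
        }
    }

    // Recursively removes text nodes that consist of whitespace only.
    // Text such as " hello " is kept untouched.
    static void stripWhitespace(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespace(child);
            }
        }
    }

    // True if both documents have equal element trees once
    // indentation and line breaks between elements are ignored.
    static boolean sameIgnoringWhitespace(String a, String b) {
        Document left = parse(a);
        Document right = parse(b);
        stripWhitespace(left);
        stripWhitespace(right);
        return left.getDocumentElement().isEqualNode(right.getDocumentElement());
    }
}
```

With this, <tags /> and <tags/> compare as equal too, because
the difference disappears once both strings are parsed.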

02_simple_paragraph working

Next one is a single paragraph. The most contemporary way to
convert 'hello' to '<p>hello</p>' is to write an Xtext
grammar and use MWE2 to generate a lexer and a parser that
will eat all these delicious CHAR tokens, convert them to
Text EObjects and finally assemble a MarkupDocument EObject,
which is then carefully mangled into a JDOM XML document
containing a <p> Element and beautifully written out as text
using JDOM's Format.getPrettyFormat().

Of course I need to pray three times in the direction of EMF
to get a URI for my file on the classpath to get my
ResourceFactoryImpl to get a resource to get my EObject,
but I consider this fixed now because soon I can just google
for 'pray EMF' to find this commit containing the example
snippet. Then, finally, I will never mistake
ResourceFactoryImpl.createResource for
ResourceSetImpl.getResource again.

http://github.com/ralfebert/cq-markup-xtext/commit/664e9bd

03_multiline_paragraph and 04_two_paragraphs working

I'm proud that I can still recite '(A (sep A)*)?' a week
after learning about this. Should I hit a party of parsers
one day, I might last a minute or two.

http://github.com/ralfebert/cq-markup-xtext/commit/eaf9406
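
For anyone who cannot recite it yet: '(A (sep A)*)?' reads as
"nothing at all, or one A followed by any number of
separator-plus-A pairs". Below is a hand-written sketch of that
shape over a plain token list, only to make the pattern
concrete. It is not how Xtext implements it, and the names are
invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class SeparatedList {

    // Parses tokens according to (A (sep A)*)? where A is any token
    // that is not the separator. Returns the A tokens in order.
    static List<String> parse(List<String> tokens, String sep) {
        List<String> items = new ArrayList<>();
        int pos = 0;
        if (tokens.isEmpty()) {
            return items; // the outer '?': empty input is valid
        }
        items.add(expectItem(tokens, pos++, sep)); // the first A
        while (pos < tokens.size()) { // (sep A)*
            if (!tokens.get(pos).equals(sep)) {
                throw new IllegalArgumentException("Expected separator at index " + pos);
            }
            pos++;
            items.add(expectItem(tokens, pos++, sep));
        }
        return items;
    }

    // Returns the token at pos, or fails if the input ends or a separator is found.
    private static String expectItem(List<String> tokens, int pos, String sep) {
        if (pos >= tokens.size() || tokens.get(pos).equals(sep)) {
            throw new IllegalArgumentException("Expected item at index " + pos);
        }
        return tokens.get(pos);
    }
}
```

The point of the pattern is that a separator may only appear
between two items, so leading, trailing and doubled separators
are all rejected.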

Support headlines: 06_header

This is the moment where all the work to set up EMF and Xtext
pays off. Extending the generated switch class to do a model
tree walk outputting XML elements is just priceless and
beautiful. Thanks to Sven Efftinge for mentioning to me that
these classes make great interpreters; I would never have
figured this out myself.

I wonder if I can make Xtext generate an EMF model which has
a common text attribute in the Part class and not individual
ones in Headline and Paragraph.

http://github.com/ralfebert/cq-markup-xtext/commit/628874e
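
To give an idea of what "extending the switch class" looks
like, here is a plain-Java sketch without EMF. The generated
EMF switch offers doSwitch plus one case method per model type;
the classes below only mimic that shape loosely. Part,
Paragraph, Headline, PartSwitch and the h1/p element names are
hypothetical stand-ins, and the output is a string rather than
the JDOM elements used in the actual project.

```java
import java.util.List;

class ModelWalk {

    // Hypothetical stand-ins for the model classes Xtext would generate.
    static abstract class Part {
        final String text;
        Part(String text) { this.text = text; }
    }

    static class Paragraph extends Part {
        Paragraph(String text) { super(text); }
    }

    static class Headline extends Part {
        final int depth;
        Headline(int depth, String text) { super(text); this.depth = depth; }
    }

    // Loosely mimics a generated switch: doSwitch dispatches on the
    // runtime type and calls the matching case method.
    static abstract class PartSwitch<T> {
        T doSwitch(Part part) {
            if (part instanceof Headline) return caseHeadline((Headline) part);
            if (part instanceof Paragraph) return caseParagraph((Paragraph) part);
            throw new IllegalArgumentException("Unknown part: " + part);
        }
        abstract T caseHeadline(Headline headline);
        abstract T caseParagraph(Paragraph paragraph);
    }

    // The "interpreter": a switch subclass that turns each part into XML.
    static String toXml(List<? extends Part> parts) {
        PartSwitch<String> xml = new PartSwitch<String>() {
            @Override String caseHeadline(Headline h) {
                return "<h" + h.depth + ">" + escape(h.text) + "</h" + h.depth + ">";
            }
            @Override String caseParagraph(Paragraph p) {
                return "<p>" + escape(p.text) + "</p>";
            }
        };
        StringBuilder out = new StringBuilder("<body>");
        for (Part part : parts) {
            out.append(xml.doSwitch(part));
        }
        return out.append("</body>").toString();
    }

    // Minimal escaping of XML special characters in text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Adding a new model type then means adding one case method, and
the tree walk itself stays unchanged.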

Headline depth, 07_headers and 08_crazy_header

Long headlines, short headlines, crazy headlines, if we
don't have it, you don't want it!

http://github.com/ralfebert/cq-markup-xtext/commit/2fb4c64

support tagged_markup like \i{} and \u{}

I almost wept when 23_tagged_markup went green (I skipped
the other tests for now because they need to parse the
indentation information and I'm not in the mood for that
now). I would not want to write such a conversion without
Xtext as of now.

I'm still a bit sad that I don't get why it makes little
parsers happy when you tell them

Text:
    parts += (PlainText|FormattedText)
    (parts += FormattedText parts += PlainText?)*;

instead of:

Text:
    parts += TextPart+;

(PlainText and FormattedText being subclasses of TextPart).
Thanks to Sebastian Zarnekow for giving me the
above-mentioned solution to get a similar example working.

Also, the error message is not really helpful for humble
humans, telling me:

Decision can match input such as "RULE_CHAR" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input

Isn't there a way Xtext could figure out what's meant by
TextPart+ automagically? Which doubt do I remove by stating
(A|B)(A B?)* instead of C+ when A and B are subclasses of C?

http://github.com/ralfebert/cq-markup-xtext/commit/ea1e81e

Re: The Story of cq-markup-xtext (pt. 1) [message #556732 is a reply to message #556596]

Thu, 02 September 2010 09:23

Eclipse User

The error about the decision comes directly from ANTLR, so it's basically Spolsky's Law of Leaky Abstractions playing up. To resolve these errors (which actually do prohibit your grammar from functioning correctly), I'm afraid you need some knowledge of ANTLR. Using ANTLRWorks is a useful middle ground, though. Xtext can essentially do nothing about this other than running ANTLR (against a generated .g grammar) all the time to validate it, which would be a severe performance problem.

I'm guessing (without actually trying to reproduce it) that the decision error prevented the TextPart+ from working. I don't see why the Xtext editor would flag it as an error.

Re: The Story of cq-markup-xtext (pt. 1) [message #556787 is a reply to message #556596]

Thu, 02 September 2010 11:19

Eclipse User

Hi Ralf,

Good to hear that you are still having a good time with Xtext (as I am
with git ;-))

On 9/2/10 2:59 AM, Ralf Ebert wrote:
> Created Xtext project com.codequarterly.markup
>
> I stumbled upon the generation of the plugin.xml_gen and the
> apparent requirement to resolve xml_gen -> xml changes
> myself, as I changed the file extension after creating the
> new plug-in. Maybe the .xml_gen should be generated only if
> different and contain a comment about its purpose.
>
> http://github.com/ralfebert/cq-markup-xtext/commit/80ebcde

I know you are not a big fan of Bugzilla, but it would be really nice to
see such enhancement requests there :-)

Cheers,
Sven

--
Need professional support for Xtext or other Eclipse Modeling technologies?
Go to: http://xtext.itemis.com
Twitter : @svenefftinge
Blog : http://blog.efftinge.de

Idea: Helper methods to load Xtext resources [message #557130 is a reply to message #556596]

Sat, 04 September 2010 15:17

Eclipse User

> Of course I need to pray three times in the direction of EMF
> to get an URI for my file on the classpath to get my
> ResourceFactoryImpl to get an resource to get my EObject,

This was annoying me again, so I was thinking about how it could be
simplified. As a start, I made my StandaloneSetup class a singleton and
added convenience methods to load resources by URL so that I can load a
resource and get the root model object using:

TodoList list =
    TodosStandaloneSetup.get().load(SomeClass.class.getResource("test.todos"));

Example implementation attached. This is especially helpful for tests; a
version with load(File/URI) might be helpful as well.

I deliberately didn't file a bug on this yet because I wanted to
discuss ideas first:

Do you think it might be worth generating code for this by default?

Is the StandaloneSetup class the best place for such a thing?
- Could this be helpful for non-standalone cases as well?
- Maybe StandaloneSetup could get the root model type as generic type

I changed the StandaloneSetup to do the initialization only once by
making it a singleton. Are there cases where someone might want to
call createInjectorAndDoEMFRegistration() multiple times?

In my case I only ever wanted to load one single resource in a new
ResourceSet. I wonder if a general solution should support more than that?

Ralf

Attachment: TodosStandaloneSetup.java

package org.eclipselabs.todotext;

import java.net.URL;

import org.eclipse.emf.common.util.EList;
import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.EObject;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;
import org.eclipselabs.todotext.todos.TodoList;

public class TodosStandaloneSetup extends TodosStandaloneSetupGenerated {

    private static final TodosStandaloneSetup INSTANCE = new TodosStandaloneSetup();

    private TodosStandaloneSetup() {
        createInjectorAndDoEMFRegistration();
    }

    public static TodosStandaloneSetup get() {
        return INSTANCE;
    }

    public TodoList load(URL url) {
        if (url == null) {
            throw new RuntimeException("Resource URL cannot be null");
        }
        URI uri = URI.createURI(url.toExternalForm());
        Resource resource = new ResourceSetImpl().getResource(uri, true);
        if (!resource.getErrors().isEmpty()) {
            throw new RuntimeException("Resource has errors: " + resource.getErrors());
        }
        EList<EObject> contents = resource.getContents();
        if (contents.isEmpty()) {
            throw new RuntimeException("Resource " + resource + " is empty.");
        }
        return (TodoList) contents.get(0);
    }
}

Re: Idea: Helper methods to load Xtext resources [message #557167 is a reply to message #557130]

Sun, 05 September 2010 12:13

Eclipse User

We have such convenience methods in our test utilities, which we plan to
polish and officially publish at some point within this development stream.

Making StandaloneSetup a singleton is not something we would want to do.
That is because you should use Guice to initialize your application, and
therefore you never need to access the injector through static access.

I think the EMF Resource API isn't too inconvenient. It doesn't abstract
the three core concepts (Resource, ResourceSet and URI) away, which is
good. In unit tests you have to make a lot of calls to this API, so
it is nice to have a convenience API there.

Sven

On 9/4/10 9:17 PM, Ralf Ebert wrote:
>> Of course I need to pray three times in the direction of EMF
>> to get an URI for my file on the classpath to get my
>> ResourceFactoryImpl to get an resource to get my EObject,
>
> This was annoying me again, I was thinking about how this could be
> simplified. As a start, I made my StandaloneSetup class a singleton and
> added convenience methods to load resources by URL so that I can load a
> resource and get the root model object using:
>
> TodoList list =
> TodosStandaloneSetup.get().load(SomeClass.class.getResource("test.todos"));
>
> Example implementation attached. This is especially helpful for tests, a
> version with load(File/URI) might be helpful as well.
>
> I deliberately didn't file a bug on this as of now because I wanted to
> discuss ideas at first:
>
> Do you think this might be worth generating code for this by default?
>
> Is the StandaloneSetup class the best place for such a thing?
> - Could this be helpful for non-standalone cases as well?
> - Maybe StandaloneSetup could get the root model type as generic type
>
> I changed the StandaloneSetup to do the initialization only once by
> making it a singleton, are there cases where someone potentially want to
> call createInjectorAndDoEMFRegistration() multiple times?
>
> In my cases I only ever wanted to load one single resource in a new
> ResourceSet, I wonder if a general solution should support more than that?
>
> Ralf

--
Need professional support for Xtext or other Eclipse Modeling technologies?
Go to: http://xtext.itemis.com
Twitter : @svenefftinge
Blog : http://blog.efftinge.de

Re: The Story of cq-markup-xtext (pt. 1) [message #558425 is a reply to message #556596]

Sun, 12 September 2010 14:28

Eclipse User

Hi Ralf,

(A|B)(A B?)* does not allow you to write B B, which is the doubt that
the parser has with (A|B)+ or C+ ;-)

Regards,
Sebastian
--
Need professional support for Eclipse Modeling?
Go visit: http://xtext.itemis.com

On 02.09.10 02:59, Ralf Ebert wrote:
> Which doubt do I remove by stating
> (A|B)(A B?)* instead of C+ when A and B are subclasses of C?
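
One way to make the "B B" doubt concrete is to count parses.
The sketch below assumes, which the thread does not state
explicitly, that a PlainText part is a run of one or more CHAR
tokens. Under that assumption two adjacent CHAR tokens can be
read as one PlainText or as two, and that is the ambiguity. The
class, the 'c'/'F' token encoding and the flag are invented for
this example; the flag models only the "no two plain parts in a
row" property, not the exact rule from the first post.

```java
class AmbiguityDemo {

    // Counts the ways to split a token string into a non-empty list of parts.
    // 'c' is one CHAR token; 'F' stands for one complete formatted part.
    // A plain part is one or more consecutive 'c' tokens.
    // If allowAdjacentPlain is false, a plain part may not directly
    // follow another plain part.
    static int countParses(String tokens, boolean allowAdjacentPlain) {
        if (tokens.isEmpty()) {
            return 0; // at least one part is required
        }
        return count(tokens, 0, false, allowAdjacentPlain);
    }

    private static int count(String tokens, int pos, boolean lastWasPlain,
            boolean allowAdjacentPlain) {
        if (pos == tokens.length()) {
            return 1; // consumed everything: one complete parse
        }
        int ways = 0;
        if (tokens.charAt(pos) == 'F') {
            ways += count(tokens, pos + 1, false, allowAdjacentPlain);
        } else if (allowAdjacentPlain || !lastWasPlain) {
            // Try every possible length for the plain part starting at pos.
            for (int end = pos + 1;
                    end <= tokens.length() && tokens.charAt(end - 1) == 'c';
                    end++) {
                ways += count(tokens, end, true, allowAdjacentPlain);
            }
        }
        return ways;
    }
}
```

With adjacent plain parts allowed, countParses("cc", true)
finds two parses; with them forbidden, countParses("cc", false)
finds exactly one. A parser that has to pick a single
alternative per decision wants the second situation.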

Re: The Story of cq-markup-xtext (pt. 1) [message #558432 is a reply to message #558425]

Sun, 12 September 2010 15:22

Eclipse User

Hi Sebastian,

> (A|B)(A B?)* does not allow to write B B which is the doubt that the
> parser has with (A|B)+ or C+ ;-)

thanks for clarifying this. Would it be possible in theory to remove
such doubt by saying 'C is greedy/it makes no sense to be followed by
elements of itself'?

Ralf
