Eclipse Community Forums
Forum Search:

Search      Help    Register    Login    Home
Home » Modeling » TMF (Xtext) » linking performance
linking performance [message #977605] Fri, 09 November 2012 12:19 Go to next message
Vlad Dumitrescu is currently offline Vlad Dumitrescu
Messages: 318
Registered: July 2009
Location: Gothenburg
Senior Member
Hi!

This question is related to my previous ones, regarding to performance when running the Xtext builder on files. I found that part of the problem is caused by the linking step that needs to do a lot of work. However, I can't see how to change the grammar so as to need a lighter linker. I hope someone can give me some pointers.

The language has symbols that can either just represent themselves or be references to named elements in the code. Originally I modeled all these symbols as potential links and used the linker to decide whether it's a real one or not. But that slows it down considerably.

I tried to specify the exact kind of reference that is expected, since in several places it is possible to know that statically and that resulted in a halving of the time the Xtext builder needed!

But of course the joy didn't last long, because in the general case of an expression, a symbolic value may be any of these references or a simple name and the parser can't discern which, because they all look alike. Of course, it has the info higher up in the stack, but this isn't a context-sensitive parser.

Is there any trick that I can use, or is my initial solution the only one that works? Or maybe I just misunderstood a lot of things about how to use Xtext?

Code example to understand the problem. A function call's target can be a general expression.

In the case where the reference can be resolved statically, we want two cross-references, one to the "mod" and one to the "fun".
mod.fun(1,mod,fun)

Here, ?MOD is a macro that is known at compile time. Here we want two cross-references from it, one to the macro definition and in case the value is a symbol, to that module.
?MOD.fun(1,?MOD,fun)

In this third case, Fun is a variable and we don't know its value. No cross-references.
mod.Fun(1,mod,Fun)

In the parameter lists, "mod" and "fun" are just names unrelated to the ones before. The macro and variable should reference their respective definitions.

best regards,
Vlad

Re: linking performance [message #977700 is a reply to message #977605] Fri, 09 November 2012 13:54 Go to previous messageGo to next message
Sebastian Zarnekow is currently offline Sebastian Zarnekow
Messages: 2832
Registered: July 2009
Senior Member
Vlad,

the described problems sound similar to the ones that were solved in
Geppetto (see on Github).
Without insight on the code and the grammar it's hard to tell where the
performance is lost.

Regards,
Sebastian
--
Looking for professional support for Xtext, Xtend or Eclipse Modeling?
Go visit: http://xtext.itemis.com

Am 09.11.12 13:19, schrieb Vlad Dumitrescu:
> Hi!
>
> This question is related to my previous ones, regarding to performance
> when running the Xtext builder on files. I found that part of the
> problem is caused by the linking step that needs to do a lot of work.
> However, I can't see how to change the grammar so as to need a lighter
> linker. I hope someone can give me some pointers.
>
> The language has symbols that can either just represent themselves or be
> references to named elements in the code. Originally I modeled all these
> symbols as potential links and used the linker to decide whether it's a
> real one or not. But that slows it down considerably.
> I tried to specify the exact kind of reference that is expected, since
> in several places it is possible to know that statically and that
> resulted in a halving of the time the Xtext builder needed!
> But of course the joy didn't last long, because in the general case of
> an expression, a symbolic value may be any of these references or a
> simple name and the parser can't discern which, because they all look
> alike. Of course, it has the info higher up in the stack, but this isn't
> a context-sensitive parser.
>
> Is there any trick that I can use, or is my initial solution the only
> one that works? Or maybe I just misunderstood a lot of things about how
> to use Xtext?
>
> Code example to understand the problem. A function call's target can be
> a general expression.
> In the case where the reference can be resolved statically, we want two
> cross-references, one to the "mod" and one to the "fun". mod.fun(1,mod,fun)
> Here, ?MOD is a macro that is known at compile time. Here we want two
> cross-references from it, one to the macro definition and in case the
> value is a symbol, to that module. ?MOD.fun(1,?MOD,fun)
> In this third case, Fun is a variable and we don't know its value. No
> cross-references. mod.Fun(1,mod,Fun)
> In the parameter lists, "mod" and "fun" are just names unrelated to the
> ones before. The macro and variable should reference their respective
> definitions.
>
> best regards,
> Vlad
>
>
Re: linking performance [message #977736 is a reply to message #977700] Fri, 09 November 2012 14:30 Go to previous messageGo to next message
Vlad Dumitrescu is currently offline Vlad Dumitrescu
Messages: 318
Registered: July 2009
Location: Gothenburg
Senior Member
Thank you Sebastian,

Geppetto used a separate ecore model and the Xtext grammar has no references defined. Would that help with the performance? I don't see any reason for the model generated by Xtext to be somehow "crippled".

Otherwise I'm basically doing the same thing as in geppetto. I will test a largish puppet project with and without linking, to see if the difference is about as large as for me.

regards,
Vlad
Re: linking performance [message #977752 is a reply to message #977736] Fri, 09 November 2012 14:44 Go to previous messageGo to next message
Henrik Lindberg is currently offline Henrik Lindberg
Messages: 2500
Registered: July 2009
Senior Member
Linking performance was an issue in Geppetto earlier.
I am now building an index on last part of exported qualified names, and
then only scan those. IIRC, that had a 10x effect on linking performance.

- henrik

On 2012-09-11 15:30, Vlad Dumitrescu wrote:
> Thank you Sebastian,
>
> Geppetto used a separate ecore model and the Xtext grammar has no
> references defined. Would that help with the performance? I don't see
> any reason for the model generated by Xtext to be somehow "crippled".
>
> Otherwise I'm basically doing the same thing as in geppetto. I will test
> a largish puppet project with and without linking, to see if the
> difference is about as large as for me.
>
> regards,
> Vlad
>
Re: linking performance [message #979398 is a reply to message #977752] Sat, 10 November 2012 22:15 Go to previous messageGo to next message
Vlad Dumitrescu is currently offline Vlad Dumitrescu
Messages: 318
Registered: July 2009
Location: Gothenburg
Senior Member
Thank you Henrik! I didn't got that part of the indexing magic, I will look closer.

regards,
Vlad

Henrik Lindberg wrote on Fri, 09 November 2012 15:44
Linking performance was an issue in Geppetto earlier.
I am now building an index on last part of exported qualified names, and
then only scan those. IIRC, that had a 10x effect on linking performance.

- henrik

On 2012-09-11 15:30, Vlad Dumitrescu wrote:
> Thank you Sebastian,
>
> Geppetto used a separate ecore model and the Xtext grammar has no
> references defined. Would that help with the performance? I don't see
> any reason for the model generated by Xtext to be somehow "crippled".
>
> Otherwise I'm basically doing the same thing as in geppetto. I will test
> a largish puppet project with and without linking, to see if the
> difference is about as large as for me.
>
> regards,
> Vlad
>

Re: linking performance [message #980963 is a reply to message #979398] Mon, 12 November 2012 03:51 Go to previous messageGo to next message
Henrik Lindberg is currently offline Henrik Lindberg
Messages: 2500
Registered: July 2009
Senior Member
There are a couple of complex things regarding the indexing; not in
concept, but because of the unorthodox ways things can be referenced in
the language (multiple named scopes floating around, and variables on
the "call path" are visible in older versions of the language).

The "finder" logic is the most horrible part in Geppetto, but before it
goes there, the actual indexing is straight forward. The idea is that
when linking, there is going to be many searches from the perspective of
one resource. While doing these lookups, if there is a need to lookup
something in the perspective of some other resource I don't bother to
index that search (this happens when A inherits B, and B in turn
inherits from some C that it sees).

I toyed with the idea to keep the indexes; basically one per possible
"perspective" (or container), but I never got that far; now I compute
them when first starting to link a resource.

The majority of my lookups are from some inner scope - i.e. starting
with a name like "a", then testing longer and longer names in outer
scopes. "A.a", "B.a", "A.A.a", etc. - i.e. lots of lookups among many
exported things. In my case it is very beneficial to lookup all that end
with the segment "a" first.

Hope that helps.
Regards
- henrik

On 2012-10-11 23:15, Vlad Dumitrescu wrote:
> Thank you Henrik! I didn't got that part of the indexing magic, I will
> look closer.
>
> regards,
> Vlad
>
> Henrik Lindberg wrote on Fri, 09 November 2012 15:44
>> Linking performance was an issue in Geppetto earlier.
>> I am now building an index on last part of exported qualified names,
>> and then only scan those. IIRC, that had a 10x effect on linking
>> performance.
>>
>> - henrik
>>
>> On 2012-09-11 15:30, Vlad Dumitrescu wrote:
>> > Thank you Sebastian,
>> >
>> > Geppetto used a separate ecore model and the Xtext grammar has no
>> > references defined. Would that help with the performance? I don't see
>> > any reason for the model generated by Xtext to be somehow "crippled".
>> >
>> > Otherwise I'm basically doing the same thing as in geppetto. I will
>> test
>> > a largish puppet project with and without linking, to see if the
>> > difference is about as large as for me.
>> >
>> > regards,
>> > Vlad
>> >
>
>
Re: linking performance [message #982697 is a reply to message #980963] Tue, 13 November 2012 10:47 Go to previous messageGo to next message
Vlad Dumitrescu is currently offline Vlad Dumitrescu
Messages: 318
Registered: July 2009
Location: Gothenburg
Senior Member
Thanks again Henrik, with your indications I read the code with different eyes and I had a couple of "aha!"-moments.

However, and unfortunately, this is not enough in my case. Geppetto has the same issues as I do with large enough and many enough files... I imported a project from the forge, copied and pasted a pp file's content until it got around 3000 lines long and then copied that file 200 times. Building that project crashes now when the memory isn't enough (I've given it 1500M), just like mine. Validating and linking each file takes around one minute, where a lot of the time is spent by the garbage collection.

I feel quite confident to say that the conclusion is that the current implementation of Xtext is not adapted to many and large files. The linking algorithm is (by nature?) O(n^2), it seems. The workaround that Sebastian suggested (just index the top-level constructs and navigate the model when needed to find things below) helps, but it's still a problem that all the work is done while locking the workspace (when doing a clean build) and users can't save files in the meanwhile.

I really hope that someone will tell me that I'm wrong or that it can be addressed without requiring a couple of man-years of effort.

best regards,
Vlad
Re: linking performance [message #982984 is a reply to message #982697] Tue, 13 November 2012 15:27 Go to previous messageGo to next message
Henrik Lindberg is currently offline Henrik Lindberg
Messages: 2500
Registered: July 2009
Senior Member
Vlad, thanks for measuring; good to know. (Fortunately, the real Puppet
projects are *much* smaller).

Don't have any additional input on the real issues here though. Sorry.

- henrik

On 2012-13-11 11:47, Vlad Dumitrescu wrote:
> Thanks again Henrik, with your indications I read the code with
> different eyes and I had a couple of "aha!"-moments.
>
> However, and unfortunately, this is not enough in my case. Geppetto has
> the same issues as I do with large enough and many enough files... I
> imported a project from the forge, copied and pasted a pp file's content
> until it got around 3000 lines long and then copied that file 200 times.
> Building that project crashes now when the memory isn't enough (I've
> given it 1500M), just like mine. Validating and linking each file takes
> around one minute, where a lot of the time is spent by the garbage
> collection.
>
> I feel quite confident to say that the conclusion is that the current
> implementation of Xtext is not adapted to many and large files. The
> linking algorithm is (by nature?) O(n^2), it seems. The workaround that
> Sebastian suggested (just index the top-level constructs and navigate
> the model when needed to find things below) helps, but it's still a
> problem that all the work is done while locking the workspace (when
> doing a clean build) and users can't save files in the meanwhile.
>
> I really hope that someone will tell me that I'm wrong or that it can be
> addressed without requiring a couple of man-years of effort.
> best regards,
> Vlad
>
Re: linking performance [message #983096 is a reply to message #982697] Tue, 13 November 2012 17:12 Go to previous messageGo to next message
Ed Merks is currently offline Ed Merks
Messages: 26063
Registered: July 2009
Senior Member
Vlad,

Have you identified where all the time is being spent? Without a
profiling tool to help you understand where to focus your efforts, you
will never solve problems such as this one.


On 13/11/2012 11:47 AM, Vlad Dumitrescu wrote:
> Thanks again Henrik, with your indications I read the code with
> different eyes and I had a couple of "aha!"-moments.
>
> However, and unfortunately, this is not enough in my case. Geppetto
> has the same issues as I do with large enough and many enough files...
> I imported a project from the forge, copied and pasted a pp file's
> content until it got around 3000 lines long and then copied that file
> 200 times. Building that project crashes now when the memory isn't
> enough (I've given it 1500M), just like mine. Validating and linking
> each file takes around one minute, where a lot of the time is spent by
> the garbage collection.
>
> I feel quite confident to say that the conclusion is that the current
> implementation of Xtext is not adapted to many and large files. The
> linking algorithm is (by nature?) O(n^2), it seems. The workaround
> that Sebastian suggested (just index the top-level constructs and
> navigate the model when needed to find things below) helps, but it's
> still a problem that all the work is done while locking the workspace
> (when doing a clean build) and users can't save files in the meanwhile.
>
> I really hope that someone will tell me that I'm wrong or that it can
> be addressed without requiring a couple of man-years of effort.
> best regards,
> Vlad
>
Re: linking performance [message #983422 is a reply to message #983096] Tue, 13 November 2012 23:07 Go to previous messageGo to next message
Vlad Dumitrescu is currently offline Vlad Dumitrescu
Messages: 318
Registered: July 2009
Location: Gothenburg
Senior Member
Ed Merks wrote on Tue, 13 November 2012 18:12

Vlad,
Have you identified where all the time is being spent? Without a
profiling tool to help you understand where to focus your efforts, you
will never solve problems such as this one.
>


Hi Ed,

Yes, I measured. The parts that are in my code are only answering for 50% of the time; it's possible to optimize still, but as I mentioned before, even removing all cross-references altogether is quite slow. I will profile even that case, if it is of interest for you.

The time goes as expected partly in my ResourceDescriptionStrategy.createEObjectDescriptions and partly in LinkingService.getReferencedObjects. In the latter case, it's XtextResource.updateInternalState and AbstractParser.parse that are the worst offenders (together 26% of time).

CPU hotspot #1 is EcoreUtil2.getContainerOfType.

Memory is used by the nodemodel nodes, also as expected.

best regards,
Vlad
Re: linking performance [message #983757 is a reply to message #983422] Wed, 14 November 2012 05:48 Go to previous messageGo to next message
Ed Merks is currently offline Ed Merks
Messages: 26063
Registered: July 2009
Senior Member
Vlad,

Comments below.

On 14/11/2012 12:07 AM, Vlad Dumitrescu wrote:
> Ed Merks wrote on Tue, 13 November 2012 18:12
>> Vlad,
>> Have you identified where all the time is being spent? Without a
>> profiling tool to help you understand where to focus your efforts,
>> you will never solve problems such as this one.
>> >
>
>
> Hi Ed,
>
> Yes, I measured. The parts that are in my code are only answering for
> 50% of the time;
Even when there slow parts are in other code, if your code is on the
call stack, you can often improve things by reducing the number of calls
to the slow code...
> it's possible to optimize still, but as I mentioned before, even
> removing all cross-references altogether is quite slow. I will profile
> even that case, if it is of interest for you.
Yes.
>
> The time goes as expected partly in my
> ResourceDescriptionStrategy.createEObjectDescriptions and partly in
> LinkingService.getReferencedObjects. In the latter case, it's
> XtextResource.updateInternalState and AbstractParser.parse that are
> the worst offenders (together 26% of time).
>
> CPU hotspot #1 is EcoreUtil2.getContainerOfType.
Looking at the code, it would be better written like this:

@SuppressWarnings("unchecked")
public static <T extends EObject> T getContainerOfType(EObject ele,
Class<T> type) {
for (EObject e = ele; e != null; e = e.eContainer()) {
if (type.isAssignableFrom(e.getClass()))
return (T) ele;
}
return null;
}

I.e., the original code

@SuppressWarnings("unchecked")
public static <T extends EObject> T getContainerOfType(EObject ele,
Class<T> type) {
if (type.isAssignableFrom(ele.getClass()))
return (T) ele;
if (ele.eContainer() != null)
return getContainerOfType(ele.eContainer(), type);
return null;
}

calls eContainer twice on the same object, and uses a recursive call. I
expect the alternative form to be at least twice as fast.
>
> Memory is used by the nodemodel nodes, also as expected.
>
> best regards,
> Vlad
>
Re: linking performance [message #983935 is a reply to message #983757] Wed, 14 November 2012 09:05 Go to previous messageGo to next message
Vlad Dumitrescu is currently offline Vlad Dumitrescu
Messages: 318
Registered: July 2009
Location: Gothenburg
Senior Member
Hi Ed,

I ran the profiler with a grammar without any cross-references. There is only one thing that feels weird: ClusteringBuilderState#doUpdate calls #writeNewResourcesDescriptions which in turn parses the content of the resources, but a little bit later, when going through the queue, this line
LoadResult loadResult = loadOperation.next();
calls the parser again, when loading the resource. This looks like a bit wasteful to me, but I suppose it's the price for not being forced to keep all the parsed data in memory. Since the workspace is locked during this operation, I wonder if it would help to save the model in a file and read it from there in the later step...

I mentioned before that the time taken by the Xtext build (even when reduced as much as possible), is a problem mostly because the workspace is locked while it runs. Couldn't it be possible to run completely in the background, accepting that there might be phantom problems reported, but these should disappear when the next batch of workspace changes is processed. The builder has already a queue where changes are pushed. Could it run as a job that just watches the queue, while the builder proper feeds it with delta changes and empties it when a clean build is started?

best regards,
Vlad



Re: linking performance [message #987594 is a reply to message #983757] Tue, 27 November 2012 09:50 Go to previous messageGo to next message
Knut Wannheden is currently offline Knut Wannheden
Messages: 296
Registered: July 2009
Senior Member
Hi Ed,

On 11/14/12 6:48 AM, Ed Merks wrote:
>> CPU hotspot #1 is EcoreUtil2.getContainerOfType.
> Looking at the code, it would be better written like this:
>
> @SuppressWarnings("unchecked")
> public static <T extends EObject> T getContainerOfType(EObject ele,
> Class<T> type) {
> for (EObject e = ele; e != null; e = e.eContainer()) {
> if (type.isAssignableFrom(e.getClass()))
> return (T) ele;
> }
> return null;
> }
>
> I.e., the original code
>
> @SuppressWarnings("unchecked")
> public static <T extends EObject> T getContainerOfType(EObject ele,
> Class<T> type) {
> if (type.isAssignableFrom(ele.getClass()))
> return (T) ele;
> if (ele.eContainer() != null)
> return getContainerOfType(ele.eContainer(), type);
> return null;
> }
>
> calls eContainer twice on the same object, and uses a recursive call. I
> expect the alternative form to be at least twice as fast.
>>

Good point. I've adapted the method as you suggested.

Regards,

--knut
Re: linking performance [message #987626 is a reply to message #987594] Tue, 27 November 2012 11:05 Go to previous message
Sebastian Zarnekow is currently offline Sebastian Zarnekow
Messages: 2832
Registered: July 2009
Senior Member
Hi Knut,

thanks for applying the suggested change. Unfortunately it turned out
that the improvements are not too big since the JVM is quite good at
optimizing and JIT'ing. Nevertheless, the rewritten from is easier to
debug :-)

http://zarnekow.blogspot.de/2012/11/performance-is-not-obvious.html

Regards,
Sebastian
--
Looking for professional support for Xtext, Xtend or Eclipse Modeling?
Go visit: http://xtext.itemis.com

Am 27.11.12 10:50, schrieb Knut Wannheden:
> Hi Ed,
>
> On 11/14/12 6:48 AM, Ed Merks wrote:
>>> CPU hotspot #1 is EcoreUtil2.getContainerOfType.
>> Looking at the code, it would be better written like this:
>>
>> @SuppressWarnings("unchecked")
>> public static <T extends EObject> T getContainerOfType(EObject ele,
>> Class<T> type) {
>> for (EObject e = ele; e != null; e = e.eContainer()) {
>> if (type.isAssignableFrom(e.getClass()))
>> return (T) ele;
>> }
>> return null;
>> }
>>
>> I.e., the original code
>>
>> @SuppressWarnings("unchecked")
>> public static <T extends EObject> T getContainerOfType(EObject ele,
>> Class<T> type) {
>> if (type.isAssignableFrom(ele.getClass()))
>> return (T) ele;
>> if (ele.eContainer() != null)
>> return getContainerOfType(ele.eContainer(), type);
>> return null;
>> }
>>
>> calls eContainer twice on the same object, and uses a recursive call. I
>> expect the alternative form to be at least twice as fast.
>>>
>
> Good point. I've adapted the method as you suggested.
>
> Regards,
>
> --knut
Previous Topic:Strategies for using embedded editor
Next Topic:Introduce factory created EObjects
Goto Forum:
  


Current Time: Tue Sep 23 06:24:07 GMT 2014

Powered by FUDForum. Page generated in 0.06850 seconds
.:: Contact :: Home ::.

Powered by: FUDforum 3.0.2.
Copyright ©2001-2010 FUDforum Bulletin Board Software