[EMF] Retrieving URI fragments (Regarding performance of: resource.getURIFragment(eObject);)
[EMF] Retrieving URI fragments [message #1287214]
Mon, 07 April 2014 14:39
Konstantinos Barmpis (Junior Member; Messages: 12; Registered: July 2013)
I noticed that calling resource.getURIFragment(eObject) on a resource whose EObjects have no explicit URI fragments (so the fragments are generated on the fly) takes an extremely long time, compared with calling it on a resource whose fragments were already stored in the persisted XMI.
I was wondering whether this is due to the lazy generation of such fragments and the resulting repeated traversal of the in-memory resource tree needed to calculate them.
If so, is there a configuration option to resolve them automatically when the resource is loaded, so that they can be retrieved efficiently while iterating over it?
If not, is there any other way to speed up retrieving the URI fragments of all the EObjects in a resource (or are we stuck with very poor performance whenever the fragments are generated rather than provided up front)?
Thank you in advance for your time and help provided,
Yours faithfully,
Kostas
[Updated on: Mon, 07 April 2014 14:40]

Re: Retrieving URI fragments [message #1287235 is a reply to message #1287214]
Mon, 07 April 2014 15:02
Ed Merks (Senior Member; Messages: 33140; Registered: July 2009)
Konstantinos,
Comments below.
On 07/04/2014 4:39 PM, Konstantinos Barmpis wrote:
> I noticed that calling resource.getURIFragment(eObject); to a resource
> whose eObjects do not have specific URI fragments set (and are hence
> generated on the fly) takes an extremely long time to run, as opposed
> to calling it to a resource with URI fragments which have already been
> set in the persisted XMI beforehand.
Extremely long in some relative sense, you mean. It's widely used and
has never shown up as a performance hotspot in any application
I've ever measured... And of course, while a map lookup is one of the
fastest "complex" things you can do in Java, the job of maintaining the
map is not free; that cost is distributed in other places you'd notice
only if you profiled the whole application... And of course there's the
consideration of memory footprint as well; maps are fast at the cost of
using a great deal of memory (compared to the zero memory used by
fragment paths)...
>
> I was wondering whether this is due to the lazy-loading of such fragments
You need only look at the implementation to see the differences.
> and the consequent repetitive iteration of the in-memory resource
> tree, to calculate them.
It's not clear specifically which scenario you're referring to. Fragment
computation only involves a traversal from the referenced object up
to the root, so given a tree of size n, one generally expects the depth
of the tree to be O(log n); so yes, fragment computation is O(log n)
while a mapped lookup is O(1).
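To make that cost model concrete, here is a minimal stdlib-only sketch, not EMF code: PathNode and FragmentWalk.fragment are hypothetical stand-ins for an EObject and for resource.getURIFragment(eObject). It builds a path of the form "/" plus a root segment, followed by one "/@feature.index" step per containment level, by walking from the object up to its root, so each call costs O(depth). The rule that the root segment is empty for a single root and otherwise the root's index in the contents list is an assumption meant to resemble EMF's default.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a contained EObject; not an EMF type. */
final class PathNode {
    final PathNode container;   // null for root objects
    final String feature;       // name of the containing feature (unused for roots)
    final int index;            // position in a many-valued feature, or -1 if single-valued

    PathNode(PathNode container, String feature, int index) {
        this.container = container;
        this.feature = feature;
        this.index = index;
    }

    /** This object's step within its container, e.g. "@items.2" or "@body". */
    String segment() {
        return "@" + feature + (index >= 0 ? "." + index : "");
    }
}

final class FragmentWalk {
    /** Per-object computation: walks from the object up to its root, so O(depth) per call. */
    static String fragment(List<PathNode> roots, PathNode node) {
        Deque<String> steps = new ArrayDeque<>();
        PathNode current = node;
        while (current.container != null) {
            steps.push(current.segment());   // pushed bottom-up, so iteration yields top-down order
            current = current.container;
        }
        // Assumed default root segment: empty for a single root, otherwise the root's position
        // in the contents list (indexOf is itself a linear scan over the roots).
        String rootSegment = roots.size() > 1 ? Integer.toString(roots.indexOf(current)) : "";
        StringBuilder path = new StringBuilder("/").append(rootSegment);
        for (String step : steps) {
            path.append('/').append(step);
        }
        return path.toString();
    }
}
```

For example, with a single root, an object reached via children[0], then the single-valued feature body, then items[2] gets the fragment //@children.0/@body/@items.2; every call repeats that upward walk.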
> If so, is there a configuration option to auto-resolve these on
> resource loading so that when iterating through it they can be
> efficiently retrieved?
Now you're talking about loading, but you mentioned getURIFragment, which
happens during serialization.
> If not, is there any other way to speed up the retrieval of the URI
> fragments of all the eObjects in a resource at all (or are we limited
> to very bad performance in cases where the URI fragments are generated
> and not provided a priori)?
Are you asking the question backwards? Are you really asking about the
resolution of URI fragments? And in fact, is a URI fragment even
involved, or are you looking up an ID without a map? For the latter case
(intrinsic ID lookup), it's very helpful to use
org.eclipse.emf.ecore.resource.impl.ResourceImpl.setIntrinsicIDToEObjectMap(Map<String, EObject>)
in your resource factory to assign a map. You should also consider using
org.eclipse.emf.ecore.xmi.XMLResource.OPTION_DEFER_IDREF_RESOLUTION in
combination with that approach, so that all ID resolution happens after
the resource is fully loaded. In that case, there will be a single
traversal of the resource, which populates the assigned intrinsic ID
map, and subsequent lookups will find things in that map, resulting in an
amortized lookup cost of O(1).
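The intrinsic ID map idea can be modelled without EMF. In this stdlib-only sketch, IdObject, IdResource, useIdMap() and the traversals counter are hypothetical names: without a map, every lookup scans the tree; with a map, the first miss triggers one full traversal that records every ID it sees, and later lookups are plain hash-map hits. It illustrates the amortized O(1) argument only and is not EMF's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model object carrying an intrinsic ID (may be null). */
final class IdObject {
    final String id;
    final List<IdObject> children = new ArrayList<>();

    IdObject(String id) {
        this.id = id;
    }
}

/** Hypothetical resource; not EMF's ResourceImpl. */
final class IdResource {
    final List<IdObject> contents = new ArrayList<>();
    int traversals;                       // number of tree scans performed, for illustration
    private Map<String, IdObject> idMap;  // plays the role of an assigned intrinsic ID map; null means none

    /** Analogue of assigning a map, as setIntrinsicIDToEObjectMap does for a real resource. */
    void useIdMap() {
        idMap = new HashMap<>();
    }

    IdObject getById(String id) {
        if (idMap != null) {
            IdObject cached = idMap.get(id);
            if (cached != null) {
                return cached;                        // amortized O(1) path
            }
        }
        traversals++;
        IdObject found = null;
        List<IdObject> stack = new ArrayList<>(contents);
        while (!stack.isEmpty()) {
            IdObject current = stack.remove(stack.size() - 1);
            if (idMap != null && current.id != null) {
                idMap.putIfAbsent(current.id, current);  // record every ID seen during the scan
            }
            if (found == null && id.equals(current.id)) {
                found = current;
                if (idMap == null) {
                    break;                            // no map to fill, so stop at the first match
                }
            }
            stack.addAll(current.children);
        }
        return found;
    }
}
```

With 100 IDs, the map-less resource performs 100 traversals for 100 lookups and the map-backed one performs a single traversal. In EMF itself the corresponding steps would be calling setIntrinsicIDToEObjectMap(new HashMap<String, EObject>()) on the resource your factory creates and putting XMLResource.OPTION_DEFER_IDREF_RESOLUTION with Boolean.TRUE into the load options.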
> Thank you in advance for your time and help provided,
>
> Yours faithfully,
>
> Kostas
Ed Merks
Professional Support: https://www.macromodeling.com/

Re: Retrieving URI fragments [message #1287876 is a reply to message #1287346]
Tue, 08 April 2014 04:57
Ed Merks (Senior Member; Messages: 33140; Registered: July 2009)
Kostas,
Comments below.
On 07/04/2014 7:44 PM, Konstantinos Barmpis wrote:
> In the first case I assume
Why assume? You have the source code and can see how it's implemented.
> the generation of the (URI fragment) IDs is done dynamically, when
> requested by resource.getURIFragment(eObject)
> as the eObjects do not have any ID in the XMI.
>
> In the second case I assume the generation of the IDs is done when
> persisting the model to XMI
No, typically it's done as the model objects are attached to the
resource (and of course during loading, it's done by loading what's in
the XMI).
> (as they are specifically set by the creator - or some
> parameter/configuration option passed to the Resource). If such
> creation is automated by EMF by passing such a parameter I would
> appreciate if I can be informed about it so that I can generate the
> XMI using it instead. If this is not possible then we are back at
> square 1 in my eyes.
There are org.eclipse.emf.ecore.xmi.impl.XMLResourceImpl.useUUIDs() and
org.eclipse.emf.ecore.xmi.impl.XMLResourceImpl.assignIDsWhileLoading(),
which you can use to control whether extrinsic IDs are generated
automatically (or even to generate them when loading a resource that
doesn't already contain them).
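A rough stdlib-only model of how the two hooks interact, based on the explanation above that IDs are assigned as objects are attached to the resource. UuidResource, UuidEnabledResource, attach() and the loading flag are hypothetical; only the hook names come from XMLResourceImpl. The defaults (false and true) follow what is reported later in this thread, and java.util.UUID stands in for EMF's own generator, EcoreUtil.generateUUID().

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical resource modelling extrinsic ID assignment; not EMF's XMLResourceImpl. */
class UuidResource {
    final Map<Object, String> ids = new IdentityHashMap<>();  // extrinsic IDs, keyed by object identity
    boolean loading;                                           // true while a load is in progress

    /** Modelled on XMLResourceImpl.useUUIDs(); false by default. */
    protected boolean useUUIDs() {
        return false;
    }

    /** Modelled on XMLResourceImpl.assignIDsWhileLoading(); true by default. */
    protected boolean assignIDsWhileLoading() {
        return true;
    }

    /** Called whenever an object becomes contained in this resource. */
    void attach(Object eObject) {
        boolean allowedNow = !loading || assignIDsWhileLoading();
        if (useUUIDs() && allowedNow && !ids.containsKey(eObject)) {
            ids.put(eObject, UUID.randomUUID().toString());  // keep an existing ID, otherwise generate one
        }
    }
}

/** Opting in to generated IDs by overriding the hook. */
class UuidEnabledResource extends UuidResource {
    @Override
    protected boolean useUUIDs() {
        return true;
    }
}
```

In EMF the equivalent opt-in would be a subclass of XMIResourceImpl that overrides useUUIDs() to return true, returned from your resource factory's createResource(URI); the generated IDs should then be persisted as xmi:id attributes on save.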
>
> To clarify, the two models used in these examples are from different
> sources but both EMF-based XMI.
And the relative sizes of the files?
> In short what I'm trying to understand/improve is how the method call to
> resource.getURIFragment(eObject)
> is taking so much longer (per eObject call) in one case.
But you're doing so by making assumptions about the implementation and
without making measurements. That probably won't get you very far...
>
> I would be more than happy to provide this model (for the second
> (slow) use-case) if it would help (even though it is 300MB in size, I
> can point to the online resource used to generate it).
Which version of EMF are you using?
>
> Again thank you for your help,
>
> Yours sincerely,
>
> Kostas
Ed Merks

Re: Retrieving URI fragments [message #1288172 is a reply to message #1287876]
Tue, 08 April 2014 10:37
Konstantinos Barmpis (Junior Member; Messages: 12; Registered: July 2013)
>>org.eclipse.emf.ecore.xmi.impl.XMLResourceImpl.assignIDsWhileLoading()
Something like this is probably what's relevant for ensuring that the IDs are generated on load (assuming this is the actual problem). Both methods are protected in the XMLResourceImpl class and already preset to True and False respectively (which would suggest that IDs should have been assigned in the first place for any model loaded by the XML resource class, and consequently by the XMI resource loader).
>>And the relative sizes of the files?
Both files are around 300MB, but I am only measuring the time it takes to traverse 50,000 elements in each case (after the resource is in memory, with more than 10GB of memory available).
>>But you're doing so by making assumptions about the implementation and without making measurements.
I believe I have taken relevant measurements in each case; I just didn't think it would be useful to present a much longer discussion (or prepare further tests) for what I had originally (and incorrectly) assumed to be a simple question. I will be carrying out a minimal case study on this shortly.
>>version
EMF - Eclipse Modeling Framework SDK 2.7.2.v20120130-0943
Again I thank you for your time and effort,
Kostas
EDIT:
After debugging the source code of EMF's ResourceImpl and XMLResourceImpl, I can pinpoint the exact calls that take longer to execute.
If the XMI document has an ID for each EObject, XMLResourceImpl does:
  String id = getID(eObject);
  if (id != null)
  {
    return id;
  }
so it returns "immediately", and iterating over all 50,000 elements takes about 500ms.
If the XMI has no IDs, the same method delegates to super.getURIFragment(eObject), which executes the following in ResourceImpl:
  if (internalEObject.eDirectResource() == this || unloadingContents != null && unloadingContents.contains(internalEObject))
  {
    return "/" + getURIFragmentRootSegment(eObject);
  }
This calls getURIFragmentRootSegment(eObject), which accounts for most of the time needed to iterate over the 50,000 elements, because it requires a call to:
  Integer.toString(contents.indexOf(eObject))
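The indexOf call in that last step is a linear scan of the resource's root list. Assuming the slow model has many root objects (a hypothesis, not a confirmed diagnosis), each root-segment computation costs O(r) for r roots and all of them together O(r^2). The stdlib-only sketch below (RootSegmentCost and Counted are hypothetical names) counts the equals() comparisons that indexOf performs, and contrasts that with recording every root's position in a single pass.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class RootSegmentCost {
    /** Hypothetical root object whose equals() counts calls, making indexOf's cost visible. */
    static final class Counted {
        static long comparisons;

        @Override
        public boolean equals(Object other) {
            comparisons++;
            return this == other;
        }

        @Override
        public int hashCode() {
            return System.identityHashCode(this);
        }
    }

    /** Mirrors Integer.toString(contents.indexOf(eObject)) once per root: O(r) each, O(r^2) in total. */
    static long withIndexOf(List<Counted> roots) {
        Counted.comparisons = 0;
        for (Counted root : roots) {
            Integer.toString(roots.indexOf(root));  // result discarded; only the cost matters here
        }
        return Counted.comparisons;
    }

    /** Records every root's position in one pass, so later lookups are O(1) and need no equals(). */
    static Map<Counted, String> precomputed(List<Counted> roots) {
        Map<Counted, String> segments = new IdentityHashMap<>();
        for (int i = 0; i < roots.size(); i++) {
            segments.put(roots.get(i), Integer.toString(i));
        }
        return segments;
    }
}
```

For 1,000 roots in an ArrayList, the per-object approach performs 500,500 comparisons (1 + 2 + ... + 1,000), while the precomputed map needs none.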
------------
My question would be:
Since the ID is not set in the second case, is there really no way to improve this process? (My initial thought is that it cannot be improved, since fragment generation is, as you mentioned in one of your earlier posts, done as efficiently as possible and only once for each element.)
So I think the only way to improve this is during XMI creation, by encouraging whoever creates the XMI to assign IDs to the elements.
Would you agree with my thought process? If so, thank you again for your time and for helping me figure out what was happening behind the scenes.
If not, I'm open to any further suggestions you may have.
Again, truly thank you for your time,
Kostas
[Updated on: Tue, 08 April 2014 12:47]

Re: Retrieving URI fragments [message #1288406 is a reply to message #1288386]
Tue, 08 April 2014 14:35
Konstantinos Barmpis (Junior Member; Messages: 12; Registered: July 2013)
protected boolean useUUIDs()
{
  return false;
}
Yes, that is False, and assignIDsWhileLoading() is True, which is what I meant by "True and False respectively" (apologies for the vague statement). This is why my XMI file with IDs loads almost instantly: it effectively bypasses all the code that traverses the resource to compute the fragment, since each object already has an ID.
>>A profiler... how that cost is broken down.
I have manually inserted System.nanoTime() statements in all the methods I have mentioned, both in my code and in EMF's code, to measure this in a "quick and dirty" yet accurate manner.
>>What's the reason you're needing to do this in the first place?
I am indexing the model in a database, so I need to store these fragments in order to have a direct link from each index entry to the element in the XMI file.
>>I expect/hope that 2.9 performs better
I will manually update to the newer version (the Eclipse updater doesn't seem to want to do that automatically) and try it out.
Thanks again!
Kostas
EDIT:
Using EMF 2.9.2 gives me similar results.
As an interesting observation, though, loading the 300MB resource takes ~52 seconds in 2.9, while it took ~22 seconds in 2.7 (mean over 10 runs, restarting the JVM each time).
[Updated on: Tue, 08 April 2014 15:40]

Re: Retrieving URI fragments [message #1288489 is a reply to message #1288406]
Tue, 08 April 2014 16:01
Ed Merks (Senior Member; Messages: 33140; Registered: July 2009)
Kostas,
Comments below.
On 08/04/2014 4:35 PM, Konstantinos Barmpis wrote:
> protected boolean useUUIDs()
> {
> return false;
> }
>
> yes that is False, assignIDsWhileLoading() is True, which is what I
> meant by "True and False respectively" (apologies for the vague
> statement). This is why my XMI file with ids loads almost instantly
> (as it effectively circumvents all the code traversing the resource to
> find the fragment (as it has an id already)).
But not because one is assigned... And you're back to talking about
loading again...
>
>>> A profiler... how that cost is broken down.
> I have manually inserted System.nanotime() statements in all the
> methods I have mentioned, in my code and EMF's code to do this in a
> "quick and dirty" yet accurate manner.
That generally doesn't work. The Javadoc for System.nanoTime() says
* <p>This method provides nanosecond precision, but not necessarily
* nanosecond resolution (that is, how frequently the value changes)
* - no guarantees are made except that the resolution is at least as
* good as that of {@link #currentTimeMillis()}.
So while it's precise, it's not necessarily accurate.
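One way to see this on a particular machine is to probe how often System.nanoTime() actually changes. In this sketch, NanoTimeProbe and its two methods are hypothetical helpers: the first reports the smallest positive step observed between readings, and the second times a whole batch of calls once and divides, which spreads the timer's granularity over many calls instead of timing each call individually. Differences are compared as now - previous > 0, as the Javadoc recommends, to stay safe against numerical overflow.

```java
final class NanoTimeProbe {
    /** Smallest positive step observed between successive System.nanoTime() readings. */
    static long observedResolutionNanos(int samples) {
        long smallest = Long.MAX_VALUE;
        long previous = System.nanoTime();
        int seen = 0;
        while (seen < samples) {
            long now = System.nanoTime();
            long step = now - previous;   // difference form tolerates overflow of the raw values
            if (step > 0) {               // only count readings where the clock advanced
                smallest = Math.min(smallest, step);
                previous = now;
                seen++;
            }
        }
        return smallest;
    }

    /** Times a whole batch once and divides, rather than timing each call on its own. */
    static double meanNanosPerCall(Runnable call, int repetitions) {
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            call.run();
        }
        return (System.nanoTime() - start) / (double) repetitions;
    }
}
```

If the observed step turns out to be large relative to the operation being measured (a single map lookup, say), per-call timings say little, while batch timings or a profiler give a more trustworthy picture.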
>
>>> What's the reason you're needing to do this in the first place?
> I am indexing the model in a database so I need to store these in
> order to have a direct element link between the index and the XMI file.
It sounds like you'd be better off overriding useUUIDs() to return true.
>
>>> I expect/hope that 2.9 performs better
> I will manually update to the newer version (eclipse updater doesn't
> seem to want to do that automatically) and try it out.
If you want an index of fragment paths, you'd be better off relying on
the mechanism by which they are computed, i.e., do a recursive traversal
of the containment hierarchy and rely on the fact that a child's fragment
path is its container's fragment path with an @<feature-name>{.<index>}
step appended. This way you can compose all fragments quickly with a
single traversal of the tree, without recomputing the path to the root
from each node in the tree. I think your current approach is O(n^2 log n),
while a more direct approach could be O(n).
Keep in mind that even for a resource that supports IDs, fragment paths
will still be interpreted correctly.
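A minimal sketch of that single traversal, using hypothetical stdlib-only types (TreeNode with named containment features, and FragmentIndexer) rather than EMF's API: each child's fragment is its container's fragment plus one /@feature step, with .index added for many-valued features, so every object is visited exactly once. The root-segment rule (empty for a single root, otherwise the root's index) is assumed to resemble EMF's default.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an EObject with named containment features; not an EMF type. */
final class TreeNode {
    final String name;
    final Map<String, List<TreeNode>> many = new LinkedHashMap<>();  // many-valued containment features
    final Map<String, TreeNode> single = new LinkedHashMap<>();      // single-valued containment features

    TreeNode(String name) {
        this.name = name;
    }

    TreeNode addTo(String feature, TreeNode child) {
        many.computeIfAbsent(feature, f -> new ArrayList<>()).add(child);
        return child;
    }

    TreeNode set(String feature, TreeNode child) {
        single.put(feature, child);
        return child;
    }
}

final class FragmentIndexer {
    /** Computes every object's fragment in one depth-first pass; each object is visited once. */
    static Map<TreeNode, String> index(List<TreeNode> roots) {
        Map<TreeNode, String> fragments = new LinkedHashMap<>();
        for (int i = 0; i < roots.size(); i++) {
            String rootSegment = roots.size() > 1 ? Integer.toString(i) : "";
            visit(roots.get(i), "/" + rootSegment, fragments);
        }
        return fragments;
    }

    private static void visit(TreeNode node, String fragment, Map<TreeNode, String> out) {
        out.put(node, fragment);
        for (Map.Entry<String, TreeNode> e : node.single.entrySet()) {
            visit(e.getValue(), fragment + "/@" + e.getKey(), out);                 // no index for single-valued
        }
        for (Map.Entry<String, List<TreeNode>> e : node.many.entrySet()) {
            List<TreeNode> children = e.getValue();
            for (int i = 0; i < children.size(); i++) {
                visit(children.get(i), fragment + "/@" + e.getKey() + "." + i, out);  // ".index" for many-valued
            }
        }
    }
}
```

Against a real resource, the same walk could iterate eContents() and ask each container for the step via InternalEObject.eURIFragmentSegment(child.eContainingFeature(), child), so that models which customize their fragment segments are still handled; treat that as an untested suggestion. For very deep trees, an explicit stack would avoid a StackOverflowError from the recursion.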
But then, I'm not sure I understand the point of the index...
> Thanks again!
>
> Kostas
Ed Merks