Serialisation of EMF-Model to XML file takes several hours [message #1691498] Tue, 07 April 2015 16:18
Thomas Zwickl
I have a very big EMF model (>1G on heap), and serialising it to an XML file takes several hours. I have no idea whether I'm doing something wrong that causes the delay or whether it is normal for this to take so long. The model contains a lot of lists, but otherwise it is just a large number of objects: graph nodes, each with a very long UUID and a few parameters, mostly integers plus some string values such as names.

Here is an excerpt of the serialisation routine for my EMF model:

// Register the XMI resource factory
Resource.Factory.Registry reg = Resource.Factory.Registry.INSTANCE;
reg.getExtensionToFactoryMap().put(uri.fileExtension(), new XMIResourceFactoryImpl());

// Obtain a new resource set
ResourceSet resSet = new ResourceSetImpl();

// create a resource
Resource resource = resSet.createResource(uri);

// get resource content
EList<EObject> resourceContent = resource.getContents();

resourceContent.add(objectsToAdd);

// save to file
resource.save(ResourceAdder.createOptions());



This is what my options look like:

public static Map<?, ?> createOptions() {
  HashMap<String, Object> options = new HashMap<String, Object>();
  options.put(XMLResource.OPTION_ENCODING, "UTF-8"); //$NON-NLS-1$
  options.put(XMLResource.OPTION_CONFIGURATION_CACHE, Boolean.TRUE);
  options.put(Resource.OPTION_SAVE_ONLY_IF_CHANGED, Resource.OPTION_SAVE_ONLY_IF_CHANGED_MEMORY_BUFFER);
  return options;
}


So my question is: is it normal for serialising a large EMF model to take this long? What could I do to reduce the serialisation time? I have considered using Teneo to store the entire EMF model in a local Derby database, but I haven't yet tested whether that would improve the runtime. Thanks for any pointers or suggestions you can provide.

[Updated on: Tue, 07 April 2015 16:19]

Re: Serialisation of EMF-Model to XML file takes several hours [message #1691545 is a reply to message #1691498] Wed, 08 April 2015 04:43
Ed Merks
Thomas,

Comments below.


On 07/04/2015 6:18 PM, Thomas Zwickl wrote:
> I have a very big EMF model (>1G on heap), and serialising it to an XML
> file takes several hours. I have no idea whether I'm doing something
> wrong that causes the delay or whether it is normal for this to take
> so long.
How big is the file? I've never seen an example that takes so long.
Most options are focused on making loading faster, which generally takes
several times longer than serializing. Perhaps you just need a bigger
heap; sometimes a JVM can just thrash when it's close to running out of
heap space... Otherwise, it's best to measure with a profiler; my suspicion
is that something hand coded in the model is doing expensive
computation, but otherwise I have no idea. Serialization performance
has not generally been a concern...
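
A minimal sketch of a first measurement that can be taken before reaching for a full profiler: time the save call and log heap usage around it. The class name `SaveTimer` and its methods are hypothetical helpers, not part of EMF, and the workload in `main` is a placeholder standing in for the `resource.save(...)` call from the original excerpt.

```java
/** Hypothetical helper: times a task and reports heap usage around it. */
final class SaveTimer {

    /** Heap currently in use, in megabytes. */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap MB: " + rt.maxMemory() / (1024 * 1024));
        System.out.println("used before: " + usedHeapMb());
        // Placeholder workload (an assumption); the real application would
        // call resource.save(options) here instead.
        long ms = timeMillis(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000_000; i++) {
                sb.append(i);
            }
        });
        System.out.println("elapsed ms:  " + ms);
        System.out.println("used after:  " + usedHeapMb());
    }
}
```

If the used heap sits close to the maximum while the save is running, the thrashing explanation becomes more plausible. Since `resource.save` throws `IOException`, the real call would need a try/catch inside the lambda.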
Re: Serialisation of EMF-Model to XML file takes several hours [message #1691637 is a reply to message #1691545] Wed, 08 April 2015 16:30
Thomas Zwickl
Hi Merks,

Thanks for the quick reply. Today I started an analysis with VisualVM to monitor the heap size and the garbage collector activity.
This was only a small graph, but it still took several minutes to serialise. The size of all the XML files together is 250 MB.

http://s1.postimg.org/eaaa4tdxr/Heap_Analysis.png

I have set the maximum heap size of my JVM to 4G ...
What do you mean by code in the model that is doing complex computations? We have all the code generated automatically and never add any logic to the generated source files. All our logic is in a different bundle.

Best regards,
Thomas
Re: Serialisation of EMF-Model to XML file takes several hours [message #1691685 is a reply to message #1691637] Thu, 09 April 2015 04:37
Ed Merks
Thomas,

Comments below.

On 08/04/2015 6:30 PM, Thomas Zwickl wrote:
> Hi Merks,
>
> thanks for the quick reply. I have started an analysis today with
> VisualVM to monitor the heap size and the garbage collector activity.
> This was only a small graph but still took several minutes to
> serialise. The size of all XML-Files together is 250 MB.
So there are multiple resources involved? 250MB doesn't sound small to me.
>
>
>
> I have set the maximum heap size of my JVM to 4G ...
> What do you mean by code in the model that is doing complex
> computations? We have all the code generated automatically and never
> add any logic to the generated source files.
You say that as if it should be obvious. But more often than not,
problems are caused by what people have hand coded...
> All our logic is in a different bundle.
You can't address performance problems without a profiler. It's just
impossible. And I can't guess at what might be going on from what
you've told me. Your subject line suggests a single file is involved,
but in this post you mention several. You say it's a small graph, and
takes several minutes, but 250MB is not something I would consider
small, and I have no idea what would be big to you. But if small
produces 250MB in a few minutes, then you must be talking about very
many GB in the course of hours of serialization. It's hard to imagine
that much data as XML or how long it would take to deserialize; I
mentioned that deserialization is generally significantly slower
(several times slower) than serialization...
>
> Best regards,
> Thomas
Re: Serialisation of EMF-Model to XML file takes several hours [message #1691754 is a reply to message #1691685] Thu, 09 April 2015 13:44
Thomas Zwickl
Hi Merks,

Quote:
So there are multiple resources involved? 250MB doesn't sound small to me.

Yes, I logically split the model into several XML files because we have implemented a kind of lazy loading, where we load only the XML file we currently need and unload it when we no longer need it.
But I have also tried putting everything into a single file and got almost the same results. The XML file was a bit smaller, around 180MB, which isn't surprising because there are fewer references in the XML file, but it took the same amount of time to serialise ...

Quote:
You say that as if it should be obvious. But more often than not,
problems are caused by what people have hand coded...

Ohh ok, I wasn't aware of that :D I always wondered why anyone would want to change code that is generated automatically ;)

Quote:
You say it's a small graph, and
takes several minutes, but 250MB is not something I would consider
small, and I have no idea what would be big to you. But if small
produces 250MB in a few minutes, then you must be talking about very
many GB in the course of hours of serialization.

Yes, we have UML graphs of different sizes, and this one is one of the smallest. The biggest one takes around 1.5G on heap, but I have never managed to serialise it because it seems, as strange as it sounds, that the time needed for serialisation grows exponentially with the size. I would assume that if it takes around 8 mins to serialise 250MB, it should logically take around 38 mins to serialise 1.5G, but this isn't the case ... I waited for at least 2 hours before I cancelled it, and it still wasn't finished ...

Quote:
It's hard to imagine
that much data as XML or how long it would take to deserialize; I
mentioned that deserialization is generally significantly slower
(several times slower) than serialization...

And here comes the part I find strangest: I assumed the same thing, that if it takes 8 mins to serialise the graph it would take at least 10 mins to deserialise it, but this isn't the case :D Deserialisation of the entire 250MB file takes only around 20 secs ...

Regards,
Thomas

[Updated on: Thu, 09 April 2015 13:59]

Re: Serialisation of EMF-Model to XML file takes several hours [message #1691778 is a reply to message #1691754] Thu, 09 April 2015 15:11
Ed Merks
Thomas,

Comments below.


On 09/04/2015 3:44 PM, Thomas Zwickl wrote:
> Hi Merks,
>
> Quote:
>> So there are multiple resources involved? 250MB doesn't sound small
>> to me.
>
> Yes, I logically split the model into several XML files because we
> have implemented a kind of lazy loading, where we load only the XML
> file we currently need and unload it when we no longer need it. But I
> have also tried putting everything into a single file and got almost
> the same results. The XML file was a bit smaller, around 180MB, which
> isn't surprising because there are fewer references in the XML file,
> but it took the same amount of time to serialise ...
I see.
>
> Quote:
>> You say that as if it should be obvious. But more often than not,
>> problems are caused by what people have hand coded...
>
> Ohh ok, I wasn't aware of that :D I always wondered why anyone would
> want to change code that is generated automatically ;)
:-)
>
> Quote:
>> You say it's a small graph, and takes several minutes, but 250MB is
>> not something I would consider small, and I have no idea what would
>> be big to you. But if small produces 250MB in a few minutes, then you
>> must be talking about very many GB in the course of hours of
>> serialization.
>
> Yes, we have UML graphs of different sizes, and this one is one of the
> smallest. The biggest one takes around 1.5G on heap, but I have never
> managed to serialise it because it seems, as strange as it sounds,
> that the time needed for serialisation grows exponentially with the
> size.
That's hard to imagine...
> I would assume that if it takes around 8 mins to serialise 250MB, it
> should logically take around 38 mins to serialise 1.5G, but this isn't
> the case ...
Yes, that would seem a reasonable assumption. I'd expect it to grow
linearly. Tree depth would affect the computation of fragment paths
(for references), though that's generally O(log n), and my impression
is that you use IDs.
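
One way to check the growth claim with numbers is to save models of two or more known sizes and estimate the exponent k in time ~ size^k. The sketch below uses hypothetical measurements (they are not figures from this thread), and `ScalingEstimate` and its methods are illustrative names. An exponent near 1 is the linear behaviour expected here, an exponent near 2 points to something quadratic, and truly exponential growth would show up as an exponent that keeps rising as the sizes increase.

```java
/** Hypothetical helper: estimates how run time scales with input size. */
final class ScalingEstimate {

    /** Exponent k in time ~ size^k, derived from two (size, time) measurements. */
    static double exponent(double size1, double time1, double size2, double time2) {
        return Math.log(time2 / time1) / Math.log(size2 / size1);
    }

    /** Time predicted at targetSize if time ~ size^k holds from a known point. */
    static double predict(double size, double time, double targetSize, double k) {
        return time * Math.pow(targetSize / size, k);
    }

    public static void main(String[] args) {
        // Hypothetical measurements: (element count, seconds to save).
        double k = exponent(100_000, 30, 200_000, 120);
        System.out.printf("estimated exponent: %.2f%n", k);
        System.out.printf("predicted seconds at 1,000,000 elements: %.0f%n",
                predict(100_000, 30, 1_000_000, k));
    }
}
```

Two points only give a rough slope; measuring three or four sizes and comparing the exponents between neighbouring pairs says more about whether the growth is polynomial or worse.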
> I already waited for at least 2 hours before I canceled it and it
> still wasn't finished ...
>
> Quote:
>> It's hard to imagine that much data as XML or how long it would take
>> to deserialize; I mentioned that deserialization is generally
>> significantly slower (several times slower) than serialization...
>
> And here comes the part I find strangest: I assumed the same thing,
> that if it takes 8 mins to serialise the graph it would take at least
> 10 mins to deserialise it, but this isn't the case :D
> Deserialisation of the entire 250MB file takes only around 20 secs ...
There's definitely something pathological going on, but I'd need to
measure it.
>
> Regards,
> Thomas
Re: Serialisation of EMF-Model to XML file takes several hours [message #1691781 is a reply to message #1691778] Thu, 09 April 2015 15:24
Thomas Zwickl
OK, I've now changed my serialisation to binary and it is now a bit faster, around 2 mins, but that is still too long for my taste ... The file size of course dropped significantly, to 39MB ...

Quote:
Yes, that would seem a reasonable assumption. I'd expect it to grow
linearly, although tree depth would affect the computation of fragment
paths (for references) but that's generally O(log n), but my impression
is that you use IDs.

What do you mean we use IDs?

Quote:
There's definitely something pathological going on, but I'd need to measure it.

What measurements would you suggest I take to get to the bottom of this problem?

CPU Times during serialisation:
http://s3.postimg.org/t5z6puiur/CPUProfiling.png

Regards,
Thomas

[Updated on: Thu, 09 April 2015 15:36]

Re: Serialisation of EMF-Model to XML file takes several hours [message #1691789 is a reply to message #1691781] Thu, 09 April 2015 16:01
Ed Willink
Hi

You want to identify what is happening quadratically, so start by
counting lines in your file; that tells you the number of elements,
roughly. Then put counters in routines that might be suspect and see
what is dramatically disproportionate.

EcoreUtil.generateUUID
EList construction
EcoreUtil.resolveProxy

and always a good one: use Wireshark to check that you are not making
any internet accesses. I think the bug on OMG namespaces being accessed
unnecessarily is still open.
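
A rough sketch of the counting approach described above, using only the JDK. `CallCounters` is a hypothetical helper, and where to call `hit(...)` is an assumption: library routines such as `EcoreUtil.generateUUID` cannot be edited directly, so they would have to be counted at your own call sites or in a subclass. Dividing each count by the line count of the saved file shows which routine runs far more often than once per element.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Stream;

/** Hypothetical helper: counts calls per routine and relates them to file size. */
final class CallCounters {
    private static final Map<String, LongAdder> COUNTS = new ConcurrentHashMap<>();

    /** Call this at the top of each suspect routine (or at its call sites). */
    static void hit(String name) {
        COUNTS.computeIfAbsent(name, key -> new LongAdder()).increment();
    }

    /** Number of recorded calls for the given name, 0 if it was never hit. */
    static long count(String name) {
        LongAdder adder = COUNTS.get(name);
        return adder == null ? 0 : adder.sum();
    }

    /** Line count of the saved file, a rough proxy for the number of elements. */
    static long countLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.count();
        }
    }

    /** Prints calls per line; a ratio far above 1 marks a disproportionate routine. */
    static void report(long lineCount) {
        COUNTS.forEach((name, adder) -> System.out.printf(
                "%s: %d calls, %.1f per line%n", name, adder.sum(),
                (double) adder.sum() / Math.max(1, lineCount)));
    }

    public static void main(String[] args) throws IOException {
        // Tiny stand-in for a saved model file.
        Path file = Files.createTempFile("model", ".xml");
        Files.write(file, Arrays.asList("<root>", "<node/>", "<node/>", "</root>"));
        for (int i = 0; i < 16; i++) {
            hit("resolveProxy"); // pretend this routine was called 16 times
        }
        report(countLines(file));
        Files.delete(file);
    }
}
```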

Regards

Ed Willink


Re: Serialisation of EMF-Model to XML file takes several hours [message #1691830 is a reply to message #1691789] Fri, 10 April 2015 04:16
Ed Merks
Ed,

If there are extrinsic IDs involved, those are generally assigned as the
object is added to the resource or to a descendant of a resource, i.e.,
during construction or potentially while loading but not while saving.
The serializer generally uses eIsSet, so it generally shouldn't create
all ELists of all multi-valued features. Nor should the serializer
generally resolve proxies but rather it should reuse the URI in the
proxy; hence there is not expected to be any resource loading or web
access while saving.

Nothing replaces a profiler for answering the question "Why is it so slow?"
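
When no profiler is at hand, a crude stand-in is to sample the stack of the thread doing the save and count which methods show up most often. This is only a sketch under that assumption and no substitute for a real profiler such as VisualVM; `StackSampler` is a hypothetical name, and the busy loop in `main` is a placeholder for the save call.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical poor-man's sampler: counts the top stack frame of a running thread. */
final class StackSampler {

    /** Samples the target's top frame every intervalMillis until the thread ends. */
    static Map<String, Integer> sample(Thread target, long intervalMillis)
            throws InterruptedException {
        Map<String, Integer> hits = new HashMap<>();
        while (target.isAlive()) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(frame, 1, Integer::sum);
            }
            Thread.sleep(intervalMillis);
        }
        return hits;
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder workload (an assumption); the real thread would run resource.save(...).
        Thread worker = new Thread(() -> {
            long end = System.nanoTime() + 300_000_000L;
            double x = 0;
            while (System.nanoTime() < end) {
                x += Math.sqrt(x + 1);
            }
        });
        worker.start();
        Map<String, Integer> hits = sample(worker, 5);
        // Print the most frequently seen frames first.
        hits.entrySet().stream()
                .sorted((a, b) -> b.getValue() - a.getValue())
                .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

Frames that dominate the samples are the ones to inspect first. Sampling deeper than the top frame, or running the save under a real profiler, gives the calling context that this sketch leaves out.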

