| CDO performance [message #1721747] |
Fri, 29 January 2016 12:17  |
Eclipse User |
Hello,
I have a few questions regarding the performance (speed and memory) of CDO.
I will describe our performance tests and results. I cannot tell whether the results are acceptable in relation to the size of the test models.
The model has just containment references.
We use two processes, one for the CDO server and one for the CDO client, each with 4GB of memory.
In our first commit we check in a model containing about 560.000 instances with a roughly estimated memory footprint of 120MB (the size of the XML representation).
We estimate that we will have to handle models with about 10.000.000 instances (about 2GB).
In subsequent commits (a minimal sketch of one such iteration follows the list)
* we load the CDOResource (resource.getContents().get(0))
* clone a part of the model (about 300.000 instances, let's say 100MB, but it should be less)
* add the cloned part to the model
* commit the resource
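For reference, a minimal sketch of what one such iteration boils down to, assuming an already open CDOSession; the resource path "/model" and the way the subtree to clone is picked are placeholders, not our actual test code (see the linked repository for that):

import java.util.Collection;

import org.eclipse.emf.cdo.eresource.CDOResource;
import org.eclipse.emf.cdo.session.CDOSession;
import org.eclipse.emf.cdo.transaction.CDOTransaction;
import org.eclipse.emf.cdo.util.CommitException;
import org.eclipse.emf.ecore.EObject;
import org.eclipse.emf.ecore.util.EcoreUtil;

public class CloneAndCommitIteration
{
  // One test iteration: load the resource, clone a subtree, add the clone, commit.
  @SuppressWarnings("unchecked")
  public static void run(CDOSession session) throws CommitException
  {
    CDOTransaction transaction = session.openTransaction();

    try
    {
      // load the CDOResource and get the model root
      CDOResource resource = transaction.getResource("/model"); // placeholder path
      EObject root = resource.getContents().get(0);

      // clone a part of the model (about 300.000 instances in the test)
      EObject part = root.eContents().get(0); // placeholder for selecting the subtree
      EObject clone = EcoreUtil.copy(part);

      // add the cloned part to the model; assumes a many-valued containment reference
      ((Collection<EObject>)root.eGet(part.eContainmentFeature())).add(clone);

      // commit the resource
      transaction.commit();
    }
    finally
    {
      transaction.close();
    }
  }
}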
The results vary, but a trend can be noticed.
Below are the results from one arbitrary run.
parse XML file into EMF model: 8s

initial commit (into empty repo): 50s
* server uses 1.5GB
* client uses 1.5GB

1st iteration
* load resource: 12s
* commit: 28s
* server uses 2.2GB
* client uses 2.6GB

2nd iteration
* load resource: 11s
* commit: 31s
* server uses 2.5GB
* client uses 3.1GB

3rd iteration
* load resource: 22s
* commit: 19s
* server uses 2.8GB
* client uses 3.7GB

4th iteration
* load resource: 25s
* commit: 41s
* server uses 2.7GB
* client uses 3.7GB
The H2 database file has a size of 477MB.
What we noticed is:
* at about the 6th or 7th iteration we run into an OutOfMemoryError.
* the model after 4 iterations shouldn't need more than 600MB; however, both the client and the server need a lot more memory.
* the time needed to load the CDOResource isn't linearly proportional to the model size.
* the time needed to commit the transaction isn't linearly proportional to the model size or model change.
* the memory footprint of the server and client increases even if, after the initial commit, the changes are not additive (e.g. we only change values of attributes). So in the longer run they will run out of memory too. The time to commit the transaction increases as well, although the number of changes and the model size remain the same.
What possibilities are there to handle large models?
You can find the source code here:
https://bitbucket.org/cdlflex/cdlflex-cdo-performance/
See README on how to execute the tests.
| Re: CDO performance [message #1722163 is a reply to message #1721747] |
Wed, 03 February 2016 02:37   |
Eclipse User |
Hi Stefan,
Comments below...
Cheers
/Eike
----
http://www.esc-net.de
http://thegordian.blogspot.com
http://twitter.com/eikestepper
On 29.01.2016 at 18:17, Stefan Scheiber wrote:
> Hello,
>
> I have a few questions regarding the performance (speed and memory) of CDO.
> I will describe our performance tests and results. I cannot interpret if the results are acceptable or not in relation
> to the size of the test models.
>
> The model has just containment references.
> We use two processes one for the CDO server
Are you using auditing or branching? What's your mapping strategy? Any other special server configurations?
> and one for the CDO client
Is your model generated natively for CDO? No legacy model parts?
> each with 4GB of memory.
Physical RAM or -Xmx4G?
> In our first commit we check in a model containing about 560.000 instances with a coarse estimated memory footprint of
> 120MB (the size of the XML representation). We estimate that we will have to handle models with about 10.000.000
> instances (about 2GB).
The size of a CDOObject is larger than the size of an EObject. That's the price for being able to work with arbitrarily large models (if you do it right, i.e., don't hold on to too many objects). The size of a single commit is restricted by the heap size, though.
You may find this article interesting: http://thegordian.blogspot.de/2008/11/how-scalable-are-my-models.html
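To illustrate the "don't hold on to too many objects" point with a generic sketch (not code from this thread): walking a subtree with a tree iterator and processing objects on the fly, instead of collecting them in a list, keeps already visited objects eligible for garbage collection so their state can be evicted from the caches.

import org.eclipse.emf.common.util.TreeIterator;
import org.eclipse.emf.ecore.EObject;

public final class SubtreeScan
{
  // Counts the objects below root without keeping references to them,
  // so already visited objects remain eligible for garbage collection.
  public static int countInstances(EObject root)
  {
    int count = 0;
    for (TreeIterator<EObject> it = root.eAllContents(); it.hasNext();)
    {
      it.next(); // process the object here instead of storing it in a collection
      ++count;
    }
    return count;
  }
}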
> In subsequent commits * we load the CDOResource (resource.getContents().get(0))
> * clone a part of the model (about 300.000 instances, let's say 100MB but it should be less).
> * add the cloned part to the model
> * commit the resource
>
> The results vary, but some trend can be noticed. Below the results from one arbitrary run. parse XML file into EMF
> model 8s
> initial commit (into empty repo) 50s
> server uses 1.5GB
> client uses 1.5GB
> 1st iteration load resource 12s
> commit 28s
> server uses 2.2GB client uses 2.6GB
>
> 2nd iteration load resource 11s
> commit 31s
> server uses 2.5GB client uses 3.1GB
>
> 3rd iteration load resource 22s
> commit 19s
> server uses 2.8GB
> client uses 3.7GB
>
> 4th iteration load resource 25s
> commit 41s
> server uses 2.7GB
> client uses 3.7GB
>
> the H2 database file has a size of: 477MB.
>
> What we noticed is:
> * at about the 6th or 7th iteration we run into an OutOfMemoryError.
Where? On the server or the client? Where's the stacktrace?
> * the model after 4 iterations shouldn't need more than 600MB,
How do you conclude this? From the XML file size?
> however both the client and the server need a lot more memory.
Both the client and the server use memory-sensitive caches for various things, i.e., that memory is released when it's needed for other things.
> * the time needed to load the CDOResource isn't linearly proportional to the model size.
For several reasons (caching, DB characteristics, etc...) I wouldn't expect it to be strictly linear.
More importantly, if you know that your application needs to load entire subtrees, you should use prefetching, which benefits greatly from caches. Server-side cache warming can also help a lot.
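For illustration, a sketch of client-side prefetching (the resource path is a placeholder): CDOObject.cdoPrefetch loads the revisions of an entire subtree in batched round trips instead of lazily, one object at a time.

import org.eclipse.emf.cdo.common.revision.CDORevision;
import org.eclipse.emf.cdo.eresource.CDOResource;
import org.eclipse.emf.cdo.view.CDOView;

public class PrefetchSketch
{
  public static void prefetchWholeResource(CDOView view)
  {
    CDOResource resource = view.getResource("/model"); // placeholder path

    // fetch all revisions below the resource up front
    resource.cdoPrefetch(CDORevision.DEPTH_INFINITE);
  }
}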
In addition, I'm currently working on a brand new mechanism to define and load "units" (arbitrary subtrees of the model). With this mechanism we can reduce the load time of, e.g., a "Project" with 300,000 objects from 5 minutes to 7 seconds. See https://bugs.eclipse.org/bugs/show_bug.cgi?id=486458
> * the time needed to commit the transaction isn't linearly proportional to the model size or model change.
I wouldn't expect it to be linear (see above).
> * the memory footprint of the server and client increases even if after the initial commit the changes are not
> additive (e.g. we change values of attributes). So in the longer run they will run out of memory too.
OutOfMemoryErrors are normally the result of some mistake in the application. Maybe the answers to my questions above
shed some light on this.
> The time to commit the transaction increases as well although the number of changes and the model size remains the same.
Updating database indexes, for example, takes longer for bigger tables. Caches add some degree of uncertainty on top.
> What possibilities are there to handle large models?
>
> You can find the source code here: https://bitbucket.org/cdlflex/cdlflex-cdo-performance/
> See README on how to execute the tests.
>
| Re: CDO performance [message #1722631 is a reply to message #1722163] |
Sun, 07 February 2016 07:23  |
Eclipse User |
Thanks for your reply.
The genmodel didn't generate native CDO objects. I have changed that (thanks for the hint), and this improved things a little bit. The 4GB refers to the -Xmx setting; the PC where we tested has 8GB of physical memory.
I changed the test case a bit in order to reduce the effects of the test case itself on the memory consumption. (The OutOfMemoryError occurred while I tried to clone a part of the model using EcoreUtil.copy.)
So the new test case looks like this (a sketch of one iteration follows further below):
* parse the model file into CDO objects (562.521 instances)
* commit the model
* do 20 times
** for a subelement in the model, change all attribute values (I use cdoPrefetch to load the subtree)
** commit
So in my opinion the size of the model remains the same (as no new elements are added), and in each commit the same number of changes is present (i.e. the number of changed attributes). However, the memory used by both client and server increases over time. When it gets near the maximum (4GB), and it gets there pretty soon (after 10-15 iterations), the performance drops drastically.
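This is a minimal sketch of what one iteration looks like; the resource path, the subelement selection and the attribute values are placeholders, not the actual test code, and in this sketch each iteration opens a fresh transaction and closes it after the commit so no object state accumulates on the client across iterations:

import org.eclipse.emf.cdo.CDOObject;
import org.eclipse.emf.cdo.common.revision.CDORevision;
import org.eclipse.emf.cdo.session.CDOSession;
import org.eclipse.emf.cdo.transaction.CDOTransaction;
import org.eclipse.emf.cdo.util.CDOUtil;
import org.eclipse.emf.cdo.util.CommitException;
import org.eclipse.emf.common.util.TreeIterator;
import org.eclipse.emf.ecore.EAttribute;
import org.eclipse.emf.ecore.EObject;

public class AttributeChangeIteration
{
  public static void run(CDOSession session) throws CommitException
  {
    CDOTransaction transaction = session.openTransaction();

    try
    {
      // pick the subelement whose attributes are changed (placeholder selection)
      EObject subElement = transaction.getResource("/model").getContents().get(0);

      // prefetch the subtree before touching it
      CDOObject cdoObject = CDOUtil.getCDOObject(subElement);
      cdoObject.cdoPrefetch(CDORevision.DEPTH_INFINITE);

      // change all single-valued String attributes in the subtree
      for (TreeIterator<EObject> it = subElement.eAllContents(); it.hasNext();)
      {
        EObject object = it.next();
        for (EAttribute attribute : object.eClass().getEAllAttributes())
        {
          if (attribute.isChangeable() && !attribute.isMany() && attribute.getEType().getInstanceClass() == String.class)
          {
            object.eSet(attribute, "changed-" + System.currentTimeMillis()); // placeholder value
          }
        }
      }

      transaction.commit();
    }
    finally
    {
      transaction.close();
    }
  }
}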
I used Java Mission Control to look into what's eating up all that memory (see uploaded files).
I can't quite interpret the results, but I think that on the server side a lot of memory is needed by the H2 driver (I have since switched to a PostgreSQL database, but the performance was even worse).
On the client side most of the memory is used by CDOListWithElementProxiesImpl (about 7 million instances) and Object arrays (about 8 million instances).
What creates the CDOLists, and why are there so many compared to the number of model objects (about 500.000)? Is there a way to tweak how the caches work, or what other options do I have?
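One thing I'm considering experimenting with (I'm not sure it is related to the CDOListWithElementProxiesImpl growth) is the session's collection loading policy, which controls how list elements are chunked and resolved via element proxies, e.g.:

import org.eclipse.emf.cdo.session.CDOCollectionLoadingPolicy;
import org.eclipse.emf.cdo.session.CDOSession;
import org.eclipse.emf.cdo.util.CDOUtil;

public class SessionTuning
{
  // Resolve the first 600 elements of large lists eagerly and the rest
  // in chunks of 300 (the numbers are arbitrary examples to tune).
  public static void configure(CDOSession session)
  {
    CDOCollectionLoadingPolicy policy = CDOUtil.createCollectionLoadingPolicy(600, 300);
    session.options().setCollectionLoadingPolicy(policy);
  }
}

Whether this actually affects the growth I observed would still need to be verified with the same Mission Control measurements.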
thanks,
stefan