Eclipse Community Forums: EMF » EList performance issues

Help

Home

Home » Modeling » EMF » EList performance issues

Show: Today's Messages :: Show Polls :: Message Navigator

EList performance issues [message #425717]

Fri, 05 December 2008 13:42

Sebastien Gemme

Messages: 4
Registered: July 2009

Junior Member

I'm a happy EMF user for the last few years. I'm using EMF for all my
project now.

I am now facing some issues with the EList generated using the genmodel:

He is my problem, I am trying to work with point clouds, i.e. xyz
coordinates data coming from a laser scanner we are using on our Mars rover.

These point clouds can be pretty big, a few 100 000 points or more.

- Even when specifying 'is unique' to false in the ecore, the
'isUnique()' method of the generatee EList still returns true. Everytime
I call add or addAll methods on the EList, the isUnique() returns true
making a search in the whole list every time I insert a new item. As
soon as I hit more than 10 000 points, the time dramatically increases.
- Looks like the EList is backed with an array, so every time I insert a
new item:
- a new array of 'size+1' is allocated
- the data from the previous array is copied into this grown array
using System.arraycopy
- the item I want to insert is inserted at the end of this grown array.
When the list is small this works fine but as soon as you hit a few
1000s elements, the time also increases dramatically.
- I would have expected to see a linked list approach on this one.

I would like to know if there is a way to make the EList super fast for
applications using larger datasets.

I wanted to show those c++ developers that Java can be as fast a c++ but
so far: they are not impressed !!!

Thanks for taking time to look at my questions,

Sebastien Gemme
Canadian Space Agency

Report message to a moderator

Re: EList performance issues [message #425718 is a reply to message #425717]

Fri, 05 December 2008 13:58

Ed Merks

Messages: 33142
Registered: July 2009

Senior Member

This is a multi-part message in MIME format.
--------------060403040306060401030709
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Sebastien,

Comments below.

Sebastien Gemme wrote:
> I'm a happy EMF user for the last few years. I'm using EMF for all my
> project now.
>
> I am now facing some issues with the EList generated using the genmodel:
>
> He is my problem, I am trying to work with point clouds, i.e. xyz
> coordinates data coming from a laser scanner we are using on our Mars
> rover.
>
> These point clouds can be pretty big, a few 100 000 points or more.
>
> - Even when specifying 'is unique' to false in the ecore, the
> 'isUnique()' method of the generatee EList still returns true.
> Everytime I call add or addAll methods on the EList, the isUnique()
> returns true making a search in the whole list every time I insert a
> new item. As soon as I hit more than 10 000 points, the time
> dramatically increases.
Probably modeling a point as an EClass is not a great approach. EObject
have significant overhead so I suspect you'd be better to define an
EDataType to wrap a more compact Point representations.
> - Looks like the EList is backed with an array, so every time I insert
> a new item:
> - a new array of 'size+1' is allocated
No, the size of the array grows by roughly 50%.
> - the data from the previous array is copied into this grown array
> using System.arraycopy
> - the item I want to insert is inserted at the end of this grown
> array.
> When the list is small this works fine but as soon as you hit a few
> 1000s elements, the time also increases dramatically.
I think you're making assumptions.
> - I would have expected to see a linked list approach on this one.
Linked lists have bad footprint issues though. Each entry has like 20
bytes of overhead. And your EObjects are taking far more space that the
minimum you'd get for a class with three floats.
>
> I would like to know if there is a way to make the EList super fast
> for applications using larger datasets.
Probably the majority of your issue comes from the duplicate checking
that all EObject-containing lists impose

https://bugs.eclipse.org/bugs/show_bug.cgi?id=89325

>
> I wanted to show those c++ developers that Java can be as fast a c++
> but so far: they are not impressed !!!
So advice one is that I'm sure you'll be happier in the longer term if
you don't make points be EObjects. Probably points should be immutable
instances that you wrap as an EDataType. When you're adding to a list
and you know the things you are adding are unique, you can cast to
InternalEList and call addUnique to bypass the uniqueness checking;
that's probably the majority of your performance issue.

Is your list a containment list?
>
> Thanks for taking time to look at my questions,
>
> Sebastien Gemme
> Canadian Space Agency

--------------060403040306060401030709
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Sebastien, 
 
Comments below. 
 
Sebastien Gemme wrote:
<blockquote cite="mid:ghbb4b$369$2@build.eclipse.org" type="cite">I'm a
happy EMF user for the last few years. I'm using EMF for all my project
now.
 
 
I am now facing some issues with the EList generated using the
genmodel:
 
 
He is my problem, I am trying to work with point clouds, i.e. xyz
coordinates data coming from a laser scanner we are using on our Mars
rover.
 
 
These point clouds can be pretty big, a few 100 000 points or more.
 
 
- Even when specifying 'is unique' to false in the ecore, the
'isUnique()' method of the generatee EList still returns true.
Everytime I call  add or addAll methods on the EList, the isUnique()
returns true making a search in the whole list every time I insert a
new item. As soon as I hit more than 10 000 points, the time
dramatically increases.
 
</blockquote>
Probably modeling a point as an EClass is not a great approach. 
EObject have significant overhead so I suspect you'd be better to
define an EDataType to wrap a more compact Point representations. 
<blockquote cite="mid:ghbb4b$369$2@build.eclipse.org" type="cite">-
Looks like the EList is backed with an array, so every time I insert a
new item:
 
   - a new array of 'size+1' is allocated
 
</blockquote>
No, the size of the array  grows by roughly 50%. 
<blockquote cite="mid:ghbb4b$369$2@build.eclipse.org" type="cite">   -
the data from the previous array is copied into this grown array using
System.arraycopy
 
   - the item I want to insert is inserted at the end of this grown
array.
 
  When the list is small this works fine but as soon as you hit a few
1000s elements, the time also increases dramatically.
 
</blockquote>
I think you're making assumptions. 
<blockquote cite="mid:ghbb4b$369$2@build.eclipse.org" type="cite">  - I
would have expected to see a linked list approach on this one.
 
</blockquote>
Linked lists have bad footprint issues though.  Each entry has like 20
bytes of overhead.  And your EObjects are taking far more space that
the minimum you'd get for a class with three floats. 
<blockquote cite="mid:ghbb4b$369$2@build.eclipse.org" type="cite"> 
I would like to know if there is a way to make the EList super fast for
applications using larger datasets.
 
</blockquote>
Probably the majority of your issue comes from the duplicate checking
that all EObject-containing lists impose 
<blockquote><a
href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=89325">https://bugs.eclipse.org/bugs/show_bug.cgi?id=89325</a> 
</blockquote>
<blockquote cite="mid:ghbb4b$369$2@build.eclipse.org" type="cite"> 
I wanted to show those c++ developers that Java can be as fast a c++
but so far: they are not impressed !!!
 
</blockquote>
So advice one is that I'm sure you'll be happier in the longer term if
you don't make points be EObjects.   Probably points should be
immutable instances that you wrap as an EDataType.  When you're adding
to a list and you know the things you are adding are unique, you can
cast to InternalEList and call addUnique to bypass the uniqueness
checking; that's probably the majority of your performance issue. 
 
Is your list a containment list? 
<blockquote cite="mid:ghbb4b$369$2@build.eclipse.org" type="cite"> 
Thanks for taking time to look at my questions,
 
 
Sebastien Gemme
 
Canadian Space Agency
 
</blockquote>
</body>
</html>

--------------060403040306060401030709--

Ed Merks
Professional Support: https://www.macromodeling.com/

Report message to a moderator

Re: EList performance issues [message #425723 is a reply to message #425718]

Fri, 05 December 2008 15:13

Sebastien Gemme

Messages: 4
Registered: July 2009

Junior Member

Ed,

thanks for taking time to answer my questions.

Please find my comments below:

Ed Merks wrote:
> Sebastien,
>
> Comments below.
>
> Sebastien Gemme wrote:
>> I'm a happy EMF user for the last few years. I'm using EMF for all my
>> project now.
>>
>> I am now facing some issues with the EList generated using the genmodel:
>>
>> He is my problem, I am trying to work with point clouds, i.e. xyz
>> coordinates data coming from a laser scanner we are using on our Mars
>> rover.
>>
>> These point clouds can be pretty big, a few 100 000 points or more.
>>
>> - Even when specifying 'is unique' to false in the ecore, the
>> 'isUnique()' method of the generatee EList still returns true.
>> Everytime I call add or addAll methods on the EList, the isUnique()
>> returns true making a search in the whole list every time I insert a
>> new item. As soon as I hit more than 10 000 points, the time
>> dramatically increases.
> Probably modeling a point as an EClass is not a great approach. EObject
> have significant overhead so I suspect you'd be better to define an
> EDataType to wrap a more compact Point representations.
>> - Looks like the EList is backed with an array, so every time I insert
>> a new item:
>> - a new array of 'size+1' is allocated
> No, the size of the array grows by roughly 50%.
>> - the data from the previous array is copied into this grown array
>> using System.arraycopy
>> - the item I want to insert is inserted at the end of this grown
>> array.
>> When the list is small this works fine but as soon as you hit a few
>> 1000s elements, the time also increases dramatically.
> I think you're making assumptions.

Here is the source code, taken from BasicEList:

public boolean add(E object)
{
if (isUnique() && contains(object))
{
return false;
}
else
{
addUnique(object);
return true;
}
}

public void addUnique(E object)
{
// ++modCount
//
grow(size + 1);

assign(size, validate(size, object));
didAdd(size++, object);
didChange();
}

public void grow(int minimumCapacity)
{
++modCount;
int oldCapacity = data == null ? 0 : data.length;
if (minimumCapacity > oldCapacity)
{
Object oldData[] = data;

// This seems to be a pretty sweet formula that supports good growth.
// Adding an object to a list will create a list of capacity 4,
// which is just about the average list size.
//
int newCapacity = oldCapacity + oldCapacity / 2 + 4;
if (newCapacity < minimumCapacity)
{
newCapacity = minimumCapacity;
}
data = newData(newCapacity);
if (oldData != null)
{
System.arraycopy(oldData, 0, data, 0, size);
}
}
}

You are right, the growing is of 50% every time. I still don't
understand why it's backed by an array and not a Linked List approach.

>> - I would have expected to see a linked list approach on this one.
> Linked lists have bad footprint issues though. Each entry has like 20
> bytes of overhead. And your EObjects are taking far more space that the
> minimum you'd get for a class with three floats.
>>
>> I would like to know if there is a way to make the EList super fast
>> for applications using larger datasets.
> Probably the majority of your issue comes from the duplicate checking
> that all EObject-containing lists impose
>
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=89325
>
>>
>> I wanted to show those c++ developers that Java can be as fast a c++
>> but so far: they are not impressed !!!
> So advice one is that I'm sure you'll be happier in the longer term if
> you don't make points be EObjects. Probably points should be immutable
> instances that you wrap as an EDataType. When you're adding to a list
> and you know the things you are adding are unique, you can cast to
> InternalEList and call addUnique to bypass the uniqueness checking;
> that's probably the majority of your performance issue.
I will try that.
>
> Is your list a containment list?

My model is as follows:

PointCloud
points: Point (1..*) (---> that's my EList, with a containment
because conceptually, it makes sense ...).

The Point object only have 3 attributes: x,y,z

A point cloud has many points.

I wanted to use the EObject approach to have the great benefits of emf:
- Persistence
- Notifications

In your opinion, as a guideline: what is the maximum size for an EList,
above which we would consider not using EMF ?

Thanks,
Sebastien

>>
>> Thanks for taking time to look at my questions,
>>
>> Sebastien Gemme
>> Canadian Space Agency

Report message to a moderator

Re: EList performance issues [message #425724 is a reply to message #425723]

Fri, 05 December 2008 15:31

Felix Dorner

Messages: 676
Registered: July 2009

Senior Member

Hey Sebastien

>
> I wanted to use the EObject approach to have the great benefits of emf:
> - Persistence

You can use EDataTypes and override the corresponding string conversion
methods in the generated Factory.

> - Notifications

Does a point ever change its state?

Felix

Report message to a moderator

Re: EList performance issues [message #425726 is a reply to message #425724]

Fri, 05 December 2008 15:47

Sebastien Gemme

Messages: 4
Registered: July 2009

Junior Member

Felix Dorner wrote:
> Hey Sebastien
>
>>
>> I wanted to use the EObject approach to have the great benefits of emf:
>> - Persistence
>
> You can use EDataTypes and override the corresponding string conversion
> methods in the generated Factory.
>
>> - Notifications
>
> Does a point ever change its state?
>
> Felix

Actually, the point can change its state but I don't listen to it.

The feature that interest me is to be able to listen for
removal/addition of point in the PointCloud object.

I did a quick performance test between EList and java.util.LinkedList
and I get a performance ratio of about 6 times faster for LinkedList.

This number is without having any listener on the EList.

I will have to site down and think if I can trade the performance of the
LinkedList for the features of the EList at the cost of an important
performance degradation.

At this point, the question becomes a design issue.

Thanks for your help,
Sebastien

Report message to a moderator

Re: EList performance issues [message #425728 is a reply to message #425723]

Fri, 05 December 2008 16:00

Ed Merks

Messages: 33142
Registered: July 2009

Senior Member

Sebastien,

Comments below.

Sebastien Gemme wrote:
> Ed,
>
> thanks for taking time to answer my questions.
>
>
> Please find my comments below:
>
>
> Ed Merks wrote:
>> Sebastien,
>>
>> Comments below.
>>
>> Sebastien Gemme wrote:
>>> I'm a happy EMF user for the last few years. I'm using EMF for all
>>> my project now.
>>>
>>> I am now facing some issues with the EList generated using the
>>> genmodel:
>>>
>>> He is my problem, I am trying to work with point clouds, i.e. xyz
>>> coordinates data coming from a laser scanner we are using on our
>>> Mars rover.
>>>
>>> These point clouds can be pretty big, a few 100 000 points or more.
>>>
>>> - Even when specifying 'is unique' to false in the ecore, the
>>> 'isUnique()' method of the generatee EList still returns true.
>>> Everytime I call add or addAll methods on the EList, the isUnique()
>>> returns true making a search in the whole list every time I insert a
>>> new item. As soon as I hit more than 10 000 points, the time
>>> dramatically increases.
>> Probably modeling a point as an EClass is not a great approach.
>> EObject have significant overhead so I suspect you'd be better to
>> define an EDataType to wrap a more compact Point representations.
>>> - Looks like the EList is backed with an array, so every time I
>>> insert a new item:
>>> - a new array of 'size+1' is allocated
>> No, the size of the array grows by roughly 50%.
>>> - the data from the previous array is copied into this grown
>>> array using System.arraycopy
>>> - the item I want to insert is inserted at the end of this grown
>>> array.
>>> When the list is small this works fine but as soon as you hit a
>>> few 1000s elements, the time also increases dramatically.
>> I think you're making assumptions.
>
> Here is the source code, taken from BasicEList:
>
> public boolean add(E object)
> {
> if (isUnique() && contains(object))
> {
> return false;
> }
> else
> {
> addUnique(object);
> return true;
> }
> }
>
> public void addUnique(E object)
> {
> // ++modCount
> //
> grow(size + 1);
>
> assign(size, validate(size, object));
> didAdd(size++, object);
> didChange();
> }
>
> public void grow(int minimumCapacity)
> {
> ++modCount;
> int oldCapacity = data == null ? 0 : data.length;
> if (minimumCapacity > oldCapacity)
> {
> Object oldData[] = data;
>
> // This seems to be a pretty sweet formula that supports good
> growth.
> // Adding an object to a list will create a list of capacity 4,
> // which is just about the average list size.
> //
> int newCapacity = oldCapacity + oldCapacity / 2 + 4;
> if (newCapacity < minimumCapacity)
> {
> newCapacity = minimumCapacity;
> }
> data = newData(newCapacity);
> if (oldData != null)
> {
> System.arraycopy(oldData, 0, data, 0, size);
> }
> }
> }
>
> You are right, the growing is of 50% every time. I still don't
> understand why it's backed by an array and not a Linked List approach.
Because linked lists aren't random access and you'd likely find perform
far far worse than what you see now. For example, to add to a linked
list, you always have to do a heap allocation and you'll find that this
is WAY slower than setting a value into and array. If you consider even
that the array is 1.5 times the size it often needs to be, that's 6
bytes per value, but an entry takes 20 bytes so uses more than 3-5 times
more memory.
>
>
>>> - I would have expected to see a linked list approach on this one.
>> Linked lists have bad footprint issues though. Each entry has like
>> 20 bytes of overhead. And your EObjects are taking far more space
>> that the minimum you'd get for a class with three floats.
>>>
>>> I would like to know if there is a way to make the EList super fast
>>> for applications using larger datasets.
>> Probably the majority of your issue comes from the duplicate checking
>> that all EObject-containing lists impose
>>
>> https://bugs.eclipse.org/bugs/show_bug.cgi?id=89325
>>
>>>
>>> I wanted to show those c++ developers that Java can be as fast a c++
>>> but so far: they are not impressed !!!
>> So advice one is that I'm sure you'll be happier in the longer term
>> if you don't make points be EObjects. Probably points should be
>> immutable instances that you wrap as an EDataType. When you're
>> adding to a list and you know the things you are adding are unique,
>> you can cast to InternalEList and call addUnique to bypass the
>> uniqueness checking; that's probably the majority of your performance
>> issue.
> I will try that.
>>
>> Is your list a containment list?
>
> My model is as follows:
>
> PointCloud
> points: Point (1..*) (---> that's my EList, with a containment
> because conceptually, it makes sense ...).
I think this is a bad approach.
>
> The Point object only have 3 attributes: x,y,z
I think you should design an immutable class so that each point takes
only 20 bytes (assuming it's a float).
>
> A point cloud has many points.
>
> I wanted to use the EObject approach to have the great benefits of emf:
> - Persistence
As an EDataType, you'd just specialize the XyzFactoryImpl to serialize
it as a string value. This will take far less space and be more
efficient to parse..
> - Notifications
You should make points immutable and then you can listen to the entire
point value being added or removed and not have to worry about the parts
of the point changing.
>
> In your opinion, as a guideline: what is the maximum size for an
> EList, above which we would consider not using EMF ?
I think you'll find the list works much better if you use and EDataType
and then the isUnique being false will be respected.

In the long term, as you proceed from worry about speed and worry about
representing huge models, you'll be happy you worried about footprint
now. And of course speed will benefit as a result as well. You'll see.
>
> Thanks,
> Sebastien
>
>
>
>>>
>>> Thanks for taking time to look at my questions,
>>>
>>> Sebastien Gemme
>>> Canadian Space Agency

Ed Merks
Professional Support: https://www.macromodeling.com/

Report message to a moderator

Re: EList performance issues [message #425729 is a reply to message #425726]

Fri, 05 December 2008 16:03

Ed Merks

Messages: 33142
Registered: July 2009

Senior Member

Sebastien,

Comments below.

Sebastien Gemme wrote:
> Felix Dorner wrote:
>> Hey Sebastien
>>
>>>
>>> I wanted to use the EObject approach to have the great benefits of emf:
>>> - Persistence
>>
>> You can use EDataTypes and override the corresponding string
>> conversion methods in the generated Factory.
>>
>>> - Notifications
>>
>> Does a point ever change its state?
>>
>> Felix
>
> Actually, the point can change its state but I don't listen to it.
Best that the point never change and be replaced by a new point if it
needs to change. Just as a String or Integer never changes.
>
> The feature that interest me is to be able to listen for
> removal/addition of point in the PointCloud object.
You'll get that either way.
>
> I did a quick performance test between EList and java.util.LinkedList
> and I get a performance ratio of about 6 times faster for LinkedList.
Without context, this statement is not so meaningful. An add to an
array list that doesn't need to grow will definitely be much faster than
any single add to a linked list because of the heap allocation.
>
> This number is without having any listener on the EList.
>
> I will have to site down and think if I can trade the performance of
> the LinkedList for the features of the EList at the cost of an
> important performance degradation.
I can assure you that you're worried about the wrong thing right now.
You'll find things like addAll work much faster on array lists.
LinkedLists are not so commonly used for a good reason...
>
> At this point, the question becomes a design issue.
Hopefully you'll reconsider your design for Points. You'll be happier
in the long term.
>
> Thanks for your help,
> Sebastien

Ed Merks
Professional Support: https://www.macromodeling.com/

Report message to a moderator

Re: EList performance issues [message #425731 is a reply to message #425726]

Fri, 05 December 2008 16:07

Felix Dorner

Messages: 676
Registered: July 2009

Senior Member

Hey Sebastien,

> Actually, the point can change its state but I don't listen to it.

If you model it as an EClass, and are absolutely sure you never want to
listen, you could override eIsNotificationRequired() to false. I don't
know if that weights much though.

> The feature that interest me is to be able to listen for
> removal/addition of point in the PointCloud object.
> I will have to site down and think if I can trade the performance of the
> LinkedList for the features of the EList at the cost of an important
> performance degradation.

It would really surprise me if the adapter/notification mechanism
wouldn't work for lists that contain eDatatypes (instead of eObjects)!?

Felix

Report message to a moderator

Previous Topic:	XSD to Java code generation with multiple import of the same namespace
Next Topic:	Trying to reference an interface from an EClass

Goto Forum:

-=] Back to Top [=-

[ Syndicate this forum (XML) ] [

]

Current Time: Fri Apr 26 13:55:46 GMT 2024

.:: Contact :: Home ::.

Breadcrumbs

Sign up to our Newsletter