Home » Modeling » EMF » EList performance issues
EList performance issues [message #425717] |
Fri, 05 December 2008 13:42 |
Sebastien Gemme Messages: 4 Registered: July 2009 |
Junior Member |
|
|
I'm a happy EMF user for the last few years. I'm using EMF for all my
project now.
I am now facing some issues with the EList generated using the genmodel:
He is my problem, I am trying to work with point clouds, i.e. xyz
coordinates data coming from a laser scanner we are using on our Mars rover.
These point clouds can be pretty big, a few 100 000 points or more.
- Even when specifying 'is unique' to false in the ecore, the
'isUnique()' method of the generatee EList still returns true. Everytime
I call add or addAll methods on the EList, the isUnique() returns true
making a search in the whole list every time I insert a new item. As
soon as I hit more than 10 000 points, the time dramatically increases.
- Looks like the EList is backed with an array, so every time I insert a
new item:
- a new array of 'size+1' is allocated
- the data from the previous array is copied into this grown array
using System.arraycopy
- the item I want to insert is inserted at the end of this grown array.
When the list is small this works fine but as soon as you hit a few
1000s elements, the time also increases dramatically.
- I would have expected to see a linked list approach on this one.
I would like to know if there is a way to make the EList super fast for
applications using larger datasets.
I wanted to show those c++ developers that Java can be as fast a c++ but
so far: they are not impressed !!!
Thanks for taking time to look at my questions,
Sebastien Gemme
Canadian Space Agency
|
|
|
Re: EList performance issues [message #425718 is a reply to message #425717] |
Fri, 05 December 2008 13:58 |
Ed Merks Messages: 33142 Registered: July 2009 |
Senior Member |
|
|
This is a multi-part message in MIME format.
--------------060403040306060401030709
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sebastien,
Comments below.
Sebastien Gemme wrote:
> I'm a happy EMF user for the last few years. I'm using EMF for all my
> project now.
>
> I am now facing some issues with the EList generated using the genmodel:
>
> He is my problem, I am trying to work with point clouds, i.e. xyz
> coordinates data coming from a laser scanner we are using on our Mars
> rover.
>
> These point clouds can be pretty big, a few 100 000 points or more.
>
> - Even when specifying 'is unique' to false in the ecore, the
> 'isUnique()' method of the generatee EList still returns true.
> Everytime I call add or addAll methods on the EList, the isUnique()
> returns true making a search in the whole list every time I insert a
> new item. As soon as I hit more than 10 000 points, the time
> dramatically increases.
Probably modeling a point as an EClass is not a great approach. EObject
have significant overhead so I suspect you'd be better to define an
EDataType to wrap a more compact Point representations.
> - Looks like the EList is backed with an array, so every time I insert
> a new item:
> - a new array of 'size+1' is allocated
No, the size of the array grows by roughly 50%.
> - the data from the previous array is copied into this grown array
> using System.arraycopy
> - the item I want to insert is inserted at the end of this grown
> array.
> When the list is small this works fine but as soon as you hit a few
> 1000s elements, the time also increases dramatically.
I think you're making assumptions.
> - I would have expected to see a linked list approach on this one.
Linked lists have bad footprint issues though. Each entry has like 20
bytes of overhead. And your EObjects are taking far more space that the
minimum you'd get for a class with three floats.
>
> I would like to know if there is a way to make the EList super fast
> for applications using larger datasets.
Probably the majority of your issue comes from the duplicate checking
that all EObject-containing lists impose
https://bugs.eclipse.org/bugs/show_bug.cgi?id=89325
>
> I wanted to show those c++ developers that Java can be as fast a c++
> but so far: they are not impressed !!!
So advice one is that I'm sure you'll be happier in the longer term if
you don't make points be EObjects. Probably points should be immutable
instances that you wrap as an EDataType. When you're adding to a list
and you know the things you are adding are unique, you can cast to
InternalEList and call addUnique to bypass the uniqueness checking;
that's probably the majority of your performance issue.
Is your list a containment list?
>
> Thanks for taking time to look at my questions,
>
> Sebastien Gemme
> Canadian Space Agency
--------------060403040306060401030709
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Sebastien,<br>
<br>
Comments below.<br>
<br>
Sebastien Gemme wrote:
<blockquote cite="mid:ghbb4b$369$2@build.eclipse.org" type="cite">I'm a
happy EMF user for the last few years. I'm using EMF for all my project
now.
<br>
<br>
I am now facing some issues with the EList generated using the
genmodel:
<br>
<br>
He is my problem, I am trying to work with point clouds, i.e. xyz
coordinates data coming from a laser scanner we are using on our Mars
rover.
<br>
<br>
These point clouds can be pretty big, a few 100 000 points or more.
<br>
<br>
- Even when specifying 'is unique' to false in the ecore, the
'isUnique()' method of the generatee EList still returns true.
Everytime I call add or addAll methods on the EList, the isUnique()
returns true making a search in the whole list every time I insert a
new item. As soon as I hit more than 10 000 points, the time
dramatically increases.
<br>
</blockquote>
Probably modeling a point as an EClass is not a great approach.
EObject have significant overhead so I suspect you'd be better to
define an EDataType to wrap a more compact Point representations.<br>
<blockquote cite="mid:ghbb4b$369$2@build.eclipse.org" type="cite">-
Looks like the EList is backed with an array, so every time I insert a
new item:
<br>
- a new array of 'size+1' is allocated
<br>
</blockquote>
No, the size of the array grows by roughly 50%.<br>
<blockquote cite="mid:ghbb4b$369$2@build.eclipse.org" type="cite"> -
the data from the previous array is copied into this grown array using
System.arraycopy
<br>
- the item I want to insert is inserted at the end of this grown
array.
<br>
When the list is small this works fine but as soon as you hit a few
1000s elements, the time also increases dramatically.
<br>
</blockquote>
I think you're making assumptions.<br>
<blockquote cite="mid:ghbb4b$369$2@build.eclipse.org" type="cite"> - I
would have expected to see a linked list approach on this one.
<br>
</blockquote>
Linked lists have bad footprint issues though. Each entry has like 20
bytes of overhead. And your EObjects are taking far more space that
the minimum you'd get for a class with three floats.<br>
<blockquote cite="mid:ghbb4b$369$2@build.eclipse.org" type="cite"><br>
I would like to know if there is a way to make the EList super fast for
applications using larger datasets.
<br>
</blockquote>
Probably the majority of your issue comes from the duplicate checking
that all EObject-containing lists impose<br>
<blockquote><a
href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=89325">https://bugs.eclipse.org/bugs/show_bug.cgi?id=89325</a><br>
</blockquote>
<blockquote cite="mid:ghbb4b$369$2@build.eclipse.org" type="cite"><br>
I wanted to show those c++ developers that Java can be as fast a c++
but so far: they are not impressed !!!
<br>
</blockquote>
So advice one is that I'm sure you'll be happier in the longer term if
you don't make points be EObjects. Probably points should be
immutable instances that you wrap as an EDataType. When you're adding
to a list and you know the things you are adding are unique, you can
cast to InternalEList and call addUnique to bypass the uniqueness
checking; that's probably the majority of your performance issue.<br>
<br>
Is your list a containment list?<br>
<blockquote cite="mid:ghbb4b$369$2@build.eclipse.org" type="cite"><br>
Thanks for taking time to look at my questions,
<br>
<br>
Sebastien Gemme
<br>
Canadian Space Agency
<br>
</blockquote>
</body>
</html>
--------------060403040306060401030709--
Ed Merks
Professional Support: https://www.macromodeling.com/
|
|
|
Re: EList performance issues [message #425723 is a reply to message #425718] |
Fri, 05 December 2008 15:13 |
Sebastien Gemme Messages: 4 Registered: July 2009 |
Junior Member |
|
|
Ed,
thanks for taking time to answer my questions.
Please find my comments below:
Ed Merks wrote:
> Sebastien,
>
> Comments below.
>
> Sebastien Gemme wrote:
>> I'm a happy EMF user for the last few years. I'm using EMF for all my
>> project now.
>>
>> I am now facing some issues with the EList generated using the genmodel:
>>
>> He is my problem, I am trying to work with point clouds, i.e. xyz
>> coordinates data coming from a laser scanner we are using on our Mars
>> rover.
>>
>> These point clouds can be pretty big, a few 100 000 points or more.
>>
>> - Even when specifying 'is unique' to false in the ecore, the
>> 'isUnique()' method of the generatee EList still returns true.
>> Everytime I call add or addAll methods on the EList, the isUnique()
>> returns true making a search in the whole list every time I insert a
>> new item. As soon as I hit more than 10 000 points, the time
>> dramatically increases.
> Probably modeling a point as an EClass is not a great approach. EObject
> have significant overhead so I suspect you'd be better to define an
> EDataType to wrap a more compact Point representations.
>> - Looks like the EList is backed with an array, so every time I insert
>> a new item:
>> - a new array of 'size+1' is allocated
> No, the size of the array grows by roughly 50%.
>> - the data from the previous array is copied into this grown array
>> using System.arraycopy
>> - the item I want to insert is inserted at the end of this grown
>> array.
>> When the list is small this works fine but as soon as you hit a few
>> 1000s elements, the time also increases dramatically.
> I think you're making assumptions.
Here is the source code, taken from BasicEList:
public boolean add(E object)
{
if (isUnique() && contains(object))
{
return false;
}
else
{
addUnique(object);
return true;
}
}
public void addUnique(E object)
{
// ++modCount
//
grow(size + 1);
assign(size, validate(size, object));
didAdd(size++, object);
didChange();
}
public void grow(int minimumCapacity)
{
++modCount;
int oldCapacity = data == null ? 0 : data.length;
if (minimumCapacity > oldCapacity)
{
Object oldData[] = data;
// This seems to be a pretty sweet formula that supports good growth.
// Adding an object to a list will create a list of capacity 4,
// which is just about the average list size.
//
int newCapacity = oldCapacity + oldCapacity / 2 + 4;
if (newCapacity < minimumCapacity)
{
newCapacity = minimumCapacity;
}
data = newData(newCapacity);
if (oldData != null)
{
System.arraycopy(oldData, 0, data, 0, size);
}
}
}
You are right, the growing is of 50% every time. I still don't
understand why it's backed by an array and not a Linked List approach.
>> - I would have expected to see a linked list approach on this one.
> Linked lists have bad footprint issues though. Each entry has like 20
> bytes of overhead. And your EObjects are taking far more space that the
> minimum you'd get for a class with three floats.
>>
>> I would like to know if there is a way to make the EList super fast
>> for applications using larger datasets.
> Probably the majority of your issue comes from the duplicate checking
> that all EObject-containing lists impose
>
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=89325
>
>>
>> I wanted to show those c++ developers that Java can be as fast a c++
>> but so far: they are not impressed !!!
> So advice one is that I'm sure you'll be happier in the longer term if
> you don't make points be EObjects. Probably points should be immutable
> instances that you wrap as an EDataType. When you're adding to a list
> and you know the things you are adding are unique, you can cast to
> InternalEList and call addUnique to bypass the uniqueness checking;
> that's probably the majority of your performance issue.
I will try that.
>
> Is your list a containment list?
My model is as follows:
PointCloud
points: Point (1..*) (---> that's my EList, with a containment
because conceptually, it makes sense ...).
The Point object only have 3 attributes: x,y,z
A point cloud has many points.
I wanted to use the EObject approach to have the great benefits of emf:
- Persistence
- Notifications
In your opinion, as a guideline: what is the maximum size for an EList,
above which we would consider not using EMF ?
Thanks,
Sebastien
>>
>> Thanks for taking time to look at my questions,
>>
>> Sebastien Gemme
>> Canadian Space Agency
|
|
| | |
Re: EList performance issues [message #425728 is a reply to message #425723] |
Fri, 05 December 2008 16:00 |
Ed Merks Messages: 33142 Registered: July 2009 |
Senior Member |
|
|
Sebastien,
Comments below.
Sebastien Gemme wrote:
> Ed,
>
> thanks for taking time to answer my questions.
>
>
> Please find my comments below:
>
>
> Ed Merks wrote:
>> Sebastien,
>>
>> Comments below.
>>
>> Sebastien Gemme wrote:
>>> I'm a happy EMF user for the last few years. I'm using EMF for all
>>> my project now.
>>>
>>> I am now facing some issues with the EList generated using the
>>> genmodel:
>>>
>>> He is my problem, I am trying to work with point clouds, i.e. xyz
>>> coordinates data coming from a laser scanner we are using on our
>>> Mars rover.
>>>
>>> These point clouds can be pretty big, a few 100 000 points or more.
>>>
>>> - Even when specifying 'is unique' to false in the ecore, the
>>> 'isUnique()' method of the generatee EList still returns true.
>>> Everytime I call add or addAll methods on the EList, the isUnique()
>>> returns true making a search in the whole list every time I insert a
>>> new item. As soon as I hit more than 10 000 points, the time
>>> dramatically increases.
>> Probably modeling a point as an EClass is not a great approach.
>> EObject have significant overhead so I suspect you'd be better to
>> define an EDataType to wrap a more compact Point representations.
>>> - Looks like the EList is backed with an array, so every time I
>>> insert a new item:
>>> - a new array of 'size+1' is allocated
>> No, the size of the array grows by roughly 50%.
>>> - the data from the previous array is copied into this grown
>>> array using System.arraycopy
>>> - the item I want to insert is inserted at the end of this grown
>>> array.
>>> When the list is small this works fine but as soon as you hit a
>>> few 1000s elements, the time also increases dramatically.
>> I think you're making assumptions.
>
> Here is the source code, taken from BasicEList:
>
> public boolean add(E object)
> {
> if (isUnique() && contains(object))
> {
> return false;
> }
> else
> {
> addUnique(object);
> return true;
> }
> }
>
> public void addUnique(E object)
> {
> // ++modCount
> //
> grow(size + 1);
>
> assign(size, validate(size, object));
> didAdd(size++, object);
> didChange();
> }
>
> public void grow(int minimumCapacity)
> {
> ++modCount;
> int oldCapacity = data == null ? 0 : data.length;
> if (minimumCapacity > oldCapacity)
> {
> Object oldData[] = data;
>
> // This seems to be a pretty sweet formula that supports good
> growth.
> // Adding an object to a list will create a list of capacity 4,
> // which is just about the average list size.
> //
> int newCapacity = oldCapacity + oldCapacity / 2 + 4;
> if (newCapacity < minimumCapacity)
> {
> newCapacity = minimumCapacity;
> }
> data = newData(newCapacity);
> if (oldData != null)
> {
> System.arraycopy(oldData, 0, data, 0, size);
> }
> }
> }
>
> You are right, the growing is of 50% every time. I still don't
> understand why it's backed by an array and not a Linked List approach.
Because linked lists aren't random access and you'd likely find perform
far far worse than what you see now. For example, to add to a linked
list, you always have to do a heap allocation and you'll find that this
is WAY slower than setting a value into and array. If you consider even
that the array is 1.5 times the size it often needs to be, that's 6
bytes per value, but an entry takes 20 bytes so uses more than 3-5 times
more memory.
>
>
>>> - I would have expected to see a linked list approach on this one.
>> Linked lists have bad footprint issues though. Each entry has like
>> 20 bytes of overhead. And your EObjects are taking far more space
>> that the minimum you'd get for a class with three floats.
>>>
>>> I would like to know if there is a way to make the EList super fast
>>> for applications using larger datasets.
>> Probably the majority of your issue comes from the duplicate checking
>> that all EObject-containing lists impose
>>
>> https://bugs.eclipse.org/bugs/show_bug.cgi?id=89325
>>
>>>
>>> I wanted to show those c++ developers that Java can be as fast a c++
>>> but so far: they are not impressed !!!
>> So advice one is that I'm sure you'll be happier in the longer term
>> if you don't make points be EObjects. Probably points should be
>> immutable instances that you wrap as an EDataType. When you're
>> adding to a list and you know the things you are adding are unique,
>> you can cast to InternalEList and call addUnique to bypass the
>> uniqueness checking; that's probably the majority of your performance
>> issue.
> I will try that.
>>
>> Is your list a containment list?
>
> My model is as follows:
>
> PointCloud
> points: Point (1..*) (---> that's my EList, with a containment
> because conceptually, it makes sense ...).
I think this is a bad approach.
>
> The Point object only have 3 attributes: x,y,z
I think you should design an immutable class so that each point takes
only 20 bytes (assuming it's a float).
>
> A point cloud has many points.
>
> I wanted to use the EObject approach to have the great benefits of emf:
> - Persistence
As an EDataType, you'd just specialize the XyzFactoryImpl to serialize
it as a string value. This will take far less space and be more
efficient to parse..
> - Notifications
You should make points immutable and then you can listen to the entire
point value being added or removed and not have to worry about the parts
of the point changing.
>
> In your opinion, as a guideline: what is the maximum size for an
> EList, above which we would consider not using EMF ?
I think you'll find the list works much better if you use and EDataType
and then the isUnique being false will be respected.
In the long term, as you proceed from worry about speed and worry about
representing huge models, you'll be happy you worried about footprint
now. And of course speed will benefit as a result as well. You'll see.
>
> Thanks,
> Sebastien
>
>
>
>>>
>>> Thanks for taking time to look at my questions,
>>>
>>> Sebastien Gemme
>>> Canadian Space Agency
Ed Merks
Professional Support: https://www.macromodeling.com/
|
|
| | |
Goto Forum:
Current Time: Fri Apr 26 13:55:46 GMT 2024
Powered by FUDForum. Page generated in 0.03071 seconds
|