Guys,
Comments below.
Markus Kohler wrote:
Hi Michael,
Thanks for the info.
Yes, there are ways to minimize the overhead and IMHO in practice
a naive implementation of this pattern has just too much overhead.
Yes, hash maps are just about the worst case of memory footprint you
can imagine, especially given that most implementations use instances
of Map.Entry, which add bloat on top of the large index.
I know at least one real world example, where the memory usage
of a software component using this pattern could be reduced by a
factor of 10.
The only potential upside of the naive pattern might be huge sparsely
populated instances. I.e., you have 1000 features but only two or three
tend to be set on average.
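To make the footprint argument concrete, here is a minimal sketch contrasting the two storage styles under discussion. The class and property names are hypothetical and not taken from any code in this thread. The map-backed variant allocates a HashMap, its bucket array, and one entry object per property set (plus a boxed value where needed) for every instance, while the field-backed variant only pays for the fields themselves.

```java
import java.util.HashMap;
import java.util.Map;

/** Naive property pattern (hypothetical class): every instance owns a HashMap. */
class MapBackedPerson {
    private final Map<String, Object> properties = new HashMap<>();

    Object get(String name) {
        return properties.get(name);
    }

    void set(String name, Object value) {
        // Each put allocates an entry object; primitives are boxed.
        properties.put(name, value);
    }
}

/** Plain fields (hypothetical class): each property is a slot in the object. */
class FieldBackedPerson {
    private String name;
    private int age;

    String getName() { return name; }
    void setName(String name) { this.name = name; }
    int getAge() { return age; }
    void setAge(int age) { this.age = age; }
}

class PropertyFootprintSketch {
    public static void main(String[] args) {
        MapBackedPerson m = new MapBackedPerson();
        m.set("name", "Ada");
        m.set("age", 36);

        FieldBackedPerson f = new FieldBackedPerson();
        f.setName("Ada");
        f.setAge(36);

        System.out.println(m.get("name") + " / " + f.getName());
    }
}
```

Both variants behave the same from the caller's point of view; the difference only shows up when you measure the heap with many instances alive.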
People sometimes claim that memory is so cheap that these kinds of
optimizations don't really matter.
Sometimes I make the silly claim that Java doesn't scale because
although my hardware has 4G I can't have a heap anywhere close to 2G
in size. The cheap memory claim is just silly.
I don't believe in this, just because if you use 10x more memory
per user, your scalability will most likely be limited by the memory
usage.
Which basically means you will need more machines to serve the
same number of users, just because you didn't care that much about
memory usage.
It's just a stupid claim.
We had a discussion here about "bloat" lately and my
understanding is that this topic is becoming more important because e4
will support a multi user environment (please correct me if I'm wrong).
A lot of that talk was about bloat in the byte code and also about
static data that can never be garbage collected, but instance size is
quite important too.
I've been prototyping techniques for significantly reducing the size of
EObjectImpl. Perhaps by as much as 50% or more... In my opinion, every
byte saved is a byte earned. :-P
In such a multi user environment the main concern is the amount
of memory you need per user, because as you increase the number of
users at some point in time the memory usage will be dominated by the
objects that are needed per user.
Therefore, if we talk about bloat I think that duplicated code
might not be the biggest problem, but rather duplicated data,
especially data duplicated per user.
I think they all add up. Often people are surprised by the byte code
as an issue because it's not an issue that scales, but rather is a
constant. I recall a case where folks changed their EMF generation
feature delegation pattern from the normal one to the less time
efficient Reflection delegation pattern. They also changed the
GenPackages to use Initialize by Loading. They had huge models
that generally were used only during initialization. The reduction in
byte code resulted in a huge improvement in startup time and a huge
reduction in "retained memory", while the performance loss for data
access and the increased memory footprint of the instances had no
negative impact. This was an excellent example of the opposite of what
you might expect and a great reminder that measurements speak louder
than mental exercises and abstract thinking.
IMHO the only approach that can avoid bloat is therefore to
carefully design which data can be shared between users and which data
needs to be there per user.
I think it would make sense to constantly monitor the memory
usage using automatic tests.
The Eclipse Memory Analyzer could be used for this kind of
memory usage tests.
I so totally agree. Measure, measure, measure again. Measure
everything. And when it comes to performance measurement, remember
that the observer often affects the observed and that, unfortunately,
different JREs and different JIT implementations have a huge
impact on performance; often more than the optimizations you might be
trying to achieve with the changes you make.
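As a very rough illustration of the kind of automatic check discussed above, the sketch below estimates heap growth per retained instance using only the standard java.lang.management API. It is not a substitute for a heap dump analysis with a tool like the Eclipse Memory Analyzer: System.gc() is only a hint, the numbers vary between JREs, and the class and method names are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class HeapGrowthSketch {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Requests a collection (only a hint to the JVM) and returns used heap bytes. */
    static long usedHeapAfterGc() {
        System.gc();
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    /** Coarse estimate of heap bytes retained per instance created by the factory. */
    static long estimateBytesPerInstance(Supplier<?> factory, int count) {
        // Pre-size the list so its backing array is not part of the measured growth.
        List<Object> retained = new ArrayList<>(count);
        long before = usedHeapAfterGc();
        for (int i = 0; i < count; i++) {
            retained.add(factory.get());
        }
        long after = usedHeapAfterGc();
        // Use the list after measuring so the instances stay reachable until here.
        if (retained.size() != count) {
            throw new IllegalStateException("unexpected instance count");
        }
        return (after - before) / count;
    }

    public static void main(String[] args) {
        long perArray = estimateBytesPerInstance(() -> new byte[1024], 10_000);
        System.out.println("approx. bytes per byte[1024]: " + perArray);
    }
}
```

A regression test could compare such an estimate against a generous threshold per object type, keeping in mind that the measurement itself perturbs what is being measured.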
Yes, I thought both posts were interesting.
EMF is something in between.
Almost like a panacea. :-P
If you use generated classes (fixed properties), the overhead is 4
additional object attributes.
A little worse than that, but I'm working on it in my copious spare
time.
In case of dynamic EMF you are much better off than with HashMaps,
It's always much better than HashMaps, even for dynamic. And the
performance is better as well.
because the attributes are stored in an array and the key
(EStructuralFeature) has an index into that array (I am sure Ed can
give some numbers here).
I think Eric confirmed that an EObject.eGet(feature) is twice as fast as
HashMap.get(key), and we even have InternalEObject.eGet(featureID)
which is faster yet...
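The array-of-slots idea described above can be sketched in plain Java without any EMF dependency. This is only an illustration of the technique, not EMF's actual implementation; SlotFeature, SlotType and SlotObject are hypothetical names. The feature descriptors are shared by all instances of a type, so each instance only pays for one array, and a lookup is a single array access instead of a hash computation plus bucket search.

```java
/** Hypothetical feature descriptor that knows its slot index. */
final class SlotFeature {
    final String name;
    final int index;

    SlotFeature(String name, int index) {
        this.name = name;
        this.index = index;
    }
}

/** Hypothetical type descriptor, shared by all instances of the type. */
final class SlotType {
    private final SlotFeature[] features;

    SlotType(String... featureNames) {
        features = new SlotFeature[featureNames.length];
        for (int i = 0; i < featureNames.length; i++) {
            features[i] = new SlotFeature(featureNames[i], i);
        }
    }

    /** Name lookup is done once, up front; callers keep the feature around. */
    SlotFeature feature(String name) {
        for (SlotFeature f : features) {
            if (f.name.equals(name)) {
                return f;
            }
        }
        throw new IllegalArgumentException("Unknown feature: " + name);
    }

    int featureCount() {
        return features.length;
    }
}

/** Instance that stores all values in one array addressed by feature index. */
final class SlotObject {
    private final Object[] slots;

    SlotObject(SlotType type) {
        this.slots = new Object[type.featureCount()];
    }

    Object get(SlotFeature feature) {
        return slots[feature.index];
    }

    /** Variant taking the index directly, skipping the descriptor dereference. */
    Object get(int featureId) {
        return slots[featureId];
    }

    void set(SlotFeature feature, Object value) {
        slots[feature.index] = value;
    }
}
```

Usage would look like `SlotType t = new SlotType("name", "age"); SlotObject o = new SlotObject(t); o.set(t.feature("age"), 36);`, with the feature normally resolved once and reused rather than looked up by name on every access.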
So, with EMF you have the choice between dynamic and fixed properties
and you can mix both approaches...
In the sense you're using here, the set of properties is fixed; it's
just a question of whether individual fields are allocated per feature
or an array of slots is allocated to hold all the features.
Unfortunately EMF is not good at delegating nonexistent
properties to another instance.
That's not quite true either. :-P
EMF supports the same type of thing as XML Schema's wildcards. So you
can have a property just like <xsd:any>. Other models
(<schema>s) can then declare global elements and those global
elements (properties of the document root of the corresponding
EPackage) can be used as properties on the object with the wildcard
property.
Just two weeks ago I worked with a colleague on an extension of EMF
that allows this (in fact it adds a kind of aspects (AOP) to EMF that
allows interception of the set/get methods).
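For readers unfamiliar with the delegation being discussed, here is a minimal sketch of looking up a missing property on another instance. It is plain Java with a hypothetical class name, and it says nothing about how the EMF extension mentioned above is implemented; it only illustrates the lookup rule, using the simple map-backed storage for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical property holder that falls back to a parent for unset properties. */
class DelegatingProperties {
    private final Map<String, Object> own = new HashMap<>();
    private final DelegatingProperties parent; // may be null for the root

    DelegatingProperties(DelegatingProperties parent) {
        this.parent = parent;
    }

    void set(String name, Object value) {
        // Writes always go to this instance, never to the parent.
        own.put(name, value);
    }

    Object get(String name) {
        if (own.containsKey(name)) {
            return own.get(name);
        }
        // Not set locally: delegate to the parent chain, if there is one.
        return parent != null ? parent.get(name) : null;
    }
}
```

With this rule, data that is identical for many instances can live once on a shared parent, and only the overridden properties cost memory per instance.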
Pretty interesting article but quite long -- I started reading
but after 30 min I decided to "fast read" the rest...
Yes, I'm not sure I agree with the overall outlook. Often people see
differences where I see commonalities. For example, I see little
significant difference between UML and XML Schema for the purpose of
this article. They're both modeling languages, each with a few
features the other doesn't have, but modeling languages nevertheless.
Michael
Hi all,
I agree that's an interesting post. But Steve IMHO doesn't point out
that the main problem with this approach is that it can have a high
memory overhead.
I once did some calculations for a simple HashMap implementation versus
just using instance variables. See my old blog at
http://www.sdn.sap.com/irj/sdn/weblogs?blog=/pub/wlg/5163
Regards,
Markus