Repeatble unit indirectly referenced by server session [message #1689589]
Mon, 23 March 2015 22:09
Nuno Godinho de Matos (Member; Messages: 34; Registered: September 2012)
Hi,
This post will be relatively long, as I am trying to get some enlightenment about EclipseLink repeatable units of work, and I have found some confusing facts about these instances.
Repeatable units of work supposedly correspond to the local unit-of-work cache of ongoing transactions. This is what the online documentation traditionally hints at: a unit of work stores a local cache of the objects accessed during a transaction in order to track the changes made to them. This definition implies that these instances should conceptually be short-lived: the transaction ends, the EntityManager goes away, and the repeatable unit of work should go away as well. The application may continue referencing entities that the EntityManager has detached, but that should not have a major impact on server memory consumption, at least conceptually.
However, this does not seem to correspond to reality.
In fact, when taking a heap dump, it is quite common to see EclipseLink's RepeatableUnitOfWork as the overall largest consumer of heap memory in the JVM. This is quite unexpected: you would assume that the ServerSession cache, the stored JSF views, or something along those lines would hold that title.
Typically that is not the case: repeatable units of work exist in abundance and are the top consumers.
To make matters more complicated, it is far from trivial to say accurately "why so many of them?" or "why so big?". The second question is normally easier to answer: for example, a transaction may have accessed too many entities.
When searching for the garbage collection roots of these instances, as said above, I would expect each one of them to lead up to a live, ongoing thread that still needs an EntityManager.
However, this does not seem to be the case.
So my first question is: are repeatable units of work supposed to outlive the lifespan of the container-managed EntityManager?
In which cases would I, for example, see 100 different instances of repeatable units of work without a single live EntityManager instance?
Repeatable units of work seem to spread easily throughout an application. For example, if you use a local Enterprise JavaBean to return a DB entity and keep that entity in a page bean (say, a session-scoped bean), the entity apparently remains attached to its repeatable unit of work. At one point I created a trivial local EJB that returned an entity A with a one-to-one relationship to a second entity B and a one-to-many relationship to a list of entities C. I saved the output of the EJB call in the bean's state.
After taking a heap dump, it was obvious that:
- Entity A still referenced a RepeatableUnitOfWork through its lazily loaded IndirectList; the entity B referenced by the one-to-one relationship was itself bound to a RepeatableUnitOfWork instance, and this instance was not the same one present in entity A.
My question in this case would be:
Is this to be expected? If so, why, and what is the best way to ensure that long-lived entities do not consume too much server memory through their repeatable units of work?
I would have expected that, when the container closes the transaction, all the lazily loaded entity attributes that normally seem to encapsulate references to repeatable units of work would get cleaned out.
Now, on to the title of the thread: "Repeatble unit indirectly referenced by server session".
In this case, I have also seen rather large repeatable units of work, taking a significant chunk of server memory, being kept alive through a relatively long chain of indirection from the server session cache. The following chain illustrates roughly what the path to the garbage collection root looked like:
(1) RepeatableUnitOfWork (the big one) -> (2) QueryBasedValueHolder -> (3) IndirectList of my entity A OneToMany relationship -> (4) parent entity B -> (5) HashEntry [cache of entities B] -> (6) HashEntry [global cache] -> (7) ServerSessionDelegate
In summary, the server session cache looks like a map of the form:
Map<Class<? extends Entity>, Map<EntityKey, SoftReference<Entity>>>
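The reachability problem behind that chain can be sketched with plain JDK types. This is a simplified model, not EclipseLink's identity-map implementation: `SharedCacheSketch`, `CachedEntity` and `UnitOfWorkStandIn` are hypothetical names. It shows that while a soft reference keeps a cached entity alive, anything that entity references strongly (here, a unit-of-work-like object with a 1 MB cache) stays alive too.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a unit of work holding a large local cache.
final class UnitOfWorkStandIn {
    // Pretend this array is megabytes of cloned entities and change sets.
    final byte[] localCache = new byte[1024 * 1024];
}

// Hypothetical cached entity. 'lazyHolderContext' models a lazy value holder
// that still points at the unit of work that originally built the entity.
final class CachedEntity {
    final Object id;
    UnitOfWorkStandIn lazyHolderContext;

    CachedEntity(Object id) { this.id = id; }
}

// Simplified model of Map<Class, Map<PrimaryKey, SoftReference<Entity>>>.
final class SharedCacheSketch {
    private final Map<Class<?>, Map<Object, SoftReference<Object>>> maps = new HashMap<>();

    void put(Object entity, Object primaryKey) {
        maps.computeIfAbsent(entity.getClass(), type -> new HashMap<>())
            .put(primaryKey, new SoftReference<>(entity));
    }

    <T> T get(Class<T> type, Object primaryKey) {
        Map<Object, SoftReference<Object>> byKey = maps.get(type);
        if (byKey == null) {
            return null;
        }
        SoftReference<Object> ref = byKey.get(primaryKey);
        // The soft referent may already have been cleared under memory pressure.
        return ref == null ? null : type.cast(ref.get());
    }
}
```

Soft and weak references only decide when the cached entity itself may be dropped; they do nothing to shorten the life of the objects it points to.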
My question here is:
- Why would the server session cache refer to entities that reference repeatable units of work that end up taking more server memory than the cached entities themselves? How will this entity cache ever scale over time if an entity in the cache is potentially a few bytes in size but its unit of work several megabytes?
Would this be an indication of an error in the application? Or is this to be expected?
Kindest regards,
And thanks for your help.
[Updated on: Tue, 24 March 2015 20:00]
Re: Repeatble unit indirectly referenced by server session [message #1691367 is a reply to message #1689589]
Mon, 06 April 2015 16:29
Chris Delahunt (Senior Member; Messages: 1389; Registered: July 2009)
Question 1)
"So my first question is: are repeatable units of work supposed to outlive the lifespan of the container-managed EntityManager?
In which cases would I, for example, see 100 different instances of repeatable units of work without a single live EntityManager instance?"
Yes, it is possible under some conditions. As you mentioned, the RepeatableUnitOfWork (aka UnitOfWork or UOW) represents the transactional context behind an EntityManager. It also maintains the identity map and cache for the EntityManager, which allows repeated finds and queries to return the same entity instance; this is also important when building object graphs. When building object graphs with lazy relationships, the objects are built with value holders that reference the UOW, so that a value holder can go back to the UOW to continue building the graph when it is triggered. If you keep a detached entity around, it is those references that keep the UOW and its cache alive; they are also what avoid the exceptions that would otherwise occur when you traverse untriggered lazy relationships on detached entities.
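A minimal sketch of that mechanism using only JDK types. `Context`, `LazyList` and `Order` are hypothetical names, not EclipseLink classes; the point is that an untriggered lazy holder needs a strong reference to whatever can load its contents, so a detached entity kept in, say, a session-scoped bean keeps that context (and its cache) reachable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a unit of work: an identity cache plus a way to load data.
final class Context {
    final Map<Object, Object> identityCache = new HashMap<>();
    private final Function<Object, List<String>> loader;
    int loads = 0;

    Context(Function<Object, List<String>> loader) { this.loader = loader; }

    List<String> load(Object ownerId) {
        loads++;
        return loader.apply(ownerId);
    }
}

// Hypothetical lazy value holder. Until triggered it must keep a strong reference
// to the context, so whoever holds the entity also keeps the context reachable.
final class LazyList {
    private final Object ownerId;
    private final Context context;
    private List<String> value;

    LazyList(Object ownerId, Context context) {
        this.ownerId = ownerId;
        this.context = context;
    }

    boolean isTriggered() { return value != null; }

    List<String> get() {
        if (value == null) {
            // Triggering goes back to the same context that built the owner.
            value = new ArrayList<>(context.load(ownerId));
        }
        return value;
    }
}

// Hypothetical entity with one lazy one-to-many relationship.
final class Order {
    final long id;
    final LazyList lines;

    Order(long id, Context context) {
        this.id = id;
        this.lines = new LazyList(id, context);
    }
}
```

Nothing in the sketch lets the context be collected while the `Order` is reachable, which is the situation described for a detached entity kept in application state.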
As JPA has multiple levels of caching, it is inadvisable to cache your entities in the application. If you must, you can:
a) reduce the overhead by clearing the EntityManager, but note that this affects object identity when traversing untriggered lazy relationships
b) call em.detach on entities, or use a copy policy to make your own detached copies: http://stackoverflow.com/a/17549583/496099
c) use read-only queries, which give you the entity instance from the shared cache; these entities should not be modified (https://www.eclipse.org/eclipselink/documentation/2.4/jpa/extensions/q_read_only.htm)
d) configure the UOW to use weak references, allowing entities to be garbage collected when your application no longer references them (https://www.eclipse.org/eclipselink/documentation/2.4/concepts/cache001.htm#CDEJAHDJ). This lets you keep a single EM around for entities your application might cache, or allows some of the resources to be cleaned up in your current architecture. The drawback is that larger transactions should be flushed periodically, so that changes to entities that are no longer referenced are not lost when the GC runs.
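Options (c) and (d) are driven by EclipseLink configuration rather than application code. The sketch below only assembles the settings as plain string keys, so it runs without EclipseLink on the classpath. The key strings are the values of `PersistenceUnitProperties.PERSISTENCE_CONTEXT_REFERENCE_MODE` and `QueryHints.READ_ONLY`, but check them against the documentation for your EclipseLink version; the persistence unit name `my-unit` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class EclipseLinkSettingsSketch {
    // Value of PersistenceUnitProperties.PERSISTENCE_CONTEXT_REFERENCE_MODE.
    static final String REFERENCE_MODE = "eclipselink.persistence-context.reference-mode";
    // Value of QueryHints.READ_ONLY.
    static final String READ_ONLY_HINT = "eclipselink.read-only";

    // Option (d): let the persistence context hold entities weakly.
    // Documented values are HARD (the default), WEAK and FORCE_WEAK.
    static Map<String, Object> weakReferenceModeProperties() {
        Map<String, Object> props = new HashMap<>();
        props.put(REFERENCE_MODE, "WEAK");
        // With EclipseLink present, these would be passed as
        // Persistence.createEntityManagerFactory("my-unit", props)
        // or per EntityManager via emf.createEntityManager(props).
        return props;
    }

    // Option (c): a per-query hint returning shared-cache instances that must not be modified.
    // With EclipseLink present: query.setHint(READ_ONLY_HINT, "true");
    static Map<String, Object> readOnlyQueryHints() {
        Map<String, Object> hints = new HashMap<>();
        hints.put(READ_ONLY_HINT, "true");
        return hints;
    }
}
```

The EclipseLink documentation describes how WEAK and FORCE_WEAK differ for entities with and without change tracking, which matters for the flushing caveat in (d).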
Question 2)
"After taking a heap dump, it was obvious that:
- Entity A still referenced a RepeatableUnitOfWork through its lazily loaded IndirectList; the entity B referenced by the one-to-one relationship was itself bound to a RepeatableUnitOfWork instance, and this instance was not the same one present in entity A."
I don't follow this, so I can't say it is expected. It sounds like you are suggesting A references UOW1, while its referenced B references UOW2. There should only be a single UOW unless the application is modifying the object model with entities read in from different contexts. See options above as your follow up is along the same lines.
Expectation:
"I would have expected that, when the container closes the transaction, all the lazily loaded entity attributes that normally seem to encapsulate references to repeatable units of work would get cleaned out."
Some other providers, and the wording of the spec itself, set that expectation, but EclipseLink considered it less useful to applications to throw an exception when the underlying context is still accessible (it is really only inaccessible when the factory is closed or the entity is serialized). Doug describes the feature here: https://community.oracle.com/message/1708796
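The two behaviours can be contrasted as two policies for a lazy holder. This is a hypothetical model, not any provider's code: `RETAIN_CONTEXT` approximates what is described here for EclipseLink (keep loading as long as the backing context is usable, fail only once it is closed), while `FAIL_WHEN_DETACHED` approximates the stricter expectation of failing once the transaction has ended.

```java
import java.util.function.Supplier;

final class LazyPolicySketch {
    enum Policy { RETAIN_CONTEXT, FAIL_WHEN_DETACHED }

    // Hypothetical backing context: 'closed' models a closed factory,
    // 'transactionActive' models whether the original transaction is still running.
    static final class BackingContext {
        boolean closed = false;
        boolean transactionActive = true;
    }

    static final class LazyValue<T> {
        private final Policy policy;
        private final BackingContext context;
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        LazyValue(Policy policy, BackingContext context, Supplier<T> loader) {
            this.policy = policy;
            this.context = context;
            this.loader = loader;
        }

        T get() {
            if (!loaded) {
                // Both policies fail once the context is really gone.
                if (context.closed) {
                    throw new IllegalStateException("context closed: lazy value cannot be loaded");
                }
                // Only the strict policy also fails after the transaction has ended.
                if (policy == Policy.FAIL_WHEN_DETACHED && !context.transactionActive) {
                    throw new IllegalStateException("lazy value was not loaded before the transaction ended");
                }
                value = loader.get();
                loaded = true;
            }
            return value;
        }
    }
}
```

The trade-off discussed in the thread is visible here: the retaining policy avoids the exception but needs the holder to keep `context` reachable, while the strict policy could drop it at transaction end.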
Question 3)
"- Why would the server session cache refer to entities that reference repeatable units of work that end up taking more server memory than the cached entities themselves? How will this entity cache ever scale over time if an entity in the cache is potentially a few bytes in size but its unit of work several megabytes?
Would this be an indication of an error in the application? Or is this to be expected?"
I don't quite follow how you determined the shared session was referring to entities read in from a UOW, but this would be a problem. It might occur if you are modifying read-only entity instances and pointing them to other entities read from other contexts - corrupting the shared cache. Or it could be that you are reading the tool wrong, as there is no problem with an Entity indirectly referencing the shared session. I can't say if there is a problem without more to go on.
Best Regards,
Chris
Re: Repeatble unit indirectly referenced by server session [message #1696550 is a reply to message #1691367]
Wed, 27 May 2015 08:30
Nuno Godinho de Matos (Member; Messages: 34; Registered: September 2012)
Hi Chris,
Many thanks for your reply; it largely confirms what I suspected from the heap dumps I have analysed so far.
"As JPA has multiple levels of caching, it is inadvisable to cache your entities in the application. If you must, you can "
I fully agree with you on this one.
In my opinion, the detach API should either be removed from the JPA specification altogether, or it should be mandatory that when entity A is detached, the full object graph (all entities associated with A through loaded one-to-one, one-to-many or many-to-many relationships) becomes detached as well, and any lazy-loadable list not yet triggered becomes unusable.
The API's current behaviour is dangerous.
You detach an entity that might be associated with a large entity graph loaded by a transaction, and you will find the repeatable unit of work, the attribute change listeners and all the rest still hanging around in the heap dump.
Caching entities, I am starting to find, is simply dangerous.
If you need to cache an entity, copy the information you need into a non-JPA object; you will protect your memory and avoid potential corruption of the JPA shared cache (e.g. when your detached entity has some @Transient field that also exists in the entity cached by the server session).
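A minimal sketch of that copy-out approach. `Customer` stands in for a JPA entity and `CustomerSummary` for the plain object you would cache; both names and their fields are hypothetical, and the JPA annotations are omitted so the example runs on the JDK alone. The copy holds only its own values, so it cannot keep a unit of work, change listener or lazy holder reachable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical entity (annotations such as @Entity omitted).
class Customer {
    long id;
    String name;
    List<String> orderNumbers = new ArrayList<>(); // imagine a lazy one-to-many
    Object providerInternals;                      // imagine woven fields pointing at a unit of work
}

// Plain, immutable snapshot that is safe to keep in a session bean or application cache.
final class CustomerSummary {
    final long id;
    final String name;
    final List<String> orderNumbers;

    private CustomerSummary(long id, String name, List<String> orderNumbers) {
        this.id = id;
        this.name = name;
        this.orderNumbers = orderNumbers;
    }

    // Call this while the entity is still managed, so lazy data is loaded inside the transaction.
    // 'providerInternals' is deliberately not copied.
    static CustomerSummary of(Customer c) {
        return new CustomerSummary(
            c.id,
            c.name,
            Collections.unmodifiableList(new ArrayList<>(c.orderNumbers)));
    }
}
```

Copying the list (rather than wrapping the entity's own collection) matters: a wrapper would keep the entity's possibly lazy collection, and whatever it references, reachable.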
"I don't follow this, so I can't say it is expected. It sounds like you are suggesting A references UOW1, while its referenced B references UOW2. There should only be a single UOW unless the application is modifying the object model with entities read in from different contexts. See options above as your follow up is along the same lines."
You followed perfectly. That is precisely what I am saying.
In the heap dump, drilling down from a base entity, I can find different repeatable units of work further down the object hierarchy.
Thank you for making it clear that this is never expected in a JPA entity graph. The problem then becomes: by what means can I track down how the entity got corrupted? That is not easy at all.
"I don't quite follow how you determined the shared session was referring to entities read in from a UOW, but this would be a problem. It might occur if you are modifying read-only entity instances and pointing them to other entities read from other contexts - corrupting the shared cache. Or it could be that you are reading the tool wrong, as there is no problem with an Entity indirectly referencing the shared session. I can't say if there is a problem without more to go on. "
Once you know how to read heap dumps, it becomes quite easy to tell whether or not an entity is being held in memory by the server session (second-level, shared) cache.
In the GC root path you will see that the entity is held in memory by either a weak or a soft reference, and that this reference is in turn part of a HashMap whose keys are entity classes and whose values are, for example, a SoftCacheWeakIdentityMap full of entities of that class.
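For readers less familiar with that map type: the EclipseLink documentation describes the soft-cache-weak identity map as a weak identity map that additionally keeps a most-frequently-used sub-cache held by soft references. The sketch below imitates the idea with JDK classes only; the class name `SoftSubCacheWeakMap`, the use of least-recently-used eviction for the sub-cache and the sub-cache size are simplifying assumptions, and EclipseLink's real implementation differs in detail.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class SoftSubCacheWeakMap<K, V> {
    // Every cached object is reachable through a weak reference...
    private final Map<K, WeakReference<V>> weak = new HashMap<>();
    // ...and the most recently used ones are also pinned by soft references,
    // so they survive until the JVM runs short of memory.
    private final LinkedHashMap<K, SoftReference<V>> subCache;

    SoftSubCacheWeakMap(int subCacheSize) {
        this.subCache = new LinkedHashMap<K, SoftReference<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, SoftReference<V>> eldest) {
                return size() > subCacheSize;
            }
        };
    }

    void put(K key, V value) {
        weak.put(key, new WeakReference<>(value));
        subCache.put(key, new SoftReference<>(value));
    }

    V get(K key) {
        SoftReference<V> soft = subCache.get(key); // access order: refreshes the entry
        V v = soft == null ? null : soft.get();
        if (v == null) {
            WeakReference<V> w = weak.get(key);
            v = w == null ? null : w.get();
            if (v == null) {
                weak.remove(key); // drop entries whose referent has been collected
            }
        }
        return v;
    }

    int pinnedCount() { return subCache.size(); }
}
```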
So it is clear:
I have an entity A held by either a weak or a soft reference, on a path that leads all the way up to the server session.
It is 100% an entity managed by the ServerSession.
The problem then is that if, for example, that entity has a @Transient field, that field can keep gobbling up memory, holding on to objects that look as if they still belonged to an ongoing transaction. So one thing has become apparent: @Transient fields are dangerous, especially if they cache entities.
But there are clearly also other ways in which an entity in the server session cache keeps repeatable units of work alive.
I am currently looking at a heap dump where a very simple entity, let us call it Jacket, has an @Embedded field called Dimensions.
And while the Jacket instance is in the server session cache and appears overall to be clean of repeatable units of work, the embedded Dimensions object still has the woven ("JPA-enhanced") field
_persistence_listener pointing to an AggregateAttributeChangeListener that itself points to a RepeatableUnitOfWork.
And I can assure you, this Jacket entity appears unbreakable.
The Dimensions attribute can only be set on the entity by calling a setter.
And the setter on the entity does not do a plain old:
this.dimensions = inputAttributeDimensions;
Instead, it does something of the form:
this.dimensions = new Dimensions(inputAttributeDimensions.fieldA, ...);
So it would be literally impossible for that entity in the server session cache to have been corrupted by some Dimensions instance that we might have cached somewhere at some point.
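The setter pattern described above looks roughly like this. `Jacket`, `Dimensions` and their fields are hypothetical reconstructions (the post elides the actual field list), and the JPA annotations (`@Entity`, `@Embeddable`, `@Embedded`) are left out so the sketch runs on the JDK alone. It guarantees that the entity never shares a `Dimensions` instance with the caller; it does not, by itself, determine what a weaving provider attaches to the new instance.

```java
import java.util.Objects;

// Hypothetical embeddable (would be annotated @Embeddable). Not final, since JPA
// does not allow final persistent classes or fields.
class Dimensions {
    private double width;   // hypothetical fields; the post elides the real ones
    private double height;

    protected Dimensions() { } // no-arg constructor required by JPA

    Dimensions(double width, double height) {
        this.width = width;
        this.height = height;
    }

    double getWidth() { return width; }
    double getHeight() { return height; }
}

// Hypothetical entity (would be annotated @Entity, with the field marked @Embedded;
// id and other mappings omitted).
class Jacket {
    private Dimensions dimensions;

    Dimensions getDimensions() { return dimensions; }

    // Defensive copy: the entity keeps its own instance, never the caller's.
    void setDimensions(Dimensions input) {
        Objects.requireNonNull(input, "input");
        this.dimensions = new Dimensions(input.getWidth(), input.getHeight());
    }
}
```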
If I may summarize, the main points you have clarified in your response are:
(1) Entities in the server session cache should never, directly or indirectly (e.g. via lazy-loadable lists that were expanded), hold a reference to a repeatable unit of work.
Or, put differently, the garbage collection root path of a RepeatableUnitOfWork should never go through an entity in the ServerSession cache.
(2) Repeatable units of work can outlive the lifespan of the EntityManager that created them if the application caches entities somewhere (e.g. by keeping entities returned by local EJBs).
(3) EclipseLink has decided to let entities remain bound to their repeatable unit of work outside the scope of the transaction, in order to avoid runtime exceptions when lazy-loadable fields are expanded.
- I personally disagree with this decision, but that is my opinion, and I accept your decision as a valid one.
Honestly, I think it is a good intention with very bad consequences: it encourages developers to misuse the framework and lets people who do not know how to use EclipseLink properly write poor-quality code. In my opinion it is much preferable to keep memory clean and make the developer's code blow up with "this field has not been loaded: either do not use JPA entities outside of a transaction, or make sure you load everything you need".
If I manage to put together a sample application that reproduces some of the issues I am currently studying, I will share it, to make it easier to get your input.
Kindest regards,
Nuno.