Eclipse Community Forums
Hanging and long delays obtaining read locks [message #1108983] Sat, 14 September 2013 18:31
Randy Tidd
Messages: 8
Registered: August 2013
Junior Member
We are using EclipseLink 2.4.2 in a J2EE stack with WebLogic 10.3.6. We have a relatively small system with about 5-10 users in testing, eventually about 30 users in production, and a database with about 100 tables. The primary table has about 500 rows, most other tables have 1000-5000 rows, and there are a couple of tables with 100,000-200,000 rows. I mention this just to indicate that there isn't much data and most operations are quick. For example, a "select all" from a table typically returns only a few thousand rows and should take around 100-500 msec.

We often have a lot of "stuck threads", caused by lines in EclipseLink that are getting read locks, such as:

org.eclipse.persistence.internal.helper.ConcurrencyManager.acquireDeferredLock(ConcurrencyManager.java:198)

org.eclipse.persistence.internal.helper.ConcurrencyManager.acquire(ConcurrencyManager.java:94)

Looking through the stack traces of the thread dumps, the deadlocks seem to occur only during reading. EclipseLink is building objects for the cache based on the results from the queries, and one thread is writing to the cache, which makes others wait, and somehow the waiting threads are kept waiting for more than 10 minutes (WebLogic's stuck thread timeout). I've attached a sample stack trace at the end of this message.

I have looked at the oft-cited FAQ about this:

http://wiki.eclipse.org/EclipseLink/FAQ/JPA#How_to_diagnose_and_resolve_hangs_and_deadlocks.3F

and have tried these suggestions but this has not helped. We are on a pretty recent version (2.4.2 is circa July 2013). Our queries are lazy, and we removed all join-fetches. We need the L2 cache (it is one of the main reasons we are using JPA).

I tried this:

setCacheTransactionIsolation(CONCURRENT_READ_WRITE)

which had no effect. Looking through the EclipseLink 2.4.2 source code, I don't see that this is even referenced anywhere? What behavior is it supposed to modify?

I wrote a standalone test which runs from the command line (outside of WebLogic) that simply performs a fetch of all 2000 rows in one table repeatedly in 20 threads. I then set breakpoints on all of the places in EclipseLink where it has been blocking on wait() calls. Usually within a few minutes the breakpoints are hit, indicating that there is thread contention when obtaining locks. My test does eventually finish but takes a long time, suggesting that the threads aren't completely hung, but are only significantly delayed for some reason.
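A standalone test of the kind described above can be sketched roughly as follows. This is only an illustration, not the original test: the class name ContentionHarness and the fetchAll task are hypothetical, and fetchAll stands in for whatever query is being exercised (for example a call that ends in getResultList()). The harness records the slowest single call, which makes long lock waits visible without setting breakpoints.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Runs the same task from many threads and records the slowest single call. */
final class ContentionHarness {

    /**
     * @param threads    number of concurrent workers (20 in the test described above)
     * @param iterations number of calls made by each worker
     * @param fetchAll   hypothetical stand-in for the real "fetch all rows" query
     * @return the longest single call observed, in milliseconds
     */
    static long worstCallMillis(int threads, int iterations, Runnable fetchAll)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<Long>> results = new ArrayList<>();
        try {
            for (int t = 0; t < threads; t++) {
                Callable<Long> worker = () -> {
                    start.await(); // release all workers at the same moment
                    long worst = 0;
                    for (int i = 0; i < iterations; i++) {
                        long begin = System.nanoTime();
                        fetchAll.run();
                        worst = Math.max(worst, System.nanoTime() - begin);
                    }
                    return worst;
                };
                results.add(pool.submit(worker));
            }
            start.countDown();
            long worstNanos = 0;
            for (Future<Long> result : results) {
                worstNanos = Math.max(worstNanos, result.get());
            }
            return TimeUnit.NANOSECONDS.toMillis(worstNanos);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

If the worst call is orders of magnitude slower than the same query run from a single thread, the delay is coming from contention rather than from the database.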

We have noticed that the deadlocks/delays often occur when one thread is doing a "get all", for example fetching all objects from a table. This is usually <= 2000 objects, and so should be quick, but I guess building an object for each of those results and then writing it to the cache causes a lot of contention.

Frankly we are completely stumped at why we are having such problems with locks with such a small system and are at a loss for how to debug it. We are taking steps to decrease the amount of database traffic our system produces, getting rid of "get all" and "select *" queries when possible, but believe that there shouldn't be enough database traffic to so easily lock up EclipseLink.

I have gone through the EclipseLink source code looking at the areas where it is hanging, and read the comments and referenced bug reports, but don't have any more insight into what could be happening. If anyone has any suggestions for debugging or troubleshooting this, or EclipseLink parameters that we can set that might either improve the behavior or add more diagnostics, I would be extremely grateful.

java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.eclipse.persistence.internal.helper.ConcurrencyManager.acquireDeferredLock(ConcurrencyManager.java:198)
org.eclipse.persistence.internal.identitymaps.CacheKey.acquireDeferredLock(CacheKey.java:184)
org.eclipse.persistence.internal.identitymaps.AbstractIdentityMap.acquireDeferredLock(AbstractIdentityMap.java:98)
org.eclipse.persistence.internal.identitymaps.IdentityMapManager.acquireDeferredLock(IdentityMapManager.java:119)
org.eclipse.persistence.internal.sessions.IdentityMapAccessor.acquireDeferredLock(IdentityMapAccessor.java:75)
org.eclipse.persistence.internal.sessions.AbstractSession.retrieveCacheKey(AbstractSession.java:4810)
org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:782)
org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildWorkingCopyCloneNormally(ObjectBuilder.java:723)
org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObjectInUnitOfWork(ObjectBuilder.java:676)
org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:609)
org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:564)
org.eclipse.persistence.queries.ObjectLevelReadQuery.buildObject(ObjectLevelReadQuery.java:777)
org.eclipse.persistence.queries.ReadAllQuery.registerResultInUnitOfWork(ReadAllQuery.java:797)
org.eclipse.persistence.queries.ReadAllQuery.executeObjectLevelReadQuery(ReadAllQuery.java:434)
org.eclipse.persistence.queries.ObjectLevelReadQuery.executeDatabaseQuery(ObjectLevelReadQuery.java:1150)
org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:852)
org.eclipse.persistence.queries.ObjectLevelReadQuery.execute(ObjectLevelReadQuery.java:1109)
org.eclipse.persistence.queries.ReadAllQuery.execute(ReadAllQuery.java:393)
org.eclipse.persistence.queries.ObjectLevelReadQuery.executeInUnitOfWork(ObjectLevelReadQuery.java:1197)
org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.internalExecuteQuery(UnitOfWorkImpl.java:2879)
org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1607)
org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1589)
org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1554)
org.eclipse.persistence.internal.jpa.QueryImpl.executeReadQuery(QueryImpl.java:231)
org.eclipse.persistence.internal.jpa.QueryImpl.getResultList(QueryImpl.java:411)
(service call to get all objects from one table)

Re: Hanging and long delays obtaining read locks [message #1110930 is a reply to message #1108983] Tue, 17 September 2013 14:18
James Sutherland
Messages: 1939
Registered: July 2009
Location: Ottawa, Canada
Senior Member

For it to be using acquireDeferredLock instead of acquireLock, the class must have a non-LAZY relationship, or the query is using a fetch-join, so double check this.

We have not seen any deadlocks or significant waiting in any of our concurrency tests, which run with more than 20 threads, so what you are seeing is odd. For a deadlock to occur, there must be a cycle of locks, so a single thread dump is not enough to diagnose an issue.
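To look for a cycle, the stacks of all threads need to be captured at the same moment. Below is a minimal sketch using the standard java.lang.management API; the class name ThreadDumps is made up for illustration. One caveat: findDeadlockedThreads() only reports cycles on object monitors and java.util.concurrent locks. The stacks in this thread are parked in Object.wait() inside ConcurrencyManager, so the JVM would not flag them as deadlocked, and the full dump has to be read by hand.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadDumps {

    /** Returns a text dump of every live thread with its complete stack. */
    static String dumpAll() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            // Print frames by hand: ThreadInfo.toString() truncates long stacks.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    /** Number of threads the JVM itself reports as deadlocked (0 if none). */
    static int jvmDetectedDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }
}
```

Taking two or three dumps a few seconds apart helps separate threads that are truly stuck from threads that are merely slow.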

If you can recreate something with a simplified test, your best option may be to log a bug.

Also, first try the latest 2.5.x release.



Re: Hanging and long delays obtaining read locks [message #1110974 is a reply to message #1110930] Tue, 17 September 2013 15:27
Randy Tidd
Messages: 8
Registered: August 2013
Junior Member
Thank you for your reply. Incidentally, I just found this message from the eclipselink users mailing list (Jan 2013) which looks very similar to our issue, though unfortunately there were no replies to the message. This was with EclipseLink 2.3.2 (we are on 2.4.2).

http://dev.eclipse.org/mhonarc/lists/eclipselink-users/msg07690.html

I actually spent some time trying to figure out why it was trying to obtain deferred locks. ClassDescriptor.postInitialize() is setting setShouldAcquireCascadedLocks(true) for the class descriptor used in the query based on this code:

            if (!shouldAcquireCascadedLocks()) {
                if (mapping.isForeignReferenceMapping()){
                    if (!((ForeignReferenceMapping)mapping).usesIndirection()){
                        setShouldAcquireCascadedLocks(true);
                    }
                    hasRelationships = true;
                }

I am not sure I totally understand this but it seems that if my entity is the destination of a ManyToOne relationship, this flag is set to true, and this flag is part of this line in ObjectLevelReadQuery:

setRequiresDeferredLocks(DeferredLockManager.SHOULD_USE_DEFERRED_LOCKS && (hasJoining() || (this.descriptor.shouldAcquireCascadedLocks())));

If there were a way for us to avoid deferred locks and get around that deadlock, I would be eager to hear more about it. We are not intentionally using non-lazy or join fetches, though I suspect that EclipseLink is using them behind the scenes for some reason.

All of our JPA classes have an embedded type; all of our database tables have audit columns to track creation/update times and this is implemented via @Embedded. Could this have anything to do with it?

We make use of inheritance, though we see deadlocks on fetches for entities both with and without inheritance.

I suspect that there must be something unusual about our data model (inheritance, relationships, etc) that is tripping a concurrency bug that is not the normal case, however we must find an answer for this.

However, note that we also have deadlocks in the non-deferred case, i.e. ConcurrencyManager.acquire(). Below is one of the stacks that leads to that call. The source of this stack is a JPQL query like "select d from Deal d", which in this case is fetching only 75 objects.

I would really appreciate any tips or ideas that you (or anyone else) might have to help diagnose this issue. It is very widespread and is a show-stopper for putting our system in production because it can't run for more than a few hours without hanging. Thanks very much in advance.

Randy

java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
weblogic.work.ExecuteThread.waitForRequest(ExecuteThread.java:205)
weblogic.work.ExecuteThread.run(ExecuteThread.java:226)
"[STUCK] ExecuteThread: '49' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock org.eclipse.persistence.internal.helper.ConcurrencyManager@26e574b2 WAITING
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.eclipse.persistence.internal.helper.ConcurrencyManager.acquire(ConcurrencyManager.java:94)
org.eclipse.persistence.internal.identitymaps.CacheKey.acquire(CacheKey.java:133)
org.eclipse.persistence.internal.identitymaps.AbstractIdentityMap.acquireLock(AbstractIdentityMap.java:122)
org.eclipse.persistence.internal.identitymaps.IdentityMapManager.acquireLock(IdentityMapManager.java:150)
org.eclipse.persistence.internal.sessions.IdentityMapAccessor.acquireLock(IdentityMapAccessor.java:93)
org.eclipse.persistence.internal.sessions.IdentityMapAccessor.acquireLock(IdentityMapAccessor.java:84)
org.eclipse.persistence.internal.sessions.AbstractSession.retrieveCacheKey(AbstractSession.java:4834)
org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:782)
org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildWorkingCopyCloneNormally(ObjectBuilder.java:723)
org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObjectInUnitOfWork(ObjectBuilder.java:676)
org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:609)
org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:564)
org.eclipse.persistence.queries.ObjectLevelReadQuery.buildObject(ObjectLevelReadQuery.java:777)
org.eclipse.persistence.queries.ReadAllQuery.registerResultInUnitOfWork(ReadAllQuery.java:797)
org.eclipse.persistence.queries.ReadAllQuery.executeObjectLevelReadQuery(ReadAllQuery.java:434)
org.eclipse.persistence.queries.ObjectLevelReadQuery.executeDatabaseQuery(ObjectLevelReadQuery.java:1150)
org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:852)
org.eclipse.persistence.queries.ObjectLevelReadQuery.execute(ObjectLevelReadQuery.java:1109)
org.eclipse.persistence.queries.ReadAllQuery.execute(ReadAllQuery.java:393)
org.eclipse.persistence.queries.ObjectLevelReadQuery.executeInUnitOfWork(ObjectLevelReadQuery.java:1197)
org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.internalExecuteQuery(UnitOfWorkImpl.java:2879)
org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1607)
org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1589)
org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1554)
org.eclipse.persistence.internal.jpa.QueryImpl.executeReadQuery(QueryImpl.java:231)
org.eclipse.persistence.internal.jpa.QueryImpl.getResultList(QueryImpl.java:411)
Re: Hanging and long delays obtaining read locks [message #1111595 is a reply to message #1110974] Wed, 18 September 2013 12:37
James Sutherland
Messages: 1939
Registered: July 2009
Location: Ottawa, Canada
Senior Member

Ensure you are correctly using weaving. LAZY is not supported unless you are using weaving.

Re: Hanging and long delays obtaining read locks [message #1111670 is a reply to message #1111595] Wed, 18 September 2013 14:56
Randy Tidd
Messages: 8
Registered: August 2013
Junior Member
Thanks again James for your reply. We are using "internal" weaving:

<property name="eclipselink.weaving.internal" value="true"/>

To be honest we haven't given any consideration to weaving beyond this setting. Do you think that configured weaving, or static weaving, may have an impact on this cache behavior?

This is a J2EE setup with WebLogic 10.3.6 and EJB 3.0 and I have read that weaving is supported automatically.

My assumption about dynamic vs. static weaving is that static weaving could save some time at startup but wouldn't otherwise influence the performance of the system. We have about 120 JPA classes in the system, so more than a few but I wouldn't expect this to cause major problems. If I have that wrong, please let me know.

If there is any documentation about weaving that gives a good view of the pros and cons of different approaches, that'd be really helpful. I found this guide but it does not have a lot of detail:

http://www.eclipse.org/eclipselink/documentation/2.4/solutions/testingjpa004.htm
Re: Hanging and long delays obtaining read locks [message #1116669 is a reply to message #1111670] Wed, 25 September 2013 18:26
James Sutherland
Messages: 1939
Registered: July 2009
Location: Ottawa, Canada
Senior Member

If you are using a managed JPA context in JavaEE (such as one injected into an EJB) then weaving should be enabled by default.
If you are using application-managed persistence units, or Spring, then it may be possible that weaving is not enabled. It should be obvious if weaving is not enabled, as you should see all of your 1-1 relationships eager loading.

Also ensure you are using LAZY fetching in all of your 1-1 relationships.


Re: Hanging and long delays obtaining read locks [message #1129689 is a reply to message #1116669] Tue, 08 October 2013 22:26
Marvin Toll
Messages: 30
Registered: July 2009
Member
Can someone help me understand if the query interaction with the ConcurrencyManager takes place in the context of accessing L1 (Persistence Context) cache or L2 (shared) cache - OR BOTH?


Marvin Toll
CTO, Pattern Enabled Development
http://PatternEnabled.com
Re: Hanging and long delays obtaining read locks [message #1129819 is a reply to message #1129689] Wed, 09 October 2013 01:42
Michael Nielson
Messages: 15
Registered: April 2011
Junior Member
Your stack is very similar to this bug I've logged:

https://bugs.eclipse.org/bugs/show_bug.cgi?id=413775

The gist of the problem is this:

(Thread 1) running the MergeManager has acquired the ConcurrencyManager and then as part of the merge triggers evaluation of a lazy Value Holder.

(Thread 2) holds the lock on a Value Holder and is attempting to evaluate this lazy Value Holder but is blocked attempting to acquire the ConcurrencyManager.

In my scenario the thread stack you posted would come from Thread 2.

I have both threads posted here as an attachment to the bug:

https://bugs.eclipse.org/bugs/attachment.cgi?id=233811
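
The two-thread pattern described above is a classic lock-order cycle. The sketch below reproduces only its shape, under stated assumptions: the two ReentrantLock objects are stand-ins for the ConcurrencyManager and the value holder lock (EclipseLink's own locking is wait()-based, as the stacks in this thread show), the class and method names are invented, and tryLock with a timeout is used so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class LockCycleDemo {

    /**
     * Two threads take the same two locks in opposite order.
     * Returns true if both timed out on their second lock, i.e. the cycle formed.
     */
    static boolean cycleForms(long timeoutMillis) throws InterruptedException {
        ReentrantLock cacheLock = new ReentrantLock();       // stand-in for the ConcurrencyManager
        ReentrantLock valueHolderLock = new ReentrantLock(); // stand-in for the lazy value holder
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] timedOut = new boolean[2];

        // "Thread 1": holds the cache lock, then needs the value holder.
        Thread merge = new Thread(() -> timedOut[0] =
                holdThenTry(cacheLock, valueHolderLock, bothHoldFirst, bothTried, timeoutMillis));
        // "Thread 2": holds the value holder, then needs the cache lock.
        Thread read = new Thread(() -> timedOut[1] =
                holdThenTry(valueHolderLock, cacheLock, bothHoldFirst, bothTried, timeoutMillis));
        merge.start();
        read.start();
        merge.join();
        read.join();
        return timedOut[0] && timedOut[1];
    }

    private static boolean holdThenTry(ReentrantLock first, ReentrantLock second,
                                       CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                                       long timeoutMillis) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await(); // wait until the other thread holds its first lock
            boolean acquired = second.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await();     // keep holding 'first' until the other thread has tried
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }
}
```

With untimed acquisition in place of tryLock, neither thread would ever proceed, which is why each stack on its own looks like an ordinary wait and only the pair reveals the cycle.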
Re: Hanging and long delays obtaining read locks [message #1129820 is a reply to message #1129689] Wed, 09 October 2013 01:46
Michael Nielson
Messages: 15
Registered: April 2011
Junior Member
If I'm understanding correctly, and the L1 cache is the UnitOfWork cache and the L2 cache is the IdentityMap, I believe that the ConcurrencyManager is only acquired for the L2 cache. (The UnitOfWork cache can only be accessed by one thread.)
Re: Hanging and long delays obtaining read locks [message #1130266 is a reply to message #1129820] Wed, 09 October 2013 11:28
Marvin Toll
Messages: 30
Registered: July 2009
Member
There are more terms for L1 and L2 cache than there are days of the week! http://www.eclipse.org/eclipselink/documentation/2.4/concepts/cache001.htm

[The link is a wonderful example of confusion... the article uses one set of terms and the image a different set!]

If we use the terms:

L1 = Isolated Persistence Context Cache - Managed by the EntityManager
L2 = Shared Persistence Unit Cache - Managed by an EntityManagerFactory


I'm trying to discern whether varying the EntityManager life-cycle implementation has a potential impact on this read lock issue, particularly in a multicore (or multi-threading or parallel processing) context using the ForkJoinPool, where concurrent WORKER THREADS (not the main thread) are accessing the Isolated Persistence Context Cache.

Having been enlightened by Michael's stack traces... the revised question is:

Does query interaction with the MergeManager take place in the context of accessing L1 (Isolated Persistence Context) cache or L2 (Shared Persistence Unit) cache - OR BOTH?



Marvin Toll
CTO, Pattern Enabled Development
http://PatternEnabled.com
Re: Hanging and long delays obtaining read locks [message #1130348 is a reply to message #1130266] Wed, 09 October 2013 13:14
Chris Delahunt
Messages: 1023
Registered: July 2009
Senior Member
For any deadlock, the full thread dump needs to be taken into account as well as the objects involved. Most stacks of a locked thread will look very similar to one another if you look at past issues, even though they may have very different causes. Marvin, if you are hitting a deadlock in an isolated cache environment, you will have to check that your EntityManagers are not being used concurrently in different threads. EMs are not thread safe, and concurrent use will cause issues, among them the potential for locks.

The mergeManager is used when merging. So in the context of locking, it would only need locks on objects that could be shared, so the L2 Cache. At the end of a transaction, the last step is to merge changes into the L2/shared cache so that other threads/users can see them without requiring a database hit. In an isolated persistence context there is no shared cache, and so this merge should not occur.

As for the documentation - it has picked up different names for similar concepts over the years as the standards and even the product name have changed. Legacy or native EclipseLink uses a 3 tier pattern with a ServerSession-ClientSession-UnitOfWork. Within JPA, the EMF replaces the ServerSession, while the EM stands in for the ClientSession and UnitOfWork. The EM/ClientSession/UnitOfWork is the L1/isolated cache while the ServerSession/EMF is the shared/L2 cache.

Each level has both an identity map and a cache depending on settings. For instance, a full identity map keeps everything with hard references, while a soft-cache weak identity map (the default), as the name suggests, has a fixed-size cache that uses soft references but still references everything for identity purposes with weak references.
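
The soft-cache weak identity map idea can be sketched as below. This is a simplified illustration of the description above, not EclipseLink's implementation: the class SoftCacheWeakMap and its methods are invented for this example, and details such as eviction order and clean-up of cleared references are assumptions or omitted.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Weak references for identity, plus a fixed-size window of soft references. */
final class SoftCacheWeakMap<K, V> {
    // Every entry is tracked weakly, so identity holds while anyone still uses the object.
    private final Map<K, WeakReference<V>> identity = new ConcurrentHashMap<>();
    // The most recently added objects are also held softly, so they survive
    // until the JVM is short of memory even if no one else references them.
    private final Deque<SoftReference<V>> recent = new ArrayDeque<>();
    private final int cacheSize;

    SoftCacheWeakMap(int cacheSize) {
        this.cacheSize = cacheSize;
    }

    synchronized void put(K key, V value) {
        identity.put(key, new WeakReference<>(value));
        recent.addLast(new SoftReference<>(value));
        if (recent.size() > cacheSize) {
            recent.removeFirst(); // oldest entry falls back to weak-only tracking
        }
    }

    /** Returns the cached instance, or null if it was never added or has been collected. */
    V get(K key) {
        WeakReference<V> ref = identity.get(key);
        return ref == null ? null : ref.get();
    }

    synchronized int softCount() {
        return recent.size();
    }
}
```

A full identity map in the same style would simply hold strong references, so nothing is ever collected.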

Hope that helps with the terminology used within the documentation.

Best Regards,
Chris
Re: Hanging and long delays obtaining read locks [message #1130671 is a reply to message #1130348] Wed, 09 October 2013 20:34
Marvin Toll
Messages: 30
Registered: July 2009
Member
Do you think that EclipseLink is using older non-atomic collections and therefore requires a ConcurrencyManager? I'm particularly curious about the Identity Map, which I have not been able to find in the source... is it a ConcurrentHashMap<K, V>?

Said another way, is the root cause of the read-related locking attributable to the fact that the aging TopLink code base has not been updated with Doug Lea's java.util.concurrent.* fields?



Marvin Toll
CTO, Pattern Enabled Development
http://PatternEnabled.com
Re: Hanging and long delays obtaining read locks [message #1131363 is a reply to message #1130671] Thu, 10 October 2013 13:02
Chris Delahunt
Messages: 1023
Registered: July 2009
Senior Member
Probably not. Locking is needed, and while the mechanisms for locking might have been expanded, the problems locking causes haven't really changed. EclipseLink requires a complex mechanism to allow some threads to modify parts of the object graph while other threads attempt to read and refresh that same object graph, all concurrently. Depending on where these threads start their operations, there is always going to be the potential for conflicts. Are you running into an issue, or is this a code investigation?

[Updated on: Mon, 14 July 2014 12:57]

Re: Hanging and long delays obtaining read locks [message #1131457 is a reply to message #1130266] Thu, 10 October 2013 14:44
Michael Nielson
Messages: 15
Registered: April 2011
Junior Member
The IdentityMap has quite a few responsibilities. The core collection where all Entities are stored is a ConcurrentHashMap.

The MergeManager's locking is used to apply the changes from a completed transaction (The changes made in the UnitOfWork/L1) to the L2 cache. Locking is required so the changes from two separate transactions are applied correctly to the master L2 cache copy.

Chris's explanation is much better.

From the stack trace you gave it looks like the query attempting to acquire the MergeManager is not originating from a ValueHolder. A more complete thread dump would help a lot. Chris is right that the stack you gave does not demonstrate the same bug (although it could be a symptom or just present in another thread).

Multi-threaded access to an EntityManager is very dangerous, so definitely make sure that your app is not doing that.
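
One way to rule out shared access is to give every unit of work its own context and close it when the work is done. The helper below is a hedged sketch: PerTaskContext and withOwnContext are hypothetical names, and the context type is left generic so the example compiles without a JPA dependency.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

final class PerTaskContext {

    /**
     * Opens a fresh context for one task, runs the task, and always closes the context.
     * No context instance is ever visible to more than one thread.
     *
     * @param open  creates a new context
     * @param close releases the context
     * @param work  the task to run against the context
     */
    static <C, R> R withOwnContext(Supplier<C> open, Consumer<C> close, Function<C, R> work) {
        C context = open.get();
        try {
            return work.apply(context);
        } finally {
            close.accept(context);
        }
    }
}
```

With JPA on the classpath, a worker task would pass emf::createEntityManager as open and EntityManager::close as close, where emf is the application's shared EntityManagerFactory. The factory is safe to share between threads; the EntityManager is not.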



Re: Hanging and long delays obtaining read locks [message #1170140 is a reply to message #1131457] Mon, 04 November 2013 12:39
Marvin Toll
Messages: 30
Registered: July 2009
Member
Let's take another tack... instead of focusing on what we should not do... how do we engage all of our cores when using EclipseLink? That is, what is the recommended guidance to get the fantastic performance gains available when using concurrent (or multicore) processing?

http://patternenabled.com/soaj/performance/



Marvin Toll
CTO, Pattern Enabled Development
http://PatternEnabled.com
Re: Hanging and long delays obtaining read locks [message #1397392 is a reply to message #1108983] Wed, 09 July 2014 16:56
John Bedell
Messages: 2
Registered: July 2014
Junior Member
Randy, did you ever get resolution to this issue? We are running into the same problem.