| Deadlock in cache under load [message #491205] |
Tue, 13 October 2009 11:59  |
Adam Esterline Messages: 2 Registered: July 2009 |
Junior Member |
We are seeing deadlocks occur under load with EclipseLink 1.1.1 and 1.1.2. We started seeing this issue when we moved from the JBoss transaction manager to a Spring-based transaction manager. We took the Spring transaction manager for TopLink and changed the packages so it works with EclipseLink. We are seeing this issue on Jetty 6.1.20 as well as on our previous version of JBoss (4.0.5) using the Spring-based transaction manager. The deadlock is occurring in the cache on reads.
I tried to attach the thread dumps (gzipped), but attachments aren't working on the forum. You can download the gzipped text file here: http://jefmsmit.googlepages.com/server-out.txt.gz
Thanks,
Adam
[Updated on: Tue, 13 October 2009 12:02]
| Re: Deadlock in cache under load [message #492221 is a reply to message #491205] |
Mon, 19 October 2009 10:05   |
James Sutherland Messages: 1844 Registered: July 2009 |
Senior Member |
Deadlock issues are normally involved, so you may be best off contacting Oracle technical support on the issue if you have a support contract.
Looking at your thread dump, there seems to be a very large number of threads involved, so it is difficult to find the issue. If you could recreate the issue with fewer threads that would be helpful. I could not see any nested locks that would be required to cause a deadlock, but there were some deferred locks that may be related to the issue.
Do you only get the issue with Spring, and not with normal JEE? This would seem to indicate a Spring-related issue. It could possibly be the beginEarlyTransaction call that the Spring integration may be making.
What are you doing to cause the issue? Are you using fetch-joining? Do you have non-lazy relationships?
Some things you can try:
- disabling the cache (shared=false) should be a workaround
- disabling deferred locks may resolve the issue (in code, set DeferredLockManager.SHOULD_USE_DEFERRED_LOCKS=false)
- ensure you are always using lazy relationships
- try the latest patch release or build, or 1.2
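For the first workaround, here is a minimal sketch of turning the shared cache off through a property override. It assumes the persistence-unit property "eclipselink.cache.shared.default" (the same one quoted later in this thread); the class name SharedCacheOff is just for illustration, and the map is only built here, not passed to a real persistence unit.

```java
import java.util.HashMap;
import java.util.Map;

class SharedCacheOff {
    // Builds the property overrides for the "disable the cache" workaround.
    static Map<String, String> overrides() {
        Map<String, String> props = new HashMap<>();
        // Turns off the shared (second-level) cache by default for all entities.
        props.put("eclipselink.cache.shared.default", "false");
        return props;
    }

    public static void main(String[] args) {
        // In a real application this map would be handed to
        // javax.persistence.Persistence.createEntityManagerFactory(unitName, props);
        // that call is left out so the sketch runs without a JPA provider.
        System.out.println(overrides());
    }
}
```

The same key/value pair could instead go into persistence.xml as a property; which is more convenient depends on how the unit is bootstrapped.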
James : Wiki : Book : Blog
| Re: Deadlock in cache under load [message #504067 is a reply to message #496677] |
Wed, 16 December 2009 20:04   |
John Mikula Messages: 8 Registered: December 2009 |
Junior Member |
I'm having the same problem.
I'm running an application on Sun GlassFish Enterprise Server v2.1 (9.1.1) (build b60e-fcs) with EclipseLink 1.1.2 as my JPA implementation. Our application has been getting a lot more load lately, and I've frequently encountered a situation in which all of my HTTP worker threads are locked waiting at
org.eclipse.persistence.internal.helper.ConcurrencyManager.acquire(ConcurrencyManager.java:89)
Except for one, which is stuck in a loop at
org.eclipse.persistence.internal.helper.ConcurrencyManager.releaseDeferredLock(ConcurrencyManager.java:454)
I can provide a full thread dump of the locked state.
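When a dump has a very large number of threads, one way to make it readable is to count threads by the frame at the top of each stack, which makes a pattern like "everyone is parked in acquire() except one" stand out. This is only a rough sketch using the JDK's Thread.getAllStackTraces(); the class name TopFrameHistogram is made up for illustration, and it has to run inside the JVM being inspected (for example from a debug servlet, which is an assumption about your setup).

```java
import java.util.Map;
import java.util.TreeMap;

class TopFrameHistogram {
    // Counts how many threads currently have each frame at the top of their stack.
    static Map<String, Integer> count(Map<Thread, StackTraceElement[]> traces) {
        Map<String, Integer> counts = new TreeMap<>();
        for (StackTraceElement[] stack : traces.values()) {
            // Threads that have not started or have finished report no frames.
            String top = stack.length == 0 ? "(no frames)" : stack[0].toString();
            counts.merge(top, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Snapshot of every live thread in this JVM, summarized by top frame.
        count(Thread.getAllStackTraces())
            .forEach((frame, n) -> System.out.println(n + "  " + frame));
    }
}
```

The top frame alone will often be a generic Object.wait or Thread.sleep, so in practice you may want to key on the first frame from your own or EclipseLink's packages instead.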
I've followed some of the advice from this thread:
http://forums.oracle.com/forums/thread.jspa?threadID=851676
Particularly setting
DeferredLockManager.SHOULD_USE_DEFERRED_LOCKS = false;
didn't seem to help at all. And setting "eclipselink.cache.shared.default"="false" causes several of my unit tests to fail. From what I understand setting that option isolates the read cache from the write cache, which seems to be having adverse side effects on my application.
I am not using join fetching, but I am using QueryHint.REFRESH_CASCADE. Do these have similar behavior?
Moreover, how do I resolve my problem? I've looked at the code for ConcurrencyManager in the more recent releases and I see no changes in this part of the code, so if it's a known bug, it doesn't appear to have been addressed.
In my case, it's not a true deadlock, but an infinite loop. Here is a code snippet from ConcurrencyManager.releaseDeferredLock():
    // Thread have three stages, one where they are doing work (i.e. building objects)
    // two where they are done their own work but may be waiting on other threads to finish their work,
    // and a third when they and all the threads they are waiting on are done.
    // This is essentially a busy wait to determine if all the other threads are done.
    while (true) {
        // 2612538 - the default size of Map (32) is appropriate
        Map recursiveSet = new IdentityHashMap();
        if (isBuildObjectOnThreadComplete(currentThread, recursiveSet)) {// Thread job done.
            lockManager.releaseActiveLocksOnThread();
            removeDeferredLockManager(currentThread);
            AbstractSessionLog.getLog().log(SessionLog.FINER, "deferred_locks_released", currentThread.getName());
            return;
        } else {// Not done yet, wait and check again.
            try {
                Thread.sleep(1);
            } catch (InterruptedException ignoreAndContinue) {
            }
        }
    }
For some reason isBuildObjectOnThreadComplete(currentThread, recursiveSet) always returns false, so that thread stays in that loop, and the concurrency manager never issues a notify() to release the waiting threads.
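To show why a polling loop like this hangs when its check never succeeds, here is a small standalone sketch of the same sleep-and-recheck pattern with a deadline added. This is not EclipseLink code and not a proposed fix; BoundedPoll and await are hypothetical names, and the always-false lambda just stands in for a completion check that never returns true.

```java
import java.util.function.BooleanSupplier;

class BoundedPoll {
    // Polls `done` about once per millisecond, but gives up after timeoutMillis
    // instead of looping forever. Returns true only if `done` held in time.
    static boolean await(BooleanSupplier done, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!done.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // check never succeeded: caller can log or dump threads
            }
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A check that never succeeds; without the deadline this would spin forever.
        System.out.println(await(() -> false, 50));
    }
}
```

Without the deadline, the polling thread never reaches the code that would release its locks, which matches the symptom above where every other thread stays parked waiting for them.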
Cheers
| Re: Deadlock in cache under load [message #504180 is a reply to message #491205] |
Thu, 17 December 2009 10:21   |
James Sutherland Messages: 1844 Registered: July 2009 |
Senior Member |
Try the latest release; an issue was fixed there that may be related.
Refreshing should not cause an issue.
Ensure you are using LAZY on all relationships.
There was a bug in the DeferredLockManager.SHOULD_USE_DEFERRED_LOCKS = false backdoor, so avoid that (unless on latest release, I think it was fixed).
There are some debug methods on IdentityMapAccessor that may be of some use.
Include the thread dumps for the blocked threads.
If you have a support contract, contact technical support.
James : Wiki : Book : Blog
| Re: Deadlock in cache under load [message #504430 is a reply to message #504247] |
Fri, 18 December 2009 12:23  |
John Mikula Messages: 8 Registered: December 2009 |
Junior Member |
My team spent some effort reproducing the problem in a test environment. We found that if we rapidly and repeatedly made API calls against our application (which in turn made API calls to EclipseLink JPA) we could bring GlassFish down within 10-20 minutes.
I was then able to experiment with a variety of options to help pinpoint the problem, including deploying historical versions of my application from our SCM.
As it turns out, the problem only manifested itself after I added QueryHint.REFRESH to all of my queries. I suspect there is a bug in EclipseLink that causes this, but for my part, the reason I put all the refresh hints in the code is that I am running on multiple servers and haven't gotten cache coordination working properly yet.
Hope this helps anyone who comes after.
Cheers