Skip to main content


Eclipse Community Forums
Forum Search:

Search      Help    Register    Login    Home
Home » Eclipse Projects » EclipseLink » Eclipselink - dead locks in aqcuire and release deferred locks - Eclipselink 2.4.2 and above
Eclipselink - dead locks in aqcuire and release deferred locks - Eclipselink 2.4.2 and above [message #1801870] Mon, 28 January 2019 19:02 Go to next message
Nuno Godinho de Matos is currently offline Nuno Godinho de MatosFriend
Messages: 34
Registered: September 2012
Member
Hi,

In an ongoing project running on Weblogic 12.1.2, which bundles eclipselink 2.4.2, every two or three weeks, the hole system crashes beacuse of what appears to be dead locks in the ConcurrencyManager / CacheKeys of eclipselink.


Pretty much, the system goes into a situation that everything dies Singleton EJBs are forever, deltaspike locks are forever locked, etc... because ultimately threads at the eclipselink layer are forever stopped either doing:
- Aqcuire Lock
- Release Deferred Lock (and it is interesteding that a method called release deferred lock can actually cause thread to be waiting forever).

In the last thread dump I have looked at, we had the interesting scenario.
11 THREADS were STUCK with stack traces similar to the one bellow.
NOTE: - This Eclipselink 2.4.2 - The lines of code in the concurrency manager of 2.6.4 would be elsewhere.


------------------------ ACQUIRE LOCK STUCH THREAD START SNIPPET ------------------------------
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at org.eclipse.persistence.internal.helper.ConcurrencyManager.acquire(ConcurrencyManager.java:94)
- locked <0x00000007119e05e8> (a org.eclipse.persistence.internal.helper.ConcurrencyManager)
at org.eclipse.persistence.internal.identitymaps.CacheKey.acquire(CacheKey.java:133)
at org.eclipse.persistence.internal.identitymaps.AbstractIdentityMap.acquireLock(AbstractIdentityMap.java:122)
at org.eclipse.persistence.internal.identitymaps.IdentityMapManager.acquireLock(IdentityMapManager.java:150)
at org.eclipse.persistence.internal.sessions.IdentityMapAccessor.acquireLock(IdentityMapAccessor.java:93)
at org.eclipse.persistence.internal.sessions.IdentityMapAccessor.acquireLock(IdentityMapAccessor.java:84)
at org.eclipse.persistence.internal.sessions.AbstractSession.retrieveCacheKey(AbstractSession.java:4834)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:782)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:611)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:564)
at org.eclipse.persistence.queries.ObjectLevelReadQuery.buildObject(ObjectLevelReadQuery.java:777)
at org.eclipse.persistence.queries.ReadObjectQuery.executeObjectLevelReadQuery(ReadObjectQuery.java:462)
at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeDatabaseQuery(ObjectLevelReadQuery.java:1150)
at org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:852)
at org.eclipse.persistence.queries.ObjectLevelReadQuery.execute(ObjectLevelReadQuery.java:1109)
at org.eclipse.persistence.queries.ReadObjectQuery.execute(ReadObjectQuery.java:421)
at org.eclipse.persistence.internal.sessions.AbstractSession.internalExecuteQuery(AbstractSession.java:2977)
at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1607)
at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1589)
at org.eclipse.persistence.internal.indirection.NoIndirectionPolicy.valueFromQuery(NoIndirectionPolicy.java:326)
at org.eclipse.persistence.mappings.ForeignReferenceMapping.valueFromRowInternal(ForeignReferenceMapping.java:2162)
at org.eclipse.persistence.mappings.OneToOneMapping.valueFromRowInternal(OneToOneMapping.java:1717)
at org.eclipse.persistence.mappings.ForeignReferenceMapping.valueFromRow(ForeignReferenceMapping.java:2051)
at org.eclipse.persistence.mappings.ForeignReferenceMapping.readFromRowIntoObject(ForeignReferenceMapping.java:1386)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildAttributesIntoObject(ObjectBuilder.java:448)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.refreshObjectIfRequired(ObjectBuilder.java:3706)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:842)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:611)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:564)
at org.eclipse.persistence.queries.ObjectLevelReadQuery.buildObject(ObjectLevelReadQuery.java:777)
at org.eclipse.persistence.queries.ReadObjectQuery.executeObjectLevelReadQuery(ReadObjectQuery.java:462)
at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeDatabaseQuery(ObjectLevelReadQuery.java:1150)
at org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:852)
at org.eclipse.persistence.queries.ObjectLevelReadQuery.execute(ObjectLevelReadQuery.java:1109)
at org.eclipse.persistence.queries.ReadObjectQuery.execute(ReadObjectQuery.java:421)
at org.eclipse.persistence.internal.sessions.AbstractSession.internalExecuteQuery(AbstractSession.java:2977)
at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1607)
at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1589)
at org.eclipse.persistence.internal.indirection.NoIndirectionPolicy.valueFromQuery(NoIndirectionPolicy.java:326)
at org.eclipse.persistence.mappings.ForeignReferenceMapping.valueFromRowInternal(ForeignReferenceMapping.java:2162)
at org.eclipse.persistence.mappings.OneToOneMapping.valueFromRowInternal(OneToOneMapping.java:1717)
at org.eclipse.persistence.mappings.ForeignReferenceMapping.valueFromRow(ForeignReferenceMapping.java:2051)



------------------------ ACQUIRE LOCK STUCK THREAD END SNIPPET ------------------------------



In the next snippet I will put the code of eclipse link running this acquire:

---------------------------------------------
ECLIPSELINK CODE:
---------------------------------------
/**
* Wait for all threads except the active thread.
* If the active thread just increment the depth.
* This should be called before entering a critical section.
* called with true from the merge process, if true then the refresh will not refresh the object
*/
public synchronized void acquire(boolean forMerge) throws ConcurrencyException {
while (((this.activeThread != null) || (this.numberOfReaders > 0)) && (this.activeThread != Thread.currentThread())) {
// This must be in a while as multiple threads may be released, or another thread may rush the acquire after one is released.
try {
this.numberOfWritersWaiting++;
wait(); <-------------------------------------------------------------- THE THREAD IS HERE IN AN ETERNAL LOOP AND NOT GOING ANYWHERE (most likely because of a dead lock)
this.numberOfWritersWaiting--;
} catch (InterruptedException exception) {
throw ConcurrencyException.waitWasInterrupted(exception.getMessage());
}
}
if (this.activeThread == null) {
this.activeThread = Thread.currentThread();
if (shouldTrackStack){
this.stack = new Exception();
}
}
this.lockedByMergeManager = forMerge;
this.depth++;
}
----------------------------------------------------------------------------


Ok the code above has a couple of things that I do not really understand.
1 - We are doing a TIMELESS WAIT inside of a WHILE.
I get the point while the condition of the while keeps returning TRUE the thread cannot be allowed to go forward and believe it has acuired the cache key.
Fine.
But if the code is not DEAD LOCK free - which I am guessing it is not - it should at least be DEAD LOCK resistent.
In the sense that if after TOO LONG we are still inside of this WHILE.
It is time to interrupt the transition, roll it back.,
Releas any of the acquired LOCKs by the thread.
And let other threads go forward with their transactions.
IN this manner - if transaction Synchronization is GOOD - which it appears to be - but not bullet proof - which appears to be not - at least if threads are given as RULE max allowed time to acquire a lock, we can be certain that a dead lock does not run eternal.
After a while threads will start abroding and releasing their locks.

This does not happen.


Point 2.
The code above looks wrong to me.
At least I do not understand why the thread - if it is interrupted does not reset the writers count.

Look at this:

wait();
this.numberOfWritersWaiting--;
} catch (InterruptedException exception) {
throw ConcurrencyException.waitWasInterrupted(exception.getMessage());
}

Ok - So if my WAIT is interrupted:
this.numberOfWritersWaiting-- will never be called.

So the Cache Key will forever have a GHOST Writer.
Or am I missing something?


------------ END of ACQUIRE STACK TRACE COMMENTS ------------------


The other part of the story in this dead lock are the 47 threads that apparently managed to get STUCK forever doing a RELEASE DEFERRED LOCKS.

The stack traces of these threads look all very similar to this:

------------------------ BEGIN SNIPPET RELEASE DEFFERRED LOCK STACK TRACE ------------------------------------
[STUCK] ExecuteThread: '130' for queue: 'weblogic.kernel.Default (self-tuning)' - priority:2 - threadId:0x000000002b07e800 - nativeId:0x2380 - nativeId (decimal):9088 - state:TIMED_WAITING
stackTrace:
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.eclipse.persistence.internal.helper.ConcurrencyManager.releaseDeferredLock(ConcurrencyManager.java:466) CASE 1
at org.eclipse.persistence.internal.identitymaps.CacheKey.releaseDeferredLock(CacheKey.java:419)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:872)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildWorkingCopyCloneNormally(ObjectBuilder.java:723)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObjectInUnitOfWork(ObjectBuilder.java:676)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:609)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:564)
at org.eclipse.persistence.queries.ObjectLevelReadQuery.buildObject(ObjectLevelReadQuery.java:777)
at org.eclipse.persistence.queries.ReadObjectQuery.conformResult(ReadObjectQuery.java:355)
at org.eclipse.persistence.queries.ReadObjectQuery.registerResultInUnitOfWork(ReadObjectQuery.java:783)
at org.eclipse.persistence.queries.ReadObjectQuery.executeObjectLevelReadQuery(ReadObjectQuery.java:460)
at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeDatabaseQuery(ObjectLevelReadQuery.java:1150)
at org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:852)
at org.eclipse.persistence.queries.ObjectLevelReadQuery.execute(ObjectLevelReadQuery.java:1109)
at org.eclipse.persistence.queries.ReadObjectQuery.execute(ReadObjectQuery.java:421)
at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeInUnitOfWork(ObjectLevelReadQuery.java:1197)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.internalExecuteQuery(UnitOfWorkImpl.java:2879)
at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1607)
at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1589)
at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1540)
at org.eclipse.persistence.internal.jpa.EntityManagerImpl.executeQuery(EntityManagerImpl.java:838)
at org.eclipse.persistence.internal.jpa.EntityManagerImpl.findInternal(EntityManagerImpl.java:778)
at org.eclipse.persistence.internal.jpa.EntityManagerImpl.find(EntityManagerImpl.java:671)
at org.eclipse.persistence.internal.jpa.EntityManagerImpl.find(EntityManagerImpl.java:543)
at sun.reflect.GeneratedMethodAccessor249.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at weblogic.persistence.BasePersistenceContextProxyImpl.invoke(BasePersistenceContextProxyImpl.java:111)
at weblogic.persistence.TransactionalEntityManagerProxyImpl.invoke(TransactionalEntityManagerProxyImpl.java:82)
at weblogic.persistence.BasePersistenceContextProxyImpl.invoke(BasePersistenceContextProxyImpl.java:92)



------------------------ END SNIPPET RELEASE DEFFERRED LOCK STACK TRACE ------------------------------------


----------- ECLIPSELINK CODE RELATED TO RELEASE DEFFERRED LOCK ----------------
/**
* Release the deferred lock.
* This uses a deadlock detection and resolution algorithm to avoid cache deadlocks.
* The deferred lock manager keeps track of the lock for a thread, so that other
* thread know when a deadlock has occurred and can resolve it.
*/
public void releaseDeferredLock() throws ConcurrencyException {
Thread currentThread = Thread.currentThread();
DeferredLockManager lockManager = getDeferredLockManager(currentThread);
if (lockManager == null) {
return;
}
int depth = lockManager.getThreadDepth();

if (depth > 1) {
lockManager.decrementDepth();
return;
}

// If the set is null or empty, means there is no deferred lock for this thread, return.
if (!lockManager.hasDeferredLock()) {
lockManager.releaseActiveLocksOnThread();
removeDeferredLockManager(currentThread);
return;
}

lockManager.setIsThreadComplete(true);

// Thread have three stages, one where they are doing work (i.e. building objects)
// two where they are done their own work but may be waiting on other threads to finish their work,
// and a third when they and all the threads they are waiting on are done.
// This is essentially a busy wait to determine if all the other threads are done.
while (true) {
try{
// 2612538 - the default size of Map (32) is appropriate
Map recursiveSet = new IdentityHashMap();
if (isBuildObjectOnThreadComplete(currentThread, recursiveSet)) {// Thread job done.
lockManager.releaseActiveLocksOnThread();
removeDeferredLockManager(currentThread);
AbstractSessionLog.getLog().log(SessionLog.FINER, SessionLog.CACHE, "deferred_locks_released", currentThread.getName());
return;
} else {// Not done yet, wait and check again.
try {
Thread.sleep(1); <-------------------- This is where threads are stuck forever
} catch (InterruptedException interrupted) {
AbstractSessionLog.getLog().logThrowable(SessionLog.SEVERE, SessionLog.CACHE, interrupted);
lockManager.releaseActiveLocksOnThread();
removeDeferredLockManager(currentThread);
throw ConcurrencyException.waitWasInterrupted(interrupted.getMessage());
}
}
} catch (Error error) {
AbstractSessionLog.getLog().logThrowable(SessionLog.SEVERE, SessionLog.CACHE, error);
lockManager.releaseActiveLocksOnThread();
removeDeferredLockManager(currentThread);
throw error;
}
}
}
--------------------------------------------------------------------------

I actually prefer the code of the release deferred lock to that of the acquire.
In this release deferred lock the thread is not given a TIMELESS WAIT.

I deslike still the fact that the code in this method suffers from the same problem as the acuire. If you have a problem - and the threads are stuck forever in the eclipselink layer of code they:
(a) Will never get out of it
(b) They will never tell you what locks they have aquired, what cache keys they did not acquire that somebody else has acuired etc...

I would imagine that if a thread is stuck in RELEASED deferred lock for too long.
It should be possible to build a graph stating:
- This thread is owning Cache Key A,B,C and wants cache key D.
Bu cache KEY D is owned by thead 2. And thread 2 is trying to acquire cache KEY A.
HEre is your dead lock.

No. We look at a stack trace and we have literaly no Idea what entity cache key is currently being the root of our problem.
We have no Idea of knowing what resources each THREAD is currently owning.
And what resources it cannot acquire.
We have none of this information.

IT would already be a very big help if at least even without this information, if threads get into a cyclic dead lock they they can explode via an exception out of this.


NOTE:
For the time being, to try to survive the ongoing problem, what I have done was attack the DEFERRED lock source code.
Since the deferred lock source code seems to not suffer from the cuncrrency exception bug of the acquire lock source code that forgets to decrement thenumber of writers.

And in the RELEASE DEFERRED lock, right now, if the process is in that while(true) for more than 40 seconds - configurable by system property - I make sure that I simulate a concurrency exception.
This in hope that with the simulated exception the code:
lockManager.releaseActiveLocksOnThread();
removeDeferredLockManager(currentThread);

Will run and solve my dead lock.



---------------------

Question:
- Since eclipselink 2.4.2 have there been improvements to the concurrency mangament?
- Have any dead lock risks been resolved/mitigated since then?
- Have any features to explode threads out of dead loclks been implemented?

- Is it not a BUG that the acquire lock, if INTERRUPTED, does not do the same thing that the releaseDeferredLock code is doing, namely, the thread is not going to be releasing the active locks on the thread. Which menas - the exception handling of an interrupted thread in releasing a deferred lock is logically correct, but that of the acquire lock will break the cache leaving the cache key corrupted with the counters of writters incremented and the locks aquired by the thread still blocked by the thread that was interrupted?


NOTE:
I have seen a very similar thread dump occuring in an eclipselink 2.6.4 running weblogic 12.2.1.2.
So I am guessing that depending on the degree of concurrency and number of JDBC connections allowed by an application, the risk for dead locks is also present in 2.6.4. But if 2.6.4 has improvements in relation to 2.4.2 - it would be a good Idea, in my oppinion to down port the concurrency manager imprvements to 2.4.X branch as well...
Would this be possible? Are there any improvements?



Are there any plans of improving the concurrency manager to ensure that the system dead lock TOLERANT by, say, aborting transacitons that are stuck for too long try to aquire release locks?


A third and useful improvement, would be to make a dead lock Explanation tool. I am guessing as well that for someone knowing eclipselink lock management system, that it is is not that hard to make a search algorithm that starts an Apparently stuck thread and sees what locks it cannot aquire but wants to aquire. Expans the thread that is currently owning the lock and does the same to ID until it finally finds out a lock already expanded in the past.
The report could say:
- (a) Here is the set of threads in a dead lock
- (b) Here is the Locks each has acquired that create a circular dependency.

Could this be done?
Are there any plans to do this?


If we know which threads got stuck together, and by looking at the stack trace we know what business logic these threads are triggering, perhaps there would be a possibility to serialize these processes and from the business side mitigate the risk of dead lock.

Can anybody comment on this?

Both in regards to the version 2.4.2 of WLS 12.1.2 and 2.6.4 of WLS 12.2.1.2?


What recommendations would you have to mitigate the dead lock problem?

The thread dumps say tones about hte problem but at the same time they say very little.
To have any chance of really understanding the dead lock one:
- Needs a Heap dump that we can look at stuck thread memory state and figure out what cache key is getting us burned to aquire
- Needs to understand the domain model of eclipselink in depth to understand how these deferred locks are supposed to work and be managed
- Needs a lot of luck as well, because some information is simply not having a level of detail necessary.
Such as a counter of Numbers of Reads/Writters.
This is not very helpful to know that there are 4 Readers on a cache key.
Because a counter of readers is just an integer, it tells us nothing about the threads that are currently being counted as readers for example.
So I consider it boderline impossible for a non eclipselink expert to have any chance of determining the reason for the dead lock.
The answer to the question should come from eclipselink code itsef - after the system is percieved to start mal functioning on the DB layer.
Could something be done to improve this?



Thanks for any help.
Thanks for any comments and suggestons.

For the time being - I am going with a bazuka approach. The release deferred lock in the elcipselink we are using is tuned to if the method does not finish in 40 seocnds - assume the dead lock is in place and run the interrupt exception logic that releases all of the deferred locks.
Would there be any better approach as a work around?


Kindest regards

[Updated on: Tue, 29 January 2019 07:33]

Report message to a moderator

Re: Eclipselink - dead locks in aqcuire and release deferred locks - Eclipselink 2.4.2 and above [message #1801944 is a reply to message #1801870] Tue, 29 January 2019 17:31 Go to previous messageGo to next message
Nuno Godinho de Matos is currently offline Nuno Godinho de MatosFriend
Messages: 34
Registered: September 2012
Member
In the case of hacking eclipselink 2.6.4.

This is what the HACKED class currently looks like:

/*******************************************************************************
 * Copyright (c) 1998, 2013 Oracle and/or its affiliates. All rights reserved.
 * This program and the accompanying materials are made available under the 
 * terms of the Eclipse Public License v1.0 and Eclipse Distribution License v. 1.0 
 * which accompanies this distribution. 
 * The Eclipse Public License is available at http://www.eclipse.org/legal/epl-v10.html
 * and the Eclipse Distribution License is available at 
 * http://www.eclipse.org/org/documents/edl-v10.php.
 *
 * Contributors:
 *     Oracle - initial API and implementation from Oracle TopLink
 ******************************************************************************/  
package org.eclipse.persistence.internal.helper;

import java.io.*;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

import org.eclipse.persistence.config.SystemProperties;
import org.eclipse.persistence.exceptions.*;
import org.eclipse.persistence.internal.localization.*;
import org.eclipse.persistence.internal.identitymaps.CacheKey;
import org.eclipse.persistence.logging.*;

/**
 * INTERNAL:
 * <p>
 * <b>Purpose</b>: To maintain concurrency for a particular task.
 * It is a wrappers of a semaphore that allows recursive waits by a single thread.
 * <p>
 * <b>Responsibilities</b>:
 * <ul>
 * <li> Keep track of the active thread.
 * <li> Wait all other threads until the first thread is done.
 * <li> Maintain the depth of the active thread.
 * </ul>
 */
public class ConcurrencyManager implements Serializable {
    
    protected int numberOfReaders;
    protected int depth;
    protected int numberOfWritersWaiting;
    protected volatile transient Thread activeThread;
    public static Map<Thread, DeferredLockManager> deferredLockManagers = initializeDeferredLockManagers();
    protected boolean lockedByMergeManager;
    
    protected static boolean shouldTrackStack = System.getProperty(SystemProperties.RECORD_STACK_ON_LOCK) != null;
    protected Exception stack;

    /**
     * Initialize the newly allocated instance of this class.
     * Set the depth to zero.
     */
    public ConcurrencyManager() {
        this.depth = 0;
        this.numberOfReaders = 0;
        this.numberOfWritersWaiting = 0;
    }

    /**
     * Wait for all threads except the active thread.
     * If the active thread just increment the depth.
     * This should be called before entering a critical section.
     */
    public void acquire() throws ConcurrencyException {
        this.acquire(false);
    }

    /**
     * Wait for all threads except the active thread.
     * If the active thread just increment the depth.
     * This should be called before entering a critical section.
     * called with true from the merge process, if true then the refresh will not refresh the object
     */
    public synchronized void acquire(boolean forMerge) throws ConcurrencyException {
     // FIX - APQ-2473 - Flag the time when we start the while loop
        final Date whileStartDate = new Date();
        Thread currentThread = Thread.currentThread();
        DeferredLockManager lockManager = getDeferredLockManager(currentThread);
        
        while (((this.activeThread != null) || (this.numberOfReaders > 0)) && (this.activeThread != Thread.currentThread())) {
            // This must be in a while as multiple threads may be released, or another thread may rush the acquire after one is released.
            try {
                this.numberOfWritersWaiting++;
                
                // FIX - APQ-2473 - Do not wait forever - max allowed time to wait is 10 second and then check conditions again
                // wait();
                wait(10000l);
                
                
                // FIX - APQ-2473 -
                // Run a method that will fire up an exception if we  having been sleeping for too long
                HackingEclipseHelperUtil.SINGLETON.determineIfReleaseDeferredLockAppearsToBeDeadLocked(this, whileStartDate, lockManager);
                
                // FIX - APQ-2473 -  Moved to finally block to ensure it always runs
                // this.numberOfWritersWaiting--;
            } catch (InterruptedException exception) {
                // FIX - APQ-2473 - If the thread is interrupted we want to make sure we release all of the locks the thread was owning
                releaseAllLocksAquiredByThread(lockManager);
                
                throw ConcurrencyException.waitWasInterrupted(exception.getMessage());
            } finally {
                // FIX - APQ-2473 -
                // Since above we incremente the number of writers
                // whether or not the thread is exploded by an interrupt 
                // we need to make sure we decrement the number of writer to not allow the code to be corrupeted 
                this.numberOfWritersWaiting--;
            }
        }
        if (this.activeThread == null) {
            this.activeThread = Thread.currentThread();
            if (shouldTrackStack){
                this.stack = new Exception();
            }
        }
        this.lockedByMergeManager = forMerge;
        this.depth++;
    }

    /**
     * If the lock is not acquired already acquire it and return true.
     * If it has been acquired already return false
     * Added for CR 2317
     */
    public boolean acquireNoWait() throws ConcurrencyException {
        return acquireNoWait(false);
    }

    /**
     * If the lock is not acquired already acquire it and return true.
     * If it has been acquired already return false
     * Added for CR 2317
     * called with true from the merge process, if true then the refresh will not refresh the object
     */
    public synchronized boolean acquireNoWait(boolean forMerge) throws ConcurrencyException {
        if ((this.activeThread == null && this.numberOfReaders == 0) || (this.activeThread == Thread.currentThread())) {
            //if I own the lock increment depth
            acquire(forMerge);
            return true;
        } else {
            return false;
        }
    }

    /**
     * If the lock is not acquired already acquire it and return true.
     * If it has been acquired already return false
     * Added for CR 2317
     * called with true from the merge process, if true then the refresh will not refresh the object
     */
    public synchronized boolean acquireWithWait(boolean forMerge, int wait) throws ConcurrencyException {
        if ((this.activeThread == null && this.numberOfReaders == 0) || (this.activeThread == Thread.currentThread())) {
            //if I own the lock increment depth
            acquire(forMerge);
            return true;
        } else {
            try {
                wait(10000l);
            } catch (InterruptedException e) {
                return false;
            }
            if ((this.activeThread == null && this.numberOfReaders == 0) || (this.activeThread == Thread.currentThread())){
                acquire(forMerge);
                return true;
            }
            return false;
        }
    }

    /**
     * If the activeThread is not set, acquire it and return true.
     * If the activeThread is set, it has been acquired already, return false.
     * Added for Bug 5840635
     * Call with true from the merge process, if true then the refresh will not refresh the object.
     */
    public synchronized boolean acquireIfUnownedNoWait(boolean forMerge) throws ConcurrencyException {
        // Only acquire lock if active thread is null. Do not check current thread. 
        if (this.activeThread == null && this.numberOfReaders == 0) {
             // if lock is unowned increment depth
            acquire(forMerge);
            return true;
        } else {
            return false;
        }
    }

    /**
     * Add deferred lock into a hashtable to avoid deadlock
     */
    public void acquireDeferredLock() throws ConcurrencyException {
        Thread currentThread = Thread.currentThread();
        DeferredLockManager lockManager = getDeferredLockManager(currentThread);
        if (lockManager == null) {
            lockManager = new DeferredLockManager();
            putDeferredLock(currentThread, lockManager);
        }
        lockManager.incrementDepth();
        synchronized (this) {
            // FIX - APQ-2473 - Flag the time when we start the while loop
            final Date whileStartDate = new Date();
            
            while (this.numberOfReaders != 0) {
                // There are readers of this object, wait until they are done before determining if
                //there are any other writers.  If not we will wait on the readers for acquire.  If another
                //thread is also waiting on the acquire then a deadlock could occur.  See bug 3049635
                //We could release all active locks before releasing deferred but the object may not be finished building
                //we could make the readers get a hard lock, but then we would just build a deferred lock even though
                //the object is not being built.
                try {
                    this.numberOfWritersWaiting++;
                  
                    // FIX - APQ-2473 - Do not wait forever - max allowed time to wait is 10 second and then check conditions again
                    // wait();
                    wait(1000l);
                    
                    // FIX - APQ-2473 -  Moved to finally block to ensure it always runs
                    // this.numberOfWritersWaiting--;
                    

                    // FIX - APQ-2473 -
                    // Run a method that will fire up an exception if we  having been sleeping for too long
                    HackingEclipseHelperUtil.SINGLETON.determineIfReleaseDeferredLockAppearsToBeDeadLocked(this, whileStartDate, lockManager);                    
                } catch (InterruptedException exception) {   
                    // FIX - APQ-2473 - If the thread is interrupted we want to make sure we release all of the locks the thread was owning
                    releaseAllLocksAquiredByThread(lockManager);
                    
                    
                    throw ConcurrencyException.waitWasInterrupted(exception.getMessage());
                } finally {
                    // FIX - APQ-2473 -
                    // Since above we incremente the number of writers
                    // whether or not the thread is exploded by an interrupt 
                    // we need to make sure we decrement the number of writer to not allow the code to be corrupeted 
                    this.numberOfWritersWaiting--;
                }
            }
            if ((this.activeThread == currentThread) || (!isAcquired())) {
                lockManager.addActiveLock(this);
                acquire();
            } else {
                lockManager.addDeferredLock(this);
                if (AbstractSessionLog.getLog().shouldLog(SessionLog.FINER) && this instanceof CacheKey) {
                    AbstractSessionLog.getLog().log(SessionLog.FINER, SessionLog.CACHE, "acquiring_deferred_lock", ((CacheKey)this).getObject(), currentThread.getName());
                }
            }
        }
    }
        
    /**
     * Check the lock state, if locked, acquire and release a deferred lock.
     * This optimizes out the normal deferred-lock check if not locked.
     */
    public void checkDeferredLock() throws ConcurrencyException {
        // If it is not locked, then just return.
        if (this.activeThread == null) {
            return;
        }
        acquireDeferredLock();
        releaseDeferredLock();
    }
    
    /**
     * Check the lock state, if locked, acquire and release a read lock.
     * This optimizes out the normal read-lock check if not locked.
     */
    public void checkReadLock() throws ConcurrencyException {
        // If it is not locked, then just return.
        if (this.activeThread == null) {
            return;
        }
        acquireReadLock();
        releaseReadLock();
    }
    
    /**
     * Wait on any writer.
     * Allow concurrent reads.
     */
    public synchronized void acquireReadLock() throws ConcurrencyException {
        // Cannot check for starving writers as will lead to deadlocks.
        while ((this.activeThread != null) && (this.activeThread != Thread.currentThread())) {
            try {
                wait();
            } catch (InterruptedException exception) {
                throw ConcurrencyException.waitWasInterrupted(exception.getMessage());
            }
        }
        this.numberOfReaders++;
    }

    /**
     * If this is acquired return false otherwise acquire readlock and return true
     */
    public synchronized boolean acquireReadLockNoWait() {
        if ((this.activeThread == null) || (this.activeThread == Thread.currentThread())) {
            acquireReadLock();
            return true;
        } else {
            return false;
        }
    }

    /**
     * Return the active thread.
     */
    public Thread getActiveThread() {
        return activeThread;
    }

    /**
     * Return the deferred lock manager from the thread
     */
    public static DeferredLockManager getDeferredLockManager(Thread thread) {
        return getDeferredLockManagers().get(thread);
    }

    /**
     * Return the deferred lock manager hashtable (thread - DeferredLockManager).
     */
    protected static Map<Thread, DeferredLockManager> getDeferredLockManagers() {
        return deferredLockManagers;
    }
    
    /**
     * Init the deferred lock managers (thread - DeferredLockManager).
     */
    protected static Map initializeDeferredLockManagers() {
        return new ConcurrentHashMap();
    }

    /**
     * Return the current depth of the active thread.
     */
    public int getDepth() {
        return depth;
    }

    /**
     * Number of writer that want the lock.
     * This is used to ensure that a writer is not starved.
     */
    public int getNumberOfReaders() {
        return numberOfReaders;
    }

    /**
     * Number of writers that want the lock.
     * This is used to ensure that a writer is not starved.
     */
    public int getNumberOfWritersWaiting() {
        return numberOfWritersWaiting;
    }
    
    /**
     * Return if a thread has acquire this manager.
     */
    public boolean isAcquired() {
        return depth > 0;
    }

    /**
     * INTERNAL:
     * Used byt the refresh process to determine if this concurrency manager is locked by
     * the merge process.  If it is then the refresh should not refresh the object
     */
    public boolean isLockedByMergeManager() {
        return this.lockedByMergeManager;
    }

    /**
     * Check if the deferred locks of a thread are all released
     */
    public static boolean isBuildObjectOnThreadComplete(Thread thread, Map recursiveSet) {
        if (recursiveSet.containsKey(thread)) {
            return true;
        }
        recursiveSet.put(thread, thread);

        DeferredLockManager lockManager = getDeferredLockManager(thread);
        if (lockManager == null) {
            return true;
        }

        Vector deferredLocks = lockManager.getDeferredLocks();
        for (Enumeration deferredLocksEnum = deferredLocks.elements();
                 deferredLocksEnum.hasMoreElements();) {
            ConcurrencyManager deferedLock = (ConcurrencyManager)deferredLocksEnum.nextElement();
            Thread activeThread = null;
            if (deferedLock.isAcquired()) {
                activeThread = deferedLock.getActiveThread();

                // the active thread may be set to null at anypoint
                // if added for CR 2330 
                if (activeThread != null) {
                    DeferredLockManager currentLockManager = getDeferredLockManager(activeThread);
                    if (currentLockManager == null) {
                        return false;
                    } else if (currentLockManager.isThreadComplete()) {
                        activeThread = deferedLock.getActiveThread();
                        // The lock may suddenly finish and no longer have an active thread.
                        if (activeThread != null) {
                            if (!isBuildObjectOnThreadComplete(activeThread, recursiveSet)) {
                                return false;
                            }
                        }
                    } else {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    /**
     * Return if this manager is within a nested acquire.
     */
    public boolean isNested() {
        return depth > 1;
    }

    public void putDeferredLock(Thread thread, DeferredLockManager lockManager) {
        getDeferredLockManagers().put(thread, lockManager);
    }

    /**
     * Decrement the depth for the active thread.
     * Assume the current thread is the active one.
     * Raise an error if the depth become < 0.
     * The notify will release the first thread waiting on the object,
     * if no threads are waiting it will do nothing.
     */
    public synchronized void release() throws ConcurrencyException {
        if (this.depth == 0) {
            throw ConcurrencyException.signalAttemptedBeforeWait();
        } else {
            this.depth--;
        }
        if (this.depth == 0) {
            this.activeThread = null;
            if (shouldTrackStack){
                this.stack = null;
            }
            this.lockedByMergeManager = false;
            notifyAll();
        }
    }

    /**
     * Release the deferred lock.
     * This uses a deadlock detection and resolution algorithm to avoid cache deadlocks.
     * The deferred lock manager keeps track of the lock for a thread, so that other
     * thread know when a deadlock has occurred and can resolve it.
     */
    public void releaseDeferredLock() throws ConcurrencyException {
        Thread currentThread = Thread.currentThread();
        DeferredLockManager lockManager = getDeferredLockManager(currentThread);
        if (lockManager == null) {
            return;
        }
        int depth = lockManager.getThreadDepth();

        if (depth > 1) {
            lockManager.decrementDepth();
            return;
        }

        // If the set is null or empty, means there is no deferred lock for this thread, return.
        if (!lockManager.hasDeferredLock()) {
            lockManager.releaseActiveLocksOnThread();
            removeDeferredLockManager(currentThread);
            return;
        }

        lockManager.setIsThreadComplete(true);
        
        // FIX - APQ-2473 - start a date time
        // Thread have three stages, one where they are doing work (i.e. building objects)
        // two where they are done their own work but may be waiting on other threads to finish their work,
        // and a third when they and all the threads they are waiting on are done.
        // This is essentially a busy wait to determine if all the other threads are done.
        final Date whileStartDate = new Date();

        // Thread have three stages, one where they are doing work (i.e. building objects)
        // two where they are done their own work but may be waiting on other threads to finish their work,
        // and a third when they and all the threads they are waiting on are done.
        // This is essentially a busy wait to determine if all the other threads are done.
        while (true) {
            try{
                // 2612538 - the default size of Map (32) is appropriate
                Map recursiveSet = new IdentityHashMap();
                if (isBuildObjectOnThreadComplete(currentThread, recursiveSet)) {// Thread job done.
                    lockManager.releaseActiveLocksOnThread();
                    removeDeferredLockManager(currentThread);
                    AbstractSessionLog.getLog().log(SessionLog.FINER, SessionLog.CACHE, "deferred_locks_released", currentThread.getName());
                    return;
                } else {// Not done yet, wait and check again.
                    try {
                        Thread.sleep(1);
                        
                        // FIX - APQ-2473 -
                        // Run a method that will fire up an exception if we  having been sleeping for too long
                        HackingEclipseHelperUtil.SINGLETON.determineIfReleaseDeferredLockAppearsToBeDeadLocked(this, whileStartDate, lockManager);
                        
                    } catch (InterruptedException interrupted) {
                        AbstractSessionLog.getLog().logThrowable(SessionLog.SEVERE, SessionLog.CACHE, interrupted);
                        releaseAllLocksAquiredByThread(lockManager);
                        throw ConcurrencyException.waitWasInterrupted(interrupted.getMessage());
                    }
                }
            } catch (Error error) {
                AbstractSessionLog.getLog().logThrowable(SessionLog.SEVERE, SessionLog.CACHE, error);
                releaseAllLocksAquiredByThread(lockManager);
                throw error;
            }
        }
    }
    
    

    /**
     * Decrement the number of readers.
     * Used to allow concurrent reads.
     */
    public synchronized void releaseReadLock() throws ConcurrencyException {
        if (this.numberOfReaders == 0) {
            throw ConcurrencyException.signalAttemptedBeforeWait();
        } else {
            this.numberOfReaders--;
        }
        if (this.numberOfReaders == 0) {
            notifyAll();
        }
    }

    /**
     * Remove the deferred lock manager for the thread
     */
    public static DeferredLockManager removeDeferredLockManager(Thread thread) {
        return getDeferredLockManagers().remove(thread);
    }

    /**
     * Set the active thread.
     */
    public void setActiveThread(Thread activeThread) {
        this.activeThread = activeThread;
    }

    /**
     * Set the current depth of the active thread.
     */
    protected void setDepth(int depth) {
        this.depth = depth;
    }

    /**
     * INTERNAL:
     * Used by the mergemanager to let the read know not to refresh this object as it is being
     * loaded by the merge process.
     */
    public void setIsLockedByMergeManager(boolean state) {
        this.lockedByMergeManager = state;
    }

    /**
     * Track the number of readers.
     */
    protected void setNumberOfReaders(int numberOfReaders) {
        this.numberOfReaders = numberOfReaders;
    }

    /**
     * Number of writers that want the lock.
     * This is used to ensure that a writer is not starved.
     */
    protected void setNumberOfWritersWaiting(int numberOfWritersWaiting) {
        this.numberOfWritersWaiting = numberOfWritersWaiting;
    }
    
    public synchronized void transitionToDeferredLock() {
        Thread currentThread = Thread.currentThread();
        DeferredLockManager lockManager = getDeferredLockManager(currentThread);
        if (lockManager == null) {
            lockManager = new DeferredLockManager();
            putDeferredLock(currentThread, lockManager);
        }
        lockManager.incrementDepth();
        lockManager.addActiveLock(this);
    }

    /**
     * Print the nested depth.
     */
    public String toString() {
        Object[] args = { Integer.valueOf(getDepth()) };
        return Helper.getShortClassName(getClass()) + ToStringLocalization.buildMessage("nest_level", args);
    }

    public Exception getStack() {
        return stack;
    }

    public void setStack(Exception stack) {
        this.stack = stack;
    }

    public static boolean shouldTrackStack() {
        return shouldTrackStack;
    }

    /**
     * INTERNAL:
     * This can be set during debugging to record the stacktrace when a lock is acquired.
     * Then once IdentityMapAccessor.printIdentityMapLocks() is called the stack call for each
     * lock will be printed as well.  Because locking issues are usually quite time sensitive setting 
     * this flag may inadvertently remove the deadlock because of the change in timings.
     * 
     * There is also a system level property for this setting. "eclipselink.cache.record-stack-on-lock"
     * @param shouldTrackStack
     */
    public static void setShouldTrackStack(boolean shouldTrackStack) {
        ConcurrencyManager.shouldTrackStack = shouldTrackStack;
    }
    
    // Hacking APIs
    /**
     * For the thread to release all of its locks.
     * @param lockManager
     *      the deferred lock manager
     */
    // FIX - APQ-2473 - Put in one place the code that frees up the locks acuired by the current thread
    protected void releaseAllLocksAquiredByThread(DeferredLockManager lockManager) {
        Thread currentThread = Thread.currentThread();
        lockManager.releaseActiveLocksOnThread();
        removeDeferredLockManager(currentThread);
        
        // Log some information in the LOG
        String cacheKeyToString = HackingEclipseHelperUtil.SINGLETON.createToStringExplainingOwnedCacheKey(this);
        String errMsg = "releaseAllLocksAquiredByThread has been invoked. On  " + cacheKeyToString;
        AbstractSessionLog.getLog().log(SessionLog.SEVERE, SessionLog.CACHE, "deferred_locks_released", currentThread.getName());
        
        // Question:
        // Would it be safe to send a NOTIFY ALL?
        // probably not. But there might be threads waiting for the release of the lock
        // Normally the Notifyall is necessary when locks are released. But in this case we are even dealing with not one but ALL
        // of hte locks owned by thread so there would be multiple objects notify - we probably cannot do this
    }
    
}




And then there is this helper UTIL:

/*******************************************************************************
 * Copyright (c) 1998, 2013 Oracle and/or its affiliates. All rights reserved.
 * This program and the accompanying materials are made available under the 
 * terms of the Eclipse Public License v1.0 and Eclipse Distribution License v. 1.0 
 * which accompanies this distribution. 
 * The Eclipse Public License is available at http://www.eclipse.org/legal/epl-v10.html
 * and the Eclipse Distribution License is available at 
 * http://www.eclipse.org/org/documents/edl-v10.php.
 *
 * Contributors:
 *     Oracle - initial API and implementation from Oracle TopLink
 ******************************************************************************/  
package org.eclipse.persistence.internal.helper;

import java.io.*;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

import org.eclipse.persistence.config.SystemProperties;
import org.eclipse.persistence.exceptions.*;
import org.eclipse.persistence.internal.localization.*;
import org.eclipse.persistence.internal.identitymaps.CacheKey;
import org.eclipse.persistence.logging.*;

/**
 * Helper class to help us build diagnostic strings
 */
public class HackingEclipseHelperUtil implements Serializable {
    
    public static final HackingEclipseHelperUtil SINGLETON = new HackingEclipseHelperUtil();
    
   
    /**
     * @return check if the user has specified a system property to control how long we are willing to wait before firing up an exception
     */
    public long getMaxAllowedSleepTimeMs() {
        long defaultMaxAllowedSleepTimeMs = 40000;
        
        // (a) Check if a system property was provided by the user to control how long we tolerate before exploding thread waiting to release
        // deferred locks
        String eclipseLinkConcurrencyManagerMaxAllowedSleepTimeMs = System.getProperty("eclipseLinkConcurrencyManagerMaxAllowedSleepTimeMs");
        if(eclipseLinkConcurrencyManagerMaxAllowedSleepTimeMs != null) {
            try {
                return Long.parseLong(eclipseLinkConcurrencyManagerMaxAllowedSleepTimeMs.trim());
            } catch(Exception ignoreE) {
                return defaultMaxAllowedSleepTimeMs ;
            }            
        }
        return defaultMaxAllowedSleepTimeMs;
    }
    
    /**
     * Create a print of the ACTIVE locks associated to the DeferredLockManager
     */
    public String createStringWithSummaryOfActiveLocksOnThread(DeferredLockManager lockManager) {
        StringBuilder sb = new StringBuilder();
        sb.append("DeferredLockManager - Listing of all Deferred Locks.");
        sb.append("\n\n");
        sb.append("Thread Name: ").append(Thread.currentThread().getName());
        sb.append("\n\n");
        
        // Loop over al of the active locks and print them
        long activeLock = 0;
        Vector activeLocks = lockManager.getActiveLocks();
        if (!activeLocks.isEmpty()) {
            for (Enumeration activeLocksEnum = activeLocks.elements();
                     activeLocksEnum.hasMoreElements();) {
                activeLock++;
                ConcurrencyManager manager = (ConcurrencyManager)activeLocksEnum.nextElement();
                String concurrencyManagerActiveLock = SINGLETON.createToStringExplainingOwnedCacheKey(manager);
                sb.append("ACTIVE LOCK NR: ").append("" + activeLock).append("ConcurrencyManager: ").append(concurrencyManagerActiveLock);
                sb.append("\n\n");
            }
        }
        return sb.toString();        
    }
    
    /**
     * 
     * @return A to string of the owned cache key
     */
    public String createToStringExplainingOwnedCacheKey(ConcurrencyManager concurrencyManager) {
        if(concurrencyManager instanceof CacheKey) {
            CacheKey cacheKey = (CacheKey) concurrencyManager;
            Object cacheKeykey = cacheKey.getKey();
            Object cacheKeyObject = cacheKey.getObject();
            String canonicalName = cacheKeyObject != null? cacheKeyObject.getClass().getCanonicalName():null;
            return String.format("ConcurrencyManager-CacheKey: %1$s ownerCacheKey (key,object, canonicalName) = (%2$s, %3$s, %4$s).", this, cacheKeykey, cacheKeyObject, canonicalName);
        } else {
            return String.format("ConcurrencyManager: %1$s. .", this);
        }
        
    }
    
    /**
     * Throw an interrupted exception if appears that eclipse link code is taking too long to release a deferred lock. 
     * @param whileStartDate
     *      the start date of the while tru loop for releasing a deferred lock 
     * @throws InterruptedException
     *  we fire an interupted exception to ensure that the code  blows up and releases all of the locks it had.
     */
    public void determineIfReleaseDeferredLockAppearsToBeDeadLocked(ConcurrencyManager concurrencyManager, final Date whileStartDate, DeferredLockManager lockManager) throws InterruptedException {
        // Determine if we believe to be dealing with a dead lock
        Thread currentThread = Thread.currentThread();
        final long maxAllowedSleepTime40ThousandMs = HackingEclipseHelperUtil.SINGLETON.getMaxAllowedSleepTimeMs();
        Date whileCurrentDate = new Date();
        long elpasedTime = whileCurrentDate.getTime() - whileStartDate.getTime();
        if (elpasedTime > maxAllowedSleepTime40ThousandMs) {
            // We believe this is a dead lock so now we will log some information
            String ownedCacheKey = createToStringExplainingOwnedCacheKey(concurrencyManager);
           
            String errorMessageBase = String.format(
                    "RELEASE DEFERRED LOCK PROBLEM:  The release deffered log process has not managed to finish in: %1$s ms.%n"
                            + " (ownerCacheKey) = (%2$s). Current thread: %3$s. %n",
                    elpasedTime, ownedCacheKey, currentThread.getName());
            String errorMessageExplainingActiveLocksOnThread = HackingEclipseHelperUtil.SINGLETON.createStringWithSummaryOfActiveLocksOnThread(lockManager);
            String errorMessage = errorMessageBase +  errorMessageExplainingActiveLocksOnThread;

            AbstractSessionLog.getLog().log(SessionLog.SEVERE, SessionLog.CACHE, errorMessage,
                    currentThread.getName());
            
            throw new InterruptedException(errorMessage);
        }
    }
    
}



Re: Eclipselink - dead locks in aqcuire and release deferred locks - Eclipselink 2.4.2 and above [message #1801945 is a reply to message #1801944] Tue, 29 January 2019 17:34 Go to previous messageGo to next message
Nuno Godinho de Matos is currently offline Nuno Godinho de MatosFriend
Messages: 34
Registered: September 2012
Member
In the above hacked code.

The modifications can all be found simply by searching for:
// FIX - APQ-2473

And it will be pretty clear that what I am doing to the code is trying to ensure that there are not ETERNAL WAITS until someone remembers to notifY().

That every wait has a time limt.
And if we are waiting for too long the determineIfReleaseDeferredLockAppearsToBeDeadLocked -- will fire some fake thread interrupted exception.

And this leads to try to release all of the Locks.

I think that something like this - put properly implemented - is needed very direly.
And should ideally be back merged far back.

So that we can patch our concurrency managers.

Re: Eclipselink - dead locks in aqcuire and release deferred locks - Eclipselink 2.4.2 and above [message #1803447 is a reply to message #1801945] Thu, 28 February 2019 19:04 Go to previous messageGo to next message
Chris Delahunt is currently offline Chris DelahuntFriend
Messages: 1389
Registered: July 2009
Senior Member
There is a lot of information here, making it hard to understand what you are after. The locking scenario in EclipseLink is rather complex, but there is a good reason -Processes like updates, refreshes and the initial queries that bring objects into the cache must prevent other reading level processes from returning those objects until they are completed. A deferred lock is just a way that allows a reading thread to register and get in line for a notification that it can return and use those objects. In applications that use a shared cache and are heavy on the refreshes and updates on key/root components in an object graph, these root objects the graph might cause a bottleneck for the system. The way to identify this is to look at the threads involved in any apparent 'deadlock', and identify what they are doing; the object graphs they modify and access. If you know that, you can then modify your access or the model itself to remove those key spots of contention.

Commonly, mixing lazy and eager access in large graph will obfuscate the problem, as accessing a reference will force a query which goes to the cache and might load more objects then might be needed, causing other threads to build up on a slower write process. Putting in lazy everywhere possible sometimes helps, so these threads can do their thing without having to wait to fetch an entire object graph.

I don't disagree that some configurable level of detection on the ConcurrencyManager would be ideal, only that such changes usually have performance consequences. Unlike databases which have deadlock detection, EclipseLink does not control the container in which it runs and relies on the JVM to monitor its threads. If you discover such a situation, you might instrument your code to call printIdentityMaps which might show the objects involved in various EclipseLink locks, as mentioned here: https://www.eclipse.org/forums/index.php?t=msg&th=389907&goto=934039&#msg_934039

For others, the steps outlined here are also a good staring point before going into changing EclipseLink concurrency managers: http://wiki.eclipse.org/EclipseLink/FAQ/JPA#How_to_diagnose_and_resolve_hangs_and_deadlocks.3F
Re: Eclipselink - dead locks in aqcuire and release deferred locks - Eclipselink 2.4.2 and above [message #1822440 is a reply to message #1801870] Fri, 06 March 2020 10:22 Go to previous messageGo to next message
Nuno Godinho de Matos is currently offline Nuno Godinho de MatosFriend
Messages: 34
Registered: September 2012
Member
Hi Chris,

I would like to thank you for your reply. I am a bit delayed thanking you, since I had not noticed that I got a reply to my post.


I would also like to make you aware that the following bug has been opened:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=559307

On the bug report above:
Both weblogic 12.1.2 and 12.2.1.2 the only shield we have against having an application server that gets broken due to an accumulation of threads dead locked in the ObjectBuilder is the hacky code with which we modified eclipselink.
Our hacky code has prooven to be successful time and time again.
However, it is far from an ideal implementation. But it is definitely a minimal-surival change.

We believe that
- The base code should NOT be allowed to "wait in line" forever to aquire the deferred lock. After some reasonable time of waiting, it is time to abort the transaction, no matter what, and release any locks that might be held by this transaction that did not progress in time.
- The base code should do exactly what you mention in your post:
(Recognize the threads that are involved in a dead lock, so that e developer can try to mitigate the risks.) You probably have seen already stack traces from a frozen app server on dead locks. Youll never be able to tell what process is dead locking, because the situation escalates to the point that you eat all of the connecitons on the thread pool and after some time 100 of the JDBC connections manage to get stuck on dead lock. You look at the thread dump and there is nothing except threads waiting.

Implementing a dead lock detection mechanism should be simple, it is not a complex search problem if the domain model helps you know LOCK_A, OwnedBy Thread A.
My impression, is that with the todays eclipselink domain model, it would not be trivial to know who is the thread owning a specific lock, and that is a problem for writing a dead lock detection algorithm.

Obviously, a dead lock detection algorithm would never be used under normal circumstances.
But eclipselink is normally quite fast.
So it is its quite clear - from what I have seen now - if a thread is stuck acquiring a lock for 40 seconds:
--- there is a 100% guaranteed dead lock.
It is well worth the time to do a small search to find out how the dead lock is composed.
--- and most of the time when you detect the problem after 40 seconds, the number of threads that are hanging around is still somewhat limited. It takes a few minutes until the server goblles up all of the thread pool to the point that the thread pool will not give your more connections. So the search problem is even exploring a very limited and small search space. 5 ms problem are more than enough to figure out N threads in a dead lock.


On the points you have raised.
I do not believe, we suffer from the problem of EAGER loading relationships. At least on one-to-many.
But we definitely have some quite deep object graphs, where a ROOT entity A might have:
(a) one-to-one bi-directation relationship to an entity B (this is by defauly eager loaded)
(b) Many-to-one bi-directional relatinship to an entity C (this is by defautl eager loaded)
(c) one-to-many bi-directional to entity D (Should not be a problem) these are lazy loaded
(d) Many-to-many bi-directional relationship to entity E (should not be a problem these are lazy loaded)


And I believe if I were to forget all one to many and many to many relationships, I am pretty sure some specific entities we have in the system can go deep, quite deep, just based on the one-to-one many to one relatinohips.
I assume, based on your post, this would be a problem.
This would pose a risk of dead locking.


Can you please explain the following:
(a) If all you have would be READ ONLY transactions and no updates ony any entity, the ObjectBuilder would never go into dead lock in lock aquisition?
Dead locking would only take place if some thread has aquired a lock exclusively for writing?
Because if have seen a thread stuck trying to aquire a deferred lock on an entity that is NEVER updated for writing.

(b) If you Imagine a bi directional relation this Entity A <-> Entity B where the relationship is eager loaded because it is a one-to-one relationhship.
If you have one thread asking for Entity A, and the other asking to Entity B.
Could this pose a risk of dead locking?
Because one the threads would aquire first a "lock" on entity A an then need a lock on entity B, the other would get a lock first on Entity B and then need a lock Entity A ?


(c) When you run a Read query that returns results of type Entity A.
And this Entity A , let us assume, would have a gigantic Object grpah of relationships to other entities.
When eclipselink tries to load EntityA from its IndeityMapCaches, and searches for the entity A by its cache key.
Does it go to the Chache and simply load the entity on the cache for this Key?
Or does the algoritm need to:
First fetch the EntityA from IdentityMap byt its cache key.
Then "discard" all of the field relationhsips of this entityA to other entities.
E.g. EntityA has one to one relationship to EntityB.
So now EclipselInk on step(2) will do a lookup on the IdentityMap for the cached entity B and bind it on the EntityA?
And so continue until the object graph of EntityA has been completely loaded?
And while eclipselink is trying to build this EntityA fetching it as well as all corelated child relationships from the IdentityMap cache, it is holding onto its locks?
So in the example I was making above, I would assume eclipselink to build this object would be forced to:
- First Aquire Lock on Entity A, Contrinue drilling down and Acquire Lock on Entity B while not releaseing the lock it has on the Enity A.
And so on and so forth until the hole relationshiip graph of EntityA has been achieved?
These would be "exclusive locks" or would they be locks that other threads doing a "read" operation would be allowed to acquire as well?

And you give me a concrete example that illustrates this point?




(d) Can you give me a concrete logical example, with which two threads with a certain entity architecture could run themselves into a dead lock?
So that I get a better understanding how these dead locks are getting formed.
I would really like to understand a hands on illustration of how eclipselink could be forced to go into a dead lock, because that would help understand what we might need to fix in the way we defined entities and access entities.


(e) In the document:
https://wiki.eclipse.org/EclipseLink/FAQ/JPA#How_to_diagnose_and_resolve_hangs_and_deadlocks.3F

QUOTE:
"This is normally related to having relationships that do not use LAZY, ensure all relationship use LAZY."

Does this refer even to one-to-one relationhips? Or would those be allowed to be eager? Or is the ultimate point being: the deeper the entity hiearchy the higher the chances of having problems. So even if one-to-one with deep hierarchies better making them Lazy loaded?


(f) Would it help reducing the risk of dead locking.
If we haven EntityA and EntityB and they have a bidrectional relationship, one-to-one.
So both entities would appear as "root" entities.
Would it help if one of the entities says EntityA eagerly loads entityB.
But entityB LAZYLY loads EntityA?

(my idea with the question above, is trying to force the aquision of Locks n entities to always follow the same direction. Which I am guessing, will not be ensured for bi-directional relationships if both are eager).


(g) Do read only transactions (read queries and or one-to-many relationship resolution) end up aquiring WRITE locks for the purpose of doing ObjectBuidling?
From the thread dumps during dead-lock, that is the impression I have.
And I have confirmed that one of the deferred locks that "could not be /aquired or relesaded" in the released deferred lock method, was an entity that has a transaciton update count o 1 (it was never written in the system).
This seems to me that, for some reason,t he locking aquision that is taking place during READS of one-to-many and query result contruction, is "agressive" in the sense that entities you never imagine to get locked by anything other than a read lock, actually get locked via exclusive locks.
-- That is the current impresison I have.


Many thanks for your help.






[Updated on: Mon, 09 March 2020 07:25]

Report message to a moderator

Re: Eclipselink - dead locks in aqcuire and release deferred locks - Eclipselink 2.4.2 and above [message #1822552 is a reply to message #1822440] Mon, 09 March 2020 18:18 Go to previous messageGo to next message
Chris Delahunt is currently offline Chris DelahuntFriend
Messages: 1389
Registered: July 2009
Senior Member
Yes, suggestions to use lazy fetching refer even to OneToOne relationships. Your original stack trace showed some thread that were stuck on a find operation, fetching something that was 'possibly' held by another thread. One other thread thread shown was building an object and having to build at least 2 OneToOne relationships with a NoIndirectionPolicy, so in an A-B-C-D, any operations on 'C' should be suspect. Having indirection on these relationships might have allowed the thread to return 'A' not wait on the lock it was apparently stuck on, and so allowed any other threads waiting on locks it held to also return (if any). Only more details would show what that thread was waiting on - other threads that seem similar but slightly different are hard to spot when staring at massive stack traces - but are the only way to really dig out the root causes after the fact. Application profiling though should show these hot spots before deadlocks result, as they would show increasing wait times during concurrent/load access. Even if locking never occurs, lazy relationships are a way to speed up and tweak performance for particular use cases.

a) Deferred locks allow EclipseLink to traverse the object graph for reads and determine what it must wait for before it can return the graph. EclipseLink can't return A1 to you if it has an eager relationship to B1 if B1 isn't complete. The deferred lock is used to allow signalling when processes 'doing' something are done doing what ever it is, so that threads waiting to return the data can. That is why they wait to release it - they do what they need to, acquiring the deferred locks on everything in the graph, and then release them all at the end. Writing is just one process that can makes deferred locks wait - refreshing and cache invalidation are others. Anything that build an object in the cache prevents the deferred lock from releasing so that incomplete or changing graphs are not returned. If you have no writes or refreshes, in theory, the object graph could be loaded from any end - but this is never the case in my experience. Your stack should show one thread that is different from the others (except where the JVM has bugs and has threads die, but that hasn't been the case on modern JVMs) where it is refreshing or writing out data and holds write locks preventing deferred locks from being released.

B) I don't remember off hand. One thread fetches A, starts and tries building it, while the other starts from B. Because there is no indirection, neither can complete until the other does. This should be resolvable as each would obtain a lock on what they have and built, and get a deferred lock on what they don't have yet. They should release their write locks before waiting on a deferred lock, but I don't remember this level of implementation detail and haven't gone through the docs recently. In general, this is a bad situation regardless, as you can easily see the problems that can occur when writes and refreshes are involved, and I've never seen an issues occur on a model so simple as just an A-B relationship. Where an application accesses complex models needs to be profiled and pruned to avoid issues rather then rely on locking detection - it is an application bottleneck even if there is no deadlock. If a process only needs A, put a lazy reference to B there and avoid some of the risk in having to wait for B, or better yet, just have all access be on the A part of the graph.

c) Yes.
Not a great answer, but in honesty, locking is complex and I don't remember the finer aspects of the implementation to give you accurate specifics or an example, but what you state seems correct. Deferred locks are a way to allow waiting on other threads to complete their building of the graph without getting in the ways of writes (and so cause deadlocks). A deferred lock should not cause a deadlock, it is just that by and large, apps do more reads then writes, so it appears code is 'stuck' on reading. In truth, the stack should show 2 or more threads doing write (and/or refresh) operations that are causing a deadlock, and the reads are just casualties. But that is only IF there is a deadlock.

D) True deadlocks can occur just like on any database. They are almost always write related though, but I can see the potential for issues if you try refreshing an object graph from opposite ends concurrently. Refreshes involve locking the objects involved, so how the graph is traversed is important in how the locks are obtained. A-B->C->D vs D->C->B->A if both require refreshing the full graph is a whole lot of unnecessary... it is bound to have issues even if EclipseLink doesn't get into a deadlock. I don't remember the specifics, only what I watch out for in my own app.

What also commonly happens though isn't a deadlock, but slowdowns and bottlenecks. Should a large transaction hold something that is required by key object graphs, it can hold the lock and prevent reads from returning. The reads on different aspects of this key object graph then pile up, and even though the initial write (like a traffic accident) is cleared, the JVM doesn't clear the traffic as quickly as the incoming traffic. Partial thread starvation can appear as a deadlock, as large numbers of threads seem 'stuck' in deferred locking code. These would get cleared normally if the JVM allowed it to, but generally, incoming load and the application bottleneck keeps piling up until it crashes.

The best and only true ways to catch and fix issues like these is through profiling application use cases. You need to know where the bottlenecks are, and prune the graph and/or eliminate access from these points so that they don't interrupt access on your main thoroughfares. Generally, this should be a must before putting an app into production anyway, but is far too often overlooked until after problems appear.

f) YES! see above. The larger, more complex the graph, the more likely some other access point is going to write/refresh a part that your general use case must wait for. Access the root and only the root, or break of the branches so they can be individually accessed without exposing you to risks that some bad actor process isn't going to kill performance for your other use cases. You must analyze where in your graph you get the most bang for your buck on the eager vs lazy setting, but where in doubt, make it lazy and only make it eager if/where required. In an A-B relationship, eager doesn't give you any performance benefit as both would be cached through reads on the other side anyway. Where you use read optimizations (like fetch joins) you'd want to evaluate if the risks vs rewards of always accessing the B from the A side (by reading A instead of ever querying on B). If you are using fetch joins, you really shouldn't be accessing the A-B-C-D graph from B-C-D, as it is forcing inefficient queries if it needs to load the data anyway. This should be a must once you know how your object model is going to be used.

g) if you are looking at the code, I defer to your judgement. I would assume refresh would also obtain some non-deferred lock, as the process would change the values in the object with what is in the database much like a write, but I don't remember specifics. Also remember that just because there are problems releasing a deferred lock on 'A' in the A-B-C-D graph doesn't mean there has to be problems with the locks on A - as mentioned, bottlenecks on this graph make A seem a weak link to pileups due to objects further in the graph.
Re: Eclipselink - dead locks in aqcuire and release deferred locks - Eclipselink 2.4.2 and above [message #1832271 is a reply to message #1822552] Mon, 14 September 2020 21:47 Go to previous messageGo to next message
John Bullock is currently offline John BullockFriend
Messages: 5
Registered: September 2020
Junior Member
I and my organization are long-time users of eclipselink, both its current open-source version and the closed-source version "TOPLink", since 2000. We recently upgraded to Eclipselink 2.6.5, and have been experiencing this kind of "deadlock" in ConcurrencyManager.acquire() or acquireReadLock(). The problems typically occur with instances of the subclass CacheKey when accessing the ServerSession identity maps, and the vast majority are read accesses calling CacheKey.checkReadLock().

We have been running load-tests of our J2EE application with a locally-modified eclipselink with extra logging and tracking of acquired locks. The most interesting part of our defect case is that when threads block in acquireReadLock() the write lock is being held by a container thread which is not using the lock. That is, the "activeThread" instance variable of ConcurrencyManager refers to a thread which has already been returned the container's executor pool of available threads. This means the write lock will never be released. Whatever finally-block, transaction commit/rollback, or other stack-unwinding cleanup has already had a chance to release all held locks, and has failed to do so correctly. The threads blocked in acquireReadLock() eventually grow to several hundred, leading to application outages and eventually to JVM failure.

After months of study, we have determined that the core read-write lock class, ConcurrencyManager, is inherently not-threadsafe. It fails to correctly synchronize accesses and mutations of its critical shared-state and so fails under heavy contention. I have prepared a simple change to ConcurrencyManager to correct this problem and am running it successfully in our application.

I would like to promote this fix back to the community, but I am unfamiliar with how to proceed. Anyone, please advise how to prepare a pull request to donate back my change.

Thanks in advance.

[Updated on: Mon, 14 September 2020 21:51]

Report message to a moderator

Re: Eclipselink - dead locks in aqcuire and release deferred locks - Eclipselink 2.4.2 and above [message #1832378 is a reply to message #1801870] Thu, 17 September 2020 10:16 Go to previous messageGo to next message
Guillaume FAUVET is currently offline Guillaume FAUVETFriend
Messages: 2
Registered: September 2020
Junior Member
-- double post, see bellow --

[Updated on: Thu, 17 September 2020 14:26]

Report message to a moderator

Re: Eclipselink - dead locks in aqcuire and release deferred locks - Eclipselink 2.4.2 and above [message #1832390 is a reply to message #1832271] Thu, 17 September 2020 11:38 Go to previous messageGo to next message
Guillaume FAUVET is currently offline Guillaume FAUVETFriend
Messages: 2
Registered: September 2020
Junior Member
Hello,

we are also struggling with a lot of deadlocks related to the ConcurrencyManager, and we haven't found out a solution yet.
I am looking forward to see your patch !

Maybe you can submit a pull request on the github project ?
https://github.com/eclipse-ee4j/eclipselink/

Thanks in advance,
regards
Re: Eclipselink - dead locks in aqcuire and release deferred locks - Eclipselink 2.4.2 and above [message #1832608 is a reply to message #1832271] Mon, 21 September 2020 20:04 Go to previous messageGo to next message
Chris Delahunt is currently offline Chris DelahuntFriend
Messages: 1389
Registered: July 2009
Senior Member
You can always file a bug, and upload the diff to submit the code fix.

I have debugged similar issues in the past, where the container would 'kill' threads out from EclipseLink, preventing lock cleanup. They are difficult to find, but in my experiences, have been JVM and container bugs; EclipseLink had on issue I can recall on one container where it would issue before and after completion on completely different threads (manifest where a process would lock waiting on something it locked itself), but I have not seen issues with ConcurrencyManager itself. If you have a fix for your issue, it would be great to understand the problem driving the change first - what is it about that one thread that is preventing it from releasing the lock assuming it is successful (or even failing and rolling back) in its merge into the shared cache?

Synchronization can hinder performance and throughput, and so should not be added into such a called piece of code lightly, especially code that is running on systems with high transactional throughput.

[Updated on: Mon, 21 September 2020 20:05]

Report message to a moderator

Re: Eclipselink - dead locks in acquire and release deferred locks - Eclipselink 2.4.2 and above [message #1832658 is a reply to message #1832608] Wed, 23 September 2020 02:09 Go to previous message
John Bullock is currently offline John BullockFriend
Messages: 5
Registered: September 2020
Junior Member
I have read many of your detailed and thoughtful posts here and on bug reports about the various "dead-lock" and cache key locking problems. You have clearly been studying this code for a long time. I am eager to discuss my proposed change further with you.

I began the debugging of my current problem with the same assumption as you state: that I need to look for problems where a thread does not invoke release correctly on the "way out" of a service method. To that end, I and my team added stack trace capture to the various acquire methods so we could learn what code paths were acquiring locks but never releasing them. The idea was to use that information to identify the situations in which the paired release was skipped. But after studying these traces and the eclipselink code involved, we could not find any pattern that might lead us to even identify a particular situation which causes the situation.

This, combined with careful study of the ConcurrencyManager class, led me to conclude that the problem is inherent in that class.

I have tried to be a good student of the lessons in the wonderful book "Java Concurrency in Practice." I think anyone writing, maintaining or using a sophisticated server-side library should be aware of this book and its lessons. I trust you are, but if not, this old article by Goetz (written years before his book) is a great introduction: https://www.ibm.com/developerworks/library/j-threads1/index.html

The current ConcurrencyManager class is, by the principles laid out in Concurrency in Practice, almost guaranteed to yield incorrect results under load, especially on modern multicore processors. The difficulty is that this kind of defect is not reproducible with simple test cases. It needs heavy load, where many threads are contending for cpu cores and are being swapped on and off core often. My experience is that this is a real concern, not just fear-mongering. I think this is real concern that merits some discussion.




Previous Topic:ConcurrentModificationException in query execution
Next Topic:Eclipselink library failed to build(version 2.7.5). please help.
Goto Forum:
  


Current Time: Wed Sep 18 12:52:47 GMT 2024

Powered by FUDForum. Page generated in 0.06571 seconds
.:: Contact :: Home ::.

Powered by: FUDforum 3.0.2.
Copyright ©2001-2010 FUDforum Bulletin Board Software

Back to the top