Re: [osgi-users] Deadlock with getService/ungetService

I've finally been able to identify the pattern leading to the deadlocks.

Long story short, we have some ComponentFactories that register all providers for a certain type and, based on various conditions and priority, instantiate a component for the caller. Once the caller is done using the component, it calls dispose to release it.
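A minimal sketch of that pattern, assuming plain-Java stand-ins rather than the real OSGi `ComponentFactory`/`ComponentInstance` API: `Provider`, `Instance` and `PriorityFactory` are hypothetical names, and the "various conditions" are reduced to a `Predicate` over a request string. The factory picks the highest-priority matching provider, and the caller owns the returned handle until it calls `dispose()`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical stand-in for one registered provider of a type T.
final class Provider<T> {
    final int priority;                 // higher value wins
    final Predicate<String> condition;  // "various conditions", reduced to one predicate
    final Supplier<T> creator;          // builds a new component instance

    Provider(int priority, Predicate<String> condition, Supplier<T> creator) {
        this.priority = priority;
        this.condition = condition;
        this.creator = creator;
    }
}

// Hypothetical handle given to the caller, who must dispose() it when done.
final class Instance<T> {
    private final T component;
    private volatile boolean disposed;

    Instance(T component) { this.component = component; }

    T get() {
        if (disposed) throw new IllegalStateException("already disposed");
        return component;
    }

    void dispose() { disposed = true; }

    boolean isDisposed() { return disposed; }
}

// Hypothetical factory: keeps all providers and instantiates the best match per request.
final class PriorityFactory<T> {
    private final List<Provider<T>> providers = new ArrayList<>();

    void register(Provider<T> provider) { providers.add(provider); }

    Optional<Instance<T>> newInstance(String request) {
        return providers.stream()
                .filter(p -> p.condition.test(request))
                .max(Comparator.comparingInt((Provider<T> p) -> p.priority))
                .map(p -> new Instance<T>(p.creator.get()));
    }
}
```

The important property for the rest of this thread is that `dispose()` is the caller's responsibility, so *where* (and on which thread) the caller invokes it matters.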

The issue stemmed from the fact that dispose is called from the deactivate method when its "parent" is being deactivated. What I hadn't realized, and couldn't find in the specification after some research, is that deactivate (and activate as well) are called from synchronized code, which can greatly limit what one can do in those methods (and how long one can spend there).

The solution ended up being to have those factories release the components through an executor on a different thread, so as not to interfere with the "normal" flow.
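A minimal sketch of that workaround, assuming a hypothetical `Disposable` interface in place of the real `ComponentInstance`: instead of calling `dispose()` inline from `deactivate()` (which, per the finding described here, runs inside synchronized code), the parent queues the calls on a single-threaded executor, so the child's teardown and any `ungetService` it triggers start only after the caller has returned and released its locks. The trade-off is that deactivate now returns before the children are gone.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for org.osgi.service.component.ComponentInstance.
interface Disposable {
    void dispose();
}

// Hypothetical helper a parent component would own.
final class DeferredDisposer {
    // One daemon thread: deferred disposals run in submission order
    // and never keep the JVM alive on shutdown.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "deferred-dispose");
        t.setDaemon(true);
        return t;
    });

    // Call this from deactivate(): queue the work instead of running it under the caller's locks.
    void disposeLater(List<? extends Disposable> children) {
        for (Disposable child : children) {
            executor.execute(() -> {
                try {
                    child.dispose();
                } catch (RuntimeException e) {
                    // Keep going: one failing child must not block the others.
                    e.printStackTrace();
                }
            });
        }
    }

    // Waits for already-queued disposals, e.g. when the owning bundle stops.
    boolean shutdown(long timeout, TimeUnit unit) throws InterruptedException {
        executor.shutdown();
        return executor.awaitTermination(timeout, unit);
    }
}
```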

Would it be worthwhile to state clearly that activate/deactivate run synchronized, and what limitations that imposes?

Cheers,
Alain


On Fri, Sep 10, 2021 at 12:58 PM BJ Hargrave <hargrave@xxxxxxxxxx> wrote:
Normally, when multiple threads need the same locks, all of them need to acquire those locks in the same order: A, then B, then C. Otherwise you can deadlock.
 
In your case, you are calling ComponentServiceObjects.getService/ungetService, so you are in charge of those calls and should have some control over when they happen and in what order. Obviously, you must be extra careful here to avoid out-of-order lock acquisition. If A needs B and B needs A, you are in a tough situation and may need to manage your own synchronization around this activity.
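A minimal sketch of the "same order" rule, using plain `ReentrantLock`s rather than the framework's internal monitors: each lock carries a fixed, unique rank (the `OrderedLock` and `LockOrdering` names are hypothetical), and a helper always takes a pair lowest-rank first, so a thread asking for {A, B} and another asking for {B, A} cannot end up waiting on each other.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical lock wrapper; ranks must be unique across all locks that can be held together.
final class OrderedLock {
    final int rank;
    final ReentrantLock lock = new ReentrantLock();

    OrderedLock(int rank) { this.rank = rank; }
}

final class LockOrdering {
    // Acquires both locks in ascending rank order, regardless of argument order.
    static <T> T withBoth(OrderedLock x, OrderedLock y, Supplier<T> action) {
        OrderedLock first = x.rank <= y.rank ? x : y;
        OrderedLock second = (first == x) ? y : x;
        first.lock.lock();
        try {
            second.lock.lock(); // reentrant, so x == y is also safe
            try {
                return action.get();
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

Usage: two threads calling `withBoth(a, b, ...)` and `withBoth(b, a, ...)` in a loop both acquire `a` before `b`, which is exactly the ordering the deadlocked threads in this thread did not share.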
--

BJ Hargrave
Senior Technical Staff Member, IBM // office: +1 386 848 1781
OSGi Fellow and OSGi Specification Project lead // mobile: +1 386 848 3788
hargrave@xxxxxxxxxx
 
 
----- Original message -----
From: "Alain Picard" <picard@xxxxxxxxxxxxxx>
Sent by: "osgi-users" <osgi-users-bounces@xxxxxxxxxxx>
To: "This is a community mail list for OSGi technology. Any OSGi technical discussion or questions are acceptable here." <osgi-users@xxxxxxxxxxx>
Cc:
Subject: [EXTERNAL] Re: [osgi-users] Deadlock with getService/ungetService
Date: Fri, Sep 10, 2021 11:54
 
I'm actually in a deep debugging session with this. I have finally been able to get full stack traces (by default, ThreadInfo#toString() only ever returns the top 8 stack trace elements, even if the MXBean was asked to collect all of them, so I had to create a derived version to get all the extra information and the full trace).
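A minimal sketch of such a "derived version", using only `java.lang.management`: `ThreadInfo#toString()` stops after 8 frames, but the `ThreadInfo` returned by `ThreadMXBean#getThreadInfo(long[], boolean, boolean)` holds the whole stack, so a custom formatter can print every frame together with the monitors locked at each depth. `FullThreadInfo` is a hypothetical name, and its output only approximates the JDK's layout.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical formatter: like ThreadInfo#toString(), but without the 8-frame cut-off.
final class FullThreadInfo {
    static String format(ThreadInfo ti) {
        StringBuilder sb = new StringBuilder();
        sb.append('"').append(ti.getThreadName()).append("\" Id=").append(ti.getThreadId())
          .append(' ').append(ti.getThreadState());
        if (ti.getLockName() != null) sb.append(" on ").append(ti.getLockName());
        if (ti.getLockOwnerName() != null) {
            sb.append(" owned by \"").append(ti.getLockOwnerName())
              .append("\" Id=").append(ti.getLockOwnerId());
        }
        sb.append('\n');
        StackTraceElement[] frames = ti.getStackTrace();
        MonitorInfo[] monitors = ti.getLockedMonitors(); // empty unless requested
        for (int depth = 0; depth < frames.length; depth++) {
            sb.append("\tat ").append(frames[depth]).append('\n');
            if (depth == 0 && ti.getLockInfo() != null) {
                String verb = ti.getThreadState() == Thread.State.BLOCKED ? "blocked on " : "waiting on ";
                sb.append("\t-  ").append(verb).append(ti.getLockInfo()).append('\n');
            }
            for (MonitorInfo mi : monitors) {
                if (mi.getLockedStackDepth() == depth) {
                    sb.append("\t-  locked ").append(mi).append('\n');
                }
            }
        }
        for (LockInfo li : ti.getLockedSynchronizers()) {
            sb.append("\t-  owns synchronizer ").append(li).append('\n');
        }
        return sb.toString();
    }

    // Full stacks plus monitor/synchronizer ownership for the given thread ids.
    static String dump(long... threadIds) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo ti : mx.getThreadInfo(threadIds, true, true)) {
            if (ti != null) sb.append(format(ti)).append('\n'); // null if the thread has ended
        }
        return sb.toString();
    }
}
```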
 
But what are the rules for tracking the ordering here? I remember having to apply this kind of ordering approach to some database deadlocks over 30 years ago, but here I'm still unsure how I would determine and define this ordering.
 
Alain
 
On Fri, Sep 10, 2021 at 11:48 AM BJ Hargrave <hargrave@xxxxxxxxxx> wrote:
Late to the discussion, but it seems you have a cycle in your code: one thread is modifying (get/unget) a service while modifying a second service, and the other thread is doing the same in the reverse order.
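The cycle can be reproduced in miniature, assuming two plain monitors as stand-ins for the two `PrototypeServiceFactoryUse` locks in the traces: each thread takes its first lock, waits until the other holds its own, then asks for the other one. The JDK's `ThreadMXBean#findDeadlockedThreads()` then reports both threads. `CycleDemo` is a hypothetical name; the threads are daemons only so the sketch can exit, whereas in a real system they would stay stuck.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class CycleDemo {
    // Starts two daemon threads that lock A/B in opposite orders, then polls the
    // JVM deadlock detector. Returns the reported thread ids, or null on timeout.
    static long[] provokeAndDetect(long maxWaitMillis) throws InterruptedException {
        final Object lockA = new Object(); // stand-in for service A's lock
        final Object lockB = new Object(); // stand-in for service B's lock
        CountDownLatch bothHoldFirst = new CountDownLatch(2);

        Thread t1 = new Thread(() -> acquireInOrder(lockA, lockB, bothHoldFirst), "getter");
        Thread t2 = new Thread(() -> acquireInOrder(lockB, lockA, bothHoldFirst), "ungetter");
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + maxWaitMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) return ids;
            Thread.sleep(10);
        }
        return null;
    }

    private static void acquireInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // ensure the other thread already holds its first lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: each thread now waits for the lock the other holds.
            }
        }
    }
}
```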
 
 
----- Original message -----
From: "Thomas Watson" <tjwatson@xxxxxxxxxx>
Sent by: "osgi-users" <osgi-users-bounces@xxxxxxxxxxx>
To: osgi-users@xxxxxxxxxxx
Cc:
Subject: [EXTERNAL] Re: [osgi-users] Deadlock with getService/ungetService
Date: Thu, Sep 2, 2021 15:52
 
On that point, this should probably be taken to an Equinox bug report for the idea of having some timeout on the lock there (https://bugs.eclipse.org/bugs/enter_bug.cgi?product=Equinox)
 
I am not sure whether it is a good idea, but I do not think it is a spec issue. The framework should be free to throw a ServiceException if it detects a deadlock timeout.

Tom
 
 
 
----- Original message -----
From: "Alain Picard" <picard@xxxxxxxxxxxxxx>
Sent by: "osgi-users" <osgi-users-bounces@xxxxxxxxxxx>
To: "This is a community mail list for OSGi technology. Any OSGi technical discussion or questions are acceptable here." <osgi-users@xxxxxxxxxxx>
Cc:
Subject: [EXTERNAL] Re: [osgi-users] Deadlock with getService/ungetService
Date: Thu, Sep 2, 2021 2:10 PM
 
Raymond,
 
Thanks, I will do that, but I started here since the deadlock was not directly in Felix SCR code.
 
Cheers,
Alain
 
 
On Thu, Sep 2, 2021 at 3:08 PM Raymond Augé via osgi-users <osgi-users@xxxxxxxxxxx> wrote:
Hi Alain,
 
I suggest taking this topic over to the Felix user mail list [1]
There are more SCR folks over there :)
 
Sincerely,
Ray
 
 
On Thu, Sep 2, 2021 at 2:33 PM Alain Picard <picard@xxxxxxxxxxxxxx> wrote:
BTW, is there any reason why the code where the deadlock occurred couldn't instead use a ReentrantLock with tryLock(long timeout, TimeUnit unit) and a configurable time limit, throwing an exception otherwise? IMHO that would be a much better option if such a situation happens on a production system, since it avoids bringing everything down. I'm willing to provide a PR if that is an acceptable option.
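A minimal sketch of this proposal, assuming a hypothetical `TimedServiceLock` in place of the framework's internal monitor: `ReentrantLock#tryLock(long, TimeUnit)` bounds the wait, and on timeout the caller gets an exception instead of blocking forever. `IllegalStateException` is only a stand-in for whatever the framework would really throw (Tom suggests a `ServiceException` in this thread). Note that a timeout turns a deadlock into a failed call; it does not remove the lock-ordering cycle that caused it.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical replacement for a synchronized(...) block, with a bounded wait.
final class TimedServiceLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final long timeout;
    private final TimeUnit unit;

    TimedServiceLock(long timeout, TimeUnit unit) { // the "configurable time limit"
        this.timeout = timeout;
        this.unit = unit;
    }

    <T> T withLock(Supplier<T> action) {
        boolean acquired;
        try {
            acquired = lock.tryLock(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for service lock", e);
        }
        if (!acquired) {
            // Stand-in for a framework ServiceException: fail this call, not the whole JVM.
            throw new IllegalStateException(
                    "possible deadlock: lock not acquired within " + timeout + " " + unit);
        }
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }
}
```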
 
Alain
 
On Thu, Sep 2, 2021 at 2:05 PM Alain Picard <picard@xxxxxxxxxxxxxx> wrote:
Tom,
 
Unfortunately, that is as much of the stack trace as threadInfo will return, even if we ask for MAX. I would also like to get a full stack trace; that would surely help.
 
As for why we are calling ungetService ourselves: in some cases we need to, because our prototype instances are instantiated from outside of SCR, and in other cases we have components that get a stateful instance, run some process and then unget it afterwards. But the most important reason goes back to the problem we found that led to this issue, https://issues.apache.org/jira/browse/FELIX-5974, which might have been fixed by now, but our code was already in place by then.
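For the "get a stateful instance, run some process, then unget it" case, one common shape is to pair every `getService()` with an `ungetService()` in a `finally` block, outside any activate/deactivate callback. The sketch assumes a hypothetical `PrototypeSource` interface that only mirrors the two methods of `org.osgi.service.component.ComponentServiceObjects`, so it runs without a framework; treating a `null` instance as "unavailable" is also an assumption.

```java
import java.util.function.Function;

// Hypothetical mirror of ComponentServiceObjects<S>: getService() / ungetService(S).
interface PrototypeSource<S> {
    S getService();
    void ungetService(S service);
}

final class PrototypeUse {
    // Gets a fresh instance, runs the work, and always releases the instance.
    static <S, R> R withInstance(PrototypeSource<S> source, Function<S, R> work) {
        S instance = source.getService();
        if (instance == null) {
            // Assumed: a null instance means the service is not available.
            throw new IllegalStateException("service not available");
        }
        try {
            return work.apply(instance);
        } finally {
            source.ungetService(instance); // released even if the work throws
        }
    }
}
```

Keeping each get/unget pair inside one method like this also makes it easier to see which locks a thread can hold at the same time.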
 
Cheers,
Alain
 
 
 
 
Alain Picard
Chief Strategy Officer
Castor Technologies Inc
o:514-360-7208
m:813-787-3424
 
On Thu, Sep 2, 2021 at 12:44 PM Thomas Watson <tjwatson@xxxxxxxxxx> wrote:
It is hard for me to tell what the cause is here without the full stack traces of the two threads involved in the deadlock.  Overall I am confused why you have SCR components that are invoking ungetService directly instead of having SCR do that for you.  But perhaps I misunderstood what you mean by " we had made some changes where a number of ungetServices were moved outside of our components deactivate method."

Tom
 
 
 
----- Original message -----
From: "Alain Picard" <picard@xxxxxxxxxxxxxx>
Sent by: "osgi-users" <osgi-users-bounces@xxxxxxxxxxx>
To: "This is a community mail list for OSGi technology. Any OSGi technical discussion or questions are acceptable here." <osgi-users@xxxxxxxxxxx>
Cc:
Subject: [EXTERNAL] [osgi-users] Deadlock with getService/ungetService
Date: Thu, Sep 2, 2021 10:11 AM
 
Yesterday our application ended up in a deadlock state in production. With the help of HealthCheck we were able to identify that there was a thread deadlock and capture the thread info for those offending threads.
 
Thread 1
osgi> threadInfo "qtp2111139171-925"
Found 1 threads named qtp2111139171-925
Info:
"qtp2111139171-925" prio=5 Id=925 BLOCKED on org.eclipse.osgi.internal.serviceregistry.PrototypeServiceFactoryUse@19a0fb19 owned by "qtp2111139171-948" Id=948
        at org.eclipse.osgi.internal.serviceregistry.ServiceRegistrationImpl.ungetService(ServiceRegistrationImpl.java:614)
        -  blocked on org.eclipse.osgi.internal.serviceregistry.PrototypeServiceFactoryUse@19a0fb19
        at org.eclipse.osgi.internal.serviceregistry.ServiceObjectsImpl.ungetService(ServiceObjectsImpl.java:135)
        at org.apache.felix.scr.impl.helper.ComponentServiceObjectsHelper$ComponentServiceObjectsImpl.close(ComponentServiceObjectsHelper.java:142)
        at org.apache.felix.scr.impl.helper.ComponentServiceObjectsHelper.closeServiceObjects(ComponentServiceObjectsHelper.java:95)
        at org.apache.felix.scr.impl.manager.DependencyManager.invokeUnbindMethod(DependencyManager.java:1933)
        at org.apache.felix.scr.impl.manager.DependencyManager.close(DependencyManager.java:1682)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.disposeImplementationObject(SingleComponentManager.java:417)
        at org.apache.felix.scr.impl.manager.ServiceFactoryComponentManager.ungetService(ServiceFactoryComponentManager.java:170)
        ...
Thread 2
osgi> threadInfo "qtp2111139171-948"
Found 1 threads named qtp2111139171-948
Info:
"qtp2111139171-948" prio=5 Id=948 BLOCKED on org.eclipse.osgi.internal.serviceregistry.PrototypeServiceFactoryUse@5a01ccb9 owned by "qtp2111139171-925" Id=925
        at org.eclipse.osgi.internal.serviceregistry.ServiceRegistrationImpl.getService(ServiceRegistrationImpl.java:521)
        -  blocked on org.eclipse.osgi.internal.serviceregistry.PrototypeServiceFactoryUse@5a01ccb9
        at org.eclipse.osgi.internal.serviceregistry.ServiceObjectsImpl.getService(ServiceObjectsImpl.java:92)
        at org.apache.felix.scr.impl.helper.ComponentServiceObjectsHelper$ComponentServiceObjectsImpl.getService(ComponentServiceObjectsHelper.java:166)
        at com.castortech.iris.ecp.view.spi.core.zk.BaseControlZKRendererImpl.activate(BaseControlZKRendererImpl.java:70)
        at jdk.internal.reflect.GeneratedMethodAccessor259.invoke(Unknown Source)
        at java.base@11.0.6/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base@11.0.6/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod.invokeMethod(BaseMethod.java:228)
        ...
Here we can see one thread getting a service while the other is ungetting a service.
 
In the last few days we had made some changes where a number of ungetServices were moved outside of our components deactivate method.
 
This is using org.eclipse.osgi v3.14.0v20190517 and Felix SCR 2.1.14.v20190123, neither of which seems to have changed much in the classes that appear in the trace.
 
I have two questions: what could be causing this, and how can we avoid it? And if there is no mechanism to avoid deadlocks, shouldn't there at least be a timeout mechanism, so that one thread fails instead of bringing down the application and forcing a restart?
 
Cheers,
Alain
_______________________________________________
osgi-users mailing list
osgi-users@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/osgi-users
 




--
Raymond Augé (@rotty3000)
Senior Software Architect Liferay, Inc. (@Liferay)
OSGi Fellow, Java Champion
 
 
 


 


