Hi Pablo,
Thanks for the report. This is an interesting problem, and I
would like to enlist your help in diagnosing it.
I've read over both your email and your Stack Overflow posting and
have a couple of hypotheses about what's going on, as well as
some questions that might give me a clearer idea of the cause.
I'm hopeful that with a little investigation we can diagnose and
fix it.
I think it would be best, though, to create a new bug on which to
have the discussion, Q&A, etc. Would you be willing to create
a new bug to record the info that you've collected? Here is the
ECF bug category:
https://bugs.eclipse.org/bugs/enter_bug.cgi?product=ECF
I know this is some additional work for you, but it would be
better for you to create the bug and put the info you've collected
there, rather than me or another committer, since we haven't yet
actually seen the problem ourselves in test or other code.
A couple of initial thoughts/hypotheses/questions:
1) It seems possible to me that there is a timing problem with
discovery (ZooKeeper in your case). Have you tried turning on the
ZooKeeper debug/tracing info?
2) Another possibility is that in some startup cases the export
of the remote services is not happening fast enough for DS (i.e.
within the default 30s). It's very strange to me, though, that it
reports org.eclipse.equinox.event as the bundle being waited on
(equinox.event is indeed used by the ECF RSA implementation, but
for import). Could you describe on the bug what service
topology you have (i.e. which services are being exported by which
nodes, and which services are being imported by which nodes)?
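To make hypothesis 2 a bit more concrete, here is a minimal sketch of how a job that blocks on a worker thread ends up reported as a timeout. This is an illustration only and NOT the Equinox DS code; the class and method names (BlockedWorkDemo, completesWithin) are made up for the example.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only: this is not how Equinox DS is implemented.
final class BlockedWorkDemo {

    /** Runs the job on a worker thread; returns true if it finished within timeoutMs. */
    static boolean completesWithin(Runnable job, long timeoutMs) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<?> result = worker.submit(job);
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The job is still blocked: this is the point where a warning would be logged.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            worker.shutdownNow(); // interrupt the job if it is still running
        }
    }
}
```

If something in component activation waits on a remote call or on discovery, it would look like the slow job here: the caller gives up after the timeout even though nothing has actually failed.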
WRT this question:
<stuff deleted>
Is there any way to "refresh/force" the service
announcement/search? (or avoid the warning message, which is the
thing that is affecting the services, probably).
[Scott] Yes. The OSGi RSA specification makes it possible to
explicitly control the export and import of remote services by
creating and using a custom 'topology manager'. There's some
info on this here:
http://wiki.eclipse.org/Remote_Services_Admin
And much more (of course) in the OSGi RSA spec. I'm hoping that
won't be necessary for this use case (i.e. that you can continue
to use the ECF default topology manager), but it is an option if
it comes to that. ECF's implementation of the topology manager
can easily be extended or replaced. But let's investigate this
issue closely on the new bug first, as what you are seeing does
look timing related and may have more to do with a bug than with
a need for a replacement topology manager.
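For a rough idea of what "explicit control" by a topology manager means, here is a simplified sketch. It deliberately does not use the real OSGi types: ExportAdmin is a made-up stand-in for the export half of org.osgi.service.remoteserviceadmin.RemoteServiceAdmin (whose exportService takes a ServiceReference and a properties Map), and ExplicitTopologyManager, exportNow and refresh are hypothetical names, not ECF API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Made-up stand-in for the export side of the OSGi RemoteServiceAdmin service,
// simplified so the sketch compiles without an OSGi framework on the classpath.
interface ExportAdmin {
    /** Exports the service identified by serviceId and returns an endpoint id. */
    String export(String serviceId, Map<String, ?> properties);
}

// Hypothetical topology manager: it alone decides when a service is exported.
final class ExplicitTopologyManager {
    private final ExportAdmin admin;
    private final Map<String, String> endpoints = new LinkedHashMap<>();

    ExplicitTopologyManager(ExportAdmin admin) {
        this.admin = admin;
    }

    /** Exports the service once; repeated calls return the existing endpoint. */
    synchronized String exportNow(String serviceId, Map<String, ?> properties) {
        return endpoints.computeIfAbsent(serviceId, id -> admin.export(id, properties));
    }

    /**
     * Forgets the current export and exports again, i.e. a forced re-announcement.
     * A real implementation would also close the old export registration first.
     */
    synchronized String refresh(String serviceId, Map<String, ?> properties) {
        endpoints.remove(serviceId);
        return exportNow(serviceId, properties);
    }
}
```

The point is only that the decision of when to export (or re-export) moves into code you control; the real interfaces and their lifecycle rules are in the RSA spec and the wiki page above.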
Thanks,
Scott
This issue is happening randomly in all the four
nodes of my network.
Thanks in advance!
[1] Message:
WARNING 26 [SCR - WorkThread] Timeout occurred! Thread was
blocked on processing [QueuedJob] WorkPerformer:
org.eclipse.equinox.internal.ds.SCRManager@19336006;
actionType 1
WARNING 26 [SCR] Enabling components of bundle
org.eclipse.equinox.event did not complete in 30000 ms
!SESSION 2013-01-14 20:46:11.168
-----------------------------------------------
eclipse.buildId=unknown
java.version=1.6.0_24
java.vendor=Sun Microsystems Inc.
BootLoader constants: OS=linux, ARCH=x86_64, WS=gtk,
NL=es_ES
Framework arguments: -vmargs
Command-line arguments: -vmargs -consoleLog -console
-configuration configuration -clean
!ENTRY org.eclipse.equinox.ds 2 0 2013-01-14 20:46:42.134
!MESSAGE [SCR] Enabling components of bundle
org.eclipse.equinox.event did not complete in 30000 ms
!ENTRY org.eclipse.equinox.ds 2 0 2013-01-14 20:46:42.135
!MESSAGE [SCR - WorkThread] Timeout occurred! Thread was
blocked on processing [QueuedJob] WorkPerformer:
org.eclipse.equinox.internal.ds.SCRManager@19336006;
actionType 1
--
Pablo García Sánchez
Departamento de Arquitectura y Tecnología de Computadores
Universidad de Granada
http://geneura.ugr.es/~pgarcia