Re: [ecf-dev] zookeeper discovery comes and goes

Hi Bryan,

Currently, this client session timeout is not configurable, but we can easily externalize it to a system property or similar. What do you think, Wim?
Personally, I would question the benefit of widening the session timeout, given the very nature of the discovery role. Generally, the sooner a service relying on information obtained from zoodiscovery (r_osgi, for instance, which uses this service info to locate and proxy services) is notified of a change, the healthier the services' interaction will be. Of course, this matters less for test purposes.

To get back to the initial point, I'd appreciate it if you could confirm (as I can't reproduce your situation) that this is a delay issue. That is, that the method org.eclipse.ecf.provider.zookeeper.node.internal.NodeReader.process(WatchedEvent event) has indeed received either a KeeperState.Disconnected or a KeeperState.Expired event before declaring a service undiscovered.

While waiting for the bug fix (if agreed upon) that externalizes the client session-timeout configuration, you might want to see the effect beforehand for your own case by changing the session-timeout value in the source code (org.eclipse.ecf.provider.zookeeper) and using the changed source bundle in your runtime instead. There are 3 methods needing a minor change:

In these methods, just change the 3000 value (the session timeout) in "new ZooKeeper(.., 3000, ..)" to the value you want to test with.
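For illustration, externalizing the timeout to a system property could look roughly like the sketch below. The property name zoodiscovery.sessionTimeout and the class SessionTimeoutConfig are hypothetical, not existing ECF settings; only the 3000 default comes from the current code. The value is what would be passed as the second argument of "new ZooKeeper(.., 3000, ..)", which ZooKeeper interprets as milliseconds.

```java
// Sketch only: the class and property names are assumptions, not part of ECF.
final class SessionTimeoutConfig {
    // Hypothetical system property name.
    static final String PROPERTY = "zoodiscovery.sessionTimeout";
    // Value currently hard-coded in the provider.
    static final int DEFAULT_TIMEOUT_MS = 3000;

    // Returns the configured timeout, or the default when the property
    // is missing, not a number, or not positive.
    static int sessionTimeoutMs() {
        Integer configured = Integer.getInteger(PROPERTY);
        if (configured == null || configured <= 0) {
            return DEFAULT_TIMEOUT_MS;
        }
        return configured;
    }

    public static void main(String[] args) {
        // The result would replace the literal 3000 in the ZooKeeper constructor call.
        System.out.println("session timeout (ms): " + sessionTimeoutMs());
    }
}
```

With something like this in place, a test run could be started with -Dzoodiscovery.sessionTimeout=10000 instead of rebuilding the bundle.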

By the way, how come you've experienced such a delay while running the whole thing on the same machine?

Hope this helps a bit,

kind regards,

From: Bryan Hunt <bhunt@xxxxxxx>
To: "Eclipse Communication Framework (ECF) developer mailing list." <ecf-dev@xxxxxxxxxxx>
Date: 15-10-2010 16:28
Subject: Re: [ecf-dev] zookeeper discovery comes and goes

Hi Ahmed,

I assume that ECF takes care of sending the heart beats.  How do I configure the session-timeout interval?


On Oct 13, 2010, at 8:08 AM, ahmed.aadel@xxxxxxxxxxxxxxxxxx wrote:


A service is signaled as undiscovered because it was indeed unpublished from the other endpoint (nothing new), because the connection was lost, or because the session expired. These cases are dealt with in NodeReader.process(WatchedEvent event), which is the closest point to the underlying ZooKeeper that gets informed when a data node is deleted or no longer reachable. So I'd say debug that short method to see whether it is a genuine service undiscovery or one masking a session/connection loss. The received event is intuitive and should be straightforward.
A word about the session mentioned above: a client should send heartbeats within a session-timeout interval (currently set to 3000) to be considered still alive.
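To make the three cases concrete, here is a minimal sketch of how an event could be classified while debugging. It is not the actual NodeReader code. The enums are local stand-ins that reuse constant names from ZooKeeper's Watcher.Event.KeeperState and Watcher.Event.EventType so the example runs without the ZooKeeper jar; the class UndiscoveryCause and the Cause enum are hypothetical.

```java
// Sketch only: stand-in enums mirror ZooKeeper constant names; this is not ECF code.
final class UndiscoveryCause {
    enum KeeperState { SyncConnected, Disconnected, Expired }
    enum EventType { None, NodeDeleted, NodeDataChanged }

    // Hypothetical labels for the three causes described above.
    enum Cause { UNPUBLISHED, CONNECTION_LOST, SESSION_EXPIRED, NOT_AN_UNDISCOVERY }

    static Cause classify(KeeperState state, EventType type) {
        // Session and connection problems are reported through the state.
        if (state == KeeperState.Expired) {
            return Cause.SESSION_EXPIRED;
        }
        if (state == KeeperState.Disconnected) {
            return Cause.CONNECTION_LOST;
        }
        // A deleted data node on a healthy connection is a genuine unpublish.
        if (type == EventType.NodeDeleted) {
            return Cause.UNPUBLISHED;
        }
        return Cause.NOT_AN_UNDISCOVERY;
    }

    public static void main(String[] args) {
        // prints CONNECTION_LOST
        System.out.println(classify(KeeperState.Disconnected, EventType.None));
    }
}
```

Logging the result of such a check at the top of process(WatchedEvent event) would show whether an "undiscovered" report follows a node deletion or a session/connection loss.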

best regards,


From: Wim Jongman <wim.jongman@xxxxxxxxx>
To: "Eclipse Communication Framework (ECF) developer mailing list." <ecf-dev@xxxxxxxxxxx>, ahmed.aadel@xxxxxxxxxxxxxxxxxx
Date: 13-10-2010 11:02
Subject: Re: [ecf-dev] zookeeper discovery comes and goes

Hi Bryan,

AFAIK the discovery mechanism does not in itself guard the connection or decide to drop it. It merely listens and propagates service registrations/deregistrations to the distribution provider.

I would say this behavior should be seen with other discovery mechanisms as well.

Ahmed can you think of reasons why this could happen?



On Tue, Oct 12, 2010 at 4:56 PM, Bryan Hunt <bhunt@xxxxxxx> wrote:
I'm seeing a strange problem with zookeeper service discovery.  I've been experimenting with YourKit and noticed a ConcurrentModificationException as reported here:

When I went to capture a better stack trace, I noticed that when I remotely connected with YourKit, the client reported zookeeper undiscovered one of my 4 remote services.  When I disconnect and reconnect YourKit, the client reports that zookeeper discovers 4 remote services.

This problem appears repeatable.  The client and all 4 remote services are running on the same machine.  Each remote service is running in its own JVM.  Any ideas on what is going on, or suggestions for tracking down the problem?  Is there by chance some sort of very short keepalive that might be having problems if there were a network delay or thread delay when YourKit connects to the client JVM?

ecf-dev mailing list

