| Hi Christoph,
 Sorry for jumping in the middle of your discussion with Wim.   I'm
      just starting my day here though and I hope to contribute to this.
 
 On 2/1/2016 7:42 AM, Keimel, Christoph wrote:
 
 
      
      
      
      
        Hi Wim,

        Let’s see if I get this correctly:

        Let’s say that A wants to know when a touch
            sensor on B is getting pressed (true/false). What I am doing
            right now is: A puts up a whiteboard service
            (TouchSensorSniffer). This service is picked up by B using a
            ServiceTracker [1]. B then holds on to the service and calls
            TouchSensorSniffer#onStateChanged whenever the touch sensor
            state changes. (Of course I also clear my internal cache
            when the service gets removed.)

            I’ll use this simple setup to describe my
            situation: After both A and B are started everything is fine
            and B has discovered the TouchSensorSniffer from A. Now I
            disconnect B from the network by pulling the LAN cable. Both
            A and B continue to run.

    Yes, they continue to run, but one question is: on B (the service
    consumer), does the remote service proxy get unregistered after the
    keepAlive timeout? If you are using a ServiceTracker, this should
    result in the removedService method being called. It won't happen
    immediately (the default keepAlive is 30s), but it should happen,
    because the generic provider has failure detection.
 
 
 
      
        If the state of the touch sensor changes at
            this moment B would try to send this information over the
            TouchSensorSniffer to A. But since B is disconnected from
            the network, this request fails after the timeout. B thinks
            this is a temporary error and just logs it.

    B should probably do something other than just log this as a
    temporary error.
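 One possible shape for "something other than just log", as a sketch
 only: count consecutive failures, remember the latest undelivered state,
 and stop calling the proxy once a threshold is reached. GuardedSender,
 maxFailures and the Consumer standing in for the remote call are all
 hypothetical and not part of ECF or OSGi. I am also assuming that a
 failed remote call surfaces as a RuntimeException.

```java
import java.util.function.Consumer;

// Hypothetical wrapper on B around a remote call such as
// TouchSensorSniffer#onStateChanged.
class GuardedSender {
    private final Consumer<Boolean> remoteCall; // stands in for the proxy call
    private final int maxFailures;              // assumed threshold, not an ECF setting
    private int consecutiveFailures = 0;
    private Boolean pending = null;             // latest state that was not delivered

    GuardedSender(Consumer<Boolean> remoteCall, int maxFailures) {
        this.remoteCall = remoteCall;
        this.maxFailures = maxFailures;
    }

    // Tries to deliver the state; returns true only if the call succeeded.
    synchronized boolean send(boolean pressed) {
        if (isGivenUp()) {
            pending = pressed;   // do not call a proxy we have given up on
            return false;
        }
        try {
            remoteCall.accept(pressed);
            consecutiveFailures = 0;
            pending = null;
            return true;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            pending = pressed;
            return false;
        }
    }

    // True once maxFailures calls in a row have failed.
    synchronized boolean isGivenUp() {
        return consecutiveFailures >= maxFailures;
    }

    // Latest undelivered state, or null if nothing is outstanding.
    synchronized Boolean pendingState() {
        return pending;
    }

    // Call when a fresh proxy has been discovered.
    synchronized void reset() {
        consecutiveFailures = 0;
    }
}
```

 Whether the pending state should be resent after recovery depends on
 whether A cares about a stale touch event; that is an application
 decision.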
 
 
 
      
        
   If I reconnect the LAN cable after a couple of
            seconds and then press my touch sensor again, B will again
            use the TouchSensorSniffer service to send the state change.
            This time everything works out because the network is back
            up: Cool. But let’s assume I don’t reconnect right away but
            I wait until the keepalive period (default 30 seconds) is
            over. What happens now is that the TouchSensorSniffer is
            unregistered in B which is ok, since we assume that the
            connection is gone for good.

    Right...this is referred to as 'fail stop'.  One has to assume that
    the connection is gone for good, because it may actually be gone for
    good :).
 
 
 
      
        If I touch the sensor now B sees that no
            TouchSensorSniffer services are registered and therefore
            doesn’t send this information anywhere. Also good. Now,
            after 60 seconds, I reconnect the LAN cable. Both A and B
            are still running but B doesn’t pick up on the
            TouchSensorSniffer from A. They stay disconnected.

    Right.
 
 
 
      
        
   This last part is based on my observations, so
            I’m not sure I understand this completely. Does my
            description come close to the truth and is this the result
            that is to be expected?

    Yes, I think so.
 
 
 
      
        Or would you expect the discovery on B to find
            the TouchSensorSniffer from A again after the network
            connection has been reestablished?

    This is where the specifics of the discovery provider interact with
    the specifics of the distribution provider. Wim is the expert on
    ZooKeeper, but I don't believe that the network connection being
    reestablished will, by itself, trigger a rediscovery of a previously
    discovered service.
 
 
 
      
        
   Or is the problem that I am holding on to an
            instance of TouchSensorSniffer on B?

    I think that holding onto the instance of TouchSensorSniffer on B
    essentially assumes that the existing connection will be
    reestablished *within 30s*, and that is probably not a reasonable
    assumption for your problematic network.
 
 
 
 
      
        I could stop using a ServiceTracker and look
            into the OSGi service registry directly to search for all
            implementations of TouchSensorSniffer anytime the state
            changes via BundleContext#getServiceReferences. I see that
            this would change the situation slightly, because I would use
            BundleContext#ungetService right after sending the
            information and then getting the service again for the next
            event. But I am not sure that this would change the basic
            situation, since the registry itself is already caching the
            available remote services. Or am I wrong about this?

    The service registry is holding onto the remote service proxy's
    ServiceReference, but this proxy will be unregistered when/if the
    remote service is unregistered via the failure
    detection/keepAlive/timeout (30s by default).   This unregistration
    of the proxy should result in removedService (ServiceTracker) and
    unbind for DS.   Basically you need something to notify your code
    when the proxy becomes unregistered so that you can give up/stop
    using the TouchSensorSniffer on B.
 
 Now, one question is:  once detected, what should B do to recover
    from a network failure?  This can be a difficult question to answer
    in general, because the failure could be permanent (so no use
    retrying), or it could be very short and would/will heal very
    quickly.   Predicting the future is difficult :).
 
 There are mechanisms to deal with these problems.   One is
    extending/customizing the OSGi Topology Manager, which would allow
    implementing some recovery strategy for a service that has gone away
    (e.g. import retry).   The failure detection of the ECF generic
    provider can also be tuned.  Finally, the ECF generic provider (and
    others...like the JMS provider) also has some notion of
    communication groups and group membership, and so this can be used
    to associate remote services with each other.
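
 As an illustration of what an import-retry strategy might compute, here
 is a small capped exponential backoff schedule. It is only a sketch of
 the policy: RetrySchedule and its parameters are hypothetical, and it
 does not use or represent any Topology Manager or ECF API. Where it
 would be hooked in depends on how the Topology Manager is customized.

```java
// Hypothetical retry schedule for re-importing a remote service that went away.
final class RetrySchedule {
    private final long baseMillis;  // delay before the first retry
    private final long maxMillis;   // upper bound on any single delay
    private final int maxAttempts;  // after this many attempts, treat the failure as permanent

    RetrySchedule(long baseMillis, long maxMillis, int maxAttempts) {
        if (baseMillis <= 0 || maxMillis < baseMillis || maxAttempts < 1) {
            throw new IllegalArgumentException("invalid retry parameters");
        }
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
        this.maxAttempts = maxAttempts;
    }

    // Delay before the given attempt (1-based): doubles each attempt, capped at maxMillis.
    long delayMillis(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        long delay = baseMillis;
        for (int i = 1; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    // True while the attempt number is within the configured limit.
    boolean shouldRetry(int attempt) {
        return attempt <= maxAttempts;
    }
}
```

 The attempt limit is the part that encodes the "permanent vs. short"
 guess: a small limit gives up quickly, a large one keeps trying for a
 network that may never come back.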
 
 Scott
 
 
 |