Re: [ecf-dev] service discovery working even if port mis configured
Hi Peter,
On 4/24/2014 7:48 AM, Peter Hermsdorf wrote:
Hi Scott,
I don't think I understand what exactly you mean by 'stopping the 
host'.  Do you mean just remote service unregistration?...or do you 
mean unceremonious host shutdown (e.g. kill -9 ), or something in 
between, or ?
<deleted>
I'm not sure.  I think it hinges on what you want WRT the 'stopping 
the host' and the 'restart leading to new bind event'.
short answer: in any case ;)
'Any case' would indeed be nice, but of course what we are talking about 
is Byzantine fault tolerance [1]...a very hard set of distributed 
systems problems.
We have an RCP client using service(s) of a single server instance. 
When that server goes down (software update, network problem, crash 
etc.) the client can continue to work (it just can't use these services 
during that time), but needs a way to reconnect/rediscover the service 
when the server is online again...(without a client restart).
In the end the client needs to get an unbind when the service is not 
available and a bind when it is online again.
Ok I see.
I'm going to break this down a little bit...as it relates to remote 
services...just to talk through the issues and choices that can be made 
about discovery, distribution, and their combination for implementing 
remote services.  Please forgive me if this seems a little long-winded, 
but in truth there are no technical silver bullets here.
First...to get the client to 'unbind'...i.e. have the remote service 
proxy go away when the underlying host crashes...or the network 
partitions...requires that the distribution provider do some failure 
detection.  The ECF generic provider does perform this failure 
detection, and so when the host goes down (e.g. crashes), the generic 
provider will detect this, and the remote service proxy will be 
unregistered (i.e. unbound)...as you've already found in your tests.
Note this is not necessarily true of all distribution providers and/or 
implementations of OSGi remote services...for example if your 
distribution provider is based upon connectionless http, then the http 
server may go down, and if the client already has a working proxy then 
it may not be able to know that the remote service host has 
crashed/become unavailable.  But again, the ECF generic provider does do 
such failure detection, and so the proxy unregistration upon host crash 
does occur.
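
To make the 'failure detection leads to unbind' idea concrete, here is a 
minimal sketch in plain Java.  It assumes a simple heartbeat timeout; that 
is only one common way to detect a dead host and is not a description of 
how the ECF generic provider is implemented.  The class and method names 
(ProxyFailureDetector, onHeartbeat, check) are hypothetical and are not 
ECF or OSGi API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of consumer-side failure detection; not ECF API. */
class ProxyFailureDetector {
    private final long timeoutMillis;
    private long lastHeartbeatMillis;
    private boolean proxyRegistered = true;   // proxy starts out bound
    private final List<String> events = new ArrayList<>();

    ProxyFailureDetector(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = nowMillis;
    }

    /** Record that the host was heard from at the given time. */
    void onHeartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    /** Unregister the proxy once the host has been silent longer than the timeout. */
    void check(long nowMillis) {
        if (proxyRegistered && nowMillis - lastHeartbeatMillis > timeoutMillis) {
            proxyRegistered = false;
            events.add("unbind");
        }
    }

    boolean isProxyRegistered() {
        return proxyRegistered;
    }

    List<String> events() {
        return events;
    }
}
```

Time is passed in explicitly so the behaviour is easy to follow: with a 
1000 ms timeout and a last heartbeat at t=500, check(1200) leaves the proxy 
registered, while check(1600) unregisters it.  Note that nothing in this 
sketch can re-register the proxy; that has to come from somewhere else.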
Now...to get the client/service consumer to 'rebind' to the new 
service...when the host recovers and it becomes available...means that 
the new service instance metadata (edef) has to be communicated to the 
consumer *at that time*...i.e. dynamically via some sort of network 
discovery (zookeeper, etc) rather than an edef file.   This is why you 
are not seeing the rebind happen with the static (or template-based) 
edef...because that's completely initiated by the consumer/client...and 
doesn't happen when the host recovers and makes a new instance of the 
remote service available.
In short, I think what you probably need is *both* a distribution 
provider with failure detection (generic, r-osgi, jms), and to use some 
network discovery provider (e.g. zookeeper, dnssd, jslp, zeroconf).   
Then the distribution provider can detect the host failure...to unbind 
the remote service proxy when a crash happens...and the network 
discovery can announce the host's new remote service...*after* it 
becomes available.
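
The difference between the two setups can be sketched with a small 
plain-Java model.  This is an illustration under stated assumptions, not 
ECF code: RemoteServiceConsumer and its methods are hypothetical names, 
the failure callback stands in for whatever the distribution provider 
does, and the discovery callback stands in for a network discovery 
provider announcing an endpoint.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical consumer-side model; none of these names are ECF or OSGi API. */
class RemoteServiceConsumer {
    private String boundEndpoint;             // null while no proxy is registered
    private final List<String> events = new ArrayList<>();

    /** Static edef: the consumer itself triggers the import, once, at startup. */
    void importFromStaticEdef(String endpointId) {
        bind(endpointId);
    }

    /** Modeled network discovery: called each time a host announces a service. */
    void onEndpointDiscovered(String endpointId) {
        bind(endpointId);
    }

    /** Modeled distribution provider: called when failure detection fires. */
    void onHostFailureDetected() {
        if (boundEndpoint != null) {
            events.add("unbind " + boundEndpoint);
            boundEndpoint = null;
        }
    }

    // Register a proxy only if none is currently registered.
    private void bind(String endpointId) {
        if (boundEndpoint == null) {
            boundEndpoint = endpointId;
            events.add("bind " + endpointId);
        }
    }

    boolean isBound() {
        return boundEndpoint != null;
    }

    List<String> events() {
        return events;
    }
}
```

In this model, a consumer that only calls importFromStaticEdef("host-1") 
and then receives onHostFailureDetected() stays unbound, because nothing 
tells it that the host has come back.  A consumer driven by 
onEndpointDiscovered(...) gets a fresh bind as soon as the restarted host 
announces its service again.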
Given your initial explanation of the remote service metadata (changing 
a few of the edef property values), I had thought that using edef or 
edef templates would meet your use case.  But it seems you have some 
additional requirements that make the dynamic aspect of network 
discovery necessary...as I've outlined.
Hopefully this discussion is helpful.   I do wish that the 
failure/reliability properties of remote services could be entirely 
hidden...but there's lots of distributed systems work that shows such 
network transparency is not really possible (or at least not well 
advised).  IMO, one thing that OSGi remote services uniquely 
provide...that makes them very attractive for remote services in 
general...is the ability to map network-based failure to the dynamics of 
OSGi and OSGi services (i.e. the service instances naturally come and go 
at runtime).
I hope that this explanation is somehow more clear ;)
I'll see you and raise you on that hope :).
Scott