[january-dev] remote services with IDatasets
Hi Folks,
Some of you may be familiar with ECF's implementation of OSGi remote
services [1]. Remote services allow OSGi services (defined as one or
more Java interfaces) to be made available/proxied outside of the
process. I believe ICE is currently using it.
ECF's implementation of remote services is pluggable and allows
different transports to be used (at service registration time) to remote
a service. We call these 'distribution providers' and now have quite
a few of them [2], from REST/JAX-RS, to R-OSGi, to JMS, to XML-RPC, to
MQTT, to plain ol' TCP and others. These distribution providers
encapsulate both the wire protocol/transport (e.g. HTTP) and the
serialization scheme (e.g. JSON).
We have recently completed and are testing a distribution provider that
uses Py4j + Google's Protocol Buffers [3]. After some experimentation,
I've found that protocol buffers (binary mode) are fairly performant and
relatively space efficient on serialization and deserialization, and
that Py4j is fairly performant *if* parameters and return values are
serialized to byte[] (and therefore passed by value) rather than passed
by reference. Pass-by-reference is the default in Py4j for everything
except byte[]. It often causes many round trips between Python and Java,
and in our observation this quickly becomes a major performance problem
when large amounts of data are exchanged.
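To make the round-trip argument concrete, here is a toy model in plain
Python. It does not use Py4j at all: CountingProxy is a hypothetical
stand-in for a by-reference proxy object, and it simply counts one
"round trip" per call. The names and the cost model are assumptions for
illustration only.

```python
import struct


class CountingProxy:
    """Hypothetical stand-in for a by-reference proxy (not a Py4j class).

    Every method call is counted as one cross-process round trip.
    """

    def __init__(self, values):
        self._values = [float(v) for v in values]
        self.round_trips = 0

    def __len__(self):
        self.round_trips += 1
        return len(self._values)

    def __getitem__(self, index):
        self.round_trips += 1
        return self._values[index]

    def as_bytes(self):
        # One call returns the whole payload as little-endian float64 values.
        self.round_trips += 1
        return struct.pack("<%dd" % len(self._values), *self._values)


def sum_by_reference(proxy):
    """Read elements one at a time: one trip for len() plus one per element."""
    total = 0.0
    for i in range(len(proxy)):
        total += proxy[i]
    return total


def sum_by_value(proxy):
    """Fetch everything as bytes in a single trip, then decode locally."""
    payload = proxy.as_bytes()
    values = struct.unpack("<%dd" % (len(payload) // 8), payload)
    return sum(values)
```

In this model, summing 1000 elements by reference costs 1001 counted
calls, while the by-value version costs one call regardless of size.
Real costs depend on the transport, but the scaling is the point.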
So what's the point of this? Some of us are using [3] to provide a
modular, performant localhost interaction between OSGi runtimes and
Python code, by using OSGi remote services and ECF's Py4j-based and
protocol buffers-based distribution provider. This interaction is
bi-directional, as OSGi services are bi-directional, but for our use
case we have been focusing on Java code calling into data analysis code
implemented in Python.
An example using [3] is provided here [4]. This example has the OSGi
service interface in Java, and this service is *implemented* by the
Python code in the python-src directory [5]. The consumer bundle shows
how the service gets injected (by DS) at runtime and is then used like
any other OSGi service.
One of the things I've been contemplating is using protobuf to define
the serialization/deserialization of January IDatasets to and from
numpy Datasets (or perhaps some subclass or metatype). This would allow
performant and easy exchange of Dataset/IDatasets between Python and
Java, which is what we (and perhaps others) are interested in.
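As a rough sketch of what such a serialization has to carry, the
following encodes a dataset's shape and values into a single byte
string and decodes it again. The wire layout is an assumption made up
for this example; it is not a January, ECF, or protobuf format, and it
uses Python's struct module only to stay self-contained. A protobuf
message would presumably carry the same fields (rank/shape, element
type, raw data).

```python
import struct

# Hypothetical wire layout (assumption for illustration only):
#   uint32 rank | rank x uint32 dims | float64 values, all little-endian


def encode_dataset(shape, values):
    """Pack a shape tuple and a flat list of floats into one bytes object."""
    count = 1
    for dim in shape:
        count *= dim
    if count != len(values):
        raise ValueError("shape does not match number of values")
    header = struct.pack("<I%dI" % len(shape), len(shape), *shape)
    body = struct.pack("<%dd" % len(values), *values)
    return header + body


def decode_dataset(payload):
    """Inverse of encode_dataset: returns (shape, flat list of values)."""
    (rank,) = struct.unpack_from("<I", payload, 0)
    shape = struct.unpack_from("<%dI" % rank, payload, 4)
    offset = 4 + 4 * rank
    count = (len(payload) - offset) // 8
    values = struct.unpack_from("<%dd" % count, payload, offset)
    return tuple(shape), list(values)
```

Because the result is one bytes object, it would map to a Java byte[]
and so cross the Py4j boundary by value, in a single trip.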
I wanted to explain this work publicly on this list, as I won't be able
to attend the upcoming summit. I am, however, looking for possible
collaboration on parts of this, and I am willing to make contributions
to January if they are desired.
Thanks,
Scott
[1] https://wiki.eclipse.org/Eclipse_Communication_Framework_Project
[2] https://wiki.eclipse.org/Distribution_Providers
[3] https://github.com/ECF/Py4j-RemoteServicesProvider
[4] https://github.com/ECF/Py4j-RemoteServicesProvider/tree/master/examples
[5] https://github.com/ECF/Py4j-RemoteServicesProvider/tree/master/examples/org.eclipse.ecf.examples.protobuf.hello