Re: [eclipse-incubator-e4-dev] [resources] EFS, ECF and asynchronous
Oberhuber, Martin wrote:
> good points, indeed! thanks for taking the time to write
> such an elaborate reply.
>> When blocking the calling thread (e.g. any synchronous reads/writes)
>> results in system-wide 'problems' (e.g. UI is stopped, other server
> Hm... IMHO this is not a use-case which requires async
> because it couldn't get implemented with synchronous
> calls. This just shows that somebody's using a synchronous
> API in a way that's inappropriate for slow/unreliable
Yes...I guess the point is that any network is a relatively
slow/unreliable backend compared to any disk.
> This does point out an important truth, though:
> synchronous APIs may *encourage* usage of background Jobs
> for slow operations, but cannot enforce this. Asynchronous
> APIs, on the other hand, *force* the client to take actions
> which are appropriate for use with slow/unreliable back-ends.
True...because the default assumption for the network is that it is
relatively slow and unreliable.
> From that point of view, it might actually make sense to
> have the "true E4 resources kernel" only support async
> file system access, and the backward compatibility wrappers
> provide a bridge to synchronous access... that way we could
> force "true E4" clients to take appropriate measures. Given
> that ECF filetransfer is in Equinox already, I could imagine
> getting rid of EFS and replacing it by ECF filetransfer
> (probably extended) in the "core E4 Resources".
This seems too extreme to me. That is, EFS is an established, very nice
synchronous file system API. No reason to 'get rid' of it for technical
purity IMHO (i.e. everything must be asynchronous over network). Rather
it seems to me that having the ability to go between synchronous and
asynchronous is the way to go...while also allowing mixed strategies (like
Hadoop-based EFS impls, which asynchronously replicate files/file blocks).
> Futures as return value might be a concept that allows
> using asynchronous APIs with minimal extra effort when
> results are available "very fast and reliably".
I agree that futures (we have the class name 'AsynchResult'...the 'h' is
embarrassing for me) can be a very useful concept for bridging
asynchronous calls with synchronous needs (BTW, we use AsynchResult
to get JRE 1.4 compatibility...the 1.5+ concurrent API also has futures
of course). But they are (still) a relatively foreign API
concept...that is, not too familiar for many programmers. Still, I
think they are useful.
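To make that concrete, here is a minimal sketch (plain java.util.concurrent rather than our actual AsynchResult class, and readAsync/readSync are made-up names) of how a future lets the same asynchronous read be consumed synchronously when the caller is willing to block:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class FutureBridge {
    // Daemon threads so the pool never keeps the VM alive.
    private static final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // The asynchronous primitive: returns immediately with a Future.
    static Future<byte[]> readAsync(String uri) {
        Callable<byte[]> task = () -> ("contents of " + uri).getBytes();
        return pool.submit(task);
    }

    // Synchronous convenience layered on top: just block on the Future,
    // with a timeout so a dead network cannot hang the caller forever.
    static byte[] readSync(String uri) {
        try {
            return readAsync(uri).get(30, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

So the async API remains the one core API, and the blocking variant is a one-line convenience rather than a duplicated code path.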
> Writing an EFS wrapper to ECF filetransfer for backward
> compatibility should be an easy thing to do (and probably
> you have done it already). In terms of the resource layer,
> EFS is pretty separated from it already (only connected
> by URI on the API). Having the Resources layer directly
> make asynchronous calls (instead of using the EFS wrapper)
> should be a very interesting experiment.
Well, no we haven't done this already, although we have done the reverse
(implement async ECF filetransfer on top of EFS+jobs). It might be a
useful exercise, but it seems to me like reusing more complete
replication approaches (e.g. Hadoop, etc.) for implementing EFS on top of
an asynchronous transport would be quicker and easier.
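For the record, the wrapper direction you describe is basically the same future trick inverted: park the synchronous caller on a latch until the transfer events arrive. A rough sketch, with hypothetical interfaces standing in for the real ECF filetransfer and EFS types:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class SyncOverAsync {
    // Hypothetical callback-style transfer API (stand-in for ECF filetransfer).
    interface TransferListener {
        void done(byte[] data);
        void failed(Exception e);
    }

    // Fires the callback from another thread, as a real transfer would.
    static void retrieveAsync(String uri, TransferListener listener) {
        new Thread(() -> listener.done(("data from " + uri).getBytes())).start();
    }

    // EFS-style synchronous wrapper: block the caller until the callback fires.
    static byte[] retrieveSync(String uri, long timeoutMs) {
        CountDownLatch latch = new CountDownLatch(1);
        AtomicReference<byte[]> result = new AtomicReference<>();
        AtomicReference<Exception> error = new AtomicReference<>();
        retrieveAsync(uri, new TransferListener() {
            public void done(byte[] data) { result.set(data); latch.countDown(); }
            public void failed(Exception e) { error.set(e); latch.countDown(); }
        });
        try {
            if (!latch.await(timeoutMs, TimeUnit.MILLISECONDS))
                throw new RuntimeException("transfer timed out");
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(ie);
        }
        if (error.get() != null) throw new RuntimeException(error.get());
        return result.get();
    }
}
```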
>> Well, if such an adapter is not available then they could do it
>> synchronously rather than asynchronously.
> But that's exactly my point: we don't want clients having
> to write code for both synchronous and asynchronous variants.
> That's code duplication, resulting in bloat. I'd like to
> shoot for ONE core e4 api for each concept (with additional
> compatibility layers for backward compatibility where needed).
Although I share your desire to reduce bloat, I'm not sure that having
either synchronous xor asynchronous access to resources (whether remote
or local) is the natural way to keep bloat to a minimum for access to
filesystems.
> By "adding async to the EFS API" I didn't think about any
> technical measure such as blowing up the IFileStore interface.
> What I meant was, that clients should be able to expect any
> contributed file system to be accessible with all the API
> that E4 resources FS exposes -- be it synchronous or
> asynchronous, via 1 or multiple interfaces, obtained via
> adapter pattern or otherwise.
It seems to me this is more a requirement on file system
implementers...i.e. that they implement all resources API (i.e. both
synchronous and asynchronous).
Although I think this is a good general principle (implementers should
implement entire relevant API), in practice I'm not sure how to require
it given a provider architecture (for EFS and for ECF). That is, I'm
sure that there will be incomplete EFS implementations, incomplete ECF
file transfer implementations, etc. Encouraging completeness will be
easy...requiring it will be hard I expect.
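One way to live with incomplete providers is exactly the adapter-style lookup you mention: ask the store for the asynchronous interface and fall back to the synchronous path when it isn't there. A rough sketch, with hypothetical stand-ins for IFileStore and an async capability interface (none of these are the real EFS types):

```java
import java.util.concurrent.CompletableFuture;

public class AdapterFallback {
    // Hypothetical stand-ins for IFileStore and an async capability interface.
    interface Store { <T> T getAdapter(Class<T> type); byte[] read(); }
    interface AsyncStore { CompletableFuture<byte[]> readAsync(); }

    // A provider that only implements the synchronous contract.
    static Store syncOnlyStore(byte[] contents) {
        return new Store() {
            public <T> T getAdapter(Class<T> type) { return null; } // no async support
            public byte[] read() { return contents; }
        };
    }

    // Client code: prefer the async capability, degrade gracefully to sync.
    static CompletableFuture<byte[]> fetch(Store store) {
        AsyncStore async = store.getAdapter(AsyncStore.class);
        if (async != null) return async.readAsync();
        // Wrap the blocking call so the caller still sees one (async) API.
        return CompletableFuture.supplyAsync(store::read);
    }
}
```

The nice property is that the client writes one code path regardless of how complete the provider is; completeness affects performance, not correctness.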
>> I disagree. I think the problem is with trying to make local
>> and remote access look exactly the same (network transparency)
> Hm... on the other hand, a client that is prepared to deal
> with remote files should easily be able to handle the local
> case as well, no? I'd like to investigate technical measures
> of how we can make it simple to program the remote case.
Yes, I agree that it should be easy to handle both the local and remote
cases...but that's the hard part...since the local and remote cases are
different...in performance, reliability, partial failure, etc., and as
'A Note on Distributed Computing' points out...these are differences
that are very hard to create a uniform API for...because the differences
in network and local behavior frequently 'bubble up' to the API.
But I do think that there is a lot of room for innovation...particularly
around replication/caching/synchronization for file systems (e.g. Hadoop).
> If the core framework is remote-aware we can add layers for
> simplified access if we want. We cannot do it the other way
> Can anybody argue against using the asynchronous ECF
> filetransfer APIs as the core E4 resources file system
Yes, I can (surprise :). I think introducing ECF/asynchronous for local
file system access would be a waste of time. Even though it would be
easily done (ECF's file transfer API has asynch access to the local file
system already), I don't think it would be worth doing.
Although I'm not sure what the best way to 'bridge' EFS and the ECF file
transfer APIs is (i.e. adapters, etc), I don't think it's really
necessary or desirable to strictly layer them. An example of this is
p2's usage of ECF...it only uses the file retrieval part of the ECF
filetransfer API (it has no use for upload, or directory navigation).
It's actually simpler and a better fit to just use that part (retrieval)
of ECF filetransfer...and not have to deal with other dependencies that
would be implied by including, say, all of EFS (with or without ECF
I understand (and fully appreciate) the desire to reduce API bloat (i.e.
client code duplication, multiple APIs, etc), but I'm not sure of the
best way to do that when it comes to synchronous/asynchronous (or
local/network rather) access to filesystems.
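Put another way, the p2 case is really an argument for small, capability-sized interfaces rather than strict layering: a client depends only on the slice it actually uses. Roughly (all names here are made up for illustration):

```java
public class NarrowInterfaces {
    // Capability-sized slices instead of one monolithic file system API.
    interface Retrieve { byte[] get(String uri); }
    interface Upload   { void put(String uri, byte[] data); }
    interface Browse   { String[] list(String uri); }

    // A download-only client (in the spirit of p2) needs just Retrieve,
    // and pulls in none of the upload/browse dependencies.
    static int sizeOf(Retrieve r, String uri) {
        return r.get(uri).length;
    }
}
```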