Re: [eclipse-incubator-e4-dev] [resources] Asynchronous APIs for EFS
Hi Martin,
Interesting points...and good discussion to have, I believe. Some of my
thoughts/comments below.
Oberhuber, Martin wrote:
Hi all,
I thought a little bit about what it means to have both synchronous
and asynchronous APIs at the file system layer. Some questions come up:
* What does it mean for clients: does each client need to be aware
of both API variants ? How do clients pick any variant? It seems
like if we offer dual sync/async natures, that duplicate concept
would bubble up through all our architecture, which does not
seem desirable.
I think it's debatable whether it's desirable...I do see your point that
one API is always better than two (i.e. less complexity, fewer client
choices, etc).
But I think the evidence does show that both synchronous and
asynchronous APIs for IO (in particular) are useful, and in some cases
necessary. For example, Java's new IO (NIO) is an asynchronous IO API
that is (IMHO):
1) Harder to use than a blocking/synchronous API (i.e. using the normal
java io/stream classes)
2) Useful/necessary for some API clients (e.g. those that require more
scalability)
* What is the granularity of being synchronous / asynchronous? Can
a provider choose returning synchronously or asynchronously with
each call, or does it need to pick one strategy once and for all?
This is a very interesting question...which I don't have a ready answer
for :). I think the most I can say is that clients that 'know' their
scalability, performance, and reliability requirements for the
communications layer should be able to use one or the other (sync
and/or async) as appropriate to their application. I do think that
'hiding' one behind the other at the file system layer ultimately
creates very hard issues of performance (e.g. running Eclipse over
EFS-ftp), and/or reliability (e.g. having asynchronous messaging with no
failure detection). In some ways this is related to the issue of
'transparency' in networked applications...i.e. whether the network's
characteristics (e.g. slower by orders of magnitude, much more likely to
partially fail than a local file system) can/should be 'hidden' behind a
single API that allows clients to use the same calls whether the
file system is local (File) or over the network (FTP, etc).
So by personal disposition I would be inclined to allow call-level
decisions about synchronous vs asynchronous patterns. For example,
ECF's IRemoteService interface allows several 'styles' of invocation of a
remote service:
http://www.eclipse.org/ecf/org.eclipse.ecf.docs/api/org/eclipse/ecf/remoteservice/IRemoteService.html
This does mean more complexity/decisions for clients (i.e. it can be a
proxy, but it doesn't *have* to be), but it does add a layer of
flexibility in use of a remote API (e.g. with AsyncResults...i.e.
'Futures').
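
As an illustration of a call-level choice between the two styles, here
is a minimal sketch. The names NameLookup and FakeRemoteLookup are
hypothetical and are not part of ECF or EFS; the standard
java.util.concurrent.CompletableFuture stands in for the 'Future' that
an asynchronous call would hand back.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical interface (not ECF or EFS): one operation, two call styles.
interface NameLookup {
    // Blocking style: returns only when the result is available.
    String[] childNames(String path);

    // Non-blocking style: returns immediately with a future for the result.
    CompletableFuture<String[]> childNamesAsync(String path);
}

// Toy in-memory implementation standing in for a slow remote file system.
class FakeRemoteLookup implements NameLookup {
    @Override
    public String[] childNames(String path) {
        return new String[] { path + "/a.txt", path + "/b.txt" };
    }

    @Override
    public CompletableFuture<String[]> childNamesAsync(String path) {
        // Runs the same work on a background thread (the common pool).
        return CompletableFuture.supplyAsync(() -> childNames(path));
    }
}
```

A client that needs the answer right away calls childNames(); a client
that must not block calls childNamesAsync() and attaches a callback
(for example with thenAccept) or waits later with join().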
It's always possible to write a bridge that lets an asynchronous API
drive synchronous providers, or the other way round. But the
benefit of being synchronous or asynchronous in a particular situation
can only be leveraged if it bubbles up right into the application layer!
True, I agree.
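
To make the bridging point concrete, here is a rough sketch of both
directions. The Bridges class and its method names are made up for
illustration. Note that neither bridge recovers the benefit Martin
describes: toAsync still occupies a thread for the duration of the
blocking call, and toSync still parks the caller until the future
completes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Hypothetical helper class; not part of EFS or ECF.
class Bridges {
    // Expose a blocking provider through an asynchronous signature
    // by running each call on the given executor.
    static <A, R> Function<A, CompletableFuture<R>> toAsync(
            Function<A, R> blocking, Executor executor) {
        return arg -> CompletableFuture.supplyAsync(() -> blocking.apply(arg), executor);
    }

    // Expose an asynchronous provider through a blocking signature
    // by waiting for the returned future to complete.
    static <A, R> Function<A, R> toSync(Function<A, CompletableFuture<R>> async) {
        return arg -> async.apply(arg).join();
    }
}
```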
* Asynchronous APIs add considerable overhead for fast queries
(i.e. have at least one Thread switch for the callback, even if
the result is available immediately); also, in terms of system
consistency and resource locking, what kinds of locks can really
be given up while an asynchronous request is pending but before
its result is in?
Good questions...but the use case you describe (i.e. fast/local queries)
smells like a good use case for synchronous invocation. And further,
thread context switching is almost always of very low cost relative to
actual inter-process communication (e.g. serialization, transmission/io
over wire, etc).
* Synchronous APIs add considerable overhead for slow queries
(i.e. explosion in the number of Threads in wait state, thus
locking Resources).
Agreed.
To be more concrete, let's look at the current primary EFS methods in
IFileStore that can be slow on a remote FS:
* Information Retrieval
    o String[] childNames(int options, IProgressMonitor);
      // and its relatives: childInfos(), childStores()
    o IFileInfo fetchInfo(int options, IProgressMonitor);
* Manipulation
    o void copy(IFileStore destination, int options, IProgressMonitor);
    o void delete(int options, IProgressMonitor);
    o void mkdir(int options, IProgressMonitor);
    o void move(IFileStore destination, int options, IProgressMonitor);
    o void putInfo(IFileInfo, int options, IProgressMonitor);
It certainly makes sense to have asynchronous variants of these, if
the provider is inherently asynchronous (like the ECF filetransfer
API). But how high would we allow this to bubble up, how would we
treat requests on the synchronous API if the provider is asynchronous
or vice versa?
A good question...though I believe the answer is application-specific
(even in the context of 'app' as 'Eclipse plugin/bundle/tool/RCP
app, etc'). Which is why I do think having both sync and async variants
is useful, rather than trying to say to clients 'this one API (sync or
async) is your access to the file system' (remote and/or local). I
agree the simplicity of such a file system model is very appealing, but
I'm not convinced it is effective for all applications (i.e. some
applications require some other things).
Of course, with both sync and async APIs, the respective advantages and
disadvantages of both approaches (as Martin begins to lay out above),
such as performance/blocking behavior, OS-level resource usage, and
locking/synchronization requirements, need to be specified as
clearly as possible in the API and implementations, so that clients can
make informed choices about which approaches make sense in any given
situation.
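
One way such a specification might look is sketched below, with the
blocking and failure behavior of each variant spelled out in the
contract. DualStore and FailingStore are hypothetical and much simpler
than the real IFileStore (whose methods also take options and an
IProgressMonitor); the point is only that the async variant reports
failure through the future rather than silently dropping it.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical, simplified store; NOT the real EFS IFileStore.
interface DualStore {
    /** Blocks the calling thread until done; failure surfaces as IOException. */
    void mkdir() throws IOException;

    /** Returns immediately; failure completes the future exceptionally. */
    CompletableFuture<Void> mkdirAsync();
}

// Toy store that always fails, to show where the error ends up.
class FailingStore implements DualStore {
    @Override
    public void mkdir() throws IOException {
        throw new IOException("connection lost");
    }

    @Override
    public CompletableFuture<Void> mkdirAsync() {
        return CompletableFuture.runAsync(() -> {
            try {
                mkdir();
            } catch (IOException e) {
                // Wrap the checked exception so the future carries it.
                throw new CompletionException(e);
            }
        });
    }
}
```

A caller of mkdirAsync() can then detect the failure with handle() or
exceptionally() on the returned future.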
One other thing to mention...one approach that we've (ECF) found
helpful for the creation of model-based editors is replication. That
is, for some use cases (when read access to a model must be very
fast, e.g. for rendering, but write access can reasonably be slower) it
makes sense to replicate the state of the model, and then use async
messaging plus a synchronization approach...e.g. optimistic, pessimistic
[three-phase commit, etc], or something in between [e.g. ECF's cola for
real-time shared editing]. Then (frequent) read accesses to the model
will be very fast (accessing local replication), and write accesses can
be non-blocking (async). We've begun work on an ECF 'synchronization
strategy' API, that for documents provides a way to synchronize/resolve
conflicting local and/or remote changes that are delivered
asynchronously. It uses ECF's cola algorithm (which is based upon a
concept called 'operational transforms') to resolve conflicting local
and/or remote changes. If people are interested in the development of
this API please see:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=234142
Note for the moment it's focused on resolving changes in documents (i.e.
Strings), but the operational transform notion can be extended to other
model forms. We just haven't done so (yet).
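
The read-local/write-async shape of the replication approach can be
sketched roughly as follows. This is not ECF's synchronization strategy
API and it does no conflict resolution at all (no operational
transforms); ReplicatedText and its 'sender' transport are hypothetical
names used only to show that reads hit the local replica while writes
return without waiting for the remote side.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical replicated document; conflict resolution is deliberately omitted.
class ReplicatedText {
    private final StringBuilder local = new StringBuilder();
    // Assumed async transport: delivers a change to the remote replica(s).
    private final Function<String, CompletableFuture<Void>> sender;

    ReplicatedText(Function<String, CompletableFuture<Void>> sender) {
        this.sender = sender;
    }

    // Fast read: served entirely from the local replica.
    synchronized String read() {
        return local.toString();
    }

    // Non-blocking write: apply locally, then send without waiting.
    CompletableFuture<Void> append(String text) {
        synchronized (this) {
            local.append(text);
        }
        return sender.apply(text);
    }

    // Called when a change from another replica arrives.
    synchronized void applyRemote(String text) {
        local.append(text);
    }
}
```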
My $0.03.
Scott