Re: [eclipse-incubator-e4-dev] [resources] Asynchronous APIs for EFS

Hi Martin,

Interesting points...and good discussion to have, I believe. Some of my thoughts/comments below.

Oberhuber, Martin wrote:
Hi all,
I thought a little bit about what it means to have both synchronous and asynchronous APIs at the file system layer. Some questions come up:

    * What does it mean for clients: does each client need to be aware
      of both API variants ? How do clients pick any variant? It seems
      like if we offer dual sync/async natures, that duplicate concept
      would bubble up through all our architecture, which does not
      seem desirable.


I think it's debatable whether it's desirable...I do see your point that one API is always better than two (i.e. less complexity, fewer client choices, etc).

But I think the evidence does show that both synchronous and asynchronous APIs for IO (in particular) are useful, and in some cases necessary. For example, Java's new IO (NIO) is an asynchronous (non-blocking) IO API that is (IMHO):

1) Harder to use than a blocking/synchronous API (i.e. using the normal Java io/stream classes)
2) Useful/necessary for some API clients (e.g. those that require more scalability)

    * What is the granularity of being synchronous / asynchronous? Can
      a provider choose returning synchronously or asynchronously with
      each call, or does it need to pick one strategy once and for all?


This is a very interesting question, which I don't have a ready answer for :). The most I can say is that clients that 'know' what scalability, performance, and reliability they require of the communications layer should be able to use one or the other (sync and/or async) as appropriate to their application. I do think that 'hiding' one behind the other at the file system layer ultimately creates very hard performance issues (e.g. running Eclipse over EFS-ftp) and/or reliability issues (e.g. asynchronous messaging with no failure detection). In some ways this is related to the issue of 'transparency' in networked applications, i.e. whether the network's characteristics (e.g. slower by orders of magnitude, much more likely to partially fail than a local file system) can/should be 'hidden' behind a single API that lets clients use the same calls whether the file system is local (File) or over the network (FTP, etc).

So by personal disposition I would be inclined to allow call-level decisions about synchronous vs asynchronous patterns. For example, ECF's IRemoteService interface allows several 'styles' of invocation of a remote service:

http://www.eclipse.org/ecf/org.eclipse.ecf.docs/api/org/eclipse/ecf/remoteservice/IRemoteService.html

This does mean more complexity/decisions for clients (i.e. it can be a proxy, but it doesn't *have* to be), but it does add a layer of flexibility in the use of a remote API (e.g. with AsynchResults, i.e. 'Futures').
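
To make the idea of call-level 'styles' concrete, here is a minimal sketch in plain Java. It is not ECF's API: the class name CallStyles, its method names, and the use of CompletableFuture (a much later JDK addition) are all assumptions chosen for illustration. The point is only that one remote operation can be exposed in blocking, future-returning, listener-based, and fire-and-forget forms, and the client picks one per call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical illustration only: this is NOT ECF's IRemoteService API.
final class CallStyles<T> {
    private final Supplier<T> remoteOperation; // stands in for a slow remote call
    private final Executor executor;           // where asynchronous calls run

    CallStyles(Supplier<T> remoteOperation, Executor executor) {
        this.remoteOperation = remoteOperation;
        this.executor = executor;
    }

    // Style 1: block the calling thread until the result is available.
    T callSync() {
        return remoteOperation.get();
    }

    // Style 2: return at once with a future the caller can poll or wait on.
    CompletableFuture<T> callAsync() {
        return CompletableFuture.supplyAsync(remoteOperation, executor);
    }

    // Style 3: return at once and hand the result to a listener later.
    void callAsync(Consumer<T> listener) {
        callAsync().thenAccept(listener);
    }

    // Style 4: fire and forget; the result (if any) is discarded.
    void fireAsync() {
        executor.execute(remoteOperation::get);
    }
}
```

A client that needs the answer before it can continue would use callSync(), while a UI thread that must not block would use one of the asynchronous forms.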

It's always possible to write a bridge for an asynchronous API to drive synchronous providers, or the other way round. But the benefit of being synchronous or asynchronous in a particular situation can only be leveraged if it bubbles up right into the application layer!

True, I agree.
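
The two bridge directions mentioned above can be sketched roughly as follows. This is a generic Java illustration, not EFS or ECF code; SyncAsyncBridge, toAsync, and toSync are hypothetical names. It also shows why a bridge alone does not deliver the benefit: the async facade still ties up a worker thread for the duration of the blocking call, and the sync facade still parks the caller.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper; the names are assumptions for illustration.
final class SyncAsyncBridge {
    private SyncAsyncBridge() {}

    // Asynchronous facade over a synchronous provider: the blocking call is
    // moved to the executor, which still occupies one of its threads.
    static <T> CompletableFuture<T> toAsync(Supplier<T> syncCall, Executor executor) {
        return CompletableFuture.supplyAsync(syncCall, executor);
    }

    // Synchronous facade over an asynchronous provider: the calling thread
    // waits until the result arrives or the timeout expires.
    static <T> T toSync(Supplier<CompletableFuture<T>> asyncCall, long timeout, TimeUnit unit) {
        try {
            return asyncCall.get().get(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("asynchronous call failed or timed out", e);
        }
    }
}
```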

    * Asynchronous APIs add considerable overhead for fast queries
      (i.e. have at least one Thread switch for the callback, even if
      the result is available immediately); also, in terms of system
      consistency and resource locking, what kinds of locks can really
      be given up while an asynchronous request is pending but before
      its result is in?


Good questions...but the use case you describe (i.e. fast/local queries) smells like a good use case for synchronous invocation. And further, thread context switching is almost always of very low cost relative to actual inter-process communication (e.g. serialization, transmission/io over wire, etc).
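
One way to picture a provider that decides per call is the sketch below. It is purely illustrative: PerCallInfoStore and fetchInfoAsync are hypothetical names (fetchInfoAsync is not an EFS method), and a String stands in for IFileInfo. When the answer is already known locally, an already-completed future is returned on the caller's thread, so no thread switch occurs; only a cache miss goes through the executor.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Hypothetical illustration; not part of EFS.
final class PerCallInfoStore {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> slowFetch; // stands in for a remote query
    private final Executor executor;

    PerCallInfoStore(Function<String, String> slowFetch, Executor executor) {
        this.slowFetch = slowFetch;
        this.executor = executor;
    }

    CompletableFuture<String> fetchInfoAsync(String path) {
        String cached = cache.get(path);
        if (cached != null) {
            // Fast path: answer immediately, no hand-off to another thread.
            return CompletableFuture.completedFuture(cached);
        }
        // Slow path: run the remote query on the executor and remember the result.
        return CompletableFuture
                .supplyAsync(() -> slowFetch.apply(path), executor)
                .thenApply(info -> {
                    cache.put(path, info);
                    return info;
                });
    }
}
```

This keeps a single return type for clients, at the price that they must always be prepared for the asynchronous case.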

    * Synchronous APIs add considerable overhead for slow queries
      (i.e. explosion in the number of Threads in wait state, thus
      locking Resources).


Agreed.

To be more concrete, let's look at the current primary EFS methods in IFileStore that can be slow on a remote FS:

    * Information Retrieval
          o String[] childNames(int options, IProgressMonitor);   //
            and its relatives: childInfos(), childStores()
          o IFileInfo fetchInfo(int options, IProgressMonitor);
    * Manipulation
          o void copy(IFileStore destination, int options,
            IProgressMonitor);
          o void delete(int options, IProgressMonitor);
          o void mkdir(int options, IProgressMonitor);
          o void move(IFileStore destination, int options,
            IProgressMonitor);
          o void putInfo(IFileInfo, int options, IProgressMonitor);

It certainly makes sense to have asynchronous variants of these, if the provider is inherently asynchronous (like the ECF filetransfer API). But how high would we allow this to bubble up, and how would we treat requests on the synchronous API if the provider is asynchronous, or vice versa?

A good question...though I believe the answer is application-specific (even in the context of 'app' as 'Eclipse plugin/bundle/tool/RCP app, etc'). Which is why I do think having both sync and async variants is useful, rather than trying to say to clients 'this one API (sync or async) is your access to the file system' (remote and/or local). I agree the simplicity of such a file system model is very appealing, but I'm not convinced it is effective for all applications (i.e. some applications require some other things).

Of course, with both sync and async APIs, the respective advantages and disadvantages of both approaches (as Martin begins to lay out above: performance/blocking behavior, OS-level resource usage, locking/synchronization requirements, etc) need to be specified as clearly as possible in the API and implementations, so that clients can make informed choices about which approach makes sense in any given situation.

One other thing to mention: one approach that we (ECF) have found helpful for the creation of model-based editors is replication. That is, for some use cases (when read access to a model must be very fast, e.g. for rendering, but write access can reasonably be slower) it makes sense to replicate the state of the model, and then use async messaging plus a synchronization approach, e.g. optimistic, pessimistic [three-phase commit, etc], or something in between [e.g. ECF's cola for real-time shared editing]. Then (frequent) read accesses to the model will be very fast (accessing the local replica), and write accesses can be non-blocking (async). We've begun work on an ECF 'synchronization strategy' API that, for documents, provides a way to synchronize/resolve conflicting local and/or remote changes that are delivered asynchronously. It uses ECF's cola algorithm (which is based upon a concept called 'operational transforms'). If people are interested in the development of this API please see:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=234142

Note for the moment it's focused on resolving changes in documents (i.e. Strings), but the operational transform notion can be extended to other model forms. We just haven't done so (yet).
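
For readers unfamiliar with operational transforms, the following is a deliberately tiny sketch of the idea for concurrent insertions into a String. It is not ECF's cola implementation; InsertOp, transformAgainst, and the siteId tie-breaker are assumptions made for illustration, and deletions are not handled. Each site applies its own change immediately, then adjusts the remote change it receives later so that both replicas end up with the same text.

```java
// Hypothetical minimal operation type; not taken from ECF.
final class InsertOp {
    final int position; // index in the document at which text is inserted
    final String text;
    final int siteId;   // breaks ties when two sites insert at the same index

    InsertOp(int position, String text, int siteId) {
        this.position = position;
        this.text = text;
        this.siteId = siteId;
    }

    String applyTo(String document) {
        return document.substring(0, position) + text + document.substring(position);
    }

    // Returns a version of this operation that is valid on a document to
    // which 'other' (a concurrent insert) has already been applied.
    InsertOp transformAgainst(InsertOp other) {
        boolean otherGoesFirst = other.position < position
                || (other.position == position && other.siteId < siteId);
        if (otherGoesFirst) {
            // The other insert shifted our target index to the right.
            return new InsertOp(position + other.text.length(), text, siteId);
        }
        return this;
    }
}
```

With the document "abc", site 1 inserting "X" at index 1 and site 2 inserting "YY" at index 2, each site applies its local operation first and the transformed remote operation second; both arrive at "aXbYYc".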

My $0.03.

Scott




