RE: [eclipse-incubator-e4-dev] [resources] EFS, ECF and asynchronous
Hi Kevin,

great points. I think you're hitting the nail on the head :-)

The points that I currently see with respect to sync vs. async are:

- In terms of programming model, I agree that we should allow doing
  simple things in a simple way. Thus allow clients to work
  synchronously.
- Doing that, I don't think we really sacrifice much. Because in
  Resources, I don't expect that we have hundreds or thousands of
  distinct concurrent queries... not much chance of coalescing
  independent queries... which means that we can afford a number of
  jobs running in parallel for synchronous access.
- That being said, some providers are "natively async" such as the ECF
  ones, while others are "natively synchronous". Similarly, for the
  clients some tasks may be "natively async" while others are "natively
  synchronous". It may be worthwhile allowing both sync and async
  variants at various layers of the API... not in order to win any
  performance or user experience, but just in order to allow
  providers/clients to work the way that they naturally would and thus
  avoid conversion loss.

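To make the "both variants" idea a bit more concrete, here is a minimal sketch of a generic bridge in each direction, written against a current JDK rather than any EFS or ECF API. The names SyncAsyncBridge, toAsync and toSync are made up for illustration only; a real provider would have to decide on threading, cancellation and progress reporting itself.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

/** Hypothetical helper, not part of EFS or ECF. */
public final class SyncAsyncBridge {
    private SyncAsyncBridge() {}

    /**
     * Lets a "natively async" client consume a blocking call:
     * the call runs on the given executor and the caller gets a future.
     */
    public static <T> CompletableFuture<T> toAsync(Supplier<T> blockingCall, Executor executor) {
        return CompletableFuture.supplyAsync(blockingCall, executor);
    }

    /**
     * Lets a "natively synchronous" client consume an async result:
     * the calling thread simply blocks until the result is available.
     */
    public static <T> T toSync(CompletableFuture<T> pending)
            throws InterruptedException, ExecutionException {
        return pending.get();
    }
}
```

Written once like this, the conversion lives in one place instead of being repeated in every client, which is the only point the sketch is meant to show.
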
I think that what really hurts us with large, slow workspaces today are
conceptual things more than sync vs. async FS API:

- Lack of support for a "lazy refresh" on portions of the workspace. I'm
  aware that a lazy refresh changes some workspace semantics, such as
  visitors who expect to walk the entire workspace. But do we really
  always need a deep refresh?
- Lack of Resource Filters to not ever look at things known to be not
  interesting.
- Lack of API for accessing portions of a huge file (editor support for
  virtual paging of huge files instead of just InputStream).
- Lack of API for "Remotifying" the WS on a high level, such that WS
  Visitors could run entirely on the remote
- Lack of multiple Refresh Jobs... e.g. when a large slow refresh job on
  /foo is pending, but I quickly need /foo/bar/baz refreshed in order to
  satisfy some UI query, I'd like to suspend the large Refresh Job,
  start a small one for the UI query, then resume the large one but
  avoid doing the small refresh yet again. I'm aware that such a feature
  is very tricky to get right and may be a slippery road.

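For the "portions of a huge file" item, the following sketch shows what paged access can look like with the JDK's own FileChannel on a local file. It is only an illustration of the access pattern: FilePager and readPage are hypothetical names, and nothing equivalent is claimed to exist in EFS.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper illustrating page-wise reads; not an EFS API. */
public final class FilePager {
    private FilePager() {}

    /**
     * Reads up to pageSize bytes starting at offset pageIndex * pageSize.
     * Returns fewer bytes for the last page and an empty array past end of file.
     */
    public static byte[] readPage(Path file, long pageIndex, int pageSize) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(pageSize);
            long start = pageIndex * pageSize;
            // A positional read may deliver fewer bytes than requested,
            // so keep reading until the page is full or end of file is hit.
            while (buffer.hasRemaining()) {
                int n = channel.read(buffer, start + buffer.position());
                if (n < 0) {
                    break;
                }
            }
            buffer.flip();
            byte[] page = new byte[buffer.remaining()];
            buffer.get(page);
            return page;
        }
    }
}
```

An editor built on something like this would only ever hold the pages currently in view, instead of pulling the whole file through an InputStream.
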
Cheers,
--
Martin Oberhuber, Senior Member of Technical Staff, Wind River
Target Management Project Lead, DSDP PMC Member

Just getting caught up on this thread, great discussion.

Some time ago I made the statement that maybe we should assume that all
data is remote/slow, and too large to bring local, then determine the
right resource model to support it. The notion would be that if in fact
I was local and fast, then it's a bonus, since naively things that are
written kindly towards being slow just work better if it's in fact fast.

But I realize now from this discussion that this misses important
realities:

1) The programming model for working async is much harder.

Meanwhile, one of our stated goals for e4 is to make programming in
Eclipse easier. I'd love for us to better handle remote and big resource
sets, but I'm not sure I want to sacrifice anything in the typical local
programming to do so. Thus I want to program sync for fast things and
async for slow things. Unfortunately that's two APIs, two slightly
different programming models, and ignores the fact that I (the
programmer) might not be able to guess if it's the slow or fast
performing case (the example mentioned of a network share is a good
one).

2) The UI is different.

Right now we try to do tricks for jobs/progress to try to optimize for
short jobs (e.g. delay showing the monitor), since there's nothing more
distracting than progress dialogs that appear and disappear. We're
really trying to cheat and provide two UI experiences, one for fast
cases which happen to be wrapped in jobs, and one for the real
slow/async ones. But we fail in a different way, since a delay before
the appearance of a progress monitor can be disconcerting and provide
the false impression that the system is sluggish. Thus ideally you'd get
the right UI from the start, not based on the type of task, but rather
on its real performance characteristics (Step 1: build time machine,
Step 2: time the operation, Step 3: go back in time and choose which UI
to expose).

Inherently there's the question of whether to allow people to do other
tasks while the initial task is completing. Often this is in the nature
of the task. Let's say I'm drag/dropping a file. From a user task point
of view, this is a synchronous and continuous task. If querying the drop
targets were, for the sake of argument, async, that doesn't help me,
since the operation must continue to have the illusion of being
synchronous, otherwise bad things will happen (imagine a progress dialog
showing up, how odd). Ideally though we'd like to have a reasonable
timeout, so that if for some reason the file system isn't responsive,
the UI doesn't remain hung forever.

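The "illusion of being synchronous, but with a timeout" idea could be sketched roughly as follows, using only the JDK. BoundedWait and getOrFallback are hypothetical names picked for this example, and the future stands in for whatever async query (e.g. for drop targets) a real implementation would issue.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: wait synchronously, but never longer than a deadline. */
public final class BoundedWait {
    private BoundedWait() {}

    /**
     * Blocks for at most timeoutMillis. If the result does not arrive in time
     * (or the operation fails), the fallback value is returned instead.
     */
    public static <T> T getOrFallback(CompletableFuture<T> pending, long timeoutMillis, T fallback) {
        try {
            return pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Marks the future as cancelled; this does not by itself
            // interrupt whatever thread is still doing the underlying I/O.
            pending.cancel(true);
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            pending.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback;
        }
    }
}
```

The caller still writes straight-line code, and the worst case is bounded by the timeout rather than by the responsiveness of the file system.
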
Regards,
Kevin

Scott Lewis <slewis@xxxxxxxxxxxxx>
Sent by: eclipse-incubator-e4-dev-bounces@xxxxxxxxxxx
10/22/2008 07:16 PM
Please respond to: E4 developer list <eclipse-incubator-e4-dev@xxxxxxxxxxx>
To: E4 developer list <eclipse-incubator-e4-dev@xxxxxxxxxxx>
cc:
Subject: Re: [eclipse-incubator-e4-dev] [resources] EFS, ECF and asynchronous

Hi Martin,

I agree with your examples below.

RE: proper programming patterns...I think this is *the* hard thing in
terms of API design. That is, completely valid assumptions in a local
world (that a directory browse access won't block for multiple seconds
and block the entire UI) are easily and frequently violated in the
network world (e.g. because NFS blocks frequently and doesn't handle
remote failure very robustly). This makes it extremely hard to define
APIs that aren't based upon the 'worst case'.

'Well-behaved' programmers could protect every potentially blocking i/o
method by using threads/jobs, but that would make it very cumbersome to
use, and be wasted effort (and OS resources) for the common case (local
disk access).

You are right that asynchronous APIs force the client to do processing
and not wait (unless the programmer explicitly builds in such a wait).
That frequently makes them harder to use (because when a result is
required doing all that listening for callbacks and explicit waiting is
a pain).

Scott

Oberhuber, Martin wrote:
> Hi Michael,
>
> to me, the difference between sync and async is not so much
> about speed or the number of Threads anymore - it's about
> enforcing proper programming patterns. That's something I
> actually learned during this discussion.
>
> Some examples:
>
> * Open a Directory Browse Dialog that happens to be initialized
>   with the URI remote://foohost/bar/baz and foohost happens not
>   to be online. All UI is blocked, you cannot even cancel the
>   request.
>
> * This can even happen with a LOCAL file system, I've seen this
>   repeatedly: My UNIX homedir is shared via SMB to my Windows
>   machine. In my UNIX home I have some symbolic links that point
>   to other NFS-shared folders from machines that are offline.
>   Just opening a directory browse dialog takes like forever
>   (even on Windows Explorer!)
>
> * Double-click a large file foo.txt which is stored on a local SMB
>   share, to load it into the editor. While loading the file,
>   your network cable gets unplugged for some reason. Depending
>   on how the editor loading is implemented, all of Eclipse may
>   hang.
>
> * How often have you seen an Eclipse Progress Monitor like
>   "Waiting for Refresh Job to complete..." ?
>   Is it really necessary that the Refresh Job locks the workspace
>   for writing? Or could we allow more concurrency here?
>
> Yes, of course you can defer all synchronous queries into Jobs
> with Progress etc... but do we actually do that? Not always.
> And rightly so, because the hassle of creating a Job to make
> the synchronous API happy is likely more than dealing with an
> async API right away.
>
> Asynchronous APIs just *force* the client to do something useful
> until the response of the request comes in. Where "something
> useful" could be just as simple as allowing a user to press
> CANCEL.
>
> As an end user, I'm OK with waiting if I know I must wait. But
> I'd like to cancel operations that I believe won't return anyways,
> and I'd like to do other stuff in parallel until my request
> completes.
>
> Cheers,
> --
> Martin Oberhuber, Senior Member of Technical Staff, Wind River
> Target Management Project Lead, DSDP PMC Member
> http://www.eclipse.org/dsdp/tm
>
>> -----Original Message-----
>> From: eclipse-incubator-e4-dev-bounces@xxxxxxxxxxx
>> [mailto:eclipse-incubator-e4-dev-bounces@xxxxxxxxxxx] On
>> Behalf Of Michael Scharf
>> Sent: Wednesday, October 22, 2008 9:14 AM
>> To: E4 developer list
>> Subject: Re: [eclipse-incubator-e4-dev] [resources] EFS, ECF
>> and asynchronous
>>
>> When it comes to sync versus async at the EFS level, there
>> is something I don't understand (probably because I don't
>> know all the details of the APIs): I thought that IResource
>> is a kind of snapshot of the underlying EFS structure. If I
>> don't synchronize my workspace then IResource might show
>> me a structure that is not consistent with the file system.
>> Eclipse can deal with that. It happens often to me that
>> I open a file that does not exist anymore because I
>> forget to synchronize a directory that I have changed
>> externally.
>>
>> The synchronization is already a process that can take long
>> (and it does with some huge workspaces I have). So, where/when
>> is fast (synchronous) access to EFS needed/used/expected?
>>
>> I think a user that deals with a remote workspace is able to
>> understand that things cannot go as fast as on a local file
>> system. She might understand that caching is involved. And that
>> an update (of the cache) takes time. I would not hide this.
>> So, what are the cases/workflows where asynchronous access to
>> EFS is important if a local cache is involved?
>>
>> Michael
>>
>>> Hi Scott,
>>>
>>>> 2) Asynchronous access to files/resources is desirable and in
>>>> some cases necessary (for some use cases)
>>>
>>> Could you cite a use case where async access is necessary?
>>>
>>> I think that (assuming all synchronous methods have progress
>>> monitors for cancellation, which is the case in EFS), the
>>> only difference between sync and async access is
>>> (1) the number of Threads in "wait" state,
>>> (2) locking of resources while Threads synchronously wait,
>>> (3) potential for coalescing multiple requests to the
>>>     same item in the case of asynchronous queries.
>>>
>>> In the asynchronous case, no Threads are waiting and resources
>>> *may* be unlocked until the callback returns, but this unlocking
>>> of resources needs to be carefully considered in each case.
>>> Does the system always remain in a consistent state? RESTful
>>> systems ensure this by placing all state info right into the
>>> request, which is a great idea but likely not always possible.
>>> It's not only a matter of the API being complex or not. The fact
>>> is that the concept of being asynchronous as such is more flexible,
>>> but also requires adopters to be more careful, or at least think
>>> along different lines.
>>>
>>> I also think that we should look into the need for being
>>> asynchronous or not separately for the kinds of requests:
>>> (A) Directory retrieval (aka childNames())
>>> (B) Full file tree retrieval
>>> (C) Status/Attribute retrieval for an individual file
>>> (D) File contents retrieval
>>>
>>> For (D) we already use Streams in EFS, which can be
>>> implemented in an asynchronous manner. What's currently
>>> missing in EFS is the ability to perform random access,
>>> like the JSR 203 SeekableByteChannel [1]. Interestingly, nio2
>>> has both a synchronous FileChannel [2] and
>>> AsynchronousFileChannel [3].
>>>
>>> For (A), (B), (C) I'm not sure how much we would win from
>>> an asynchronous variant, since I'd assume that not much
>>> work could be done (and not many resources freed) while
>>> asynchronously waiting for their result anyways. But perhaps
>>> I'm wrong?
>>>
>>>> 3) Using (e.g.) adapters it's not necessary to force such an API on
>>>> anyone (rather it can be available when needed)
>>>
>>> Hm... so, let's assume that client X wants to do something
>>> asynchronous. So it does
>>>    myFileStore.getAdapter(IAsyncFileStore.class);
>>> some file systems would provide that adapter, others not.
>>> What's the client's fallback strategy in case the async
>>> adapter is not available?
>>>
>>> I'm afraid that if we use such adapters, we end up with the
>>> same code in clients again and again, because they need some
>>> fallback strategy. It seems wiser to place the fallback
>>> strategy right into the EFS provider, since it is always
>>> possible to write a bridge between a synchronous and an
>>> asynchronous API in a single, generic way.
>>>
>>> Therefore, I'm more in favor of determining what APIs we want
>>> to be asynchronous, and just adding them to EFS. The adapter
>>> idea could be used for adding provisional API, but the final
>>> API should not need that.
>>>
>>>>> To that extent, let's start assuming that files are quick and local. And
>>>>> let's investigate how we could leverage ECF to support remote file
>>>>> systems. If that doesn't meet our needs, we can always add async later.
>>>
>>> I'm not sure if this is a good strategy. It seems to lead
>>> towards more and more separation of local vs. remote --
>>> which, I think, leads to either duplication of code in the
>>> end, or non-uniform workflows for end users.
>>>
>>> Let me draw some scenario of what the world could look like
>>> in 10 years: with the Internet getting more and more into
>>> our lives, you'd want to use an Eclipse based product to
>>> dive into some code base that you just found on the net.
>>> Without downloading everything in advance. Or you browse into
>>> some mp3 music store. Add some remotely hosted Open Source
>>> Library to your UML drawing just by drag and drop.
>>>
>>> I think that users will more and more want to operate on
>>> remote networked resources just the same as on local
>>> resources. E4 gives us the chance to try and come up with
>>> models that support such workflows in a uniform way. Let's
>>> not throw away that chance prematurely.
>>>
>>> I agree that we need to start on concrete work items
>>> rather than endlessly discussing concepts. But as we
>>> start on these work items, let's keep the concept that
>>> things may be remote in our minds.
>>>
>>>> Sounds reasonable. Just as an aside: I think there's a lot
>>>> of potential to use asynchronous file transfer + replication
>>>> to do caching of remote resources.
>>>
>>> That's a great approach, especially if it works on the
>>> file block level (such that random access to huge remote
>>> files can be cached). Again, one thing that's missing from EFS
>>> today is random access to files. Does ECF have it?
>>>
>>> [1] http://openjdk.java.net/projects/nio/javadoc/java/nio/channels/SeekableByteChannel.html
>>> [2] http://openjdk.java.net/projects/nio/javadoc/java/nio/channels/FileChannel.html
>>> [3] http://openjdk.java.net/projects/nio/javadoc/java/nio/channels/AsynchronousFileChannel.html
>>>
>>> Cheers,
>>> --
>>> Martin Oberhuber, Senior Member of Technical Staff, Wind River
>>> Target Management Project Lead, DSDP PMC Member
>>> http://www.eclipse.org/dsdp/tm
>>> _______________________________________________
>>> eclipse-incubator-e4-dev mailing list
>>> eclipse-incubator-e4-dev@xxxxxxxxxxx
>>> https://dev.eclipse.org/mailman/listinfo/eclipse-incubator-e4-dev