Re: [ecf-dev] E-intro [Was Efficient downloads]
Hi Filip,
Filip Hrbek wrote:
Hi Scott, comments inside.
- resume from a different location (e.g. different mirror)
Hmm. Don't know how you are going to accomplish that without
something quite different from normal http, but sounds interesting.
I'm not sure which protocols we will be able to implement this for. To do
this, we must be able to start downloading at a particular offset and
finally check the file's consistency, e.g. using a digest file if
available. We also need a list of mirrors containing the same artifact
(let's assume we've obtained it somewhere). This should be possible
with http.
There could be an API supporting this feature.
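A rough sketch of the http side of this, assuming plain java.net.HttpURLConnection and a standard HTTP/1.1 Range request of the form "bytes=<offset>-". The class and method names (ResumeSketch, resumeFrom, matchesDigest) are hypothetical, not ECF API. The mirror has to answer 206 Partial Content; a 200 means it ignored the range and would resend the file from byte 0, so the sketch refuses to append in that case. Consistency is checked afterwards by hashing the complete file and comparing with a hex digest such as the one in a published .md5 or .sha1 file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ResumeSketch {

    /** HTTP/1.1 Range header value requesting everything from offset to the end. */
    static String rangeHeader(long offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be >= 0");
        }
        return "bytes=" + offset + "-";
    }

    /**
     * Appends the rest of the file, starting at offset, fetched from a (possibly
     * different) mirror. Fails if the mirror ignores the Range header.
     */
    static void resumeFrom(URL mirror, long offset, OutputStream partialFile) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) mirror.openConnection();
        conn.setRequestProperty("Range", rangeHeader(offset));
        try {
            // 206 Partial Content is the only safe answer; 200 would restart at byte 0.
            if (conn.getResponseCode() != HttpURLConnection.HTTP_PARTIAL) {
                throw new IOException("Mirror ignored byte range: " + mirror);
            }
            try (InputStream in = conn.getInputStream()) {
                in.transferTo(partialFile);
            }
        } finally {
            conn.disconnect();
        }
    }

    /** Lowercase hex digest of the complete file contents. */
    static String hexDigest(byte[] data, String algorithm) throws NoSuchAlgorithmException {
        StringBuilder sb = new StringBuilder();
        for (byte b : MessageDigest.getInstance(algorithm).digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Compares against a digest string as found in a published digest file. */
    static boolean matchesDigest(byte[] data, String algorithm, String expectedHex)
            throws NoSuchAlgorithmException {
        return hexDigest(data, algorithm).equalsIgnoreCase(expectedHex.trim());
    }
}
```

For large files one would update the digest while streaming (e.g. via a DigestInputStream) rather than hashing a byte array; the byte[] variant just keeps the sketch short.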
This is what I would like to understand, because if additional API is
*required*, I would like to get that API (probably implemented as an
adapter) into the ECF filetransfer API prior to the implementation.
Protocols that don't support this would either provide a workaround
or throw an exception.
The approach we've generally been using to allow runtime access to
optional/additional features is IAdaptable:
    ISomeInterface adapter = (ISomeInterface)
            someAdaptable.getAdapter(ISomeInterface.class);
    if (adapter == null) {
        // optional feature not supported: fall back or report it
    } else {
        // optional feature is supported...use it!
        adapter.doSomething(); // doSomething() stands in for ISomeInterface's methods
    }
This makes it possible to introduce new API (ISomeInterface) in a plugin
separate from the filetransfer API, or in the same plugin. It's also quite
handy in combination with the IAdapterManager OSGi service/extension point,
which lets new plugins declaratively register themselves as implementers
of a given interface. In any event, we don't have to use this mechanism to
introduce new API, but we can if necessary/desired, and it will have
minimal impact on the existing API.
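To make the idiom concrete, here is a self-contained sketch. Adaptable below is a stand-in that mirrors the shape of org.eclipse.core.runtime.IAdaptable so it runs without Eclipse on the classpath, and IResumableTransfer is a hypothetical optional-feature interface, not existing ECF API. A provider that supports the feature returns itself from getAdapter; one that doesn't returns null, and the caller falls back.

```java
import java.util.Optional;

public class AdapterSketch {

    /** Stand-in with the same shape as org.eclipse.core.runtime.IAdaptable. */
    interface Adaptable {
        Object getAdapter(Class<?> adapter);
    }

    /** Hypothetical optional feature; not part of the ECF filetransfer API. */
    interface IResumableTransfer {
        void resumeFrom(String mirrorUrl, long offset);
    }

    /** Provider that supports the optional feature by adapting to itself. */
    static class RangeCapableTransfer implements Adaptable, IResumableTransfer {
        String lastMirror;
        long lastOffset = -1;

        public void resumeFrom(String mirrorUrl, long offset) {
            lastMirror = mirrorUrl;
            lastOffset = offset;
        }

        public Object getAdapter(Class<?> adapter) {
            return adapter.isInstance(this) ? this : null;
        }
    }

    /** Provider without the feature: every request for an adapter yields null. */
    static class BasicTransfer implements Adaptable {
        public Object getAdapter(Class<?> adapter) {
            return null;
        }
    }

    /** Typed wrapper around the getAdapter / cast / null-check idiom. */
    static <T> Optional<T> adapt(Adaptable adaptable, Class<T> type) {
        return Optional.ofNullable(type.cast(adaptable.getAdapter(type)));
    }

    /** Returns true if the transfer resumed; false tells the caller to fall back. */
    static boolean tryResume(Adaptable transfer, String mirrorUrl, long offset) {
        Optional<IResumableTransfer> resumable = adapt(transfer, IResumableTransfer.class);
        resumable.ifPresent(r -> r.resumeFrom(mirrorUrl, offset));
        return resumable.isPresent();
    }
}
```

The caller never needs to know which concrete provider it has; adding the interface later does not change any existing method signatures.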
- retrieving information from special headers (like
Content-Disposition)
- detecting URL redirections to final mirrors
I'm not sure what you are going to use to implement this, but would
be curious to find out.
If you download a file from a URL, you have to discover the filename
if the user doesn't specify it explicitly. The most precise solution is
parsing the Content-Disposition header if it's available (browsers use
it to determine the name of the file to save). Unlike other http
headers, Content-Disposition has a fairly complex syntax (quoted strings
with escapes, optional charset-encoded parameters), so we should be able
to parse it properly.
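A minimal sketch of such a parser, assuming we only care about the filename. It handles quoted-string values with backslash escapes (as in RFC 2616) and the extended filename*=charset'lang'percent-encoded form (RFC 2231, later profiled for HTTP by RFC 5987), preferring filename* when both are present. It is deliberately simplified (it does not validate tokens, for example), and the class and method names are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class ContentDispositionSketch {

    /** Splits "type; a=b; c=\"d\"" into lowercase parameter names and unquoted values. */
    static Map<String, String> parameters(String header) {
        Map<String, String> params = new LinkedHashMap<>();
        int i = header.indexOf(';');
        while (i >= 0) {
            int eq = header.indexOf('=', i + 1);
            if (eq < 0) break;
            String name = header.substring(i + 1, eq).trim().toLowerCase(Locale.ROOT);
            int j = eq + 1;
            while (j < header.length() && header.charAt(j) == ' ') j++;
            String value;
            if (j < header.length() && header.charAt(j) == '"') {
                // quoted-string: a backslash escapes the next character
                StringBuilder sb = new StringBuilder();
                for (j++; j < header.length() && header.charAt(j) != '"'; j++) {
                    char c = header.charAt(j);
                    if (c == '\\' && j + 1 < header.length()) c = header.charAt(++j);
                    sb.append(c);
                }
                value = sb.toString();
            } else {
                // token: runs until the next ';'
                int end = header.indexOf(';', j);
                value = header.substring(j, end < 0 ? header.length() : end).trim();
            }
            params.put(name, value);
            i = header.indexOf(';', j); // start of the next parameter, if any
        }
        return params;
    }

    /** Returns the suggested filename, or null if the header carries none. */
    static String filename(String header) {
        Map<String, String> p = parameters(header);
        String ext = p.get("filename*"); // charset'language'percent-encoded-value
        if (ext != null) {
            int q1 = ext.indexOf('\'');
            int q2 = q1 < 0 ? -1 : ext.indexOf('\'', q1 + 1);
            if (q2 > 0) {
                try {
                    Charset cs = Charset.forName(ext.substring(0, q1));
                    return percentDecode(ext.substring(q2 + 1), cs);
                } catch (IllegalArgumentException e) {
                    // unknown charset or bad %XX escape: fall back to plain filename
                }
            }
        }
        return p.get("filename");
    }

    /** Decodes %XX escapes into bytes, then bytes into text using the given charset. */
    static String percentDecode(String s, Charset cs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c); // unescaped characters are ASCII here
            }
        }
        return new String(out.toByteArray(), cs);
    }
}
```

Example: for attachment; filename="fallback.txt"; filename*=UTF-8''%E2%82%AC%20rates.txt the sketch returns the decoded filename* value (a euro sign followed by " rates.txt") rather than the ASCII fallback.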
OK. Do all http x.y servers support Content-Disposition? Could you
also point to the spec for it (w3c?) just for my information? And do
you know if Apache httpclient 3.0.1 implements the parsing of
Content-Disposition? If so, then perhaps the existing
org.eclipse.ecf.provider.filetransfer.httpclient could simply be modified.
Detecting URL redirections would help us with statistics collection. It
would be wrong to attribute statistics belonging to different mirrors to
the one URL that covers all of them. This is why we should detect when
the covering URL redirects to different mirrors on different retrieval
attempts. Finally, we could automatically stop using blacklisted mirrors
to avoid speed or timeout problems.
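One possible way to detect the final mirror, sketched with HttpURLConnection: switch off automatic redirect following, issue HEAD requests, and walk the Location headers ourselves, resolving relative ones against the current URI. The LocationLookup indirection is a hypothetical seam so the chain logic can be exercised without network access; the host of the last URI in the chain is what statistics would be keyed by.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class RedirectSketch {

    /** Returns the Location of a redirect response, or empty if uri is not redirected. */
    interface LocationLookup {
        Optional<String> location(URI uri) throws IOException;
    }

    /** Real lookup: one HEAD request with automatic redirect following switched off. */
    static final LocationLookup HTTP = uri -> {
        HttpURLConnection c = (HttpURLConnection) uri.toURL().openConnection();
        c.setInstanceFollowRedirects(false);
        c.setRequestMethod("HEAD");
        try {
            int code = c.getResponseCode();
            String loc = c.getHeaderField("Location");
            return (code / 100 == 3 && loc != null) ? Optional.of(loc) : Optional.empty();
        } finally {
            c.disconnect();
        }
    };

    /** Follows redirects; the last element is the mirror that actually serves the file. */
    static List<URI> resolveChain(URI start, LocationLookup lookup, int maxHops) throws IOException {
        List<URI> chain = new ArrayList<>();
        URI current = start;
        chain.add(current);
        while (true) {
            Optional<String> next = lookup.location(current);
            if (!next.isPresent()) return chain;
            if (chain.size() > maxHops) throw new IOException("Too many redirects from " + start);
            current = current.resolve(next.get()); // Location may be relative
            if (chain.contains(current)) throw new IOException("Redirect loop at " + current);
            chain.add(current);
        }
    }

    /** Key for per-mirror statistics: host of the final URI. */
    static String mirrorKey(List<URI> chain) {
        return chain.get(chain.size() - 1).getHost();
    }
}
```

Calling resolveChain(uri, RedirectSketch.HTTP, 5) on each retrieval attempt would reveal when the same covering URL lands on different mirrors.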
OK, this does sound like new API/interfaces for collecting these
statistics.
I think you would need to describe what statistics are desired here.
We can easily add adapter interfaces to ECF or to individual providers for
collecting statistics associated with a given file retrieval (or with all
retrievals), but would need to know what stats are of interest.
The most interesting statistics:
- average download speed (related to concrete mirrors, geographical
provider/consumer location, time of day, etc.)
- amount of bytes downloaded from particular location / during
particular time period
- frequency of timeouts including timeout values
- etc.
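As a starting point for discussing the adapter interface, here is a hypothetical in-memory collector covering the first three items, keyed by mirror host only (location and time of day could be added to the key). The class, its fields and the blacklisting thresholds are all assumptions for illustration, not a proposed final API.

```java
import java.util.HashMap;
import java.util.Map;

public class MirrorStatsSketch {

    /** Aggregated counters for one mirror host (hypothetical shape). */
    static final class MirrorStats {
        long bytes;            // bytes downloaded successfully
        long millis;           // time spent on successful transfers
        int attempts;          // successful transfers plus timeouts
        int timeouts;
        long maxTimeoutMillis; // largest timeout value that was hit

        /** Average speed over successful transfers, in bytes per second (0 if none). */
        double averageBytesPerSecond() {
            return millis == 0 ? 0.0 : bytes * 1000.0 / millis;
        }

        double timeoutRate() {
            return attempts == 0 ? 0.0 : (double) timeouts / attempts;
        }
    }

    private final Map<String, MirrorStats> byMirror = new HashMap<>();

    private MirrorStats stats(String mirrorHost) {
        return byMirror.computeIfAbsent(mirrorHost, h -> new MirrorStats());
    }

    /** Records a completed transfer of the given size and duration. */
    void recordTransfer(String mirrorHost, long bytes, long elapsedMillis) {
        MirrorStats s = stats(mirrorHost);
        s.attempts++;
        s.bytes += bytes;
        s.millis += elapsedMillis;
    }

    /** Records an attempt that gave up after timeoutMillis. */
    void recordTimeout(String mirrorHost, long timeoutMillis) {
        MirrorStats s = stats(mirrorHost);
        s.attempts++;
        s.timeouts++;
        s.maxTimeoutMillis = Math.max(s.maxTimeoutMillis, timeoutMillis);
    }

    MirrorStats get(String mirrorHost) {
        return byMirror.get(mirrorHost);
    }

    /** Hypothetical rule: blacklist on too many timeouts or too low an average speed. */
    boolean isBlacklisted(String mirrorHost, double maxTimeoutRate, double minBytesPerSecond) {
        MirrorStats s = byMirror.get(mirrorHost);
        if (s == null) return false; // no data yet: give the mirror a chance
        return s.timeoutRate() > maxTimeoutRate
                || (s.millis > 0 && s.averageBytesPerSecond() < minBytesPerSecond);
    }
}
```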
We could share the statistics among users of an application by storing
them on a server (the downloader would send its statistics to the
server automatically). This would keep users from trying to access
corrupted or slow repositories.
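On the consuming side, shared statistics could be used to pick mirrors before downloading. The sketch below assumes the server hands back, per mirror, an average speed and a timeout rate (a hypothetical format; SharedStats and order are illustrative names). It drops mirrors over a timeout-rate limit and tries the fastest first, leaving mirrors with no data at the end.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class MirrorRankingSketch {

    /** One mirror's shared statistics as returned by the (hypothetical) stats server. */
    static final class SharedStats {
        final double bytesPerSecond;
        final double timeoutRate;

        SharedStats(double bytesPerSecond, double timeoutRate) {
            this.bytesPerSecond = bytesPerSecond;
            this.timeoutRate = timeoutRate;
        }
    }

    /** Filters out unreliable mirrors and orders the rest fastest first; unknown ones last. */
    static List<String> order(List<String> mirrors, Map<String, SharedStats> shared, double maxTimeoutRate) {
        List<String> usable = new ArrayList<>();
        for (String mirror : mirrors) {
            SharedStats s = shared.get(mirror);
            if (s == null || s.timeoutRate <= maxTimeoutRate) {
                usable.add(mirror);
            }
        }
        usable.sort(Comparator.comparingDouble((String host) -> {
            SharedStats s = shared.get(host);
            return s == null ? Double.NEGATIVE_INFINITY : s.bytesPerSecond;
        }).reversed());
        return usable;
    }
}
```

List.sort is stable, so mirrors with equal speed keep their original order.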
OK. Remy may want to comment on the overlap of these statistics with
BitTorrent (have you looked at BT as a possible approach? It's pretty
ubiquitous) and on whether a common stats API could/should be
created for both. Remy is the committer who did the BitTorrent impl.
We won't be able to do that immediately, given Europa finishing work, as
I'm sure you understand.
Scott