Re: [ecf-dev] E-intro [Was Efficient downloads]
- From: Filip Hrbek <filip.hrbek@xxxxxxxxxxxxxx>
- Date: Wed, 30 May 2007 19:03:21 +0200
- Delivered-to: email@example.com
- Organization: Cloudsmith Inc.
- User-agent: Thunderbird 22.214.171.124 (Windows/20070326)
Hi Scott, comments inside.
>> - resume from a different location (e.g. different mirror)
> Hmm. Don't know how you are going to accomplish that without
> something quite different from normal http, but sounds interesting.
I'm not sure which protocols we will be able to implement this for. To
do this, we must be able to start downloading at a particular offset and
finally check the file's consistency, e.g. using a digest file if one is
available. We also need a list of mirrors containing the same artifact
(let's assume we've obtained it somewhere). This should be possible with
HTTP. There could be an API supporting this feature; providers for
protocols that don't support it would either implement a workaround or
throw an exception.
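To illustrate the idea, resuming at an offset can be done over plain HTTP with a Range request; this is only a sketch (the method names, the append-to-file handling, and the digest step are my assumptions, not ECF API):

```java
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

class ResumeDownload {
    // Build the Range header value for resuming at a byte offset.
    static String rangeHeader(long offset) {
        return "bytes=" + offset + "-";
    }

    // Resume a partial download from a (possibly different) mirror,
    // appending to the bytes already written to the local file.
    static void resumeFrom(URL mirror, File partial) throws IOException {
        long offset = partial.length();
        HttpURLConnection conn = (HttpURLConnection) mirror.openConnection();
        conn.setRequestProperty("Range", rangeHeader(offset));
        // 206 Partial Content means the server honored the Range header;
        // anything else means we would have to restart from scratch.
        if (conn.getResponseCode() != HttpURLConnection.HTTP_PARTIAL) {
            throw new IOException("Server ignored the Range request");
        }
        try (InputStream in = conn.getInputStream();
             OutputStream out = new FileOutputStream(partial, true)) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) > 0; ) {
                out.write(buf, 0, n);
            }
        }
        // Finally the caller should verify the completed file against a
        // digest file from the mirror list, if one is available.
    }
}
```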
If you download a file from a URL, you have to discover the filename if
the user doesn't specify it explicitly. The most precise solution is
parsing the Content-Disposition header if it's available (browsers use
it to determine the name of the file to save). Unlike other HTTP
headers, Content-Disposition has a very complex syntax. We should be
able to parse it correctly.
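As a rough illustration of the filename discovery (this helper handles only the simple quoted form; the full RFC 2183 grammar and filename* encodings are considerably more involved):

```java
class ContentDisposition {
    // Naive extraction of the filename parameter from a
    // Content-Disposition header value, e.g.
    //   attachment; filename="report.pdf"
    // This is just a sketch, not a full parser.
    static String filenameOf(String header) {
        for (String part : header.split(";")) {
            String p = part.trim();
            if (p.toLowerCase().startsWith("filename=")) {
                String value = p.substring("filename=".length()).trim();
                // Strip optional surrounding quotes.
                if (value.length() >= 2
                        && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null; // header had no filename parameter
    }
}
```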
>> - retrieving information from special headers (like Content-Disposition)
>> - detecting URL redirections to final mirrors
> I'm not sure what you are going to use to implement this, but would be
> curious to find out.
Detecting URL redirections would help us with statistics collection. It
would be wrong to assign statistics belonging to different mirrors to
the one URL covering all the mirrors. This is why we should detect when
reading from the covering URL points to different mirrors on different
retrieval attempts. Finally, we could automatically stop using
black-listed mirrors to avoid speed or timeout problems.
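One way to make the redirect chain visible is to follow redirects manually with plain HttpURLConnection (a sketch; the method and class names here are mine, not an ECF API):

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class MirrorResolver {
    // 3xx status codes indicate a redirect to another location.
    static boolean isRedirect(int code) {
        return code >= 300 && code < 400;
    }

    // Follow redirects manually so each hop is visible; returns the
    // concrete mirror the covering URL resolves to on this attempt, so
    // statistics can be attributed to the mirror rather than to the
    // covering URL.
    static URL resolveFinalMirror(URL covering) throws IOException {
        URL current = covering;
        for (int hops = 0; hops < 10; hops++) { // guard against loops
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            conn.setInstanceFollowRedirects(false); // expose each hop
            conn.setRequestMethod("HEAD");
            int code = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            if (!isRedirect(code) || location == null) {
                return current; // no further redirect: this is the mirror
            }
            current = new URL(current, location); // Location may be relative
        }
        throw new IOException("Too many redirects from " + covering);
    }
}
```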
> I think you would need to describe what statistics are desired here.
> We can easily add adapter interfaces for collecting statistics
> associated with a given file retrieval, either to ECF as a whole or to
> individual providers, but would need to know what stats are of interest.
The most interesting statistics:
- average download speed (related to concrete mirrors, geographical
provider/consumer location, time of day, etc.)
- number of bytes downloaded from a particular location / during a
particular time period
- frequency of timeouts, including timeout values
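A hypothetical adapter interface for collecting those stats might look like this (all names are illustrative, not existing ECF API):

```java
// Hypothetical listener interface for per-retrieval statistics.
interface IDownloadStatsListener {
    // Called once per retrieval attempt from a concrete mirror.
    void retrievalFinished(String mirrorUrl, long bytes, long millis,
                           boolean timedOut);
}

// Example collector for two of the statistics above: average download
// speed and timeout frequency.
class AverageSpeedCollector implements IDownloadStatsListener {
    private long totalBytes;
    private long totalMillis;
    private int timeouts;

    public void retrievalFinished(String mirrorUrl, long bytes, long millis,
                                  boolean timedOut) {
        if (timedOut) {
            timeouts++; // frequency of timeouts
        } else {
            totalBytes += bytes;
            totalMillis += millis;
        }
    }

    // Average speed in bytes per second over all successful retrievals.
    double averageBytesPerSecond() {
        return totalMillis == 0 ? 0.0 : totalBytes * 1000.0 / totalMillis;
    }

    int timeoutCount() {
        return timeouts;
    }
}
```

A provider would notify the listener after each retrieval attempt; per-mirror or per-time-period breakdowns would just key the same data on the mirror URL or a timestamp.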
We could share the statistics among users of an application by storing
them on a server (the downloader would send the statistics to the server
automatically). This would prevent users from attempting to access