Re: [ecf-dev] E-intro [Was Efficient downloads]

Hi Scott,

- retrieving information from special headers (like Content-Disposition)
- detecting URL redirections to final mirrors

I'm not sure what you are going to use to implement this, but would be curious to find out.
If you download a file from a URL, you have to discover the filename if the user doesn't specify it explicitly. The most precise solution is to parse the Content-Disposition header if it's available (browsers use it to determine the name of the file to save). Unlike other HTTP headers, Content-Disposition has a very complex syntax. We should be able to parse it properly.

OK. Do all http x.y servers support Content-Disposition? Could you also point to the spec for it (w3c?) just for my information? And do you know if Apache httpclient 3.0.1 implements the parsing of Content-Disposition? If so, then perhaps the existing org.eclipse.ecf.provider.filetransfer.httpclient could simply be modified.


This document (RFC 2183) describes the Content-Disposition syntax: http://www.faqs.org/ftp/rfc/pdf/rfc2183.txt.pdf
There might be a more official document; I'd have to search for it.

I looked into the HttpClient documentation. I guess that this API could be used for parsing such a header: http://jakarta.apache.org/commons/httpclient/apidocs/org/apache/commons/httpclient/HeaderElement.html
(not tested by me yet, I might be wrong).

The Content-Disposition header doesn't have to be supported directly by the server. Let me introduce two basic use cases of pointing to a file with a URL:

a) Direct download
There's a physical file saved on the server, and the web server is able to serve it directly. No Content-Disposition header is sent to the client.
URL example: http://my.downloads.com/filestorage/my-wonderful-piece-of-work-1.0.0.zip
The browser/application doesn't find a Content-Disposition header, but it can use the last segment of the URL to determine the file name (i.e. my-wonderful-piece-of-work-1.0.0.zip).
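
To illustrate case a), here is a minimal sketch of taking the last URL segment as the file name. The class and method names (UrlFileName, fromUrl) are hypothetical and not part of ECF or HttpClient; it only uses java.net.URI from the JDK.

```java
import java.net.URI;

// Hypothetical helper, not an existing ECF or HttpClient class.
final class UrlFileName {
    /** Returns the last path segment of the URL, or null if the path has none. */
    static String fromUrl(String url) {
        // getPath() excludes the query string, so "?id=..." never ends up in the name.
        String path = URI.create(url).getPath();
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            return null; // nothing usable, e.g. "http://host/" or an opaque URI
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

Note that for a URL like the one in case b) this returns "download.php", which is exactly why the URL alone is not enough there.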

b) Download of a virtual file
There's no physical file saved on the server (the data can be stored as a blob in a database or generated on demand at the moment of the download request), or its real location is secret. The web server is able to serve the data dynamically (using PHP, JSP or whatever else). No filename is visible in the URL, but Content-Disposition contains information about the file.
URL example: http://my.downloads.com/virtualstorage/download.php?id=142355&use_best_mirror=1
Content-Disposition header: attachment; filename="my-wonderful-piece-of-work-1.0.0.zip"
The browser/application finds the Content-Disposition header. The information retrieved from it has higher priority than any information from the URL.
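
To illustrate case b), here is a rough sketch of pulling the filename parameter out of a Content-Disposition value. It is a simplified reading of the syntax (parameters separated by semicolons, values either plain tokens or quoted strings with backslash escapes), not a complete RFC 2183 parser, and it is plain JDK code rather than the HttpClient API. The class name ContentDisposition is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: simplified parsing, not a full RFC 2183 implementation.
final class ContentDisposition {
    /** Returns the value of the "filename" parameter, or null if it is absent. */
    static String fileName(String header) {
        if (header == null) return null;
        List<String> parts = splitOutsideQuotes(header);
        for (int i = 1; i < parts.size(); i++) { // part 0 is the disposition type
            String part = parts.get(i);
            int eq = part.indexOf('=');
            if (eq < 0) continue;
            if (part.substring(0, eq).trim().equalsIgnoreCase("filename")) {
                return unquote(part.substring(eq + 1).trim());
            }
        }
        return null;
    }

    // Splits on ';' but leaves semicolons inside quoted strings alone.
    private static List<String> splitOutsideQuotes(String s) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (quoted && c == '\\' && i + 1 < s.length()) {
                cur.append(c).append(s.charAt(++i)); // keep the escape pair for unquote()
            } else if (c == ';' && !quoted) {
                parts.add(cur.toString());
                cur.setLength(0);
            } else {
                if (c == '"') quoted = !quoted;
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    // Strips surrounding quotes and resolves backslash escapes inside them.
    private static String unquote(String v) {
        if (v.length() < 2 || v.charAt(0) != '"' || v.charAt(v.length() - 1) != '"') {
            return v;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < v.length() - 1; i++) {
            char c = v.charAt(i);
            if (c == '\\' && i + 1 < v.length() - 1) c = v.charAt(++i);
            out.append(c);
        }
        return out.toString();
    }
}
```

A real downloader should also strip any path separators from the returned name before using it, since the value comes from the server.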

What should happen if there's no Content-Disposition header in this case? That's an open question. Saving the file as "download.php" is probably not a very good idea. There must be some way to tell the downloader what to do if no reasonable filename can be retrieved from the URL or the HTTP headers (e.g. specify an explicit filename that overrides the automatic one).
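
One possible way to express these priorities, purely as a sketch: an explicitly requested name wins, then the Content-Disposition name, then the URL-derived name, and finally a caller-supplied fallback. The class FileNameResolver and its parameters are hypothetical, and whether a URL-derived name like "download.php" should be accepted at all is still the open question above.

```java
// Hypothetical helper showing one possible priority order; not an existing API.
final class FileNameResolver {
    /**
     * Returns the first non-blank candidate in priority order:
     * explicit name, Content-Disposition name, URL-derived name.
     * If none is usable, returns the caller-supplied fallback.
     */
    static String resolve(String explicitName, String headerName,
                          String urlName, String fallback) {
        String[] candidates = { explicitName, headerName, urlName };
        for (String candidate : candidates) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return candidate.trim();
            }
        }
        return fallback;
    }
}
```

For example, resolve("work.zip", null, "download.php", "download") gives "work.zip", so the caller can always override whatever was detected automatically.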


For now I'm talking about use cases, not about the API. As I've said before, I have to look into what you already have before I can judge what can be done right now with the current API and what needs to be added or modified.

Regards
 Filip


