Re: [cross-project-issues-dev] Download stats and p2

Before I respond to Martin's question, I'd like to apologize to the p2 team. While the suggested solution may have been a hack (unintended behaviour) in the update manager, it really is a purpose-designed feature in p2. It is hardly a "hack". I have added a section to the "Equinox p2 Getting Started for Releng" page [1] that describes the "Artifacts.xml mapping rule change".

Onto the response...

I believe that the list of mirrors sent to p2 *does not* include (I may be incorrect) so we won't even have enough data on which to base an approximation. We are leaning very heavily on our mirrors for this release (we are not adding any additional bandwidth).

The concern with using server logs from mirrors is that the best we can hope for is an approximation of what's really happening.

It is doubtful that all mirrors will participate. If only a couple of the major mirrors do not participate, our numbers will be woefully incorrect. Mirrors come and go, which would make maintaining an accurate approximation challenging.

We anticipate that none of our major mirror providers will consent to providing us with their data. I may, for example, be able to convince the good folks at the University of Waterloo to hand it over; I might be able to convince them to set up some kind of job to do it on a regular basis. However, I am skeptical that they actually will.

You should also keep in mind that these organizations provide mirrors for many sites. Even if they do decide to hand it over, we will likely find ourselves buried in irrelevant log data.
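To give a sense of the weeding involved, here is a minimal Python sketch that keeps only the log entries under one path prefix and counts successful downloads per file. It assumes the mirror hands over logs in Common Log Format; the `/eclipse/` prefix and the function name are hypothetical, and real mirrors may lay out their paths differently.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "METHOD path PROTO" status size
CLF = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def count_downloads(lines, prefix="/eclipse/"):
    """Count successful GET requests per file under `prefix` (hypothetical layout)."""
    counts = Counter()
    for line in lines:
        m = CLF.match(line)
        if m is None:
            continue  # not a CLF line; skip it
        if m["method"] != "GET" or m["status"] != "200":
            continue  # ignore HEAD requests, errors, redirects, partial content
        path = m["path"].split("?", 1)[0]  # drop any query string
        if path.startswith(prefix):
            counts[path] += 1
    return counts
```

Even with a filter like this, the result is only as good as the set of mirrors that actually send us logs.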

I am willing to try approaching one or two of the mirror providers to see how feasible this is, but I am not hopeful.

FWIW, I haven't heard anything on this topic after the board meeting this week. Hopefully tomorrow, I'll get some feedback to see how big a deal this really is.



Oberhuber, Martin wrote:
Hi Wayne et al,

I'd like to ask back regarding option (1) from your E-Mail,
direct download stats from the web and ftp servers' access
logs on (and those mirrors who happen to give them to us).

I'm assuming that for such logs already
exist, and recalling Denis' excited "shooting for 1 Mio
downloads now" blog or similar in previous years, I'm further assuming that at least for the analysis is not that bad.

Going for the server logs gives the most accurate data at
zero impact for the release itself. I'm not a web guy, but
I do assume that tools exist for analyzing those access logs.
Why not just go and ask some of the mirrors and see who is willing to collaborate?

But perhaps that is happening already, the stats are being
prepared but details are confidential for strategic members only [some small reward for strategic membership]... while
some aggregate numbers are shared with the Community...

Martin Oberhuber, Senior Member of Technical Staff, Wind River
Target Management Project Lead, DSDP PMC Member

Greetings all. We have a small problem. Actually, I guess that the problem is as big as you decide it is...

The Eclipse Foundation tracks downloads that go through the download.php script:[...]

This includes things like the packages and direct downloads provided by projects (assuming that everybody is using the script in their download links).

Downloads that occur through p2 do not go through this script. They go directly to our download server and to our mirrors. The mirrors do not (and arguably cannot reasonably) provide us with download stats.

So... if somebody, for example, downloads the "Eclipse IDE for PHP Developers" we will know that we have one more download of PDT. If they instead download the "Eclipse IDE for Java Developers" and then use p2 to add PDT to their configuration, we currently do not have any way of tracking that download of PDT.

Inability to accurately track downloads is a huge concern for the Eclipse Board.

We have explored several mechanisms for tracking this download. Unfortunately, we've not been holding these conversations as publicly as I'd like, so I'll summarize them briefly below...

1. Get mirrors to give us their download stats. We could ask. But most will not give them to us. Besides, their logs probably contain information about everything they mirror, which will be way more information than we need. And it'll be a heck of a lot of information for our webmasters to weed through.

2. Add a plug-in that gathers information from p2 post install and sends that information to Effectively, this is a call-home mechanism that will require some additional UI elements and considerable effort awfully late in our development cycle. Ultimately, it will require some kind of opt-in from the user; many users will refuse, leaving us with incomplete data. FWIW, we could use the UDC for this, but it has the same problem.

3. All p2 downloads go through Denis is concerned that the download.php script and--to some degree--the rest of our infrastructure will not be able to scale to handle the volume that can potentially come from p2 downloads. FWIW, we're not increasing our bandwidth for Galileo; instead, we're depending very heavily on mirrors.

Bug 239668 [1] has been open for some time to discuss this issue.

We've decided that the best approach is something that we've been calling the "Single File Hack". In this hack, we configure the p2 metadata (artifacts.xml) to send requests for some small subset of the files to Ideally, we send requests for one plug-in or feature for each thing that we need to track. The number of files needs to be kept relatively small.
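As a rough illustration of the idea, the Python sketch below adds one extra mapping rule to an artifacts.xml so that a single bundle resolves to a tracking URL while everything else keeps its normal location. This is a sketch under assumptions, not the actual configuration: the sample repository is stripped down, the bundle id `org.example.tracked` and the `example.org` URL are hypothetical placeholders, and putting the rule first assumes that p2 uses the first matching rule. ElementTree also drops the processing instructions that precede the root element, so a real tool would need to preserve them.

```python
import xml.etree.ElementTree as ET

def add_tracking_rule(artifacts_xml, bundle_id, output_template):
    """Insert a mapping rule for one bundle ahead of the existing rules."""
    root = ET.fromstring(artifacts_xml)
    mappings = root.find("mappings")
    if mappings is None:
        raise ValueError("no <mappings> element found")
    rule = ET.Element("rule", {
        # LDAP-style filter: match only this one bundle
        "filter": f"(& (classifier=osgi.bundle) (id={bundle_id}))",
        "output": output_template,
    })
    mappings.insert(0, rule)                  # assumed: first match wins
    mappings.set("size", str(len(mappings)))  # keep the size attribute in sync
    return ET.tostring(root, encoding="unicode")

# Stripped-down sample repository (an assumption, not a real artifacts.xml).
SAMPLE = """<repository name='demo' version='1'>
  <mappings size='1'>
    <rule filter='(&amp; (classifier=osgi.bundle))'
          output='${repoUrl}/plugins/${id}_${version}.jar'/>
  </mappings>
</repository>"""

updated = add_tracking_rule(
    SAMPLE,
    "org.example.tracked",  # hypothetical bundle id
    # hypothetical tracking URL; the real host is not given here
    "http://example.org/download.php?file=/plugins/${id}_${version}.jar",
)
```

Only the one tracked bundle is routed through the script; every other artifact still comes from `${repoUrl}`, which is what keeps the number of tracked requests small.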

There are problems with this hack. For one, becomes a single point of failure for all downloads. Further, we will have to let organizations that mirror our downloads for internal consumption know how to turn it off.

What we're going to need from each project is the names of the files that we need to be tracking.

I'd love to hear your thoughts on this topic.


