Re: [cross-project-issues-dev] Download stats and p2

Martin,

At the risk of sounding like a punter, there is nothing stopping "us" (you, me, anybody) from doing any of this right now. Simply subscribe to eclipse-mirrors and ask around.  I know some mirrors actually publish their log files online, although I can't remember which ones.

As a point of reference, the download.eclipse.org log for the 24 hour period yesterday is 1 GB, and we only have 80Mbps of bandwidth.  I can put this log somewhere on the build server for you if you'd like to play with it.

Denis


Oberhuber, Martin wrote:
Hi Wayne,

thanks for your answers. I have two notes:

1 - Even if we get only one mirror's logs, that may be helpful
    to double-check whether our mirroring / p2 strategies
    really work as expected. How often is content.jar fetched?
    How often are the pack.gz files fetched vs. the original .jar?
    Is there any kind of request that goes to eclipse.org only?
    How many failures are reported? ...

2 - Again I don't know what web server logs look like, but
    from my naïve understanding we could go a pretty long way
    with something very simple. If we just want total numbers
    (not grouped by geo region of downloader), this may be enough:

cat /var/logs/httpd.access \
  | grep '{interesting date range}' \
  | grep /path/to/mirrors/eclipse \
  | sed -e '{extract filename only}' \
  | sort \
  | awk '{count consecutive occurrences}'

Assuming that Apache is the prevalent web server, the server logs
shouldn't be all that different. If we test such a script on our
own server to get total numbers and then ask one or two mirrors
to run this and mail back the results every day...
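
As a rough sketch of the pipeline above (not something tested against the eclipse.org or mirror logs), the following assumes Apache's Common Log Format, where the fourth whitespace-separated field carries the date, the seventh the request path, and the ninth the status code. The function names count_downloads and count_by_status, and the choice to pass the date and path prefix as arguments, are illustrative assumptions rather than anything the mirrors already run:

```shell
# Both functions take: LOGFILE DATE_SUBSTRING PATH_PREFIX
# Assumes Common Log Format lines such as:
#   1.2.3.4 - - [17/Jun/2009:10:00:00 -0400] "GET /path/file.jar HTTP/1.1" 200 100

# Print "<count> <filename>" per requested file, most requested first.
count_downloads() {
  awk -v date="$2" -v prefix="$3" '
    # Keep lines whose date field contains DATE_SUBSTRING and
    # whose request path starts with PATH_PREFIX.
    index($4, date) && index($7, prefix) == 1 {
      path = $7
      sub(/\?.*$/, "", path)          # drop any query string
      n = split(path, parts, "/")     # last component is the file name
      if (parts[n] != "") count[parts[n]]++
    }
    END { for (f in count) print count[f], f }
  ' "$1" | sort -k1,1nr -k2,2
}

# Print "<count> <status>" per HTTP status code, to show how many
# requests failed (e.g. 404) versus succeeded (200).
count_by_status() {
  awk -v date="$2" -v prefix="$3" '
    index($4, date) && index($7, prefix) == 1 { status[$9]++ }
    END { for (s in status) print status[s], s }
  ' "$1" | sort -k2,2
}
```

Called as `count_downloads access.log 17/Jun/2009 /eclipse/`, this would print one line per file name with its request count; `count_by_status` with the same arguments gives a tally per status code, which is one way to see how many failures were logged. Doing the filtering and counting in a single awk pass avoids the separate grep, sed, and counting stages, but a mirror using a custom log format would need the field numbers adjusted.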

Cheers,
--
Martin Oberhuber, Senior Member of Technical Staff, Wind River
Target Management Project Lead, DSDP PMC Member
http://www.eclipse.org/dsdp/tm
 
 

  
-----Original Message-----
From: cross-project-issues-dev-bounces@xxxxxxxxxxx 
[mailto:cross-project-issues-dev-bounces@xxxxxxxxxxx] On 
Behalf Of Wayne Beaton
Sent: Thursday, 18 June 2009 04:51
To: Cross project issues
Subject: Re: [cross-project-issues-dev] Download stats and p2

Before I respond to Martin's question, I'd like to apologize to the p2
team. While the suggested solution may have been a hack (unintended
behaviour) in the update manager, it really is a purpose-designed
feature in p2. It is hardly a "hack". I have added a section to the
"Equinox p2 Getting Started for Releng" page [1] that describes the
"Artifacts.xml mapping rule change".

On to the response...

I believe that the list of mirrors sent to p2 *does not* include
eclipse.org (I may be incorrect) so we won't have even enough data to
base an approximation upon. We are leaning very heavily on our mirrors
for this release (we are not adding any additional bandwidth).

The concern with using server logs from mirrors is that the best we can
hope for is an approximation of what's really happening.

It is doubtful that all mirrors will participate. If only a couple of
the major mirrors do not participate, our numbers will be woefully
incorrect. Mirrors come and go, which would make maintaining an accurate
approximation challenging.

We anticipate that none of our major mirror providers will consent to
providing us with their data. I may, for example, be able to convince
the good folks at the University of Waterloo to hand it over; I might be
able to convince them to set up some kind of job to do it on a regular
basis. However, I am skeptical that they actually will.

You should also keep in mind that these organizations provide mirrors
for many sites. Even if they do decide to hand it over, we will likely
find ourselves buried in irrelevant log data.

I am willing to try approaching one or two of the mirror providers to 
see how feasible this is, but I am not hopeful.

FWIW, I haven't heard anything on this topic after the board meeting
this week. Hopefully tomorrow, I'll get some feedback to see how big a
deal this really is.

Wayne

[1]http://wiki.eclipse.org/Equinox_p2_Getting_Started_for_Releng

Oberhuber, Martin wrote:
    
Hi Wayne et al,

I'd like to follow up on option (1) from your e-mail: direct download
stats from the web and ftp servers' access logs on Eclipse.org (and
those mirrors who happen to give them to us).

I'm assuming that for download.eclipse.org such logs already exist,
and recalling Denis' excited "shooting for 1 Mio downloads now" blog
or similar in previous years, I'm further assuming that at least for
Eclipse.org the analysis is not that bad.

Going for the server logs gives the most accurate data at zero impact
for the release itself. I'm not a web guy, but I do assume that tools
exist for analyzing those access logs. Why not just go and ask some of
the mirrors and see who is willing to collaborate?

But perhaps that is happening already, the stats are being prepared
but details are confidential for strategic members only [some small
reward for strategic membership]... while some aggregate numbers are
shared with the Community...

Cheers,
--
Martin Oberhuber, Senior Member of Technical Staff, Wind River
Target Management Project Lead, DSDP PMC Member
http://www.eclipse.org/dsdp/tm
 
 

  
      
-----Original Message-----
From: cross-project-issues-dev-bounces@xxxxxxxxxxx 
[mailto:cross-project-issues-dev-bounces@xxxxxxxxxxx] On 
Behalf Of Wayne Beaton
Sent: Friday, 12 June 2009 20:51
To: cross project issues
Subject: [cross-project-issues-dev] Download stats and p2

Greetings all. We have a small problem. Actually, I guess that the 
problem is as big as you choose to decide it is...

The Eclipse Foundation tracks downloads that go through the download.php
script:

http://www.eclipse.org/downloads/download.php?file=[...]

This includes things like the packages and direct downloads provided by
projects (assuming that everybody is using the script in their download
links).

Downloads that occur through p2 do not go through this script. They go
directly to our download server and to our mirrors. The mirrors do not
(and arguably cannot reasonably) provide us with download stats.

So... if somebody, for example, downloads the "Eclipse IDE for PHP
Developers" we will know that we have one more download of PDT. If they
instead download the "Eclipse IDE for Java Developers" and then use p2
to add PDT to their configuration, we currently do not have any way of
tracking that download of PDT.

Inability to accurately track downloads is a huge concern for the 
Eclipse Board.

We have explored several mechanisms for tracking this download.
Unfortunately, we've not been holding these conversations as publicly as
I'd like, so I'll summarize them briefly below...

1. Get mirrors to give us their download stats. We could ask. But most
will not give them to us. Besides, their logs probably contain
information about everything they mirror, which will be way more
information than we need. And it'll be a heck of a lot of information
for our webmasters to weed through.

2. Add a plug-in that gathers information from p2 post-install and sends
that information to eclipse.org. Effectively, this is a call-home
mechanism that will require some additional UI elements and considerable
effort awfully late in our development cycle. Ultimately, it will
require some kind of opt-in from users, many of whom will refuse,
leaving us with incomplete data. FWIW, we could use the UDC for this,
but it has the same problem.

3. All p2 downloads go through eclipse.org. Denis is concerned that the
download.php script and--to some degree--the rest of our infrastructure
will not be able to scale to handle the volume that can potentially come
from p2 downloads. FWIW, we're not increasing our bandwidth for Galileo;
instead, we're depending very heavily on mirrors.

Bug 239668 [1] has been open for some time to discuss this issue.

We've decided that the best approach is something that we've been
calling the "Single File Hack". In this hack, we configure the p2
metadata (artifacts.xml) to send requests for some small subset of the
files to eclipse.org. Ideally, we send requests for one plug-in or
feature for each thing that we need to track. The number of files needs
to be kept relatively small.

There are problems with this hack. For one, eclipse.org becomes a single
point of failure for all downloads. Further, we will have to let
organizations that mirror our downloads for internal consumption know
how to turn it off.

What we're going to need from each project is the names of the files
that we need to be tracking.

I'd love to hear your thoughts on this topic.

Wayne

[1]https://bugs.eclipse.org/bugs/show_bug.cgi?id=239668
_______________________________________________
cross-project-issues-dev mailing list
cross-project-issues-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/cross-project-issues-dev

    
        
  

--
Denis Roy
Manager, IT Infrastructure
Eclipse Foundation, Inc. -- http://www.eclipse.org/
Office: 613.224.9461 x224 (Eastern time)
denis.roy@xxxxxxxxxxx