RE: [cross-project-issues-dev] Download stats and p2

Hi Wayne,

thanks for your answers. I have two notes:

1 - Even if we only get one mirror's logs, that may help us
    double-check whether our mirroring / p2 strategies
    really work as expected. How often is content.jar fetched?
    How often are the .pack.gz files fetched vs. the original .jar?
    Is there any kind of request that goes to eclipse.org only?
    How many failures are reported? ...
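A rough sketch of how some of those questions could be answered from a single mirror's Apache access log. It assumes the Common/Combined Log Format (request path in field 7, HTTP status in field 9). The function name p2_request_stats is hypothetical, and "failures" here simply means 4xx/5xx responses. The question about eclipse.org-only requests cannot be answered from one mirror's log.

```shell
# Hypothetical helper: tally p2-related requests in an Apache access log
# (Common/Combined Log Format: field 7 = request path, field 9 = status).
# Usage: p2_request_stats /path/to/mirrors/eclipse/ < access.log
p2_request_stats() {
  awk -v prefix="$1" '
    index($7, prefix) != 1  { next }               # skip non-Eclipse requests
    $7 ~ /\/content\.jar$/  { content++ }          # p2 metadata fetches
    $7 ~ /\.jar\.pack\.gz$/ { packgz++ }           # packed artifacts
    $7 ~ /\.jar$/ && $7 !~ /\/(content|artifacts)\.jar$/ { jar++ }  # plain jars
    $9 >= 400               { failed++ }           # 4xx/5xx responses
    END {
      printf "content.jar %d\npack.gz %d\njar %d\nfailed %d\n",
             content, packgz, jar, failed
    }'
}
```

Each pattern is tested independently, so a failed content.jar fetch counts both as a metadata fetch and as a failure.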

2 - Again, I don't know what web server logs look like, but
    from my naïve understanding we could go a pretty long way
    with something very simple. If we just want total numbers
    (not grouped by the downloader's geographic region), this may be enough:

# {interesting date range}: here all of June 2009, in Apache log date format
grep '/Jun/2009:' /var/logs/httpd.access \
  | grep ' /path/to/mirrors/eclipse/' \
  | awk '{print $7}' \
  | sed -e 's|.*/||' \
  | sort \
  | uniq -c

Assuming that Apache is the prevalent web server, the log formats
shouldn't differ much between mirrors. We could test such a script on
our own server to get total numbers, and then ask one or two mirrors
to run it and mail back the results every day...
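If one or two mirrors did mail back daily per-file counts, combining them with our own numbers could be as simple as the following sketch. It assumes each report holds one "<count> <filename>" line per file, which is the format "uniq -c" prints. The name merge_mirror_counts is made up, and the sketch makes no attempt to reconcile mirrors that report different date ranges.

```shell
# Hypothetical helper: merge per-file counts returned by several mirrors.
# Each input holds lines of the form "<count> <filename>" (the format
# "uniq -c" prints); file names containing spaces are not handled.
# Usage: merge_mirror_counts mirror1.txt mirror2.txt > totals.txt
merge_mirror_counts() {
  awk '{ total[$2] += $1 } END { for (f in total) print total[f], f }' "$@" \
    | sort -k2
}
```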

Cheers,
--
Martin Oberhuber, Senior Member of Technical Staff, Wind River
Target Management Project Lead, DSDP PMC Member
http://www.eclipse.org/dsdp/tm
 
 

> -----Original Message-----
> From: cross-project-issues-dev-bounces@xxxxxxxxxxx 
> [mailto:cross-project-issues-dev-bounces@xxxxxxxxxxx] On 
> Behalf Of Wayne Beaton
> Sent: Thursday, 18 June 2009 04:51
> To: Cross project issues
> Subject: Re: [cross-project-issues-dev] Download stats and p2
> 
> Before I respond to Martin's question, I'd like to apologize to the p2
> team. While the suggested solution may have been a hack (unintended
> behaviour) in the update manager, it really is a purpose-designed
> feature in p2. It is hardly a "hack". I have added a section to the
> "Equinox p2 Getting Started for Releng" page [1] that describes the
> "Artifacts.xml mapping rule change".
> 
> Onto the response...
> 
> I believe that the list of mirrors sent to p2 *does not* include
> eclipse.org (I may be incorrect), so we won't even have enough data to
> base an approximation on. We are leaning very heavily on our mirrors
> for this release (we are not adding any additional bandwidth).
> 
> The concern with using server logs from mirrors is that the best we can
> hope for is an approximation of what's really happening.
> 
> It is doubtful that all mirrors will participate. If only a couple of
> the major mirrors do not participate, our numbers will be woefully
> incorrect. Mirrors come and go, which would make maintaining an accurate
> approximation challenging.
> 
> We anticipate that none of our major mirror providers will consent to
> providing us with their data. I may, for example, be able to convince
> the good folks at the University of Waterloo to hand it over; I might be
> able to convince them to set up some kind of job to do it on a regular
> basis. However, I am skeptical that they actually will.
> 
> You should also keep in mind that these organizations provide mirrors
> for many sites. Even if they do decide to hand it over, we will likely
> find ourselves buried in irrelevant log data.
> 
> I am willing to try approaching one or two of the mirror providers to 
> see how feasible this is, but I am not hopeful.
> 
> FWIW, I haven't heard anything on this topic since the board meeting
> this week. Hopefully tomorrow I'll get some feedback to see how big a
> deal this really is.
> 
> Wayne
> 
> [1]http://wiki.eclipse.org/Equinox_p2_Getting_Started_for_Releng
> 
> Oberhuber, Martin wrote:
> > Hi Wayne et al,
> >
> > I'd like to follow up on option (1) from your e-mail:
> > direct download stats from the web and ftp servers' access
> > logs on Eclipse.org (and from those mirrors who happen to give
> > them to us).
> >
> > I'm assuming that for download.eclipse.org such logs already
> > exist, and recalling Denis' excited "shooting for 1 million
> > downloads now" blog or similar in previous years, I'm further
> > assuming that at least for Eclipse.org the analysis is not
> > that bad.
> >
> > Going for the server logs gives the most accurate data at
> > zero impact for the release itself. I'm not a web guy, but
> > I do assume that tools exist for analyzing those access logs.
> > Why not just go and ask some of the mirrors and see who is 
> > willing to collaborate?
> >
> > But perhaps that is happening already: the stats are being
> > prepared, but details are confidential for strategic members
> > only [some small reward for strategic membership], while
> > some aggregate numbers are shared with the Community...
> >
> > Cheers,
> > --
> > Martin Oberhuber, Senior Member of Technical Staff, Wind River
> > Target Management Project Lead, DSDP PMC Member
> > http://www.eclipse.org/dsdp/tm
> >  
> >  
> >
> >   
> >> -----Original Message-----
> >> From: cross-project-issues-dev-bounces@xxxxxxxxxxx 
> >> [mailto:cross-project-issues-dev-bounces@xxxxxxxxxxx] On 
> >> Behalf Of Wayne Beaton
> >> Sent: Friday, 12 June 2009 20:51
> >> To: cross project issues
> >> Subject: [cross-project-issues-dev] Download stats and p2
> >>
> >> Greetings all. We have a small problem. Actually, I guess that the 
> >> problem is as big as you choose to decide it is...
> >>
> >> The Eclipse Foundation tracks downloads that go through the download.php
> >> script:
> >>
> >> http://www.eclipse.org/downloads/download.php?file=[...]
> >>
> >> This includes things like the packages and direct downloads provided by
> >> projects (assuming that everybody is using the script in their download
> >> links).
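For comparison, here is a rough sketch of counting download.php requests per "file" parameter directly from an Apache access log. This is an illustration only, not how the Foundation actually records its numbers. The request path /downloads/download.php comes from the URL above; the function name is hypothetical, only a "file" parameter that comes first is recognized, and URL-encoded values are not decoded.

```shell
# Hypothetical helper: count download.php requests per "file" parameter
# (field 7 of a Common/Combined Log Format line is the request path).
# Usage: count_download_php access.log
count_download_php() {
  awk '{ print $7 }' "$@" \
    | grep '^/downloads/download\.php?file=' \
    | sed -e 's/^[^?]*?file=//' -e 's/&.*$//' \
    | sort \
    | uniq -c
}
```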
> >>
> >> Downloads that occur through p2 do not go through this script. They
> >> go directly to our download server and to our mirrors. The mirrors do
> >> not (and arguably cannot reasonably) provide us with download stats.
> >>
> >> So... if somebody, for example, downloads the "Eclipse IDE for PHP
> >> Developers", we will know that we have one more download of PDT. If
> >> they instead download the "Eclipse IDE for Java Developers" and then
> >> use p2 to add PDT to their configuration, we currently do not have
> >> any way of tracking that download of PDT.
> >>
> >> Inability to accurately track downloads is a huge concern for the 
> >> Eclipse Board.
> >>
> >> We have explored several mechanisms for tracking these downloads.
> >> Unfortunately, we've not been holding these conversations as
> >> publicly as I'd like, so I'll summarize them briefly below...
> >>
> >> 1. Get mirrors to give us their download stats. We could ask. But
> >> most will not give them to us. Besides, their logs probably contain
> >> information about everything they mirror, which will be way more
> >> information than we need. And it'll be a heck of a lot of
> >> information for our webmasters to weed through.
> >>
> >> 2. Add a plug-in that gathers information from p2 post-install and
> >> sends that information to eclipse.org. Effectively, this is a
> >> call-home mechanism that will require some additional UI elements and
> >> considerable effort awfully late in our development cycle.
> >> Ultimately, it will require some kind of opt-in from users, many of
> >> whom will refuse, leaving us with incomplete data. FWIW, we could use
> >> the UDC for this, but it has the same problem.
> >>
> >> 3. All p2 downloads go through eclipse.org. Denis is concerned that
> >> the download.php script and, to some degree, the rest of our
> >> infrastructure will not be able to scale to handle the volume that
> >> can potentially come from p2 downloads. FWIW, we're not increasing
> >> our bandwidth for Galileo; instead, we're depending very heavily on
> >> mirrors.
> >>
> >> Bug 239668 [1] has been open for some time to discuss this issue.
> >>
> >> We've decided that the best approach is something that we've been
> >> calling the "Single File Hack". In this hack, we configure the p2
> >> metadata (artifacts.xml) to send requests for some small subset of
> >> the files to eclipse.org. Ideally, we send requests for one plug-in
> >> or feature for each thing that we need to track. The number of files
> >> needs to be kept relatively small.
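Under such a scheme, counting the tracked downloads comes down to counting requests for a handful of file names in the eclipse.org access log. The following is a minimal sketch, assuming the Common/Combined Log Format and a plain-text list of tracked file names, one per line. The list file and the function name are hypothetical. Only 200 responses are counted, and the list must not be empty: the NR == FNR idiom would otherwise treat the log itself as the list.

```shell
# Hypothetical helper: count successful (200) requests for tracked files.
# $1 = file listing tracked artifact names, one per line; log on stdin.
# Usage: count_tracked_files tracked.txt < access.log
count_tracked_files() {
  awk 'NR == FNR { tracked[$1]; next }              # load the tracked names
       { n = split($7, parts, "/"); name = parts[n] }  # file name of request
       (name in tracked) && $9 == 200 { hits[name]++ }
       END { for (f in hits) print hits[f], f }' "$1" - \
    | sort -k2
}
```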
> >>
> >> There are problems with this hack. For one, eclipse.org becomes a
> >> single point of failure for all downloads. Further, we will have to
> >> let organizations that mirror our downloads for internal consumption
> >> know how to turn it off.
> >>
> >> What we're going to need from each project is the names of the files
> >> that we need to be tracking.
> >>
> >> I'd love to hear your thoughts on this topic.
> >>
> >> Wayne
> >>
> >> [1]https://bugs.eclipse.org/bugs/show_bug.cgi?id=239668
> >> _______________________________________________
> >> cross-project-issues-dev mailing list
> >> cross-project-issues-dev@xxxxxxxxxxx
> >> https://dev.eclipse.org/mailman/listinfo/cross-project-issues-dev
> >>
> >>     