Re: [p2-dev] Mirror ranking

This is why I tell people "if at first your update fails, restart Eclipse and try again" (and generally it's fine on the second or third attempt).

The problem is that when a build depends on p2.director, it's harder to just try again when your builds take hours...

+1 for a better approach to *temporary* mirror failures.


Thomas Hallgren wrote:
I mirrored Helios today and it basically took forever. After a few hours I began to wonder what was going on; luckily, the process was running in a debugger. I found that the top-ranked mirror was the one at eclipse.org. That surprised me, since I know I have a fast mirror in Sweden that serves a copy of Helios.

First I checked whether this mirror was included in the list returned by the mirror request to eclipse.org. It was. Next, I suspended the process in the debugger and patched the URL for entry number zero in my mirrors list with the URL of that mirror. I resumed, and the processing went much faster. So the mirror was actually OK.

So why did download.eclipse.org move to the top of the list? It's supposed to be right at the bottom. The algorithm for sorting the mirrors looks like this:

        public int compareTo(Object o) {
            if (!(o instanceof MirrorInfo))
                return 0;
            MirrorInfo that = (MirrorInfo) o;
            //less failures is better
            if (this.failureCount != that.failureCount)
                return this.failureCount - that.failureCount;
            //faster is better
            if (this.bytesPerSecond != that.bytesPerSecond)
                return (int) (that.bytesPerSecond - this.bytesPerSecond);
            //trust that initial rank indicates geographical proximity
            return this.initialRank - that.initialRank;
        }

A failure count of one will deem the mirror forever worse than one with a failure count of zero, no matter if that mirror is a hundred times faster. I think that is what caused my problem. All mirrors in the list have a failureCount of 1 and a byte-count of -1, except two: download.eclipse.org (initialRank = 55) and one other (initialRank = 10). After some initial failure, the rest were never given a second chance.
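To make the effect concrete, here is a minimal, self-contained sketch that applies the same ordering as the comparison quoted above to three mirrors. The Mirror class is a hypothetical stand-in for MirrorInfo, bytesPerSecond is assumed to be a long, and the mirror names and throughput numbers are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MirrorRankDemo {
    // Hypothetical stand-in for MirrorInfo: only the three fields the ordering uses.
    static final class Mirror implements Comparable<Mirror> {
        final String name;
        final int initialRank;
        final int failureCount;
        final long bytesPerSecond; // -1 means "never measured" (assumption)

        Mirror(String name, int initialRank, int failureCount, long bytesPerSecond) {
            this.name = name;
            this.initialRank = initialRank;
            this.failureCount = failureCount;
            this.bytesPerSecond = bytesPerSecond;
        }

        @Override
        public int compareTo(Mirror that) {
            // Same three-step ordering as the quoted code: failures, then speed, then rank.
            if (this.failureCount != that.failureCount)
                return this.failureCount - that.failureCount;
            if (this.bytesPerSecond != that.bytesPerSecond)
                return (int) (that.bytesPerSecond - this.bytesPerSecond);
            return this.initialRank - that.initialRank;
        }
    }

    static List<Mirror> ranked() {
        List<Mirror> mirrors = new ArrayList<>();
        // One early failure and no measurement: this mirror sinks to the bottom for good.
        mirrors.add(new Mirror("nearby-fast-mirror", 0, 1, -1));
        // Invented throughput values for the two mirrors that never failed.
        mirrors.add(new Mirror("node-10", 10, 0, 50_000));
        mirrors.add(new Mirror("download.eclipse.org", 55, 0, 80_000));
        Collections.sort(mirrors);
        return mirrors;
    }

    public static void main(String[] args) {
        for (Mirror m : ranked())
            System.out.println(m.name);
    }
}
```

With these made-up numbers, the nearby mirror ends up last even though its initialRank is the best of the three, because the failure count is compared before anything else.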

My guess is that something went wrong at the very beginning that caused all mirrors except download.eclipse.org and node number 10 to fail. Not sure what that was. That, however, moved download.eclipse.org to the top and node number 10 to second place. And although I have mirrors 100 times faster close by, they are never consulted again. I'm downloading about 3,800 artifacts.

Mirrors may have temporary and fairly short outages. They may be incomplete in some respect, or just be under very heavy load for a short period of time. I think the algorithm could be improved by adding a periodic retry on mirrors whose initialRank value indicates that they are geographically close. I also think that we should have a ratio between high transfer rate and failure count. Let's say that 5 times higher transfer rate is worth one failure. Perhaps a successful transfer should reset the failure count, or at least cut it in half so that failures are forgiven by subsequent good behavior.
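A rough sketch of how two of these ideas could fit together, not a patch against the real MirrorInfo: every name here is hypothetical, the factor of 5 is the guess from the paragraph above, and the logarithmic score is just one possible way to express "5 times higher transfer rate is worth one failure". The periodic retry is not covered.

```java
import java.util.ArrayList;
import java.util.List;

public class ForgivingMirrorScore {
    // Assumption taken from the text: a 5x higher transfer rate offsets one failure.
    static final double SPEED_FACTOR_PER_FAILURE = 5.0;

    // Hypothetical mirror record; not the real MirrorInfo class.
    static final class Mirror {
        final String name;
        final int initialRank;
        int failureCount;
        long bytesPerSecond = -1; // -1 means "never measured" (assumption)

        Mirror(String name, int initialRank) {
            this.name = name;
            this.initialRank = initialRank;
        }

        void recordFailure() {
            failureCount++;
        }

        void recordSuccess(long measuredBytesPerSecond) {
            bytesPerSecond = measuredBytesPerSecond;
            failureCount /= 2; // forgive half of the past failures (integer division)
        }

        double score() {
            // Unmeasured mirrors are treated as 1 byte/s so the logarithm is defined.
            double speed = Math.max(1L, bytesPerSecond);
            // Each factor of 5 in speed adds one point; each failure subtracts one.
            return Math.log(speed) / Math.log(SPEED_FACTOR_PER_FAILURE) - failureCount;
        }
    }

    // Higher score first; initialRank (geographical proximity) breaks ties.
    static int compare(Mirror a, Mirror b) {
        int byScore = Double.compare(b.score(), a.score());
        return byScore != 0 ? byScore : Integer.compare(a.initialRank, b.initialRank);
    }

    static List<Mirror> demo() {
        Mirror nearby = new Mirror("nearby-fast-mirror", 0);
        nearby.recordSuccess(5_000_000L); // invented throughput
        nearby.recordFailure();           // one temporary failure afterwards
        Mirror origin = new Mirror("download.eclipse.org", 55);
        origin.recordSuccess(50_000L);    // invented throughput, 100x slower
        List<Mirror> mirrors = new ArrayList<>();
        mirrors.add(origin);
        mirrors.add(nearby);
        mirrors.sort(ForgivingMirrorScore::compare);
        return mirrors;
    }

    public static void main(String[] args) {
        for (Mirror m : demo())
            System.out.println(m.name + " failures=" + m.failureCount);
    }
}
```

In this sketch the 100 times faster mirror still ranks first despite its one failure. A mirror that failed before it was ever measured would still score low, which is why the periodic retry would be needed in addition.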

One question that I don't know the answer to at this point is what happens when an artifact is missing although it should be there according to the artifact repository. Will the mirror get punished for that? If that's the case, then it's not so good. The artifact will be missing on all mirrors, but only the best one, which is tried first, will be punished.

What do you think?

- thomas


--
Nick Boldt :: http://nick.divbyzero.com
Release Engineer :: Eclipse Modeling & Dash Athena