Re: Local file system abstraction still a work in progress ? (was: [jgit-dev] Using JGit in Google AppEngine)
On Sat, Aug 14, 2010 at 5:21 AM, Thomas Sauzedde <yaourt@xxxxxxxxxxxxxx> wrote:
> Another dumb question...
> In my git internals understanding quest, I realize that a single pack could
> be quite huge (within a smart protocol transaction) ...

Yes. It can be the size of the entire project, and is during an initial clone. For example, the history of the Linux kernel was upwards of 396 MiB the last time I checked its size in Git. So the initial clone of that project is a single 396 MiB pack being sent to the client.

> AFAIU, a pack is required to be autonomous, in order to be able to
> "reconstruct" loose objects on the other side of the channel.

Yes.

> I didn't check the numbers but with a large git repo (let's say something
> like the linux kernel src), if I'm trying to clone such a repo and store it
> on GAE, I suppose that I will reach GAE limitations.

Yes. I don't want to discourage you, but I know a lot about GAE, even details that aren't generally public, and I think you are going to run into trouble with even smaller Git projects. Git is just very demanding, and GAE has fairly small request limits because it's built for fast transaction web pages, not for bulk data processing. GAE is a very interesting platform, but it's not designed for general computation.

> Let's take this example, I'm cloning such a large repo locally, and then add
> a remote (empty) repo stored in GAE (my target).
> Then I'm pushing my local repo to this GAE remote ...
> AFAIU, during this last operation, there will be a single pack per ref

It's a single pack for the entire transaction, not one per ref. If you push 3 refs in a single command line, it's a single pack containing the data for all 3 refs. If you push the entire Linux kernel repository to an empty destination, it sends 396 MiB in a single HTTP POST request.

> pushed and so I suppose I will reach a GAE limitation like the 10MB per HTTP
> request (and / or the 30sec per request but this is another issue) ?!?

Probably.
Like I said above, GAE is built for fast transaction web pages where the response time target for a request is under 1 second, and the payload is small form data or small web content. It isn't suited to large data transfers.

> I can afford such a thing, but I'm wondering if I understood how smart /
> pack protocol is working ...

Well, even with purchased quota on GAE I don't think they will let you exceed some hard limits on per-request CPU time, or per-request/response payload size. Smart HTTP transactions may still be capped at 10 MiB per transfer even when you purchase capacity, which means you can't push a large project like the Linux kernel. Even a smaller project like git.git is ~26 MiB for its entire history.

It would be interesting to see what you can come up with, but I have studied this problem (hosting Git on a cloud platform) and it's not as simple as it sounds if you want to handle any of the common repositories out there (git itself, Linux kernel, etc.). For a tiny toy project it's probably quite trivial (such repositories are often below 1 MiB in size).

--
Shawn.
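
As a rough sketch of the size arithmetic above (plain Python, not anything from JGit or the GAE SDK): because a push sends one pack in one HTTP POST, the whole pack has to fit under the per-request cap. The 10 MiB cap and the project sizes are the figures quoted in this thread. The names `fits_in_one_request` and `report` are hypothetical, and the 1 MiB entry is an assumed upper bound for a "toy" repository.

```python
MIB = 1024 * 1024

# Assumption: a hard 10 MiB cap on a single HTTP request body,
# taken from the figure discussed in this thread.
REQUEST_LIMIT_BYTES = 10 * MIB

# Approximate full-history pack sizes quoted in this thread (2010 figures).
# The toy project value is an assumed upper bound, not a measurement.
PACK_SIZES_MIB = {
    "linux kernel": 396,
    "git.git": 26,
    "toy project": 1,
}


def fits_in_one_request(pack_bytes, limit_bytes=REQUEST_LIMIT_BYTES):
    """Return True if a pack of pack_bytes fits in a single request.

    A push transfers exactly one pack per transaction, so there is no
    way to spread it over several smaller requests.
    """
    return pack_bytes <= limit_bytes


def report(sizes_mib):
    """Map each project name to whether its full pack fits under the cap."""
    return {name: fits_in_one_request(size * MIB)
            for name, size in sizes_mib.items()}


if __name__ == "__main__":
    for name, ok in report(PACK_SIZES_MIB).items():
        print(f"{name}: {'fits' if ok else 'too large'}")
```

With these figures only the toy project stays under the cap; both the kernel and git.git would be rejected on the initial push.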