Re: [jgit-dev] Parsing PUSH request via smart http

On Tue, Aug 7, 2012 at 7:42 AM, Zsolt Koppany <zkoppanylist@xxxxxxxxxxx> wrote:
> I just would like to parse the information and send EVERYTHING to
> /usr/lib/git-core/git-http-backend and that should provide the answer/reply
> what I send back to the client.

You can't parse a Git pack stream to extract data from a commit easily.

> It works pretty good, I just would like to get (parse) that information from
> the request and store it into our (MySql) database.
>
> Do I understand it correctly that there is no easy way doing it with jgit?

There is no easy way to do this. The pack stream is pretty complex
to process. If you want to scan it in JGit, you might as well run the
full JGit servlet code to handle the request, instead of passing the
stream off to git-http-backend.

One crazy option would be to use the pkt-line parser to read the
commands from the start of the stream, buffer these in RAM, then shove
the commands and the raw pack data through to git-http-backend, scan
its status report, and on successful results behave like the
PreReceiveHook I suggested using, where you use a RevWalk in JGit to
scan through the new commits in each successfully updated branch.
After that scan is complete, forward the status report onto the
client. (Or do the scan in the background, after sending the status
report to the client.) This is nuts because you need to handle a lot
of the pack protocol yourself, including corner cases around the
report-status exchange, side-band messages from the backend, etc. It's
really not a path you want to go down unless you are deeply familiar
with the wire protocol and can commit to keeping current with it. Few
companies do this... Google does it by implementing JGit and keeping
JGit up-to-date. GitHub does it with their own glue code. Nobody else
is this crazy.
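
To give a feel for just the first step (reading the commands off the
front of the stream), here is a minimal sketch in plain Java. It does
not use JGit's own pkt-line reader; the class and method names
(PktLineCommands, readCommands, Command) are made up for illustration.
It assumes the request body starts with plain
"<old-id> <new-id> <ref-name>" pkt-lines, with capabilities after a NUL
on the first line, ended by a "0000" flush packet. It ignores other
lines a client may send (shallow info, push certificates) and does
nothing about the pack data, report-status, or side-band handling
described above.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PktLineCommands {
    /** One ref update requested by the client (hypothetical holder type). */
    public static final class Command {
        public final String oldId;
        public final String newId;
        public final String refName;

        Command(String oldId, String newId, String refName) {
            this.oldId = oldId;
            this.newId = newId;
            this.refName = refName;
        }
    }

    /**
     * Reads pkt-lines until the flush packet and returns the commands.
     * On return, the stream is positioned at the first byte after the
     * flush packet, which is where the raw pack data begins.
     */
    public static List<Command> readCommands(InputStream in) throws IOException {
        // readFully consumes exactly the requested bytes, so nothing
        // past the flush packet is swallowed by this wrapper.
        DataInputStream data = new DataInputStream(in);
        List<Command> commands = new ArrayList<>();
        byte[] header = new byte[4];
        while (true) {
            data.readFully(header);
            // Length is 4 hex digits and includes the 4 header bytes.
            int len = Integer.parseInt(
                    new String(header, StandardCharsets.US_ASCII), 16);
            if (len == 0) {
                break; // "0000" flush packet: end of the command list
            }
            if (len < 4) {
                throw new IOException("invalid pkt-line length: " + len);
            }
            byte[] payload = new byte[len - 4];
            data.readFully(payload);
            String line = new String(payload, StandardCharsets.UTF_8);
            // Capabilities follow a NUL on the first command; drop them.
            int nul = line.indexOf('\0');
            if (nul >= 0) {
                line = line.substring(0, nul);
            }
            if (line.endsWith("\n")) {
                line = line.substring(0, line.length() - 1);
            }
            String[] parts = line.split(" ", 3);
            if (parts.length != 3) {
                throw new IOException("malformed command: " + line);
            }
            commands.add(new Command(parts[0], parts[1], parts[2]));
        }
        return commands;
    }
}
```

Even in this toy form you still have to re-serialize the buffered
command lines in front of the pack data before handing everything to
git-http-backend, which is part of why this route gets messy quickly.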

> Is GitServlet as robust/fast as git-http-backend?

It's pretty good. We have been using it in production in Gerrit Code
Review for nearly 4 years. The hosting JVM needs a bunch of memory to
handle the transient data required to parse the pack stream.
git-http-backend forks a new child process git-receive-pack that
mallocs a ton of memory as necessary, and then exits and releases this
memory back to the operating system. JGit can't do that; it has to be
all within the JVM heap, so the heap tends to be pretty large.

The entire Android operating system is developed using a Git server
that only runs JGit. It's not a small project. It's not a small
development team. :-)