[jgit-dev] pubsub work coming to JGit

Matthias asked what Ian's changes are about, here is my weak attempt
at describing it...

Ian is working on adding a pubsub feature to Git. In its simplest form
a client has a group of repositories that are already local that the
client wants to keep current. For example, I have the EGit and JGit
repositories on my workstation. I want those to always have the latest
master that has been submitted on git.eclipse.org available to me.
The pubsub client will be a JGit process running in the background on
my desktop, maintaining a persistent TCP socket with the git.eclipse.org
server. When this connection starts, the client tells the server which
repositories it wants (e.g. "jgit/jgit, egit/egit"). Whenever changes
are made at the server that the client is interested in, data is
pushed directly to the client over this persistent TCP socket.
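
To make the subscription step concrete, here is a rough sketch of the
bookkeeping a server might keep per connection. It is only an
illustration: the class SubscriptionRegistry, its method names, and the
client ids are hypothetical and are not part of JGit or of Ian's
changes.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical server-side bookkeeping; not part of JGit's API.
final class SubscriptionRegistry {
    // repository name -> ids of the connected clients subscribed to it
    private final Map<String, Set<String>> subscribersByRepo = new HashMap<>();

    // Called once when a client connects and names the repositories it wants.
    void subscribe(String clientId, Collection<String> repos) {
        for (String repo : repos) {
            subscribersByRepo
                    .computeIfAbsent(repo, k -> new LinkedHashSet<>())
                    .add(clientId);
        }
    }

    // Called when the client's persistent connection goes away.
    void disconnect(String clientId) {
        subscribersByRepo.values().forEach(clients -> clients.remove(clientId));
        subscribersByRepo.values().removeIf(Set::isEmpty);
    }

    // Clients that should be pushed an update for this repository.
    Set<String> subscribersOf(String repo) {
        return Collections.unmodifiableSet(
                subscribersByRepo.getOrDefault(repo, Collections.emptySet()));
    }
}
```

When a branch changes in "jgit/jgit", the server would look up
subscribersOf("jgit/jgit") and push the update down each of those
open sockets.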

This is really built not for the small-ish egit/jgit case, but for the
Android case where there are 400+ repositories constantly changing at
the server end. By registering subscriptions, clients can be informed
of updates, rather than polling for them with big for loops around git
fetch commands. It also really helps with a large number of clients.
Instead of computing deltas to update a client from its current
position to the server's branch tip, the server creates a pack once at
the time of branch update to go from the current branch value to the
new branch value, and then distributes that pack to all interested
clients. Most of these packs are going to be small enough that they
can be held in memory in the server and dumped out through an
NIO/select/poll type of distribution to the clients. This saves a lot
of server resources.
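
The "build the pack once, send it to everyone" idea could look
something like the sketch below. This is an assumption-laden
illustration, not the real implementation: PackFanout, PackBuilder,
and the sink callbacks are made-up names, and the actual pack
generation is stubbed out behind an interface.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch; names are illustrative and not JGit API.
final class PackFanout {
    // Stand-in for whatever produces the pack from old tip to new tip.
    interface PackBuilder {
        byte[] build(String repo, String oldId, String newId);
    }

    private final PackBuilder builder;
    // repository name -> one sink per subscribed client connection
    private final Map<String, List<Consumer<ByteBuffer>>> sinksByRepo = new HashMap<>();
    int packsBuilt = 0;

    PackFanout(PackBuilder builder) {
        this.builder = builder;
    }

    void addSink(String repo, Consumer<ByteBuffer> sink) {
        sinksByRepo.computeIfAbsent(repo, k -> new ArrayList<>()).add(sink);
    }

    // Builds the pack once, then hands every client a view of the same bytes.
    // Returns the number of clients the pack was handed to.
    int onBranchUpdate(String repo, String oldId, String newId) {
        List<Consumer<ByteBuffer>> sinks =
                sinksByRepo.getOrDefault(repo, Collections.emptyList());
        if (sinks.isEmpty()) {
            return 0; // nobody is interested, so no pack is built
        }
        ByteBuffer pack =
                ByteBuffer.wrap(builder.build(repo, oldId, newId)).asReadOnlyBuffer();
        packsBuilt++;
        for (Consumer<ByteBuffer> sink : sinks) {
            // duplicate() shares the content but has its own position, so a
            // slow client does not disturb the others.
            sink.accept(pack.duplicate());
        }
        return sinks.size();
    }
}
```

In a real server each sink would presumably queue its buffer for a
non-blocking socket write driven by a selector loop; the point here is
only that the pack bytes exist once in memory no matter how many
clients receive them.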

For clients it means they might only be seconds behind the server at
any given time, so doing a `git pull origin master` is no longer
network bound. It just has to update the local working directory.

For remote distributed offices we are considering building a proxy in
the office that knows how to aggregate subscriptions upstream, and
fan out the data to its clients. This means a distributed office might
only need to have the data sent to it once, rather than N times for N
workstations. By making the proxy just a stream duplicator it has no
state, and does not really need to worry about the security of the
data it stores; it's all transient in RAM. It also doesn't need to
worry about doing `git gc` on the proxy, as the proxy isn't really a
Git repository. It's just a forwarding service.
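
As a rough sketch of what "aggregate subscriptions upstream" could
mean: the proxy would subscribe upstream to a repository when the
first local workstation asks for it, and drop that subscription when
the last one leaves. Everything below (ProxySubscriptions and its
methods) is hypothetical and only tracks who is connected; it holds no
repository data.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical proxy bookkeeping; only connection state, no repository data.
final class ProxySubscriptions {
    // repository name -> local workstations that asked for it
    private final Map<String, Set<String>> localClientsByRepo = new HashMap<>();

    // Returns true when the proxy must open a new upstream subscription.
    boolean subscribe(String clientId, String repo) {
        Set<String> clients =
                localClientsByRepo.computeIfAbsent(repo, k -> new HashSet<>());
        boolean firstForRepo = clients.isEmpty();
        clients.add(clientId);
        return firstForRepo;
    }

    // Returns true when the last local client left and the upstream
    // subscription can be dropped.
    boolean unsubscribe(String clientId, String repo) {
        Set<String> clients = localClientsByRepo.get(repo);
        if (clients == null || !clients.remove(clientId)) {
            return false;
        }
        if (clients.isEmpty()) {
            localClientsByRepo.remove(repo);
            return true;
        }
        return false;
    }

    // The set of repositories the proxy itself is subscribed to upstream.
    Set<String> upstreamRepos() {
        return new HashSet<>(localClientsByRepo.keySet());
    }

    // How many local copies are made of each pack arriving for this repository.
    int fanoutWidth(String repo) {
        Set<String> clients = localClientsByRepo.get(repo);
        return clients == null ? 0 : clients.size();
    }
}
```

With N workstations subscribed to the same repository, upstreamRepos()
still contains that repository once, which is where the "sent once
rather than N times" saving comes from.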


The initial implementation is going into JGit, hopefully before Ian
finishes his internship with us.  :-)

