Re: [jgit-dev] pubsub work coming to JGit

On Sun, Jun 17, 2012 at 3:08 PM, Matthias Sohn
<matthias.sohn@xxxxxxxxxxxxxx> wrote:
> 2012/6/16 Shawn Pearce <spearce@xxxxxxxxxxx>
>>
>> Matthias asked what Ian's changes are about; here is my weak attempt
>> at describing it...
>>
>> Ian is working on adding a pubsub feature to Git. In its simplest form,
>> a client has a group of repositories that are already local that the
>> client wants to keep current. For example, I have the EGit and JGit
>> repositories on my workstation. I want those to always have available
>> to me the latest master that has been submitted on git.eclipse.org.
>> The pubsub client will be a JGit process running in the background on
>> my desktop, maintaining a persistent TCP socket with the git.eclipse.org
>> server. When this connection starts, the client tells the server which
>> repositories it wants (e.g. "jgit/jgit, egit/egit"). Whenever changes
>> are made at the server that the client is interested in, data is
>> pushed directly to the client over this persistent TCP socket.
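A minimal Java sketch of the subscription bookkeeping described above. It assumes the server keys subscribers by repository name and receives the client's request as a comma-separated list such as "jgit/jgit, egit/egit". The class SubscriptionRegistry and its methods are hypothetical illustrations, not JGit API, and the real wire format of the subscription request is not specified in this thread.

```java
import java.util.*;

// Hypothetical server-side registry for the pubsub sketch: maps each
// repository name to the clients subscribed to it. Not a JGit API.
final class SubscriptionRegistry {
    // TreeSet keeps subscriber order deterministic for logging and tests.
    private final Map<String, Set<String>> subscribersByRepo = new HashMap<>();

    // Parses an assumed request format like "jgit/jgit, egit/egit" for one client.
    void subscribe(String clientId, String request) {
        for (String part : request.split(",")) {
            String repo = part.trim();
            if (repo.isEmpty()) {
                continue; // tolerate stray commas
            }
            subscribersByRepo.computeIfAbsent(repo, k -> new TreeSet<>()).add(clientId);
        }
    }

    // Called when a client's persistent socket closes: drop all its subscriptions.
    void disconnect(String clientId) {
        Iterator<Set<String>> it = subscribersByRepo.values().iterator();
        while (it.hasNext()) {
            Set<String> clients = it.next();
            clients.remove(clientId);
            if (clients.isEmpty()) {
                it.remove(); // no one is interested in this repository anymore
            }
        }
    }

    // Clients that should be pushed data when this repository changes.
    Set<String> subscribersOf(String repo) {
        return Collections.unmodifiableSet(
                subscribersByRepo.getOrDefault(repo, Collections.emptySet()));
    }
}
```

For example, after subscribe("alice", "jgit/jgit, egit/egit"), an update to jgit/jgit on the server would consult subscribersOf("jgit/jgit") to decide which open sockets receive data, instead of waiting for each client to poll with git fetch.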
>>
>> This is really built not for the small-ish egit/jgit case, but for the
>> Android case where there are 400+ repositories constantly changing at
>> the server end. By registering subscriptions, clients can be informed
>> of updates, rather than polling for them with big for loops around git
>> fetch commands. It also really helps with a large number of clients.
>> Instead of computing deltas to update a client from its current
>> position to the server's branch tip, the server creates a pack once at
>> the time of branch update to go from the current branch value to the
>> new branch value, and then distributes that pack to all interested
>> clients. Most of these packs are going to be small enough that they
>> can be held in memory in the server and dumped out through an
>> NIO/select/poll type of distribution to the clients. This saves a lot
>> of server resources.
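The "build the pack once, send it to every interested client" idea can be sketched as follows. This assumes the pack for an update is produced by a caller-supplied function (standing in for real pack generation from the old branch value to the new one) and that each client has an outbound queue that an NIO Selector loop would drain when its socket is writable; that loop is omitted. PackBroadcaster and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.*;
import java.util.function.Supplier;

// Hypothetical sketch: one pack per branch update, shared by all subscribers.
final class PackBroadcaster {
    // Per-client outbound queues; a real server would drain these to
    // SocketChannels from a Selector loop (not shown here).
    private final Map<String, Deque<ByteBuffer>> outbound = new HashMap<>();
    private int packsBuilt;

    void connect(String clientId) {
        outbound.putIfAbsent(clientId, new ArrayDeque<>());
    }

    void disconnect(String clientId) {
        outbound.remove(clientId);
    }

    // Called once per branch update. buildPack stands in for encoding the
    // old-tip..new-tip pack; it runs exactly once however many clients care.
    void onBranchUpdate(Collection<String> interested, Supplier<byte[]> buildPack) {
        if (interested.isEmpty()) {
            return; // nobody subscribed, so skip pack generation entirely
        }
        ByteBuffer shared = ByteBuffer.wrap(buildPack.get());
        packsBuilt++;
        for (String client : interested) {
            Deque<ByteBuffer> queue = outbound.get(client);
            if (queue != null) {
                // Read-only view: its own position/limit over the same bytes, no copy.
                queue.addLast(shared.asReadOnlyBuffer());
            }
        }
    }

    int packsBuilt() {
        return packsBuilt;
    }

    Deque<ByteBuffer> queueFor(String clientId) {
        return outbound.get(clientId);
    }
}
```

Using asReadOnlyBuffer() gives each client an independent position and limit over the same backing bytes, so a slow client can be partway through the pack while a fast one has finished, without the server holding a separate copy per client.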
>>
>> For clients, it means they might only be seconds behind the server at
>> any given time, which means doing a `git pull origin master` is no
>> longer network bound; instead it just has to update the local working
>> directory.
>>
>> For remote distributed offices we are considering building a proxy in
>> the office that knows how to aggregate subscriptions upstream, and
>> fan out the data to its clients. This means a distributed office might
>> only need to have the data sent to it once, rather than N times for N
>> workstations. By making the proxy just a stream duplicator, it has no
>> state and does not really need to worry about the security of the
>> data it holds; it's all transient in RAM. It also doesn't need to
>> worry about doing `git gc` on the proxy, as the proxy isn't really a
>> Git repository. It's just a forwarding service.
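A sketch of the stateless office proxy described above. It assumes the proxy only tracks which local workstation wants which repository, subscribes upstream once for the union of those subscriptions, and copies each upstream message to the matching workstations without writing anything to disk. FanoutProxy and the sendToWorkstation callback are hypothetical names chosen for illustration.

```java
import java.util.*;
import java.util.function.BiConsumer;

// Hypothetical office proxy: keeps only the live subscription table,
// never a repository or pack store, and duplicates upstream data locally.
final class FanoutProxy {
    // repository name -> local workstations subscribed to it (sorted for determinism)
    private final Map<String, Set<String>> localSubscribers = new TreeMap<>();
    private final BiConsumer<String, byte[]> sendToWorkstation;

    FanoutProxy(BiConsumer<String, byte[]> sendToWorkstation) {
        this.sendToWorkstation = sendToWorkstation;
    }

    void subscribe(String workstation, Collection<String> repos) {
        for (String repo : repos) {
            localSubscribers.computeIfAbsent(repo, k -> new TreeSet<>()).add(workstation);
        }
    }

    // The single subscription the proxy sends upstream: the union of all local ones,
    // so the office receives each update once instead of once per workstation.
    Set<String> upstreamSubscription() {
        return Collections.unmodifiableSet(localSubscribers.keySet());
    }

    // Forward one upstream message to every interested workstation;
    // the bytes are not retained after this method returns.
    void onUpstreamData(String repo, byte[] data) {
        for (String ws : localSubscribers.getOrDefault(repo, Collections.emptySet())) {
            sendToWorkstation.accept(ws, data);
        }
    }
}
```

Because the only state is the subscription table, a restarted proxy would presumably just rebuild it as workstations reconnect and resubscribe; there is no object store to repair or garbage collect.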
>>
>>
>> The initial implementation is going into JGit, hopefully before Ian
>> finishes his internship with us.  :-)
>
> That's a great idea, looking forward to smart git repository swarms :)
>
> Will the new protocol also support encrypted data transfer?

It will run over the existing transports, so if you want encryption
use either SSH or HTTPS, just like with any other Git operation.

> Any plans for tunneling this over HTTP?

Yes. It will run over HTTP. This is actually our first target case,
because our servers only speak HTTP and we really want to deploy this
for our internal users. :-)

> Maybe using WebSockets?

No. WebSockets is a massively overcomplicated solution in search of a
problem to solve. This is not the problem it would solve.