Re: [jgit-dev] JGit DFS backend - has anyone tried to implement Cassandra?

These things are possible, though nobody has open-sourced an implementation of that approach. You need to subclass DfsRefDatabase and supply it as part of (a subclass of) DfsRepository. The ref database can then reach out to something like Cassandra to store individual ref updates. The abstract methods are quite easy to implement: scanAllRefs needs to return the list of refs, and updates are controlled with compareAndPut or compareAndRemove.
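As a rough illustration of the compare-and-swap contract described above, here is a minimal, self-contained sketch. It deliberately does not extend DfsRefDatabase; the class CasRefStore, its method signatures, and the use of plain strings for ref names and object ids are all assumptions made for the example, with an in-memory map standing in for whatever Cassandra would hold.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a ref store; not part of JGit. */
class CasRefStore {
    // ref name -> object id (hex string). A real backend would keep this
    // in an external store such as Cassandra rather than in memory.
    private final ConcurrentHashMap<String, String> refs = new ConcurrentHashMap<>();

    /** Returns a sorted snapshot of every ref currently stored. */
    Map<String, String> scanAll() {
        return new TreeMap<>(refs);
    }

    /**
     * Sets name to newId only if its current value equals expectedOld.
     * A null expectedOld means "the ref must not exist yet".
     */
    boolean compareAndPut(String name, String expectedOld, String newId) {
        if (expectedOld == null) {
            return refs.putIfAbsent(name, newId) == null;
        }
        return refs.replace(name, expectedOld, newId);
    }

    /** Deletes name only if its current value equals expectedOld. */
    boolean compareAndRemove(String name, String expectedOld) {
        return refs.remove(name, expectedOld);
    }
}
```

A caller that loses the race simply gets false back and can re-read the ref before deciding whether to retry. A Cassandra-backed version would need an equivalent conditional write on the server side so that the check and the update happen atomically.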

The DfsObjDatabase is the one that needs more work: everything is treated as a list of .pack files. A pack file is the same as in the ordinary git implementation but has additional metadata. You really need the pack files to be synchronised with the refs, so that if a ref is in the ref database then the pack is available everywhere. The internal implementation of the library does most of the work for you, but if, e.g., Cassandra is shared across many machines while the pack is written via NFS, then an immediate read of the updated ref might not find the pack on a remote NFS mount.
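One way to picture the ordering constraint is "pack first, ref second": the ref is only moved once the pack that holds the new objects is known to be readable. The sketch below is purely illustrative; PackThenRefPublisher and its methods are hypothetical names, not JGit API, and the in-memory set and map stand in for the pack store and the ref database.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical publish step; the names here are illustrative, not JGit API. */
class PackThenRefPublisher {
    // Packs that readers can already see (stand-in for the pack store).
    private final Set<String> visiblePacks = ConcurrentHashMap.newKeySet();
    // ref name -> object id (stand-in for the ref database).
    private final Map<String, String> refs = new ConcurrentHashMap<>();

    /** Records that a pack has been fully written and is readable. */
    void storePack(String packName) {
        visiblePacks.add(packName);
    }

    /**
     * Moves refName to newId only if the pack containing newId is already
     * visible. Returns false, leaving the ref untouched, otherwise.
     */
    boolean publish(String packName, String refName, String newId) {
        if (!visiblePacks.contains(packName)) {
            return false;
        }
        refs.put(refName, newId);
        return true;
    }

    /** Returns the object id for refName, or null if it is not set. */
    String resolve(String refName) {
        return refs.get(refName);
    }
}
```

With two independent storage systems, "visible" has to mean visible to every reader, which is exactly what an NFS mount with caching cannot promise on its own.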

The approach that DfsGarbageCollector takes is to compress everything down to a single pack file (or it used to, anyway). I'm not sure whether Cassandra would make sense for pack files; my gut instinct would be "no" unless they are small repositories.

I built a retry into my implementation in case a pack wasn't immediately available. Because the DfsReader was random access and DfsOutputStream was one-shot, I ended up caching incoming content to a spool file and then calculating the checksums needed to upload it to the remote server (due to back-end limitations).
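The two workarounds above can be sketched roughly as follows. This is not the original implementation: the class SpoolAndRetry, its method names, the fixed-delay retry policy, and the choice of SHA-256 as the checksum are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helpers; names and policies are illustrative only. */
class SpoolAndRetry {
    /** Where the spooled bytes landed, plus their digest as lowercase hex. */
    static final class Spooled {
        final Path file;
        final String sha256Hex;

        Spooled(Path file, String sha256Hex) {
            this.file = file;
            this.sha256Hex = sha256Hex;
        }
    }

    /** Copies a one-shot stream to a temp file, hashing it on the way. */
    static Spooled spool(InputStream in) throws IOException, NoSuchAlgorithmException {
        // SHA-256 is an assumption; use whatever the remote store expects.
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        Path tmp = Files.createTempFile("incoming-", ".pack");
        try (DigestInputStream din = new DigestInputStream(in, md);
             OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = din.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return new Spooled(tmp, hex.toString());
    }

    /** Calls lookup up to maxAttempts times, sleeping between misses. */
    static <T> Optional<T> retry(Supplier<Optional<T>> lookup, int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> found = lookup.get();
            if (found.isPresent()) {
                return found;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis);
            }
        }
        return Optional.empty();
    }
}
```

Spooling first means the size and checksum are known before the upload starts, at the cost of writing the content to local disk once. The retry helper gives up with an empty result rather than throwing, so the caller decides whether a missing pack is an error.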

So these things are possible, but no implementations other than InMemoryRepository are open source. You can use that as a template, and I'd be happy to answer questions where I can.

Alex

Sent from my iPhat 6

On 7 Jan 2016, at 12:44, Luca Milanesio <luca.milanesio@xxxxxxxxx> wrote:

Hi JGit and Gerrit devs :-)

I was looking for a more distributed, fault-tolerant and scalable backend for Git objects ... and I was thinking about using Apache Cassandra :-)
Has anyone tried the approach? Succeeded? Failed?

I've found some references from Shawn in a post a few years ago at [1] but have no idea if there was any further action on that.

Thank you in advance for the feedback.

Luca.
