Eclipse Community Forums
How to gc an amazon-s3 remote with jgit? [message #880684] Sat, 02 June 2012 17:42
Joshua Redstone
Messages: 5
Registered: June 2012
Junior Member
Hi,
I've been using jgit for a while, configured to use amazon s3 as a backing store, by specifying a remote of amazon-s3 in .git/config. Does anyone know how to do a git-gc on the git state in s3? It's accumulating a ton of files and could use some cleaning.
Thanks,
Josh
Re: How to gc an amazon-s3 remote with jgit? [message #880927 is a reply to message #880684] Sun, 03 June 2012 11:46
Christian Halstrick
Messages: 113
Registered: July 2009
Senior Member
Can't you just mount the s3 bucket into your filesystem and then run git-gc with native git? JGit has a garbage collector only in an open, not yet accepted change (https://git.eclipse.org/r/#/c/4705/), so I suggest you use native git.

Ciao
Chris
Re: How to gc an amazon-s3 remote with jgit? [message #893624 is a reply to message #880927] Thu, 05 July 2012 01:24
Joshua Redstone
Hi Christian,
I tried s3fs and it looks like it can't read directories created with jgit.
There is a thread on the s3fs issues page (I can't provide a link since I'm new to Eclipse) where people talk about s3fs being limited in its ability to see dirs created by other packages. It's id=73. And empirically, I can't see anything inside a bucket mounted via s3fs.

Any other ideas? Seems like there's gotta be a way.
Cheers,
Josh
Re: How to gc an amazon-s3 remote with jgit? [message #988920 is a reply to message #880927] Mon, 03 December 2012 17:33
Joshua Redstone
The code to GC over s3 has been committed and merged into jgit as of version 2.1.0
https://git.eclipse.org/r/#/c/4705/

Is there a way to trigger this code from the jgit CLI?
Re: How to gc an amazon-s3 remote with jgit? [message #989067 is a reply to message #988920] Tue, 04 December 2012 13:59
Christian Halstrick
I think that neither we nor native git have a gc algorithm that works on remote repositories. GC algorithms always want to work on a local repository. JGit does, however, have several kinds of local repository, which differ in how data is persisted. There are traditional filesystem-based repositories (implemented in FileRepository), which store the repo data in a .git folder. There is also a kind of repo that stores its data in a DFS (implemented in DfsRepository.java). How a repo stores its data heavily influences what kind of garbage can exist, so we have a separate GC implementation for each of these two repo types (GC.java and DfsGarbageCollector.java).

But we don't have a repository type that stores data in S3, and I fear that doing a gc for S3 would be quite a lot of effort. One has to know how data is stored in S3, which kinds of garbage can exist, and so on.

Can't you do a poor man's gc: clone the S3-based repo to the local filesystem, run a local gc, delete the content on S3, and push the local repo to S3?

Ciao
Chris
Re: How to gc an amazon-s3 remote with jgit? [message #989071 is a reply to message #989067] Tue, 04 December 2012 14:04
Christian Halstrick
Another thought: have you tried https://github.com/tongwang/s3fs-c?

Ciao
Chris
Re: How to gc an amazon-s3 remote with jgit? [message #989101 is a reply to message #989067] Tue, 04 December 2012 15:38
Joshua Redstone
Hi Christian,
JGit has some support for s3. The jgit page on github (can't link because I'm too new to this site) says the s3 pack files don't support delta-ification, but functionally it works (and I use it). Looking at the source code, there's a suggestive bit:

public class TransportAmazonS3 extends HttpTransport implements WalkTransport

Given that it's implemented at what sounds like a low-level transport layer, the question becomes, will your GC code work with repositories stored via HttpTransports?

Re s3fs: I think s3fs does not work with the way jgit uses s3. I think it's because the way the jgit s3 code names directories is not compatible with how s3fs does it (I vaguely remember seeing something online about nulls and separators).
Re: How to gc an amazon-s3 remote with jgit? [message #989110 is a reply to message #989101] Tue, 04 December 2012 16:25
Christian Halstrick
I know about TransportAmazonS3. It means that transport-related jgit commands (fetch, push, clone, ...) can work with remote repositories located in S3. That is, we can clone a repo stored on S3 into a filesystem-based local repo, or push from a filesystem-based local repo into a remote repo stored on S3. With TransportAmazonS3 we learn how to talk to remote repos stored in S3, but those repos are still remote repositories. Garbage collectors have nothing to do with transports; they want to work on local repos. There is no support in jgit that I know of for a local repository that stores its data in S3.

Re s3fs: I fully agree. s3fs does not seem to work with the way jgit uses s3. But s3fs-c differs from s3fs in exactly this respect, so maybe s3fs-c is compatible with the way jgit stores the path information.

What about clone-to-local -> garbage_collect_local -> delete_and_recreate_empty_repo_on_S3 -> push_local_to_S3?
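
For illustration only, the four steps above could be sketched as a dry-run plan in Python. Nothing here comes from the thread beyond the step order: the amazon-s3 URL, the bucket location and the local path are placeholders, the use of the AWS CLI for the delete step is an assumption, and the exact jgit options (a bare clone, explicit refspecs on push) are assumptions that should be checked against your jgit version. The sketch only builds and prints the commands; it does not run them, and it does not cover whether the empty repo on S3 has to be recreated explicitly before the push.

```python
import shlex
from typing import List


def poor_mans_gc_plan(s3_remote: str, s3_location: str, local_dir: str) -> List[List[str]]:
    """Build the commands for the poor man's gc; nothing is executed here."""
    return [
        # 1. clone-to-local: a bare clone, so that all branches end up
        #    under refs/heads/* (assumes the jgit CLI accepts --bare).
        ["jgit", "clone", "--bare", s3_remote, local_dir],
        # 2. garbage_collect_local: native git operates on the local copy.
        ["git", "--git-dir", local_dir, "gc"],
        # 3. delete the old content on S3 (the AWS CLI is an assumption;
        #    any tool that empties the location would do).
        ["aws", "s3", "rm", "--recursive", s3_location],
        # 4. push_local_to_S3: explicit refspecs for branches and tags.
        ["jgit", "--git-dir", local_dir, "push", s3_remote,
         "refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*"],
    ]


def render(plan: List[List[str]]) -> List[str]:
    """Turn each command into a shell-quoted line for review."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in plan]


if __name__ == "__main__":
    # Placeholder values, not taken from the thread.
    example = poor_mans_gc_plan(
        "amazon-s3://.jgit@my-bucket/repo.git",
        "s3://my-bucket/repo.git",
        "/tmp/repo.git",
    )
    for line in render(example):
        print(line)
```

Since step 3 is destructive, it seems prudent to review the printed commands first and to keep the local clone until the push in step 4 has been verified.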

Ciao
Chris
Re: How to gc an amazon-s3 remote with jgit? [message #999714 is a reply to message #989110] Sun, 13 January 2013 00:07
Joshua Redstone
I've been fiddling around with s3fs-c and sort of got it to work with jgit/s3. A few things I've found:

- I needed to specify "sharedRepository = all" in the config file on s3.

- The -ouse_cache option doesn't work at all for me. When I specify an existing, empty directory, it is not populated, and any attempt to access files on s3 produces a 'bad file descriptor' error, though listing directory contents works fine.

- I have not yet gotten git-gc to work on s3fs-c. It always dies part-way through with different kinds of data corruption errors. I've observed that s3fs-c doesn't like reading files after they've been written. E.g., if I modify the git config and then 'cat ./config', I get truncated contents, like it's using the wrong file size when reading.
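
The truncated-read symptom in the last point can be checked without git. The following is a minimal sketch, assuming only that the bucket is mounted at some directory you can write to; the function name and the payload are made up for illustration. It writes a file, reads it back, and reports whether the bytes and the reported size match what was written.

```python
import os
import tempfile


def read_after_write_ok(directory: str,
                        payload: bytes = b"[core]\n\tbare = true\n" * 64) -> bool:
    """Write a file into `directory`, read it back, and compare with the payload."""
    fd, path = tempfile.mkstemp(dir=directory, prefix="raw-check-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # push the data down to the filesystem
        with open(path, "rb") as handle:
            data = handle.read()
        # Both the content and the size reported by the filesystem must match.
        return data == payload and os.path.getsize(path) == len(payload)
    finally:
        os.remove(path)


if __name__ == "__main__":
    # On a local disk this should print True; point it at the mount to compare.
    with tempfile.TemporaryDirectory() as scratch:
        print(read_after_write_ok(scratch))
```

If this returns False on the mounted directory while returning True on a local one, that would point at the mount rather than at git or jgit.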
