How to gc an amazon-s3 remote with jgit? [message #880684]
Sat, 02 June 2012 13:42
Joshua Redstone (Junior Member; Messages: 5; Registered: June 2012)
Hi,
I've been using JGit for a while with Amazon S3 as a backing store, configured by specifying an amazon-s3 remote in .git/config. Does anyone know how to run a git gc on the repository data stored in S3? It's accumulating a ton of files and could use some cleaning.
Thanks,
Josh
Re: How to gc an amazon-s3 remote with jgit? [message #989067 is a reply to message #988920]
Tue, 04 December 2012 08:59
Christian Halstrick (Member; Messages: 71; Registered: July 2009)
I think neither JGit nor native git has a gc algorithm that works on remote repositories; GC algorithms always expect to work on a local repository. But JGit does have several kinds of local repositories that differ in how their data is persisted. There are traditional filesystem-based repositories (implemented in FileRepository), which store the repository data in a .git folder. There is also a kind of repository that stores its data in a DFS (implemented in DfsRepository.java). The way a repository stores its data heavily influences what kind of garbage can exist, and therefore we have different GC implementations for these two repository types (GC.java and DfsGarbageCollector.java). But we don't have a repository type that stores its data in S3. I fear that implementing a gc for S3 would be quite a lot of effort: one has to know how the data is laid out in S3, which kinds of garbage can exist, and so on.
Can't you do a poor man's gc: clone the S3-based repo to the local filesystem, run a local gc, delete the content on S3, and push the local repo back to S3?
Ciao
Chris
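To illustrate that GC operates on a local repository: a minimal sketch that opens a filesystem-based repository with JGit's porcelain API and runs `GarbageCollectCommand`, which for a FileRepository uses the collector in GC.java. It assumes a recent JGit release on the classpath (one where `Git` implements `AutoCloseable`); the class name `LocalGc`, the method `runGc` and the default path are hypothetical.

```java
import java.io.File;
import java.util.Properties;

import org.eclipse.jgit.api.Git;

public class LocalGc {

    // Opens a local (filesystem-based) repository, prints its statistics,
    // runs JGit's garbage collector on it and returns the statistics afterwards.
    static Properties runGc(File repoDir) throws Exception {
        try (Git git = Git.open(repoDir)) {
            // Loose objects, pack files, refs, ... before collecting.
            System.out.println("before: " + git.gc().getStatistics());
            // call() repacks and prunes, then returns fresh statistics.
            return git.gc().call();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default: the repository in the current directory.
        File repoDir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("after:  " + runGc(repoDir));
    }
}
```

Note that this only works because the repository is local; nothing in this command knows how to reach into a remote such as an S3 bucket.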
Re: How to gc an amazon-s3 remote with jgit? [message #989110 is a reply to message #989101]
Tue, 04 December 2012 11:25
Christian Halstrick (Member; Messages: 71; Registered: July 2009)
I know about TransportAmazonS3. It means that transport-related JGit commands (fetch, push, clone, ...) can work with remote repositories located in S3: we can clone a repo stored on S3 into a filesystem-based local repo, or push from a filesystem-based local repo into a remote repo stored on S3. TransportAmazonS3 teaches us how to talk to remote repos stored in S3, but those repos are still remote repositories, and the garbage collectors have nothing to do with transports; they work on local repos. As far as I know, JGit has no support for a local repository that stores its data in S3.
Re s3fs: I fully agree, s3fs does not seem to work with the way JGit uses S3. But s3fs-c differs from s3fs exactly on this point, so maybe s3fs-c is compatible with the way JGit stores path information.
What about: clone-to-local -> garbage_collect_local -> delete_and_recreate_empty_repo_on_S3 -> push_local_to_S3?
Ciao
Chris
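The clone -> gc -> delete -> push round trip can be sketched with JGit's porcelain API. This assumes a recent JGit release on the classpath; the URL `amazon-s3://.jgit@my-bucket/project.git` and the class and method names are hypothetical. As I understand TransportAmazonS3, the user part of the URL (`.jgit`) names a properties file in your home directory holding `accesskey` and `secretkey`. JGit offers no call for deleting objects from a bucket, so step 3 is left to an external S3 tool and the program just waits for Enter.

```java
import java.io.File;
import java.nio.file.Files;

import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.transport.PushResult;
import org.eclipse.jgit.transport.RemoteRefUpdate;

public class PoorMansS3Gc {

    // Steps 1 + 2: bare-clone the remote repository and gc the local copy.
    static File cloneAndGc(String remoteUrl) throws Exception {
        File localDir = Files.createTempDirectory("s3-gc-").toFile();
        // A bare clone maps the remote's refs/heads/* onto local refs/heads/*,
        // so the branches can later be pushed back under the same names.
        try (Git git = Git.cloneRepository()
                .setURI(remoteUrl)
                .setDirectory(localDir)
                .setBare(true)
                .setCloneAllBranches(true)
                .call()) {
            System.out.println("gc stats: " + git.gc().call());
        }
        return localDir;
    }

    // Step 4: push all branches and tags into the (now empty) remote location.
    static void pushBack(File localDir) throws Exception {
        try (Git git = Git.open(localDir)) {
            Iterable<PushResult> results = git.push()
                    .setRemote("origin") // configured by the clone, points at remoteUrl
                    .setPushAll()
                    .setPushTags()
                    .call();
            for (PushResult result : results) {
                for (RemoteRefUpdate update : result.getRemoteUpdates()) {
                    System.out.println(update.getRemoteName() + " -> " + update.getStatus());
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical S3 URL; pass the real one as the first argument.
        String s3Url = args.length > 0 ? args[0] : "amazon-s3://.jgit@my-bucket/project.git";
        File localDir = cloneAndGc(s3Url);
        System.out.println("Local copy: " + localDir);
        // Step 3 happens outside JGit: delete everything under the repository's
        // S3 prefix (S3 console, AWS CLI, ...), then continue.
        System.out.println("Empty the S3 location now, then press Enter.");
        System.in.read();
        pushBack(localDir);
    }
}
```

A bare clone is used so that `setPushAll()` pushes the same branch names back. Nobody else should push to the S3 repository between the clone and the final push, or those updates are lost.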
Re: How to gc an amazon-s3 remote with jgit? [message #999714 is a reply to message #989110]
Sat, 12 January 2013 19:07
Joshua Redstone (Junior Member; Messages: 5; Registered: June 2012)
I've been fiddling around with s3fs-c and sort of got it to work with jgit/s3. A few things I've found:
- I needed to specify "sharedRepository = all" in the repository's config file on S3.
- The -ouse_cache option doesn't work at all for me. When I point it at an existing, empty directory, the directory is never populated, and any attempt to access files on S3 produces a 'bad file descriptor' error, though listing directory contents works fine.
- I have not yet gotten git gc to work on s3fs-c. It always dies part-way through with various kinds of data corruption errors. I've observed that s3fs-c doesn't reliably read files right after they've been written. E.g., if I modify the git config and then 'cat ./config', I get truncated contents, as if it were using the wrong file size when reading.
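The last observation (a file reads back truncated right after being rewritten) can be checked independently of git. A minimal sketch using only java.nio: it overwrites a scratch file with longer contents, the way editing a config file does, and immediately reads it back. The class and method names and the sample config text are hypothetical, and a clean run on one mount does not prove s3fs-c stays consistent under git gc's access patterns.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class ReadAfterWriteCheck {

    static final int ROUNDS = 20;

    // Writes `original`, overwrites it with `modified` (truncate + write),
    // reads the file straight back and reports whether it matches `modified`.
    static boolean rewriteThenRead(Path dir, byte[] original, byte[] modified) throws IOException {
        Path probe = Files.createTempFile(dir, "rw-probe-", ".tmp");
        try {
            Files.write(probe, original); // initial version, like the existing config
            Files.write(probe, modified); // rewrite it, like editing the config
            byte[] readBack = Files.readAllBytes(probe);
            boolean ok = Arrays.equals(readBack, modified);
            if (!ok) {
                System.out.printf("mismatch: wrote %d bytes, read back %d bytes%n",
                        modified.length, readBack.length);
            }
            return ok;
        } finally {
            Files.deleteIfExists(probe); // leave no probe files behind
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the s3fs-c mount point as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        byte[] before = "[core]\n\tbare = true\n".getBytes(StandardCharsets.UTF_8);
        byte[] after = "[core]\n\tbare = true\n\tsharedRepository = all\n"
                .getBytes(StandardCharsets.UTF_8);
        int failures = 0;
        for (int i = 0; i < ROUNDS; i++) {
            if (!rewriteThenRead(dir, before, after)) {
                failures++;
            }
        }
        System.out.println(failures == 0 ? "consistent" : failures + "/" + ROUNDS + " round trips failed");
    }
}
```

If this already fails on the mount, git gc (which writes new packs and then reads them back) has little chance of succeeding there.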