Eclipse Community Forums
LogCommand slows down after iterating 53k commits in large repository [message #1738660] Thu, 21 July 2016 20:40
Ravi KK
Messages: 3
Registered: July 2016
Junior Member

I am using LogCommand with a PathFilter for a single file. The repository is very large and contains around 78k commits.
Memory does not seem to be the problem: I have tried increasing -Xms and -Xmx.

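import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.api.LogCommand;
// package of FileRepository in JGit 3.0 and later
import org.eclipse.jgit.internal.storage.file.FileRepository;
import org.eclipse.jgit.revwalk.RevCommit;

// class name chosen for illustration only
public class LogSingleFile {
	public static void main(String[] args) throws Exception {
		File repoRoot = new File(
				"D:\\Workspaces\\xyz\\.git"); //$NON-NLS-1$
		FileRepository repo = new FileRepository(repoRoot);
		LogCommand log = new Git(repo).log();
		// restrict the log to commits touching this one file
		Iterable<RevCommit> walk = log
				.addPath("a/b/c/filename.java") //$NON-NLS-1$
				.call();

		// collect every matching commit and count them
		int i = 0;
		List<RevCommit> history = new ArrayList<RevCommit>();
		for (RevCommit commit : walk) {
			history.add(commit);
			i++;
		}
		System.out.println(i);
	}
}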


I added a counter to org.eclipse.jgit.revwalk.PendingGenerator.next() and found that the iteration is very fast up to about 53k commits (a few seconds in total). After that, it takes roughly 20 to 40 seconds for every 1000 commits.

	static int i = 0;
	@Override
	RevCommit next() throws MissingObjectException,
			IncorrectObjectTypeException, IOException {
		try {

			System.out
					.println("Start PendingGenerator.next() at " + new Date()); //$NON-NLS-1$
			for (;;) {
				final RevCommit c = pending.next();
				i++;

				if ((i > 50000 && i % 1000 == 0) || (i % 10000 == 0)) {
					System.out.println("Completed " + i + " commits at " //$NON-NLS-1$ //$NON-NLS-2$
							+ new Date() + " ."); //$NON-NLS-1$
				}


I then re-cloned the repository with depth=52000. With that clone the LogCommand is very fast, and so is my plugin. The counter output from org.eclipse.jgit.revwalk.PendingGenerator.next() for the full repository follows:

Start PendingGenerator.next() at Thu Jul 21 13:24:09 PDT 2016
Start PendingGenerator.next() at Thu Jul 21 13:24:09 PDT 2016
Start PendingGenerator.next() at Thu Jul 21 13:24:09 PDT 2016
Start PendingGenerator.next() at Thu Jul 21 13:24:09 PDT 2016
Start PendingGenerator.next() at Thu Jul 21 13:24:09 PDT 2016
Start PendingGenerator.next() at Thu Jul 21 13:24:09 PDT 2016
Completed 10000 commits at Thu Jul 21 13:24:10 PDT 2016 .
Completed 20000 commits at Thu Jul 21 13:24:10 PDT 2016 .
Start PendingGenerator.next() at Thu Jul 21 13:24:11 PDT 2016
Completed 30000 commits at Thu Jul 21 13:24:11 PDT 2016 .
Completed 40000 commits at Thu Jul 21 13:24:12 PDT 2016 .
Completed 50000 commits at Thu Jul 21 13:24:12 PDT 2016 .
Completed 51000 commits at Thu Jul 21 13:24:13 PDT 2016 .
Completed 52000 commits at Thu Jul 21 13:24:13 PDT 2016 .
Completed 53000 commits at Thu Jul 21 13:24:13 PDT 2016 .
Completed 54000 commits at Thu Jul 21 13:24:46 PDT 2016 .
Completed 55000 commits at Thu Jul 21 13:25:10 PDT 2016 .
Completed 56000 commits at Thu Jul 21 13:25:46 PDT 2016 .
Completed 57000 commits at Thu Jul 21 13:26:12 PDT 2016 .
Completed 58000 commits at Thu Jul 21 13:26:46 PDT 2016 .
Completed 59000 commits at Thu Jul 21 13:27:14 PDT 2016 .
Completed 60000 commits at Thu Jul 21 13:27:39 PDT 2016 .
Completed 61000 commits at Thu Jul 21 13:28:06 PDT 2016 .
Completed 62000 commits at Thu Jul 21 13:28:24 PDT 2016 .
Completed 63000 commits at Thu Jul 21 13:28:45 PDT 2016 .
Completed 64000 commits at Thu Jul 21 13:29:05 PDT 2016 .
Completed 65000 commits at Thu Jul 21 13:29:30 PDT 2016 .
Completed 66000 commits at Thu Jul 21 13:29:53 PDT 2016 .
Completed 67000 commits at Thu Jul 21 13:30:17 PDT 2016 .
Completed 68000 commits at Thu Jul 21 13:30:48 PDT 2016 .
Completed 69000 commits at Thu Jul 21 13:31:10 PDT 2016 .
Completed 70000 commits at Thu Jul 21 13:31:37 PDT 2016 .
Completed 71000 commits at Thu Jul 21 13:32:01 PDT 2016 .
Completed 72000 commits at Thu Jul 21 13:32:28 PDT 2016 .
Completed 73000 commits at Thu Jul 21 13:32:54 PDT 2016 .
Completed 74000 commits at Thu Jul 21 13:33:11 PDT 2016 .
Completed 75000 commits at Thu Jul 21 13:33:38 PDT 2016 .
Completed 76000 commits at Thu Jul 21 13:34:16 PDT 2016 .
Completed 77000 commits at Thu Jul 21 13:34:35 PDT 2016 .
Completed 78000 commits at Thu Jul 21 13:35:01 PDT 2016 .
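
The same kind of per-batch timing can also be collected from outside JGit, without patching PendingGenerator. The following is only a minimal sketch: the class name BatchTimer and the method timeBatches are made up for illustration, and it assumes that timing the consumer side of the iterator is good enough to locate the slowdown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JGit: times an iteration in fixed-size batches.
public class BatchTimer {

    /**
     * Walks the iterable and returns the elapsed milliseconds
     * for each completed batch of batchSize items.
     */
    public static <T> List<Long> timeBatches(Iterable<T> items, int batchSize) {
        List<Long> batchMillis = new ArrayList<>();
        long batchStart = System.nanoTime();
        int count = 0;
        for (T item : items) {
            count++;
            if (count % batchSize == 0) {
                long now = System.nanoTime();
                batchMillis.add((now - batchStart) / 1_000_000);
                batchStart = now;
            }
        }
        // items after the last full batch are not reported
        return batchMillis;
    }

    public static void main(String[] args) {
        // Stand-in data; in the real case the Iterable<RevCommit>
        // returned by LogCommand.call() would be passed instead.
        List<Integer> sample = new ArrayList<>();
        for (int n = 0; n < 5000; n++) {
            sample.add(n);
        }
        List<Long> timings = timeBatches(sample, 1000);
        for (int b = 0; b < timings.size(); b++) {
            System.out.println("batch " + (b + 1) + ": " + timings.get(b) + " ms");
        }
    }
}
```

Because the helper only sees the consumer side, the first batch also includes whatever setup work the iterator does before it returns its first element.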


For now we are going with the depth=52k restriction, but I am interested in understanding the root cause of this behavior.

Thanks,
Ravi
Re: LogCommand slows down after iterating 53k commits in large repository [message #1738961 is a reply to message #1738660] Tue, 26 July 2016 10:57
Christian Halstrick
Messages: 241
Registered: July 2009
Senior Member
I have problems reproducing this. I took the linux repo [1] with 600k commits and searched for commits touching anything underneath "drivers/" (about 310k commits). I couldn't see the behaviour you mention:

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.api.errors.GitAPIException;
import org.eclipse.jgit.api.errors.JGitInternalException;
import org.eclipse.jgit.revwalk.RevCommit;

// should be called with two parameters: 1) the location of a linux repo, 2) the path which should be filtered
public class LogExistingRepo {
	public static void main(String[] args) throws IOException, GitAPIException, JGitInternalException {
		try (Git git = Git.open(new File(args[0]))) {
			System.out.println("opened repo at " + git.getRepository().getDirectory());
			int i = 0;
			long current, last = 0;
			List<RevCommit> history = new ArrayList<RevCommit>();
			for (RevCommit c : git.log().addPath(args[1]).call()) {
				if (i == 0) {
					// start the clock when the first commit arrives
					last = System.currentTimeMillis();
				}
				history.add(c);
				i++;
				// report the elapsed time for every 100000 commits
				if (i % 100000 == 0) {
					current = System.currentTimeMillis();
					System.out.println("logged last 100000 commits in " + (current - last) + "ms");
					last = current;
				}
			}
			System.out.println("found " + i + " commits touching " + args[1] + " in " + git);
		}
	}
}


The result was
opened repo at /home/chris/git/linux/.git
logged last 100000 commits in 46ms
logged last 100000 commits in 20ms
logged last 100000 commits in 16ms
found 310279 commits touching drivers/ in Git[Repository[/home/chris/git/linux/.git]]


But this is not exactly your use case, where 78k commits really touch the same file. I don't have such a repo. Can your repo be shared?

Additionally: at the moment we have a "history.add()" line in the loop. Maybe this ArrayList adds some extra performance penalty if you hit certain boundaries.
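
One way to check the ArrayList suspicion in isolation is to time the appends alone, with no JGit involved. This is only a rough sketch under stated assumptions: ArrayListGrowthCheck and timeAppends are hypothetical names, and plain Object instances stand in for RevCommit, so the memory footprint of real commits is not reproduced.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical check, independent of JGit: how long do the appends alone take?
public class ArrayListGrowthCheck {

    /**
     * Appends `total` placeholder objects to an ArrayList and returns
     * the nanoseconds spent on each batch of `batchSize` appends.
     */
    public static List<Long> timeAppends(int total, int batchSize) {
        List<Object> history = new ArrayList<>();
        List<Long> batchNanos = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 1; i <= total; i++) {
            history.add(new Object()); // placeholder for a RevCommit
            if (i % batchSize == 0) {
                long now = System.nanoTime();
                batchNanos.add(now - start);
                start = now;
            }
        }
        return batchNanos;
    }

    public static void main(String[] args) {
        // 78000 matches the approximate commit count mentioned in this thread
        List<Long> timings = timeAppends(78_000, 1_000);
        for (int b = 0; b < timings.size(); b++) {
            System.out.println("appends up to " + ((b + 1) * 1_000)
                    + ": " + timings.get(b) / 1_000 + " us");
        }
    }
}
```

If the per-batch numbers stay flat past the 53k mark, the list itself is unlikely to be the cause. Removing the history.add() call from the original loop and rerunning it would be the more direct test.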


Ciao
Chris
Re: LogCommand slows down after iterating 53k commits in large repository [message #1738967 is a reply to message #1738961] Tue, 26 July 2016 11:13
Christian Halstrick
Messages: 241
Registered: July 2009
Senior Member
Sorry: I forgot the link to the repo I tested:

[1] http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git


Ciao
Chris
Re: LogCommand slows down after iterating 53k commits in large repository [message #1739147 is a reply to message #1738967] Wed, 27 July 2016 18:22
Ravi KK
Messages: 3
Registered: July 2016
Junior Member
Thanks Chris for your reply.

Yes, I did not see the same slowness in the repository you shared. I am sorry, I cannot share my repository.
One difference I see is that my repository was migrated from SVN using git svn. The other thing I will have to check is whether any large file is involved in the commits beyond 53k.

Thanks,
Ravi
Re: LogCommand slows down after iterating 53k commits in large repository [message #1739185 is a reply to message #1739147] Thu, 28 July 2016 07:36
Christian Halstrick
Messages: 241
Registered: July 2009
Senior Member
The performance you see should be independent of the content of the commits. You are walking only over the commit graph, from one commit to its parents. You are not even looking at what's in a commit: how many files there are, how many files changed, how big the files are. It must be something different. I would also check this in different environments (try it on a different machine, perhaps with a different OS). Often things like anti-virus programs or file systems which are not local have a heavy influence.
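
To compare environments without involving JGit at all, the raw read speed of the repository directory can be measured on each machine. What follows is just a sketch: ReadLatencyProbe and readAll are hypothetical names, and it assumes that reading every file under the .git directory is an acceptable stand-in for the I/O that the log performs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical probe: measures how long it takes to read a directory tree.
public class ReadLatencyProbe {

    /** Reads every regular file under root and returns the total bytes read. */
    public static long readAll(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        long bytes = 0;
        byte[] buffer = new byte[64 * 1024];
        for (Path file : files) {
            // stream the file so that large pack files do not need to fit in memory
            try (InputStream in = Files.newInputStream(file)) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    bytes += n;
                }
            }
        }
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory to probe, for example the .git directory
        Path root = Paths.get(args[0]);
        long start = System.nanoTime();
        long bytes = readAll(root);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("read " + bytes + " bytes under " + root + " in " + millis + " ms");
    }
}
```

A second run on the same machine is usually served from the operating system's file cache, so the first run on each machine is the more telling number. A large gap between a local disk and a network share, or with and without an anti-virus scanner, would point at the environment rather than at JGit.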

Ciao
Chris