[jgit-dev] Performance of indexing blob metadata

Hi JGit team,

I'm working on Lucene integration for the next release of Gitblit and
I'm running into a performance problem which maybe someone can help me
understand.

Here is the goal:  I want to traverse the head tree of each branch and
index the blobs with the appropriate author metadata.

Consider the following unit tests, which implement the traversal logic
I'm using in my indexer.  If fullIndex = false, I take a shortcut
and assume the branch head metadata applies to all blobs.  This is
wrong, but doing so allows me to traverse all JGit
branches/blobs/commits in ~1.4 seconds on my older Debian box.  If
fullIndex = true, I get the correct metadata for the blobs, but the same
traversal takes 833 seconds (~14 minutes).  Wow.  The CPU works really
hard while this is going on; heap stays fairly controlled.

Perhaps the obvious answer is to re-use blobWalk and not make a new one
on every treewalk iteration.  But when I do that, each .next() after the
_first_ blob returns the branch head commit (markStart) instead of the
correct commit.  And that returned commit cannot really be parsed for
author info: it throws NullPointerExceptions, so my getAuthor/getCommitter
methods return "unknown".

Am I doing something glaringly wrong here?  Or can someone suggest an
alternative?  I had considered traversing the commits first and building
a hashmap of file path -> last commit id.
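
To make that last idea concrete, here is a minimal sketch in plain Java.
It is not JGit code: the Commit class and lastCommitByPath are made-up
names for illustration, and I'm assuming the history arrives newest
first with each commit's changed paths already computed (with JGit that
would presumably mean diffing each commit against its parent).  Merges
would need more thought than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LastCommitIndex {

	/** Hypothetical stand-in for a commit: an id plus the paths it changed. */
	static final class Commit {
		final String id;
		final List<String> changedPaths;

		Commit(String id, String... changedPaths) {
			this.id = id;
			this.changedPaths = Arrays.asList(changedPaths);
		}
	}

	/**
	 * Builds a map of path -> id of the most recent commit touching it.
	 * Assumes the list is ordered newest first.
	 */
	static Map<String, String> lastCommitByPath(List<Commit> newestFirst) {
		Map<String, String> last = new LinkedHashMap<>();
		for (Commit c : newestFirst) {
			for (String path : c.changedPaths) {
				// the first commit seen for a path is the newest one, so keep it
				last.putIfAbsent(path, c.id);
			}
		}
		return last;
	}

	public static void main(String[] args) {
		List<Commit> history = new ArrayList<>();
		history.add(new Commit("c3", "README"));
		history.add(new Commit("c2", "src/A.java", "README"));
		history.add(new Commit("c1", "src/A.java", "src/B.java"));

		// one pass over the history, then a cheap lookup per blob
		Map<String, String> last = lastCommitByPath(history);
		System.out.println(last);
	}
}
```

The point is that the history gets walked once per branch rather than
once per blob; the tree traversal then only does a map lookup for each
path.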

Thanks for any tips.
-J

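import java.text.MessageFormat;
import java.util.Collections;
import java.util.Map;

import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.DateTools.Resolution;
import org.eclipse.jgit.lib.Constants;
import org.eclipse.jgit.lib.Ref;
import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.revwalk.RevCommit;
import org.eclipse.jgit.revwalk.RevWalk;
import org.eclipse.jgit.treewalk.TreeWalk;
import org.eclipse.jgit.treewalk.filter.AndTreeFilter;
import org.eclipse.jgit.treewalk.filter.PathFilterGroup;
import org.eclipse.jgit.treewalk.filter.TreeFilter;
import org.junit.Test;

// GitBlitSuite and StringUtils are Gitblit's own classes (imports omitted)
public class NastyTest {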

	@Test
	public void quickTraversalTest() throws Exception {
		Repository repo = GitBlitSuite.getJGitRepository();
		traverse(repo, false);
	}

	@Test
	public void fullTraversalTest() throws Exception {
		Repository repo = GitBlitSuite.getJGitRepository();
		traverse(repo, true);
	}

	private void traverse(Repository repo, boolean fullIndex) throws Exception {
		Map<String, Ref> locals = repo.getRefDatabase().getRefs(Constants.R_HEADS);
		for (Map.Entry<String, Ref> entry : locals.entrySet()) {
			System.out.println("Traversing " + entry.getKey());
			int blobCount = 0;
			int commitCount = 0;

			Ref ref = entry.getValue();
			RevWalk revWalk = new RevWalk(repo);
			RevCommit head = revWalk.parseCommit(ref.getObjectId());

			TreeWalk treeWalk = new TreeWalk(repo);
			treeWalk.addTree(head.getTree());
			treeWalk.setRecursive(true);

			while (treeWalk.next()) {
				blobCount++;
				String blobPath = treeWalk.getPathString();
				RevCommit blobRev = head;

				RevWalk blobWalk = null;
				if (fullIndex) {
					// XXX this is _really_ slow, there must be a better way
					// determine the most recent commit for this blob
					blobWalk = new RevWalk(repo);
					blobWalk.markStart(blobWalk.parseCommit(head.getId()));
					TreeFilter filter = AndTreeFilter.create(
							PathFilterGroup.createFromStrings(Collections.singleton(blobPath)),
							TreeFilter.ANY_DIFF);
					blobWalk.setTreeFilter(filter);
					blobRev = blobWalk.next();
				}

				String blobAuthor = getAuthor(blobRev);
				String blobCommitter = getCommitter(blobRev);
				String blobDate = DateTools.timeToString(
						blobRev.getCommitTime() * 1000L, Resolution.MINUTE);

				if (blobWalk != null) {
					blobWalk.dispose();
				}

				// index blob here
			}

			treeWalk.release();

			revWalk.reset();
			revWalk.markStart(head);
			RevCommit rev;
			while ((rev = revWalk.next()) != null) {
				// index commit here
				commitCount++;
			}

			// finished
			revWalk.dispose();

			System.out.println(MessageFormat.format(
					"Traversed {0} found {1} blobs and {2} commits",
					ref.getName(), blobCount, commitCount));
		}
	}

	private String getAuthor(RevCommit commit) {
		String name = "unknown";
		try {
			name = commit.getAuthorIdent().getName();
			if (StringUtils.isEmpty(name)) {
				name = commit.getAuthorIdent().getEmailAddress();
			}
		} catch (NullPointerException n) {
			// commit could not be parsed; fall back to "unknown"
		}
		return name;
	}

	private String getCommitter(RevCommit commit) {
		String name = "unknown";
		try {
			name = commit.getCommitterIdent().getName();
			if (StringUtils.isEmpty(name)) {
				name = commit.getCommitterIdent().getEmailAddress();
			}
		} catch (NullPointerException n) {
			// commit could not be parsed; fall back to "unknown"
		}
		return name;
	}
}

