[jgit-dev] RevWalk next is slow for git repos that have a long commit history. (2)
Hello jgit developers,
I'm working on improving the performance of Eclipse's Releng Copyright fix
tool:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=468850
We ran into the problem that RevWalk.next() is very slow for repositories
that have a long commit history.
For example, eclipse.jdt.ui has 26,000+ commits and 15,000+ files.
Is this a known issue, and is there a way to improve performance or work
around it?
------ To be specific: -------
The tool traverses each file in a project. For each file, it:
- finds the file's repository,
- starting from git's HEAD commit, calls RevWalk.next() to walk backwards
  through history until it finds the commit that last modified the file,
- extracts the year from that commit,
- updates the file's copyright header, e.g. (2001-2011) -> (2001-2014).
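The last step (rewriting the end year of the header) can be sketched as follows. This is only an illustration, not the Releng tool's actual code: the class name CopyrightYearUpdater, the method updateEndYear, and the assumption that the header contains a single "YYYY-YYYY" range are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not taken from the Releng copyright tool. */
final class CopyrightYearUpdater {
    // Assumption: the header contains a year range such as "2001-2011".
    private static final Pattern YEAR_RANGE =
            Pattern.compile("\\b(\\d{4})-(\\d{4})\\b");

    /**
     * Returns the header with the end year of the first year range replaced
     * by lastModifiedYear, if that year is later than the current end year.
     * The header is returned unchanged when no range is found.
     */
    static String updateEndYear(String header, int lastModifiedYear) {
        Matcher m = YEAR_RANGE.matcher(header);
        if (!m.find()) {
            return header; // no "YYYY-YYYY" range to update
        }
        int endYear = Integer.parseInt(m.group(2));
        if (lastModifiedYear <= endYear) {
            return header; // header is already up to date
        }
        // Splice the new year over the old end year only.
        return header.substring(0, m.start(2))
                + lastModifiedYear
                + header.substring(m.end(2));
    }
}
```

This step is cheap; the expensive part is finding the last-modified year that is passed in as the second argument.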
The problem is that RevWalk.next() takes 2-3 seconds per file for
repositories that have very long commit histories (e.g. eclipse.jdt.ui has
26,814 commits), and with 15,000+ files in a project this operation can take
many hours to complete.
The time is spent in this call chain:
RevWalk.next()
-> StartGenerator.next()
-> FIFORevQueue constructor
-> 56: BlockRevQueue constructor (Generator s)
-- the 'for loop' there can iterate 10k+ times per file.
I found that the native git-log command is also very slow.
For example, calling git-log on 15,000 files takes 13 minutes for
eclipse.jdt.ui:
time find . -name "*.java" -exec git log -1 {} \; > /dev/null
In contrast, 'cat-ing' every file takes only 6 seconds:
find . -exec cat {} \; > /dev/null 2>&1
Given this git-log limitation, is there some way to cache the repo and its
commit history, or to find the last-modified date of a file faster than
traversing the git commit history once per file?
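One possible form of such a cache, sketched under assumptions rather than measured: run git log once for the whole repository and build a path -> year map from its output, instead of walking history once per file. The sketch assumes the output of
git log --name-only --date=short --pretty=format:"#commit %ad"
has been read into a list of lines. The "#commit " marker, the class name LastModifiedIndex and the choice of the author date (%ad) are illustrative choices; merges and renames are not handled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-pass index built from one git log invocation. */
final class LastModifiedIndex {
    // Marker chosen for this sketch; a path starting with it would be misparsed.
    static final String COMMIT_PREFIX = "#commit ";

    /**
     * Parses lines of the form "#commit YYYY-MM-DD" followed by the paths
     * touched by that commit. git log lists commits newest first, so the
     * first year seen for a path is the year it was last modified.
     */
    static Map<String, Integer> parse(List<String> logLines) {
        Map<String, Integer> yearByPath = new LinkedHashMap<>();
        int currentYear = -1; // no commit header seen yet
        for (String line : logLines) {
            if (line.startsWith(COMMIT_PREFIX)) {
                int start = COMMIT_PREFIX.length();
                // --date=short prints YYYY-MM-DD; keep only the year.
                currentYear = Integer.parseInt(line.substring(start, start + 4));
            } else if (!line.isEmpty() && currentYear != -1) {
                // Keep only the newest commit that touched this path.
                yearByPath.putIfAbsent(line, currentYear);
            }
        }
        return yearByPath;
    }
}
```

Each file would then need only a map lookup, at the cost of one full pass over the history up front.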
Any advice/tips?
Thank you.
--
Leo Ufimtsev | Intern Software Engineer @ Eclipse Team