[jgit-dev] RevWalk next is slow for git repos that have a long commit history. (2)

Hello jgit developers,

I'm working on improving the performance of Eclipse's Releng Copyright fix
tool:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=468850

We ran into the problem that RevWalk.next() is very slow for repositories
that have a long commit history.
For example, eclipse.jdt.ui has 26,000+ commits and 15,000+ files.

I was wondering if this is a known issue and if there is a way to improve
performance or work around it?


------ To be specific:  -------

The tool traverses each file in a project. For each file:
 - it finds the file's repository,
 - starting from git's HEAD commit, it calls RevWalk.next() to walk backwards
   through history until it finds the commit in which the file was last
   modified,
 - it extracts the year from that commit,
 - it then updates the file's copyright header (2001-2011) -> (2001-2014).
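
To illustrate the last two steps only, here is a minimal sketch using nothing but the Java standard library. It is not the Releng tool's actual code: the class name, the method names, and the assumption that the header contains a single "YYYY-YYYY" range are all hypothetical, and the commit time is assumed to be available as seconds since the epoch.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CopyrightYearSketch {
    // Assumption: the header holds a year range written as "YYYY-YYYY".
    private static final Pattern RANGE = Pattern.compile("(\\d{4})-(\\d{4})");

    /** Converts a commit time (seconds since the epoch) to a year in UTC. */
    static int yearOf(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds).atZone(ZoneOffset.UTC).getYear();
    }

    /**
     * Rewrites the end year of the first range in the header if the file
     * was modified later than that year; otherwise returns the header as-is.
     */
    static String updateHeader(String header, int lastModifiedYear) {
        Matcher m = RANGE.matcher(header);
        if (!m.find()) {
            return header; // no recognizable range, leave untouched
        }
        int endYear = Integer.parseInt(m.group(2));
        if (lastModifiedYear <= endYear) {
            return header; // header is already up to date
        }
        return header.substring(0, m.start(2)) + lastModifiedYear + header.substring(m.end(2));
    }

    public static void main(String[] args) {
        long commitTime = 1400000000L; // example timestamp in May 2014
        String header = "Copyright (c) 2001-2011 Example Corp.";
        System.out.println(updateHeader(header, yearOf(commitTime)));
    }
}
```

These steps are cheap; the expensive part is finding the commit time in the first place.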

The problem is that RevWalk.next() takes 2-3 seconds per file for
repositories that have very long commit histories (e.g., eclipse.jdt.ui has
26,814 commits). With 15,000+ files in a project, the whole operation can
take many hours to complete (15,000 files at 2-3 seconds each is roughly
8 to 12 hours).

The slow call chain is:
 RevWalk.next()
  -> StartGenerator.next()
    -> FIFORevQueue constructor
     -> 56: BlockRevQueue constructor (Generator s)
        -- the 'for' loop can iterate 10k+ times per file.

I found that the native git-log command is also very slow.
For example, calling git-log once per file on 15,000 files takes 13 minutes
for eclipse.jdt.ui:
'time find . -name "*.java" -exec git log -1 {} \; > /dev/null'

In contrast, 'cat-ing' every file takes only 6 seconds:
'find . -exec cat {} \; > /dev/null 2>&1'


Given this git-log limitation, is there some way to, e.g., cache the repo
and the commit history, or otherwise find the last-modified date of a file
faster than by traversing the git commit history once per file?
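
One direction I could imagine, as a rough sketch rather than a tested solution: walk the history a single time and build a path-to-year map, instead of starting a new walk for every file. The example below assumes the input is the output of 'git log --name-only --format=@%ct' (the '@' prefix is just a marker I chose so commit-time lines can be told apart from path lines). The class and method names are hypothetical. It does not handle renames, quoted paths with special characters, paths that themselves start with '@', or changes introduced only by merge commits.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.HashMap;
import java.util.Map;

public class LastModifiedIndex {
    /**
     * Builds a map from path to the latest commit year that touched it.
     * Lines starting with '@' carry a commit time in epoch seconds; every
     * other non-blank line is a path changed by the most recent '@' commit.
     */
    static Map<String, Integer> parse(Iterable<String> lines) {
        Map<String, Integer> yearByPath = new HashMap<>();
        int currentYear = -1; // -1 until the first commit line is seen
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // git prints blank separator lines
            }
            if (line.startsWith("@")) {
                long seconds = Long.parseLong(line.substring(1));
                currentYear = Instant.ofEpochSecond(seconds)
                        .atZone(ZoneOffset.UTC).getYear();
            } else if (currentYear != -1) {
                // Keep the maximum so the result does not depend on log order.
                yearByPath.merge(line, currentYear, Math::max);
            }
        }
        return yearByPath;
    }

    public static void main(String[] args) {
        // Reads the git log output from standard input.
        BufferedReader stdin = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        Iterable<String> lines = stdin.lines()::iterator;
        parse(lines).forEach((path, year) -> System.out.println(year + " " + path));
    }
}
```

This could be fed with something like 'git log --name-only --format=@%ct | java LastModifiedIndex.java' from the repository root, after which each file lookup is a map access. Whether the same single-pass idea can be done efficiently inside JGit is part of what I am asking.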

Any advice/tips?

Thank you.

--
Leo Ufimtsev | Intern Software Engineer @ Eclipse Team

