
[jgit-dev] RevWalk next is slow for git repos that have a long commit history. (2)

Hello jgit developers,

I'm working on improving the performance of Eclipse's Releng Copyright fix tool.

We ran into the problem that RevWalk.next() is very slow for repositories
that have a long commit history,
e.g. eclipse.jdt.ui has 26,000+ commits and 15,000+ files.

Is this a known issue, and is there a way to improve performance
or work around it?

------ To be specific:  -------

The tool traverses each file in a project. For each file:
 - it finds its repository,
 - starting from the HEAD commit, it walks backwards through the history
   to find the commit in which the file was last modified,
 - it extracts the year from that commit,
 - then it updates the file's copyright header, e.g. (2001-2011) -> (2001-2014).
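The per-file steps above can be modelled in plain Java to make their cost visible. This is a minimal sketch, not the tool's actual code: `Commit` is a hypothetical stand-in for a JGit commit (commit time in epoch seconds plus the set of paths it changed), the history is assumed to be linear and listed newest first, years are computed in UTC, and the header is assumed to contain a range of the form "(2001-2011)".

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.List;
import java.util.OptionalInt;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CopyrightSketch {
    // Hypothetical, simplified commit model (not a JGit type): commit time
    // in epoch seconds plus the set of paths this commit changed.
    record Commit(long commitTimeSeconds, Set<String> changedPaths) {}

    // Matches a year range such as "(2001-2011)" (assumed header format).
    private static final Pattern RANGE = Pattern.compile("\\((\\d{4})-(\\d{4})\\)");

    // Per-file walk: scan from the newest commit backwards until one touches
    // the path. In the worst case this visits every commit, for every file.
    static OptionalInt lastModifiedYear(List<Commit> newestFirst, String path) {
        for (Commit c : newestFirst) {
            if (c.changedPaths().contains(path)) {
                return OptionalInt.of(yearOf(c.commitTimeSeconds()));
            }
        }
        return OptionalInt.empty();
    }

    // Converts a commit time in epoch seconds to a calendar year (UTC assumed).
    static int yearOf(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds).atZone(ZoneOffset.UTC).getYear();
    }

    // Rewrites the first "(start-end)" range so that it ends at the given year;
    // returns the header unchanged if no range is found.
    static String updateHeader(String header, int year) {
        Matcher m = RANGE.matcher(header);
        if (!m.find()) {
            return header;
        }
        return header.substring(0, m.start())
                + "(" + m.group(1) + "-" + year + ")"
                + header.substring(m.end());
    }
}
```

For example, `updateHeader("Copyright (2001-2011) Foo", 2014)` yields `"Copyright (2001-2014) Foo"`. Because `lastModifiedYear` may scan the whole history for each file, the total work grows roughly with files × commits.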

The problem is that this walk takes 2-3 seconds per file for
repositories that have very long commit histories (e.g. eclipse.jdt.ui has
26,814 commits), and with 15,000+ files in a project the whole operation can
take many hours to complete (15,000 files at 2-3 s each is roughly 8 to 12.5 hours).

Specifically, the time is spent in:
    -> FIFORevQueue constructor
     -> line 56: BlockRevQueue constructor (Generator s)
        -- the for loop there can iterate 10k+ times per file.

I found that the native git log command is also very slow.
E.g. calling git log on 15,000 files takes 13 minutes for eclipse.jdt.ui:
'time find . -name "*.java" -exec git log -1 {} \; > /dev/null'

(In contrast, cat-ing every file takes only 6 seconds:
'find . -exec cat {} \; > /dev/null 2>&1')

Given this git log limitation, is there some way to, e.g., cache the
repository and its commit history, or to find the last-modified date of a
file faster than by walking the commit history?
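One possible way to avoid repeating the walk for every file, sketched here under a hypothetical `Commit` model (linear history listed newest first, commit time in epoch seconds, UTC years), is to walk the history once and record, for each wanted path, the year of the first (newest) commit that touches it. Obtaining each commit's changed paths from JGit, for example by diffing its tree against its parent's tree with a `TreeWalk`, is not shown and is an assumption; merges and renames are ignored.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LastModifiedIndex {
    // Hypothetical, simplified commit model (not a JGit type): commit time
    // in epoch seconds plus the set of paths this commit changed.
    record Commit(long commitTimeSeconds, Set<String> changedPaths) {}

    // Single pass over the history, newest commit first. The first commit seen
    // that touches a wanted path is the one that last modified it, so older
    // commits touching the same path are ignored (putIfAbsent).
    static Map<String, Integer> build(List<Commit> newestFirst, Set<String> wanted) {
        Map<String, Integer> lastYear = new HashMap<>();
        for (Commit c : newestFirst) {
            if (lastYear.size() == wanted.size()) {
                break; // every wanted file already has its last-modified year
            }
            int year = Instant.ofEpochSecond(c.commitTimeSeconds())
                    .atZone(ZoneOffset.UTC)
                    .getYear();
            for (String path : c.changedPaths()) {
                if (wanted.contains(path)) {
                    lastYear.putIfAbsent(path, year);
                }
            }
        }
        return lastYear;
    }
}
```

Each commit is visited at most once, and the walk stops as soon as every wanted file has a year, so the cost is bounded by the size of the history rather than by files × commits. Each file's year is then a map lookup.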

Any advice/tips?

Thank you.

Leo Ufimtsev | Intern Software Engineer @ Eclipse Team
