Re: [jgit-dev] [egit-dev] speeding up re-indexing

Hey All!

Now that the smudged index is fixed - back to the problem ;) (and thanks for the quick help with the smudging thing!)

I have now measured some timings. With cold caches (caches dropped), re-indexing takes a little over 5 minutes. A second re-indexing run then takes approximately 30 seconds (which is still long!). I can see now that there are no smudged index entries anymore, only our 170 empty files ;)

Any more hints? I would really like to get the time down, or to minimize the number of re-indexing runs. Any ideas?

Regards,
Markus
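
For reference, the check discussed further down this thread (index entries with a cached size of 0 whose SHA-1 is not the empty blob) can be sketched in Python. This is only a sketch under stated assumptions, not part of JGit or EGit: the names `find_smudged` and `count_smudged_in_repo` are hypothetical, and the parser assumes that `git ls-files --debug -s` prints each entry as `<mode> <sha1> <stage><TAB><path>`, followed by indented stat lines that end with a `size:` line.

```python
import re
import subprocess

# SHA-1 of the empty blob, as quoted in this thread.
EMPTY_BLOB_SHA1 = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"

# Assumed entry line: <mode> <sha1> <stage>\t<path>
ENTRY_RE = re.compile(r"^(\d{6}) ([0-9a-f]{40}) (\d)\t(.*)$")
# Assumed debug line: "  size: <n>\tflags: <hex>"
SIZE_RE = re.compile(r"^\s+size: (\d+)\b")


def find_smudged(debug_output):
    """Return paths whose cached size is 0 but whose blob is not the empty blob."""
    smudged = []
    current = None  # (sha1, path) of the entry whose debug lines are being read
    for line in debug_output.splitlines():
        entry = ENTRY_RE.match(line)
        if entry:
            current = (entry.group(2), entry.group(4))
            continue
        size = SIZE_RE.match(line)
        if size and current is not None:
            sha1, path = current
            # Size 0 with a non-empty blob means the entry was smudged,
            # not that the file is genuinely empty.
            if int(size.group(1)) == 0 and sha1 != EMPTY_BLOB_SHA1:
                smudged.append(path)
            current = None
    return smudged


def count_smudged_in_repo(repo_dir):
    """Run git in repo_dir and count smudged entries (requires command-line git)."""
    out = subprocess.run(
        ["git", "ls-files", "--debug", "-s"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return len(find_smudged(out))
```

Calling `count_smudged_in_repo(".")` from a working tree separates genuinely empty files from smudged entries, which a plain `grep "size: 0"` count does not do.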

On 05/30/2012 07:56 AM, Markus Duft wrote:
> Hey Kevin,
> 
> i can now confirm with current master:
> 
> i have this as starting point:
> 
> A--B
> 
> now i create a new branch from A and commit something
> 
>  --C
> /
> A--B
> 
> smudged count changes from ~195 (all empty files) to ~198.
> 
> then i rebase C on B
> 
> A--B--C'
> 
> the smudged count increases to 33994. a more intelligent grep (i mistakenly counted bin directories last time) showed that this is every single file that is tracked... bad?
> 
> re-indexing in the ide still does not take notably longer (maybe off by 2-5 seconds), but i suspect that my machine has a super-great filesystem cache (i have a 16-core workstation with 12G ram and fast discs).
> 
> Regards,
> Markus
> 
> On 05/30/2012 06:47 AM, Kevin Sawicki wrote:
>> It would be good to know if the latest master exhibits the same behavior.
>>
>> Kevin
>>
>> On Tue, May 29, 2012 at 9:41 PM, Markus Duft <markus.duft@xxxxxxxxxx> wrote:
>>
>>     master from 2.0.0.201204111131 - would an update to current master help..?
>>
>>     On 05/30/2012 06:40 AM, Kevin Sawicki wrote:
>>     > What version of EGit/JGit are you currently using?
>>     >
>>     > On Tue, May 29, 2012 at 9:34 PM, Markus Duft <markus.duft@xxxxxxxxxx> wrote:
>>     >
>>     >     On 05/30/2012 06:27 AM, Kevin Sawicki wrote:
>>     >     > Hopefully both fixes will still make it into the 2.0 release next month.
>>     >
>>     >     oh - interesting; i did a fetch and a rebase on the repo (just those two, both straightforward), and it gave me back all 33000 smudged index entries?!
>>     >
>>     >     Markus
>>     >
>>     >     >
>>     >     > On Tue, May 29, 2012 at 9:24 PM, Markus Duft <markus.duft@xxxxxxxxxx> wrote:
>>     >     >
>>     >     >     On 05/30/2012 06:22 AM, Kevin Sawicki wrote:
>>     >     >     > Smudged index entries occur when the index is written and the timestamp of the index file is close to the timestamp of the file in the working directory.
>>     >     >     >
>>     >     >     > You can update them by running a git status on the command line.
>>     >     >
>>     >     >     Thanks for the hint. Sadly i cannot force all my developers to have a command line git, although i recommend it. Still, since we're building our own JGit/EGit versions with some minor workarounds anyway, i may apply one of the two if applicable.
>>     >     >
>>     >     >     Regards,
>>     >     >     Markus
>>     >     >
>>     >     >     >
>>     >     >     > You can read more about it here: https://raw.github.com/git/git/master/Documentation/technical/racy-git.txt
>>     >     >     >
>>     >     >     > There are a couple JGit fixes proposed to prevent these from being left in the index:
>>     >     >     >
>>     >     >     > https://git.eclipse.org/r/#/c/6137/
>>     >     >     >
>>     >     >     > https://git.eclipse.org/r/#/c/6138/
>>     >     >     >
>>     >     >     > Just to clarify, there are very valid cases for having smudged index entries.  The issue with JGit currently is that they aren't being updated and removed when applicable.
>>     >     >     >
>>     >     >     > Kevin
>>     >     >     >
>>     >     >     > On Tue, May 29, 2012 at 9:16 PM, Markus Duft <markus.duft@xxxxxxxxxx> wrote:
>>     >     >     >
>>     >     >     >     On 05/29/2012 05:20 PM, Kevin Sawicki wrote:
>>     >     >     >     > Have you tried checking if your index contains smudged entries?
>>     >     >     >
>>     >     >     >     hm. not yet. how can smudged entries occur?
>>     >     >     >
>>     >     >     >     >
>>     >     >     >     > This will trigger a full SHA-1 redigest each time the indexing occurs.
>>     >     >     >
>>     >     >     >     that could explain why it takes so long.
>>     >     >     >
>>     >     >     >     >
>>     >     >     >     > From the command line: git ls-files --debug -s | grep -B5 "size: 0"
>>     >     >     >
>>     >     >     >     "a few" - approximately 33993 smudged entries - what can i do about it?
>>     >     >     >
>>     >     >     >     thanks for helping!
>>     >     >     >     Markus
>>     >     >     >
>>     >     >     >     >
>>     >     >     >     > If this command shows any output for files where the SHA-1 is not equal to "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391" then the index entry is smudged.
>>     >     >     >     >
>>     >     >     >     > Kevin
>>     >     >     >     >
>>     >     >     >     > On Tue, May 29, 2012 at 8:14 AM, Markus Duft <markus.duft@xxxxxxxxxx> wrote:
>>     >     >     >     >
>>     >     >     >     >     On 05/29/2012 03:41 PM, Baumgart, Jens wrote:
>>     >     >     >     >     > I assume there is no "easy" way to speed it up.
>>     >     >     >     >     > I could imagine JGit giving more details in its events (indexChanged and
>>     >     >     >     >     > refsChanged). This would allow to avoid complete re-indexing in many cases.
>>     >     >     >     >     > I am wondering why it's so slow for your repo. Re-indexing takes some
>>     >     >     >     >     > seconds for big repos like the linux kernel.
>>     >     >     >     >     > Do you store large binaries in your repo?
>>     >     >     >     >
>>     >     >     >     >     hm, not binaries. we have .xml.zip files (models) that are ~5-10MB in size, but what we also have are .xml files that are ~50MB in size, and not one but a dozen of them. could that matter?
>>     >     >     >     >
>>     >     >     >     >     any chance to find out /what/ takes so long?
>>     >     >     >     >
>>     >     >     >     >     Regards,
>>     >     >     >     >     Markus
>>     >     >     >     >
>>     >     >     >     >     > --
>>     >     >     >     >     > Jens
>>     >     >     >     >     >
>>     >     >     >     >     > On 29.05.12 15:34, "Markus Duft" <markus.duft@xxxxxxxxxx> wrote:
>>     >     >     >     >     >
>>     >     >     >     >     >> Hey!
>>     >     >     >     >     >>
>>     >     >     >     >     >> is there an "easy" (meaning not weeks of work) way to speed up the
>>     >     >     >     >     >> re-indexing of repositories? it takes approx. 2 minutes for our repo
>>     >     >     >     >     >> (~95177 files to scan) on a linux machine with _fast_ discs. Not to speak
>>     >     >     >     >     >> of our poor windows developers with notebooks (~5-10 minutes!)
>>     >     >     >     >     >>
>>     >     >     >     >     >> Regards,
>>     >     >     >     >     >> Markus
>>     >     >     >     >     >> _______________________________________________
>>     >     >     >     >     >> egit-dev mailing list
>>     >     >     >     >     >> egit-dev@xxxxxxxxxxx
>>     >     >     >     >     >> https://dev.eclipse.org/mailman/listinfo/egit-dev
>>     >     >     >     >     >
>>     >     >     >     >
>>     >     >     >     >
>>     >     >     >
>>     >     >     >
>>     >     >
>>     >     >
>>     >
>>     >
>>
>>
> _______________________________________________
> jgit-dev mailing list
> jgit-dev@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/jgit-dev
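
Kevin's explanation above (entries become smudged when the index is written at nearly the same time as the working-tree file was last modified) follows the racy-git rule: an entry whose mtime is not older than the index file's own timestamp cannot be trusted on stat data alone. The following Python sketch illustrates that comparison. The helper names are hypothetical, and it compares the file's current on-disk mtime rather than the mtime cached inside the index, so it only approximates what git or JGit actually check.

```python
import os


def is_racily_clean(entry_mtime_ns, index_mtime_ns):
    """True if the file was modified at or after the time the index was written."""
    return entry_mtime_ns >= index_mtime_ns


def racy_paths(work_tree, paths, index_file=None):
    """List the given paths (relative to work_tree) whose on-disk mtime is racy.

    Assumes a standard layout with the index at <work_tree>/.git/index
    unless index_file is given explicitly.
    """
    if index_file is None:
        index_file = os.path.join(work_tree, ".git", "index")
    index_mtime_ns = os.stat(index_file).st_mtime_ns
    return [
        p for p in paths
        if is_racily_clean(
            os.stat(os.path.join(work_tree, p)).st_mtime_ns, index_mtime_ns
        )
    ]
```

An operation that rewrites many files and the index within the same timestamp granularity, such as the rebase described earlier in the thread, would make a large share of tracked paths show up in `racy_paths`.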

