What is the difference in the INDEX file format compared to the Git bash? [message #1836940] Tue, 19 January 2021 11:20
Rice Bauer
Hi!

Whenever I use the Git Bash for some commands on a repo that I previously used with Eclipse, the Git Bash refreshes the index file, which takes pretty long on large repos.

Why is this and can I prevent it?

I've experienced this with various versions of Git for Windows and EGit. Currently I am on:
- Windows 10 1809
- Git for Windows 2.30.0.windows.2
- Eclipse 2020-12 (4.18.0)

Cheers
Rice
Re: What is the difference in the INDEX file format compared to the Git bash? [message #1836962 is a reply to message #1836940] Wed, 20 January 2021 10:57
Thomas Wolf
One difference I can think of might be file timestamps. Native resolution (which is what Git for Windows uses) is 100ns, but depending on the Java version you run Eclipse with, Java sees only microseconds at best on Windows. Compare OpenJDK bug JDK-8231174. Only with JDK 14 or later would Java see sub-microsecond resolution. So index entries written by JGit might have timestamps rounded/truncated to microseconds, while native git might see timestamps at 100ns granularity.

AFAIK that should be the only difference.
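
To make the suspected mismatch concrete, here is a minimal Python sketch (it is not JGit or Git code). It assumes the coarser reader truncates rather than rounds, and the helper names `truncate_ns` and `looks_modified` as well as the sample timestamp are made up for illustration. Whether a given Git build really compares the sub-second part is not something the sketch establishes.

```python
def truncate_ns(timestamp_ns: int, resolution_ns: int) -> int:
    """Drop everything finer than resolution_ns (truncation, not rounding)."""
    return timestamp_ns - (timestamp_ns % resolution_ns)


def looks_modified(recorded_ns: int, observed_ns: int) -> bool:
    """Naive exact comparison of a recorded mtime with a freshly observed one."""
    return recorded_ns != observed_ns


# Hypothetical file mtime: whole seconds plus 123456700 ns (100 ns granularity).
native_ns = 1_611_330_208 * 1_000_000_000 + 123_456_700

seen_by_native = truncate_ns(native_ns, 100)      # 100 ns resolution
seen_by_old_java = truncate_ns(native_ns, 1_000)  # microseconds at best

print(seen_by_native - seen_by_old_java)                  # sub-microsecond part that was lost
print(looks_modified(seen_by_old_java, seen_by_native))   # True if the two views disagree
```

If the recorded value lacks the last three digits and the comparison is exact, every such entry would look changed even though the file content is identical.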

What index format are you using? (What's the value of git config index.format, if set at all?)

Re: What is the difference in the INDEX file format compared to the Git bash? [message #1837114 is a reply to message #1836962] Fri, 22 January 2021 10:45
Rice Bauer
index.format is not set at all. Is this an official config? I didn't find it here: https://git-scm.com/docs/git-config
If you meant index.version: that isn't set either.

I also made an attempt with openjdk15, but the problem remains.

The index file generated by EGit is smaller than the one from Git for Windows: 3057 KB vs. 3229 KB in my current repo.
Re: What is the difference in the INDEX file format compared to the Git bash? [message #1837121 is a reply to message #1837114] Fri, 22 January 2021 12:12
Thomas Wolf
Yes, I meant index.version. But if not set then it's not that.

Index size can differ: there are a number of optional index extensions that generate additional data and that JGit doesn't implement.
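
One way to see which extensions an index file carries is to walk the file as laid out in Git's index-format documentation: a 12-byte header, the entries, then extensions (4-byte signature plus 32-bit size), then a trailing checksum. The Python sketch below is only an illustration under stated assumptions: it handles index versions 2 and 3 only, assumes a 20-byte SHA-1 checksum at the end, and the function name `list_extensions` is made up.

```python
import struct


def list_extensions(data: bytes) -> list:
    """Return [(signature, size), ...] for the extensions of a v2/v3 index."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC" or version not in (2, 3):
        raise ValueError("only index versions 2 and 3 are handled here")
    pos = 12
    for _ in range(count):
        # 40 bytes of stat data and a 20-byte object id precede the flags.
        flags = struct.unpack(">H", data[pos + 60:pos + 62])[0]
        name_start = pos + 62
        if version >= 3 and flags & 0x4000:  # extended flag: 2 more bytes
            name_start += 2
        name_end = data.index(b"\0", name_start)
        entry_len = name_end - pos
        # Entries are NUL-padded (1 to 8 bytes) to a multiple of 8 bytes.
        pos += (entry_len + 8) & ~7
    extensions = []
    end = len(data) - 20  # assumed trailing SHA-1 checksum
    while pos + 8 <= end:
        sig, size = struct.unpack(">4sI", data[pos:pos + 8])
        extensions.append((sig.decode("ascii", "replace"), size))
        pos += 8 + size
    return extensions


# Synthetic index: one entry named "a", one 4-byte TREE extension, zeroed checksum.
entry = b"\0" * 40 + b"\x11" * 20 + struct.pack(">H", 1) + b"a" + b"\0"
sample = (struct.pack(">4sII", b"DIRC", 2, 1) + entry
          + struct.pack(">4sI", b"TREE", 4) + b"abcd" + b"\0" * 20)
print(list_extensions(sample))
```

Running it over the bytes of an index written by EGit and one written by Git for Windows (read with `open(path, "rb").read()`) should show whether the size difference comes from extra extensions.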

What timestamps are recorded in the EGit index on JDK 15? You could list them either with the command-line JGit command jgit debug-show-dir-cache, or, if that doesn't work on Windows, with the little Java program I posted a while ago (needs JGit, of course).
Re: What is the difference in the INDEX file format compared to the Git bash? [message #1837122 is a reply to message #1837121] Fri, 22 January 2021 12:18
Thomas Wolf
Just for completeness: git config feature.manyFiles is also not set? If set to true it also would switch on use of index format 4.
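
Independent of any config setting, the version actually in use is recorded in the index file itself. According to Git's index-format documentation, the file starts with the signature "DIRC", a 4-byte version number and a 32-bit entry count, all in network byte order. A minimal Python sketch for reading that header (the function name `read_index_header` is made up for illustration):

```python
import struct
from typing import Tuple


def read_index_header(data: bytes) -> Tuple[int, int]:
    """Return (version, entry_count) from the first 12 bytes of an index file."""
    if len(data) < 12:
        raise ValueError("too short to be a Git index")
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("missing DIRC signature")
    return version, entry_count


# Synthetic header for demonstration: version 2, 3 entries.
sample_header = struct.pack(">4sII", b"DIRC", 2, 3)
print(read_index_header(sample_header))
```

For a real repository the input would be the first 12 bytes of the index file, for example `open(".git/index", "rb").read(12)`.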
Re: What is the difference in the INDEX file format compared to the Git bash? [message #1837145 is a reply to message #1837122] Fri, 22 January 2021 16:55
Rice Bauer
feature.manyFiles is not set either.

I also ran your tool over my three versions of the index file - here are the results for the root pom.xml and the index itself:

eclipse with corretto11:
pom.xml 2021-01-22T16:26:36.285702Z
Index has timestamp 2021-01-22T16:26:38.426601Z

eclipse with openjdk15:
pom.xml 2021-01-22T16:43:28.933222300Z
Index has timestamp 2021-01-22T16:43:31.012228Z

Git-for-Windows:
pom.xml 2021-01-22T16:46:45.705082400Z
Index has timestamp 2021-01-22T16:46:45.868082Z

With openjdk15 and Git-for-Windows, I do get precision 9, but not for every file. For some files, including the index, I get precision 6 only.
Re: What is the difference in the INDEX file format compared to the Git bash? [message #1837178 is a reply to message #1837145] Sat, 23 January 2021 17:14
Thomas Wolf
Re: "For some files, including the index, I get precision 6 only."

I see no reason for that. Perhaps the output doesn't report nanoseconds if they're zero. Some files will likely have timestamps that fall exactly on a microsecond. In any case we see that with Java 15, we do get 100ns resolution. Does Git-for-Windows also rebuild the index if it was modified by an Eclipse running with Java 15?
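
The "zero nanoseconds are not printed" explanation fits how java.time formats an ISO instant: the nano-of-second is documented to be output with zero, three, six or nine digits as necessary. The Python sketch below merely mimics that rule; it is an assumption that the tool used in this thread printed its timestamps this way, and the function name `format_fraction` is made up.

```python
def format_fraction(nanos: int) -> str:
    """Format a nano-of-second with 0, 3, 6 or 9 digits, dropping zero groups."""
    if not 0 <= nanos < 1_000_000_000:
        raise ValueError("nanos out of range")
    if nanos == 0:
        return ""
    digits = f"{nanos:09d}"
    # Remove trailing groups of three zeros, one group at a time.
    while digits.endswith("000"):
        digits = digits[:-3]
    return "." + digits


print(format_fraction(933_222_300))  # .933222300
print(format_fraction(12_228_000))   # .012228
```

Under this rule a 100ns-granular value only shows six digits when it happens to fall on a whole microsecond, which would explain why only some files appear with precision 6.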

I don't know why Git-for-Windows rebuilds the index. Analyzing this would require comparing the binary content of the indexes, figuring out what exactly are potentially significant differences, then digging through Git-for-Windows code to figure out why it might decide to rebuild the index, then digging through JGit code to figure out if there's something that could be done on that side to avoid this. As you can imagine this carries a high risk of being a time sink...
Re: What is the difference in the INDEX file format compared to the Git bash? [message #1837255 is a reply to message #1837178] Tue, 26 January 2021 10:18
Rice Bauer
Thomas Wolf wrote on Sat, 23 January 2021 12:14

I see no reason for that. Perhaps the output doesn't report nanoseconds if they're zero.

I agree, that's a probable reason. And I cannot remember any line with three zeros in the end.

Thomas Wolf wrote on Sat, 23 January 2021 12:14

Does Git-for-Windows also rebuild the index if it was modified by an Eclipse running with Java 15?

Yes, it does. It says "Refresh index..." and it takes like two minutes for my current repo. And afterwards, the index file differs completely according to WinMerge, BUT the file size remains almost identical. Git-for-Windows seems to only add its additional 200kB if I allow it to create the index from scratch by deleting the old one and saying "git reset --hard" (but that also applies to the Corretto-11-index).

Thomas Wolf wrote on Sat, 23 January 2021 12:14
... As you can imagine this carries a high risk of being a time sink...

Can imagine that, yes, but thanks anyway for your fast help here.
Re: What is the difference in the INDEX file format compared to the Git bash? [message #1837436 is a reply to message #1837255] Fri, 29 January 2021 21:17
Matthias Sohn
You can inspect the index using the git ls-files and jgit debug-show-dir-cache commands.

The first index entries for the jgit repository look like this (on macOS 11.1 using adoptopenjdk 1.8.0_275 to run the jgit command line) when shown by these commands:

jgit (master)]$ git ls-files --stage --debug
100644 e24be88639b36c35511be54605f47121881dc1a2 0 .bazelrc
  ctime: 1611952676:836469104
  mtime: 1611952676:836077774
  dev: 16777222 ino: 1223328223
  uid: 503 gid: 20
  size: 370 flags: 0
100644 8faff82c7b0bbf069c53162c726c5ddef4dabe9f 0 .bazelversion
  ctime: 1611952677:827111794
  mtime: 1611952677:826706063
  dev: 16777222 ino: 1223329009
  uid: 503 gid: 20
  size: 9 flags: 0
100644 f57840b7eec86b91b135afc40f20d950c5d86988 0 .gitattributes
  ctime: 1611952675:879064883
  mtime: 1611952675:878477372
  dev: 16777222 ino: 1223327665
  uid: 503 gid: 20
  size: 17 flags: 0
...

The format of the first line of each entry is:

<file mode bits> <sha1 of file blob> <stage> <repo relative path>

Detailed explanations can be found in [1] and [2].

jgit (master)]$ jgit debug-show-dir-cache
100644 370 2021-01-29,21:37:56.836077774 e24be88639b36c35511be54605f47121881dc1a2 0 .bazelrc
100644 9 2021-01-29,21:37:57.826706063 8faff82c7b0bbf069c53162c726c5ddef4dabe9f 0 .bazelversion
100644 17 2021-01-29,21:37:55.878477372 f57840b7eec86b91b135afc40f20d950c5d86988 0 .gitattributes
...

The format of entries shown by jgit is:

<file mode bits> <size> <mtime> <sha1 of file blob> <stage> <repo relative path>
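
The mtime shown by both commands is the same value in two notations: git prints seconds since the epoch and nanoseconds separated by a colon, jgit prints a local date and time. A small Python sketch of the conversion, assuming the jgit output above was produced in a UTC+1 time zone (that offset and the function name `jgit_style_mtime` are assumptions made for illustration):

```python
from datetime import datetime, timedelta, timezone


def jgit_style_mtime(ls_files_mtime: str, utc_offset_hours: int = 0) -> str:
    """Turn 'seconds:nanoseconds' into 'YYYY-MM-DD,HH:MM:SS.nnnnnnnnn'."""
    seconds_text, nanos_text = ls_files_mtime.split(":")
    zone = timezone(timedelta(hours=utc_offset_hours))
    moment = datetime.fromtimestamp(int(seconds_text), tz=zone)
    # Pad the nanoseconds to nine digits so short values keep their magnitude.
    return f"{moment:%Y-%m-%d,%H:%M:%S}.{int(nanos_text):09d}"


# mtime of .bazelrc from the git ls-files output
print(jgit_style_mtime("1611952676:836077774", utc_offset_hours=1))
# prints 2021-01-29,21:37:56.836077774
```

For these entries both tools report identical nanosecond values, so on that machine the two views of the mtime agree.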

The resolution of file timestamps seen from Java depends on the operating system, the Java version, and the filesystem used. See [3].

JGit's DirCache does not use the following index fields:
- ctime
- dev
- ino
- uid
- gid
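
To look at these fields in a raw index file: per Git's index-format documentation, every entry starts with ten 32-bit big-endian values (ctime seconds and nanoseconds, mtime seconds and nanoseconds, dev, ino, mode, uid, gid, size). The Python sketch below decodes them from the first 40 bytes of an entry. The names `STAT_FIELDS` and `read_stat_fields` are made up for illustration, and the sketch says nothing about what JGit writes into the fields it does not use.

```python
import struct
from typing import Dict

STAT_FIELDS = ("ctime_s", "ctime_ns", "mtime_s", "mtime_ns",
               "dev", "ino", "mode", "uid", "gid", "size")


def read_stat_fields(entry: bytes) -> Dict[str, int]:
    """Decode the ten 32-bit big-endian stat fields at the start of an entry."""
    if len(entry) < 40:
        raise ValueError("entry shorter than the 40-byte stat block")
    values = struct.unpack(">10I", entry[:40])
    return dict(zip(STAT_FIELDS, values))


# Synthetic stat block built from the .bazelrc values quoted above.
sample_entry = struct.pack(">10I", 1611952676, 836469104, 1611952676, 836077774,
                           16777222, 1223328223, 0o100644, 503, 20, 370)
fields = read_stat_fields(sample_entry)
not_used_by_jgit = {name: fields[name]
                    for name in ("ctime_s", "ctime_ns", "dev", "ino", "uid", "gid")}
print(not_used_by_jgit)
```

Comparing these values between an index written by EGit and one written by native Git might be one starting point for the binary comparison discussed earlier in the thread.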

[1] https://github.com/git/git/blob/master/Documentation/technical/index-format.txt
[2] https://mincong.io/2018/04/28/git-index/
[3] https://www.youtube.com/watch?v=m44cAozuLNI
https://speakerdeck.com/msohn/racy-jgit-a-short-history-of-time