Eclipse Community Forums
Performance issue to a shared network drive from Windows. [message #1850980] Wed, 23 March 2022 20:51
James Poli
Messages: 27
Registered: January 2022
Junior Member
Hello,
I have a performance issue when using EGit/JGit (6.0.0.202111291000-r) to clone a project from Windows to a shared network drive. The network share is historical, used by a Linux build farm environment due to our colossal codebase. The Git repository is about 46 MB, and the clone takes approximately 47 minutes. Cloning to a local drive takes approximately 30 seconds. My first inclination was to contact our IT department to check the performance of the network share (it's either NetApp or Samba, I'm not sure). But before that, I tried Git for Windows from the command line, and it took approximately 18 seconds to the network share and 1.5 seconds locally. I looked at the EGit/JGit log output and saw what appears to be a constant check, in a class called FileSnapshot, of .gitconfig in my local HOME directory and of the local Git directory. Any help is appreciated.
Regards,
Jim
Re: Performance issue to a shared network drive from Windows. [message #1850989 is a reply to message #1850980] Thu, 24 March 2022 06:40
Thomas Wolf
Messages: 576
Registered: August 2016
Senior Member
Having a git repository on a Samba disk in Windows is not a good idea at all. The whole idea of git is to have local clones.

What operation is taking 47 minutes? Cloning?

There's not much we can do. In EGit we managed to reduce the number of times the git config is accessed for cases where the home directory (and thus the user config (global config)) is on a remote drive, which is fairly common in corporate setups. JGit doesn't have that mechanism.

It might be possible to figure out some improvements, but all git operations are file intensive. With a clone on a slow drive, all git operations will be slow, especially on Windows, where even checking for file existence is slow.

It's really best to put clones on a local drive.
Re: Performance issue to a shared network drive from Windows. [message #1851014 is a reply to message #1850989] Thu, 24 March 2022 13:46
James Poli
I agree on the Samba location for Git. The original use case used CVS, but projects have migrated to Git.

Yes, it's a clone that's taking 47 minutes.

At this point, I was looking for simple improvements like turning off some configuration validation that JGit implements (compared to Git for Windows, whose timing is tolerable).
Re: Performance issue to a shared network drive from Windows. [message #1851016 is a reply to message #1851014] Thu, 24 March 2022 15:02
Thomas Wolf
Nope, there is no simple setting to switch off some file accesses. Besides, I suspect if you have the git clone on the Samba drive, you'll also have the git working tree there. That will not only make nearly all git operations slow, it will also make Eclipse in general slow.
Re: Performance issue to a shared network drive from Windows. [message #1851022 is a reply to message #1851016] Thu, 24 March 2022 18:48
James Poli
Alas, I was hoping JGit had some magic configuration option. Do you know why Git for Windows to the samba drive performance is better? My best option may be to support the plug-in on Linux. Thanks for looking at this, Thomas. Jim

[Updated on: Thu, 24 March 2022 19:18]
Re: Performance issue to a shared network drive from Windows. [message #1851031 is a reply to message #1851022] Fri, 25 March 2022 07:51
Thomas Wolf
Quote:
Do you know why Git for Windows to the samba drive performance is better?

No, I don't know.

My first guess is that JGit accesses the git config file far too often. It does cache the config, but on each call to Repository.getConfig() it checks whether the config has changed on disk. That means getting the file attributes of at least three files: the repo config, the global config, and the system config. Accessing file attributes is expensive on Windows, and I imagine it's even more expensive with Samba.
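
To make that cost concrete, here is a minimal Python sketch of a cache that re-validates against disk on every access. It is not JGit's code: `CachedConfig`, `count_lookups`, `demo`, and the three file names are hypothetical and only illustrate the pattern of one file-attribute lookup per config file per access, even when nothing has changed.

```python
import os
import tempfile


class CachedConfig:
    """Hypothetical config cache that re-validates against disk on every access."""

    def __init__(self, path):
        self.path = path
        self.stat_calls = 0    # file-attribute lookups issued
        self.loads = 0         # times the file content was actually read
        self._snapshot = None  # (mtime_ns, size) seen at the last load
        self._text = None

    def get(self):
        st = os.stat(self.path)  # one attribute lookup per call, even on a cache hit
        self.stat_calls += 1
        snapshot = (st.st_mtime_ns, st.st_size)
        if snapshot != self._snapshot:
            # The file looks different from what was cached: read it again.
            with open(self.path, encoding="utf-8") as f:
                self._text = f.read()
            self._snapshot = snapshot
            self.loads += 1
        return self._text


def count_lookups(paths, accesses):
    """Return (attribute lookups, file loads) when each access consults every file."""
    configs = [CachedConfig(p) for p in paths]
    for _ in range(accesses):
        for cfg in configs:
            cfg.get()
    return (sum(c.stat_calls for c in configs), sum(c.loads for c in configs))


def demo(accesses=1000):
    """Create three stand-in config files (assumed names) and count the lookups."""
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for name in ("repo.config", "global.config", "system.config"):
            p = os.path.join(d, name)
            with open(p, "w", encoding="utf-8") as f:
                f.write("[core]\n\tbare = false\n")
            paths.append(p)
        return count_lookups(paths, accesses)
```

In this sketch, `demo(1000)` reads each file once but still issues 3000 attribute lookups. If each lookup is a network round trip instead of a local call, that per-access overhead dominates.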

My second guess would be that file timestamp resolution on a Samba drive is coarse, and JGit thus frequently considers files to be "racily clean" and re-loads them "just in case". ("Racily clean" basically means: if a file is modified twice within the file timestamp resolution, it may have been modified even if timestamp and size are the same, and the file must be re-loaded to catch it.)
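
A simplified Python sketch of that idea follows. It assumes timestamps are truncated to the file system's resolution and that a snapshot counts as racily clean when the file was read less than one resolution interval after its stored modification time. The function names and the 2000 ms resolution used below are illustrative assumptions, not JGit's implementation or a measured value for any Samba share.

```python
def stored_mtime(write_time_ms, resolution_ms):
    """Modification time as the file system records it, truncated to its resolution."""
    return write_time_ms - write_time_ms % resolution_ms


def is_racily_clean(mtime_ms, last_read_ms, resolution_ms):
    """True if a later write could still receive the same stored timestamp.

    A write shortly after the read would be invisible in the timestamp, so the
    cached content cannot be trusted and must be re-loaded on the next check.
    """
    return last_read_ms - mtime_ms < resolution_ms


def count_reloads(write_time_ms, read_times_ms, resolution_ms):
    """Count the reads whose snapshot is racily clean and so forces a later re-load."""
    mtime = stored_mtime(write_time_ms, resolution_ms)
    return sum(1 for t in read_times_ms if is_racily_clean(mtime, t, resolution_ms))
```

For a file written at 10,500 ms and then read every 100 ms for two seconds (`[10_600 + 100 * i for i in range(20)]`), `count_reloads` gives 14 with an assumed 2000 ms resolution and 0 with a 1 ms resolution. The coarser the timestamps, the longer a freshly written file stays suspect.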

But there may be other reasons why JGit is especially slow in this case.
Quote:
My best option may be to support the plug-in on Linux.

What plug-in?

You wrote the Samba share was because of a build system. Most build systems I've seen are de-coupled from repository management. In the simplest case, there's a git server hosting "canonical" central repositories. Developers clone from there, and then work with their local clones. When they push, their changes go into the central repositories on the git server and the CI build is triggered. The CI build is just another client of that git server and clones the repository or repositories it needs for its build, and then works with these local clones on the build machine.

If cloning during the build should not be possible, you could maybe share the directory where the git server holds the central repositories with the build machine and make the build work directly from those central repositories. But I've never come across such a case (and I have seen projects with "colossal codebases", too). The build can use shallow clones to speed up cloning.

Re: Performance issue to a shared network drive from Windows. [message #1851175 is a reply to message #1851031] Tue, 29 March 2022 12:43
James Poli
Hi Thomas,
Sorry for the delay in response; I'm out sick and will reply when I get back to work.
Regards,
Jim
Re: Performance issue to a shared network drive from Windows. [message #1851239 is a reply to message #1851175] Thu, 31 March 2022 14:37
James Poli
Hi Thomas,
I'm back at work.

FWIW, I see many "Racily clean" messages in the JGit output. I believe it is the culprit. I need to verify that our network share in this use case is Samba, but your post got me thinking. I found this information on Samba time synchronization: https://www.samba.org/samba/docs/using_samba/ch11.html. Note specifically "Time-Synchronization Options" > "dos filetime resolution".

The plug-in I'm referring to is our internal company CDT add-on plug-in. Our "C" build system (farm) is de-coupled from our source control. Unfortunately, it only supports local Unix development playpens, so a network share is required. A developer can run a "remote build" to the build system on Unix to verify their changes.
Regards,
Jim
Re: Performance issue to a shared network drive from Windows. [message #1851252 is a reply to message #1851239] Thu, 31 March 2022 19:00
Thomas Wolf
I'm beginning to see how your environment is set up. I once had to set up such a system, and my initial idea indeed was similar: use a Samba share to give Windows users access. I gave up quickly.

Instead we abandoned the idea that a Windows developer could build locally. (Our application targeted Linux only anyway (and Ada 95, not C), but developers had either Windows PCs/laptops or terminal access via ssh on a Linux machine.) The build was Linux-only, and ran on Linux.

We used a git server with great pre-commit review and build support: Gerrit. We installed Gerrit and Jenkins in our Linux environment (different boxes). Windows users could clone from Gerrit, develop, push their changes to Gerrit. Gerrit would trigger a Jenkins job on the Linux build machine that would clone the repo, check out the change, build it and run tests. Users get e-mails about the build and test result, and can see build logs in the Jenkins Web UI. Once successfully built and approved in the Gerrit code review, the change could be merged in the Gerrit Web UI, which triggered another job to build the product.

Users who preferred to use vim in a Linux terminal could SSH in to Linux, use the git command line to clone into their Linux directory, work there, commit, and also push to Gerrit. Since those users were on Linux in the terminal, they could also compile their code manually, or even run tests. At a later stage we got a VNC server installed because we needed to be able to run UI tests in the Linux build, so one could even tunnel a VNC session through SSH to Linux and start Eclipse on the Linux machine, but have the window on the PC and work that way. That way users got a Linux UI via remote desktop, and could even manually compile their code in a Linux terminal, or run tests there. The UI was minimal (X Windows and mwm as window manager) but fully sufficient.

All rather hodge-podge, but it worked very well.
Re: Performance issue to a shared network drive from Windows. [message #1851259 is a reply to message #1851252] Thu, 31 March 2022 22:40
James Poli
Hi Thomas,
I'm planning to investigate our Samba configuration. So that I'm on the same page, can you give me a use case for the Racily clean algorithm?
Thanks in advance,
Jim
Re: Performance issue to a shared network drive from Windows. [message #1851304 is a reply to message #1851259] Fri, 01 April 2022 19:55
Thomas Wolf
See https://github.com/git/git/blob/master/Documentation/technical/racy-git.txt .
Re: Performance issue to a shared network drive from Windows. [message #1851724 is a reply to message #1851304] Wed, 13 April 2022 19:04
James Poli
Hi Thomas,
Thanks for the link. Could you do a quick review of a console log of the clone to Samba? It's about 9 MB (46,915 lines). If so, let me know where I can upload it.
Regards,
Jim
Re: Performance issue to a shared network drive from Windows. [message #1851735 is a reply to message #1851724] Thu, 14 April 2022 08:47
Thomas Wolf
Jim, I can take a quick look to see whether I notice something specific, but I cannot invest hours in log analysis. Is that 9 MB zipped? If 9 MB is the unzipped size, just zip it and send it via e-mail (the author address in my git commits works). Even if 9 MB is the zipped size, that might work; if not, upload it wherever you want and send me a download link. If you don't have any location to upload to, open a bug in Bugzilla and attach the file there; I think 9 MB is below the limit there.
Re: Performance issue to a shared network drive from Windows. [message #1851740 is a reply to message #1851735] Thu, 14 April 2022 12:27
James Poli
Hi Thomas,
I sent an e-mail with the zipped log (263 KB) to your committer address. Don't spend a lot of time on it; I'm just hoping you'll see something obvious.
Thanks, Jim
Re: Performance issue to a shared network drive from Windows. [message #1851744 is a reply to message #1851740] Thu, 14 April 2022 15:47
Thomas Wolf
Thanks for the log. It's interesting. I've sent you a reply; I suggest opening a bug. Indeed, it appears that in some late clone phase (delta resolution? checkout?) there are far too many accesses to the git config on disk.
Re: Performance issue to a shared network drive from Windows. [message #1851792 is a reply to message #1851744] Fri, 15 April 2022 18:19
Thomas Wolf
Thanks for opening bug 579715.