JGit: Semantics of diff hunks versus git commandline [message #1795792] Fri, 28 September 2018 16:46
Dexter Haslem
I've been looking at a particular commit for which JGit produces different modified hunks than `git diff -p` from the git executable. The resulting patch is correct, but the lines it marks as added and removed differ. This is interesting to me because the other commits I diff with JGit (set to Myers) match the git executable's diffs around 90% of the time, so I was wondering whether there are any knobs to tweak that affect which lines it adds/removes in this scenario.

Here is a comparison of the diffs from DiffFormatter and git commandline -

JGit DiffFormatter diff:
diff --git a/foo.sh b/foo.sh
index d7d320e..3639f73 100644
--- a/foo.sh
+++ b/foo.sh
@@ -14,8 +14,15 @@
 echo `pip install -r src/foo_lib/requirements.pip`
 echo `pip install -r src/foo2/requirements.pip`
 echo
+echo "Updating Django models/data..."
+echo `python src/foo1/manage.py syncdb`
+echo
+echo "Updating NPM packages..."
+cd $WORKDIR/www/jsapp
+echo `make`
+echo `make install`
+echo
 echo "Running Grunt to build JS modules..."
-cd $WORKDIR/www/jsapp
 echo " -- executing: grunt build (from $WORKDIR/www/jsapp/)"
 echo `grunt build`
 echo


For comparison here is what `git diff` of it looks like:
diff --git a/build_foo.sh b/build_foo.sh
index caca979..5ca34f7 100644
--- a/build_foo.sh
+++ b/build_foo.sh
@@ -14,8 +14,15 @@ echo `pip install -r src/metadb/requirements.pip`
 echo `pip install -r src/foo_lib/requirements.pip`
 echo `pip install -r src/foo2/requirements.pip`
 echo
-echo "Running Grunt to build JS modules..."
+echo "Updating Django models/data..."
+echo `python src/foo1/manage.py syncdb`
+echo
+echo "Updating NPM packages..."
 cd $WORKDIR/www/jsapp
+echo `make`
+echo `make install`
+echo
+echo "Running Grunt to build JS modules..."
 echo " -- executing: grunt build (from $WORKDIR/www/jsapp/)"
 echo `grunt build`
 echo

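Notice that native git leaves the "cd $WORKDIR..." line in place as context, whereas JGit adds it earlier and removes it from its original position.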
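One way to confirm that the two hunks above describe the same change is to rebuild the old and new sides from each hunk body and compare them. The following is a minimal stdlib-only sketch, not part of JGit or git; the class name `HunkSides` and its fields are hypothetical names chosen for illustration, and the hunk lines are copied from the two diffs above.

```java
import java.util.ArrayList;
import java.util.List;

/** Rebuilds the old and new sides of a unified-diff hunk body and counts its edits. */
class HunkSides {
    // Hunk body as printed by JGit's DiffFormatter (copied from the post).
    static final String[] JGIT_HUNK = {
        " echo `pip install -r src/foo_lib/requirements.pip`",
        " echo `pip install -r src/foo2/requirements.pip`",
        " echo",
        "+echo \"Updating Django models/data...\"",
        "+echo `python src/foo1/manage.py syncdb`",
        "+echo",
        "+echo \"Updating NPM packages...\"",
        "+cd $WORKDIR/www/jsapp",
        "+echo `make`",
        "+echo `make install`",
        "+echo",
        " echo \"Running Grunt to build JS modules...\"",
        "-cd $WORKDIR/www/jsapp",
        " echo \" -- executing: grunt build (from $WORKDIR/www/jsapp/)\"",
        " echo `grunt build`",
        " echo",
    };

    // Hunk body as printed by `git diff` (copied from the post).
    static final String[] GIT_HUNK = {
        " echo `pip install -r src/foo_lib/requirements.pip`",
        " echo `pip install -r src/foo2/requirements.pip`",
        " echo",
        "-echo \"Running Grunt to build JS modules...\"",
        "+echo \"Updating Django models/data...\"",
        "+echo `python src/foo1/manage.py syncdb`",
        "+echo",
        "+echo \"Updating NPM packages...\"",
        " cd $WORKDIR/www/jsapp",
        "+echo `make`",
        "+echo `make install`",
        "+echo",
        "+echo \"Running Grunt to build JS modules...\"",
        " echo \" -- executing: grunt build (from $WORKDIR/www/jsapp/)\"",
        " echo `grunt build`",
        " echo",
    };

    final List<String> oldLines = new ArrayList<>();
    final List<String> newLines = new ArrayList<>();
    int added;
    int removed;

    HunkSides(String[] hunkBody) {
        for (String line : hunkBody) {
            String text = line.substring(1);
            switch (line.charAt(0)) {
                case '+': newLines.add(text); added++; break;   // only on the new side
                case '-': oldLines.add(text); removed++; break; // only on the old side
                default:  oldLines.add(text); newLines.add(text); // context: on both sides
            }
        }
    }

    public static void main(String[] args) {
        HunkSides jgit = new HunkSides(JGIT_HUNK);
        HunkSides git = new HunkSides(GIT_HUNK);
        System.out.println("JGit: +" + jgit.added + " -" + jgit.removed);
        System.out.println("git:  +" + git.added + " -" + git.removed);
        System.out.println("same old side: " + jgit.oldLines.equals(git.oldLines));
        System.out.println("same new side: " + jgit.newLines.equals(git.newLines));
    }
}
```

If both hunks rebuild to the same old and new content and report the same edit counts, they are equally valid patches that differ only in which lines are treated as unchanged.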

Below is a standalone unit test that creates a repo and commits and prints the diff above.
    @Test
    public void jgitDiffFormatterTest() throws IOException, GitAPIException
    {
        File localPath = File.createTempFile("test-case-repo", "");
        if (!localPath.delete())
        {
            throw new IOException("Could not delete temporary file " + localPath);
        }

        try
        {
            try (Git git = Git.init().setDirectory(localPath).call())
            {
                ClassLoader testClassReloader = this.getClass().getClassLoader();

                // add file initial commit
                final String testFileName = "foo.sh";
                File dstFile = new File(localPath, testFileName);

                File srcFile = new File(testClassReloader.getResource("resources/jgit-test-cases/testfile1.sh").getFile());
                assertTrue(srcFile.exists());

                FileUtils.copyFile(srcFile, dstFile);
                git.add().addFilepattern(testFileName).call();
                git.commit().setMessage("add test file").call();

                // change file to create diff
                File srcFileUpdated = new File(testClassReloader.getResource("resources/jgit-test-cases/testfile2.sh").getFile());
                assertTrue(srcFileUpdated.exists());

                FileUtils.copyFile(srcFileUpdated, dstFile);
                git.add().addFilepattern(testFileName).call();
                git.commit().setMessage("updated test file").call();

                Repository repo = git.getRepository();
                // now do the equiv of `git diff -p HEAD~1 HEAD` to diff the only two commits
                // and print the diff patch
                ObjectId oldHead = repo.resolve("HEAD^^{tree}");
                ObjectId head = repo.resolve("HEAD^{tree}");

                try (ObjectReader reader = repo.newObjectReader())
                {
                    CanonicalTreeParser oldTreeIter = new CanonicalTreeParser();
                    oldTreeIter.reset(reader, oldHead);
                    CanonicalTreeParser newTreeIter = new CanonicalTreeParser();
                    newTreeIter.reset(reader, head);

                    // finally get the list of changed files
                    List<DiffEntry> diffs = git.diff()
                        .setNewTree(newTreeIter)
                        .setOldTree(oldTreeIter)
                        .call();

                    // pass through the formatter; we don't bother with a full-blown rename detector here
                    // for a simple one-file example
                    try (OutputStream tmpOutput = new ByteArrayOutputStream(2048))
                    {
                        DiffFormatter formatter = new DiffFormatter(tmpOutput);
                        formatter.setRepository(repo);
                        formatter.setDiffComparator(RawTextComparator.WS_IGNORE_ALL);
                        formatter.setDiffAlgorithm(DiffAlgorithm.getAlgorithm(SupportedAlgorithm.MYERS));
                        formatter.setDetectRenames(true);
                        formatter.getRenameDetector().setRenameScore(50);
                        formatter.format(diffs);

                        String diffPatchString = tmpOutput.toString();

                        System.out.println("** printing commit DIFF:");
                        System.out.println(diffPatchString);
                    }
                }
            }
        }
        finally
        {
            FileUtils.deleteDirectory(localPath);
        }
    }



Appreciate any feedback.
Re: JGit: Semantics of diff hunks versus git commandline [message #1796028 is a reply to message #1795792] Thu, 04 October 2018 09:55
Christian Halstrick
Creating a minimal diff between two contents is not a task with a single correct solution. In this example both tools, native git and JGit, find different diffs, both of which consist of adding 8 lines and deleting 1 line. Both are correct in the sense that, when applied, they lead from the old content to the new content. It's hard to determine what the 'better' diff is. At least when there is a tie between two diffs of the same size, I personally would prefer that JGit report the same diff as native git. But it is a hard task to teach the algorithms used in JGit (in MyersDiff.java or HistogramDiff.java) to behave exactly like native git.

To be honest: I have no good solution to your problem. You can in certain situations expect different diffs from JGit and native git. See the discussion from 8 years ago on that topic [1].
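To illustrate how two correct implementations can disagree, here is a small sketch of a textbook LCS-based diff with a configurable tie-break. This is an assumption-laden illustration only: it is not the Myers or histogram algorithm, and it is not how JGit or native git are implemented. The class name `TieBreakDiff`, the `preferDelete` flag, and the shortened line labels are hypothetical; the input is a reduced version of the situation in this thread, where either the "cd" line or the "grunt" line can serve as the unchanged anchor.

```java
import java.util.ArrayList;
import java.util.List;

/** Textbook LCS diff whose output depends on how ties between equally good choices are broken. */
class TieBreakDiff {

    static List<String> diff(String[] a, String[] b, boolean preferDelete) {
        int n = a.length, m = b.length;
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? 1 + lcs[i + 1][j + 1]
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        List<String> script = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n || j < m) {
            if (i < n && j < m && a[i].equals(b[j])) {
                script.add(" " + a[i]);          // unchanged line
                i++;
                j++;
            } else if (j == m) {
                script.add("-" + a[i++]);        // only old lines remain
            } else if (i == n) {
                script.add("+" + b[j++]);        // only new lines remain
            } else {
                int afterDelete = lcs[i + 1][j];
                int afterInsert = lcs[i][j + 1];
                // When both choices keep the same number of common lines, the flag decides.
                boolean delete = afterDelete > afterInsert
                        || (afterDelete == afterInsert && preferDelete);
                if (delete) {
                    script.add("-" + a[i++]);
                } else {
                    script.add("+" + b[j++]);
                }
            }
        }
        return script;
    }

    public static void main(String[] args) {
        String[] oldLines = {"ctx", "grunt", "cd", "exec"};
        String[] newLines = {"ctx", "django", "cd", "make", "grunt", "exec"};
        System.out.println(diff(oldLines, newLines, true));
        System.out.println(diff(oldLines, newLines, false));
    }
}
```

With `preferDelete` set to true the "cd" line stays as context, and with false the "grunt" line does. Both scripts contain three additions and one deletion, so neither is smaller than the other; the difference comes purely from the tie-break, which is the kind of implementation detail that is hard to align between two independent code bases.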

What you could do is explicitly switch to other diff algorithms in JGit and in native git to see whether they match better. The JGit diff command knows "[--algorithm [MYERS | HISTOGRAM]]" and native git knows "--diff-algorithm={patience|minimal|histogram|myers}". You may try setting both explicitly to myers or histogram.

Sorry, your test code doesn't work for me. I guess I am missing the resources (testfile1.sh, ...). Maybe you could upload a change to Gerrit?


[1] https://www.eclipse.org/lists/jgit-dev/msg00752.html

Ciao
Chris
Re: JGit: Semantics of diff hunks versus git commandline [message #1796140 is a reply to message #1795792] Sat, 06 October 2018 02:03
Dexter Haslem
Sorry about the test resources, I forgot about those. There is a repo here with exactly the same diff: https://github.com/DexterHaslem/test-case-repo

I forgot to mention: I have tried explicitly setting both native git and JGit to the same algorithm, for both Myers and Histogram, and they never matched 100%, as you suspected. In this case, there are no changes whatsoever between the algorithms anyhow. Of course, you are right: neither the git nor the JGit diff is 'better'.

Thanks for the link to the previous discussion; I completely understand the situation now. I just wanted to make sure I hadn't missed any other way to bring diff generation closer to native git.
Re: JGit: Semantics of diff hunks versus git commandline [message #1817160 is a reply to message #1796140] Sun, 17 November 2019 02:30
Himani Khanduja
@Dexter Haslem,

I'm also running into a similar problem. Did you find a good way to get the same diffs as native Git in Java?