Re: [jgit-dev] How to update index and working-tree?

Christian Halstrick <christian.halstrick@xxxxxxxxx> wrote:
> if I feed a NameConflictTreeWalk with CanonicalTreeParsers (for head
> and merge) and a DirCacheBuildIterator for the index then
> DirCacheBuildIterator will also report entries for trees.

Yes, it creates stub entries for each tree, to make its flat
namespace compatible with the non-flat namespace used by the
canonical tree format.


> But these
> tree entries don't have a sha1 computed.

They may, they may not.  It depends on whether or not the 'TREE'
cache extension is present and has a valid SHA-1 for the given
tree path.  *IF* you have a valid SHA-1 computed for the tree,
you are assured that the entire subtree in the index is unmodified
(however this says nothing about whether or not a working tree file
is stat-dirty relative to its index entry).

However, yes, the natural state for trees right now is to have no
SHA-1, because JGit usually strips the 'TREE' extension out of the
index, so it's not available to the DirCache code during traversal.


> Means: I can't compare the
> index tree with e.g. a tree in head to properly act on D/F conflicts.

You can compare their modes to detect a D/F conflict.  But you
can't rely on the SHA-1 to tell you a D/F conflict.
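
A minimal sketch of that mode comparison, using plain JDK code rather
than JGit itself. The class and method names are hypothetical; the two
constants are Git's standard object-type bits (JGit exposes the same
values as FileMode.TYPE_MASK and FileMode.TYPE_TREE, and the raw mode
would presumably come from TreeWalk.getRawMode(n)). A mode of 0 is
assumed here to mean "path absent on this side".

```java
/** Sketch: detect a directory/file (D/F) conflict from raw Git mode bits. */
final class DfConflictSketch {
    // Object-type bits of a Git mode, as stored in trees and the index.
    static final int TYPE_MASK = 0170000;
    static final int TYPE_TREE = 0040000;
    // Assumption for this sketch: 0 means the path does not exist on that side.
    static final int MISSING = 0;

    static boolean isTree(int rawMode) {
        return (rawMode & TYPE_MASK) == TYPE_TREE;
    }

    /** True when both sides have the path but exactly one of them is a tree. */
    static boolean isDirectoryFileConflict(int modeA, int modeB) {
        if (modeA == MISSING || modeB == MISSING) {
            return false; // an add or delete, not a D/F conflict
        }
        return isTree(modeA) != isTree(modeB);
    }
}
```

No SHA-1 is needed for this test, so it works even when the index
tree stub carries no id.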



> Is there an easy way to compute the SHA1 on such trees?

Not really; it's a recursive problem.  In order to compute the
SHA-1 of this tree, you must first compute the SHA-1 of any subtree
it contains.  Doing that requires constructing the canonical tree
encoding for each tree, which isn't free.
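
To make the recursion concrete, here is a rough sketch in plain JDK
code.  The Node type and all method names are hypothetical stand-ins,
not JGit API.  It assumes the usual canonical tree encoding
("<octal mode> <name>\0<20 raw id bytes>" per entry, entries ordered
by unsigned bytes with tree names compared as if they ended in '/')
and the usual "<type> <length>\0" object header.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: bottom-up SHA-1 of an in-memory tree (hypothetical types). */
final class TreeIdSketch {
    static final int MODE_TREE = 0040000;
    static final int MODE_FILE = 0100644;

    static final class Node {
        final int mode;
        final byte[] blobId; // raw 20-byte id for a file; null for a tree
        final Map<String, Node> children = new HashMap<>();

        Node(int mode, byte[] blobId) {
            this.mode = mode;
            this.blobId = blobId;
        }

        boolean isTree() {
            return mode == MODE_TREE;
        }
    }

    /** Hash "<type> <length>\0<body>", the way Git names an object. */
    static byte[] hashObject(String type, byte[] body) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update((type + " " + body.length).getBytes(StandardCharsets.US_ASCII));
            md.update((byte) 0);
            md.update(body);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sort key: the entry name, with '/' appended when the entry is a tree. */
    static byte[] sortKey(Map.Entry<String, Node> e) {
        String name = e.getValue().isTree() ? e.getKey() + "/" : e.getKey();
        return name.getBytes(StandardCharsets.UTF_8);
    }

    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return a.length - b.length;
    }

    static byte[] treeId(Node tree) {
        List<Map.Entry<String, Node>> entries = new ArrayList<>(tree.children.entrySet());
        entries.sort((x, y) -> compareUnsigned(sortKey(x), sortKey(y)));
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Map.Entry<String, Node> e : entries) {
            Node child = e.getValue();
            // The recursive step: a subtree's id must exist before its parent is encoded.
            byte[] id = child.isTree() ? treeId(child) : child.blobId;
            byte[] head = (Integer.toOctalString(child.mode) + " " + e.getKey())
                    .getBytes(StandardCharsets.UTF_8);
            body.write(head, 0, head.length);
            body.write(0);
            body.write(id, 0, id.length);
        }
        return hashObject("tree", body.toByteArray());
    }

    static String hex(byte[] id) {
        StringBuilder sb = new StringBuilder();
        for (byte b : id) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

Every call re-encodes and re-hashes each tree below it, which is the
cost being referred to; a real implementation would want to cache ids
per subtree.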

DirCacheTree doesn't compute its ObjectId when DirCacheBuildIterator
invokes DirCache.getTree(true).  So if there was no 'TREE' extension
in the index, we have no SHA-1 to hand you.

*IF* it makes sense performance-wise, we could modify this code to
compute the SHA-1 on the fly.  To do that we need to disconnect the
idea of isValid() from the id, and ensure that the id doesn't get
written back out to the 'TREE' extension unless isValid() is true.

  Aside: The 'TREE' extension requires that the SHA-1 of a tree only
  appear in the extension if there are no unmerged paths beneath it
  (only stage 0 present) and the corresponding tree object appears
  in the repository.  We can compute it in memory if we choose,
  but we can't write it to disk without also writing the tree.

IMHO, assume that sometimes you don't have the tree's SHA-1 when
coming from the DirCache.  If it's not present, you need to dive
into that subtree and process every path.  If it is present,
and the SHA-1s are identical, you may be able to skip that entire
subtree and move past.  This is an optimization that git-core uses,
and one I hope I built TreeWalk to support...
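
The skip-or-descend decision can be sketched as follows, again in
plain JDK code with hypothetical types (a null id stands for an index
tree stub that has no cached SHA-1).  With the real TreeWalk the
corresponding calls would presumably be isSubtree(), idEqual(a, b)
and enterSubtree(), but that mapping is an assumption and not
something this sketch exercises.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

/** Sketch: compare an index tree with a HEAD tree, skipping equal subtrees. */
final class SubtreeSkipSketch {
    static final class Node {
        final String id;                  // hex SHA-1; null = no cached id
        final Map<String, Node> children; // null for a file

        private Node(String id, Map<String, Node> children) {
            this.id = id;
            this.children = children;
        }

        static Node file(String id) {
            return new Node(id, null);
        }

        static Node tree(String id) {
            return new Node(id, new LinkedHashMap<>());
        }

        Node put(String name, Node child) {
            children.put(name, child);
            return this;
        }

        boolean isTree() {
            return children != null;
        }
    }

    int entriesVisited;                          // how much work the walk did
    final List<String> changed = new ArrayList<>();

    void compareTrees(String prefix, Node index, Node head) {
        // Only a present, matching id lets us skip; a missing id forces a descent.
        if (index.id != null && index.id.equals(head.id)) {
            return;
        }
        TreeSet<String> names = new TreeSet<>(index.children.keySet());
        names.addAll(head.children.keySet());
        for (String name : names) {
            entriesVisited++;
            Node a = index.children.get(name);
            Node b = head.children.get(name);
            String path = prefix + name;
            if (a == null || b == null || a.isTree() != b.isTree()) {
                changed.add(path); // added, removed, or D/F conflict
            } else if (a.isTree()) {
                compareTrees(path + "/", a, b);
            } else if (!Objects.equals(a.id, b.id)) {
                changed.add(path);
            }
        }
    }
}
```

When the index stub has an id equal to HEAD's, none of the paths
below it are touched; when the id is null, every path below it is
visited, which is the slow case described above.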

Eventually, one day, our DirCache class won't discard the 'TREE'
extension every time it touches the index, and we'll be able to
start taking advantage of those SHA-1s when they exist.  But right
now it's usually not there.

 
> And even more complicated: I am also sometimes required to check
> whether the tree in index is "clean" compared to the working-dir.

This is why I was saying you need that 4th WorkingTreeIterator.  :-)

> For
> this check I would need again a SHA1 for the tree in the index and the
> SHA1 for the tree in the working-dir. Sounds too expensive to me
> (especially the working tree). Any workarounds you can think of?

You should check if the entry is stat-clean first.

Basically, you check the index entry timestamp against the file's
timestamp, and the index entry's length against the file's length.
If either differs, the working tree file is assumed modified.

*However*, if you are also doing "stat refresh", when the file
appears modified you can check the SHA-1 of the file against the
index entry, and if it's the same, you can update the index entry to
this new stat data, so it doesn't come into this code path on the
next run.  If it's different, you know for certain that the file is
modified.
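
A rough sketch of that stat check with refresh, in plain JDK code.
IndexEntry, State and check() are hypothetical names for illustration,
not JGit types.  The file's SHA-1 is passed as a Supplier so it is
only computed when the stat data differs.  The sketch deliberately
ignores the "racily clean" case, where matching stat data still
cannot be trusted.

```java
import java.util.function.Supplier;

/** Sketch: stat-clean check with optional refresh of the index entry. */
final class StatRefreshSketch {
    enum State { CLEAN, REFRESHED, MODIFIED }

    static final class IndexEntry {
        long lastModified;
        long length;
        final String blobId; // hex SHA-1 recorded in the index

        IndexEntry(long lastModified, long length, String blobId) {
            this.lastModified = lastModified;
            this.length = length;
            this.blobId = blobId;
        }
    }

    static State check(IndexEntry entry, long fileLastModified, long fileLength,
            Supplier<String> fileId) {
        // Cheap path: timestamp and length both match, so assume unmodified.
        if (entry.lastModified == fileLastModified && entry.length == fileLength) {
            return State.CLEAN;
        }
        // Stat-dirty: only now pay for hashing the file content.
        if (entry.blobId.equals(fileId.get())) {
            // Same content; record the new stat data so the next run is cheap.
            entry.lastModified = fileLastModified;
            entry.length = fileLength;
            return State.REFRESHED;
        }
        return State.MODIFIED;
    }
}
```

After a REFRESHED result the same file takes the cheap CLEAN path on
the next call, which is the point of the refresh.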

FileTreeIterator (and really, any WorkingTreeIterator) is supposed
to perform as little work as possible and cache some of this data
for you.  The length, mode and last modified of the current file
should be fairly cheap to obtain from the iterator, and can be
compared against the index.  The SHA-1 is expensive; it's computed
on the fly when you ask for it.  So try to avoid it, for example when
the stat data already matches the index and it's not "racily clean".

-- 
Shawn.
