Re: [jgit-dev] How to update index and working-tree?
Christian Halstrick <christian.halstrick@xxxxxxxxx> wrote:
> if I feed a NameConflictTreeWalk with CanonicalTreeParsers (for head
> and merge) and a DirCacheBuildIterator for the index then
> DirCacheBuildIterator will also report entries for trees.

Yes, it creates stub entries for each tree, to make its flat namespace compatible with the non-flat namespace used by the canonical tree format.

> But these
> tree entries don't have a sha1 computed.

They may, they may not. It depends on whether or not the 'TREE' cache extension is present and has a valid SHA-1 for the given tree path. *IF* you have a valid SHA-1 computed for the tree, you are assured that the entire subtree in the index is unmodified (however, this says nothing about whether or not a working tree file is stat-dirty relative to its index entry).

However, yes, the natural state for trees right now is to have no SHA-1, because JGit usually strips the 'TREE' extension out of the index, so it's not available to the DirCache code during traversal.

> Means: I can't compare the
> index tree with e.g. a tree in head to properly act on D/F conflicts.

You can compare their modes to detect a D/F conflict. But you can't rely on the SHA-1 to tell you a D/F conflict.

> Is there an easy way to compute the SHA1 on such trees?

Not really, it's a recursive problem. In order to compute the SHA-1 of this tree, you must first compute the SHA-1 of any subtree it contains. Doing that requires constructing the canonical tree encoding for each tree, which isn't free.

DirCacheTree doesn't compute its ObjectId when DirCacheBuildIterator invokes DirCache.getTree(true). So if there was no 'TREE' extension in the index, we have no SHA-1 to hand you.

*IF* it makes sense performance-wise, we could modify this code to compute the SHA-1 on the fly. To do that we need to disconnect the idea of isValid() from the id, and ensure that the id doesn't get written back out to the 'TREE' extension unless isValid() is true.
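To make the recursion concrete, here is a minimal, self-contained sketch of computing a tree's SHA-1 in memory. It does not use JGit: `TreeIdSketch` and `Node` are hypothetical names invented for this illustration, and the only Git facts it assumes are the object hashing rule (SHA-1 over `<type> <length>\0<body>`), the canonical tree entry layout (`<mode> <name>\0` followed by the 20 raw id bytes), and the rule that a subtree sorts as if its name ended in `/`. It hashes only; it writes nothing to a repository.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: TreeIdSketch and Node are hypothetical, not JGit types. */
final class TreeIdSketch {
    static final class Node {
        final String name;
        final String mode;      // e.g. "100644" for a file, "40000" for a tree
        final byte[] blobId;    // 20 raw bytes for a blob, null for a tree
        final List<Node> children = new ArrayList<>();

        private Node(String name, String mode, byte[] blobId) {
            this.name = name;
            this.mode = mode;
            this.blobId = blobId;
        }

        static Node blob(String name, String mode, byte[] id) {
            return new Node(name, mode, id);
        }

        static Node tree(String name) {
            return new Node(name, "40000", null);
        }

        boolean isTree() {
            return blobId == null;
        }

        /** Sort key: Git orders a subtree as if its name ended in '/'. */
        byte[] sortKey() {
            return (isTree() ? name + "/" : name).getBytes(StandardCharsets.UTF_8);
        }
    }

    /** SHA-1 over "<type> <length>\0<body>". */
    static byte[] hashObject(String type, byte[] body) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update((type + " " + body.length + "\0").getBytes(StandardCharsets.UTF_8));
            return md.digest(body);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Computes a tree id; every subtree has to be encoded and hashed first. */
    static byte[] treeId(Node tree) {
        List<Node> sorted = new ArrayList<>(tree.children);
        sorted.sort((a, b) -> Arrays.compareUnsigned(a.sortKey(), b.sortKey()));
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Node n : sorted) {
            byte[] id = n.isTree() ? treeId(n) : n.blobId; // the recursive step
            byte[] header = (n.mode + " " + n.name + "\0").getBytes(StandardCharsets.UTF_8);
            body.write(header, 0, header.length);
            body.write(id, 0, id.length);
        }
        return hashObject("tree", body.toByteArray());
    }

    static String hex(byte[] id) {
        StringBuilder sb = new StringBuilder();
        for (byte b : id) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

The cost is visible in `treeId`: each level re-encodes and re-hashes all of its children, which is why a cached id from the 'TREE' extension is worth having when it exists.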
Aside: The 'TREE' extension requires that the SHA-1 of a tree only appear in the extension if there are no unmerged paths beneath it (only stage 0 present) and the corresponding tree object appears in the repository. We can compute it in memory if we choose, but we can't write it to disk without also writing the tree.

IMHO, assume that sometimes you don't have the tree's SHA-1 when coming from the DirCache. If it's not present, you need to dive into that subtree and process every path. If it is present, and the SHA-1s are identical, you may be able to skip that entire subtree and move past. This is an optimization that git-core uses, and one I hope I built TreeWalk to support...

Eventually, one day, our DirCache class won't discard the 'TREE' extension every time it touches the index, and we'll be able to start taking advantage of those SHA-1s when they exist. But right now it's usually not there.

> And even more complicated: I am also sometimes required to check
> whether the tree in index is "clean" compared to the working-dir.

This is why I was saying you need that 4th WorkingTreeIterator. :-)

> For
> this check I would need again a SHA1 for the tree in the index and the
> SHA1 for the tree in the working-dir. Sounds too expensive to me
> (especially the working tree). Any workarounds you can think of?

You should check if the entry is stat-clean first. Basically, you check the index entry timestamp against the file's timestamp, and the index entry's length against the file's length. If either differs, the working tree file is assumed modified.

*However*, if you are also doing "stat refresh", when the stat data says the file is modified you can check the SHA-1 of the file against the index entry, and if it's the same, you can update the index entry to this new stat data, so it doesn't come into this code path on the next run. If it's different, you know for certain that the file is modified.
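The stat-clean check and the optional "stat refresh" could be sketched roughly as follows. This is not JGit code: `StatCheckSketch`, `IndexEntry`, and `State` are hypothetical stand-ins, the file is read whole into memory, and the sketch ignores the racily-clean case and any content filtering (such as line-ending conversion) that a real implementation would have to handle.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Illustrative only: these types are hypothetical, not JGit classes. */
final class StatCheckSketch {
    /** Stand-in for the stat data and blob id an index entry records. */
    static final class IndexEntry {
        long lastModifiedMillis;
        long length;
        final byte[] blobId;

        IndexEntry(long lastModifiedMillis, long length, byte[] blobId) {
            this.lastModifiedMillis = lastModifiedMillis;
            this.length = length;
            this.blobId = blobId;
        }
    }

    enum State { CLEAN, REFRESHED, MODIFIED }

    /** Git blob id: SHA-1 over "blob <length>\0<content>". */
    static byte[] blobId(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update(("blob " + content.length + "\0").getBytes(StandardCharsets.UTF_8));
            return md.digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static State check(IndexEntry entry, Path file) throws IOException {
        long mtime = Files.getLastModifiedTime(file).toMillis();
        long length = Files.size(file);

        // Cheap path: matching stat data means we never touch the content.
        if (mtime == entry.lastModifiedMillis && length == entry.length) {
            return State.CLEAN;
        }

        // Stat-dirty: only now pay for hashing the file content.
        byte[] actual = blobId(Files.readAllBytes(file));
        if (Arrays.equals(actual, entry.blobId)) {
            // Same content, stale stat data: refresh the entry so the next
            // run takes the cheap path again.
            entry.lastModifiedMillis = mtime;
            entry.length = length;
            return State.REFRESHED;
        }
        return State.MODIFIED;
    }
}
```

A caller would treat `CLEAN` and `REFRESHED` as "unmodified" and persist the index after a `REFRESHED` result so the updated stat data is not lost.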
FileTreeIterator (and really, any WorkingTreeIterator) is supposed to perform as little work as possible and cache some of this data for you. The length, mode and last modified of the current file should be fairly cheap to obtain from the iterator, and can be compared against the index. The SHA-1 is expensive; it's computed on the fly when you ask for it. So try to avoid it, for example when the stat data already matches the index and the entry is not "racily clean".

-- Shawn.