Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [jgit-dev] Making RevCommit object representation more compact in memory

On Fri, Oct 8, 2010 at 3:06 AM, Aliaksandr Radzivanovich
<aradzivanovich@xxxxxxxxx> wrote:
> I was browsing through JGit source code and noticed that it strives to
> be fast and efficient, preserving every byte where possible. I
> appreciate this approach and I think it could be optimized even
> further. So I want to share and discuss my idea with the community.
>
> Every RevCommit object has array of parent commits, though most of the
> time the array will contain only one element, as most of the commits
> have single parent. The idea is simple: lets declare the parents field
> of type Object, that will either:
>
>  1. Point to a single parent RevCommit object, or
>  2. Will contain array of parents, if there are many of them.

I've thought about doing this myself over the years.  The reason I
haven't is that it may be more expensive to do the instanceof test
when checking parent count, or accessing a parent, and there are many
places where we avoided using the accessor methods and just directly
accessed the field.

On a 32 bit JVM, you would save about 16 bytes of memory per commit
that is in a string-of-pearls.  That's actually common enough that the
savings could be significant.  But the CPU hit to do the instanceof
test might be higher.  Unfortunately I just hadn't made the change and
written the performance tests necessary to justify keeping the change
in our tree.  :-)

>  public class RevCommit extends RevObject {
>      ...
>      /** Either single parent commit or array of parent commits
>       *  if there are many of them. */
>      Object parents;
>      ...
>      public final int getParentCount() {
>          if (parents instanceof RevCommit)
>              return 1;
>          return ((RevCommit[]) parents).length;
>      }
>
>      public final RevCommit getParent(final int nth) {
>          if (nth > 0)
>              return ((RevCommit[]) parents)[nth];
>          if (parents instanceof RevCommit)
>              return (RevCommit) parents;
>          return ((RevCommit[]) parents)[0];
>      }
>
>      public final RevCommit[] getParents() {
>          if (parents instanceof RevCommit)
>              return new RevCommit[]{(RevCommit) parents};
>          return (RevCommit[]) parents;
>      }

This seems fine.  But you need to upload that to our review server [1].

[1] http://wiki.eclipse.org/EGit/Contributor_Guide#Contributing_Patches

> There are few places in question, though, such as
> RewriteTreeFilter and RewriteGenerator. These classes manipulate
> parents array directly, and I'm afraid of modifying them as I don't
> understand what they do.

When a path limiter is given (e.g. `git log -- some-file.c`) the
parents are modified to elide out commits which do not affect the
path(s) given by the path limiter.  We edit the array to contain a
different ancestor, to show where the file was last modified.

These classes are probably preparing a new array (or maybe even edit
it in place if its the same size).  We'd need a package level
setParents() method on RevCommit to permit the parents to be modified
and have them be stored correctly (as a single RevCommit if there is
only one, or as an array if there is more than one).

-- 
Shawn.


Back to the top