[
Date Prev][
Date Next][
Thread Prev][
Thread Next][
Date Index][
Thread Index]
[
List Home]
Re: [jgit-dev] Making RevCommit object representation more compact in memory
|
On Fri, Oct 8, 2010 at 3:06 AM, Aliaksandr Radzivanovich
<aradzivanovich@xxxxxxxxx> wrote:
> I was browsing through JGit source code and noticed that it strives to
> be fast and efficient, preserving every byte where possible. I
> appreciate this approach and I think it could be optimized even
> further. So I want to share and discuss my idea with the community.
>
> Every RevCommit object has array of parent commits, though most of the
> time the array will contain only one element, as most of the commits
> have single parent. The idea is simple: lets declare the parents field
> of type Object, that will either:
>
> 1. Point to a single parent RevCommit object, or
> 2. Will contain array of parents, if there are many of them.
I've thought about doing this myself over the years. The reason I
haven't is that it may be more expensive to do the instanceof test
when checking parent count, or accessing a parent, and there are many
places where we avoided using the accessor methods and just directly
accessed the field.
On a 32 bit JVM, you would save about 16 bytes of memory per commit
that is in a string-of-pearls. That's actually common enough that the
savings could be significant. But the CPU hit to do the instanceof
test might be higher. Unfortunately I just hadn't made the change and
written the performance tests necessary to justify keeping the change
in our tree. :-)
> public class RevCommit extends RevObject {
> ...
> /** Either single parent commit or array of parent commits
> * if there are many of them. */
> Object parents;
> ...
> public final int getParentCount() {
> if (parents instanceof RevCommit)
> return 1;
> return ((RevCommit[]) parents).length;
> }
>
> public final RevCommit getParent(final int nth) {
> if (nth > 0)
> return ((RevCommit[]) parents)[nth];
> if (parents instanceof RevCommit)
> return (RevCommit) parents;
> return ((RevCommit[]) parents)[0];
> }
>
> public final RevCommit[] getParents() {
> if (parents instanceof RevCommit)
> return new RevCommit[]{(RevCommit) parents};
> return (RevCommit[]) parents;
> }
This seems fine. But you need to upload that to our review server [1].
[1] http://wiki.eclipse.org/EGit/Contributor_Guide#Contributing_Patches
> There are few places in question, though, such as
> RewriteTreeFilter and RewriteGenerator. These classes manipulate
> parents array directly, and I'm afraid of modifying them as I don't
> understand what they do.
When a path limiter is given (e.g. `git log -- some-file.c`) the
parents are modified to elide out commits which do not affect the
path(s) given by the path limiter. We edit the array to contain a
different ancestor, to show where the file was last modified.
These classes are probably preparing a new array (or maybe even edit
it in place if its the same size). We'd need a package level
setParents() method on RevCommit to permit the parents to be modified
and have them be stored correctly (as a single RevCommit if there is
only one, or as an array if there is more than one).
--
Shawn.