Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [geogig-dev] Spatial Index?

One quick thought RE: hindering the diff algorithm...I gathered that within the key-value store you are looking up the tree object by SHA.  If instead the key were something else such as a space filling curve value (plus perhaps the SHA appended to it) would that necessarily change the diff algorithm if the SHA was still associated with that tree object?  Basically, as long as you don't drop the SHA I would think you could still use it for a reasonable diff even if the SHA was not the key. Does that make sense or am I missing something?

On Thu, Feb 12, 2015 at 5:19 PM, David Winslow <dwinslow@xxxxxxxxxxxxxxxx> wrote:
Thanks Rich.

Just a point of terminology: When we talk about featureids in geogig we usually call them "paths" by analogy to files in git. "ids" or shas would be the hashes used to turn key-value stores into content-addressable storage.  I may have used ID for both when I was explaining this week, sorry to muddy the waters.

If we include spatial metadata in the ordering of the trees, then we will be hindering one of the optimizations in the diff algorithm - since tree nodes are hashed as well, we can omit entire subtrees if they have the same SHA (objectid.)  If you can think of a way to avoid it that's great, but my current thinking is that the best way would be to isolate the spatial index in a separate tree structure.  That also makes it easier to use non-tree indices if appropriate.

Another thing to keep in mind is that even if we only index one attribute of one layer we probably really have a family of indices - there should be an index of every version of a layer (or at least the facility for having that.)  In this case even though a tree structure is inefficient use of the key/value store we do benefit from some storage savings by being able to share unchanged parts of the index between versions.  OTOH spatial rebalancing may mean that this doesn't work as well for things other than the path.  How would you envision a non-tree-based index co-existing with what is in place already?  We do have frequent need for lookup by just the objectid.

On Wed, Feb 11, 2015 at 5:52 PM, Rich Fecher <rfecher@xxxxxxxxx> wrote:
David Winslow had just presented geogig and I wanted to follow up with thoughts on other ways of indexing the data to support efficient querying by more than an ID.

David indicated that although its not currently supported, there is a potential opportunity to store multiple trees per commit where each tree can be organized to support particular queries.  Alternatively it comes to mind that the identifiers for the features just need to be a consistent hash, but could be prefixed by something such as a space filling curve value to control sort order within a key value store.

So I am seeing if I could help with a spatially organized tree - I'm not sure if a variant of an R-Tree would be the most suitable because of the re-balancing required and representing the tree efficiently in a key value store can be counter-productive.

GeoWave and GeoMesa attempt to use space filling curves to reduce n-dimensional data to a single dimension that preserves locality.  I'm a GeoWave developer so I am most familiar with that so I can say that to get a good SFC value for a geometry should be a simple one-liner that is completely decoupled from the storage (such as accumulo). Is this a good way to be thinking of the problem initially?  Would it make sense to try that out to see how well you can keep the features (hashes) within the tree to organize themselves spatially?

Rich


_______________________________________________
geogig-dev mailing list
geogig-dev@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geogig-dev



--

_______________________________________________
geogig-dev mailing list
geogig-dev@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geogig-dev


Back to the top