[geowave-dev] Accumulo Key Structure


[geowave-dev] Accumulo Key Structure - Storing Point Data

From: Marcel <m.jacob@xxxxxxxxxxx>
Date: Mon, 12 Oct 2015 20:51:01 +0200

Hello,

I've got a couple of questions about storing point data. In the attachment you can find a drawing of my current understanding of how this key structure might work.

I read your presentation at the Accumulo Summit, but it's not quite clear to me how some of the values are determined.

http://accumulosummit.com/program/talks/geowave-geospatial-and-geotemporal-data-storage-and-retrieval-in-accumulo/

I've chosen a very simple case with 8 cubes. The whole cube represents the world from 2000 to 2015 (16 years). If I want to store my point P(30, -180, 2010-01-01), it is said that we first have to determine the "tier". Because it's a point, it will be stored in the highest tier number. In my case there are only tier 0 and tier 1.

Now it's up to the bin. This is where my assumptions start: we need a binID. In my drawing this is done using a Hilbert curve. Is this correct? Because my point P is in the last of the 8 sub-cubes, the binID would be set to 8. Because the date range is known, this can be done without any problems. But when I want to add my point P to Accumulo without any additional information, this would cause some problems. Is there a default date range that is used? Or will the binID be added later on, when all the data is in Accumulo (and the date range is known)?

Each bin has its own Hilbert space. But which resolution do you use? (In my drawing it's also a first-order Hilbert curve.) Where do you store the boundaries for each bin (or are they calculated on the fly)? The resulting entries in Accumulo for my example are at the bottom of my sheet of paper.

Within the Accumulo structure I can't see a parameter that partitions the data evenly across my nodes. Do you avoid hotspots with a random prefix?
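To make the sub-cube lookup for point P concrete, here is a minimal sketch. It is not GeoWave's actual implementation. It assumes the cube spans 2000-01-01 up to (but excluding) 2016-01-01, that each dimension is simply split in half, and that the first-order curve visits the 8 cells in reflected Gray code order (one common orientation of a first-order 3D Hilbert curve). The class name, method names, and the axis order (time, latitude, longitude) are hypothetical, so the resulting position need not match the numbering in the attached drawing.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

final class SubCubeSketch {

    // Returns 0 if value lies in the lower half of [min, max], 1 if in the upper half.
    static int halfBit(double value, double min, double max) {
        if (value < min || value > max) {
            throw new IllegalArgumentException("value outside [" + min + ", " + max + "]");
        }
        double normalized = (value - min) / (max - min);
        return normalized >= 0.5 ? 1 : 0;
    }

    // Builds a 3-bit cell code. Assumed (hypothetical) bit order: time, latitude, longitude.
    static int cellCode(double lat, double lon, LocalDate date,
                        LocalDate start, LocalDate endExclusive) {
        double totalDays = ChronoUnit.DAYS.between(start, endExclusive);
        double offsetDays = ChronoUnit.DAYS.between(start, date);
        int t = halfBit(offsetDays, 0, totalDays);
        int y = halfBit(lat, -90, 90);
        int x = halfBit(lon, -180, 180);
        return (t << 2) | (y << 1) | x;
    }

    // Position (0..7) of a 3-bit cell code along a reflected Gray code walk,
    // i.e. the inverse Gray code of the cell code.
    static int grayRank(int code) {
        return code ^ (code >> 1) ^ (code >> 2);
    }

    public static void main(String[] args) {
        int code = cellCode(30, -180, LocalDate.of(2010, 1, 1),
                            LocalDate.of(2000, 1, 1), LocalDate.of(2016, 1, 1));
        // Prints: cell code = 6, position along curve = 4
        System.out.println("cell code = " + code
                + ", position along curve = " + grayRank(code));
    }
}
```

The position is zero-based here. A different axis order or curve orientation yields a different position for the same point, which is why the boundaries and the orientation have to be fixed before any binID can be assigned.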

I hope my sketch helps you understand what my problems with the Accumulo key structure are. Please correct me if my drawing is wrong; it's hard to get an understanding of this complex structure.

Is there a method that returns an entry in the Accumulo data format? I wrote a Scanner, but part of the rowId in the results was not readable: "2003>)æ bï¿¿geowave-gdelt260176188"
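Independent of any GeoWave helper, a row ID that mixes binary and text bytes can at least be inspected as hex. The following is a minimal sketch that works on a plain byte array; the class name, the method names, and the sample bytes are hypothetical. With a Scanner, the bytes would come from the row of each key (`entry.getKey().getRow()` returns a Hadoop `Text`).

```java
final class RowIdDump {

    // Renders every byte as two hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 3);
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    // Keeps printable ASCII characters and replaces every other byte with '.'.
    static String printable(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            int v = b & 0xff;
            sb.append(v >= 0x20 && v < 0x7f ? (char) v : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical row ID: two binary bytes followed by ASCII text.
        byte[] row = {0x00, (byte) 0xff, 'g', 'e', 'o'};
        System.out.println(toHex(row));      // 00 ff 67 65 6f
        System.out.println(printable(row));  // ..geo
    }
}
```

Printing both views side by side makes it easier to see where the binary index portion of the row ID ends and the readable portion begins.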


Thanks in advance,
Marcel Jacob.

Attachment: geowave-key-structure.pdf
Description: Adobe PDF document

Follow-Ups:
- Re: [geowave-dev] Accumulo Key Structure - Storing Point Data
  - From: Rich Fecher
