[geowave-dev] Accumulo Key Structure - Storing Point Data
Hello,
I've got a couple of questions about storing point data. In the
attachment you can find a drawing of my current understanding of how
this key structure might work.
I read your presentation from the Accumulo Summit, but it's not quite
clear to me how some of the values are determined.
http://accumulosummit.com/program/talks/geowave-geospatial-and-geotemporal-data-storage-and-retrieval-in-accumulo/
I've chosen a very simple case with 8 sub-cubes. The whole cube
represents the world from 2000 to 2015 (16 years).
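A minimal sketch of how a point could be assigned to one of the 8 sub-cubes in this setup. It assumes P(30, -180) is read as (lat, lon), that the time axis runs from 2000-01-01 up to 2016-01-01 (the 16 years 2000 to 2015), and that each axis is halved once (2 x 2 x 2 = 8). The bounds and function names are hypothetical illustrations, not GeoWave API.

```python
from datetime import datetime

# Hypothetical bounds of the whole cube: the world over the years 2000-2015.
LAT_RANGE = (-90.0, 90.0)
LON_RANGE = (-180.0, 180.0)
TIME_RANGE = (datetime(2000, 1, 1), datetime(2016, 1, 1))


def half_bit(value, lo, hi):
    """Return 0 if value lies in the lower half of [lo, hi), otherwise 1."""
    mid = lo + (hi - lo) / 2  # works for floats and for datetimes (timedelta / 2)
    return 0 if value < mid else 1


def octant(lat, lon, when):
    """Return the (lon_bit, lat_bit, time_bit) coordinates of the sub-cube holding a point."""
    return (
        half_bit(lon, *LON_RANGE),
        half_bit(lat, *LAT_RANGE),
        half_bit(when, *TIME_RANGE),
    )


print(octant(30.0, -180.0, datetime(2010, 1, 1)))  # prints (0, 1, 1)
```

With these bounds the temporal split point is 2008-01-01, so a 2010 timestamp falls in the upper time half.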
If I want to store my point P(30, -180, 2010-01-01), the first step is
to determine the "tier". Because it's a point, it will be stored in the
highest tier. In my case there are only tier 0 and tier 1. Next comes
the bin, and this is where my assumptions start: we need a binID. In my
drawing this is done using a Hilbert curve. Is this correct? Because my
point P lies in the last of the 8 sub-cubes, its binID would be 8.
Because the date range is known, this can be done without any problems.
But if I want to add point P to Accumulo without any additional
information, this would cause problems. Is there a default date range
that is used? Or is the binID added later, once all the data is in
Accumulo (and the date range is known)?
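If the binID in the drawing is the position of a sub-cube along a first-order 3D Hilbert curve, it can be computed from the sub-cube's three half-bits: a first-order Hilbert curve visits the 2^n cells in reflected binary Gray-code order, so the position is the inverse Gray code of the interleaved bits. This sketch assumes one particular axis order (lon, lat, time, most significant first) and 0-based positions; the orientation in the drawing may differ, and adding 1 gives the 1-based numbering used there. It illustrates the model in the drawing, not GeoWave's code.

```python
from itertools import product


def interleave(bits):
    """Pack per-axis bits (first axis = most significant) into one integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def inverse_gray(g):
    """Invert the reflected binary Gray code: h = g ^ (g >> 1) ^ (g >> 2) ^ ..."""
    h = 0
    while g:
        h ^= g
        g >>= 1
    return h


def hilbert_index_order1(bits):
    """0-based position of a cell on a first-order Hilbert curve in len(bits) dims."""
    return inverse_gray(interleave(bits))


# Visiting order of the 8 sub-cubes; consecutive cells differ in exactly one axis bit.
order = sorted(product((0, 1), repeat=3), key=hilbert_index_order1)
print(order)
# prints [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1), (1, 0, 0)]
```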
Each bin has its own Hilbert space. But which resolution do you use?
(In my drawing it is also a first-order Hilbert curve.) Where do you
store the boundaries of each bin, or are they calculated on the fly?
The resulting Accumulo entries for my example are at the bottom of my
sheet of paper.
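Whether the boundaries are stored or derived is exactly the open question here; as an illustration only, this sketch shows that boundaries can be computed on the fly whenever bins have a fixed definition. It assumes a hypothetical yearly time bin and a curve of order k that splits each axis into 2^k equal cells; none of these names come from GeoWave.

```python
from datetime import datetime


def year_bin(when):
    """Hypothetical periodic bin: the calendar year of the timestamp."""
    return when.year


def year_bin_bounds(bin_id):
    """Derive [start, end) of a yearly bin from its ID; nothing has to be stored."""
    return datetime(bin_id, 1, 1), datetime(bin_id + 1, 1, 1)


def cell_bounds(index, lo, hi, order):
    """Bounds of cell `index` when [lo, hi) is split into 2**order equal cells."""
    width = (hi - lo) / (2 ** order)
    return lo + index * width, lo + (index + 1) * width


print(year_bin_bounds(year_bin(datetime(2010, 1, 1))))
print(cell_bounds(3, -180.0, 180.0, 2))  # prints (90.0, 180.0)
```

With a periodic bin like this, a point can be written without knowing the overall date range of the data set, which is one possible answer to the earlier question about a default date range.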
Within the Accumulo key structure I can't see a parameter that
partitions the data evenly across my nodes. Do you avoid hotspots with
a random prefix?
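A common generic technique for avoiding hotspots is salting: prepend a small shard ID derived from a stable hash of the key. A truly random prefix would make point lookups impossible, because a reader could not recompute it. This is a sketch under assumptions, with a hypothetical shard count of 8; it does not describe how GeoWave partitions its data.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical; often chosen relative to the number of tablet servers


def shard_prefix(key: bytes, num_shards: int = NUM_SHARDS) -> bytes:
    """One byte derived from a stable hash of the key, so rows spread across shards."""
    digest = hashlib.sha256(key).digest()
    return bytes([digest[0] % num_shards])


def salted_row(key: bytes) -> bytes:
    """Prepend the shard byte; the same key always lands on the same shard."""
    return shard_prefix(key) + key


def query_ranges(start: bytes, end: bytes, num_shards: int = NUM_SHARDS):
    """A range scan over the original keys must be repeated under every shard prefix."""
    return [(bytes([s]) + start, bytes([s]) + end) for s in range(num_shards)]
```

The trade-off is that writes are spread across tablets, while every range query has to be issued once per shard prefix, as `query_ranges` shows.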
I hope my sketch helps you understand the problems I have with the
Accumulo key structure. Please correct me if my drawing is wrong; it's
hard to get an understanding of this complex structure.
Is there a method that returns an entry in the Accumulo data format? I
wrote a Scanner, but part of the rowId in the results was not
readable: "2003>)æ bï¿¿geowave-gdelt260176188"
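The row ID is presumably unreadable because part of it is binary index data rather than text, so decoding it as characters produces garbage. A minimal sketch of a display helper, assuming you have the raw row bytes: printable ASCII is kept and every other byte, including the backslash, is escaped as \xNN so the output is unambiguous. The function name is hypothetical.

```python
def printable_row(row: bytes) -> str:
    """Render a binary row ID: printable ASCII kept, other bytes as \\xNN escapes."""
    return "".join(
        chr(b) if 0x20 <= b < 0x7F and b != 0x5C else f"\\x{b:02x}"
        for b in row
    )


print(printable_row(b"\x00\x1fabc\xff"))  # prints \x00\x1fabc\xff
```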
Thanks in advance,
Marcel Jacob.
Attachment: geowave-key-structure.pdf (Adobe PDF document)