Re: [geomesa-users] Key/Index construction question.

This is an amazing explanation!! Thank you very much for taking the time to be so clear.

Two additional questions:
1- If we are deconstructing non-point geometries into geohashes of different precisions, and, say, I specified my key schema as "%~#s%foo#cstr%0,7#gh%99#r::_::_" (don't mind the cf and cq, it's just an example), in which I want a length-7 geohash in the row id, how do you fit the different precisions you obtain into my length-7 specification? Or am I not making sense here?

2- In the index schema builder, does the index-or-data flag (%#i) build an "index" over a particular portion of the entire key?

@Emilio: so if I understood you correctly, you have 6 "entire" rows, but if you look at the cf or cq portions you might see many more distinct values, correct?

For example, I store a polygon, and then I want to retrieve that particular polygon. How do you go about putting it together again? It has to depend on some sort of identifier, no?

Thank you both again for your time,

Moises


On Fri, Sep 18, 2015 at 2:47 PM, Chris Eichelberger <cne1x@xxxxxxxx> wrote:
Moises,

Good question!  The good news is that there is nothing special about how
the keys are being constructed; the interesting part is in how GeoMesa
decides which keys should be constructed...

(Apologies in advance if, in the course of lecturing, I tell you things
you already know.)

The first point to remember is that each Geohash index-entry represents
a cell.  For 35-bit Geohashes (seven base-32 characters), each cell is
no more than ~150 meters on a side.  A 0-bit (degenerate) Geohash is the
entire surface of the
(flat) Earth.  Each bit of precision you add to a Geohash halves exactly
one of its dimensions (when zero-based, even bits halve longitude; odd
bits halve latitude).
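
To make the bit-by-bit halving concrete, here is a minimal sketch in
Scala.  It is not GeoMesa's implementation; the names GeohashBits,
encode, and cellSizeDegrees are made up for illustration, and the only
assumption is the zero-based convention described above (even bits
split longitude, odd bits split latitude).

```scala
object GeohashBits {
  /** Encode a point as a Geohash bit string of the requested length.
    * Bit 0 halves the longitude range, bit 1 the latitude range, and so on. */
  def encode(lon: Double, lat: Double, bits: Int): String = {
    var minLon = -180.0
    var maxLon = 180.0
    var minLat = -90.0
    var maxLat = 90.0
    val sb = new StringBuilder
    for (i <- 0 until bits) {
      if (i % 2 == 0) {
        // even bit: keep the half of the longitude range that holds the point
        val mid = (minLon + maxLon) / 2
        if (lon >= mid) { sb.append('1'); minLon = mid }
        else { sb.append('0'); maxLon = mid }
      } else {
        // odd bit: keep the half of the latitude range that holds the point
        val mid = (minLat + maxLat) / 2
        if (lat >= mid) { sb.append('1'); minLat = mid }
        else { sb.append('0'); maxLat = mid }
      }
    }
    sb.toString
  }

  /** Width and height of a cell, in degrees, at the given number of bits. */
  def cellSizeDegrees(bits: Int): (Double, Double) = {
    val lonBits = (bits + 1) / 2 // bits 0, 2, 4, ... split longitude
    val latBits = bits / 2       // bits 1, 3, 5, ... split latitude
    (360.0 / math.pow(2, lonBits), 180.0 / math.pow(2, latBits))
  }
}
```

For a point near (-78.5, 38.03), GeohashBits.encode(-78.5, 38.03, 5)
yields "01100", which is the base-32 character "d".  At 35 bits,
cellSizeDegrees returns about 0.00137 degrees in each dimension, which
is where the ~150 meter figure comes from (measured at the equator).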

Whenever you are indexing data that contain only single-point
geometries, there will be one index-key per record, because every point
will fall inside exactly one Geohash cell.  (Each Geohash cell in
GeoMesa includes its minimum X and minimum Y values, but excludes its
maximum X and maximum Y extents.)
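
The half-open rule is what guarantees "exactly one cell" for a point
that sits on a shared boundary.  A small sketch, with a hypothetical
Cell class that is not part of GeoMesa:

```scala
object HalfOpenCells {
  /** A cell that includes its minimum X and Y but excludes its maximum X and Y. */
  final case class Cell(minX: Double, minY: Double, maxX: Double, maxY: Double) {
    def contains(x: Double, y: Double): Boolean =
      x >= minX && x < maxX && y >= minY && y < maxY
  }

  /** Count how many of the given cells claim the point. */
  def owners(cells: Seq[Cell], x: Double, y: Double): Int =
    cells.count(_.contains(x, y))
}
```

With two neighbours Cell(-90, 0, -45, 45) and Cell(-45, 0, 0, 45), a
point at x = -45 lies on the shared edge but belongs only to the
eastern cell, so owners returns 1 rather than 2.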

Whenever you are indexing non-point geometries -- line strings;
polygons; etc. -- you have a problem:  How do you create a single
index-entry for a geometry that can cross multiple cell boundaries?  If
you only index the vertices, you lose information about the fact that
the geometry covers the space between them.  There are typically two
approaches to solving this problem:

1.  You can encode a single entry that represents the minimum-bounding
cell description that contains your geometry; or

2.  you can decompose your geometry into covering cells, at potentially
heterogeneous resolutions (different sizes), and index each of those
separately (and then de-duplicate results at query time so that each
feature appears no more than once in any given results set).

GeoMesa takes approach #2 (for now; we're experimenting with other ways
to do this).  This is how the polygon you quote, with a large number of
points, can be decomposed into just a few covering cells; each of those
covering cells receives its own index key.  I've attached an image to
this email that shows how a polygon and a line-string can be decomposed.
In practice, we do not allow non-point geometries to be decomposed into
as many covering Geohashes as the image shows.  Here is the reference to the code in
GeoMesa where this decomposition is called:

https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/index/STIndexEntry.scala#L49
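
As a rough illustration of approach #2 (and explicitly not the code at
the link above), the sketch below decomposes a geometry into covering
cells of mixed resolution, writes one index entry per cell, and
de-duplicates feature IDs at query time.  Everything in it is an
assumption made for the example: the geometry is simplified to an
axis-aligned box, cells are named by their bit strings, maxBits stands
in for whatever cap limits the decomposition, and the names
CoveringCells, cover, indexEntries, and query are hypothetical.

```scala
object CoveringCells {
  final case class Box(minX: Double, minY: Double, maxX: Double, maxY: Double) {
    def intersects(o: Box): Boolean =
      minX < o.maxX && o.minX < maxX && minY < o.maxY && o.minY < maxY
    def contains(o: Box): Boolean =
      minX <= o.minX && o.maxX <= maxX && minY <= o.minY && o.maxY <= maxY
  }

  /** A Geohash-style cell identified by its bit string. */
  final case class Cell(bits: String, box: Box) {
    def children: Seq[Cell] =
      if (bits.length % 2 == 0) { // next bit is even: split longitude
        val mid = (box.minX + box.maxX) / 2
        Seq(Cell(bits + "0", box.copy(maxX = mid)), Cell(bits + "1", box.copy(minX = mid)))
      } else {                    // next bit is odd: split latitude
        val mid = (box.minY + box.maxY) / 2
        Seq(Cell(bits + "0", box.copy(maxY = mid)), Cell(bits + "1", box.copy(minY = mid)))
      }
  }

  val world: Cell = Cell("", Box(-180, -90, 180, 90))

  /** Cells covering geom: stop early when a cell lies fully inside the
    * geometry, otherwise subdivide until maxBits is reached. */
  def cover(geom: Box, maxBits: Int, cell: Cell = world): Seq[Cell] =
    if (!cell.box.intersects(geom)) Seq.empty
    else if (geom.contains(cell.box) || cell.bits.length >= maxBits) Seq(cell)
    else cell.children.flatMap(cover(geom, maxBits, _))

  /** One (cell, featureId) index entry per covering cell. */
  def indexEntries(featureId: String, geom: Box, maxBits: Int): Seq[(String, String)] =
    cover(geom, maxBits).map(c => (c.bits, featureId))

  /** Feature IDs whose cells overlap the query cell, each reported once. */
  def query(entries: Seq[(String, String)], queryBits: String): Seq[String] =
    entries.collect {
      case (bits, id) if bits.startsWith(queryBits) || queryBits.startsWith(bits) => id
    }.distinct
}
```

For the box (-90, 0) to (0, 67.5) with maxBits = 10, cover returns the
cells "0110", "011100", and "011110": one coarse cell plus two finer
ones.  A query on cell "011" matches all three entries, and the
distinct step reduces them to a single feature ID.  Lowering maxBits to
4 gives just "0110" and "0111", trading a looser cover for fewer index
entries.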

Please note that, with the advent of the new Z3 index, we will be
revisiting this scheme.  The Z3 index is much faster than the old
Geohash-based index, but does not yet support non-point geometries, so
it's a great opportunity for us to improve that feature.

I hope this addressed some of your questions; if not, or if you think of
new ones, please just let us know.

Thanks!

Sincerely,
  -- Chris


On Fri, 2015-09-18 at 14:14 -0400, Moises Baly wrote:
> Hi there:
>
>
> I've come across some tests in the project in my quest to understand
> how indexes work and how the index is partitioned in Accumulo's Key
> (what goes where, and how it is constructed):
>
>
>   val dummyType = SimpleFeatureTypes.createType("DummyType",
>     s"foo:String,bar:Geometry,baz:Date,$DEFAULT_GEOMETRY_PROPERTY_NAME:Geometry,$DEFAULT_DTG_PROPERTY_NAME:Date,$DEFAULT_DTG_END_PROPERTY_NAME:Date")
>   val customType = SimpleFeatureTypes.createType("DummyType",
>     s"foo:String,bar:Geometry,baz:Date,*the_geom:Geometry,dt_start:Date,$DEFAULT_DTG_END_PROPERTY_NAME:Date")
>   customType.setDtgField("dt_start")
>   val dummyEncoder = SimpleFeatureSerializers(dummyType, SerializationType.AVRO)
>   val customEncoder = SimpleFeatureSerializers(customType, SerializationType.AVRO)
>   val dummyIndexValueEncoder = IndexValueEncoder(dummyType)
>   val geometryFactory = new GeometryFactory(new PrecisionModel, 4326)
>   val now = new DateTime().toDate
>
>   val Apr_23_2001 = new DateTime(2001, 4, 23, 12, 5, 0, DateTimeZone.forID("UTC")).toDate
>
>   val schemaEncoding = "%~#s%feature#cstr%99#r::%~#s%0,4#gh::%~#s%4,3#gh%#id"
>
>   val index = IndexSchema.buildKeyEncoder(dummyType, schemaEncoding)
>  val line : Geometry = WKTUtils.read("LINESTRING(-78.5000092574703
> 38.0272986617359,-78.5000196719491 38.0272519798381,-78.5000300864205
> 38.0272190279085,-78.5000370293904 38.0271853867342,-78.5000439723542
> 38.027151748305,-78.5000509153117 38.027118112621,-78.5000578582629
> 38.0270844741902,-78.5000648011924 38.0270329867966,-78.5000648011781
> 38.0270165108316,-78.5000682379314 38.026999348366,-78.5000752155953
> 38.026982185898,-78.5000786870602 38.0269657099304,-78.5000856300045
> 38.0269492339602,-78.5000891014656 38.0269327579921,-78.5000960444045
> 38.0269162820211,-78.5001064588197 38.0269004925451,-78.5001134017528
> 38.0268847030715,-78.50012381616 38.0268689135938,-78.5001307590877
> 38.0268538106175,-78.5001411734882 38.0268387076367,-78.5001550593595
> 38.0268236046505,-78.5001654737524 38.0268091881659,-78.5001758881429
> 38.0267954581791,-78.5001897740009 38.0267810416871,-78.50059593303
> 38.0263663951609,-78.5007972751677 38.0261625038609)")
>       val item = AvroSimpleFeatureFactory.buildAvroFeature(dummyType,
>         List("TEST_LINE", line, now, line, now, now), "TEST_LINE")
>       val toWrite = new FeatureToWrite(item, "", dummyEncoder, dummyIndexValueEncoder)
>       val indexEntries = index.encode(toWrite).toList
>       indexEntries.size must equalTo(1)
>       indexEntries.head.size() mustEqual(6)
>       val cf = new Text(indexEntries.head.getUpdates.get(0).getColumnFamily)
>       val cq = new Text(indexEntries.head.getUpdates.get(0).getColumnQualifier)
>       val keyStr = cf + "::" + cq
>
>
> How do all those points in the LineString translate to only 6
> rows in Accumulo? As far as I understand, the Key definition
> (string :: gh :: gh + ID) should encode a single point, correct? What
> am I missing in the process here?
>
>
> If somebody could walk me through this example with special attention
> to how the key is being constructed it would be very much appreciated.
>
>
> Thank you for your time
>
>
> Moises
>
>
> _______________________________________________
> geomesa-users mailing list
> geomesa-users@xxxxxxxxxxxxxxxx
> To change your delivery options, retrieve your password, or unsubscribe from this list, visit
> http://www.locationtech.org/mailman/listinfo/geomesa-users



