Re: [geomesa-users] Key/Index construction question.

From: Moises Baly <moises@xxxxxxxxxxxxx>
Date: Wed, 23 Sep 2015 10:45:50 -0400
Delivered-to: geomesa-users@xxxxxxxxxxxxxxxx
List-archive: <https://www.locationtech.org/mhonarc/lists/geomesa-users>

Hi there:

On the same subject of keys, I have a couple of questions when building them:

1- Is it correct that the only way to store a non-constant string within the key is via #id? For example, I have a point and want to store something of the sort -> gh :: some_string_ie_HOUSE :: #cstr, where that string changes on each insertion into Accumulo. The way I would do this is with a schema such as "%~#s%99#r%0,11#gh::%~#s%#id::%~#s%TEST#cstr". However, this gives me a parser error, I think because there is a restriction on the id() position: it has to be at the end.

The idea is that I want to be able to filter first by location (gh), then by a particular string in the column family.

2- When building the key schema, '%#i' allows you to index what comes after it, right?

Thanks for your time,

Moises

On Fri, Sep 18, 2015 at 3:29 PM, Moises Baly <moises@xxxxxxxxxxxxx> wrote:

Perfect.

Thank you again for your answers; we are looking forward to going into production with GeoMesa.

Kind regards,

Moises

On Fri, Sep 18, 2015 at 3:22 PM, Chris Eichelberger <cne1x@xxxxxxxx> wrote:
Moises,

These are reasonable questions. I'll re-use your numbering.

1. We right-pad lower-precision (larger) Geohashes with periods, so a
10-bit Geohash for Charlottesville might be "dq..." when padded to 35
bits. This becomes a minor bit of hassle for the query planner, which
has to accommodate the (possible) presence of these characters in
addition to valid Geohash characters, but it's not too bad.

2. You are correct that each index key encodes a disjoint subset of the
entire geometry's covering. Fortunately, the entire geometry is stored
elsewhere in the value of the Accumulo entry, so no reconstruction is
required on the client side.
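
To make point 1 concrete, here is a minimal sketch of the padding. This is not GeoMesa's actual implementation: `GeohashPaddingSketch`, `encode`, and `pad` are hypothetical names, the Charlottesville coordinates are approximate, and the sketch assumes the precision is a whole number of 5-bit base-32 characters.

```scala
object GeohashPaddingSketch {
  // Standard Geohash base-32 alphabet (no a, i, l, o).
  private val Base32 = "0123456789bcdefghjkmnpqrstuvwxyz"

  // Hypothetical helper: encode (lon, lat) as a Geohash with `bits` bits of precision.
  // Zero-based even bits halve longitude; odd bits halve latitude.
  def encode(lon: Double, lat: Double, bits: Int): String = {
    require(bits >= 0 && bits % 5 == 0, "sketch only handles whole base-32 characters")
    var lonMin = -180.0
    var lonMax = 180.0
    var latMin = -90.0
    var latMax = 90.0
    val sb = new StringBuilder
    var current = 0
    for (i <- 0 until bits) {
      val bit =
        if (i % 2 == 0) {
          val mid = (lonMin + lonMax) / 2
          if (lon >= mid) { lonMin = mid; 1 } else { lonMax = mid; 0 }
        } else {
          val mid = (latMin + latMax) / 2
          if (lat >= mid) { latMin = mid; 1 } else { latMax = mid; 0 }
        }
      current = (current << 1) | bit
      // Every 5 bits become one base-32 character.
      if (i % 5 == 4) { sb.append(Base32.charAt(current)); current = 0 }
    }
    sb.toString
  }

  // Hypothetical helper: right-pad a short Geohash with periods to a fixed character width.
  def pad(gh: String, chars: Int): String =
    gh + "." * math.max(0, chars - gh.length)
}
```

With these helpers, `GeohashPaddingSketch.encode(-78.48, 38.03, 10)` gives "dq", and `GeohashPaddingSketch.pad("dq", 7)` gives "dq....." (7 characters, i.e. 35 bits).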

Sincerely,
-- Chris

On Fri, 2015-09-18 at 15:14 -0400, Moises Baly wrote:
> This is an amazing explanation!! Thank you very much for taking the
> time to be so clear.
>
>
> Two additional questions:
> 1- If we are deconstructing non-point geometries into geohashes of
> different precisions, and, say, I specify my key schema as
> "%~#s%foo#cstr%0,7#gh%99#r::_::_" (don't mind the cf and cq, it's just
> an example), in which I want a length-7 geohash in the row id, how do
> you fit the different precisions you obtain into my length-7
> specification? Or am I not making sense here?
>
>
> 2- In the index schema builder, does the index-or-data flag (%#i) build
> an "index" over a particular portion of the entire key?
>
>
> @Emilio: so if I understood you correctly, you have 6 "entire" rows,
> but if you look at the cf or cq portions you might find many more
> distinct values, correct?
>
>
> For example, I store a polygon, and then I want to retrieve that
> particular polygon. How do you go about putting it together again? It
> has to depend on some sort of identifier, no?
>
>
> Thank you both again for your time,
>
>
> Moises
>
>
>
> On Fri, Sep 18, 2015 at 2:47 PM, Chris Eichelberger <cne1x@xxxxxxxx>
> wrote:
> Moises,
>
> Good question! The good news is that there is nothing special about how
> the keys are being constructed; the interesting part is in how GeoMesa
> decides which keys should be constructed...
>
> (Apologies in advance if, in the course of lecturing, I tell you things
> you already know.)
>
> The first point to remember is that each Geohash index-entry represents
> a cell. For 35-bit Geohashes, each cell is no more than ~150 meters
> square. A 0-bit (degenerate) Geohash is the entire surface of the
> (flat) Earth. Each bit of precision you add to a Geohash halves exactly
> one of its dimensions (when zero-based, even bits halve longitude; odd
> bits halve latitude).
>
> Whenever you are indexing data that contain only single-point
> geometries, there will be one index-key per record, because every point
> will fall inside exactly one Geohash cell. (Each Geohash cell in
> GeoMesa includes its minimum X and minimum Y values, but excludes its
> maximum X and maximum Y extents.)
>
> Whenever you are indexing non-point geometries -- line strings;
> polygons; etc. -- you have a problem: How do you create a single
> index-entry for a geometry that can cross multiple cell boundaries? If
> you only index the vertices, you lose information about the fact that
> the geometry covers the space between them. There are typically two
> approaches to solving this problem:
>
> 1. You can encode a single entry that represents the minimum-bounding
> cell description that contains your geometry; or
>
> 2. you can decompose your geometry into covering cells, at potentially
> heterogeneous resolutions (different sizes), and index each of those
> separately (and then de-duplicate results at query time so that each
> feature appears no more than once in any given results set).
>
> GeoMesa takes approach #2 (for now; we're experimenting with other ways
> to do this). This is how the polygon you quote, with a large number of
> points, can be decomposed into just a few covering cells; each of those
> covering cells receives its own index key. I've attached an image to
> this email that shows how a polygon and a line-string can be decomposed.
> In practice, we do not allow non-point geometries to be decomposed into
> so many covering Geohashes. Here is the reference to the code in
> GeoMesa where this decomposition is called:
>
> https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/index/STIndexEntry.scala#L49
>
> Please note that, with the advent of the new Z3 index, we will be
> revisiting this scheme. The Z3 index is much faster than the old
> Geohash-based index, but does not yet support non-point geometries, so
> it's a great opportunity for us to improve that feature.
>
> I hope this addressed some of your questions; if not, or if you think of
> new ones, please just let us know.
>
> Thanks!
>
> Sincerely,
> -- Chris
>
>
> On Fri, 2015-09-18 at 14:14 -0400, Moises Baly wrote:
> > Hi there:
> >
> >
> > I've come across some tests in the project in my quest to understand
> > how indexes work and how the index is partitioned in Accumulo's Key
> > (what goes where, and how it is constructed).
> >
> >
> > // Feature types: one with the default geometry/date field names, one with custom names.
> > val dummyType = SimpleFeatureTypes.createType("DummyType", s"foo:String,bar:Geometry,baz:Date,$DEFAULT_GEOMETRY_PROPERTY_NAME:Geometry,$DEFAULT_DTG_PROPERTY_NAME:Date,$DEFAULT_DTG_END_PROPERTY_NAME:Date")
> > val customType = SimpleFeatureTypes.createType("DummyType", s"foo:String,bar:Geometry,baz:Date,*the_geom:Geometry,dt_start:Date,$DEFAULT_DTG_END_PROPERTY_NAME:Date")
> > customType.setDtgField("dt_start")
> >
> > // Avro serializers for the feature values, plus the index-value encoder.
> > val dummyEncoder = SimpleFeatureSerializers(dummyType, SerializationType.AVRO)
> > val customEncoder = SimpleFeatureSerializers(customType, SerializationType.AVRO)
> > val dummyIndexValueEncoder = IndexValueEncoder(dummyType)
> > val geometryFactory = new GeometryFactory(new PrecisionModel, 4326)
> > val now = new DateTime().toDate
> >
> > val Apr_23_2001 = new DateTime(2001, 4, 23, 12, 5, 0, DateTimeZone.forID("UTC")).toDate
> >
> > // Key schema: row :: column family :: column qualifier.
> > val schemaEncoding = "%~#s%feature#cstr%99#r::%~#s%0,4#gh::%~#s%4,3#gh%#id"
> >
> > val index = IndexSchema.buildKeyEncoder(dummyType, schemaEncoding)
> >
> > // A line string with many vertices, all within a small area.
> > val line: Geometry = WKTUtils.read("LINESTRING(" +
> >   "-78.5000092574703 38.0272986617359,-78.5000196719491 38.0272519798381," +
> >   "-78.5000300864205 38.0272190279085,-78.5000370293904 38.0271853867342," +
> >   "-78.5000439723542 38.027151748305,-78.5000509153117 38.027118112621," +
> >   "-78.5000578582629 38.0270844741902,-78.5000648011924 38.0270329867966," +
> >   "-78.5000648011781 38.0270165108316,-78.5000682379314 38.026999348366," +
> >   "-78.5000752155953 38.026982185898,-78.5000786870602 38.0269657099304," +
> >   "-78.5000856300045 38.0269492339602,-78.5000891014656 38.0269327579921," +
> >   "-78.5000960444045 38.0269162820211,-78.5001064588197 38.0269004925451," +
> >   "-78.5001134017528 38.0268847030715,-78.50012381616 38.0268689135938," +
> >   "-78.5001307590877 38.0268538106175,-78.5001411734882 38.0268387076367," +
> >   "-78.5001550593595 38.0268236046505,-78.5001654737524 38.0268091881659," +
> >   "-78.5001758881429 38.0267954581791,-78.5001897740009 38.0267810416871," +
> >   "-78.50059593303 38.0263663951609,-78.5007972751677 38.0261625038609)")
> > val item = AvroSimpleFeatureFactory.buildAvroFeature(dummyType, List("TEST_LINE", line, now, line, now, now), "TEST_LINE")
> > val toWrite = new FeatureToWrite(item, "", dummyEncoder, dummyIndexValueEncoder)
> >
> > // Encode the feature into index entries and inspect the resulting keys.
> > val indexEntries = index.encode(toWrite).toList
> > indexEntries.size must equalTo(1)
> > indexEntries.head.size() mustEqual(6)
> > val cf = new Text(indexEntries.head.getUpdates.get(0).getColumnFamily)
> > val cq = new Text(indexEntries.head.getUpdates.get(0).getColumnQualifier)
> > val keyStr = cf + "::" + cq
> >
> >
> > How do all those points in the LineString translate into only 6
> > rows in Accumulo? As far as I understand, the Key definition
> > (string :: gh :: gh + ID) should encode a single point, correct? What
> > am I missing in the process here?
> >
> >
> > If somebody could walk me through this example, with special attention
> > to how the key is being constructed, it would be very much appreciated.
> >
> >
> > Thank you for your time
> >
> >
> > Moises
> >
> >
>
> > _______________________________________________
> > geomesa-users mailing list
> > geomesa-users@xxxxxxxxxxxxxxxx
> > To change your delivery options, retrieve your password, or unsubscribe from this list, visit
> > http://www.locationtech.org/mailman/listinfo/geomesa-users
>
>
