Re: [geomesa-users] Attribute Indexing

Hi Marcel,

We use the mango lexicoders library:

https://github.com/calrissian/mango/tree/master/mango-core/src/main/java/org/calrissian/mango/types

We treat dates as longs, based on the standard Java milliseconds since the epoch, so they are sorted by that value. That makes for very efficient range searches, as you say.
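To illustrate the idea, here is a minimal sketch of one common way to make a signed 64-bit value sort correctly as raw bytes: write it big-endian and flip the sign bit. This is an assumption for illustration only and not necessarily the exact byte layout the mango library produces; the function names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

SIGN_BIT = 1 << 63
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt: datetime) -> int:
    """Milliseconds since 1970-01-01T00:00:00Z for a timezone-aware datetime."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def encode_long(value: int) -> bytes:
    """Encode a signed 64-bit integer so that byte order matches numeric order."""
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value out of signed 64-bit range")
    # Flipping the sign bit places negative values before positive ones
    # when the bytes are compared as unsigned, big-endian data.
    return struct.pack(">Q", (value & 0xFFFFFFFFFFFFFFFF) ^ SIGN_BIT)


def decode_long(data: bytes) -> int:
    """Inverse of encode_long."""
    (raw,) = struct.unpack(">Q", data)
    raw ^= SIGN_BIT
    return raw - (1 << 64) if raw >= SIGN_BIT else raw


if __name__ == "__main__":
    dates = [
        datetime(2015, 10, 9, tzinfo=timezone.utc),
        datetime(1960, 1, 1, tzinfo=timezone.utc),  # before the epoch, so negative millis
        datetime(2015, 10, 7, tzinfo=timezone.utc),
    ]
    # Sorting the encoded keys as bytes yields chronological order.
    for key in sorted(encode_long(to_epoch_millis(d)) for d in dates):
        print(key.hex(), decode_long(key))
```

Because the encoded keys sort in the same order as the timestamps, a date range maps to a single contiguous key range.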

Thanks,

Emilio

On 10/09/2015 12:14 PM, Marcel Jacob wrote:
Okay, so there is no special strategy for indexing a date?
I'm asking because I can imagine different date formats:

1) yyyy-MM-dd
lexicographic sorting order:
...
2015-10-07
2015-10-08
2015-10-09

With this format it's very efficient to query a date range because only one table scan is needed.

2) dd-MM-yyyy
lexicographic sorting order:
...
07-10-2015
07-11-2015
07-12-2015
...
08-10-2015
...
09-10-2015

When querying for [07-10-2015 to 09-10-2015] we need multiple table scans, and building an additional index would be less efficient. (On the other hand, this format makes it efficient to query for a particular day of the month, e.g. the 7th, although I believe this type of query is very rare.)
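The difference between the two formats can be checked with a small sketch. It sorts every day from October to December 2015 as strings in each format and counts how many contiguous key runs cover 7 to 9 October, with each run standing in for one range scan. The helper names are hypothetical and the sorted list is only a stand-in for a lexicographically sorted table.

```python
from datetime import date, timedelta


def contiguous_runs(sorted_keys, wanted):
    """Count contiguous runs of wanted keys in a sorted key list (one scan per run)."""
    runs, in_run = 0, False
    for key in sorted_keys:
        hit = key in wanted
        if hit and not in_run:
            runs += 1
        in_run = hit
    return runs


def scans_needed(fmt: str) -> int:
    """Number of range scans needed to fetch 7-9 October 2015 for a given date format."""
    days = [date(2015, 10, 1) + timedelta(days=i) for i in range(92)]  # 1 Oct to 31 Dec 2015
    wanted_days = [date(2015, 10, 7), date(2015, 10, 8), date(2015, 10, 9)]
    keys = sorted(d.strftime(fmt) for d in days)  # lexicographic order, like the table
    wanted = {d.strftime(fmt) for d in wanted_days}
    return contiguous_runs(keys, wanted)


if __name__ == "__main__":
    print("yyyy-MM-dd:", scans_needed("%Y-%m-%d"))  # prints 1
    print("dd-MM-yyyy:", scans_needed("%d-%m-%Y"))  # prints 3
```

With yyyy-MM-dd the three days sit next to each other in sort order, while with dd-MM-yyyy each of them is separated by the same day of November and December.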

So I will store my date in format 1).

Please correct me if my thoughts are wrong.

Thanks in advance,
Marcel Jacob.


> From: cne1x@xxxxxxxx
> To: geomesa-users@xxxxxxxxxxxxxxxx
> Date: Fri, 9 Oct 2015 07:23:57 -0400
> Subject: Re: [geomesa-users] Attribute Indexing
>
> Marcel,
>
> Roughly in order that you asked...
>
> 1. Yes, it is always possible to get the raw key-value pairs out of
> Accumulo. The easiest way is via the Accumulo shell:
>
>
> http://accumulo.apache.org/1.6/accumulo_user_manual.html#_accumulo_shell
>
> Login, and then scan the "_attr_idx" table with a command somewhat
> similar to this:
>
> scan -t geomesa_attr_idx
>
> 2. There is nothing particularly novel about the way GeoMesa stores
> secondary attribute indexes in the "_attr_idx" table. This is a
> straight lexicographically-encoded-value storage.
>
> 3. There are two parts to using secondary indexes effectively:
> encoding and querying. The best references are to the GeoMesa source
> where these occur:
>
> encoding:
> https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/data/tables/AttributeTable.scala
>
> querying:
> https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/index/AttributeIdxStrategy.scala
>
> Enjoy!
>
> Sincerely,
> -- Chris
>
>
> On Fri, 2015-10-09 at 11:27 +0200, Marcel wrote:
> > Hello,
> >
> > is there a possibility to get the complete "raw" key-value pair of a
> > table as it is saved (a sample would be enough)? I want to look in the
> > "_attr_idx" table and understand how the index is built, e.g. when
> > indexing additional attributes like another Date, an Integer or a
> > String. Is this a special or a common strategy (adapted for Accumulo)
> > for indexing? Which fields of the Accumulo table design did you
> > use (RowId, ColumnFamily, etc.)?
> >
> > Thanks,
> > Marcel Jacob.
> >
> > _______________________________________________
> > geomesa-users mailing list
> > geomesa-users@xxxxxxxxxxxxxxxx
> > To change your delivery options, retrieve your password, or unsubscribe from this list, visit
> > http://www.locationtech.org/mailman/listinfo/geomesa-users
>
>

