Re: [geomesa-users] Executing a query


Re: [geomesa-users] Executing a query - process

From: Marcel Jacob <m.jacob@xxxxxxxxxxx>
Date: Wed, 28 Oct 2015 16:26:41 +0000

Some questions about the different tables:
1) general:
All tables (except the metadata table) contain a SimpleFeature in the value (in st_idx, only the data entries do). Is this correct?

2) deprecated st_idx table (I want to understand the idea behind the structure):
Why do you separate index and data entries? What advantage does that give when querying?
Example from my output:
rowID: 0~0~event~000~2012071922 family: 00 qualifier: 70fac002-0004-4800-8189-dd3c6c8e5b4c visibility: value: ??? (What is stored here? In my case it is always 70 bytes.)
rowID: 0~1~event~000~2012071922 family: 00 qualifier: 70fac002-0004-4800-8189-dd3c6c8e5b4c visibility: value: encoded SimpleFeature

What would happen without the binary index, i.e. with a single entry such as:
rowID: 0~event~000~2012071922 family: 00 qualifier: 70fac002-0004-4800-8189-dd3c6c8e5b4c visibility: value: encoded SimpleFeature
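A quick way to see how the two st_idx entries above relate is to split each row ID on "~" and group the entries by column qualifier (the feature ID). The Python sketch below does only that, using the rows copied from the scan output; what each "~"-separated field means is not established in this thread, so the code makes no claim about it, and the function names are made up for illustration.

```python
from collections import defaultdict

# (row ID, column qualifier) pairs copied from the st_idx scan output above.
ENTRIES = [
    ("0~0~event~000~2012071922", "70fac002-0004-4800-8189-dd3c6c8e5b4c"),  # index entry
    ("0~1~event~000~2012071922", "70fac002-0004-4800-8189-dd3c6c8e5b4c"),  # data entry
]


def group_by_feature_id(entries):
    """Split each row ID on '~' and group the split rows by qualifier (feature ID)."""
    grouped = defaultdict(list)
    for row_id, qualifier in entries:
        grouped[qualifier].append(row_id.split("~"))
    return dict(grouped)


def differing_positions(rows):
    """Return the positions of the '~'-separated fields whose values differ across rows."""
    return [i for i, values in enumerate(zip(*rows)) if len(set(values)) > 1]


for fid, rows in group_by_feature_id(ENTRIES).items():
    print(fid, differing_positions(rows))
# prints: 70fac002-0004-4800-8189-dd3c6c8e5b4c [1]
```

So the two entries for this feature share every row-ID field except the second one, which is the only thing distinguishing the index entry from the data entry here.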

3) attr_idx_table
A singleTableScan is better than a join. But when do I have to join with the record table?
Secondary filters are applied against the record table. Why not against attr_idx?

Best regards,
Marcel Jacob.

On 22.10.2015 16:49, Emilio Lahr-Vivaz wrote:

Hi Marcel,

The feature ID is determined when the feature is created (or, more accurately, when it is persisted). We follow the GeoTools standard: if you set Hints.USE_PROVIDED_FID and/or Hints.PROVIDED_FID in the feature user data, we will use the feature ID as provided. Otherwise, we generate a random feature ID; by default we use a modified UUID, which is what you see below. In general, we do not encode or rely on index data in the feature ID. (We also provide some advanced feature ID options that I won't get into here, as they are beyond the basic scope of the issue.)

You can see the logic here:

https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/data/AccumuloFeatureWriter.scala#L88
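A minimal Python sketch of the feature ID rule described above, assuming the user data is a plain dict and using hypothetical string keys in place of the GeoTools Hints.USE_PROVIDED_FID and Hints.PROVIDED_FID objects. uuid.uuid4() stands in for GeoMesa's modified UUID, whose details are not given in this thread, and letting PROVIDED_FID take precedence over USE_PROVIDED_FID is an assumption. The last line shows that the five parts of an ID such as 909271fc-44fa-495b-b876-5843df987fef are simply the standard 8-4-4-4-12 UUID text layout, not index data.

```python
import uuid

# Hypothetical string keys standing in for the GeoTools hint objects
# Hints.USE_PROVIDED_FID and Hints.PROVIDED_FID.
USE_PROVIDED_FID = "USE_PROVIDED_FID"
PROVIDED_FID = "PROVIDED_FID"


def assign_feature_id(current_id, user_data):
    """Return the ID a feature would be persisted with under the rule described above."""
    if user_data.get(PROVIDED_FID) is not None:
        # Explicit ID supplied in the user data (its precedence is an assumption).
        return str(user_data[PROVIDED_FID])
    if user_data.get(USE_PROVIDED_FID):
        # Keep the ID already set on the feature.
        return current_id
    # Otherwise generate a random ID; uuid4 stands in for GeoMesa's modified UUID.
    return str(uuid.uuid4())


def group_lengths(feature_id):
    """Lengths of the '-'-separated groups of an ID string."""
    return [len(part) for part in feature_id.split("-")]


print(assign_feature_id("my-id", {USE_PROVIDED_FID: True}))  # my-id
print(group_lengths("909271fc-44fa-495b-b876-5843df987fef"))  # [8, 4, 4, 4, 12]
```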

Thanks,

Emilio

On 10/22/2015 10:27 AM, Marcel wrote:
Okay,
since the feature ID is used to set up the ranges to scan, I need to know what the feature ID depends on.
I know that you use your own implementation of SimpleFeatureType, but I can't find where you determine the feature ID of a given SimpleFeature. I assume that you override the getId() method. A feature ID looks like this, and seems to have five parts (separated by "-"):
909271fc-44fa-495b-b876-5843df987fef
Does the composition of these values reflect the hierarchy of the index (from coarse-grained to fine-grained)? If not, how do you determine the feature ID? What are these five parts?

Best regards,
Marcel Jacob.

On 21.10.2015 18:26, Emilio Lahr-Vivaz wrote:

Hi Marcel,

Executing a query is basically 3 steps:

1. Choose the best index to scan given the query
2. Set up the scan ranges, iterators, etc. for the given index
3. Transform the results coming back from Accumulo into SimpleFeatures
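The three steps can be sketched as a small Python pipeline. Everything here is hypothetical: IndexStrategy, execute_query, the cost functions and the fake scan are illustrations rather than GeoMesa classes, and step 1 is reduced to "lowest estimated cost wins".

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Range = Tuple[str, str]  # (start row, end row), inclusive


@dataclass
class IndexStrategy:
    """A candidate index to scan. Illustrative only, not GeoMesa's classes."""
    table: str
    cost: Callable[[dict], float]          # estimated cost of answering the query
    ranges: Callable[[dict], List[Range]]  # row ranges to scan for the query


def execute_query(query: dict,
                  strategies: List[IndexStrategy],
                  scan: Callable[[str, List[Range]], Iterable[bytes]],
                  decode: Callable[[bytes], dict]) -> List[dict]:
    # 1. Choose the best index for the query: here, simply the lowest estimated cost.
    best = min(strategies, key=lambda s: s.cost(query))
    # 2. Set up the scan ranges for that index.
    ranges = best.ranges(query)
    # 3. Transform the raw values coming back from the scan into features.
    return [decode(value) for value in scan(best.table, ranges)]


# Usage with made-up strategies and a fake scan that echoes table and start row.
strategies = [
    IndexStrategy("st_idx", cost=lambda q: 100.0, ranges=lambda q: [("a", "z")]),
    IndexStrategy("attr_idx", cost=lambda q: 1.0 if "name" in q else 1e9,
                  ranges=lambda q: [(q["name"], q["name"])]),
]


def fake_scan(table, ranges):
    return [f"{table}:{start}".encode() for start, _ in ranges]


print(execute_query({"name": "bob"}, strategies, fake_scan, lambda b: {"raw": b.decode()}))
# [{'raw': 'attr_idx:bob'}]
```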

- We use a cost-based strategy for picking an index to scan. Right now it is fairly static, but we allow for user hints to indicate a preferred strategy. The code for calculating costs is here:
https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/index/QueryStrategyDecider.scala

- We use the feature ID out of the index table in order to 'join' against the record table. You can see that code snippet here:
https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/index/AttributeIdxStrategy.scala#L153-L154

- When scanning the records table, we only care about the Value - that is the serialized simple feature, including ID. The row of the record table has the feature ID in it, which is how we set up a range to scan.
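A toy Python version of the attribute-index "join" described above, assuming an in-memory attribute index of (attribute value, feature ID) rows sorted by value, and a record table keyed directly by feature ID (the real record row only needs to contain the ID; its exact layout is not spelled out here). All names and the byte-string "serialized features" are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def attribute_index_lookup(attr_idx: List[Tuple[str, str]], value: str) -> List[str]:
    """Scan a toy attribute index of (attribute value, feature ID) rows; return matching IDs."""
    return [fid for attr_value, fid in attr_idx if attr_value == value]


def record_ranges(feature_ids: Iterable[str]) -> List[Tuple[str, str]]:
    """One single-row range per feature ID, since the record row contains the ID."""
    return [(fid, fid) for fid in feature_ids]


def join_against_records(attr_idx: List[Tuple[str, str]],
                         records: Dict[str, bytes],
                         value: str) -> List[bytes]:
    ids = attribute_index_lookup(attr_idx, value)
    # Only the Value of each record row matters: it is the serialized feature.
    return [records[start] for start, _ in record_ranges(ids) if start in records]


attr_idx = [("alice", "fid-2"), ("bob", "fid-1"), ("bob", "fid-3")]
records = {"fid-1": b"fid-1|bob", "fid-2": b"fid-2|alice", "fid-3": b"fid-3|bob"}
print(join_against_records(attr_idx, records, "bob"))  # [b'fid-1|bob', b'fid-3|bob']
```

Because each feature ID maps to one record row, every lookup becomes a single-row range; in Accumulo, many such ranges can be handed to a BatchScanner at once.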

Hope that helps,

Emilio

On 10/21/2015 11:59 AM, Marcel wrote:

Hello all,
I'm struggling to understand how executing a query works at a conceptual level.
Could you explain, step by step, how GeoMesa executes a query?
The following points would be very interesting:
- Is there an order in which the tables are used? (e.g., look first at attr_idx, then st_idx, and finally do a full table scan if needed)
- What is returned from the index table for mapping to the records table? (Maybe the feature ID?)
- What about the records table? (What is the current structure of the row ID? Does it include the feature ID?)

Best regards,
Marcel Jacob.
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users

