Re: [geomesa-users] Executing a query - process

From: Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx>
Date: Wed, 28 Oct 2015 17:22:49 -0400

Hi Marcel,

See answers inline:

Thanks,

Emilio

On 10/28/2015 12:26 PM, Marcel Jacob wrote:

Some questions about the different tables:
1) general:
All tables (except the metadata table) contain a SimpleFeature in the value (in st_idx, only the data entries do). Is this correct?

In general yes, but we don't always store the full simple feature.

2) deprecated st_idx table (I want to understand the idea behind the structure):
Why do you separate index and data entries? What advantage does that give me when querying?
Example from my output:
rowID: 0~0~event~000~2012071922 family: 00 qualifier: 70fac002-0004-4800-8189-dd3c6c8e5b4c visibility: value: ??? (What is stored here? In my case it is always 70 bytes.)
rowID: 0~1~event~000~2012071922 family: 00 qualifier: 70fac002-0004-4800-8189-dd3c6c8e5b4c visibility: value: encoded SimpleFeature

The 'index' entries only contain the date and geometry for the feature. This is an optimized use case to drive maps, which don't need all the other attributes in the simple feature. This way we can save a lot of bytes being read/transferred.
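To make the byte savings concrete, here is a toy sketch (not GeoMesa's encoding) that builds a reduced index entry holding only the date and geometry and compares its size with the full feature. The attribute names `dtg` and `geom` and all sample values are hypothetical, and JSON stands in for the real serialization.

```python
import json

# Hypothetical feature; attribute names and values are illustrative only.
feature = {
    "dtg": "2012-07-19T22:00:00Z",
    "geom": "POINT (10.5 51.2)",
    "name": "some event",
    "description": "free-text notes " * 20,
    "source": "sensor-42",
}

def reduced_entry(f, dtg_field="dtg", geom_field="geom"):
    """Keep only the date and geometry, which is all a map render needs."""
    return {dtg_field: f[dtg_field], geom_field: f[geom_field]}

def encoded_size(f):
    # JSON stands in for the actual serialization here.
    return len(json.dumps(f).encode("utf-8"))

full_bytes = encoded_size(feature)
reduced_bytes = encoded_size(reduced_entry(feature))
print(f"full: {full_bytes} bytes, index entry: {reduced_bytes} bytes")
```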

What would happen without the binary index?
rowID: 0~event~000~2012071922 family: 00 qualifier: 70fac002-0004-4800-8189-dd3c6c8e5b4c visibility: value: encoded SimpleFeature

We use the binary index so that each scan reads either index entries or data entries, never both. The index entries contain the same features as the data entries, but only a subset of the attributes, so we would never want to scan both at the same time.
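A toy model of why the flag helps, assuming an ordered key-value store like Accumulo where a scan covers a contiguous range of rows. Because the index/data flag sits right after the shard, each shard's index entries and data entries form separate contiguous blocks, so one prefix range per shard returns only one kind. The row suffix `event~000~2012071922` is copied verbatim from the example above (its fields are not explained in this thread); the helper names and the second feature are hypothetical.

```python
from bisect import bisect_left

def make_row(shard, flag, rest):
    # flag 0 = index entry (date + geometry), flag 1 = data entry (full feature)
    return f"{shard}~{flag}~{rest}"

def prefix_scan(entries, prefix):
    """Range scan over entries sorted by (row, qualifier): rows starting with prefix."""
    rows = [row for row, _, _ in entries]
    hits = []
    for row, qual, value in entries[bisect_left(rows, prefix):]:
        if not row.startswith(prefix):
            break  # rows are sorted, so no later row can match
        hits.append((row, qual, value))
    return hits

def scan(entries, shards, want_index):
    flag = 0 if want_index else 1
    hits = []
    for shard in shards:
        hits.extend(prefix_scan(entries, f"{shard}~{flag}~"))
    return hits

# Two hypothetical features in shards 0 and 1; the qualifier holds the feature ID.
entries = []
for fid, shard in [("70fac002-0004-4800-8189-dd3c6c8e5b4c", 0),
                   ("909271fc-44fa-495b-b876-5843df987fef", 1)]:
    entries.append((make_row(shard, 0, "event~000~2012071922"), fid, "index: dtg+geom"))
    entries.append((make_row(shard, 1, "event~000~2012071922"), fid, "data: full feature"))
entries.sort()

index_hits = scan(entries, [0, 1], want_index=True)
data_hits = scan(entries, [0, 1], want_index=False)
print([v for _, _, v in index_hits])
```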

3) attr_idx_table
A singleTableScan is better than a join. But when do I have to use a join with the record table?
Secondary filters were applied against the record table. Why not against attr_idx?

We store the reduced date/geometry in the attribute index by default, to reduce disk space. If the query can be satisfied with those fields, we will just scan the attribute table. If not, we have to retrieve the full records from the records table. Secondary filters are applied at this point because we have already determined that they can't be satisfied against the reduced attribute data.

Note that if disk space is not a concern, you can store the full simple feature in the attribute index, and then you will never have to join against the record table.
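The decision described above can be sketched as a subset check. This is a hypothetical simplification, not GeoMesa's planner: `plan_attr_query`, the default stored attributes `dtg`/`geom`, and the `full_feature_stored` switch (standing in for the option to keep the full feature in the attribute index) are assumptions for illustration.

```python
# Hypothetical names for the reduced date/geometry kept in the attribute index.
REDUCED_ATTRS = frozenset({"dtg", "geom"})

def plan_attr_query(result_attrs, filter_attrs, stored_attrs=REDUCED_ATTRS,
                    full_feature_stored=False):
    """Decide whether an attr_idx query can skip the record table.

    result_attrs: attributes the caller wants back
    filter_attrs: attributes referenced by secondary (non-index) filters
    """
    if full_feature_stored:
        return "single-table scan"
    if set(result_attrs) | set(filter_attrs) <= set(stored_attrs):
        return "single-table scan"
    # Something needed is missing from the reduced entry: fetch full records.
    return "join with record table"

print(plan_attr_query({"dtg", "geom"}, set()))            # -> single-table scan
print(plan_attr_query({"dtg", "geom", "name"}, set()))    # -> join with record table
print(plan_attr_query({"dtg", "geom"}, {"name"}, full_feature_stored=True))  # -> single-table scan
```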

Best regards,
Marcel Jacob.

On 22.10.2015 16:49, Emilio Lahr-Vivaz wrote:
Hi Marcel,

The feature ID is determined when the feature is created (or, more accurately, when it is persisted). We follow the GeoTools standard: if you set Hints.USE_PROVIDED_FID and/or Hints.PROVIDED_FID in the feature user data, we will use the feature ID as provided. Otherwise, we generate a random feature ID; by default we use a modified UUID, which is what you see below. In general, we do not encode or rely on index data in the feature ID. (We do provide some advanced feature ID options that I won't get into here, as they're beyond the basic scope of the issue.)

You can see the logic here:

https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/data/AccumuloFeatureWriter.scala#L88
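A minimal sketch of the rule described above, assuming plain string keys in place of the GeoTools hint objects and Python's `uuid.uuid4()` in place of GeoMesa's modified UUID. It also shows that the five `-`-separated parts asked about below are simply the standard 8-4-4-4-12 hex-digit layout of a UUID string, not levels of the index.

```python
import uuid

# Hypothetical string keys standing in for the GeoTools hints
# Hints.USE_PROVIDED_FID and Hints.PROVIDED_FID.
USE_PROVIDED_FID = "USE_PROVIDED_FID"
PROVIDED_FID = "PROVIDED_FID"

def assign_feature_id(current_id, user_data):
    """Pick the ID to persist a feature under."""
    if PROVIDED_FID in user_data:
        return user_data[PROVIDED_FID]      # explicit ID supplied by the caller
    if user_data.get(USE_PROVIDED_FID, False):
        return current_id                    # keep the feature's own ID
    return str(uuid.uuid4())                 # stand-in for GeoMesa's modified UUID

def group_lengths(fid):
    """Hex-digit count of each '-'-separated group of a UUID string."""
    return [len(part) for part in fid.split("-")]

print(group_lengths("909271fc-44fa-495b-b876-5843df987fef"))  # [8, 4, 4, 4, 12]
```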

Thanks,

Emilio

On 10/22/2015 10:27 AM, Marcel wrote:
Okay,
since the feature ID is used to set up the ranges to scan, I need to know what the feature ID depends on.
I know that you use your own implementation of SimpleFeatureType, but I can't find the place where you determine the feature ID for a given SimpleFeature. I assume that you override the getId() method. A feature ID looks like this, which seems to have five parts (separated by "-"):
909271fc-44fa-495b-b876-5843df987fef
Does the composition of these values reflect the hierarchy of the index (from coarse-grained to fine-grained)? If not, how do you determine the feature ID, and what are these five parts?

Best regards,
Marcel Jacob.

On 21.10.2015 18:26, Emilio Lahr-Vivaz wrote:

Hi Marcel,

Executing a query is basically 3 steps:

1. Choose the best index to scan given the query
2. Set up the scan ranges, iterators, etc. for the given index
3. Transform the results coming back from Accumulo into SimpleFeatures
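The three steps can be sketched end to end with in-memory tables. Everything here is hypothetical (the `Strategy` class, the two toy indexes, their fixed costs and row formats); it only illustrates cheapest-index selection with a user-hint override, range planning, and decoding, and it omits secondary filtering.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Range = Tuple[str, str]  # half-open [start, end) row range

@dataclass
class Strategy:
    name: str                            # hypothetical index name
    table: Dict[str, str]                # row -> encoded value
    cost: Callable[[dict], float]        # estimated cost of running a query here
    plan: Callable[[dict], List[Range]]  # scan ranges for a query

def choose(strategies: List[Strategy], query: dict, hint: Optional[str] = None) -> Strategy:
    """Step 1: a user hint wins if it names a known index, else the cheapest index."""
    if hint is not None:
        for s in strategies:
            if s.name == hint:
                return s
    return min(strategies, key=lambda s: s.cost(query))

def scan(table: Dict[str, str], ranges: List[Range]) -> List[str]:
    """Values of rows falling in any range, in sorted row order (like a key-value scan)."""
    return [table[row] for row in sorted(table) if any(lo <= row < hi for lo, hi in ranges)]

def execute(strategies, query, decode, hint=None):
    strategy = choose(strategies, query, hint)          # 1. choose the index
    ranges = strategy.plan(query)                       # 2. set up scan ranges
    values = scan(strategy.table, ranges)
    return strategy.name, [decode(v) for v in values]   # 3. decode into features

def decode(value: str) -> dict:
    fid, name = value.split("|")
    return {"id": fid, "name": name}

# '\uffff' serves as an upper bound for these ASCII row keys.
strategies = [
    Strategy("attr_idx", {"name~alice~f1": "f1|alice", "name~bob~f2": "f2|bob"},
             cost=lambda q: 1.0 if "name" in q else float("inf"),
             plan=lambda q: [(f"name~{q['name']}~", f"name~{q['name']}~\uffff")]),
    Strategy("time_idx", {"2012071922~f1": "f1|alice", "2012072000~f2": "f2|bob"},
             cost=lambda q: 10.0,
             plan=lambda q: [(q.get("start", ""), q.get("end", "\uffff"))]),
]

print(execute(strategies, {"name": "alice"}, decode))
# ('attr_idx', [{'id': 'f1', 'name': 'alice'}])
```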

- We use a cost-based strategy for picking an index to scan. Right now it is fairly static, but we allow for user hints to indicate a preferred strategy. The code for calculating costs is here:
https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/index/QueryStrategyDecider.scala

- We use the feature ID out of the index table in order to 'join' against the record table. You can see that code snippet here:
https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/index/AttributeIdxStrategy.scala#L153-L154

- When scanning the records table, we only care about the Value - that is the serialized simple feature, including ID. The row of the record table has the feature ID in it, which is how we set up a range to scan.
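A hypothetical sketch of the join: feature IDs taken from index hits become exact single-row ranges on the record table (whose row is the feature ID), the stored values are read back, and secondary filters run against the full features. Dicts stand in for serialized simple features; none of these names are GeoMesa APIs.

```python
from typing import Callable, Dict, Iterable, List, Tuple

def exact_ranges(fids: Iterable[str]) -> List[Tuple[str, str]]:
    # [fid, fid + "\0") is the tightest half-open range holding only the row `fid`.
    return [(fid, fid + "\0") for fid in sorted(set(fids))]

def join_with_records(index_hits: List[Tuple[str, str]],
                      records: Dict[str, dict],
                      secondary_filter: Callable[[dict], bool]) -> List[dict]:
    """index_hits: (index row, feature ID) pairs; records: feature ID -> full feature."""
    features = []
    for lo, hi in exact_ranges(fid for _, fid in index_hits):
        for row in sorted(records):
            if lo <= row < hi and secondary_filter(records[row]):
                features.append(records[row])
    return features

records = {
    "f1": {"id": "f1", "name": "alice", "dtg": "2012-07-19"},
    "f2": {"id": "f2", "name": "bob", "dtg": "2012-07-20"},
    "f3": {"id": "f3", "name": "carol", "dtg": "2012-07-21"},
}
index_hits = [("name~alice~f1", "f1"), ("name~bob~f2", "f2")]
matches = join_with_records(index_hits, records, lambda f: f["dtg"] >= "2012-07-20")
print([f["id"] for f in matches])  # ['f2']
```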

Hope that helps,

Emilio

On 10/21/2015 11:59 AM, Marcel wrote:

Hello all,
I'm struggling to understand how the process of executing a query works at a theoretical level.
Could you explain how GeoMesa executes a query, step by step?
The following points would be very interesting:
- Is there an order for the tables? (maybe look first at attr_idx, then st_idx and finally full table scan if needed...something like this).
- What is returned from the index table? (For joining with the records table; maybe the feature ID?)
- What about the records table? (What is the current structure of the row ID; does it include the feature ID?)

Best regards,
Marcel Jacob.
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users

