Hi Marcel,
See answers inline:
Thanks,
Emilio
On 10/28/2015 12:26 PM, Marcel Jacob wrote:
Some questions about the different tables:
1) general:
All tables (except metadata-table) contain a SimpleFeature in the
value (only data entries in st_idx). Is this correct?
In general yes, but we don't always store the full simple feature.
2) deprecated st_idx table (I want to understand the idea behind
the structure):
Why do you separate index and data entries? Which advantage do I
have while querying?
example from my output:
rowID: 0~0~event~000~2012071922 family: 00 qualifier:
70fac002-0004-4800-8189-dd3c6c8e5b4c visibility: value: ???
(What is stored here? In my case always 70 Bytes)
rowID: 0~1~event~000~2012071922 family: 00 qualifier:
70fac002-0004-4800-8189-dd3c6c8e5b4c visibility: value: encoded
SimpleFeature
The 'index' entries only contain the date and geometry for the
feature. This is an optimized use case to drive maps, which don't
need all the other attributes in the simple feature. This way we can
save a lot of bytes being read/transferred.
What would happen without the binary Index?
rowID: 0~event~000~2012071922 family: 00 qualifier:
70fac002-0004-4800-8189-dd3c6c8e5b4c visibility: value: encoded
SimpleFeature
We use the binary index so that we only scan either index or data
entries. The index entries contain the same features as the data
entries, but only a subset of the attributes. Thus, we'd never want to
scan both index and data entries at the same time.
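To make the index/data split concrete, here is a minimal Python sketch of a sorted key-value table. It assumes, based only on the example rows above, that the second row component is 0 for index entries and 1 for data entries. The helper names (make_row, scan_prefix) and the placeholder values are hypothetical; this is not GeoMesa code.

```python
from bisect import bisect_left

# Toy stand-in for a sorted key-value table (row -> value).
# The row layout is copied from the example rows above; treating the second
# component as the index/data flag is an assumption made for this sketch.
INDEX_FLAG, DATA_FLAG = "0", "1"

def make_row(shard, flag, type_name, cell, date):
    """Build a row ID shaped like the examples, e.g. 0~0~event~000~2012071922."""
    return "~".join([shard, flag, type_name, cell, date])

def scan_prefix(table, prefix):
    """Return all (row, value) pairs whose row starts with the given prefix."""
    rows = sorted(table)
    start = bisect_left(rows, prefix)
    hits = []
    for row in rows[start:]:
        if not row.startswith(prefix):
            break
        hits.append((row, table[row]))
    return hits

table = {
    make_row("0", INDEX_FLAG, "event", "000", "2012071922"): b"date+geometry only",
    make_row("0", DATA_FLAG, "event", "000", "2012071922"): b"full encoded SimpleFeature",
}

# A map-rendering query only needs date and geometry, so it scans index entries.
index_hits = scan_prefix(table, "0~" + INDEX_FLAG + "~event")
# A query needing every attribute scans data entries instead, never both.
data_hits = scan_prefix(table, "0~" + DATA_FLAG + "~event")
```

Because the flag sits near the front of the row, each kind of entry forms its own contiguous range, so one scan touches only small index values or only full features.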
3) attr_idx_table
A singleTableScan is better than a join. But when do I have to use
a join with the record table?
Secondary filters were applied against the record table...Why not
attr_idx?
We store the reduced date/geometry in the attribute index by
default, to reduce disk space. If the query can be satisfied with
those fields, we will just scan the attribute table. If not, we have
to retrieve the full records from the records table. Secondary
filters are applied at this point because we have already determined
that they can't be satisfied against the reduced attribute data.
Note that if disk space is not a concern, you can store the full
simple feature in the attribute index, and then you will never have
to join against the record table.
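As a rough illustration of that decision, the following Python sketch checks whether a query can be answered from the attribute index alone. The field names "dtg" and "geom" and the function needs_join are hypothetical, chosen only for this example; the real planner logic is more involved.

```python
# Fields assumed to be stored in the reduced attribute-index value.
# "dtg" (date) and "geom" (geometry) are hypothetical names for this sketch.
REDUCED_FIELDS = {"dtg", "geom"}

def needs_join(requested_attrs, secondary_filter_attrs, full_feature_in_index=False):
    """Return True if the record table must be read to answer the query.

    requested_attrs: attributes the query wants returned.
    secondary_filter_attrs: attributes used by filters that the index scan
    range itself does not already satisfy.
    """
    if full_feature_in_index:
        # The full simple feature is stored in the index: never join.
        return False
    needed = set(requested_attrs) | set(secondary_filter_attrs)
    return not needed <= REDUCED_FIELDS

# Only date and geometry are needed: a single scan of the attribute table.
single_scan = needs_join(["geom"], ["dtg"])
# Another attribute is requested: join against the record table.
join_required = needs_join(["geom", "name"], [])
```

When the join is required, the secondary filters run against the full records, since the reduced values do not carry the attributes they need.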
Best regards,
Marcel Jacob.
On 22.10.2015 16:49, Emilio Lahr-Vivaz wrote:
Hi Marcel,
Feature ID is determined when the feature is created (or more
accurately, when it's persisted). We follow the GeoTools
standard - if you set Hints.USE_PROVIDED_FID and/or
Hints.PROVIDED_FID in the feature user data, then we will use
the feature ID as provided. Otherwise, we generate a random
feature ID - by default we use a modified UUID, which is what
you see below. In general, we do not encode or rely on index
data in the feature ID. (We do provide some advanced feature ID
options that I won't get into here, as they're beyond the basic
scope of the issue).
You can see the logic here:
https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/data/AccumuloFeatureWriter.scala#L88
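A simplified Python sketch of the behaviour described above, not the linked Scala code: the dictionary stands in for the feature user data, the key "USE_PROVIDED_FID" mirrors the hint name, and a plain random UUID is used as a stand-in for the modified UUID that is actually generated. The function name resolve_feature_id is hypothetical.

```python
import uuid

def resolve_feature_id(provided_id, user_data):
    """Pick the feature ID at write time.

    user_data is a plain dict standing in for the feature user data.
    """
    if user_data.get("USE_PROVIDED_FID") and provided_id:
        # The caller asked for its own ID to be kept.
        return provided_id
    # Stand-in: a standard random UUID rather than the modified UUID.
    return str(uuid.uuid4())

kept = resolve_feature_id("my-id-1", {"USE_PROVIDED_FID": True})
generated = resolve_feature_id("my-id-1", {})

# The five dash-separated groups are just the standard UUID text layout
# (8-4-4-4-12 hex digits); they carry no index information.
group_lengths = [len(part) for part in generated.split("-")]
```

In this sketch, as in the explanation above, nothing about the index structure is encoded in a generated ID.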
Thanks,
Emilio
On 10/22/2015 10:27 AM, Marcel wrote:
Okay,
if the featureId is what sets up the ranges to scan, I need to know
what the featureId depends on.
I know that you use your own implementation of a
SimpleFeatureType, but I can't find the place where you
determine the featureId from a given SimpleFeature. I assume
that you override the getId() method. A featureID looks like
this, which seems to have five parts (separated by "-"):
909271fc-44fa-495b-b876-5843df987fef
The composition of these values
reflects the hierarchy of the index, doesn't it? (from
coarse-grained to fine-grained) If not, how do you determine
the featureID? What are these five parts?
Best regards,
Marcel Jacob.
On 21.10.2015 18:26, Emilio Lahr-Vivaz wrote:
Hi Marcel,
Executing a query is basically 3 steps:
1. Choose the best index to scan given the query
2. Set up the scan ranges, iterators, etc. for the given
index
3. Transform the results coming back from Accumulo into
SimpleFeatures
- We use a cost-based strategy for picking an index to scan.
Right now it is fairly static, but we allow for user hints
to indicate a preferred strategy. The code for calculating
costs is here:
https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/index/QueryStrategyDecider.scala
- We use the feature ID out of the index table in order to
'join' against the record table. You can see that code
snippet here:
https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/index/AttributeIdxStrategy.scala#L153-L154
- When scanning the records table, we only care about the
Value - that is the serialized simple feature, including ID.
The row of the record table has the feature ID in it, which
is how we set up a range to scan.
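The three steps can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the cost numbers, table contents, and function names (choose_strategy, run_query) are made up, plain dictionaries stand in for the Accumulo tables, and the real cost calculation lives in the QueryStrategyDecider linked above.

```python
def choose_strategy(costs, hint=None):
    """Step 1: pick the cheapest index, unless a user hint names one."""
    if hint is not None and hint in costs:
        return hint
    return min(costs, key=costs.get)

def run_query(costs, index_tables, record_table, deserialize, hint=None):
    strategy = choose_strategy(costs, hint)
    # Step 2: scanning the chosen index yields feature IDs.
    feature_ids = index_tables[strategy]
    # The record table row contains the feature ID, so each ID becomes a
    # lookup; only the value (the serialized feature) matters here.
    values = [record_table[fid] for fid in feature_ids if fid in record_table]
    # Step 3: turn the serialized values back into features.
    return [deserialize(value) for value in values]

# Hypothetical static costs and toy tables.
costs = {"attr_idx": 10, "st_idx": 100, "records": 1000}
index_tables = {"attr_idx": ["fid-1", "fid-2"], "st_idx": ["fid-2"]}
record_table = {"fid-1": "name=a", "fid-2": "name=b"}

def parse(value):
    """Toy deserializer: 'key=value' -> {'key': 'value'}."""
    key, val = value.split("=")
    return {key: val}

features = run_query(costs, index_tables, record_table, parse)
hinted = run_query(costs, index_tables, record_table, parse, hint="st_idx")
```

With these made-up costs the attribute index wins by default, while passing a hint forces the spatio-temporal index instead.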
Hope that helps,
Emilio
On 10/21/2015 11:59 AM, Marcel wrote:
Hello all,
I'm struggling to understand how the process of executing a query
works on a theoretical level.
Could you explain how GeoMesa executes this query (step by
step)?
The following points would be very interesting:
- Is there an order for the tables? (maybe look first at
attr_idx, then st_idx and finally full table scan if
needed...something like this).
- What is returned from the index table? (for mapping with
records table - maybe feature id?)
- What about the records table? (current structure of
rowid - including feature id?)
Best regards,
Marcel Jacob.
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password,
or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users