Hi Ben,
If there is a geometry and a date field in your simple feature type,
then GeoMesa will use them for indexing. If you have more than one
date or geometry field, you can indicate which ones should be used
for the index (more below). If nothing is indicated, I believe
GeoMesa will default to the first one declared. You can check which
fields are indexed by scanning the GeoMesa metadata table in the
Accumulo shell. The exact entries will vary by version, but you can
probably figure it out; if not, please reply with the scan
output and we can interpret it for you.
Exactly how you indicate the defaults depends on how you are
creating your simple feature type. For dates, the end result should
be a user-data entry "geomesa.index.dtg" set to the name of your
date field. For geometries, the default is the one returned by
simpleFeatureType.getGeometryDescriptor().
If you only have lat/lon values, you will need to combine them into
an actual geometry attribute (e.g. a Point) in your simple feature type.
During ingestion, you can then set the indexed fields to whatever
values you want (system time, a time provided in your data, etc.).
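A minimal sketch of preparing the two indexed values from a raw record, using only the JDK. It assumes each record carries decimal-degree lat/lon and a timestamp in a hypothetical `yyyy-MM-dd HH:mm:ss` UTC format; the class and method names are made up for illustration. WKT uses x/y order, so longitude comes first. The WKT string could then be turned into a Point with JTS's `WKTReader`, and both values set on the geometry and date attributes through your feature builder.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;

public class IngestValues {
    // Hypothetical format of the timestamp carried in each record; adjust to your data.
    static final DateTimeFormatter INPUT_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** WKT for a point. WKT uses x/y order, so longitude comes before latitude. */
    static String pointWkt(double lat, double lon) {
        if (lat < -90 || lat > 90 || lon < -180 || lon > 180) {
            throw new IllegalArgumentException("lat/lon out of range: " + lat + ", " + lon);
        }
        return "POINT (" + lon + " " + lat + ")";
    }

    /** Parses the record's own timestamp (assumed UTC) instead of using the ingest time. */
    static Date providedDate(String text) {
        LocalDateTime local = LocalDateTime.parse(text, INPUT_FORMAT);
        return Date.from(local.toInstant(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        // Hypothetical record layout: timestamp, latitude, longitude.
        String[] fields = "2017-02-14 10:35:00,51.575,-1.315".split(",");
        String wkt = pointWkt(Double.parseDouble(fields[1]), Double.parseDouble(fields[2]));
        Date dtg = providedDate(fields[0]);
        System.out.println(wkt);           // POINT (-1.315 51.575)
        System.out.println(dtg.getTime()); // 1487068500000
    }
}
```

Because the date comes from the record rather than from the clock, re-ingesting the same line later produces the same indexed date.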
Thanks,
Emilio
On 02/14/2017 10:35 AM, Benjamin Weaver wrote:
Thanks, Emilio,
There is a lot of very valuable information here.
Two questions, just to clarify (you were clear in your
answers; the lack of clarity is in my understanding of
things):
1. How would we index in GeoMesa on latitude, longitude, and
a time we provide from our own data, i.e. not a
system-generated timestamp?
Hi Ben,
1. The key used to write in GeoMesa depends on the
particular index, but it always includes the feature ID,
so if the feature ID changes you will get a duplicate
record.
2. If you're using our converter framework, we do have some
methods to use an MD5 of the values as the feature ID, which
will prevent duplicates. If not, you can do the same thing
by generating the feature ID yourself and setting the
PROVIDED_FID or USE_PROVIDED_FID hint. We also have a
pluggable SPI for generating feature IDs when they
aren't set. See
http://www.geomesa.org/documentation/user/datastores/runtime_config.html#geomesa-feature-id-generator.
By default we generate a UUID that includes parts of the Z3
index, so that features grouped in space-time will also be
grouped in Accumulo. Note that the feature ID is a string
and has no inherent restrictions on its form.
3. The Z3 index uses the default date attribute to index
records, not the insertion time.
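To make the three points above concrete, here is a toy model using only the JDK. It is a simplification, not GeoMesa's actual key layout: the hypothetical key is a time bin followed by the feature ID, where the bin is weeks since the Unix epoch (assumed here as the Z3 time period), and a TreeMap stands in for the Accumulo table, in which writing an existing key replaces the previous entry. Writing the same line twice with random UUIDs yields two rows; using an MD5 of the line as the ID sends both writes to the same key. The bin is computed from the date in the record, so ingesting again later does not change it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

public class DedupSketch {
    static final long SECONDS_PER_WEEK = 7L * 24 * 60 * 60;

    /** Hex-encoded MD5 of the raw record: identical lines always give the same ID. */
    static String md5Hex(String record) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(record.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform must support MD5, so this should be unreachable.
            throw new IllegalStateException(e);
        }
    }

    /** Weeks since the Unix epoch for the record's own date (not the time of ingest). */
    static long weekBin(Instant dtg) {
        return Math.floorDiv(dtg.getEpochSecond(), SECONDS_PER_WEEK);
    }

    // Toy stand-in for an Accumulo table: writing an existing key replaces the old entry.
    private final Map<String, String> table = new TreeMap<>();

    // Hypothetical key layout: time bin, then feature ID (real keys also encode space, etc.).
    void write(Instant dtg, String featureId, String record) {
        table.put(weekBin(dtg) + "|" + featureId, record);
    }

    int rowCount() {
        return table.size();
    }

    public static void main(String[] args) {
        String line = "2017-02-14T10:35:00Z,51.575,-1.315";
        Instant dtg = Instant.parse(line.split(",")[0]); // date taken from the data itself

        DedupSketch randomIds = new DedupSketch();
        randomIds.write(dtg, UUID.randomUUID().toString(), line);
        randomIds.write(dtg, UUID.randomUUID().toString(), line); // same line ingested twice
        System.out.println("random IDs: " + randomIds.rowCount() + " rows"); // 2 rows

        DedupSketch md5Ids = new DedupSketch();
        md5Ids.write(dtg, md5Hex(line), line);
        md5Ids.write(dtg, md5Hex(line), line); // same line, same ID, same key
        System.out.println("MD5 IDs: " + md5Ids.rowCount() + " rows"); // 1 row
    }
}
```

With the GeoTools API, a precomputed ID would typically be passed to featureBuilder.buildFeature(id), with Hints.USE_PROVIDED_FID set to true in the feature's user data so the ID is kept; check the hint names against your GeoMesa version. Hashing the raw line means lines that differ only in whitespace get different IDs; hashing the parsed attribute values would be more forgiving.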
Let me know if anything isn't clear!
Thanks,
Emilio
On 02/12/2017 03:28 PM, Benjamin Weaver wrote:
Hi all,
If we ingest, say, the same line of text data twice
(by mistake) in GeoMesa 1.2.1, we end up with duplicate
data in our Accumulo (1.7.2) database. We are
ingesting using GeoMesa-generated feature IDs (calling
featureBuilder.setFeatureID with null, without the
use of Hints).
A colleague asked me, why are duplicates generated in
this case? I realized I did not know.
1. How, exactly, in our
configuration of GeoMesa + Accumulo, is a GeoMesa row,
or record, made unique? I know the importance of
Accumulo logical rows, but in this case of
identical data we would want to ensure insertion
of only one GeoMesa record, namely, one
instance of our GeoMesa SimpleFeature.
1a. Are duplicate GeoMesa rows added because the time
of insertion differs, or because different feature IDs
are randomly generated on each insertion?
Potentially related questions:
2. How are feature IDs generated by GeoMesa? I thought
randomly, but I read a comment somewhere
suggesting that feature IDs were created from an MD5
hash of all the values in the feature. But a colleague
points out that even if this is so, a feature ID does
not resemble an MD5 hash, so it must be composed at least
partially by other means.
3. A potentially related question: can we create a Z3
index using a data-derived timestamp (not the
insertion timestamp) as the time dimension?
All comments and perspectives are appreciated and
welcome!
Ben Weaver
This email (and any attachments) may contain confidential
information and is intended solely for the recipient(s) to
whom the email is addressed. If you received this email in
error, please inform us immediately and delete the email
and all attachments without further using, copying or
disclosing the information. This email and any attachments
are believed to be, but cannot be guaranteed to be, secure
or virus-free. Satellite Applications Catapult Limited is
registered in England & Wales. Company Number:
7964746. Registered office: Electron Building, Fermi
Avenue, Harwell Oxford, Didcot, Oxfordshire OX11 0QR.
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users