Hi Ben,
That bug was specific to Spark - if you aren't using
Spark it shouldn't affect you.
I would suggest updating to the latest 1.2.x version, 1.2.7.3. That
is fully backwards-compatible with your current data, and contains a lot of
bug fixes. I'm not sure about your particular bug; I don't think
anyone has reported it before. If updating doesn't work, maybe you
can send us some data and queries that exhibit the issue.
Thanks,
Emilio
On 02/14/2017 11:54 AM, Benjamin Weaver wrote:
Thank you,
This again is very useful information. Emilio, one reason why
we had these questions is that a simple time range query
was not pulling up all expected results. Certain timestamps
would fail (I could provide an extensive list) whereas others
succeed. Also, strangely, a query of the form [[TS
between timeX and timeY] AND [UUID like e745...]] would
succeed even though its time range component, run on its own,
would fail. So there is a logic failure as well.
Then I came across the following bug report:
https://dev.locationtech.org/mhonarc/lists/geomesa-users/msg00575.html
Would this represent a fix to our problem? If so, how would
we incorporate this fix into a running installation of GeoMesa
1.2.1?
Any suggestions welcome. Thank you again for your help.
Ben
Hi Ben,
If there is a geometry and a date field in your simple feature
type, then GeoMesa will use them for indexing. If you have
more than one date or geometry field, you can indicate which
ones you want to be used for the index - more below. If
nothing is indicated, I believe GeoMesa will default to the
first one declared. You can check which fields are indexed
by scanning the GeoMesa metadata table in the Accumulo
shell. The exact entries will vary by version, but you can
probably figure it out - if not, please reply with the
scan output and we can parse it for you.
Exactly how you indicate the defaults depends on how you are
creating your simple feature type. For dates, the end result
should be that there is a user-data entry for
"geomesa.index.dtg" set to the name of your date field. For
geometries, the default should be returned by
simpleFeatureType.getGeometryDescriptor().
If you just have lat/lon, you will need to turn them into an
actual geometry type in your simple feature type.
During ingestion, you can then set the indexed fields to
whatever values you want (system time, a time from your own data, etc.).
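As a rough illustration of the last two points, here is a Python sketch (not GeoMesa code) that turns raw lat/lon values and a data-provided timestamp into the attribute values a feature would carry: a WKT point and an ISO-8601 date. The attribute names "geom" and "dtg", the helper name to_feature_values, and the sample row are assumptions made for the example; only the "geomesa.index.dtg" user-data key comes from the description above.

```python
from datetime import datetime, timezone


def to_feature_values(lat, lon, epoch_seconds):
    """Build geometry and date attribute values from raw columns (hypothetical helper)."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: lat={lat}, lon={lon}")
    # WKT points are written x then y, i.e. longitude first, then latitude.
    geom = f"POINT ({lon} {lat})"
    # Use the time carried in the data, not the time of ingestion.
    dtg = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"geom": geom, "dtg": dtg}


# User-data entry naming the indexed date attribute (the attribute name "dtg" is assumed).
user_data = {"geomesa.index.dtg": "dtg"}

# Made-up sample row: latitude, longitude, epoch seconds taken from the data.
values = to_feature_values(51.5, -1.25, 1487070000)
print(values["geom"])  # prints "POINT (-1.25 51.5)"
print(values["dtg"])
```

The point of the sketch is the ordering (longitude before latitude in the geometry) and that the date attribute is filled from the record itself, so the index reflects the data's time rather than the ingest time.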
Thanks,
Emilio
On 02/14/2017 10:35 AM, Benjamin Weaver wrote:
Thanks, Emilio,
There is a lot of very valuable information here.
Two questions just to clarify (you were clear in your
answers--the lack of clarity is in my understanding of
things):
1. How would we index in Geomesa on latitude,
longitude, and a time we provide from our own data,
i.e. not a system generated timestamp?
Hi Ben,
1. The key used to write in geomesa depends on the
particular index, but it will always include the
feature ID, so if the feature ID changes you will
get a duplicate record.
2. If you're using our converter framework, we do
have some methods to use an MD5 of the values as the
feature ID, which will prevent duplicates. If not,
you can do the same thing by generating the feature
ID yourself and setting the PROVIDED_FID or
USE_PROVIDED_FID hint. We also have a pluggable SPI
interface for generating feature IDs when they
aren't set. See
http://www.geomesa.org/documentation/user/datastores/runtime_config.html#geomesa-feature-id-generator.
By default we generate a UUID that includes parts of
the Z3 index, so that features grouped in space-time
will also be grouped in accumulo. Note that the
feature ID is a string and has no inherent
restrictions on form.
3. The Z3 index uses the default date attribute to
index records, not the insertion time.
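To illustrate points 1 and 2: a minimal sketch, in Python rather than GeoMesa's own JVM API, of why a content-derived feature ID prevents duplicates while a random one does not. The function names md5_feature_id and ingest, the field separator, the sample record, and the dict standing in for a table keyed by feature ID are all assumptions made for the illustration; this is not GeoMesa's converter implementation.

```python
import hashlib
import uuid


def md5_feature_id(values):
    """Return a deterministic ID: the MD5 hex digest of the joined attribute values.

    Hypothetical helper for illustration only, not a GeoMesa API.
    """
    # Join with a separator unlikely to occur in the data,
    # so that ("ab", "c") and ("a", "bc") hash differently.
    joined = "\x1f".join(str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def ingest(store, values, fid=None):
    """Write a record keyed by feature ID; a missing ID gets a random UUID."""
    key = fid if fid is not None else str(uuid.uuid4())
    store[key] = values
    return key


# Made-up sample record standing in for one line of input data.
record = ("e745", "2017-02-12T15:28:00Z", 51.5, -1.25)

# Random IDs: ingesting the same line twice yields two records.
random_store = {}
ingest(random_store, record)
ingest(random_store, record)

# Content-derived IDs: the second write overwrites the first.
hashed_store = {}
ingest(hashed_store, record, md5_feature_id(record))
ingest(hashed_store, record, md5_feature_id(record))

print(len(random_store), len(hashed_store))  # prints "2 1"
```

The trade-off is that two genuinely distinct observations with identical attribute values would also collapse into one record, so the hash should cover whatever fields make a record unique in your data.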
Let me know if anything isn't clear!
Thanks,
Emilio
On 02/12/2017 03:28 PM, Benjamin Weaver wrote:
Hi all,
If we ingest, say, the same line of text data
twice (by mistake) in GeoMesa 1.2.1, we end up
with duplicate data in our Accumulo (1.7.2)
database. We are ingesting using
GeoMesa-generated featureIDs (setting our
featureBuilder.setFeatureID to NULL without
the use of Hints).
A colleague asked me, why are duplicates
generated in this case? I realized I did not
know.
1. How, exactly, in our
configuration of GeoMesa + Accumulo, is a
GeoMesa row, or record, made unique?
I know the importance of Accumulo logical
rows, but in this case of identical data we
would want to ensure insertion of only
one GeoMesa record, namely, one instance of
our GeoMesa SimpleFeature.
1a. Are duplicate GeoMesa rows added because
the time at insertion differs? Or because
different featureIDs are randomly generated on
each insertion?
Potentially related questions:
2. How are featureIDs generated by geomesa? I
thought randomly, but I read a comment
somewhere suggesting that FeatureIDs were
created out of an md5 hash of all the values
in the feature. But a colleague points out
that even if this is so, a featureID does not
resemble an md5 hash, so it must be composed at
least partially by other means.
3. A potentially related question: can we
create a z3 index by using a data-derived
timestamp--not the insertion timestamp-- as
the time dimension?
All comments and perspectives are appreciated
and welcome!
Ben Weaver
This email (and any attachments) may contain
confidential information and is intended solely
for the recipient(s) to whom the email is
addressed. If you received this email in error,
please inform us immediately and delete the email
and all attachments without further using, copying
or disclosing the information. This email and any
attachments are believed to be, but cannot be
guaranteed to be, secure or virus-free. Satellite
Applications Catapult Limited is registered in
England & Wales. Company Number: 7964746.
Registered office: Electron Building, Fermi
Avenue, Harwell Oxford, Didcot, Oxfordshire OX11
0QR.
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users