Hi Diane,
A 'key' in Accumulo consists of a row, a column family, a column
qualifier, a timestamp and a visibility marking. In general,
timestamps are 'hidden' by the versioning iterator, which is always
applied by default. Thus, inserting a key with the same
row/cf/cq/vis but a newer timestamp will overwrite the entry with the
older timestamp. Technically the older value is still there, but it
is hidden and eventually compacted out by Accumulo.
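To make the overwrite behaviour concrete, here is a small Python toy model. It is not Accumulo code: `Key` and `versioning_scan` are hypothetical names, and the model only mimics the default setting of keeping one version per row/cf/cq/vis, with newer timestamps sorting first.

```python
from typing import NamedTuple


class Key(NamedTuple):
    row: str
    cf: str   # column family
    cq: str   # column qualifier
    vis: str  # visibility marking
    ts: int   # timestamp


def versioning_scan(entries, max_versions=1):
    """Toy model of a versioning iterator: for each row/cf/cq/vis
    coordinate, keep only the `max_versions` newest entries."""
    # Sort by coordinate, newest timestamp first within a coordinate,
    # mirroring Accumulo's descending timestamp order.
    ordered = sorted(
        entries,
        key=lambda kv: (kv[0].row, kv[0].cf, kv[0].cq, kv[0].vis, -kv[0].ts),
    )
    seen = {}
    visible = []
    for key, value in ordered:
        coord = key[:4]  # everything except the timestamp
        count = seen.get(coord, 0)
        if count < max_versions:
            visible.append((key, value))
        seen[coord] = count + 1
    return visible


entries = [
    (Key("fid0", "f", "q", "", 100), "old"),
    (Key("fid0", "f", "q", "", 200), "new"),
    (Key("fid1", "f", "q", "", 100), "other"),
]
print([value for _, value in versioning_scan(entries)])  # ['new', 'other']
```

Only the timestamp differs between the two `fid0` entries, so the older one is hidden; change any of the other four components and both would be returned.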
The key that GeoMesa writes depends on the particular index, but it
always includes the feature ID (FID). The Z3 index key also includes
the default date and geometry.
In general, to be safe, we recommend always using an updating
feature writer if you are modifying an existing feature. That can be
slow, though, so sometimes you can get away with appending and
relying on the versioning iterator to clean up the duplicates. But
if any of the indexed values (date, geometry, indexed attributes)
change, you will end up with a different key in some of the indices,
and thus duplicate data.
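A rough Python sketch of why appending duplicates data while an update does not. This is a toy model, not GeoMesa's implementation: `Feature`, `index_keys` and `ToyStore` are hypothetical, the keys are plain tuples rather than real row keys, and a dict overwrite stands in for the versioning iterator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    fid: str
    dtg: str    # default date, kept as an ISO string for simplicity
    lon: float  # default geometry (a point)
    lat: float
    name: str   # a non-indexed attribute


def index_keys(f):
    """Hypothetical, simplified keys: every index key contains the FID,
    and the z3-style key also contains the indexed date and geometry."""
    return {
        "records": (f.fid,),
        "z3": (f.dtg, f.lon, f.lat, f.fid),
    }


class ToyStore:
    def __init__(self):
        # One dict per index table. Writing an existing key replaces the
        # value, standing in for the versioning iterator hiding old versions.
        self.tables = {"records": {}, "z3": {}}

    def append(self, f):
        for table, key in index_keys(f).items():
            self.tables[table][key] = f

    def update(self, old, new):
        # Roughly what an updating writer must do: delete the old
        # feature's keys in every index, then write the new ones.
        for table, key in index_keys(old).items():
            self.tables[table].pop(key, None)
        self.append(new)

    def z3_results(self):
        return list(self.tables["z3"].values())


v1 = Feature("fid0", "2012-01-01T00:00:00Z", 45.0, 45.0, "a")
v2 = Feature("fid0", "2012-01-01T00:00:00Z", 45.0, 45.1, "b")  # latitude moved

appended = ToyStore()
appended.append(v1)
appended.append(v2)
print(len(appended.z3_results()))  # 2: the old z3 key is never overwritten

updated = ToyStore()
updated.append(v1)
updated.update(v1, v2)
print(len(updated.z3_results()))   # 1
```

The records-style key (FID only) collapses to one entry either way; only indices whose key includes a changed value end up with two rows.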
I'm not sure why you would be seeing no change with the date
modification. Perhaps the attribute you changed is not the one being
indexed as the default date? I wrote a quick script to check the z3
row keys generated for a few different inputs. You can see that the
key changes each time:
original
Sat Dec 31 19:00:00 EST 2011, POINT (45 45)
%00;%08;%8f;5`%90;H$%12;%09; fid0
lon +1
Sat Dec 31 19:00:00 EST 2011, POINT (46 45)
%00;%08;%8f;5`%90;Xl6%09;!fid0
lat +1
Sat Dec 31 19:00:00 EST 2011, POINT (45 46)
%00;%08;%8f;5`%91;L%a6;R%09;2fid0
dtg +1h
Sat Dec 31 20:00:00 EST 2011, POINT (45 45)
%00;%08;%8f;5`%92;@$%90;%09;$fid0
dtg +1d
Sun Jan 01 19:00:00 EST 2012, POINT (45 45)
%00;%08;%8f;<D%02;%01;%00;%80;@ fid0
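For anyone who wants to reproduce this kind of check without GeoMesa, the following Python toy model shows why each edit changes the key. It is not GeoMesa's Z3 implementation: the 21-bit width, the week binning, the byte layout and the names `toy_z3_key`, `normalize` and `interleave3` are assumptions for illustration. It interleaves normalized lon, lat and time-within-week into a Z-order (Morton) value and prefixes a week number.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WEEK_SECONDS = 7 * 24 * 3600
BITS = 21  # bits per dimension, so the interleaved value fits in 63 bits


def normalize(value, lo, hi, bits=BITS):
    """Map value in [lo, hi] onto an integer in [0, 2**bits - 1]."""
    max_int = (1 << bits) - 1
    return min(max_int, int((value - lo) / (hi - lo) * max_int))


def interleave3(x, y, t, bits=BITS):
    """Interleave bits as ...t1 y1 x1 t0 y0 x0 (a 3-D Z-order code)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (3 * i)
        z |= ((y >> i) & 1) << (3 * i + 1)
        z |= ((t >> i) & 1) << (3 * i + 2)
    return z


def toy_z3_key(lon, lat, dtg, fid):
    """Hypothetical row key: 2-byte week bin + 8-byte Z value + FID."""
    seconds = (dtg - EPOCH).total_seconds()
    week = int(seconds // WEEK_SECONDS)
    offset = seconds - week * WEEK_SECONDS  # position within the week
    x = normalize(lon, -180.0, 180.0)
    y = normalize(lat, -90.0, 90.0)
    t = normalize(offset, 0.0, float(WEEK_SECONDS))
    z = interleave3(x, y, t)
    return week.to_bytes(2, "big") + z.to_bytes(8, "big") + fid.encode()


base = datetime(2012, 1, 1, tzinfo=timezone.utc)  # Sat Dec 31 19:00 EST 2011
inputs = {
    "original": (45.0, 45.0, base),
    "lon +1": (46.0, 45.0, base),
    "lat +1": (45.0, 46.0, base),
    "dtg +1h": (45.0, 45.0, base + timedelta(hours=1)),
    "dtg +1d": (45.0, 45.0, base + timedelta(days=1)),
}
for label, (lon, lat, dtg) in inputs.items():
    print(label, toy_z3_key(lon, lat, dtg, "fid0").hex())
```

Because the lon, lat and time bits are interleaved, moving any one of them changes the Z value, so an appended feature lands on a new row instead of overwriting the old one. In this sketch all five inputs fall in week 2191 (0x088f), so a date change within the week alters only the Z value; the `%08;%8f;` bytes shared by every key in the listing above look consistent with a similar week bin, though that is an inference from the bytes.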
On 02/10/2017 01:01 PM, Diane Griffith wrote:
We have data that will receive updates. These updates can be to any
of the fields, including the time field as well as the latitude and
longitude fields used to create the default Point geometry.
What we have observed is that if the datetime changed, even by more
than a month, we still got only one record back via WFS, regardless
of how many were still in the z3 table, and the record was updated
in the z2 and records tables in Accumulo.
But if we updated the latitude value, even slightly, and added the
feature again (instead of modifying the feature, or removing it and
then adding it), we got 2 copies of the record with the same id
field that we hint to use for the FID.
I found an old post that touched on this, explaining how a
Versioning Iterator is configured to return 1 record at scan time
and during both minor and major compactions; that part addressed the
Accumulo side of the question. The response then went on to the
GeoMesa side and asked: do the 2 copies of a SimpleFeature have the
same Feature ID? If yes, GeoMesa will write the same Accumulo keys
for the data; if not, different keys will be written. So what does
Feature ID mean there: is it the field hinted to use for the FID, or
the 3 pieces that make up the index key?
So why do differences in the time field not produce duplicates in
WFS results (using the z3 index), while changes in latitude and
longitude do? Is the recommendation, then, that if updates to the
latitude/longitude/point data will happen, we should detect that the
record has changed and either modify it or remove and re-add it?
We are seeing this in GeoMesa 1.2.7.2
against Accumulo 1.6.2.
Thanks,
Diane Griffith
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users