Hi Chris,
Great question. An Accumulo key is a 5-tuple (row id, column
family, column qualifier, timestamp, and column visibility). In
general, if you write two keys that differ only by timestamp,
multiple copies may exist in Accumulo. By default, the
Versioning Iterator is configured to keep one version at scan time
and during both minor and major compactions. Unless that has been
changed, after a major compaction there will again be only one copy
(the most recent) of the data in the system.
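To make the versioning behavior concrete, here is a minimal sketch in plain Python (it does not use the Accumulo API). The Key tuple and the keep_newest function are hypothetical names made up for illustration; they only imitate what the Versioning Iterator does when it is set to keep one version.

```python
from collections import namedtuple

# Hypothetical stand-in for an Accumulo key; field names are illustrative only.
Key = namedtuple("Key", ["row", "family", "qualifier", "visibility", "timestamp"])


def keep_newest(entries, max_versions=1):
    """Keep the newest max_versions values for keys that differ only by timestamp."""
    grouped = {}
    for key, value in entries:
        # Everything except the timestamp identifies the logical cell.
        identity = (key.row, key.family, key.qualifier, key.visibility)
        grouped.setdefault(identity, []).append((key.timestamp, value))

    result = []
    for identity, versions in grouped.items():
        # Newest timestamp first, then truncate to the configured version count.
        versions.sort(key=lambda tv: tv[0], reverse=True)
        for timestamp, value in versions[:max_versions]:
            result.append((Key(*identity, timestamp), value))
    return result


# Two writes of the same cell at different timestamps.
entries = [
    (Key("row1", "cf", "cq", "", 100), "old"),
    (Key("row1", "cf", "cq", "", 200), "new"),
]
compacted = keep_newest(entries)
```

With max_versions left at 1, only the entry with the larger timestamp survives, which mirrors the default described above.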
Ok, that's the background for Accumulo. For GeoMesa, when you write
the same data twice, there is one thing to consider: Do the two
copies of a SimpleFeature have the same Feature ID? If yes, then
GeoMesa will write the same Accumulo keys for the data. If not,
different keys will be written.
From a quick read through of the GeoMesa GDELT ingest, the Global
Event ID is being used as the Feature ID. As such, running the
ingest twice should result in the same number of SimpleFeatures in
the GeoMesa tables. During the ingest, it will appear that the
number of records is increasing, and in a technical sense, there are
additional records being written. Accumulo automatically runs major
compactions, and when that happens, the duplicate entries will be
removed.
The net result is that the new keys and values are going to be
kept, and the old ones will be removed. So, yes, this will look
like an update.
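As a rough illustration of the double ingest, the sketch below simulates it in plain Python. It is a toy model, not GeoMesa or Accumulo code: make_key, ingest, and major_compact are hypothetical helpers, the feature IDs are made up, and the real GeoMesa keys contain more than the Feature ID. The only assumption carried over from above is that the same Feature ID produces the same key apart from the timestamp.

```python
def make_key(feature_id):
    # Hypothetical key layout: the Feature ID alone stands in for the
    # row id, column family, column qualifier, and visibility.
    return ("row-" + feature_id, "cf", "cq", "")


def ingest(table, features, timestamp):
    """Append one entry per feature; nothing is overwritten at write time."""
    for feature_id, attributes in features:
        table.append((make_key(feature_id), timestamp, attributes))


def major_compact(table):
    """Keep only the newest entry for each key, as with one retained version."""
    newest = {}
    for key, timestamp, value in table:
        if key not in newest or timestamp > newest[key][0]:
            newest[key] = (timestamp, value)
    return [(key, ts, value) for key, (ts, value) in newest.items()]


# Made-up features standing in for GDELT events keyed by Global Event ID.
features = [("event-1", "a"), ("event-2", "b")]

table = []
ingest(table, features, timestamp=1)  # first run of the ingest
ingest(table, features, timestamp=2)  # same file ingested again
raw_count = len(table)                # entries physically written so far

table = major_compact(table)          # duplicates collapse to the newest copy
```

Before the compaction step the table holds twice as many entries as there are features; afterwards it holds one entry per feature, each carrying the timestamp of the second ingest.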
Thanks,
Jim
On 08/25/2015 05:03 PM, Chris Snider wrote:
Hi,
Assuming I understand Accumulo correctly,
an “update/insert” to a table with identical data simply
updates the current record and does not add a new one. Is
this accurate?
So, for example, I load the
http://data.gdeltproject.org/events/20150824.export.CSV.zip
file using the GeoMesa GDELT loader. If I run the exact same
file a second time, are all the records duplicated, or,
following my initial understanding, are the existing records
updated?
Chris Snider
Senior Software Engineer
Intelligent Software Solutions, Inc.
Direct (719) 452-7257
