Re: [geomesa-users] Regarding GDELT FULL loads

Jim,

 

Thanks for the information.

 

I’ll continue with the assumption that running the same ingest multiple times will, eventually, give me an accurate record list.

 

Thanks again,

 

Chris Snider

Senior Software Engineer

Intelligent Software Solutions, Inc.


 

From: geomesa-users-bounces@xxxxxxxxxxxxxxxx [mailto:geomesa-users-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Jim Hughes
Sent: Tuesday, August 25, 2015 3:50 PM
To: geomesa-users@xxxxxxxxxxxxxxxx
Subject: Re: [geomesa-users] Regarding GDELT FULL loads

 

Hi Chris,

Great question.  An Accumulo key is a 5-tuple (row id, column family, column qualifier, timestamp, and column visibility).  In general, if you write two keys which differ only by the timestamp, then multiple copies may exist in Accumulo.  By default, the Versioning Iterator is configured to keep 1 version at scan time and at both minor and major compactions.  Unless that has been changed, after a major compaction there will again be only one copy (the most recent) of the data in the system.
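
To make the versioning behavior concrete, here is a minimal Python sketch.  It is a toy model only, not the Accumulo client API: the `Key` tuple and the `versioning_filter` function are hypothetical names made up for illustration, and the filter just imitates what a Versioning Iterator set to keep 1 version would let through.

```python
from collections import namedtuple

# Hypothetical stand-in for an Accumulo key; NOT the real client API.
Key = namedtuple("Key", ["row", "family", "qualifier", "visibility", "timestamp"])


def versioning_filter(entries, max_versions=1):
    """Keep the newest max_versions entries among keys that differ only by timestamp."""
    grouped = {}
    for key, value in entries:
        # key[:4] is everything except the timestamp.
        grouped.setdefault(key[:4], []).append((key, value))
    kept = []
    for versions in grouped.values():
        versions.sort(key=lambda kv: kv[0].timestamp, reverse=True)
        kept.extend(versions[:max_versions])
    return kept


entries = [
    (Key("row1", "cf", "cq", "", 100), "first write"),
    (Key("row1", "cf", "cq", "", 200), "second write"),  # differs only by timestamp
    (Key("row2", "cf", "cq", "", 150), "only write"),
]

visible = versioning_filter(entries)
for key, value in visible:
    print(key.row, key.timestamp, value)
```

Both copies of "row1" are stored until the filter runs; afterwards only the entry with the newer timestamp survives.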

Ok, that's the background for Accumulo.  For GeoMesa, when you write the same data twice, there is one thing to consider:  Do the two copies of a SimpleFeature have the same Feature ID?  If yes, then GeoMesa will write the same Accumulo keys for the data.  If not, different keys will be written. 

From a quick read through of the GeoMesa GDELT ingest, the Global Event ID is being used as the Feature ID.  As such, running the ingest twice should result in the same number of SimpleFeatures in the GeoMesa tables.  During the ingest, it will appear that the number of records is increasing, and in a technical sense, there are additional records being written.  Accumulo automatically runs major compactions, and when that happens, the duplicate entries will be removed.

The net result is that the new keys and values are going to be kept, and the old ones will be removed.  So, yes, this will look like an update. 
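
As a rough illustration of the double ingest, the sketch below simulates it in plain Python.  Everything in it is an assumption made for the example: the `global_event_id` field name, the `ingest` and `major_compact` helpers, and the list used as a table are hypothetical and do not reflect GeoMesa's or Accumulo's actual code.

```python
def ingest(table, records, timestamp):
    """Append one entry per record; the feature ID comes from the event ID."""
    for record in records:
        feature_id = str(record["global_event_id"])  # hypothetical field name
        table.append((feature_id, timestamp, record))


def major_compact(table):
    """Keep only the newest entry for each feature ID."""
    newest = {}
    for feature_id, timestamp, record in table:
        if feature_id not in newest or timestamp > newest[feature_id][1]:
            newest[feature_id] = (feature_id, timestamp, record)
    return list(newest.values())


records = [
    {"global_event_id": 1, "actor": "A"},
    {"global_event_id": 2, "actor": "B"},
]

table = []
ingest(table, records, timestamp=1)  # first run of the file
ingest(table, records, timestamp=2)  # same file, second run
raw_count = len(table)               # entries written before compaction

compacted = major_compact(table)
print(raw_count, len(compacted))
```

The raw entry count grows with the second run, but after the simulated compaction there is one entry per feature ID, each carrying the later timestamp.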

Thanks,

Jim

On 08/25/2015 05:03 PM, Chris Snider wrote:

Hi,

 

Assuming I understand Accumulo correctly, an “update/insert” to a table with identical data simply updates the current record and does not add a new one.  Is this accurate?

 

So, for example, I load the http://data.gdeltproject.org/events/20150824.export.CSV.zip file using the GeoMesa GDELT loader.  If I run the exact same file a second time, are all the records duplicated or, following my initial understanding, are the existing records updated?

 

Chris Snider

Senior Software Engineer

Intelligent Software Solutions, Inc.

Direct (719) 452-7257


 




_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users

 

