Re: [geomesa-users] What are best practices for rollback and possible update?

Ben,

In the first failure case, a container going down, the job can be
safely restarted if you are using deterministic feature IDs.  GeoMesa
will overwrite the existing entries and you will not get duplicate rows.
If you're not using deterministic IDs, this will not work.  You can
manually set the ID on the SimpleFeature by calling

((FeatureIdImpl) feature.getIdentifier()).setID(<YOURID>);
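
As a minimal sketch of what "deterministic" could mean here: derive the
ID purely from fields that uniquely identify a record, so that a re-run
of the same input produces the same ID.  The class and method names
below are hypothetical and not part of GeoMesa; the choice of a
name-based UUID and of which fields form the key are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical helper (not a GeoMesa class): stable feature IDs from a natural key. */
final class DeterministicIds {
    private DeterministicIds() {}

    /**
     * Returns the same ID every time it is given the same key fields, so a
     * restarted ingest writes the same IDs as the failed attempt did.
     */
    static String featureIdFor(String... keyFields) {
        StringBuilder key = new StringBuilder();
        for (String field : keyFields) {
            // Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
            key.append(field.length()).append(':').append(field);
        }
        // Name-based UUID: a pure function of the input bytes, no randomness.
        byte[] bytes = key.toString().getBytes(StandardCharsets.UTF_8);
        return UUID.nameUUIDFromBytes(bytes).toString();
    }
}
```

The returned string would take the place of <YOURID> in the call above.
Depending on the GeoMesa version, the writer may also need the
Hints.USE_PROVIDED_FID user-data hint set on the feature in order to
keep a provided ID; check the documentation for your version.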

Flushing data from a staging table to the source table involves
exporting the staging table and importing it into the source.  See a
guide here:

https://accumulo.apache.org/1.7/examples/export

In order to do proper deletes across all tables, you'll want to use the
GeoMesa command line tools or API.  They will ensure that the index
entries in the various tables get deleted appropriately.
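
To illustrate why a raw delete against a single table is not enough,
here is a toy model in plain Java.  It is only a sketch: the class, its
maps and its methods are hypothetical stand-ins for a record table and
one attribute index table, and it does not reflect how GeoMesa actually
lays out or maintains its tables.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only: one record table plus one attribute index table. */
final class ToyIndexedStore {
    // Stand-in for a record table: feature ID -> attribute value.
    final Map<String, String> records = new HashMap<>();
    // Stand-in for an attribute index table: attribute value -> feature IDs.
    final Map<String, Set<String>> attrIndex = new HashMap<>();

    /** Append-style write: stores the record and adds an index entry. */
    void write(String id, String attr) {
        records.put(id, attr);
        attrIndex.computeIfAbsent(attr, k -> new HashSet<>()).add(id);
    }

    /** Deletes from the record table only, like a row delete on one table. */
    void deleteRecordOnly(String id) {
        records.remove(id);
    }

    /** Deletes the record and its index entry together. */
    void deleteEverywhere(String id) {
        String attr = records.remove(id);
        if (attr == null) {
            return; // nothing stored under this ID
        }
        Set<String> ids = attrIndex.get(attr);
        if (ids != null) {
            ids.remove(id);
            if (ids.isEmpty()) {
                attrIndex.remove(attr);
            }
        }
    }

    /** IDs the index still points at although the record is gone. */
    Set<String> danglingIndexEntries() {
        Set<String> dangling = new HashSet<>();
        for (Set<String> ids : attrIndex.values()) {
            for (String id : ids) {
                if (!records.containsKey(id)) {
                    dangling.add(id);
                }
            }
        }
        return dangling;
    }
}
```

In this model, deleteRecordOnly leaves an index entry pointing at a
record that no longer exists, while deleteEverywhere keeps both maps
consistent.  That second behaviour is the kind of bookkeeping the
GeoMesa tools do for you.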

I think we should develop some of these use cases as first-class tools
in GeoMesa.  They've come up a few times.  Of course, that won't help
you much until you upgrade.  Would you be interested in contributing?

Thanks,
Anthony

Benjamin Weaver <Benjamin.Weaver@xxxxxxxxxxxxxxxxxx> writes:

> Thanks, Anthony, for this useful information.
>
>
> For instance, a failed ingest might be caused by container failure in the MapReduce job, with duplicate rows resulting. Or perhaps a certain set of data was loaded with improper (for us), non-UTC timestamp fields, a risk in GeoMesa 1.2.1, which has not been upgraded to use Joda DateTime. We would want to roll back and re-ingest this lot with the duplicates removed or the timestamps fixed.
>
>
> Thanks for your suggestion: How do we flush data from staging to source table?
>
>
> A basic question for me concerns the 5-table GeoMesa table suite**: how does one delete by range, clone, flush, or export? With Accumulo or GeoMesa command-line commands? When deleting ranges, for example, which GeoMesa table(s) would one delete from? _records? But then how are the index tables and the metadata table updated?
>
>
> **Geomesa table suite:  GeoMesa.tablename,
>
> GeoMesa.tablename_records,
>
> GeoMesa.tablename_st_idx,
>
> GeoMesa.tablename_attr_idx (we use this)
>
> GeoMesa.tablename_<featureName>_z3
>
>
> It would be great simply to be able to rollback by deleting lots of rows through Accumulo row ranges. Is this possible on the GeoMesa table suite?
>
>
> Thanks loads for your help,
>
>
> Ben
>
> ________________________________
> From: geomesa-users-bounces@xxxxxxxxxxxxxxxx <geomesa-users-bounces@xxxxxxxxxxxxxxxx> on behalf of Anthony Fox <anthony.fox@xxxxxxxx>
> Sent: 25 October 2016 14:11
> To: Geomesa User discussions
> Subject: Re: [geomesa-users] What are best practices for rollback and possible update?
>
>
> Ben,
>
> Interesting question.  Can you give more info on what a failed ingest
> looks like?
>
> We've thought a lot about a write-ahead log based approach to ingest for
> performance reasons but I think it could apply in your scenario as well.
> It's basically the idea of the staging table.  Load data into the
> staging table and flush it to the main table.  Depending on how
> up-to-date your queries need to be, you may need to hit both the main
> table and the staging table.
>
> There may be some clever things you can do with Accumulo's facility for
> cloning tables to support this.
>
> http://accumulo.apache.org/1.7/accumulo_user_manual#_cloning_tables
>
> In terms of update, you can use a regular FeatureWriter to update
> records.  It is much slower than the appending FeatureWriter because it
> needs to check the original record and delete any index entries.
>
> Thanks,
> Anthony
>
>
> Benjamin Weaver <Benjamin.Weaver@xxxxxxxxxxxxxxxxxx> writes:
>
>> Hi all,
>>
>>
>> We are using Geomesa 1.2.1 on Accumulo 1.7.2. We are seeking to implement a rollback procedure to use during Ingest. This is our first priority; a second would be the ability to update data already in the database. We lack the space to maintain a backup copy of our large tables. We have one large Accumulo table containing 1 geomesa SimpleFeature.
>>
>>
>> Loading a staging table, then exporting it with the GeoMesa command line and merging the export into our main table, shows some promise as a hedge during ingest. Another approach would be the Accumulo shell, which enables row deletions, but I did not know whether the Accumulo shell could gracefully handle the GeoMesa table suite.
>>
>>
>> But what is the best way to rollback on mid-way failure of ingest? Are examples available? I saw some Scala classes that include calls to trans.rollback(), etc. Are these or other classes to be used for batch, or bulk or total rollback? Or are there other rollback tools and techniques I have overlooked?
>>
>>
>> Any perspectives are appreciated and welcome!
>>
>>
>> Ben Weaver
>>
>> This email (and any attachments) may contain confidential information and is intended solely for the recipient(s) to whom the email is addressed. If you received this email in error, please inform us immediately and delete the email and all attachments without further using, copying or disclosing the information. This email and any attachments are believed to be, but cannot be guaranteed to be, secure or virus-free. Satellite Applications Catapult Limited is registered in England & Wales. Company Number: 7964746. Registered office: Electron Building, Fermi Avenue, Harwell Oxford, Didcot, Oxfordshire OX11 0QR.
>> _______________________________________________
>> geomesa-users mailing list
>> geomesa-users@xxxxxxxxxxxxxxxx
>> To change your delivery options, retrieve your password, or unsubscribe from this list, visit
>> https://www.locationtech.org/mailman/listinfo/geomesa-users

