Re: [geomesa-dev] Ingest performance issues with newest version of geomesa

The MapReduce jobs from the example are giving me about the same ingest speed. According to the Accumulo overview page, I’m averaging around 120 ingests per second. Let me know if your polygons are getting any better performance than this and I’m just doing something wrong.
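
To put the figures quoted in this thread side by side, here is a small sketch of the arithmetic in Java. It assumes the "120 per second" reading counts records, that throughput is spread evenly across nodes, and that the two clusters are comparable, none of which the thread confirms. The class name, method names, and the total record count are hypothetical and chosen only for illustration.

```java
// Rough comparison of the ingest rates mentioned in this thread.
// Assumption: both figures are records per second, spread evenly across nodes.
class IngestRates {

    /** Records per second handled by each node, assuming an even spread. */
    static double perNode(double recordsPerSecond, int nodes) {
        if (nodes <= 0) {
            throw new IllegalArgumentException("nodes must be positive");
        }
        return recordsPerSecond / nodes;
    }

    /** Hours needed to ingest totalRecords at a steady rate. */
    static double hoursToIngest(long totalRecords, double recordsPerSecond) {
        if (recordsPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return totalRecords / recordsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        // Figures from the thread: 120/s on 14 nodes, 500K/s on 21 nodes.
        double reported = perNode(120, 14);
        double benchmark = perNode(500_000, 21);
        System.out.printf("reported:  %.1f records/s per node%n", reported);
        System.out.printf("benchmark: %.1f records/s per node%n", benchmark);

        // Hypothetical data set size, not taken from the thread.
        long hypotheticalTotal = 10_000_000L;
        System.out.printf("hours at 120/s:  %.1f%n", hoursToIngest(hypotheticalTotal, 120));
        System.out.printf("hours at 500K/s: %.4f%n", hoursToIngest(hypotheticalTotal, 500_000));
    }
}
```

The benchmark was measured on point data, so the per-node gap is only a rough indication for area and line features.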

 

From: geomesa-dev-bounces@xxxxxxxxxxxxxxxx [mailto:geomesa-dev-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Anthony Fox
Sent: Wednesday, May 28, 2014 2:29 PM
To: Discussions between GeoMesa committers
Subject: Re: [geomesa-dev] Ingest performance issues with newest version of geomesa

 

Blake,

This mailing list is a fine way to contact us; you can also use the users mailing list (geomesa-users@xxxxxxxxxxxxxxxx). We benchmarked against point data. I'll test an area and line ingest and let you know some numbers. I'd recommend creating MapReduce jobs for your ingest (or a Storm job if the data is streaming). That way you get a lot of parallelism, and because building the index requires no communication between workers, the ingest parallelizes cleanly. Check out the tutorial here:

http://geomesa.github.io/2014/04/17/geomesa-gdelt-analysis/

The code referenced in that tutorial (available on GitHub) demonstrates MapReduce-based ingest. For Storm, check out:

http://geomesa.github.io/2014/05/16/geomesa-osm-analysis/

Let me know if this helps.

Thanks,
Anthony

 

 

 

On Wed, May 28, 2014 at 2:23 PM, Peno, Blake <Blake.Peno@xxxxxxxxxxxxxxx> wrote:

Sorry, forgot to mention our cluster is 14 nodes.

 

From: geomesa-dev-bounces@xxxxxxxxxxxxxxxx [mailto:geomesa-dev-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Peno, Blake
Sent: Wednesday, May 28, 2014 1:21 PM
To: Discussions between GeoMesa committers
Subject: Re: [geomesa-dev] Ingest performance issues with newest version of geomesa

 

I’m using Java to push features as described in the documentation PDF. I’m getting a FeatureSource from the DataStore and using the addFeatures method. 500k/second is about 50k times faster than what I’ve been getting recently. Even before updating to the latest version I wasn’t getting anywhere near that. Ingest seems to be much faster with point data, of course, but most of my data is area and line features.

 

Also, side note: is this the mailing list I should be using? I know I’m not a developer of GeoMesa per se, but I didn’t know how else to contact you guys easily.

 

From: geomesa-dev-bounces@xxxxxxxxxxxxxxxx [mailto:geomesa-dev-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Anthony Fox
Sent: Wednesday, May 28, 2014 11:54 AM
To: Discussions between GeoMesa committers
Subject: Re: [geomesa-dev] Ingest performance issues with newest version of geomesa

 

Blake,

We recently switched from a text based encoding to an Avro binary encoding.  This should have actually sped up your ingest significantly - it performed very well in tests we ran during development of the binary encoding.  As a point of reference, we have been able to ingest (on a 21 node cluster) about 500K records per second using a map/reduce job.  Can you give a bit more detail about how you are performing your ingest?

Thanks,
Anthony

 

On Wed, May 28, 2014 at 12:48 PM, Peno, Blake <Blake.Peno@xxxxxxxxxxxxxxx> wrote:

Hi all,

 

I recently upgraded to the newest version of GeoMesa on GitHub, and I’ve noticed that performance has dropped drastically when pushing features to GeoMesa. At this rate it’s going to take about a week to get all of my data uploaded. Has something changed that would cause this, or am I missing something simple?


_______________________________________________
geomesa-dev mailing list
geomesa-dev@xxxxxxxxxxxxxxxx
http://locationtech.org/mailman/listinfo/geomesa-dev

 

