When I use a MapReduce job, I use a split for each layer of my dataset, which comes to roughly 340 splits. I’m getting ingest speeds of around 75/s with this MapReduce
job, but it’s actually much faster if I just push the features one at a time without any of the MapReduce machinery, so I have to assume I’m doing something incorrectly, but I’m not really sure what. You guys will have to forgive me, as I’m not very well
versed in Hadoop in general, so working with GeoMesa is a bit of a learning experience for me.
If you could send me some numbers on how fast you can ingest polygons, I can confirm that the problem is on my end and just keep learning and fixing things
over here. I just want to make sure it’s only my setup seeing these speed issues.
Blake
From: geomesa-dev-bounces@xxxxxxxxxxxxxxxx [mailto:geomesa-dev-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Anthony Fox
Sent: Thursday, May 29, 2014 7:48 AM
To: Discussions between GeoMesa committers
Subject: Re: [geomesa-dev] Ingest performance issues with newest version of geomesa
Blake, we're running some tests against polygons and will let you know the result. Can you tell me how many map tasks were instantiated by your MapReduce job?
Thanks,
Anthony
On Wed, May 28, 2014 at 5:31 PM, Peno, Blake <Blake.Peno@xxxxxxxxxxxxxxx> wrote:
The MapReduce job from the example is getting me about the same ingest speed. I’m averaging (according to the Accumulo overview page) around 120 ingests per second. Let me know if your polygons are getting noticeably better performance than this, in which case I’m just doing something wrong.
From: geomesa-dev-bounces@xxxxxxxxxxxxxxxx [mailto:geomesa-dev-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Anthony Fox
Sent: Wednesday, May 28, 2014 2:29 PM
To: Discussions between GeoMesa committers
Subject: Re: [geomesa-dev] Ingest performance issues with newest version of geomesa
Blake,
This is a good mailing list for contacting us - you can also use the users mailing list (geomesa-users@xxxxxxxxxxxxxxxx). We benchmarked against point data - I'll test an ingest of area and line features
and let you know some numbers. I'd recommend creating MapReduce jobs for your ingest (or a Storm topology if the data is streaming). That way you get lots of parallelism, and since writing to the index requires no coordination between tasks, it parallelizes well. Check out the tutorial here:
http://geomesa.github.io/2014/04/17/geomesa-gdelt-analysis/
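For the mapper itself, the shape is roughly the sketch below - this is untested, and the connection parameter names, the feature type name ("areas"), and the one-WKT-polygon-per-line input format are assumptions rather than a drop-in implementation:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.geotools.data.DataStore;
import org.geotools.data.DataStoreFinder;
import org.geotools.data.FeatureWriter;
import org.geotools.data.Transaction;
import org.opengis.feature.simple.SimpleFeature;
import org.opengis.feature.simple.SimpleFeatureType;

import com.vividsolutions.jts.io.ParseException;
import com.vividsolutions.jts.io.WKTReader;

// Each map task opens its own DataStore and appends features for the
// records in its split; the splits run in parallel across the cluster.
public class IngestMapper extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

    private FeatureWriter<SimpleFeatureType, SimpleFeature> writer;
    private final WKTReader wkt = new WKTReader();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Placeholder configuration keys - pass the real connection info from the driver.
        Map<String, Object> params = new HashMap<>();
        params.put("instanceId", context.getConfiguration().get("accumulo.instance"));
        params.put("zookeepers", context.getConfiguration().get("accumulo.zookeepers"));
        params.put("user",       context.getConfiguration().get("accumulo.user"));
        params.put("password",   context.getConfiguration().get("accumulo.password"));
        params.put("tableName",  context.getConfiguration().get("geomesa.catalog"));
        DataStore ds = DataStoreFinder.getDataStore(params);
        // The schema is assumed to have been created once by the driver, not per task.
        writer = ds.getFeatureWriterAppend("areas", Transaction.AUTO_COMMIT);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            // One WKT polygon per input line (assumption about the input format).
            SimpleFeature f = writer.next();
            f.setDefaultGeometry(wkt.read(value.toString()));
            writer.write();
        } catch (ParseException e) {
            context.getCounter("ingest", "bad-records").increment(1);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        if (writer != null) {
            writer.close();
        }
    }
}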
Let me know if this helps.
On Wed, May 28, 2014 at 2:23 PM, Peno, Blake <Blake.Peno@xxxxxxxxxxxxxxx> wrote:
Sorry, forgot to mention our cluster is 14 nodes.
I’m using Java to push features as described in the documentation PDF: I get a FeatureSource
from the DataStore and use the addFeatures method. 500K/second is orders of magnitude faster than what I’ve been getting recently. Even before updating to the latest version I wasn’t getting anywhere near that. It does seem to be much faster with point
data, of course, but most of my data is area and line features.
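In case it helps, the ingest path is essentially the following - a simplified sketch rather than my actual code, with placeholder connection parameters, a made-up schema, and a dummy geometry:

import java.util.HashMap;
import java.util.Map;

import org.geotools.data.DataStore;
import org.geotools.data.DataStoreFinder;
import org.geotools.data.DataUtilities;
import org.geotools.data.simple.SimpleFeatureStore;
import org.geotools.feature.DefaultFeatureCollection;
import org.geotools.feature.simple.SimpleFeatureBuilder;
import org.opengis.feature.simple.SimpleFeature;
import org.opengis.feature.simple.SimpleFeatureType;

import com.vividsolutions.jts.io.WKTReader;

public class SimpleIngest {
    public static void main(String[] args) throws Exception {
        // Placeholder connection parameters - swap in the real values.
        Map<String, Object> params = new HashMap<>();
        params.put("instanceId", "myInstance");
        params.put("zookeepers", "zoo1,zoo2,zoo3");
        params.put("user", "user");
        params.put("password", "secret");
        params.put("tableName", "geomesa_catalog");
        DataStore ds = DataStoreFinder.getDataStore(params);

        // Hypothetical schema with a polygon geometry attribute; createSchema
        // only needs to run once, not on every ingest run.
        SimpleFeatureType sft =
            DataUtilities.createType("areas", "geom:Polygon:srid=4326,name:String");
        ds.createSchema(sft);

        // Build up a batch of features and push the whole collection at once
        // instead of calling addFeatures once per feature.
        DefaultFeatureCollection batch = new DefaultFeatureCollection("batch", sft);
        SimpleFeatureBuilder builder = new SimpleFeatureBuilder(sft);
        builder.set("geom", new WKTReader().read("POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))"));
        builder.set("name", "example");
        SimpleFeature feature = builder.buildFeature("fid-1");
        batch.add(feature);

        SimpleFeatureStore store = (SimpleFeatureStore) ds.getFeatureSource("areas");
        store.addFeatures(batch);
        ds.dispose();
    }
}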
Also, as a side note, is this the mailing list I should be using? I know I’m not a GeoMesa developer
per se, but I didn’t know how else to contact you guys easily.
From: geomesa-dev-bounces@xxxxxxxxxxxxxxxx [mailto:geomesa-dev-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Anthony Fox
Sent: Wednesday, May 28, 2014 11:54 AM
To: Discussions between GeoMesa committers
Subject: Re: [geomesa-dev] Ingest performance issues with newest version of geomesa
Blake,
We recently switched from a text-based encoding to an Avro binary encoding. This should actually have sped up your ingest significantly - it performed very well in the tests we ran while developing the binary encoding. As a point of reference, we have been
able to ingest about 500K records per second (on a 21-node cluster) using a MapReduce job. Can you give a bit more detail about how you are performing your ingest?
Thanks,
Anthony
On Wed, May 28, 2014 at 12:48 PM, Peno, Blake <Blake.Peno@xxxxxxxxxxxxxxx> wrote:
Hi all,
I recently upgraded to the newest version of GeoMesa on GitHub, and I’ve noticed that performance has dropped drastically when pushing features to GeoMesa. At this rate
it’s going to take about a week to get all of my data uploaded. Has something changed that would cause this, or am I missing something simple?
_______________________________________________
geomesa-dev mailing list
geomesa-dev@xxxxxxxxxxxxxxxx
http://locationtech.org/mailman/listinfo/geomesa-dev