Re: [geomesa-users] Ingesting Avro files into GeoMesa using Hadoop on Google Dataproc

Hello,

On Tue, Feb 21, 2017 at 4:46 PM, Damiano Albani <damiano.albani@xxxxxxxxx> wrote:
Now the remaining issue is that I don't understand the overall behavior of the MapReduce job on Google Dataproc: only 1 worker node (out of 2, in my case) gets tasks (albeit, correctly, 1 task per vCPU) and, even more surprisingly, I don't see any performance boost in Bigtable write throughput.

For the record, switching to the preview version of the Dataproc image somehow fixed my issue.
MapReduce ingest jobs are now split across all the nodes, and they run so fast that I think Bigtable has become the bottleneck.
At least when starting from an empty Bigtable instance, that is.

This comment on Stack Overflow made me think that it could be preferable to pre-split the Bigtable table before ingesting the data.
(Bigtable will eventually rebalance those splits on its own, if I understand correctly.)
Given that I use UUID strings as feature identifiers, I suppose I could use split prefixes going from "0" to "f", as in the sketch below?
Anyway, I'll report back if that improves performance.
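Just to make the idea concrete, here is a minimal sketch of what such a pre-split could look like through the HBase-compatible Admin API (which, as far as I know, is also what the GeoMesa Bigtable data store goes through). The table name "my_features" and column family "d" are made-up placeholders, and the connection is built from a plain HBase configuration; with the Bigtable HBase client you would instead construct it from your project and instance:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
    public static void main(String[] args) throws Exception {
        // 15 split points ("1" through "f") yield 16 initial ranges,
        // one per leading hex digit of the UUID feature identifiers.
        byte[][] splits = new byte[15][];
        for (int i = 1; i < 16; i++) {
            splits[i - 1] = Bytes.toBytes(Integer.toHexString(i));
        }

        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // "my_features" and "d" are placeholder names for illustration only
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("my_features"));
            desc.addFamily(new HColumnDescriptor("d"));
            admin.createTable(desc, splits);
        }
    }
}

Note that 15 split points are enough to cover all 16 "0" to "f" prefixes, since the range below the first split point is implicit. And as Bigtable rebalances tablets on its own over time, the splits only need to be roughly even to help the initial bulk load.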

Regards,

--
Damiano Albani
Geodan
