Re: [geomesa-users] GeoMesa range query performance


From: Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx>
Date: Mon, 20 May 2019 09:48:49 -0400

Are you using the GeoTools API programmatically, then? Many things can affect query performance. A few things I would look at:

* Check whether your data is distributed across the cluster. By default, GeoMesa will create 4 splits on ingestion. If your data doesn't reach the split threshold, then you will only be querying 4 regions on at most 4 servers.
* Check that your client can handle the number of threads being used. GeoMesa spawns multiple client threads per query (based on the data store configuration), so by default you'd be running 8 threads per query.
* Try to determine the bottleneck - you may be saturating your network, or your client may not be reading results as fast as they are being delivered.
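
One rough way to look for the bottleneck is to time the first result separately from the full read. The sketch below is a minimal, library-agnostic illustration and not GeoMesa code: `profile_query` and `fake_query` are hypothetical names, and `fake_query` only stands in for whatever iterator your real query returns. A long wait before the first result suggests server-side or planning latency, while a long gap between first and last result suggests the network or the client read loop.

```python
import time
from typing import Callable, Iterable, Tuple


def profile_query(run_query: Callable[[], Iterable]) -> Tuple[float, float, int]:
    """Return (seconds until first result, total seconds, result count)."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in run_query():
        if first is None:
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    # With no results, report the total time as the first-result latency.
    return (total if first is None else first, total, count)


def fake_query():
    # Hypothetical stand-in for a real feature reader:
    # a short startup delay, then 1000 results.
    time.sleep(0.01)
    for i in range(1000):
        yield i


first, total, count = profile_query(fake_query)
print(f"first result after {first:.3f}s, all {count} results after {total:.3f}s")
```

Running the same measurement with one query at a time and then with many concurrent queries should show whether the client threads themselves are the limit.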

I'm not familiar with how SpatialHadoop works, so those things may or may not be affecting it as well.

At any rate, I don't think anyone has compared the two before. I'd be interested to see some more detailed results (code samples, timings, etc.), if you're willing to share them.

Thanks,

Emilio

On 5/20/19 9:10 AM, Tin Vu wrote:

I used concurrent threads, one thread per query.
On Mon, May 20, 2019, 6:00 AM Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
Hello,

How are you submitting queries to GeoMesa?

Thanks,

Emilio

On 5/19/19 3:25 PM, Tin Vu wrote:
Hi Emilio,

Thanks for your response. I executed my experiments as follows:

1. Cluster: 1 master node, 12 slave nodes, 64 GB memory in each node.

2. Dataset: OpenStreetMap All Nodes (96 GB, 2.7 billion records).

3. Queries: I created 10 batches of queries of different sizes (for example, query area / whole space area = 10^-12, 10^-11, ..., 10^-2). Each batch contains 100 square queries of the same size. The queries are randomly distributed over the whole space.

4. I submit those batches of queries to SpatialHadoop and GeoMesa, wait until they finish, then record the running time.
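
The query workload in step 3 could be generated along these lines. This is only a sketch under stated assumptions, not the code used in the experiment: it assumes the "whole space" is the full longitude/latitude extent in degrees, that areas are measured in square degrees, and that the geometry attribute is named `geom`. The helper names are hypothetical. The output strings use the ECQL `BBOX` predicate, which can be parsed by GeoTools.

```python
import math
import random

# Assumption: the whole space is the full lon/lat extent, in degrees.
WORLD = (-180.0, -90.0, 180.0, 90.0)


def random_square(ratio, rng, world=WORLD):
    """Random square whose area is `ratio` times the area of `world`."""
    min_x, min_y, max_x, max_y = world
    area = (max_x - min_x) * (max_y - min_y)
    side = math.sqrt(ratio * area)
    # Pick the lower-left corner so the square stays inside the world.
    x = rng.uniform(min_x, max_x - side)
    y = rng.uniform(min_y, max_y - side)
    return (x, y, x + side, y + side)


def to_ecql(box, geom="geom"):
    """Format a box as an ECQL BBOX filter; `geom` is an assumed attribute name."""
    return "BBOX({}, {}, {}, {}, {})".format(geom, *box)


rng = random.Random(42)  # fixed seed so both systems get identical queries
# Exponents -12 through -2 inclusive; adjust the range to match your batches.
batches = {
    exp: [random_square(10.0 ** exp, rng) for _ in range(100)]
    for exp in range(-12, -1)
}

print(to_ecql(batches[-2][0]))
```

Fixing the seed keeps the comparison fair, since both systems then answer exactly the same boxes.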

Thanks,

Tin
On Thu, May 16, 2019 at 2:16 PM Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
Hello,

Could you say more about how you're querying? SpatialHadoop uses map/reduce jobs, if I understand - it seems like there would be a lot of overhead to spin up the job. How long are your queries taking? How big is your cluster?

Thanks,

Emilio

On 5/16/19 3:20 PM, Tin Vu wrote:
Hi all,

I just wanted to ask you a question about the performance of GeoMesa range queries. This is my experimental setup:

1. Systems: GeoMesa on Accumulo, SpatialHadoop (http://spatialhadoop.cs.umn.edu/)

2. Dataset: All Nodes dataset from http://spatialhadoop.cs.umn.edu/datasets.html, 96 GB and 2.7 billion points.

3. Query: range queries with different selectivities: 10^-12, 10^-11, 10^-10, where selectivity is the ratio of the query range area to the total area of the dataset space.
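
As a rough sanity check on these selectivities, the expected number of matching points per query can be estimated by multiplying the selectivity by the dataset size. The sketch below assumes points are uniformly distributed over the space, which real OpenStreetMap data is not, so treat the numbers as order-of-magnitude only. `expected_hits` is a hypothetical helper.

```python
TOTAL_POINTS = 2.7e9  # dataset size quoted above


def expected_hits(selectivity, total_points=TOTAL_POINTS):
    """Expected matches per query, assuming a uniform point distribution."""
    return selectivity * total_points


for exp in (-12, -11, -10):
    hits = expected_hits(10.0 ** exp)
    print(f"selectivity 1e{exp}: about {hits:.4f} points expected per query")
```

Under that assumption every one of these selectivities yields well under one point per query on average, so the measured times may mostly reflect fixed per-query overhead rather than the amount of data scanned.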

I found that GeoMesa does not perform better than SpatialHadoop, which I did not expect: I think GeoMesa (which organizes data at the record level) should outperform SpatialHadoop (which organizes data at the block level) for highly selective queries. Could you suggest how to tune GeoMesa for better performance?

Thanks,

Tin
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users
