Are you using the GeoTools API programmatically then? There are a
lot of things that can affect the query performance, a few things I
would look at:
* Check if your data is distributed across the cluster. By default,
GeoMesa will create 4 splits on ingestion. If your data doesn't
reach the split threshold, then you will only be querying 4 regions
on at most 4 servers.
* Check that the client can handle the number of threads being used.
GeoMesa spawns multiple client threads per query (based on the data
store configuration), so by default you'd be running 8 threads per
query.
* Try to determine the bottleneck - you may be saturating your
network, or your client may not be reading results as fast as they
are being delivered.
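One rough way to look for a client-side bottleneck is to run the same batch with different numbers of client threads and compare wall-clock time and per-query latency. The sketch below is only an illustration: `time_batch` and `fake_query` are hypothetical names, and `fake_query` is a stand-in that sleeps instead of calling GeoMesa, so replace it with your own query call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_batch(run_query, queries, workers):
    """Run all queries on a pool of `workers` threads.

    Returns (wall_seconds, per_query_seconds).
    """
    def timed(query):
        start = time.perf_counter()
        run_query(query)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed, queries))
    return time.perf_counter() - wall_start, latencies


def fake_query(query):
    # Hypothetical stand-in for a real query; it only sleeps.
    time.sleep(0.01)


if __name__ == "__main__":
    queries = list(range(20))
    for workers in (1, 4, 20):
        wall, lat = time_batch(fake_query, queries, workers)
        mean_ms = 1000 * sum(lat) / len(lat)
        print(f"{workers:2d} workers: wall={wall:.2f}s, mean latency={mean_ms:.1f}ms")
```

If wall time stops improving (or per-query latency climbs) as you add workers, that would suggest the client or the network is the limit rather than the cluster, though it is not proof on its own.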
I'm not familiar with how SpatialHadoop works, so those things may
or may not be affecting it as well.
At any rate, I don't think anyone has compared the two before. I'd
be interested to see some more detailed results (code samples,
timings, etc), if you'd share them.
Thanks,
Emilio
On 5/20/19 9:10 AM, Tin Vu wrote:
I used concurrent threads, one thread per query.
Hello,
How are you submitting queries to GeoMesa?
Thanks,
Emilio
On 5/19/19 3:25 PM, Tin Vu wrote:
Hi Emilio,
Thanks for your response. I executed my
experiments as follows:
1. Cluster: 1 master node, 12 slave nodes, 64 GB
memory in each node.
2. Dataset: OpenStreetMap All Nodes (size 96
GB, 2.7 billion records).
3. Queries: I created 10 batches of queries of
different sizes (for example, query area / whole
space area = 10^-12, 10^-11, ..., 10^-2). Each batch
contains 100 square queries of the same size. The
queries are randomly distributed over the whole space.
4. I submit those batches of queries to
SpatialHadoop and GeoMesa, wait until they finish,
then measure the running time.
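For anyone who wants to reproduce a workload like this, here is a minimal sketch of the query generation described in step 3. It is not the code used in the experiment: the lon/lat extent, the seed, the function names, and the `geom` attribute name are all assumptions. The listed range 10^-12 to 10^-2 covers eleven exponents while the text says 10 batches, so the exponents are left as a parameter.

```python
import math
import random

# Assumed dataset extent in lon/lat degrees: (xmin, ymin, xmax, ymax).
WORLD = (-180.0, -90.0, 180.0, 90.0)


def square_side(selectivity, extent=WORLD):
    """Side length of a square whose area is `selectivity` times the extent area."""
    xmin, ymin, xmax, ymax = extent
    return math.sqrt(selectivity * (xmax - xmin) * (ymax - ymin))


def random_square(selectivity, rng, extent=WORLD):
    """A square query box placed uniformly at random inside the extent."""
    xmin, ymin, xmax, ymax = extent
    side = square_side(selectivity, extent)
    x = rng.uniform(xmin, xmax - side)
    y = rng.uniform(ymin, ymax - side)
    return (x, y, x + side, y + side)


def make_batches(exponents=range(-12, -1), per_batch=100, seed=42):
    """Map each exponent e to `per_batch` boxes with selectivity 10^e."""
    rng = random.Random(seed)
    return {e: [random_square(10.0 ** e, rng) for _ in range(per_batch)]
            for e in exponents}


def to_ecql(box, attr="geom"):
    """Format a box as an ECQL BBOX filter; `attr` is a hypothetical attribute name."""
    return "BBOX({}, {!r}, {!r}, {!r}, {!r})".format(attr, *box)
```

With the assumed 360 x 180 degree extent, a selectivity of 10^-2 corresponds to a square about 25.5 degrees on a side, and 10^-12 to one about 0.00025 degrees on a side.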
Thanks,
Tin
Hello,
Could you say more about how you're querying?
SpatialHadoop uses map/reduce jobs, if I understand
correctly, so it seems like there would be a lot of
overhead to spin up the job. How long are your queries taking?
How big is your cluster?
Thanks,
Emilio
On 5/16/19 3:20 PM, Tin Vu wrote:
Hi all,
I just wanted to ask you a question
about the performance of GeoMesa range queries.
This is my experimental set up:
3. Query: range queries with different
selectivities: 10^-12, 10^-11, 10^-10, where
selectivity is the ratio of the query range to the
total area of the dataset space.
I saw that GeoMesa does not perform better
than SpatialHadoop, which is not what I expected.
I think that GeoMesa (which organizes data at the
record level) should be better than
SpatialHadoop (which organizes data at the block
level) for highly selective queries. Could you
suggest how to tune GeoMesa so that it
provides better performance?
Thanks,
Tin
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users