Hi GeoMesa Users,
we are using GeoMesa with an S3 file system datastore and are experiencing extremely slow response times when we access our data, even with a “moderate” number of files stored in it (say, 10,000).
Our setup:
* GeoMesa 2.3.0
* Filesystem datastore pointing to an S3 URL
** encoding: orc
** partition scheme: daily,xz2-8bits
** leaf-storage: true
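For context, from Java we open the store roughly like this (a minimal sketch; the "fs.path"/"fs.encoding" parameter keys are written from memory and the bucket path is made up):

    import java.util.HashMap;
    import java.util.Map;
    import org.geotools.data.DataStore;
    import org.geotools.data.DataStoreFinder;

    // Open the filesystem datastore on S3 (path is illustrative)
    Map<String, String> params = new HashMap<>();
    params.put("fs.path", "s3a://<our-bucket>/geomesa/");
    params.put("fs.encoding", "orc");
    DataStore store = DataStoreFinder.getDataStore(params);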
We’re accessing that data store using different “clients”:
* a Java microservice which uses the GeoTools GeoMesa API and is running in the same AWS region as the S3 bucket
* GeoServer (2.14) running in the same AWS region as the S3 bucket
* the geomesa-fs CLI running in the same AWS region as the S3 bucket
All of them are really slow: it takes minutes, sometimes hours, until we get a response. While debugging our microservice we found that even an operation like org.geotools.data.DataStore.getTypeNames() takes very long, because all of the metadata files seem to be scanned (which does not seem necessary, since reading the per-feature top-level storage.json files should be sufficient). Is that “works as designed”, or might it be a bug in the GeoMesa FSDS implementation?
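For reference, this is essentially what we measured (a minimal timing sketch, where store is the datastore obtained in the snippet above):

    // Time the metadata scan triggered by getTypeNames()
    long start = System.nanoTime();
    String[] typeNames = store.getTypeNames();
    long millis = (System.nanoTime() - start) / 1_000_000L;
    System.out.println("getTypeNames() returned " + typeNames.length
        + " types in " + millis + " ms");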
Is there anything (besides switching to a different data store) we can do to improve the performance?
We’re doing a “geomesa-fs compact …” from time to time, which gives us fairly acceptable performance afterwards (presumably because it merges the many small files per partition into fewer, larger ones), but the compaction itself also takes hours, sometimes even days, to complete.
Thanks,
Christian
Kind regards
Christian Sickert
Crowd Data & Analytics for Automated Driving
Daimler AG - Mercedes-Benz Cars Development - RD/AFC
+49 176 309 71612
christian.sickert@xxxxxxxxxxx