
Re: [geomesa-users] GeoMesa FSDS on S3 - very slow response times

Hello,

The FSDS is going to work best when you only have to query a few large files. The metadata will be cached, so if you keep a data store around (e.g. in GeoServer), it shouldn't be doing repeated reads of the metadata files. That leads me to believe that you are seeing slowness from scanning a large number of files, where the overhead of opening each file dominates the query time.
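
For example, a minimal sketch of holding a single data store for the lifetime of a service (the "fs.path" and "fs.encoding" parameter keys are taken from the FSDS documentation, and the bucket path is hypothetical; verify both against your GeoMesa version):

import java.io.IOException;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

import org.geotools.data.DataStore;
import org.geotools.data.DataStoreFinder;

public class FsdsHolder {

    private static DataStore store;

    // Create the FSDS once and reuse it, so the partition metadata is read and
    // cached a single time instead of on every request.
    public static synchronized DataStore get() throws IOException {
        if (store == null) {
            Map<String, Serializable> params = new HashMap<>();
            params.put("fs.path", "s3a://my-bucket/geomesa/"); // hypothetical bucket path
            params.put("fs.encoding", "orc");
            store = DataStoreFinder.getDataStore(params);
        }
        return store;
    }
}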

A few suggestions:

You're creating a lot of partitions - up to 256 per day. How much data ends up in a typical partition with your current setup? I would suggest trying with 2 or 4 bits of precision in your partition scheme.
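
As a rough sketch of what that could look like when the schema is created (this assumes the ConfigurationUtils helper from geomesa-fs-storage-common and mirrors the scheme naming in your current config; please verify both against your GeoMesa version):

import java.util.Collections;

import org.locationtech.geomesa.fs.storage.common.interop.ConfigurationUtils;
import org.opengis.feature.simple.SimpleFeatureType;

public class CoarserPartitions {

    // Configure roughly 4 spatial partitions per day (2 bits) instead of 256 (8 bits).
    // The helper class and the scheme name are assumptions based on the FSDS docs;
    // double-check them for your GeoMesa version before relying on this.
    public static void configure(SimpleFeatureType sft) {
        ConfigurationUtils.setScheme(sft, "daily,xz2-2bits", Collections.emptyMap());
        ConfigurationUtils.setEncoding(sft, "orc");
        ConfigurationUtils.setLeafStorage(sft, true);
    }
}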

How are you ingesting data? You should try to avoid creating lots of small data files, as that requires a lot of overhead to scan.
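
One way to keep files large is to push each batch through a single appending writer rather than opening a writer per feature. A minimal sketch using the plain GeoTools API (the batching strategy itself is an assumption about your ingest path):

import java.io.IOException;
import java.util.List;

import org.geotools.data.DataStore;
import org.geotools.data.FeatureWriter;
import org.geotools.data.Transaction;
import org.opengis.feature.simple.SimpleFeature;
import org.opengis.feature.simple.SimpleFeatureType;

public class BatchIngest {

    // Write a whole batch through one appending writer so the FSDS can roll the
    // features into a few larger data files instead of many tiny ones.
    public static void write(DataStore store, String typeName, List<SimpleFeature> batch) throws IOException {
        FeatureWriter<SimpleFeatureType, SimpleFeature> writer =
            store.getFeatureWriterAppend(typeName, Transaction.AUTO_COMMIT);
        try {
            for (SimpleFeature feature : batch) {
                SimpleFeature toWrite = writer.next();
                toWrite.setAttributes(feature.getAttributes());
                writer.write();
            }
        } finally {
            writer.close();
        }
    }
}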

If you aren't already, make sure that you compact by partition. Assuming your data is coming in semi-live, there won't be any writes going to older partitions. Compacting them again will not improve performance, but may generate considerable work.

Finally, you may want to switch to JDBC for metadata persistence, which should alleviate most of the issues around metadata operations: https://www.geomesa.org/documentation/user/filesystem/metadata.html#relational-database-persistence

Re: getTypeNames, that could probably be improved, although the metadata is read once and then cached, so you will likely pay that penalty the first time you access each feature type anyway. I've opened a ticket to track the issue here: https://geomesa.atlassian.net/browse/GEOMESA-2678
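
If that first-access cost matters, one option is to warm the cache at application startup; a small sketch using only the standard GeoTools API:

import java.io.IOException;

import org.geotools.data.DataStore;

public class MetadataWarmup {

    // Pay the one-time metadata read at startup rather than on the first live
    // request: getTypeNames() and getSchema() populate the cached metadata.
    public static void warm(DataStore store) throws IOException {
        for (String typeName : store.getTypeNames()) {
            store.getSchema(typeName);
        }
    }
}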

Thanks,

Emilio


On 7/30/19 11:23 AM, christian.sickert@xxxxxxxxxxx wrote:

Hi GeoMesa Users,

 

we are using GeoMesa with an S3 file system datastore and are experiencing extremely slow response times when we access our data - even with a “moderate” number of files stored in it (let’s say 10,000).

 

Our setup:

* GeoMesa 2.3.0

* Filesystem datastore pointing to an S3 URL

** encoding: orc

** partition scheme: daily,xz2-8bits

** leaf-storage: true

 

We’re accessing that data store using different “clients”:

* a Java microservice which uses the GeoTools GeoMesa API and is running in the same AWS region as the S3 bucket

* GeoServer (2.14) running in the same AWS region as the S3 bucket

* geomesa-fs CLI running in the same AWS region as the S3 bucket

 

All of them are really slow (it takes minutes, sometimes hours, until we get a response). While debugging our microservice, we found that even operations like org.geotools.data.DataStore.getTypeNames() take a really long time, because all of the metadata files seem to be scanned (which does not seem necessary, since reading the per-feature top-level storage.json files should be sufficient). Is that “works as designed”, or might it be a bug in the GeoMesa FSDS implementation?

 

Is there anything (besides switching the actual data store) we can do to improve the performance?

 

We’re doing a “geomesa-fs compact …” from time to time, which gives us fairly acceptable performance (but also takes hours, sometimes even days, to complete).

 

Thanks,

Christian

 

 

 

Mit freundlichen Grüßen / Kind regards

Christian Sickert

Crowd Data & Analytics for Automated Driving
Daimler AG - Mercedes-Benz Cars Development - RD/AFC

+49 176 309 71612
christian.sickert@xxxxxxxxxxx

 


If you are not the addressee, please inform us immediately that you have received this e-mail by mistake, and delete it. We thank you for your support.


_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users

