[geomesa-users] Re: Re: -EXT- problem with query data by spark

Hello,
In the scenario I described, the data should already exist on the server, as it can be read through DataStore.read().
I will continue to observe and test, and will contact you again if there are any new findings.

Thanks,
Mike

---- Original message ----
From: Lahr-Vivaz, Emilio via geomesa-users <geomesa-users@xxxxxxxxxxx>
Date: January 3, 2025 22:43
To: Geomesa project user mailing list <geomesa-users@xxxxxxxxxxx>
Cc: Lahr-Vivaz, Emilio <emilio.lahr-vivaz@xxxxxxxxxxx>
Subject: Re: [geomesa-users] Re: -EXT- problem with query data by spark
Hmm, GeoMesa uses the HBase MultiTableInputFormat for reading data from Spark. I don't see any indication that it wouldn't read entries in the memstore, although it's possible. Are you sure that the data has been flushed by your writer process to the region server? HBase writers will cache data locally until they hit a threshold (size or time), so the data might not have actually been written yet in your testing.

Thanks,

Emilio Lahr-Vivaz
General Atomics, CCRi

From: geomesa-users <geomesa-users-bounces@xxxxxxxxxxx> on behalf of zhou lihuang via geomesa-users <geomesa-users@xxxxxxxxxxx>
Sent: Thursday, January 2, 2025 9:00 PM
To: Geomesa project user mailing list <geomesa-users@xxxxxxxxxxx>
Cc: zhou lihuang <zlh_0923@xxxxxxxxxxx>
Subject: [geomesa-users] Re: -EXT- problem with query data by spark
 

WARNING:  This message is from an external source.  Evaluate the message carefully BEFORE clicking on links or opening attachments.

First of all, thank you for your reply.

I've found the possible reason after multiple tests:
When HBase stores only a small amount of data, the data exists only in the MemStore and hasn't been flushed to the StoreFile yet. In this case, when using Spark to query through spatialRDDProvider.rdd, no data will be obtained. However, once the data has been flushed to the StoreFile, the query results will be normal.

I'm not sure whether this is a bug in GeoMesa or not.

Best Regards,
Mike


From: geomesa-users <geomesa-users-bounces@xxxxxxxxxxx> on behalf of Lahr-Vivaz, Emilio via geomesa-users <geomesa-users@xxxxxxxxxxx>
Sent: January 2, 2025 21:33
To: Geomesa project user mailing list <geomesa-users@xxxxxxxxxxx>
Cc: Lahr-Vivaz, Emilio <Emilio.Lahr-Vivaz@xxxxxxxxxxx>
Subject: Re: [geomesa-users] -EXT- problem with query data by spark
 
I think ".count" will short-circuit by default. Can you try ".show" instead and see if that works? If you really just want a count, try setting the system property "geomesa.force.count" to true: https://www.geomesa.org/documentation/stable/user/datastores/runtime_config.html#geomesa-force-count 
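
For illustration, a minimal sketch of setting that property programmatically, using only the standard Java System API. The class and method names here are hypothetical; only the property name "geomesa.force.count" comes from the documentation linked above. It assumes the property is set in the same JVM that evaluates the count, before the query runs.

```java
public class ForceCountConfig {
    // Property name taken from the GeoMesa runtime configuration docs linked above.
    static final String FORCE_COUNT_KEY = "geomesa.force.count";

    // Sets the property for the current JVM only (hypothetical helper).
    // Call this before the count query is executed.
    static void enableForceCount() {
        System.setProperty(FORCE_COUNT_KEY, "true");
    }

    public static void main(String[] args) {
        enableForceCount();
        // Print the value back to confirm it is visible to this JVM.
        System.out.println(FORCE_COUNT_KEY + "=" + System.getProperty(FORCE_COUNT_KEY));
    }
}
```

Passing -Dgeomesa.force.count=true on the JVM command line should have the same effect. In a Spark job the property may need to reach whichever JVM actually evaluates the count, for example through spark.driver.extraJavaOptions; which JVM that is has not been confirmed in this thread.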

Thanks,

Emilio Lahr-Vivaz
General Atomics, CCRi

From: geomesa-users <geomesa-users-bounces@xxxxxxxxxxx> on behalf of zhou lihuang via geomesa-users <geomesa-users@xxxxxxxxxxx>
Sent: Wednesday, December 25, 2024 4:31 AM
To: geomesa-users@xxxxxxxxxxx <geomesa-users@xxxxxxxxxxx>
Cc: zhou lihuang <zlh_0923@xxxxxxxxxxx>
Subject: -EXT-[geomesa-users] problem with query data by spark
 

Hello everyone,

I used the RDD provider to query data, but it returned 0 features (there should be 2).
When I used a DataStore created by DataStoreFinder.getDataStore, it successfully returned the 2 features.
The code is as follows:



The environment is:
geomesa: 4.0.5
spark: 3.3.0
hbase: 2.2.0

I've tried changing the GeoMesa version and the dependency versions, but it didn't work.

How can I fix this problem now? 

Thank you everyone.

Best Regards, 
Mike 
