[geomesa-users] Experiencing ZooKeeper session timeout during scan query (GeoMesa 3.0.0/3.1.0 + HBase1)

Hi,

I have a GeoMesa table hosted on an HBase cluster. After switching from GeoMesa 2.4.1 to 3.0.0, some queries started to fail with a "ZooKeeper session timeout".
Stack trace from my app: 
...
1604525339987,"java.util.NoSuchElementException: Could not obtain the next feature:org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=3, exceptions:"
1604525339987,"Wed Nov 04 20:55:03 UTC 2020, RpcRetryingCaller{globalStartTime=1604523178583, pause=100, retries=3}, java.io.IOException: Call to ip-10-0-22-145.ec2.internal/10.0.22.145:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=5864, waitTime=73235, rpcTimetout=60000"
1604525339987,"Wed Nov 04 20:56:19 UTC 2020, RpcRetryingCaller{globalStartTime=1604523178583, pause=100, retries=3}, org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0"
1604525339987,"Wed Nov 04 20:57:48 UTC 2020, RpcRetryingCaller{globalStartTime=1604523178583, pause=100, retries=3}, java.io.IOException: Call to ip-10-0-22-145.ec2.internal/10.0.22.145:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=5869, waitTime=66664, rpcTimetout=60000"
1604525339987,  at org.geotools.feature.FeatureReaderIterator.next(FeatureReaderIterator.java:75) ~[geomesa-hbase-spark-runtime-hbase1_2.11-3.0.0.jar:?]
1604525339987,  at org.geotools.feature.FeatureReaderIterator.next(FeatureReaderIterator.java:42) ~[geomesa-hbase-spark-runtime-hbase1_2.11-3.0.0.jar:?]
1604525339987,  at org.geotools.feature.collection.DelegateFeatureIterator.next(DelegateFeatureIterator.java:52) ~[geomesa-hbase-spark-runtime-hbase1_2.11-3.0.0.jar:?]
...
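
To make the timeout entries above easier to read, here is a small sketch that extracts the wait time and RPC timeout from two of them and reports how far each call ran past the limit. It is only a log-reading aid, not part of GeoMesa or HBase; the names `TRACE`, `PATTERN` and `overruns` are made up for illustration.

```python
import re

# Two of the timeout entries copied from the stack trace above.
TRACE = [
    "org.apache.hadoop.hbase.ipc.CallTimeoutException: "
    "Call id=5864, waitTime=73235, rpcTimetout=60000",
    "org.apache.hadoop.hbase.ipc.CallTimeoutException: "
    "Call id=5869, waitTime=66664, rpcTimetout=60000",
]

# "rpcTimetout" is spelled this way in the logged message itself.
PATTERN = re.compile(r"Call id=(\d+), waitTime=(\d+), rpcTimetout=(\d+)")


def overruns(lines):
    """Return (call_id, ms past the RPC timeout) for each matching entry."""
    result = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            call_id, wait_ms, timeout_ms = map(int, match.groups())
            result.append((call_id, wait_ms - timeout_ms))
    return result


for call_id, over_ms in overruns(TRACE):
    print(f"call {call_id} ran {over_ms} ms past the RPC timeout")
```

Both calls waited longer than the 60000 ms RPC timeout shown in the trace before the client gave up.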

I also found these warnings in the log:
...
1604523293568,"04 Nov 2020 20:54:11,817 [WARN] (pool-9-thread-13-SendThread(ip-10-0-21-114.ec2.internal:2181)) org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 62688ms for sessionid 0x47045cd3d"
1604523335819,"04 Nov 2020 20:55:35,795 [WARN] (pool-9-thread-13-SendThread(ip-10-0-21-114.ec2.internal:2181)) org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 42227ms for sessionid 0x47045cd3d"
1604523368963,"04 Nov 2020 20:56:08,963 [WARN] (pool-9-thread-13-SendThread(ip-10-0-21-114.ec2.internal:2181)) org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x47045cd3d has expired"
1604523368963,"04 Nov 2020 20:56:08,963 [WARN] (pool-9-thread-13-EventThread) org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, closing it. It will be recreated next time someone needs it"
1604523379744,"04 Nov 2020 20:56:19,742 [WARN] (pool-9-thread-18) org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x31aa9b01-0x47045cd3d, quorum=ip-10-0-21-114.ec2.internal:2181, baseZNode=/hbase Unable to get data of znode /hbase/table/test_TestTable_xz3_geom_timestamp_v2"
1604523379748,"04 Nov 2020 20:56:19,742 [WARN] (pool-9-thread-13) org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x31aa9b01-0x47045cd3d, quorum=ip-10-0-21-114.ec2.internal:2181, baseZNode=/hbase Unable to get data of znode /hbase/table/test_TestTable_xz3_geom_timestamp_v2"
1604523379749,"04 Nov 2020 20:56:19,742 [WARN] (pool-9-thread-19) org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x31aa9b01-0x47045cd3d, quorum=ip-10-0-21-114.ec2.internal:2181, baseZNode=/hbase Unable to get data of znode /hbase/table/test_TestTable_xz3_geom_timestamp_v2"
...

The query which triggered this issue: 
INTERSECTS(geom,POLYGON ((-100.78857422 28.58452172, -100.78857422 31.273855990000005, -93.71337890999999 31.273855990000005, -93.71337890999999 28.58452172, -100.78857422 28.58452172))) AND timestamp <= '2020-10-23 16:52:20' AND timestamp > '2019-08-01 00:00:00'

The test_TestTable_xz3_geom_timestamp_v2 table is around 272 GB (GZ compressed), and the output of this query is around 1.7 GB (uncompressed).

I can reproduce the issue with this query fairly consistently. The query succeeds if I simply swap the GeoMesa jar on the classpath from 3.0.0/3.1.0 back to 2.4.1.

I will keep looking into what changed between the releases, but I would like to know whether others are seeing similar issues or can offer any insight.



Regards,
Jun Cai
