Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [geomesa-users] Weird Behavior of GeoMesa HBase Query

Hi Amit,

First, thanks for including the indexing info and the query explainer output.  That helps us answer more readily.

When a z3 index is used, if there is only an upper (or similar lower) bound on the date range, then ranges in index space are implicated.  For each split, the same pattern of z3 ranges will be scanned, so having 60 splits will lead to lots of ranges to scan, and those ranges will likely be spread over the HBase cluster. 

There are a few possible things to try:
1.  Reingest with fewer ranges
2.  Add a z2 (spatial) index to help with these types of queries (the spatial predicate will return few records which would then be further filtered)
3.  Figure out if you can add a lower temporal bound (admittedly, this may not be possible.  In some use cases, it does make sense.)

Cheers,

Jim

On 6/9/20 1:50 PM, Amit Srivastava wrote:
Hi Emilio,

A quick question about HBase Query. I am seeing a lot of data getting scanned for small spatial queries. I fired the geospatial query on the OSMNodes table. Below are query and table details. I am seeing the total number of read requests (5,553,421,708) on HBase and seeing requests on most of the regions and region servers. Any reasoning why did we scan each region for this query?

Query:
"DWITHIN(geometry, POINT(-122.332426 47.607282), 50, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31' AND nextTimestamp > '2020-05-27 16:59:31'"
Table Schema:
geomesa-hbase describe-schema -c atlas -f OSMNodes INFO  Describing attributes of feature 'OSMNodes' geometry           | Point     (Spatio-temporally indexed) ingestionTimestamp | Timestamp (Spatio-temporally indexed) nextTimestamp      | Timestamp serializerVersion  | String     featurePayload     | String     User data:   geomesa.index.dtg    | ingestionTimestamp   geomesa.indices      | z3:6:3:geometry:ingestionTimestamp,id:4:3:   geomesa.stats.enable | true   geomesa.z.splits     | 60
Query Plan (through GeoMesa Cli):
geomesa-hbase explain -c atlas -f OSMNodes -q "DWITHIN(geometry, POINT(-122.332426 47.607282), 50, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31' AND nextTimestamp > '2020-05-27 16:59:31'"
Planning 'OSMNodes' (DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00) AND nextTimestamp > 2020-05-27T16:59:31+00:00
  Original filter: (DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= '2020-05-27 16:59:31') AND nextTimestamp > '2020-05-27 16:59:31'
  Hints: bin[false] arrow[false] density[false] stats[false] sampling[none]
  Sort: none
  Transforms: none
  Strategy selection:
    Query processing took 17ms for 1 options
    Filter plan: FilterPlan[Z3Index(geometry,ingestionTimestamp)[DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00][nextTimestamp > 2020-05-27T16:59:31+00:00]]
    Strategy selection took 1ms for 1 options
  Strategy 1 of 1: Z3Index(geometry,ingestionTimestamp)
    Strategy filter: Z3Index(geometry,ingestionTimestamp)[DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00][nextTimestamp > 2020-05-27T16:59:31+00:00]
    Geometries: FilterValues(List(POLYGON ((-122.3317610175119 47.607282, -122.33177379496394 47.60715226835226, -122.33181163628976 47.607027522218985, -122.33187308726842 47.606912555524126, -122.33195578637329 47.606811786373285, -122.33205655552413 47.60672908726843, -122.33217152221899 47.60666763628976, -122.33229626835225 47.606629794963936, -122.332426 47.606617017511894, -122.33255573164774 47.606629794963936, -122.33268047778101 47.60666763628976, -122.33279544447586 47.60672908726843, -122.33289621362671 47.606811786373285, -122.33297891273158 47.606912555524126, -122.33304036371024 47.607027522218985, -122.33307820503606 47.60715226835226, -122.3330909824881 47.607282, -122.33307820503606 47.60741173164774, -122.33304036371024 47.60753647778101, -122.33297891273158 47.60765144447587, -122.33289621362671 47.60775221362671, -122.33279544447586 47.60783491273157, -122.33268047778101 47.60789636371024, -122.33255573164774 47.60793420503606, -122.332426 47.6079469824881, -122.33229626835225 47.60793420503606, -122.33217152221899 47.60789636371024, -122.33205655552413 47.60783491273157, -122.33195578637329 47.60775221362671, -122.33187308726842 47.60765144447587, -122.33181163628976 47.60753647778101, -122.33177379496394 47.60741173164774, -122.3317610175119 47.607282))),true,false)
    Intervals: FilterValues(List((-∞,2020-05-27T16:59:31Z]),true,false)
    Plan: ScanPlan
      Tables: atlas_OSMNodes_z3_geometry_ingestionTimestamp_v6
      Ranges (7440): [%00;%0a;E$A%08;%00;%00;%00;%00;%00;::%00;%0a;E$A%0c;], [%01;%0a;E$A%08;%00;%00;%00;%00;%00;::%01;%0a;E$A%0c;], [%02;%0a;E$A%08;%00;%00;%00;%00;%00;::%02;%0a;E$A%0c;], [%03;%0a;E$A%08;%00;%00;%00;%00;%00;::%03;%0a;E$A%0c;], [%04;%0a;E$A%08;%00;%00;%00;%00;%00;::%04;%0a;E$A%0c;]
      Scans (120): ['%0a;ElA%98;%00;%00;%00;%00;%00;::'%0a;Ema%8c;], [:%0a;ElA%98;%00;%00;%00;%00;%00;:::%0a;Ema%8c;], [%14;::%14;%0a;ElA%8c;], [(%0a;ElA%98;%00;%00;%00;%00;%00;::(%0a;Ema%8c;], [%12;%0a;ElA%98;%00;%00;%00;%00;%00;::%12;%0a;Ema%8c;]
      Column families: d
      Remote filters: MultiRowRangeFilter, Z3HBaseFilter[(epoch,2629:2629),(zt,0:2009670),(zxy,335934:1603233:335941:1603248)], CqlFilter[(DWITHIN(geometry, POINT (-122.332426 47.607282), 50.0, meters) AND ingestionTimestamp <= 2020-05-27T16:59:31+00:00) AND nextTimestamp > 2020-05-27T16:59:31+00:00]
    Plan creation took 135ms
  Query planning took 433ms

--

Regards,

Amit Kumar Srivastava


_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxx
To unsubscribe from this list, visit https://dev.eclipse.org/mailman/listinfo/geomesa-users



Back to the top