Re: [geomesa-users] spark integration, "querying accumulo without spatialtemporal filter"

From: Jim Hughes <jnh5y@xxxxxxxx>
Date: Sun, 31 May 2015 17:39:21 -0400

Hi Simon,

Thanks for reporting this behavior. It's a bug, and I've filed a ticket for it. (1) And you are dead on: the entire table was scanned.

First, I've got a quick workaround. Rather than passing the entire expression to CQL.toFilter, you can parse each clause separately and combine them with a single AND, something like this, which should work better.

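import org.geotools.factory._
import org.geotools.filter.text.cql2.CQL
import scala.collection.JavaConversions._

val ff = CommonFactoryFinder.getFilterFactory2
// Parse each clause on its own so that none of them ends up inside a nested AND.
val d = CQL.toFilter("When During 2010-08-08T00:00:00.000Z/2010-08-08T23:59:59.000Z")
val a = CQL.toFilter("Foo = 2")
val s = CQL.toFilter("BBOX(Where, 0, 0, 1, 1)")
// Build one flat, top-level AND; JavaConversions turns the Scala List into a java.util.List.
val filter = ff.and(List(d, a, s))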

When you give that a spin, the lines which read

Geometry filters: ArrayBuffer()
Temporal filters: ArrayBuffer()

in the query planner's explanation should change to contain the BBOX and DuringImpl filters as expected.

As for the explanation: the function partitionSubFilters isn't general enough: https://github.com/locationtech/geomesa/blob/geomesa-accumulo1.5-1.0.0-rc.7/geomesa-core/src/main/scala/org/locationtech/geomesa/core/filter/package.scala#L222-227

When one or more ANDs are children of a top-level AND, this function does not do the correct thing. In your log, the nested [bbox AND During] pair is listed as a single entry under "Other filters", which leaves the geometry and temporal filter lists empty. I'll see if I can sort out a fix for it this coming week.
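
To illustrate the nested-AND case, here is a minimal, self-contained sketch. It uses a toy filter model rather than the GeoTools or GeoMesa types, and every name in it (And, BBox, During, Equals, FlattenSketch, shallowChildren, flattenAnd, partition) is hypothetical. It is not the code behind partitionSubFilters and not necessarily how the fix will look; it only shows why looking at the direct children of the top-level AND misses the spatial and temporal clauses, and how flattening nested ANDs first would expose them.

```scala
// Toy filter model for illustration only; these are NOT the GeoTools/GeoMesa classes.
sealed trait Filter
case class And(children: List[Filter]) extends Filter
case class BBox(attr: String) extends Filter
case class During(attr: String) extends Filter
case class Equals(attr: String, value: Int) extends Filter

object FlattenSketch {
  // Only the direct children of a top-level AND; nested ANDs stay intact.
  def shallowChildren(f: Filter): List[Filter] = f match {
    case And(children) => children
    case other         => List(other)
  }

  // Recursively collect the leaves of arbitrarily nested ANDs.
  def flattenAnd(f: Filter): List[Filter] = f match {
    case And(children) => children.flatMap(flattenAnd)
    case other         => List(other)
  }

  // Split a list of filters into (geometry, temporal, other).
  def partition(fs: List[Filter]): (List[Filter], List[Filter], List[Filter]) = {
    val (geom, rest)      = fs.partition(_.isInstanceOf[BBox])
    val (temporal, other) = rest.partition(_.isInstanceOf[During])
    (geom, temporal, other)
  }

  // Same shape as the filter in the log: [[bbox AND during] AND [Activity = 2]].
  val parsed: Filter =
    And(List(And(List(BBox("Where"), During("When"))), Equals("Activity", 2)))

  // Shallow view: geometry and temporal lists are empty, the nested AND lands in "other".
  val shallow = partition(shallowChildren(parsed))

  // Flattened view: one geometry filter, one temporal filter, one other filter.
  val flattened = partition(flattenAnd(parsed))
}
```

In this toy model, FlattenSketch.shallow mirrors the empty "Geometry filters" and "Temporal filters" lists from the log, while FlattenSketch.flattened puts the BBOX and DURING clauses where a planner could use them. Building the flat AND by hand, as in the workaround, gives the same shape without any flattening.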

Cheers,

Jim

(1) https://geomesa.atlassian.net/browse/GEOMESA-817

On 05/31/2015 12:14 AM, Xu (Simon) Chen wrote:

Hey folks,

I've got a simple query working with a Java program, but ran into issues with the Spark integration.

My query is like:

BBOX(Where, x1, y1, x2, y2) AND (When DURING t1/t2) AND (Activity = 2)

When I constructed the filter with CQL.toFilter("entire cql"), I got a warning:

scala> val queryRdd = GeoMesaSpark.rdd(conf, sc, params, q)
Scanning ST index table for feature type SlowStart
Filter: [[[ Where bbox POLYGON ((-79.5 36.5, -79.5 36.6, -79.3 36.6, -79.3 36.5, -79.5 36.5)) ] AND org.geotools.filter.temporal.DuringImpl@26381560] AND [ Activity = 2 ]]
Geometry filters: ArrayBuffer()
Temporal filters: ArrayBuffer()
Other filters: ArrayBuffer([[ Where bbox POLYGON ((-79.5 36.5, -79.5 36.6, -79.3 36.6, -79.3 36.5, -79.5 36.5)) ] AND org.geotools.filter.temporal.DuringImpl@26381560], [ Activity = 2 ])
Tweaked geom filters are ArrayBuffer()
GeomsToCover: ArrayBuffer()
15/05/31 03:58:30 WARN index.STIdxStrategy: Querying Accumulo without SpatioTemporal filter.
STII Filter: No STII Filter
Interval: No interval
Filter: AcceptEverythingFilter
Planning query
Random Partition Planner (5): 0,1,2,3,4
IndexOrDataPlanner: 1
ConstPlanner: SlowStart
GeoHashKeyPlanner: KeyAccept (3)
DatePlanner: start: 0000010100 end: 9999123123

The resulting query took a long time to finish; I think it scanned the entire data set. The same CQL.toFilter() call worked fine in my Java program, returning results quickly.

Any ideas?

Thanks.

-Simon