Re: [geomesa-users] spark integration, "querying accumulo without spatialtemporal filter"

Hi Simon,

Thanks for reporting this behavior.  It's a bug, and I've filed a ticket for it (1).  And you are dead on: the entire table was scanned.

First, I've got a quick workaround.  Rather than passing the entire CQL string to a single CQL.toFilter call, you can parse each clause separately and combine the results into one filter, which should work better:

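import org.geotools.factory._
import org.geotools.filter.text.cql2.CQL
import scala.collection.JavaConversions._

// Parse each clause on its own (substitute your own attribute names and values).
val ff = CommonFactoryFinder.getFilterFactory2
val d = CQL.toFilter("When During 2010-08-08T00:00:00.000Z/2010-08-08T23:59:59.000Z")
val a = CQL.toFilter("Foo = 2")
val s = CQL.toFilter("BBOX(Where, 0, 0, 1, 1)")
// Combine the clauses as direct children of a single top-level AND.
val filter = ff.and(List(d, a, s))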

When you give that a spin, the lines which read

Geometry filters: ArrayBuffer()
Temporal filters: ArrayBuffer()

in the query planner's explanation should change to contain the BBOX and DuringImpl filters as expected.

As for the explanation: the function partitionSubFilters isn't general enough:  https://github.com/locationtech/geomesa/blob/geomesa-accumulo1.5-1.0.0-rc.7/geomesa-core/src/main/scala/org/locationtech/geomesa/core/filter/package.scala#L222-227

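When one or more ANDs are children of a top-level AND, this function will not do the correct thing.  You can see that in your output: the nested AND holding the BBOX and During filters ends up under "Other filters" instead of being split into geometry and temporal filters.  I'll see if I can sort out a fix for it this coming week.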
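
To make the nested-AND case concrete, here is a minimal sketch using a toy filter model.  The types And and Leaf and the helper flattenAnd are hypothetical stand-ins for illustration only; this is not the GeoTools API and not the actual GeoMesa fix.  It just shows how the nested shape from the query planner output could be flattened so that every clause becomes a direct child of the top-level AND.

```scala
// Toy stand-in for a filter tree; these types are hypothetical, not GeoTools classes.
sealed trait Filter
case class And(children: List[Filter]) extends Filter
case class Leaf(cql: String) extends Filter

// Recursively collect the non-AND descendants of (possibly nested) ANDs.
def flattenAnd(f: Filter): List[Filter] = f match {
  case And(children) => children.flatMap(flattenAnd)
  case other         => List(other)
}

// Same shape as the filter in the planner output: [[BBOX AND During] AND [Activity = 2]]
val parsed: Filter = And(List(
  And(List(Leaf("BBOX(Where, x1, y1, x2, y2)"), Leaf("When DURING t1/t2"))),
  Leaf("Activity = 2")
))

// After flattening, all three clauses are siblings.
val flat: List[Filter] = flattenAnd(parsed)
```

With the clauses as siblings, a partitioning step that only inspects the direct children of the top-level AND can see the BBOX and During clauses individually, which is the same effect the workaround above gets by building the AND by hand.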

Cheers,

Jim

(1) https://geomesa.atlassian.net/browse/GEOMESA-817

On 05/31/2015 12:14 AM, Xu (Simon) Chen wrote:
Hey folks,

I've got a simple query working with a java program, but ran into issues with spark integration.

My query is like:
BBOX(Where, x1, y1, x2, y2) AND (When DURING t1/t2) AND (Activity = 2)

When I construct a filter with CQL.toFilter("entire cql"), I get a warning:

scala> val queryRdd = GeoMesaSpark.rdd(conf, sc, params, q)

Scanning ST index table for feature type SlowStart
Filter: [[[ Where bbox POLYGON ((-79.5 36.5, -79.5 36.6, -79.3 36.6, -79.3 36.5, -79.5 36.5)) ] AND org.geotools.filter.temporal.DuringImpl@26381560] AND [ Activity = 2 ]]
Geometry filters: ArrayBuffer()
Temporal filters: ArrayBuffer()
Other filters: ArrayBuffer([[ Where bbox POLYGON ((-79.5 36.5, -79.5 36.6, -79.3 36.6, -79.3 36.5, -79.5 36.5)) ] AND org.geotools.filter.temporal.DuringImpl@26381560], [ Activity = 2 ])
Tweaked geom filters are ArrayBuffer()
GeomsToCover: ArrayBuffer()
15/05/31 03:58:30 WARN index.STIdxStrategy: Querying Accumulo without SpatioTemporal filter.
STII Filter: No STII Filter
Interval:  No interval
Filter: AcceptEverythingFilter
Planning query
Random Partition Planner (5): 0,1,2,3,4
IndexOrDataPlanner: 1
ConstPlanner: SlowStart
GeoHashKeyPlanner: KeyAccept (3)
DatePlanner: start: 0000010100 end: 9999123123


The resulting query took a long time to finish - I think it scanned the entire data set. The same CQL.toFilter() worked fine in my java program, returning results quickly.


Any ideas?


Thanks.

-Simon


