Re: [geomesa-users] Questions on UDF efficiency

From: jhughes <jhughes@xxxxxxxx>
Date: Thu, 16 Jul 2020 07:27:32 -0400

Hi Evan,

Good question. The spatial predicates are based on a model called DE-9IM (https://en.wikipedia.org/wiki/DE-9IM). JTS is the library which calculates the relationship between two geometries to answer the questions of 'covers', 'contains', etc.

If you are querying for points in a (multi)polygon, most of these relationships should give the same result.* I'd just use st_intersects, but that's personal preference. (Also, I suppose if you switch to working with non-point geometries and you are rendering data in a web client, you'll likely want to see things which are partially on the screen (intersects) rather than only the data that are completely contained (covers/within).)
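To make the point-in-polygon case concrete, here is a minimal pure-Python sketch of how the DE-9IM definitions play out for a single point against a simple polygon ring. It is not JTS or GeoMesa code; the helper names (on_segment, locate, predicates) are made up for illustration, and it assumes a simple ring without holes and exact (integer) coordinates. The only place the predicates disagree is a point lying exactly on the polygon boundary.

```python
def on_segment(p, a, b):
    """Return True if point p lies on the closed segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    # The cross product is zero when p is collinear with a and b.
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    if cross != 0:
        return False
    return min(ax, bx) <= px <= max(ax, bx) and min(ay, by) <= py <= max(ay, by)


def locate(p, ring):
    """Classify p against a simple ring: 'interior', 'boundary' or 'exterior'."""
    px, py = p
    inside = False
    n = len(ring)
    for i in range(n):
        a, b = ring[i], ring[(i + 1) % n]
        if on_segment(p, a, b):
            return "boundary"
        (ax, ay), (bx, by) = a, b
        # Ray casting: count edges crossed by a horizontal ray going right from p.
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if x_cross > px:
                inside = not inside
    return "interior" if inside else "exterior"


def predicates(p, ring):
    """Evaluate the four predicates for a point against a polygon."""
    where = locate(p, ring)
    return {
        "intersects": where != "exterior",  # any shared point counts
        "covers": where != "exterior",      # polygon covers point: boundary counts
        "contains": where == "interior",    # polygon contains point: boundary excluded
        "within": where == "interior",      # point within polygon: boundary excluded
    }


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
for point in [(2, 2), (4, 2), (5, 2)]:
    print(point, predicates(point, square))
```

For the interior point (2, 2) and the exterior point (5, 2) all four predicates agree; for the boundary point (4, 2), intersects and covers are true while contains and within are false. With floating-point coordinates the collinearity check would need a tolerance.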

Since you've mentioned st_* functions, you are likely using Spark. The biggest observation is that when working with a GeoMesa DataStore in Spark, there's one chance to push down filters to the database. If you know ahead of time what subset of data you'd like to work with, you should build that filter into the first query you issue when reading from the underlying datastore.

Optimizing a Spark workflow in this manner is definitely a complex topic. If you are using the Spark SQL API, I can recommend using the 'explain' command to understand when you are querying an underlying datastore, what work/filters are pushed down, and when work is being done in memory, etc.
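As a rough sketch of that workflow in PySpark: put the spatial filter into the first query against the GeoMesa-backed view, then print the plan to see what reaches the datastore. This assumes the GeoMesa Spark SQL runtime is on the classpath and its st_* functions are registered in the session. The function names, the view name, the "geom" column and the contents of ds_params are hypothetical placeholders for your own setup.

```python
def build_pushdown_query(view, geom_col, wkt):
    """Return a Spark SQL string that filters `view` by a polygon given as WKT."""
    return (
        f"SELECT * FROM {view} "
        f"WHERE st_intersects({geom_col}, st_geomFromWKT('{wkt}'))"
    )


def explain_spatial_filter(spark, ds_params, feature_name, wkt, geom_col="geom"):
    """Load a GeoMesa-backed DataFrame, filter it in the first query, print the plan.

    `spark` is an existing SparkSession; `ds_params` is a dict of connection
    parameters for your datastore (placeholder, depends on the backend).
    """
    df = (
        spark.read.format("geomesa")
        .options(**ds_params)
        .option("geomesa.feature", feature_name)
        .load()
    )
    df.createOrReplaceTempView(feature_name)

    # Apply the spatial predicate before any other transformation so the
    # planner has the chance to hand it to the datastore.
    filtered = spark.sql(build_pushdown_query(feature_name, geom_col, wkt))

    # Extended explain prints the logical and physical plans; look for where
    # the spatial filter ends up relative to the scan of the datastore.
    filtered.explain(True)
    return filtered


print(build_pushdown_query("points", "geom", "POLYGON((0 0, 4 0, 4 4, 0 4, 0 0))"))
```

Comparing the plan for this query against one where the same filter is applied later (for example after a join or a cache) is one way to see which work is pushed down and which is done in memory.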


Cheers,

Jim

On 2020-07-16 06:52, Yifan Wang wrote:

Hi,

I'm currently trying to filter points with a Z-index using UDFs such as
st_within, st_covers, and st_contains to calculate their relationship
with a polygon, and I'm wondering which UDF is the most efficient.
Thank you!

Best Regards,
Evan