Re: [geomesa-users] KNN-Queries
Marcel,
Interesting.  In your timing code, are you including the time to 
instantiate the classes you give to runKNNQuery?
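One way to check this is to time the setup and the query as separate phases. The following is a minimal sketch in plain Java, not GeoMesa code: the class PhaseTimerSketch and its method timePhases are hypothetical names, the supplier stands in for building the objects handed to runKNNQuery, and the function stands in for the query call itself.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper: measures object construction and the actual call separately.
final class PhaseTimerSketch {
    /** Returns {setupNanos, runNanos} for one execution of setup followed by run. */
    static <S, R> long[] timePhases(Supplier<S> setup, Function<S, R> run) {
        long t0 = System.nanoTime();
        S state = setup.get();        // e.g. instantiate the helper objects
        long t1 = System.nanoTime();
        run.apply(state);             // e.g. execute the query with those objects
        long t2 = System.nanoTime();
        return new long[] { t1 - t0, t2 - t1 };
    }
}
```

A single measurement is noisy on the JVM, so it is worth repeating the call several times and comparing the later runs.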
The query language which GeoTools and GeoServer speak (ECQL) doesn't 
include joins or group bys.  To implement a group by, it would be most 
natural to collect the complete resultset and then sort it by a given 
column.  For big data, there is some risk that the resultset may not fit 
into memory.
All that said, GeoTools does support sorting.  Check out the Query and 
SortBy classes.*  With those, you can have the resultset returned in 
sorted order and then do whatever remaining operations you are 
looking to implement.  Let us know if that doesn't help enough.
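Once the results arrive sorted by the grouping attribute, a group by no longer needs the whole resultset in memory: each group is contiguous, so it can be aggregated in a single pass. The following is a minimal sketch of that idea in plain Java, with no GeoTools types involved. The class SortedGroupBySketch and the method countBySortedKey are hypothetical names, and the iterator stands in for features read from a sorted query.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: COUNT(*) ... GROUP BY key over rows already sorted by key.
final class SortedGroupBySketch {
    static <T, K> List<Map.Entry<K, Integer>> countBySortedKey(
            Iterator<T> sortedRows, Function<T, K> keyOf) {
        List<Map.Entry<K, Integer>> counts = new ArrayList<>();
        K currentKey = null;
        int currentCount = 0;
        while (sortedRows.hasNext()) {
            K key = keyOf.apply(sortedRows.next());
            // A key change closes the current group, since the input is sorted.
            if (currentCount > 0 && !key.equals(currentKey)) {
                counts.add(Map.entry(currentKey, currentCount));
                currentCount = 0;
            }
            currentKey = key;
            currentCount++;
        }
        if (currentCount > 0) {
            counts.add(Map.entry(currentKey, currentCount)); // flush the last group
        }
        return counts;
    }
}
```

Only one running group is held at a time, so memory use depends on the number of groups kept in the output, not on the number of rows. If the input is not actually sorted, the same key will show up in more than one output entry.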
Cheers,
Jim
* http://docs.geotools.org/latest/javadocs/org/geotools/data/Query.html
Specifically, note query.setSortBy().
http://docs.geotools.org/stable/javadocs/org/opengis/filter/sort/SortBy.html
To create a SortBy, also check out 
http://docs.geotools.org/latest/javadocs/org/opengis/filter/FilterFactory.html#sort%28java.lang.String,%20org.opengis.filter.sort.SortOrder%29
On 07/16/2015 11:36 AM, Marcel wrote:
Okay, that's strange. For me, KNNQuery.runKNNQuery is always 0.5 seconds 
faster than KNNQuery.runNewKNNQuery. I set k=1, 
searchDistanceInMeters=1000 and maxDistanceInMeters=2500000.
Is it possible to group by a specific attribute, similar to SQL? I have 
not found anything so far.
Thanks, I will try my best.
Regards,
Marcel.
On 15.07.2015 17:23, Michael Ronquest wrote:
Hi Marcel,
It is interesting that you are seeing a performance 
difference between the two methods: runNewKNNQuery just creates the 
GeoHashSpiral and NearestNeighbors for you, and then calls the 
runKNNQuery method. Could you quantify the performance 
difference? Also, what parameters are you currently using for "k", 
"searchDistanceInMeters" and "maxDistanceInMeters"?
You can run your query without a filter by using the ECQL filter 
INCLUDE, which includes everything. Specifically, 
org.opengis.filter.Filter.INCLUDE from GeoTools is what you want.
It sounds like you've got an interesting thesis topic on your hands! 
In the future we'd be interested to hear about your results!
All the best,
Mike
On 07/15/2015 07:12 AM, Marcel wrote:
Hey Mike,
thanks for the detailed answer. With this it was possible to get my 
KNN query working. I tested the KNNQuery.runKNNQuery and the 
KNNQuery.runNewKNNQuery methods. I decided to take the first option, 
because the performance seems to be a little better.
Is there any way to run my query without a filter? I 
don't want to filter on time, but when I create something like
new Query("gdelt", null, new String[]{"SQLDATE", "geom"}) (with the 
filter set to null), the program never finishes.
I'm currently working on my master's thesis, which focuses on storing 
and querying geotemporal data in the Hadoop ecosystem. That's why I am 
examining some technologies in detail. I don't have a specific use 
case, so I'm satisfied working with the GDELT dataset (I noticed 
that the column "url" was discarded).
Regards,
Marcel.
On 14.07.2015 20:18, Michael Ronquest wrote:
Hi Marcel,
Thanks for writing in, and for your interest in the KNN 
method in GeoMesa. Once things are working for you, I'd be *very* 
interested in receiving additional feedback, as well as hearing a 
bit about your use case.
In short, the KNN algorithm begins by searching in a geohash that 
contains your point of reference,  with the spatial scale of the 
geohash set in the query process. Once all features in that central 
geohash are processed, the algorithm then begins to "spiral" out to 
neighboring geohashes as needed to either find k neighbors, or to 
ensure the current k "best" neighbors are indeed the k nearest 
neighbors.
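The spiral idea can be sketched outside of GeoMesa. The following is a simplified stand-in and not the GeoMesa implementation: it assumes planar coordinates and a uniform grid of square cells in place of geohashes, and GridKnnSketch, nearest and cellSize are hypothetical names. It searches the cell containing the query point first, then moves outward ring by ring, and stops once the current k best cannot be beaten by anything in a later ring or once the maximum distance is reached.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Simplified stand-in: square grid cells in planar coordinates, not geohashes.
final class GridKnnSketch {
    private final double cellSize;
    private final Map<String, List<double[]>> cells = new HashMap<>();

    GridKnnSketch(double cellSize, List<double[]> points) {
        this.cellSize = cellSize;
        for (double[] p : points) {
            String key = cellOf(p[0]) + ":" + cellOf(p[1]);
            cells.computeIfAbsent(key, unused -> new ArrayList<>()).add(p);
        }
    }

    private long cellOf(double v) {
        return (long) Math.floor(v / cellSize);
    }

    private static double dist(double[] p, double qx, double qy) {
        return Math.hypot(p[0] - qx, p[1] - qy);
    }

    /** Returns up to k points within maxDistance of (qx, qy), nearest first. */
    List<double[]> nearest(double qx, double qy, int k, double maxDistance) {
        List<double[]> result = new ArrayList<>();
        if (k <= 0) return result;
        Comparator<double[]> byDistance =
                Comparator.comparingDouble((double[] p) -> dist(p, qx, qy));
        // Max-heap: the head is the worst of the current k best candidates.
        PriorityQueue<double[]> best = new PriorityQueue<>(byDistance.reversed());
        long cx = cellOf(qx), cy = cellOf(qy);
        long maxRing = (long) Math.ceil(maxDistance / cellSize);
        for (long ring = 0; ring <= maxRing; ring++) {
            // Every point in ring r lies at least (r - 1) * cellSize from the query,
            // so once the k-th best is no farther than that, later rings cannot improve it.
            if (best.size() == k && dist(best.peek(), qx, qy) <= (ring - 1) * cellSize) break;
            for (long x = cx - ring; x <= cx + ring; x++) {
                for (long y = cy - ring; y <= cy + ring; y++) {
                    // Visit only the border of the square (the new ring); simple, not optimal.
                    if (Math.max(Math.abs(x - cx), Math.abs(y - cy)) != ring) continue;
                    for (double[] p : cells.getOrDefault(x + ":" + y, List.of())) {
                        if (dist(p, qx, qy) > maxDistance) continue; // hard cutoff
                        best.add(p);
                        if (best.size() > k) best.poll(); // drop the current worst
                    }
                }
            }
        }
        result.addAll(best);
        result.sort(byDistance);
        return result;
    }
}
```

In this sketch cellSize plays the role of the initial search scale and maxDistance bounds how far the rings may expand, so a request for more neighbors than exist nearby ends at the cutoff instead of scanning the whole grid.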
Your instinct regarding the KNNQuery is correct: that is what you 
want to use. Apologies for the "magic" parameters: KNNQuery is used 
by the KNearestNeighborSearchProcess, and the parameters are better 
explained there.
Note: the KNearestNeighborSearchProcess class is used by GeoServer WPS 
processes and carries a good deal of related boilerplate, so stay away 
from using it directly.
The runNewKNNQuery method has these parameters:
source: SimpleFeatureSource   ===> where your data reside: note 
this really should be a GeoMesa Source as we attempt to exploit its 
geospatial index in the algorithm
query: Query ===> your "base" query which would include filters on 
attributes, time and space.
numDesired: Int ===> this is simply "k", how many points you seek
searchDistanceInMeters: Double ===> this is the "typical" distance 
within which you'd expect to find k points in your data. It serves as 
an "initial guess" for the search and defines the spatial scale at 
which the iterative query by GeoHash will run.
If I were looking for 1000 tweets in Manhattan over the course of a 
day, I'd set this to ~500 meters, while if I were looking for 1000 
tweets around Nageezi, New Mexico, I'd set this to 100000 meters or 
more.  The search is iterative, so err toward smaller 
distances here (at the potential cost of a slower process, as more 
"geohash queries" will need to be made).
maxDistanceInMeters: Double ===> this is the maximum distance at 
which the algorithm will search. It acts almost like an additional 
predicate on your Query and prevents runaway queries. For 
example, imagine in your case that you ask for k=1000 when you only 
have 100 Features around Beijing. The KNN process would then 
"spiral" out from Beijing, geohash by geohash, querying GeoMesa 
each time for additional Features. If you only have sparse data 
outside of Beijing, then the KNN algorithm may churn for a great 
while, perhaps over the entire planet! So this parameter prevents 
that. It is possible to get edge effects here, so err toward much 
larger distances.
aFeatureForSearch: SimpleFeature ===> this is the reference point 
around which to search.
With the parameters defined, you'd then do something like this:
        Query theQuery = new Query("gdelt", timeFilter,
                new String[] { "SQLDATE", "geom" });
        // want 100 points
        int k = 100;
        // Beijing is dense....
        double guessedDistance = 1000.0;
        // very roughly the "radius" of China
        double maxLimitDistance = 2500000.0;
        NearestNeighbors neighbors = KNNQuery.runNewKNNQuery(fs,
                theQuery, k, guessedDistance, maxLimitDistance, beijingCenter);
where fs and timeFilter are as you've previously defined them and 
beijingCenter is a SimpleFeature with your point as its geometry.
Hopefully this will help. Please report back on further issues or 
success.
Cheers,
Mike
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or 
unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users