Re: [geomesa-users] Any suggestions for indexing data with a duration?

Beau,

The query-planner translates the two forms of temporal query -- one
split into AFTER and BEFORE; the second using DURING -- into the same,
single-pass query, so there should be no substantial difference in the
execution time.
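
A minimal sketch of why the two forms select the same records, written in
plain Python rather than against GeoMesa itself. It assumes both forms treat
their bounds as exclusive; the function names and sample dates are
hypothetical, chosen only for illustration, and say nothing about how the
query-planner is implemented.

```python
from datetime import datetime, timedelta


def after_and_before(start, lo, hi):
    # Models: start AFTER lo AND start BEFORE hi (bounds assumed exclusive).
    return start > lo and start < hi


def during(start, lo, hi):
    # Models: start DURING lo/hi (bounds assumed exclusive).
    return lo < start < hi


# Hypothetical query window and padding, mirroring (dt0-deltaT)/dt1.
dt0 = datetime(2014, 6, 1)
dt1 = datetime(2014, 6, 10)
delta_t = timedelta(days=2)
lo = dt0 - delta_t

# Sample start times every 12 hours, straddling both ends of the window.
starts = [lo + timedelta(hours=12 * i) for i in range(-2, 30)]

split_hits = [s for s in starts if after_and_before(s, lo, dt1)]
during_hits = [s for s in starts if during(s, lo, dt1)]
```

Under that assumption the two predicates are the same boolean condition, so
any difference would have to come from planning, not from the result set.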

Thanks!

Sincerely,
  -- Chris
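
The padded-start workaround quoted below can be sketched in plain Python.
This is an illustration only, not GeoMesa code: Record, overlaps,
padded_query and the sample data are hypothetical, and it assumes deltaT is
at least as long as the longest record duration.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Record(NamedTuple):
    start: datetime
    end: datetime


def day(n):
    # Hypothetical helper: "time n" from the thread, expressed in days.
    return datetime(2014, 6, 1) + timedelta(days=n)


def overlaps(rec, dt0, dt1):
    # The query that is really wanted: start BEFORE dt1 AND end AFTER dt0.
    return rec.start < dt1 and rec.end > dt0


def padded_query(records, dt0, dt1, delta_t):
    lo = dt0 - delta_t
    # Bounded range on the indexed attribute: start DURING (dt0-deltaT)/dt1.
    candidates = [r for r in records if lo < r.start < dt1]
    # Remaining condition applied as an ordinary filter: end AFTER dt0.
    return [r for r in candidates if r.end > dt0]


records = [
    Record(day(5), day(10)),  # the "time 5 to time 10" example
    Record(day(0), day(3)),   # ends before the query window
    Record(day(6), day(6) + timedelta(hours=12)),
    Record(day(8), day(9)),   # starts after the query window
    Record(day(2), day(6)),   # ends exactly at dt0, so no overlap
]

dt0, dt1 = day(6), day(7)    # query range 6-7
delta_t = timedelta(days=5)  # assumed >= longest duration in records

result = padded_query(records, dt0, dt1, delta_t)
expected = [r for r in records if overlaps(r, dt0, dt1)]
```

With deltaT no shorter than the longest duration, any record that starts at
or before dt0-deltaT must also end at or before dt0, so the bounded query
drops nothing that the unbounded one would return.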


On Fri, 2014-06-27 at 14:55 +0000, Beau Lalonde wrote:
> Chris,
> 
> Thanks for the information and insight.
> 
> I think the secondary indexing will be helpful in the future for my situation or similar situations.  In the meantime, we have come up with a workaround for our use case.  We are going to subtract a certain deltaT from our dt0 prior to our query.  In our situation, we are able to figure out an appropriate deltaT to use in order to guarantee that we will get a result when we should get a result.  In pseudo code, we are planning on doing the following:
> 
>     geom INTERSECTS polygon
>     AND start AFTER (dt0-deltaT)
>     AND start BEFORE dt1
> 
> Which brings me to a question.  Is the above query performed substantially differently (performance-wise) from the following:
> 
>     geom INTERSECTS polygon
>     AND start DURING (dt0-deltaT)/dt1
> 
> In other words, if I break the time portion of the query into parts (e.g. using a BEFORE and an AFTER command), are those parts performed separately and then the results aggregated?  If so, it seems like I should use a DURING command.  Any additional insight would be appreciated.
> 
> 
> Thanks for your help,
> Beau
> 
> 
> -----Original Message-----
> From: geomesa-users-bounces@xxxxxxxxxxxxxxxx [mailto:geomesa-users-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Chris Eichelberger
> Sent: Thursday, June 26, 2014 2:49 PM
> To: Geomesa User discussions
> Subject: Re: [geomesa-users] Any suggestions for indexing data with a duration?
> 
> Beau,
> 
> You are, of course, correct as to the query structure.  Thanks for that.
> 
> As to the indexing, the query's effectiveness will depend quite a bit on how dense your data are in the various dimensions.  If your query polygon is huge, or if you have billions of records whose start-date precedes "dt1", then the query may not perform as well as you'd like.
> On the other hand, GeoMesa distributes data uniformly across all of the tablet-servers in your cluster (in part) to distribute the load of applying CQL filters to the subset of records that meet the (potentially coarse) geographic and temporal filters as applied by the index.
> 
> Another capability that we are prototyping internally (but that won't make it into the upcoming 1.0, sadly) is secondary indexing, which is specifically designed to facilitate the type of query you describe.
> 
> I hope this helps; if not, please just let us know.
> 
> Thanks!
> 
> Sincerely,
>  -- Chris
> 
> 
> On Thu, 2014-06-26 at 18:20 +0000, Beau Lalonde wrote:
> > Chris,
> > 
> > As always, thanks for the quick reply.
> > 
> > I have not tested, but I don't think the logic you supplied will work if the query time, dt0, is after the indexed start-time.
> > 
> > What I really want is (simplified for presentation, assuming start is the special time-indexed attribute):
> > 
> >     geom INTERSECTS polygon
> >     AND start BEFORE dt1
> >     AND end AFTER dt0
> > 
> > But my concern is that the "start BEFORE dt1" portion, the temporal portion that takes advantage of the GeoMesa indexing, is unbounded on one side and thus may take a long time to query.
> > 
> > Thanks,
> > 
> > Beau
> > 
> > 
> > -----Original Message-----
> > From: geomesa-users-bounces@xxxxxxxxxxxxxxxx [mailto:geomesa-users-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Chris Eichelberger
> > Sent: Thursday, June 26, 2014 1:04 PM
> > To: Geomesa User discussions
> > Subject: Re: [geomesa-users] Any suggestions for indexing data with a duration?
> > 
> > Beau,
> > 
> > GeoMesa should be able to get you most of the way through this use case.
> > By indexing based on location and one end-point of your time range, the remainder of the range query can be handled via CQL.  That is, if I have a feature that contains these fields "geom:Geometry,start:Date,end:Date", then a query of the form (simplified for presentation):
> > 
> >   geom INTERSECTS polygon
> >   AND start DURING dt0/dt1
> >   AND end BEFORE dt0/dt1
> > 
> > would both take advantage of the index on the (location, start-time) pair, and would return all of the records whose time-span intersects the dt0/dt1 interval.
> > 
> > If you find that's not working for you, please just let us know.
> > 
> > Thanks!
> > 
> > Sincerely,
> >   -- Chris
> > 
> > 
> > P.S.  We removed explicit support for the end-time attribute relatively recently, but only because that field was never actually used in the index.  Whether we add this field explicitly to the geo-time index in the future probably depends on how our work on secondary indexes gels.
> > 
> > 
> > On Thu, 2014-06-26 at 16:21 +0000, Beau Lalonde wrote:
> > > All,
> > > 
> > > As I am using the latest GeoMesa, I am running into some issues that hopefully others have already thought about.
> > > 
> > > Namely, I have data that inherently has a time duration (e.g. a start and end time), and I want to index that data using GeoMesa.  In an older version of GeoMesa I could index the data using both a start and end time, but the current version of GeoMesa only indexes data using a single time parameter.  Since time ranges do not seem to be supported by current GeoMesa, does anyone have a suggested approach?
> > > 
> > > Here is an abstract example for my problem:
> > > - I index data that inherently lasts from time 5 to time 10
> > > - I want to be able to perform a query that will return results if the query time range at all intersects/overlaps with the indexed data
> > > -- For example, I want to perform a query using the time range 6-7 and still get a result
> > > 
> > > My only thoughts are that since I can no longer index a time range, I must discretize my data and index each discretized portion - each with its own indexed time.  This may work in a practical sense, but will always succumb to the above abstract problem where a query that should return results does not return results because the indexed data is discretized.
> > > 
> > > Does anyone have any thoughts?
> > > 
> > > Is GeoMesa going to bring back support for indexing data that has a duration?
> > > 
> > > Thanks,
> > > 
> > > Beau
> > > 
> > > _______________________________________________
> > > geomesa-users mailing list
> > > geomesa-users@xxxxxxxxxxxxxxxx
> > > http://www.locationtech.org/mailman/listinfo/geomesa-users
> > 
> 


