| Joel, 
 This is a great question.  To reframe the question, it sounds like
    you'd like to be able to sort a query by a column (ascending and
    descending) and page through the results.
 
 In full generality, this is a tall order for a database layer living
    on top of a distributed key-value store.  GeoMesa uses sharding for
    our spatial index to distribute data evenly across the cloud.  To be
    as efficient as possible, queries use multiple threads to read from
    several tablet servers at a time.  This means that two successive
    runs of the same query will very likely return results in different
    orders, so a page offset from one run does not line up with the
    next (hence paging is hard).
 
 I think you are on the right track with caching/storing queries to
    serve up.  Assuming that users are going to interact with the same
    query for a few minutes, could you possibly cache the queries in
    memory with a timeout of a minute or two?  A load request would hit
    GeoMesa, but the subsequent sort and page requests could work
    against the data in memory.  If the user leaves and comes back,
    their query may have to be re-requested.
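 
 As a rough illustration of that flow, here is a minimal Java sketch of
    a per-query cache with a timeout.  The class QueryResultCache, its
    method names, and the plain List of rows are all hypothetical; this
    is not GeoMesa's caching code.  The current time is passed in as a
    parameter only to keep the example easy to test.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory cache of full query results with a time-to-live. */
final class QueryResultCache<T> {
    private static final class Entry<T> {
        final List<T> rows;
        final long expiresAtMillis;

        Entry(List<T> rows, long expiresAtMillis) {
            this.rows = rows;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry<T>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    QueryResultCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Store the complete result of the initial load request. */
    void put(String queryId, List<T> rows, long nowMillis) {
        entries.put(queryId, new Entry<>(new ArrayList<>(rows), nowMillis + ttlMillis));
    }

    /**
     * Return one sorted page from memory, or null when the entry is missing
     * or expired, in which case the caller re-runs the original query.
     */
    List<T> page(String queryId, Comparator<? super T> order,
                 int offset, int limit, long nowMillis) {
        Entry<T> e = entries.get(queryId);
        if (e == null || nowMillis >= e.expiresAtMillis) {
            entries.remove(queryId);
            return null;
        }
        // Sort a copy so concurrent requests with other orderings are unaffected.
        List<T> sorted = new ArrayList<>(e.rows);
        sorted.sort(order);
        int from = Math.min(Math.max(offset, 0), sorted.size());
        int to = Math.min(from + Math.max(limit, 0), sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }
}
```

 With a two-minute timeout one would construct the cache with 120000
    milliseconds, call put after the initial GeoMesa query, and serve
    each sort/page request from page, re-running the query whenever it
    returns null.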
 
 For GeoMesa, we have worked a little bit with caching in the
    GeoTools layer, but we haven't ironed out all the issues.  To give
    it a spin, add 'caching -> true' in the DataStore params.  As I
    experimented with caching just now, I noticed that we don't look at
    the sorting part of the query.  This should be an incredibly easy
    fix.*  If in-memory caching is a suitable solution, I can help add a
    few lines to get sorting to work with caching.  Other than that, it
    might be good to think through what cache settings we could expose
    to the user to make caching viable.
 
 The obvious downside is that if there are too many users relative to
    available memory, this plan will fail.  As a more complex
    possibility, one could imagine writing a user's query results to a
    'temporary' Accumulo table.**  Records in this table could be indexed
    by session id / user / query id.  During the first write, one would
    be able to pick a column and sort order.  From there, paging might
    make sense.  Reversing the sort order or sorting on another column
    would require sorting in memory or creating another temporary copy
    of the data.***
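 
 To make the key layout concrete, here is a small Java sketch that uses
    a TreeMap as a stand-in for a sorted key-value table.  Everything in
    it is an assumption for illustration: the class TempResultTable, the
    key layout (session id, query id, sort key, feature id), and the
    NUL separator.  It uses no Accumulo API, and it compares keys as
    Java strings rather than as bytes, which matches only for ASCII.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a 'temporary' sorted table of query results. */
final class TempResultTable {
    private static final char SEP = '\u0000';
    private final TreeMap<String, String> rows = new TreeMap<>();

    private static String prefix(String sessionId, String queryId) {
        return sessionId + SEP + queryId + SEP;
    }

    /**
     * Write one record. sortKey must already be encoded so that its string
     * order matches the sort order chosen at write time.
     */
    void write(String sessionId, String queryId, String sortKey,
               String featureId, String value) {
        rows.put(prefix(sessionId, queryId) + sortKey + SEP + featureId, value);
    }

    /**
     * Read up to pageSize records for one query. Pass null as lastKey for the
     * first page, then the key of the last record returned to resume.
     */
    List<Map.Entry<String, String>> page(String sessionId, String queryId,
                                         String lastKey, int pageSize) {
        String p = prefix(sessionId, queryId);
        boolean first = (lastKey == null);
        List<Map.Entry<String, String>> out = new ArrayList<>();
        // Scan forward from the start of the prefix, or just past the last key seen.
        for (Map.Entry<String, String> e : rows.tailMap(first ? p : lastKey, first).entrySet()) {
            if (out.size() >= pageSize || !e.getKey().startsWith(p)) {
                break;
            }
            out.add(new AbstractMap.SimpleImmutableEntry<>(e));
        }
        return out;
    }
}
```

 Because each page resumes from the last key rather than from a numeric
    offset, the order stays stable between requests, which is the
    property the multi-threaded query path does not give us.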
 
 Thanks,
 
 Jim
 
 * The code for the Caching Feature Collection is here:
https://github.com/locationtech/geomesa/blob/accumulo1.5.x/1.x/geomesa-core/src/main/scala/org/locationtech/geomesa/core/data/AccumuloFeatureSource.scala#L111-154
 
 **  Rather than actually trying to figure out separate tables for
    each user and when it is safe to delete them, one could configure
    Accumulo's AgeOffFilter for the table.  Copies of queries would be
    deleted after a configurable time.
 
 *** Now that I'm thinking of it, assuming that query results are
    small-ish (5k records), if there are only a few columns (say under
    10), one could write entries which would be sorted (forwards and
    backwards) on each column to the temporary table.  It would require
    a bit of custom Accumulo work, but it'd be relatively straightforward.
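 
 One way to picture the forwards-and-backwards entries is the Java
    sketch below.  It is only an illustration under assumptions that are
    mine, not part of GeoMesa or Accumulo: the class SortedIndexKeys,
    the column/direction/value/feature-id key layout, the "/" separator
    (so column names must not contain it), and non-negative long column
    values.  Strings and other types would need their own
    order-preserving encodings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical key encoding: one entry per column and sort direction. */
final class SortedIndexKeys {
    private SortedIndexKeys() { }

    /** Zero-padded so that lexicographic order equals numeric order. */
    static String ascending(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch assumes non-negative values");
        }
        return String.format(Locale.ROOT, "%019d", value);
    }

    /** Complemented so that larger values sort first in a forward scan. */
    static String descending(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("sketch assumes non-negative values");
        }
        return String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - value);
    }

    /** Build the keys to write for one record: two per column. */
    static List<String> keysFor(String featureId, String[] columns, long[] values) {
        if (columns.length != values.length) {
            throw new IllegalArgumentException("columns and values must align");
        }
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            keys.add(columns[i] + "/asc/" + ascending(values[i]) + "/" + featureId);
            keys.add(columns[i] + "/desc/" + descending(values[i]) + "/" + featureId);
        }
        return keys;
    }
}
```

 With 5k records and 10 columns this comes to 100k small entries per
    query, and any column/direction pair can then be paged with a plain
    forward scan over its key prefix.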
 
 
 On 05/01/2015 04:42 PM, Joel Folkerts wrote:
 
      Good afternoon. I am working on a project that is serving Geomesa
      results to users through a web interface by means of a REST
      API. Currently, the users construct a geospatial query, the
      API in turn sends this query to Geomesa, which then returns
      all of the records back through the API to the user. We run
      into problems when the returning dataset is over 5,000 records
      (which it normally is) and we end up crashing the user's
      browser.
 
      What we're trying to avoid is writing Geomesa search results to
      HDFS and then layering Impala on top of it. While this would
      solve the problem, we risk wasting a tremendous amount of HDFS
      space.
 
      Our ultimate goal is to connect a DataTables UI to
      Accumulo/Geomesa and be able to retrieve only the data that we
      want, i.e. 10 records out of 100,000 records.
 
      Any ideas, design patterns, or code samples would be very much
      appreciated. Thank you in advance!
 
      -Joel
 
 |