 Hi Joel and others,
 As a follow-up to this, I wanted to share that we're working on a fix
    for a related sorting issue.  I'm hopeful that we'll have something
    turned around in a few days.
 
 Thanks for mentioning the DataTables API; it looks like a fun way to
    show a reasonably sized dataset.
 
 Jim
 
 
 On 05/03/2015 03:46 PM, Jim Hughes wrote:
 
      
      Joel,
 This is a great question.  To reframe the question, it sounds like
      you'd like to be able to sort a query by a column (ascending and
      descending) and page through the results.
 
 In full generality, this is a tall order for a database layer
      living on top of a distributed key-value store.  GeoMesa uses
      sharding for our spatial index to distribute data evenly across
      the cloud.  To be as efficient as possible, queries use multiple
      threads to read from several tablet servers at a time.  This means
      that two subsequent queries will very likely get back results in
      different orders (hence paging is hard).
 
 I think you are on the right track with caching/storing queries to
      serve up.  Assuming that users are going to interact with the same
      query for a few minutes, could you possibly cache the queries in
      memory with a timeout of a minute or two?  A load request would
      hit GeoMesa, but the subsequent sort and page requests could work
      against the data in memory.  If the user leaves and comes back,
      their query may have to be re-requested.
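 To make the caching idea concrete, here is a minimal sketch in Python. It is not GeoMesa code: the QueryCache class, its method names, and the run_query callback (which stands in for whatever call actually hits GeoMesa) are all hypothetical, and the timeout is just the "minute or two" suggested above.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


class QueryCache:
    """Hypothetical in-memory cache: one load per query, then sort/page locally."""

    def __init__(self, ttl_seconds: float = 120.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # query_id -> (time loaded, rows)
        self._entries: Dict[str, Tuple[float, List[dict]]] = {}

    def load(self, query_id: str, run_query: Callable[[], Iterable[dict]]) -> List[dict]:
        """Return cached rows, re-running the query only if missing or expired."""
        now = self.clock()
        # Drop every expired entry so abandoned queries do not pin memory.
        expired = [q for q, (t, _) in self._entries.items() if now - t > self.ttl]
        for q in expired:
            del self._entries[q]
        if query_id not in self._entries:
            # run_query is an assumption: it stands in for the GeoMesa request.
            self._entries[query_id] = (now, list(run_query()))
        return self._entries[query_id][1]

    def page(self, query_id: str, run_query: Callable[[], Iterable[dict]],
             sort_by: str, descending: bool = False,
             start: int = 0, length: int = 10) -> List[dict]:
        """Sort the cached rows on one column and return a single page."""
        rows = self.load(query_id, run_query)
        ordered = sorted(rows, key=lambda r: r[sort_by], reverse=descending)
        return ordered[start:start + length]
```

 Because the sort happens on the cached copy, the order is stable from one page request to the next, which is exactly what repeated scans against the sharded index cannot promise.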
 
 For GeoMesa, we have worked a little bit with caching in the
      GeoTools layer, but we haven't ironed out all the issues.  To give
      it a spin, add 'caching -> true' in the DataStore params.  As I
      experimented with caching just now, I noticed that we don't look
      at the sorting part of the query.  This should be an incredibly
      easy fix.*  If in-memory caching is a suitable solution, I can
      help add a few lines to get sorting to work with caching.  Other
      than that, it might be good to think through what cache settings
      we could expose to the user to make caching viable.
 
 The obvious downside is that if there are too many users relative
      to available memory, this plan will fail.  As a more complex
      possibility, one could imagine writing a user's query results to a
      'temporary' Accumulo table**.  Records in this table could be
      indexed by session id / user / query id.  During the first write,
      one would be able to pick a column and sort order.  From there,
      paging might make sense.  Reversing the sort order or sorting on
      another column would require sorting in memory or creating another
      temporary copy of the data.***
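 A rough sketch of the temporary-table idea, again in Python. The SortedTable class is only a toy stand-in for a table that keeps keys in lexicographic order (as Accumulo does); it is not the Accumulo API. The row-key layout, the "|" separator, and the zero-padding of non-negative integer sort values are assumptions made for illustration.

```python
import bisect
from typing import List, Optional, Tuple


class SortedTable:
    """Toy stand-in for a key-value table whose keys are kept in sorted order."""

    def __init__(self):
        self._keys: List[str] = []
        self._values = {}

    def put(self, key: str, value: dict) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, prefix: str, after: Optional[str] = None,
             limit: int = 10) -> List[Tuple[str, dict]]:
        """Return up to `limit` entries under `prefix`, resuming after key `after`."""
        if after is None:
            i = bisect.bisect_left(self._keys, prefix)
        else:
            i = bisect.bisect_right(self._keys, after)
        out = []
        while (i < len(self._keys) and len(out) < limit
               and self._keys[i].startswith(prefix)):
            out.append((self._keys[i], self._values[self._keys[i]]))
            i += 1
        return out


def row_key(session_id: str, query_id: str, sort_value: int, feature_id: str) -> str:
    """Hypothetical key: session / query / zero-padded sort value / feature id.

    Zero-padding makes lexicographic order match numeric order for
    non-negative integers; other types would need their own encoding.
    """
    return f"{session_id}|{query_id}|{sort_value:012d}|{feature_id}"
```

 Paging then amounts to remembering the last key of the previous page and passing it as `after`; each page is a short range scan rather than a re-sort of the full result.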
 
 Thanks,
 
 Jim
 
 * The code for the Caching Feature Collection is here:
      https://github.com/locationtech/geomesa/blob/accumulo1.5.x/1.x/geomesa-core/src/main/scala/org/locationtech/geomesa/core/data/AccumuloFeatureSource.scala#L111-154
 
 **  Rather than actually trying to figure out separate tables for
      each user and when it is safe to delete them, one could configure
      Accumulo's AgeOffFilter for the table.  Copies of queries would be
      deleted after a configurable time.
 
 *** Now that I'm thinking of it, assuming that query results are
      small-ish (5k records), if there are only a few columns (say under
      10), one could write entries sorted (forwards and backwards) on
      each column to the temporary table.  It would require a tad of
      custom Accumulo work, but it'd be relatively straightforward.
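 As an illustration of the "forwards and backwards on each column" idea, the sketch below writes one key per record, column, and direction. Everything here is hypothetical: the key layout, the "asc"/"desc" labels, and the assumed upper bound MAX_VALUE used to invert non-negative integers so that plain lexicographic order yields a descending sort. With the sizes mentioned above (5k records, 10 columns, 2 directions) this comes to about 100k entries per query.

```python
from typing import Dict, Iterable, Iterator, List

MAX_VALUE = 10**12 - 1  # assumed upper bound on any sortable value


def encode(value: int, descending: bool) -> str:
    """Zero-pad the value; for descending order store MAX_VALUE - value instead."""
    v = MAX_VALUE - value if descending else value
    return f"{v:012d}"


def index_entries(query_id: str, record: Dict, columns: Iterable[str]) -> Iterator[str]:
    """Yield one key per (column, direction) for a single record."""
    for col in columns:
        for direction in ("asc", "desc"):
            encoded = encode(record[col], direction == "desc")
            yield f"{query_id}|{col}|{direction}|{encoded}|{record['id']}"


def build_table(query_id: str, records: List[Dict], columns: List[str]) -> List[str]:
    """Stand-in for the temporary table: just the keys, in sorted order."""
    return sorted(k for r in records for k in index_entries(query_id, r, columns))


def page(table: List[str], query_id: str, column: str, direction: str,
         start: int, length: int) -> List[str]:
    """Read one page of feature ids for the chosen column and direction."""
    prefix = f"{query_id}|{column}|{direction}|"
    ids = [k.rsplit("|", 1)[1] for k in table if k.startswith(prefix)]
    return ids[start:start + length]
```

 Switching column or direction is then only a change of key prefix; nothing has to be re-sorted or copied after the first write.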
 
 
 On 05/01/2015 04:42 PM, Joel Folkerts wrote:
 
 Good afternoon. I am working on a project that serves GeoMesa
      results to users through a web interface by means of a REST
      API. Currently, the users construct a geospatial query, the
      API in turn sends this query to GeoMesa, which then returns
      all of the records back through the API to the user. We run
      into problems when the returned dataset is over 5,000
      records (which it normally is) and we end up crashing the
      user's browser.
          
 
 What we're trying to avoid is writing GeoMesa search results to
      HDFS and then layering Impala on top of it. While this would
      solve the problem, we risk wasting a tremendous amount of HDFS
      space.

 Our ultimate goal is to connect a DataTables UI to
      Accumulo/GeoMesa and be able to retrieve only the data that we
      want, e.g. 10 records out of 100,000 records.

 Any ideas, design patterns, or code samples would be very much
      appreciated. Thank you in advance!
 -Joel 
 
 _______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users 
 