Hello,
 Is this a one-off ingest, or continuous streaming data?
 
BigTable is fairly opaque, in that it hides the database configuration from you, so optimizations are limited. There is no way to e.g. write database files directly, so whatever ingest mechanism you use will end up going through the same client writers. The bottleneck will likely be your BigTable instance - any client-side bottlenecks can be overcome by parallelizing your ingestion clients. Client connections are configured through the hbase-site.xml file - I haven't played around with it too much, but there might be some optimizations possible there. One issue you might run into is BigTable node parallelism - GeoMesa creates some initial split points in the table structure, but my understanding is that BigTable will eventually collapse those back down if your data isn't large enough (in the terabytes). Thus, you might end up utilizing only a single node for writing.
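To sketch what parallelizing clients might look like - this is just a plain thread pool, where ingestFile is a hypothetical stand-in for whatever opens a GeoMesa feature writer and appends features from one file:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIngest {

    // Hypothetical per-file ingest - in a real client this would parse the
    // file and append SimpleFeatures through a GeoMesa FeatureWriter.
    static void ingestFile(String path, AtomicInteger counter) {
        counter.incrementAndGet();
    }

    // Spread the input files across a fixed pool of ingest threads and
    // return the number of files processed.
    static int ingestAll(List<String> files, int threads) throws InterruptedException {
        AtomicInteger ingested = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String f : files) {
            pool.submit(() -> ingestFile(f, ingested));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return ingested.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int n = ingestAll(List.of("a.csv", "b.csv", "c.csv", "d.csv"), 4);
        System.out.println("ingested " + n + " files");
    }
}
```

The same pattern scales out across machines - run several such clients against the same catalog, since the writers are independent.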
 
In general, you want your clients 'close' to your back end - in this case, that means running your ingestion in GCE. To get started, you can pretty easily use the GeoMesa command-line tools for a local ingestion of flat files (you will have to define a GeoMesa converter that maps your data into SimpleFeatures). You can specify multiple local threads, up to the number of files you are processing. If you find that you need more ingest throughput, you can use the same converter to run a distributed map/reduce ingest. For BigTable, there may be some classpath issues to sort out with the GeoMesa map/reduce ingest - in particular, getting your hbase-site.xml onto the distributed classpath. If you go this route and hit any issues, let us know.
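As a rough sketch of what such a converter definition might look like - the type name, column layout and attribute names here are made up for illustration - a delimited-text converter for a CSV of id, date, longitude, latitude could be:

```
geomesa.sfts.example = {
  attributes = [
    { name = "dtg",  type = "Date" }
    { name = "geom", type = "Point", srid = 4326, default = true }
  ]
}
geomesa.converters.example = {
  type     = "delimited-text"
  format   = "CSV"
  id-field = "$1"
  fields = [
    { name = "dtg",  transform = "date('yyyy-MM-dd', $2)" }
    { name = "geom", transform = "point($3::double, $4::double)" }
  ]
}
```

You would then point the command-line ingest at this file; the exact command and flags depend on your tools distribution, so check the ingest command's --help for the options to select the type, converter, and number of local threads.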
 
We don't currently have any tools for ingesting directly from another database - you could pretty easily write something custom, or just export to files and ingest those.
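A minimal sketch of the export-to-files route, assuming a JDBC source - the query, table and column names are placeholders for your own schema:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.List;

public class ExportToCsv {

    // Write already-fetched rows out as CSV lines.
    static void writeCsv(List<String[]> rows, Path out) throws IOException {
        try (PrintWriter w = new PrintWriter(Files.newBufferedWriter(out))) {
            for (String[] row : rows) {
                w.println(String.join(",", row));
            }
        }
    }

    // Stream a hypothetical source table to a CSV file that the GeoMesa
    // command-line ingest (with a matching converter) can then load.
    static void export(Connection conn, Path out) throws Exception {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, dtg, lon, lat FROM observations");
             PrintWriter w = new PrintWriter(Files.newBufferedWriter(out))) {
            while (rs.next()) {
                w.println(String.join(",", rs.getString("id"), rs.getString("dtg"),
                        rs.getString("lon"), rs.getString("lat")));
            }
        }
    }
}
```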
 
One minor GeoTools optimization is to use the PROVIDED_FID hint if you already have unique IDs; if not, GeoMesa will generate UUIDs for each feature. (The converter framework I mentioned earlier supports this by default.)
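For reference, setting that hint on a feature looks roughly like this - the schema and attribute names are made up, and the Hints package location may differ across GeoTools versions:

```java
import org.geotools.data.DataUtilities;
import org.geotools.factory.Hints;
import org.geotools.feature.simple.SimpleFeatureBuilder;
import org.opengis.feature.simple.SimpleFeature;
import org.opengis.feature.simple.SimpleFeatureType;

public class ProvidedFidExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema - substitute your own attributes
        SimpleFeatureType sft =
            DataUtilities.createType("example", "name:String,*geom:Point:srid=4326");
        SimpleFeatureBuilder builder = new SimpleFeatureBuilder(sft);
        builder.set("name", "first");
        // Build the feature with your own unique ID...
        SimpleFeature feature = builder.buildFeature("my-unique-id");
        // ...and tell GeoTools to keep it rather than generating one
        feature.getUserData().put(Hints.USE_PROVIDED_FID, Boolean.TRUE);
    }
}
```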
 
 Thanks,
 
 Emilio
 
 
On 01/23/2017 09:39 AM, Damiano Albani wrote:

> Hello,
>
> Would someone have any particular advice to provide in the context of ingesting a lot of data into GeoMesa? The target backend is HBase in my use case -- on Google BigTable to be more precise. And the source data is stored in flat files and/or databases.
>
> How should I architect the loading workflow in order to get the best performance and loading time? I was thinking in terms of parallel jobs, Java tweaking or even GeoTools settings. So if you have some experience with filling a GeoMesa instance on HBase, I'd be glad to hear it.
>
> Thanks!
 _______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.locationtech.org/mailman/listinfo/geomesa-users 