Hi Damiano,
 No problem, more replies inline.
 
 Thanks,
 
 Emilio
 
 
On 01/23/2017 02:42 PM, Damiano Albani wrote:
    Because we're an Eclipse project, anything we host has to be blessed
    by Eclipse for provenance and license. As we haven't gotten this
    sign-off on all the BigTable dependencies yet, we unfortunately
    can't bundle it - you can still use it as a plugin, but you have to
    build it yourself. Hopefully we will be able to get it approved
    soon.

    Hello Emilio,

    First, thanks for taking the time to write such a detailed answer!
 
    Since you're doing a one-time bulk ingest, map/reduce could be a
    good fit. Depending on your inputs, our tools should make it fairly
    easy (with the classpath caveat I mentioned). If you have a cluster
    to run on and your inputs are flat files, map/reduce will handle all
    the multi-threading and load-balancing for you.
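
Just to give you an idea of the shape such a job could take (to be clear,
this is not our actual job code - the connection parameters, type name and
CSV layout are all made up), a mapper writing through the standard GeoTools
API might look something like:

  import com.vividsolutions.jts.io.WKTReader
  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapreduce.Mapper
  import org.geotools.data.{DataStoreFinder, FeatureWriter, Transaction}
  import org.opengis.feature.simple.{SimpleFeature, SimpleFeatureType}

  import scala.collection.JavaConverters._

  // output key/value types are unused - we write to the data store directly
  class IngestMapper extends Mapper[LongWritable, Text, Text, Text] {

    type Ctx = Mapper[LongWritable, Text, Text, Text]#Context

    private var writer: FeatureWriter[SimpleFeatureType, SimpleFeature] = _
    private val wkt = new WKTReader() // one reader per mapper instance

    override def setup(context: Ctx): Unit = {
      // hypothetical connection parameters - in a real job these would
      // come from the job configuration
      val params = Map[String, AnyRef]("bigtable.table.name" -> "mycatalog")
      val ds = DataStoreFinder.getDataStore(params.asJava)
      writer = ds.getFeatureWriterAppend("mytype", Transaction.AUTO_COMMIT)
    }

    override def map(key: LongWritable, value: Text, context: Ctx): Unit = {
      // assumes one record per line: name,wkt
      val fields = value.toString.split(',')
      val feature = writer.next()
      feature.setAttribute("name", fields(0))
      feature.setAttribute("geom", wkt.read(fields(1)))
      writer.write()
    }

    override def cleanup(context: Ctx): Unit = writer.close()
  }
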
    GeoTools does have a lot of different ways to accomplish the same
    thing. The main underlying abstraction for writing is the
    FeatureWriter (either Append or Modify) - if you look at the
    addFeatures method, we just use a feature writer:

https://github.com/locationtech/geomesa/blob/master/geomesa-index-api/src/main/scala/org/locationtech/geomesa/index/geotools/GeoMesaFeatureStore.scala#L31-L53
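
For reference, a minimal sketch of using that writer directly (the
connection parameter, type name and attributes here are hypothetical, and
it assumes the schema has already been created):

  import com.vividsolutions.jts.geom.Coordinate
  import org.geotools.data.{DataStoreFinder, Transaction}
  import org.geotools.geometry.jts.JTSFactoryFinder

  import scala.collection.JavaConverters._

  // hypothetical connection parameters for a GeoMesa HBase/BigTable store
  val ds = DataStoreFinder.getDataStore(
    Map[String, AnyRef]("bigtable.table.name" -> "mycatalog").asJava)

  // the same append writer that addFeatures uses under the covers
  val writer = ds.getFeatureWriterAppend("mytype", Transaction.AUTO_COMMIT)
  try {
    val feature = writer.next() // creates a new, empty feature to populate
    feature.setAttribute("name", "example")
    feature.setDefaultGeometry(
      JTSFactoryFinder.getGeometryFactory.createPoint(new Coordinate(2.35, 48.86)))
    writer.write() // commits the feature to the (buffered) writer
  } finally {
    writer.close() // flushes anything still buffered
  }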
 
Buffering is implementation dependent - for GeoMesa HBase/BigTable,
we use an underlying org.apache.hadoop.hbase.client.BufferedMutator.
You can control the batch size through the system property
'geomesa.hbase.write.batch'. If you want finer control, you can also
cast a FeatureWriter to
org.locationtech.geomesa.hbase.data.HBaseAppendFeatureWriter, which
includes a 'flush' method (you can get a feature writer through
datastore.getFeatureWriterAppend).
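
Continuing the sketch above, those two knobs would look something like
this (10,000 is just an arbitrary example value):

  import org.locationtech.geomesa.hbase.data.HBaseAppendFeatureWriter

  // 1. batch size - set before the writer is created so the underlying
  // BufferedMutator picks it up
  System.setProperty("geomesa.hbase.write.batch", "10000")

  // 2. explicit flushing - only valid for the HBase/BigTable data store
  writer match {
    case hbase: HBaseAppendFeatureWriter => hbase.flush()
    case _                               => // other backends: rely on close()
  }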
 
As I mentioned before, all methods of writing through GeoMesa will
end up funneling through that feature writer class, so this applies
across the board.
 
As for the input side, we use a combination of GeoTools data stores
and custom code. Our converter framework is designed to convert flat
files into simple features in a streaming fashion, and I can attest
that it handles memory well. The other GeoTools data stores may work
differently (loading the entire file at once, etc.) - I'm not
entirely sure there.
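
To give you a feel for it (note: the exact class and method names have
shifted between releases, so treat these entry points as approximate and
check the converter docs for your version), a converter for a hypothetical
'name,date,lon,lat' CSV might look like:

  import com.typesafe.config.ConfigFactory
  import org.locationtech.geomesa.convert.SimpleFeatureConverters
  import org.locationtech.geomesa.utils.geotools.SimpleFeatureTypes

  import scala.io.Source

  val sft = SimpleFeatureTypes.createType("mytype",
    "name:String,dtg:Date,*geom:Point:srid=4326")

  val conf = ConfigFactory.parseString(
    """
      |{
      |  type     = "delimited-text"
      |  format   = "CSV"
      |  id-field = "md5(string2bytes($0))"
      |  fields = [
      |    { name = "name", transform = "$1" }
      |    { name = "dtg",  transform = "date('yyyy-MM-dd', $2)" }
      |    { name = "geom", transform = "point($3::double, $4::double)" }
      |  ]
      |}
    """.stripMargin)

  val converter = SimpleFeatureConverters.build[String](sft, conf)
  // features are created lazily as the iterator is pulled - this is
  // what keeps the memory footprint flat
  val features = converter.processInput(Source.fromFile("input.csv").getLines())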
 
    That may be reasonable depending on your input data (using a
    GeoTools query implies that you already have your data in a GeoTools
    data store). Feature writers and readers are all single-threaded,
    though, so you would want to load 5 different feature collections by
    splitting your data on some queryable attribute (e.g. by month if
    your data has timestamps).
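
For example, a rough sketch of that pattern with five threads, re-using
the data store from the earlier sketch (the 'dtg' attribute, type name and
date ranges are made up):

  import java.util.concurrent.Executors

  import org.geotools.data.{Query, Transaction}
  import org.geotools.filter.text.ecql.ECQL

  import scala.concurrent.duration.Duration
  import scala.concurrent.{Await, ExecutionContext, Future}

  implicit val executor =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(5))

  // one time range per month - extend as needed
  val months = Seq(
    ("2016-01-01T00:00:00Z", "2016-02-01T00:00:00Z"),
    ("2016-02-01T00:00:00Z", "2016-03-01T00:00:00Z"),
    ("2016-03-01T00:00:00Z", "2016-04-01T00:00:00Z"))

  val counts = months.map { case (start, end) =>
    Future {
      // each thread gets its own reader, since readers are single-threaded
      val query = new Query("mytype", ECQL.toFilter(s"dtg DURING $start/$end"))
      val reader = ds.getFeatureReader(query, Transaction.AUTO_COMMIT)
      try {
        var count = 0
        while (reader.hasNext) { reader.next(); count += 1 }
        count
      } finally {
        reader.close()
      }
    }
  }

  println(Await.result(Future.sequence(counts), Duration.Inf).sum + " features read")
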
 _______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.locationtech.org/mailman/listinfo/geomesa-users 