| Hi Damiano, 
 GDELT does have a fair number of invalid records, so that is normal
    (although the discarded records might be slowing down the ingest).
    About the threading, one caveat is that each input file
    will only be processed by a single thread - so if you specify more
    threads than files, the excess will not be used. You might also
    want to play around with JAVA_OPTS (increase memory, etc.).
    Performance might also increase as you get more data into the
    system - as I mentioned before, you will sometimes end up utilizing
    only a single node of your BigTable cluster.
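
 As a rough illustration of the threading caveat (a sketch only, not GeoMesa code; the function names are made up for this example), the number of threads that can actually do work is capped by the number of input files:

```python
def effective_threads(requested_threads: int, input_files: int) -> int:
    """Threads that can do work when each input file is handled by exactly one thread."""
    if requested_threads < 1:
        raise ValueError("requested_threads must be at least 1")
    if input_files < 0:
        raise ValueError("input_files cannot be negative")
    # A thread without a file of its own has nothing to process.
    return min(requested_threads, input_files)


def idle_threads(requested_threads: int, input_files: int) -> int:
    """Threads that were requested but will sit unused."""
    return requested_threads - effective_threads(requested_threads, input_files)
```

 So if the thread count is higher than the file count, splitting the input into more files is the only way to put the extra threads to work.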
 
 I can't recall now the exact throughputs we've seen, but in general
    those numbers seem reasonable for a first pass...
 
 So you were able to run a map/reduce ingest, but it performed
    horribly? In order to compare directly to the local ingest, you can
    try using the same command line tools you've been using, but put the
    files into HDFS - this will cause the tools to launch a map/reduce
    job (that tutorial is more of a proof-of-concept). You will need to
    have the appropriate HADOOP_HOME environment variable set, or manually
    copy the Hadoop configuration files onto the GeoMesa classpath. In
    addition, you will need to have your hbase-site.xml on the
    distributed Hadoop classpath - the easiest way to do this might be
    to copy it onto each node of your Hadoop cluster.
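
 A quick local sanity check for the environment setup might look like the sketch below. It is not part of GeoMesa: the helper names are hypothetical, and the `etc/hadoop` directory under HADOOP_HOME and the list of required file names are assumptions about a typical Hadoop 2.x layout, so adjust them to your installation.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional, Sequence


def hadoop_conf_dir(env: Mapping[str, str]) -> Optional[Path]:
    """Return the config directory implied by HADOOP_HOME, or None if it is unset."""
    home = env.get("HADOOP_HOME")
    if not home:
        return None
    # Assumption: Hadoop 2.x style layout, with config files under etc/hadoop.
    return Path(home) / "etc" / "hadoop"


def missing_config_files(conf_dir: Path, required: Sequence[str]) -> List[str]:
    """List the required file names that are not present in conf_dir."""
    return [name for name in required if not (conf_dir / name).is_file()]


if __name__ == "__main__":
    conf = hadoop_conf_dir(os.environ)
    if conf is None:
        print("HADOOP_HOME is not set")
    else:
        # Assumed file names; hbase-site.xml is only here if you copied it there.
        wanted = ["core-site.xml", "hdfs-site.xml", "hbase-site.xml"]
        print("missing from", conf, ":", missing_config_files(conf, wanted))
```

 Note that this only inspects the machine it runs on; it says nothing about whether hbase-site.xml is visible on the other nodes of the cluster.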
 
 The question is which part of the process is the bottleneck - if
    it's the GeoMesa ingest, then using map/reduce or more
    threads/processes will increase your throughput - but if you are
    maxing out your BigTable connection, then you will not see any
    increase (or possibly a decrease due to resource contention).
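
 One way to tell the two cases apart is to measure throughput at two different levels of parallelism and see how close the result is to linear scaling. The sketch below is only an illustration of that reasoning: the function names are hypothetical and the 0.5 cut-off is an arbitrary assumption, not a GeoMesa or BigTable recommendation.

```python
def scaling_efficiency(base_rate: float, base_workers: int,
                       new_rate: float, new_workers: int) -> float:
    """Fraction of ideal linear speedup achieved when going from base_workers to new_workers."""
    if base_rate <= 0 or new_rate <= 0:
        raise ValueError("rates must be positive")
    if base_workers < 1 or new_workers <= base_workers:
        raise ValueError("new_workers must be greater than base_workers (>= 1)")
    # Throughput we would see if every added worker contributed fully.
    ideal_rate = base_rate * new_workers / base_workers
    return new_rate / ideal_rate


def likely_bottleneck(base_rate: float, base_workers: int,
                      new_rate: float, new_workers: int,
                      threshold: float = 0.5) -> str:
    """Guess whether the ingest side or the backend connection is the limit."""
    efficiency = scaling_efficiency(base_rate, base_workers, new_rate, new_workers)
    # Near-linear scaling suggests the ingest was the limit; flat or falling
    # throughput suggests the backend connection is already saturated.
    return "ingest" if efficiency >= threshold else "backend"
```

 The rates here would be your own measurements (for example, records per second) taken from two runs over comparable input.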
 
 Thanks,
 
 Emilio
 
 
 On 01/26/2017 10:38 AM, Damiano Albani wrote:
 
      Hi Emilio (and everyone else),
 
 
 |