Hi Chris, 
     
    Here is an example Java project that ingests GDELT using Hadoop 2.2,
    Accumulo 1.5, and the tip of GeoMesa master.
    It took 30 minutes to ingest a 72 GB TSV file, an uncompressed
    concatenation of GDELT data through Feb 24, 2014.
     
    We plan to roll it into geomesa/geomesa-gdelt, but for now it is a
    separate project:
     
    https://github.com/ccri/geomesa-gdelt 
     
    Instructions:
    1) mvn install
    2) hadoop jar target/geomesa-gdelt-1.0-SNAPSHOT.jar \
         geomesa.gdelt.GDELTIngest \
         -instanceId [instanceId] -zookeepers [zookeepers] \
         -user [user] -password [password] -auths [auths] \
         -tableName [tableName] -featureName [featureName] \
         -ingestFile [ingestFile]
     
    The job copies its own jar to HDFS, and it requires that the
    ingestFile argument point to a GDELT-format TSV that is already on
    HDFS.
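
    For example, a full run might look like the following (every value
    here is a made-up placeholder; substitute your own cluster settings
    and paths):

    hadoop fs -mkdir -p /gdelt
    hadoop fs -put gdelt.tsv /gdelt/gdelt.tsv
    hadoop jar target/geomesa-gdelt-1.0-SNAPSHOT.jar \
      geomesa.gdelt.GDELTIngest \
      -instanceId myCloud -zookeepers zoo1:2181 \
      -user root -password secret -auths "" \
      -tableName gdelt -featureName gdelt \
      -ingestFile /gdelt/gdelt.tsv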
     
    I hope this is helpful; let me know if you have any questions. This
    branch is still under development, and we are also working on a
    complete tutorial to accompany it.
     
    thanks, 
    Hunter 
     
    On 04/11/2014 04:42 PM, Hunter Provyn wrote:

      Hi Chris, 
        
        We recommend the steps below for ingesting a non-shapefile CSV
        or TSV:

        1. In Java code, get a handle on a DataStore using
           DataStoreFinder.getDataStore() (see the sketch after this
           list)
        2. Create a SimpleFeatureType for GDELT using DataUtilities
        3. Call ds.createSchema(featureType)
        4. Run a MapReduce job that writes features with that schema
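
        Here is a minimal sketch of step 1. I'm assuming the GeoMesa
        Accumulo data store registers with GeoTools under the parameter
        keys below (they mirror the Accumulo connection settings), so
        double-check them against your GeoMesa version; the connection
        values are made-up placeholders:

        import java.io.Serializable;
        import java.util.HashMap;
        import java.util.Map;
        import org.geotools.data.DataStore;
        import org.geotools.data.DataStoreFinder;

        Map<String, Serializable> params =
            new HashMap<String, Serializable>();
        params.put("instanceId", "myCloud");    // placeholder values
        params.put("zookeepers", "zoo1:2181");
        params.put("user", "root");
        params.put("password", "secret");
        params.put("auths", "");
        params.put("tableName", "gdelt");
        // getDataStore returns null if no factory accepts the params
        DataStore dataStore = DataStoreFinder.getDataStore(params);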
         
        I'm working on an example project in Java that I will send you
        when complete.
          
        Below is an example of using DataUtilities to create a
        SimpleFeatureType for GDELT. You may need to double-check some
        of the types in the sftSpec String; I referred to the GDELT
        online documentation:
           
        String name = "gdelt";
        String sftSpec =
            "GLOBALEVENTID:Integer,SQLDATE:Date,MonthYear:Integer,Year:Integer,"
          + "FractionDate:Float,Actor1Code:String,Actor1Name:String,"
          + "Actor1CountryCode:String,Actor1KnownGroupCode:String,"
          + "Actor1EthnicCode:String,Actor1Religion1Code:String,"
          + "Actor1Religion2Code:String,Actor1Type1Code:String,"
          + "Actor1Type2Code:String,Actor1Type3Code:String,"
          + "Actor2Code:String,Actor2Name:String,Actor2CountryCode:String,"
          + "Actor2KnownGroupCode:String,Actor2EthnicCode:String,"
          + "Actor2Religion1Code:String,Actor2Religion2Code:String,"
          + "Actor2Type1Code:String,Actor2Type2Code:String,Actor2Type3Code:String,"
          + "IsRootEvent:Integer,EventCode:String,EventBaseCode:String,"
          + "EventRootCode:String,QuadClass:Integer,GoldsteinScale:Float,"
          + "NumMentions:Integer,NumSources:Integer,NumArticles:Integer,AvgTone:Float,"
          + "Actor1Geo_Type:Integer,Actor1Geo_FullName:String,"
          + "Actor1Geo_CountryCode:String,Actor1Geo_ADM1Code:String,"
          + "Actor1Geo_Lat:Float,Actor1Geo_Long:Float,Actor1Geo_FeatureID:Integer,"
          + "Actor2Geo_Type:Integer,Actor2Geo_FullName:String,"
          + "Actor2Geo_CountryCode:String,Actor2Geo_ADM1Code:String,"
          + "Actor2Geo_Lat:Float,Actor2Geo_Long:Float,Actor2Geo_FeatureID:Integer,"
          + "ActionGeo_Type:Integer,ActionGeo_FullName:String,"
          + "ActionGeo_CountryCode:String,ActionGeo_ADM1Code:String,"
          + "ActionGeo_Lat:Float,ActionGeo_Long:Float,ActionGeo_FeatureID:Integer,"
          + "DATEADDED:Integer,"
          // GeoMesa needs a default geometry attribute to index on; the
          // ingest job can build Points from the lat/lon columns.
          + "*geom:Point:srid=4326";

        SimpleFeatureType featureType =
            DataUtilities.createType(name, sftSpec);
        dataStore.createSchema(featureType);
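
        For step 4, each record would be written through the DataStore.
        Here is an illustrative (non-MapReduce) sketch of the write
        path using standard GeoTools calls; it is not the project's
        actual ingest code, and in practice you would keep one
        FeatureWriter open across many records rather than opening one
        per record:

        import java.io.IOException;
        import org.geotools.data.DataStore;
        import org.geotools.data.FeatureWriter;
        import org.geotools.data.Transaction;
        import org.opengis.feature.simple.SimpleFeature;
        import org.opengis.feature.simple.SimpleFeatureType;

        void writeRecord(DataStore dataStore, String tsvLine)
                throws IOException {
            FeatureWriter<SimpleFeatureType, SimpleFeature> writer =
                dataStore.getFeatureWriterAppend("gdelt",
                    Transaction.AUTO_COMMIT);
            // -1 keeps trailing empty columns in the GDELT record
            String[] fields = tsvLine.split("\t", -1);
            SimpleFeature feature = writer.next(); // new blank feature
            feature.setAttribute("GLOBALEVENTID",
                Integer.valueOf(fields[0]));
            // ... set the remaining attributes from their columns ...
            writer.write();
            writer.close();
        }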
        
        Hunter
         
        On 04/11/2014 01:22 PM, Chris Snider wrote:

          Hi, 
            
          I saw some of the GeoMesa YouTube videos referencing loading
          data, as well as the “Spatio-temporal Indexing in
          Non-relational Distributed Databases” paper referencing
          loading the GDELT dataset.  Are there any documented steps on
          how to load the GDELT dataset?
            
          Additionally, I have been able to load features to a feature
          type using the WFS-T endpoint.  Is there a better/faster/more
          efficient method of loading even modest amounts of data?  For
          example, I have a Natural Earth country polygon set from
          which I extracted the geometry, name, and admin columns to
          push into GeoMesa through the WFS-T endpoint.  I can only
          push between 5 and 10 rows without hitting a timeout.
            
          Thanks,

          Chris Snider
          Senior Software Engineer
          Intelligent Software Solutions, Inc.

      _______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
http://www.locationtech.org/mailman/listinfo/geomesa-users