Thanks for the information. I will try to use it next week after our initial demo is complete.
As an FYI:
I am using the shapefile ingest process and a driver I have written to push shapefile information into a GeoMesa table in Accumulo. I am finding that some of my shapefiles have invalid geometries, which are throwing exceptions.
Here is the console log dump for the Natural Earth country shapefile:
Exception in thread "main" java.lang.Exception: ERROR: Could not find a suitable 0-bit MBR for the target geometry: POLYGON ((-179.99900834195893 -89.99982838943765, -179.99900834195893 -84.35288625138337, -164.16219722595443 -78.99626230923768, -158.26351334731075 -77.08619801855957, -158.2192008065469 -77.0736923216127, -55.37731278554597 -61.07063812275956, -45.934374960003026 -60.524831644861884, -45.7198660894195 -60.51625335736412, 135.3432939050984 -66.10066701248235, 135.4887113859158 -66.11100229913357, 143.50863529466415 -66.84460093230936, 153.878803345093 -68.27893198676848, 167.70100874912129 -70.79128509537779, 170.26488326045944 -71.29213307737697, 170.28265995306788 -71.29874766015023, 170.8138936774078 -71.69965342205771, 170.88500044784155 -71.75887461315726, 170.96623579935533 -71.83452891065282, 180.0000000000001 -84.35286778378045, 180.0000000000001 -88.38070249172895, 179.99989628106425 -89.99982838943765, -179.99900834195893 -89.99982838943765))
at geomesa.utils.geohash.GeohashUtils$.getMinimumBoundingGeohash(GeohashUtils.scala:246)
at geomesa.utils.geohash.GeohashUtils$.decomposeGeometry_(GeohashUtils.scala:527)
at geomesa.utils.geohash.GeohashUtils$.decomposeGeometry(GeohashUtils.scala:556)
at geomesa.core.index.SpatioTemporalIndexEncoder.encode(SpatioTemporalIndexSchema.scala:168)
at geomesa.core.index.SpatioTemporalIndexEncoder.encode(SpatioTemporalIndexSchema.scala:147)
at geomesa.core.index.IndexSchema$class.encode(IndexFormat.scala:40)
at geomesa.core.index.SpatioTemporalIndexSchema.encode(SpatioTemporalIndexSchema.scala:78)
at geomesa.utils.geotools.GeneralShapefileIngest$.featuresToDataStore(GeneralShapefileIngest.scala:75)
at geomesa.utils.geotools.GeneralShapefileIngest$.shpToDataStore(GeneralShapefileIngest.scala:40)
at geomesa.utils.geotools.ShapefileIngest.ingestShapefile(
at com.issinc.we.kepler.IngestDriver.main(
This particular geometry is Antarctica.
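One thing that may be worth checking (a guess, not confirmed in this thread): the Antarctica ring includes the longitude 180.0000000000001, slightly outside [-180, 180]. If the geohash code requires every point to lie inside the world bounds, even the 0-bit (whole-world) geohash could not contain that polygon, which would fit the message. Below is a minimal stdlib-only sketch. CoordinateBounds is a hypothetical helper, not part of GeoMesa or GeoTools. It counts out-of-range lon/lat pairs in a ring and clamps them back into range. A real driver would apply the same check to the JTS coordinates before ingest, or skip such features, and clamping alone may not be enough for a polygon that spans the full longitude range.

```java
/**
 * Hypothetical pre-ingest helper (not part of GeoMesa or GeoTools): checks and clamps
 * longitude/latitude pairs to the WGS84 range [-180, 180] x [-90, 90].
 * A ring is an array of {lon, lat} pairs, to keep the sketch stdlib-only.
 */
public final class CoordinateBounds {
    private CoordinateBounds() {}

    /** True if the pair lies inside [-180, 180] x [-90, 90]. */
    public static boolean inBounds(double lon, double lat) {
        return lon >= -180.0 && lon <= 180.0 && lat >= -90.0 && lat <= 90.0;
    }

    /** Counts the points of a ring that fall outside the valid range. */
    public static int countOutOfBounds(double[][] ring) {
        int count = 0;
        for (double[] p : ring) {
            if (!inBounds(p[0], p[1])) {
                count++;
            }
        }
        return count;
    }

    /** Returns a copy of the ring with every longitude and latitude clamped into range. */
    public static double[][] clampRing(double[][] ring) {
        double[][] result = new double[ring.length][];
        for (int i = 0; i < ring.length; i++) {
            double lon = Math.max(-180.0, Math.min(180.0, ring[i][0]));
            double lat = Math.max(-90.0, Math.min(90.0, ring[i][1]));
            result[i] = new double[] {lon, lat};
        }
        return result;
    }
}
```

For the Antarctica ring, the two points at longitude 180.0000000000001 would be counted and clamped to 180.0. The first and last points are already in range, so the ring stays closed.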
From: Hunter Provyn
Sent: Wednesday, April 16, 2014 3:19 PM
To: geomesa-users@xxxxxxxxxxxxxxxx; geomesa@xxxxxxxx; Chris Snider
Subject: Re: [geomesa-users] Loading Data
Hi Chris,
Here is an example Java project that ingests GDELT using Hadoop 2.2, Accumulo 1.5, and the tip of GeoMesa master.
It took 30 minutes to ingest a 72 GB TSV file, an uncompressed concatenation of GDELT data up to Feb 24, 2014.
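For scale, the quoted figure works out as follows, assuming decimal units (1 GB = 1,000 MB):

```latex
\frac{72\ \text{GB}}{30\ \text{min}}
  = \frac{72{,}000\ \text{MB}}{1{,}800\ \text{s}}
  = 40\ \text{MB/s}
  \quad (= 2.4\ \text{GB/min})
```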
We plan to roll it into geomesa/geomesa-gdelt, but for now it is a separate project:
1) mvn install
2) hadoop jar target/geomesa-gdelt-1.0-SNAPSHOT.jar geomesa.gdelt.GDELTIngest -instanceId [instanceId] -zookeepers [zookeepers] -user [user] -password [password] -auths [auths] -tableName [tableName] -featureName [featureName] -ingestFile [ingestFile]
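Most of the flags in the command above are connection settings. Below is a minimal sketch of how a driver might read them as -name value pairs into a Map<String, Serializable>, the kind of parameter map DataStoreFinder.getDataStore() is given. IngestArgs is a hypothetical class, not the geomesa-gdelt code. Whether GeoMesa's Accumulo data store expects exactly these key names (instanceId, zookeepers, user, password, auths, tableName) is an assumption based on the flag names, so check the data store factory's parameter list.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: turns "-name value" command-line pairs into a parameter map. */
public final class IngestArgs {
    private IngestArgs() {}

    public static Map<String, Serializable> parse(String[] args) {
        Map<String, Serializable> params = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            String flag = args[i];
            // Every flag must start with '-' and be followed by a value.
            if (!flag.startsWith("-") || i + 1 >= args.length) {
                throw new IllegalArgumentException(
                        "Expected '-name value' pair at position " + i + ": " + flag);
            }
            // Strip the leading '-' so "-instanceId" becomes the key "instanceId".
            params.put(flag.substring(1), args[i + 1]);
        }
        return params;
    }
}
```

Run on the example command's arguments, this would also pick up featureName and ingestFile. A real driver would likely remove those before passing the remaining entries to DataStoreFinder.getDataStore().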
It copies its jar to HDFS and requires the ingestFile to be a GDELT-format TSV that is already on HDFS.
I hope this is helpful; let me know if you have any questions. This branch is still under development, and we are also working on a complete tutorial to accompany it.
On 04/11/2014 04:42 PM, Hunter Provyn wrote:
Hi Chris,
We recommend the steps below for ingesting a non-shapefile CSV or TSV:
1. In Java code, get a handle on a DataStore using DataStoreFinder.getDataStore().
2. Create a SimpleFeatureType for GDELT using DataUtilities.
3. Call ds.createSchema(schemaType).
4. Run a MapReduce job with that schema.
I'm working on an example project in Java that I will send you when complete.
Below is an example of using DataUtilities to create a SimpleFeatureType for GDELT. You may need to double-check some of the types in the sftSpec String; I referred to the GDELT online documentation:
String name = "gdelt";
String sftSpec =
SimpleFeatureType featureType = DataUtilities.createType(name, sftSpec);
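Below is a minimal sketch of the spec-string format that GeoTools DataUtilities.createType(String typeName, String typeSpec) parses. It uses comma-separated name:Type pairs, a '*' to mark the default geometry, and :srid=4326 as a CRS hint. SpecBuilder and the attribute names eventId, dtg and geom are hypothetical placeholders, not the GDELT attributes.

```java
/** Hypothetical helper that assembles a DataUtilities-style type spec string. */
public final class SpecBuilder {
    private final StringBuilder spec = new StringBuilder();

    /** Adds a plain attribute, e.g. name:String. */
    public SpecBuilder attr(String name, String type) {
        return append(name + ":" + type);
    }

    /** Adds the default geometry, marked with '*', plus an SRID hint, e.g. *geom:Point:srid=4326. */
    public SpecBuilder defaultGeometry(String name, String type, int srid) {
        return append("*" + name + ":" + type + ":srid=" + srid);
    }

    // Joins entries with commas, the separator DataUtilities expects between attributes.
    private SpecBuilder append(String part) {
        if (spec.length() > 0) {
            spec.append(',');
        }
        spec.append(part);
        return this;
    }

    public String build() {
        return spec.toString();
    }
}
```

For example, new SpecBuilder().attr("eventId", "Integer").attr("dtg", "Date").defaultGeometry("geom", "Point", 4326).build() yields "eventId:Integer,dtg:Date,*geom:Point:srid=4326". That string would then be the second argument to DataUtilities.createType(name, ...).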
On 04/11/2014 01:22 PM, Chris Snider wrote:
I saw some of the GeoMesa YouTube videos referencing loading data, as well as the “Spatio-temporal Indexing in Non-relational Distributed Databases” paper, which references loading the GDELT dataset. Are there any documented steps on how to load the GDELT dataset?
Additionally, I have been able to load features into a feature type using the WFS-T endpoint. Is there a better, faster, or more efficient method of loading even modest amounts of data? For example, I have a Natural Earth country polygon set from which I extracted the geometry, name, and admin columns to push into GeoMesa through the WFS-T endpoint. I can only push between 5 and 10 rows without hitting a timeout.
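The replies above point toward writing through a DataStore (or a MapReduce job) for bulk loads. If the WFS-T route has to stay for small sets like the Natural Earth countries, one workaround is to send rows in fixed-size batches, one Insert transaction per batch. The sketch below only does the partitioning. Batches is a hypothetical helper, and a batch size of 5 is suggested by the 5 to 10 row limit reported here. Building and posting the WFS-T request is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits rows into fixed-size batches, e.g. one WFS-T transaction each. */
public final class Batches {
    private Batches() {}

    public static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }
}
```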
