Re: [geowave-dev] GW Tutorial questions -- CSV example

Mladen,
Three pluggable/extensible components are relevant to the customization you are trying to make.  GeoWave discovers all extension points at run-time using Java SPI [1], so they are completely plug and play.
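As a rough illustration of the SPI mechanism itself (not of GeoWave's own interfaces), the sketch below uses the JDK's `java.util.ServiceLoader` to look up implementations of an extension point.  `FormatProvider`, `formatName`, and `SpiDiscoverySketch` are hypothetical names made up for this example.  In a real plugin, the implementing class would be listed in a `META-INF/services/<fully qualified interface name>` file inside the plugin jar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical extension point for illustration; GeoWave's real SPI interfaces differ.
interface FormatProvider {
    String formatName();
}

class SpiDiscoverySketch {
    // Collects the names of every FormatProvider registered on the classpath.
    static List<String> discoveredFormatNames() {
        List<String> names = new ArrayList<>();
        for (FormatProvider provider : ServiceLoader.load(FormatProvider.class)) {
            names.add(provider.formatName());
        }
        return names;
    }

    public static void main(String[] args) {
        // With no provider registered on the classpath, the list is empty.
        System.out.println(discoveredFormatNames());
    }
}
```

Nothing in the calling code names a concrete implementation, which is why dropping a jar on the classpath is enough to add a new format, index type, or adapter.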

1) One is an ingest format, which lets you read and ingest whatever input data you need.  Implementing IngestFormatPluginProviderSpi [2] is the basic SPI extension point for ingest formats, and the local file ingest plugin is the most basic method on that extension point.  Local file ingest alone gives you basic single-process multi-threaded ingest and Spark distributed ingest; implementing all of the methods gives you full support for every `geowave ingest` mechanism, such as Kafka or MapReduce.  The name of the ingest format is the optional `-f` command-line argument.  For example, `geotools-vector` gives access to any GeoTools DataStore, including the CSVDataStore if it is in your classpath [3].  However, in GeoTools this CSV datastore is in the "unsupported" section [4], so it won't be in our classpath by default.  For convenience, we have a `/usr/local/geowave/tools/plugins` directory that is on the classpath for `geowave` CLI commands, so a jar placed there should be picked up.
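To make the parsing side of a custom format concrete, here is a minimal sketch that turns the example CSV row from the original message (quoted below) into typed fields.  It is plain Java and does not implement IngestFormatPluginProviderSpi.  The class name `CsvRowSketch` is hypothetical, and the field names are taken from the schema in the question.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one parsed row; not part of GeoWave.
class CsvRowSketch {
    final double latitude;
    final double longitude;
    final LocalDateTime timestamp;
    final int customIndexKey1;
    final int customIndexKey2;
    final List<String> remaining; // non-indexed columns, kept as raw text

    private CsvRowSketch(double lat, double lon, LocalDateTime ts,
                         int key1, int key2, List<String> remaining) {
        this.latitude = lat;
        this.longitude = lon;
        this.timestamp = ts;
        this.customIndexKey1 = key1;
        this.customIndexKey2 = key2;
        this.remaining = remaining;
    }

    // Splits on commas, trims whitespace, and types the first five columns.
    static CsvRowSketch parse(String line) {
        String[] cols = line.split(",");
        if (cols.length < 5) {
            throw new IllegalArgumentException("expected at least 5 columns");
        }
        List<String> rest = new ArrayList<>();
        for (int i = 5; i < cols.length; i++) {
            rest.add(cols[i].trim());
        }
        return new CsvRowSketch(
            Double.parseDouble(cols[0].trim()),
            Double.parseDouble(cols[1].trim()),
            LocalDateTime.parse(cols[2].trim()), // ISO-8601 local date-time
            Integer.parseInt(cols[3].trim()),
            Integer.parseInt(cols[4].trim()),
            rest);
    }

    public static void main(String[] args) {
        CsvRowSketch row = parse(
            "45.4706448 , -75.7374945 , 2018-02-09T22:54:50 , 1 , 3 , someText , 95333");
        System.out.println(row.latitude + " " + row.longitude + " " + row.timestamp);
    }
}
```

A real format plugin would go one step further and turn each row into the object its data adapter expects (a SimpleFeature, for example).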

2) The structure of your index is completely customizable through SPI (DimensionalityTypeProviderSpi [5]).  We provide "spatial" and "spatial-temporal" out of the box [5 and 6], but these are simply two values of the <index_type> option in `geowave config addindex -t <index_type>`, and a custom provider can add its own.  The index is completely decoupled from the other components and will work with any current or future backend key-value store as well.

3) You may or may not want a custom DataAdapter [7].  It is responsible for reading and writing data to and from GeoWave as whatever Java object you would like.  A simplified view is that, within the generic key-value store, the key is formed by the "index" from above and the value is formed by this data adapter.  Out of the box we provide a FeatureDataAdapter that reads/writes SimpleFeatures (the GeoTools vector data type) and a RasterDataAdapter that reads/writes GridCoverages (the GeoTools raster data type).  The format plugin from above parses the data and tells GeoWave which adapter is appropriate for it.  It sounds to me like you are interested in SimpleFeature data, but you may need to extend our FeatureDataAdapter to tell GeoWave which fields map to which dimensions of a custom index.  This is what we did for NYC taxi data [8].  Like the other two extensions, the data adapter is completely decoupled from the underlying key-value store and will work with any current or future backend.
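The "key from the index, value from the adapter" split can be sketched in a few lines of plain Java.  Everything here is a toy under stated assumptions: `KeyValueSketch`, `cell`, `interleave`, `encodeValue`, and `decodeText` are hypothetical names, and the Z-order bit interleaving is used only because it is short to write, not because it is what GeoWave's built-in indices do.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy illustration only; none of this is GeoWave API.
class KeyValueSketch {
    // Maps a value in [min, max] to one of 2^bits cells, clamped to the valid range.
    static long cell(double v, double min, double max, int bits) {
        long cells = 1L << bits;
        long c = (long) ((v - min) / (max - min) * cells);
        return Math.min(Math.max(c, 0L), cells - 1);
    }

    // "Index" side: interleaves the low bits of x and y into one sortable key (Z-order).
    static long interleave(long x, long y, int bits) {
        long key = 0L;
        for (int i = 0; i < bits; i++) {
            key |= ((x >> i) & 1L) << (2 * i);
            key |= ((y >> i) & 1L) << (2 * i + 1);
        }
        return key;
    }

    // "Adapter" side: serializes non-indexed fields into the value bytes.
    static byte[] encodeValue(String text, int number) {
        byte[] t = text.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + t.length + 4)
            .putInt(t.length).put(t).putInt(number).array();
    }

    // Reads the text field back out of an encoded value.
    static String decodeText(byte[] value) {
        ByteBuffer b = ByteBuffer.wrap(value);
        byte[] t = new byte[b.getInt()];
        b.get(t);
        return new String(t, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        long x = cell(-75.7374945, -180, 180, 16); // longitude cell
        long y = cell(45.4706448, -90, 90, 16);    // latitude cell
        long key = interleave(x, y, 16);
        byte[] value = encodeValue("someText", 95333);
        System.out.println(Long.toHexString(key) + " -> " + value.length + " bytes");
    }
}
```

Extending the key to more dimensions (time, or the two integer keys from the question) is the part a custom index provider would define, while the adapter decides which fields feed those dimensions and how the rest are serialized.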

Mechanically speaking, this project [9] demonstrates all of the customization options as simply as it can.  It's completely experimental in nature, but it shows a very custom extension to GeoWave.  It defines a new IngestFormat [10] that reads a custom file, it defines 4 different indexing approaches for 1D, 2D, 3D, and 4D indexing [11], and it defines a DataAdapter for a new type "FourDimensionalData" [12].  With these components, GeoWave can read/write FourDimensionalData objects, ingest from a defined format, and index using a newly defined strategy (`geowave config addindex -t <1D,2D,3D,or 4D> ...` would be how to create these new index types).  Moreover, this is compatible with any of GeoWave's key-value stores (Accumulo, HBase, and BigTable now; Cassandra and DynamoDB are nearing a fully compatible release).  In the "FourDimensionalData" case the data is contrived for readability and to show the extension points as plainly as possible, whereas the NYC taxi examples of course use real data.

Sorry for the lengthy reply.  The extensibility and high level of customizations are easily applied by those close to the project, and we are working on making these more broadly consumable.  I'm hoping the detail provided makes sense, but please let us know if you have additional questions.


[1] https://docs.oracle.com/javase/tutorial/ext/basics/spi.html
[2] https://github.com/locationtech/geowave/blob/master/core/ingest/src/main/java/mil/nga/giat/geowave/core/ingest/spi/IngestFormatPluginProviderSpi.java
[3] http://docs.geotools.org/latest/userguide/tutorial/datastore/intro.html
[4] https://github.com/geotools/geotools/blob/master/modules/unsupported/csv/src/main/java/org/geotools/data/csv/CSVDataStore.java
[5] https://github.com/locationtech/geowave/blob/master/core/geotime/src/main/java/mil/nga/giat/geowave/core/geotime/ingest/SpatialDimensionalityTypeProvider.java
[6] https://github.com/locationtech/geowave/blob/master/core/geotime/src/main/java/mil/nga/giat/geowave/core/geotime/ingest/SpatialTemporalDimensionalityTypeProvider.java
[7] https://github.com/locationtech/geowave/blob/master/core/store/src/main/java/mil/nga/giat/geowave/core/store/adapter/DataAdapter.java
[8] https://github.com/rfecher/geowave-nyctaxi-experiments/blob/master/src/main/java/mil/nga/giat/geowave/format/nyctlc/adapter/NYCTLCDataAdapter.java
[9] https://github.com/rfecher/dimensionality-experiments
[10] https://github.com/rfecher/dimensionality-experiments/blob/master/src/main/java/mil/nga/giat/geowave/experiment/FourDimensionalIngestPlugin.java
[11] https://github.com/rfecher/dimensionality-experiments/blob/master/src/main/java/mil/nga/giat/geowave/experiment/OneDimensionalProvider.java
     https://github.com/rfecher/dimensionality-experiments/blob/master/src/main/java/mil/nga/giat/geowave/experiment/TwoDimensionalProvider.java
     https://github.com/rfecher/dimensionality-experiments/blob/master/src/main/java/mil/nga/giat/geowave/experiment/ThreeDimensionalProvider.java
     https://github.com/rfecher/dimensionality-experiments/blob/master/src/main/java/mil/nga/giat/geowave/experiment/FourDimensionalProvider.java
[12] https://github.com/rfecher/dimensionality-experiments/blob/master/src/main/java/mil/nga/giat/geowave/experiment/FourDimensionalDataAdapter.java


On Mon, Feb 12, 2018 at 9:20 AM, M G <mladen-g@xxxxxxxxxxx> wrote:

Hello,


I'm quite new to GeoWave and key-value data stores in general, so apologies if these questions are a little basic.  I have installed GW 0.9.6 (HBase data store) on an HDP 2.6 sandbox and followed the GDELT ingestion tutorial (locationtech.github.io/geowave/walkthrough-vector.html).


This ingested the data into HBase, but in a simple single family, single column format.  How can I ingest data indexed spatially, temporally, etc, but specify a schema on the values?


For example, I have a CSV file with the example line:

45.4706448 , -75.7374945 , 2018-02-09T22:54:50 , 1 , 3 , someText , 95333 , 453.5 , 5 , moreText , etc, etc,


The SQL schema would be:
float Latitude, float Longitude, string Timestamp, int CustomIndexKey1, int CustomIndexKey2, string StringField, int AnotherInteger, float AnotherFloat, int YetAnotherInteger, string AnotherString, etc.

I would like to ingest this into GeoWave, and index on the first 5 columns -- that is, Lat/Lon spatial index, temporal index on Timestamp, and two more integer attributes (CustomIndexKey1, CustomIndexKey2).

So, my questions are:
1)  If I use the GDELT tutorial as an example, I guess I would ingest this file type using something like
"geowave ingest localtogw /mnt/myfile.csv myDataStore myfile-spatial -f geotools-vector"

Is that right? This does not seem to do anything. I don't get any error messages, but nothing is imported into HBase. Well, I get error messages about missing extensions for GeoServer, etc, but those seem unrelated, as I got them when I was ingesting GDELT data as well and it worked.

2)  How do I define which columns are lat, long, timestamp, and desired secondary indices?  I see that there is support for configuring the input by defining a JSON 'SIMPLE_FEATURE_CONFIG_FILE' (https://locationtech.github.io/geowave/userguide.html#ingest-plugins).  Is that what I would use?  I don't see anywhere to define the lat-long columns in the example, just numeric secondary indices and temporal indices.

3)  Does the CSV file need to have a header, or can I specify the schema in the JSON feature config file?

Thank you,
Mladen

GeoWave is an open-source library for storage, index, and search of multi-dimensional data on top of sorted key-value datastores and popular big data frameworks.


_______________________________________________
geowave-dev mailing list
geowave-dev@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geowave-dev


