[geowave-dev] GW Tutorial questions -- CSV example

Hello,


I'm quite new to GeoWave and key-value data stores in general, so apologies if these questions are a little basic.  I have installed GW 0.9.6 (HBase data store) on an HDP 2.6 sandbox and followed the GDELT ingestion tutorial (locationtech.github.io/geowave/walkthrough-vector.html).


This ingested the data into HBase, but in a simple format with a single column family and a single column.  How can I ingest data that is indexed spatially, temporally, etc., while also specifying a schema for the values?


For example, I have a CSV file with the example line:

45.4706448 , -75.7374945 , 2018-02-09T22:54:50 , 1 , 3 , someText , 95333 , 453.5 , 5 , moreText , etc, etc,


The SQL schema would be:
float Latitude, float Longitude, string Timestamp, int CustomIndexKey1, int CustomIndexKey2, string StringField, int AnotherInteger, float AnotherFloat, int YetAnotherInteger, string AnotherString, etc.

I would like to ingest this into GeoWave, and index on the first 5 columns -- that is, Lat/Lon spatial index, temporal index on Timestamp, and two more integer attributes (CustomIndexKey1, CustomIndexKey2).
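
To make the layout concrete, here is a minimal Python sketch (standard library only; it does not use any GeoWave API) of how one line maps onto the schema above. The `Record` type and `parse_line` helper are hypothetical names made up for illustration, the sample line leaves out the trailing "etc" fields, and the timestamp is assumed to be ISO 8601 without a time zone.

```python
import csv
from datetime import datetime
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical typed view of one CSV line, following the schema above."""
    latitude: float
    longitude: float
    timestamp: datetime
    custom_index_key1: int
    custom_index_key2: int
    string_field: str
    another_integer: int
    another_float: float
    yet_another_integer: int
    another_string: str


def parse_line(line: str) -> Record:
    """Parse the first ten comma-separated fields of a line into a Record."""
    # The example line has spaces around the commas, so strip each field.
    fields = [f.strip() for f in next(csv.reader([line]))]
    return Record(
        latitude=float(fields[0]),
        longitude=float(fields[1]),
        timestamp=datetime.fromisoformat(fields[2]),  # assumes ISO 8601
        custom_index_key1=int(fields[3]),
        custom_index_key2=int(fields[4]),
        string_field=fields[5],
        another_integer=int(fields[6]),
        another_float=float(fields[7]),
        yet_another_integer=int(fields[8]),
        another_string=fields[9],
    )


sample = ("45.4706448 , -75.7374945 , 2018-02-09T22:54:50 , 1 , 3 , "
          "someText , 95333 , 453.5 , 5 , moreText")
record = parse_line(sample)

# The five columns I would like indexed; the rest are plain attribute values.
index_columns = record[:5]
value_columns = record[5:]
print(index_columns)
print(value_columns)
```

The first five fields are what I want GeoWave to index, and the remaining fields are the values I would like stored with their own types.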

So, my questions are:
1)  If I use the GDELT tutorial as an example, I guess I would ingest this file type using something like
"geowave ingest localtogw /mnt/myfile.csv myDataStore myfile-spatial -f geotools-vector"

Is that right? This does not seem to do anything. I don't get any error messages, but nothing is imported into HBase. Well, I get error messages about missing extensions for GeoServer, etc, but those seem unrelated, as I got them when I was ingesting GDELT data as well and it worked.

2)  How do I define which columns are the latitude, longitude, timestamp, and desired secondary indices?  I see that there is support for configuring the input by defining a JSON 'SIMPLE_FEATURE_CONFIG_FILE' (https://locationtech.github.io/geowave/userguide.html#ingest-plugins).  Is that what I would use?  In the example I don't see anywhere to define the lat/lon columns, only numeric secondary indices and temporal indices.

3)  Does the CSV file need to have a header, or can I specify the schema in the JSON feature config file?

Thank you,
Mladen
