Re: [geowave-dev] Accumulo Tables

That is correct, the data and the index are colocated in the same table. To get a better idea of how this works, check out our Accumulo Key Structure here. Looking at that, you'll see that each key-value pair is composed of a variety of elements, including the Index ID (i.e. the geospatial index), the feature field ID (column qualifier), and the feature field value.

Can I ask why you would want to separate the data from the index? It seems to me that if you took that approach, you would have a secondary geospatial index instead of the primary geospatial index that we provide. Separating the data from the index would mean that every spatial query needs an additional hop to retrieve your actual data. With a primary geospatial index (where the data is colocated with the index), no additional hops are needed to retrieve your data when running a spatial query.
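
To make the "additional hop" concrete, here is a toy sketch in Python. It is not GeoWave or Accumulo code: the tables, the integer index keys, and the function names are all made up for illustration, and a real spatial index key is far more involved than a single integer.

```python
from bisect import bisect_left, bisect_right

def range_scan(table, lo, hi):
    """Return all (key, value) pairs with lo <= key <= hi from a sorted table."""
    keys = [k for k, _ in table]
    return table[bisect_left(keys, lo):bisect_right(keys, hi)]

# Primary index (hypothetical): the index key sits next to the feature data.
primary = sorted([(10, "feature-a data"), (12, "feature-b data"), (40, "feature-c data")])

def query_primary(lo, hi):
    """One range scan returns the data directly."""
    rows = range_scan(primary, lo, hi)
    return [value for _, value in rows], 1  # (results, number of lookups)

# Secondary index (hypothetical): the index only stores feature IDs,
# and the data lives in a separate table keyed by feature ID.
index_table = sorted([(10, "a"), (12, "b"), (40, "c")])
data_table = {"a": "feature-a data", "b": "feature-b data", "c": "feature-c data"}

def query_secondary(lo, hi):
    """One range scan on the index, then one extra lookup per matching feature."""
    feature_ids = [fid for _, fid in range_scan(index_table, lo, hi)]
    results = [data_table[fid] for fid in feature_ids]
    return results, 1 + len(feature_ids)

print(query_primary(10, 12))
print(query_secondary(10, 12))
```

Both queries return the same features, but the secondary-index version pays for one extra lookup per match on top of the index scan.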

Does that make sense?

Whitney

On Wed, Sep 30, 2015 at 10:36 AM, Marcel <m.jacob@xxxxxxxxxxx> wrote:
If I understand you correctly, the "_SPATIAL_TEMPORAL_VECTOR_IDX" table contains all of my data plus my index. So there is no way to distinguish between the disk usage for the data itself and the disk usage for indexing?

Best regards,
Marcel Jacob.


On 29.09.2015 at 21:41, Whitney O'Meara wrote:
Hey Marcel,

The "_SPATIAL_TEMPORAL_VECTOR_IDX" is the table that contains all of your data that you've ingested.  As GeoWave generates a unique key per feature attribute, you can expect the number of entries in the table to be the number of features you've ingested multiplied by the number of attributes per feature. 
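
As a rough back-of-the-envelope check of that estimate, a minimal Python sketch follows. The function name and the example numbers are hypothetical and are not taken from this thread.

```python
def expected_entry_count(num_features: int, attributes_per_feature: int) -> int:
    """Rough estimate: one key-value entry per feature attribute."""
    return num_features * attributes_per_feature

# Hypothetical dataset: 1,000,000 features with 12 attributes each.
print(expected_entry_count(1_000_000, 12))
```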

The "_SPATIAL_TEMPORAL_VECTOR_IDX_GEOWAVE_ALT_INDEX" is an optional secondary index which is used to look up features by their feature ID.  In practice this speeds up certain operations that GeoWave performs, but is not required and can be turned off by setting your AccumuloOptions appropriately. 

The "_GEOWAVE_METADATA" table is where all of the information about your data adapters, indices, and statistics gets stored. 

That should be all of the tables generated by GeoWave.  Let us know if you have any other questions.

Whitney

On Tue, Sep 29, 2015 at 3:16 PM, Marcel <m.jacob@xxxxxxxxxxx> wrote:
Hello all,
I would be very happy if you could explain something about the GeoWave tables that are created during import.
For additional context, I use vector data and a spatial-temporal index.
When I load a 3.3 GiB file into GeoWave:
1) There is a table "geowave_SPATIAL_TEMPORAL_VECTOR_IDX" with a very large number of entries (~ number of columns * number of records). So I assume this table holds the data itself.
2) "geowave_SPATIAL_TEMPORAL_VECTOR_IDX_GEOWAVE_ALT_INDEX" is the second table with exactly the number of records of my dataset.
Looking at the disk usage with the Accumulo shell, table 1) is approximately 1.2 GB. Can serialization account for this much compression?
Table 2) is very small: only 9 MB. But where is the index table? I cannot imagine that the index needs only 9 MB.
3) geowave_GEOWAVE_METADATA
Am I missing any tables?

Thanks in advance,
Marcel Jacob.

_______________________________________________
geowave-dev mailing list
geowave-dev@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.locationtech.org/mailman/listinfo/geowave-dev


