Re: [geomesa-users] Geomesa FileSystem s3a credential

Oh, I forgot to mention that if you use UUIDs as feature IDs, you can enable a flag to save some bytes:

https://www.geomesa.org/documentation/user/datastores/index_config.html#configuring-feature-id-encoding

There is also a new feature that will be available in 2.1 for using TWKB geometry serialization, which may also save some space (especially with non-point geometries):

https://www.geomesa.org/documentation/current/user/datastores/index_config.html#configuring-geometry-serialization

Thanks,

Emilio

On 10/29/18 11:01 AM, Emilio Lahr-Vivaz wrote:
Hello,

Have you looked at https://www.geomesa.org/documentation/user/datastores/index_config.html#customizing-index-creation ?

By default, GeoMesa HBase will create several different indices to support different query patterns. Without any configuration, you will get 3 indices, which means your data will be stored 3 times over. This supports efficient queries with spatial filters (z2), spatio-temporal filters (z3), and feature ID lookups (id). You can also specify attribute indices, for querying by attribute values.

In contrast, GeoMesa FileSystem will only store your data once, which generally supports spatio-temporal queries (it depends on your partition scheme). Additionally, there are space savings from the file format (parquet or orc, which can support things like dictionary encoding and other optimizations over a particular column), and not having to store an index key for each feature (in general an extra 10 or so bytes).

You may also want to look into table compression in HBase. I believe we enable gzip compression by default, but you can specify other algorithms through the user-data keys 'geomesa.table.compression.enabled' and 'geomesa.table.compression.type'. (I just looked and it appears we have not documented that config.)
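As a rough sketch of how those options might be set, the snippet below assembles a feature type spec string with user data appended after a semicolon. The attribute names, the choice of 'z3,id' for the enabled indices, and the 'snappy' compression value are assumptions for illustration only; build_spec is a hypothetical helper, not part of GeoMesa.

```python
def build_spec(attributes, user_data):
    """Join attribute definitions and user-data options into one spec string.

    Assumes user data is appended after a semicolon as comma-separated
    key='value' pairs.
    """
    attrs = ",".join(attributes)
    opts = ",".join(f"{key}='{value}'" for key, value in user_data.items())
    return f"{attrs};{opts}" if opts else attrs


spec = build_spec(
    # Hypothetical schema: adjust to your own attributes.
    ["name:String", "dtg:Date", "*geom:Point:srid=4326"],
    {
        # Fewer indices means fewer copies of the data (assumed selection).
        "geomesa.indices.enabled": "z3,id",
        # Compression keys from the message above; 'snappy' is an assumed value.
        "geomesa.table.compression.enabled": "true",
        "geomesa.table.compression.type": "snappy",
    },
)
print(spec)
```

The resulting string would then be passed wherever the feature type spec is defined for the ingest. Check the index configuration docs linked above for the values your GeoMesa version accepts.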

There may also be hdfs redundancy in HBase, which would take up even more space (that may not be an issue if you are running HBase on s3).

All of that said, I would still expect HBase to take more space, but you may be able to narrow the gap.
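To make the factors above concrete, here is a back-of-the-envelope estimate. It is only a sketch: the 10 GB input size and the compression ratio are made-up numbers, the replication factor of 3 is the usual HDFS default, and the function ignores per-row index keys and other HBase overhead.

```python
def estimated_footprint_gb(input_gb, index_copies, replication=1, compression_ratio=1.0):
    """Very rough storage estimate.

    index_copies      -- how many indices store a full copy of the data
    replication       -- HDFS replication factor (1 if storage does not replicate)
    compression_ratio -- stored size / raw size (1.0 means no compression)
    """
    return input_gb * index_copies * replication * compression_ratio


# Three default indices (z2, z3, id) on HDFS with replication 3, uncompressed.
default_setup = estimated_footprint_gb(10, index_copies=3, replication=3)

# Only two indices, compression assumed to halve the stored size.
tuned_setup = estimated_footprint_gb(10, index_copies=2, replication=3, compression_ratio=0.5)

print(default_setup, tuned_setup)
```

The point is just that the number of indices, replication and compression multiply, so trimming any one of them reduces the total proportionally.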

Thanks,

Emilio

On 10/29/18 10:37 AM, Martin Kellner wrote:
Hello,

I repeated the setup the next day without any problems, and I have not seen that error since.
I probably made a mistake when referencing my files on S3.

I have now tried both GeoMesa FileSystem and GeoMesa HBase.
GeoMesa FileSystem stores the data quite efficiently in S3 (it even takes less space than my input data, since it stores everything in .parquet files).
However, GeoMesa HBase requires a lot of storage. For me, it takes 8 times the size of my input data.
So far I have not found a good way to reduce that storage demand (I tried making the ID shorter and experimenting with the ingest configuration).

So, as I understand it, it is normal for GeoMesa on HBase to require a lot of storage.
Is this correct, or am I missing something?

Thank you,

Martin

On Thu, Oct 25, 2018 at 2:43 PM, Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
Hello,

Where did you try to set those properties? Did you see the section in the docs on configuring access to s3?
https://www.geomesa.org/documentation/user/cli/filesystems.html#enabling-s3-ingest

I believe that the URL prefix that you use makes a difference as well - s3 vs s3a vs s3n. I think s3a is the preferred prefix to use, but some commands tend to require one or the other.
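To illustrate why the prefix matters: each Hadoop S3 connector reads its credentials from differently named properties, so setting the fs.s3a.* keys has no effect on a path that starts with s3://. The sketch below maps a URL scheme to the property names that connector looks for. The bucket and path are hypothetical, and credential_properties is just an illustrative helper.

```python
from urllib.parse import urlparse

# Credential property names per URL scheme (access key, secret key).
CREDENTIAL_PROPERTIES = {
    "s3": ("fs.s3.awsAccessKeyId", "fs.s3.awsSecretAccessKey"),
    "s3n": ("fs.s3n.awsAccessKeyId", "fs.s3n.awsSecretAccessKey"),
    "s3a": ("fs.s3a.access.key", "fs.s3a.secret.key"),
}


def credential_properties(url):
    """Return the (access key, secret key) property names for an S3 URL."""
    scheme = urlparse(url).scheme
    if scheme not in CREDENTIAL_PROPERTIES:
        raise ValueError(f"not an S3 URL scheme: {scheme!r}")
    return CREDENTIAL_PROPERTIES[scheme]


# Hypothetical paths: the same bucket addressed with two different prefixes.
print(credential_properties("s3://my-bucket/geomesa"))
print(credential_properties("s3a://my-bucket/geomesa"))
```

So if the error message mentions fs.s3.awsAccessKeyId while the configuration sets fs.s3a.access.key, the path being used most likely does not have the s3a:// prefix.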

Thanks,

Emilio

On 10/25/18 8:06 AM, Martin Kellner wrote:
Hi,

I just tried to set up GeoMesa FileSystem.
I want to store the files on S3, but when I try the ingest I get the following error message:

ERROR AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).

Of course I tried to set 
<property>
  <name>fs.s3a.access.key</name>
  <value>XXXXXXXXXXXXX</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YYYYYYYYYYYYYYYYYYYYYYY</value
</property> 

Unfortunately, I still get the same error message.
Do I have to re-initialize something to apply those changes?

Thank you very much,

Martin

_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users
