Oh, I forgot to mention that if you use UUIDs as feature IDs, you
can enable a flag to save some bytes:
https://www.geomesa.org/documentation/user/datastores/index_config.html#configuring-feature-id-encoding
There is also a new feature that will be available in 2.1 for using
TWKB geometry serialization, which may also save some space
(especially with non-point geometries):
https://www.geomesa.org/documentation/current/user/datastores/index_config.html#configuring-geometry-serialization
Thanks,
Emilio
On 10/29/18 11:01 AM, Emilio Lahr-Vivaz wrote:
Hello,
Have you looked at the docs on customizing index creation?
https://www.geomesa.org/documentation/user/datastores/index_config.html#customizing-index-creation
By default, GeoMesa HBase will create several different indices to
support different query patterns. Without any configuration, you
will get 3 indices, which means your data will be stored 3 times
over. This supports efficient queries with spatial filters (z2),
spatio-temporal filters (z3), and feature ID lookups (id). You can
also specify attribute indices, for querying by attribute values.
In contrast, GeoMesa FileSystem will only store your data once,
which generally supports spatio-temporal queries (it depends on
your partition scheme). Additionally, there are space savings from
the file format (parquet or orc, which can support things like
dictionary encoding and other optimizations over a particular
column), and not having to store an index key for each feature (in
general an extra 10 or so bytes).
You may also want to look into table compression in HBase. I
believe we enable gzip compression by default, but you can specify
other algorithms through the user-data keys
'geomesa.table.compression.enabled' and
'geomesa.table.compression.type'. (I just looked, and it appears we
have not documented that config.)
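As a rough sketch of how these settings might be attached to a schema: user-data hints can be appended to a SimpleFeatureType spec string after a semicolon, as comma-separated key='value' pairs. The helper below only builds such a string. The schema, the 'geomesa.indices.enabled' key (taken from the index configuration docs linked above, not from this thread) and the 'snappy' value are assumptions for illustration, so check them against the docs for your GeoMesa version.

```python
def with_user_data(spec, hints):
    """Append user-data hints to a SimpleFeatureType spec string.

    Assumes the convention of a ';' separator followed by
    comma-separated key='value' pairs.
    """
    if not hints:
        return spec
    encoded = ",".join(f"{key}='{value}'" for key, value in hints.items())
    return f"{spec};{encoded}"


# Hypothetical schema: one attribute, a date and a point geometry.
base_spec = "name:String,dtg:Date,*geom:Point:srid=4326"

spec = with_user_data(
    base_spec,
    {
        # Keep only the spatio-temporal and ID indices (assumed key name).
        "geomesa.indices.enabled": "z3,id",
        # Compression keys as named in this thread; the type value is an assumption.
        "geomesa.table.compression.enabled": "true",
        "geomesa.table.compression.type": "snappy",
    },
)
print(spec)
```

Dropping an index you never query (for example z2 when every query has a time filter) removes one full copy of the data.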
There may also be HDFS redundancy in HBase, which would take up
even more space (that may not be an issue if you are running HBase
on S3).
All of that said, I would still expect HBase to take more space,
but you may be able to narrow the gap.
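To make the factors above concrete, here is a back-of-the-envelope estimate. It is only a rough model: the function name and parameters are made up for illustration, the 10-byte key overhead is the approximate figure mentioned above, and it ignores HBase's own per-cell overhead, write-ahead logs and compaction state.

```python
def estimate_hbase_bytes(raw_bytes, feature_count, index_copies=3,
                         key_bytes_per_feature=10, hdfs_replication=1,
                         compression_ratio=1.0):
    """Rough size estimate for data stored once per index.

    compression_ratio is compressed size / uncompressed size
    (1.0 means no compression).
    """
    # Each index table holds every feature plus its index key.
    per_copy = raw_bytes + feature_count * key_bytes_per_feature
    logical = per_copy * index_copies
    # Block replication on HDFS multiplies the on-disk footprint.
    return logical * hdfs_replication * compression_ratio


# Hypothetical input: 1 MB of serialized features, 10,000 features.
no_replication = estimate_hbase_bytes(1_000_000, 10_000)
with_replication = estimate_hbase_bytes(1_000_000, 10_000, hdfs_replication=3)
print(no_replication / 1_000_000, with_replication / 1_000_000)
```

With three indices and a replication factor of 3 (the usual HDFS default), the multiplier is already close to 10 before compression, so a measured factor of 8 is not surprising.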
Thanks,
Emilio
On 10/29/18 10:37 AM, Martin Kellner wrote:
Hello,
I repeated the setup the next day without any problems.
Since then, I never experienced that error again.
Probably I made some mistake referencing my files on S3.
I have tried GeoMesa FileSystem and GeoMesa HBase now.
GeoMesa FileSystem stores the data quite efficiently in S3
(it even takes less space than my input data, since it
stores everything in .parquet files).
However, GeoMesa HBase requires lots of storage. For me,
it takes 8 times the size of my input data.
So far I have not found a good way to reduce that demand
for storage (I tried to make the id shorter and play around
with the ingest configuration).
So my understanding is that it is normal for GeoMesa
on HBase to demand a lot of storage.
Is this correct, or am I missing something?
Thank you,
Martin
Hello,
Where did you try to set those properties? Did you see the
section in the docs on configuring access to s3?
https://www.geomesa.org/documentation/user/cli/filesystems.html#enabling-s3-ingest
I believe that the URL prefix that you use makes a
difference as well - s3 vs s3a vs s3n. I think s3a is the
preferred prefix to use, but some commands tend to require
one or the other.
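The prefix matters because each Hadoop S3 connector reads its credentials from different property names. The sketch below maps a URL scheme to the properties that connector looks at; the bucket names are hypothetical. Note that the error quoted below names fs.s3.awsAccessKeyId, which suggests the path used the plain s3:// prefix, in which case the fs.s3a.* properties would not be consulted.

```python
from urllib.parse import urlparse

# Credential property names read by each Hadoop S3 connector.
CREDENTIAL_KEYS = {
    "s3": ("fs.s3.awsAccessKeyId", "fs.s3.awsSecretAccessKey"),
    "s3n": ("fs.s3n.awsAccessKeyId", "fs.s3n.awsSecretAccessKey"),
    "s3a": ("fs.s3a.access.key", "fs.s3a.secret.key"),
}


def credential_keys(url):
    """Return the (access key, secret key) property names for an S3 URL."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in CREDENTIAL_KEYS:
        raise ValueError(f"not an S3 URL scheme: {scheme!r}")
    return CREDENTIAL_KEYS[scheme]


# Hypothetical bucket and path, for illustration only.
print(credential_keys("s3a://my-bucket/geomesa"))
print(credential_keys("s3://my-bucket/geomesa"))
```

So either switch the path to s3a:// to match the fs.s3a.* properties, or set the properties that match the prefix actually in use.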
Thanks,
Emilio
On 10/25/18 8:06 AM, Martin Kellner wrote:
Hi,
I just tried to set up GeoMesa FileSystem.
I want to store the files on S3, but when I try the ingest
I get the following error message:
ERROR AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
Of course I tried to set
<property>
  <name>fs.s3a.access.key</name>
  <value>XXXXXXXXXXXXX</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YYYYYYYYYYYYYYYYYYYYYYY</value>
</property>
Unfortunately, I still get the same error message.
Do I have to re-initialize something to
apply those changes?
Thank you very much,
Martin
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users