Once we started using Geomesa with Kerberos-secured HBase on different runtimes (regular APIs in Docker, Flink, Spark on Databricks), we ran into a number of issues that prevented Geomesa from working at all. The issues are:
- When running in a managed environment such as Databricks, a current user is already present on the workers. Simply calling HBaseConnectionPool.configureSecurity, which calls UserGroupInformation.loginUserFromKeytab, does not make the logged-in user the current user, so the authorization effectively does nothing. There are two ways to fix this:
- The simplest fix, which touches the least code and is the one I applied: switch from UserGroupInformation.getCurrentUser to UserGroupInformation.getLoginUser, which always uses the logged-in user.
- The best fix, though it requires a lot of refactoring: create the HBase connection with an explicit user via ConnectionFactory.createConnection(conf, user). Any nested code that relies on the current user (for example, the Hadoop HBase data format) should then be wrapped in {{user.doAs { ... } }}.
- HBaseIndexAdapter leaks on long-running jobs. This is a known HBase defect; passing an explicit thread pool and closing it manually solves the issue.
- When writing into HBase, it is a good idea to control the flush timeout; otherwise we cannot guarantee write latency.
- When working with Spark, we need to handle Hadoop configuration changes as well as authorization in every mapPartition. Note that explicit re-authentication is not expensive, because the user is statically cached on every node anyway. Also, for some reason the code path with token auth never worked for us (I am not even sure how it was supposed to work).
- When ingesting Spark DataFrame data into Geomesa, if a Geomesa schema already exists we should prefer it when it is compatible, rather than forcing the Spark-inferred schema, which may be slightly different yet compatible and would cause a crash.
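
To illustrate the "explicit thread pool, closed manually" idea from the HBaseIndexAdapter item, here is a minimal sketch using only the JDK. The class name `OwnedPool`, the 30-second wait, and the thread count are hypothetical and not taken from the Geomesa or HBase code; the sketch only shows the ownership pattern, not the actual change in this PR.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Geomesa or HBase): owns a thread pool
// for the lifetime of one job and makes sure it is shut down afterwards.
final class OwnedPool implements AutoCloseable {
    private final ExecutorService pool;

    OwnedPool(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Hand this executor to whatever component needs an explicit pool.
    ExecutorService executor() {
        return pool;
    }

    @Override
    public void close() {
        pool.shutdown(); // stop accepting new tasks, let queued ones finish
        try {
            // The 30 s grace period is an arbitrary value chosen for the sketch.
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // interrupt tasks that are still running
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }
}
```

Wrapping the pool in `try (OwnedPool p = new OwnedPool(4)) { ... }` ties its lifetime to the job, so worker threads do not accumulate across runs. In HBase the executor would presumably be handed to an overload that accepts an `ExecutorService`, such as `Connection.getTable(TableName, ExecutorService)`; which overload fits HBaseIndexAdapter is an assumption here.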
Note that these changes have been running in Geomesa against Kerberos-secured Azure HDInsight HBase for more than a year on different workloads. We kept adding fixes as we ran into new situations. I believe the code could be improved a lot, but that would require rewriting a good deal of the original code and a new cycle of E2E tests.