Re: [geomesa-users] GeoMesa Docker EMR - Jupyter Notebook help
Thanks Jim -
That did not work. We ran docker exec -t -i accumulo-master find /opt -name '*.jar' and the file names there match the file names in geomesa_spark_scala/kernel.json. We are wondering about the appended -SNAPSHOT (our .jar files vs. yours).
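For what it's worth, one mechanical way to check this is to pull the --jars entries out of kernel.json and report any that do not exist on disk in the container. A minimal sketch, assuming the standard Toree layout (the kernel.json path and the env key __TOREE_SPARK_OPTS__ are assumptions here; adjust for your image):

```python
import json
import os
import re

# Standard Toree kernel spec location inside the jupyter container
# (an assumption -- adjust the path if your image differs).
KERNEL_JSON = ("/var/lib/hadoop-hdfs/.local/share/jupyter/kernels/"
               "geomesa_spark_scala/kernel.json")

def missing_jars(spark_opts):
    """Return the file:/// jars named in the Spark opts that are absent on disk."""
    jars = re.findall(r"file://(/\S+?\.jar)", spark_opts)
    return [jar for jar in jars if not os.path.exists(jar)]

if __name__ == "__main__":
    with open(KERNEL_JSON) as f:
        opts = json.load(f)["env"]["__TOREE_SPARK_OPTS__"]
    for jar in missing_jars(opts):
        print("missing:", jar)
```

Run inside the jupyter container; any jar it prints is one the kernel cannot ship to YARN, which would explain a -SNAPSHOT mismatch.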
To get GeoMesa (at least ingestion) working, as well as GeoServer, we had to adjust the bootstrap script. Perhaps that is where we went wrong? This was due to the following:
1. Access error to s3://geomesa-docker/bootstrap-geodocker-accumulo.sh. We can see the file via boto3 and the CLI ls:

    import boto3

    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket('geomesa-docker')
    for obj in my_bucket.objects.all():
        print(obj.key)

but we are unable to grab it via CLI cp or urllib.request.urlretrieve.
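One thing worth ruling out (an assumption on my part, not confirmed above): urllib.request.urlretrieve and an unauthenticated cp hit the bucket anonymously, so a bucket that only permits authenticated reads will return 403 to them even while the authenticated boto3 listing works. A sketch of an authenticated download, assuming boto3 can find credentials (env vars, ~/.aws, or the EMR instance profile):

```python
def s3_https_url(bucket, key):
    # The anonymous URL that urllib.request.urlretrieve would fetch; a 403
    # here, alongside a working authenticated listing, points at bucket
    # policy rather than a missing object.
    return "https://s3.amazonaws.com/{}/{}".format(bucket, key)

if __name__ == "__main__":
    import boto3  # imported here so the helper above stays dependency-free

    s3 = boto3.client("s3")
    s3.download_file("geomesa-docker", "bootstrap-geodocker-accumulo.sh",
                     "/tmp/bootstrap-geodocker-accumulo.sh")
```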
2. We adapted a copy from geowave-geomesa-comparative-analysis/analyze/bootstrap-geodocker-accumulo.sh. The relevant changes were:
IMAGE=quay.io/geomesa/accumulo-geomesa:latest
vs
IMAGE=quay.io/geodocker/accumulo:${TAG:-"latest"}
AND
DOCKER_OPT="-d --net=host --restart=always"
if is_master ; then
  docker pull $IMAGE
  docker pull quay.io/geomesa/geoserver:latest
  docker pull quay.io/geomesa/geomesa-jupyter:latest
  docker run $DOCKER_OPT --name=accumulo-master $DOCKER_ENV $IMAGE master --auto-init
  docker run $DOCKER_OPT --name=accumulo-monitor $DOCKER_ENV $IMAGE monitor
  docker run $DOCKER_OPT --name=accumulo-tracer $DOCKER_ENV $IMAGE tracer
  docker run $DOCKER_OPT --name=accumulo-gc $DOCKER_ENV $IMAGE gc
  docker run $DOCKER_OPT --name=geoserver quay.io/geomesa/geoserver:latest
  docker run $DOCKER_OPT --name=jupyter quay.io/geomesa/geomesa-jupyter:latest
else # is worker
  docker pull $IMAGE
  docker run -d --net=host --name=accumulo-tserver $DOCKER_ENV $IMAGE tserver
fi
Versus
DOCKER_OPT="-d --net=host --restart=always"
if is_master ; then
  docker run $DOCKER_OPT --name=accumulo-master $DOCKER_ENV $IMAGE master --auto-init
  docker run $DOCKER_OPT --name=accumulo-monitor $DOCKER_ENV $IMAGE monitor
  docker run $DOCKER_OPT --name=accumulo-tracer $DOCKER_ENV $IMAGE tracer
  docker run $DOCKER_OPT --name=accumulo-gc $DOCKER_ENV $IMAGE gc
  docker run $DOCKER_OPT --name=geoserver quay.io/geodocker/geoserver:latest
else # is worker
  docker run -d --net=host --name=accumulo-tserver $DOCKER_ENV $IMAGE tserver
fi
3. Bootstrap config changes were -i=quay.io/geomesa/accumulo-geomesa:latest, -n=gis, -p=secret, -e=TSERVER_XMX=10G, -e=TSERVER_CACHE_DATA_SIZE=6G, -e=TSERVER_CACHE_INDEX_SIZE=2G
4. Errors from Jupyter startup read as follows (IPs replaced with <IP>):
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
[I <IP> NotebookApp] Kernel started: 83a4cb2d-8004-4c69-ad21-ad46ac2b4a48
Starting Spark Kernel with SPARK_HOME=/usr/local/spark
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
(Scala,org.apache.toree.kernel.interpreter.scala.ScalaInterpreter@1bc715b8)
(PySpark,org.apache.toree.kernel.interpreter.pyspark.PySparkInterpreter@292d1c71)
(SparkR,org.apache.toree.kernel.interpreter.sparkr.SparkRInterpreter@2b491fee)
(SQL,org.apache.toree.kernel.interpreter.sql.SqlInterpreter@3f1c5af9)
17/02/15 15:44:00 WARN toree.Main$$anon$1: No external magics provided to PluginManager!
17/02/15 15:44:04 WARN layer.StandardComponentInitialization$$anon$1: Locked to Scala interpreter with SparkIMain until decoupled!
17/02/15 15:44:04 WARN layer.StandardComponentInitialization$$anon$1: Unable to control initialization of REPL class server!
[W 15:44:04.777 NotebookApp] Notebook GDELT+Analysis.ipynb is not trusted
[W 15:44:04.803 NotebookApp] 404 GET /nbextensions/widgets/notebook/js/extension.js?v=20170215154320 (<IP>) 2.94ms referer=http://ec2<IP>.compute-1.amazonaws.com:8890/notebooks/GDELT%2BAnalysis.ipynb
17/02/15 15:44:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[W 15:44:06.577 NotebookApp] Timeout waiting for kernel_info reply from 83a4cb2d-8004-4c69-ad21-ad46ac2b4a48
17/02/15 15:44:06 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/02/15 15:44:13 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
Any feedback is greatly appreciated.
Byron, Texas A&M Transportation Institute
From: geomesa-users-bounces@xxxxxxxxxxxxxxxx <geomesa-users-bounces@xxxxxxxxxxxxxxxx> on behalf of Jim Hughes <jnh5y@xxxxxxxx>
Sent: Tuesday, February 14, 2017 4:15 PM
To: geomesa-users@xxxxxxxxxxxxxxxx
Subject: Re: [geomesa-users] GeoMesa Docker EMR - Jupyter Notebook help
Hi Byron,
As it happens, I'm setting up a GeoMesa demo, and I have a quick fix.
You'll want to connect to the Jupyter docker (say, with 'docker exec -it jupyter /bin/sh') and edit this file: /var/lib/hadoop-hdfs/.local/share/jupyter/kernels/geomesa_spark_scala/kernel.json.
The line with the Toree Spark opts should read:
"__TOREE_SPARK_OPTS__": "--driver-java-options=-Xmx4096M --driver-java-options=-Dlog4j.logLevel=info --master yarn --jars file:///opt/geomesa/dist/spark/geomesa-accumulo-spark-runtime_2.11-1.3.0.jar,file:///opt/geomesa/dist/spark/geomesa-spark-converter_2.11-1.3.0.jar,file:///opt/geomesa/dist/spark/geomesa-spark-geotools_2.11-1.3.0.jar",
One of the jars changed names (from geomesa-accumulo-spark_2.11-1.3.0-shaded.jar to geomesa-accumulo-spark-runtime_2.11-1.3.0.jar). That difference caused the issues; I need to sort out re-building the Docker images.
Let me know if that doesn't sort it out!
Cheers,
Jim
On 02/14/2017 04:07 PM, Byron Chigoy wrote:
Hi - probably pretty basic, but we are able to get the Docker bootstrap tutorial working on AWS. We are pulling from https://quay.io/organization/geomesa . Once started, we can ingest the GDELT example and get the descriptive stats. We are also able to bring the GDELT example into GeoServer.
However, while Jupyter gets docked, the GeoMesa Spark - Scala kernel fails (it just says "kernel busy"). We started the notebook on another port to observe the error behavior and collected the errors. See below; any help or clues would be most appreciated.
(Scala,org.apache.toree.kernel.interpreter.scala.ScalaInterpreter@5ef0d29e)
(PySpark,org.apache.toree.kernel.interpreter.pyspark.PySparkInterpreter@38f57b3d)
(SparkR,org.apache.toree.kernel.interpreter.sparkr.SparkRInterpreter@51850751)
(SQL,org.apache.toree.kernel.interpreter.sql.SqlInterpreter@3ce3db41)
17/02/14 20:50:29 WARN toree.Main$$anon$1: No external magics provided to PluginManager!
17/02/14 20:50:32 WARN layer.StandardComponentInitialization$$anon$1: Locked to Scala interpreter with SparkIMain until decoupled!
17/02/14 20:50:32 WARN layer.StandardComponentInitialization$$anon$1: Unable to control initialization of REPL class server!
17/02/14 20:50:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[W 20:50:34.769 NotebookApp] Timeout waiting for kernel_info reply from fe9c2776-f5d7-47bc-b5dd-d2769f631f2f
17/02/14 20:50:35 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/02/14 20:50:42 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
Byron
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit https://dev.locationtech.org/mailman/listinfo/geomesa-users