Re: [geomesa-users] GeoMesa Docker EMR - Jupyter Notebook help

Thanks Jim -
That did not work. We ran docker exec -t -i accumulo-master find /opt -name '*.jar', and the file names there match those in geomesa_spark_scala/kernel.json. We are wondering about the appended -SNAPSHOT suffix (our .jar files vs. yours).
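A small sketch of that comparison, assuming you paste in the __TOREE_SPARK_OPTS__ string from kernel.json and the lines printed by a find command. It pulls out the jar file names after --jars and reports, for each one, an exact match on disk, a file that differs only by a -SNAPSHOT suffix, or None. The function names are hypothetical helpers, not part of GeoMesa or Toree.

```python
from __future__ import annotations

import os
import shlex


def jars_from_toree_opts(opts: str) -> list[str]:
    """Return the jar file names listed after --jars in a Toree Spark options string."""
    tokens = shlex.split(opts)
    jars: list[str] = []
    for i, tok in enumerate(tokens):
        if tok == "--jars" and i + 1 < len(tokens):
            value = tokens[i + 1]
        elif tok.startswith("--jars="):
            value = tok[len("--jars="):]
        else:
            continue
        for entry in value.split(","):
            entry = entry.strip()
            if entry.startswith("file://"):
                entry = entry[len("file://"):]  # drop the URI scheme
            if entry:
                jars.append(os.path.basename(entry))
    return jars


def compare_jars(expected: list[str], found: list[str]) -> dict[str, str | None]:
    """Map each expected jar to the matching file name on disk, or None.

    An exact name match wins; otherwise a file whose name differs only by a
    -SNAPSHOT suffix is reported so the mismatch is easy to spot.
    """
    found_names = {os.path.basename(f.strip()) for f in found if f.strip()}
    no_snapshot = {name.replace("-SNAPSHOT", ""): name for name in found_names}
    result: dict[str, str | None] = {}
    for jar in expected:
        if jar in found_names:
            result[jar] = jar
        else:
            result[jar] = no_snapshot.get(jar.replace("-SNAPSHOT", ""))
    return result
```

Since the file:/// paths are resolved by the kernel, which runs in the jupyter container, it may be worth feeding compare_jars the output of the same find run inside that container rather than in accumulo-master. A jar mapped to None, or only to a -SNAPSHOT variant, would mean kernel.json points at a file that does not exist under that exact name.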

To get GeoMesa (at least ingestion) and GeoServer working, we had to adjust the bootstrap script. Perhaps that is where we went wrong? The adjustments were needed for the following reasons:

1. Access error on s3://geomesa-docker/. We can list the bucket contents via boto and the CLI (aws s3 ls):
        import boto3
        my_bucket = boto3.resource('s3').Bucket('geomesa-docker')
        for obj in my_bucket.objects.all():
            print(obj.key)
However, we are unable to download the files via aws s3 cp or urllib.request.urlretrieve.
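One common reason a bucket can be listed but not read is that listing requires s3:ListBucket on the bucket, while downloading requires s3:GetObject on each object. Separately, urllib.request.urlretrieve has no handler for s3:// URLs, and unsigned HTTPS requests only succeed for publicly readable objects. The sketch below, assuming boto3 is installed and AWS credentials are configured, attempts a download and returns the S3 error code so these cases can be told apart. The helper names and the example key in the usage note are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlparse


def parse_s3_url(url: str) -> tuple[str, str]:
    """Split an s3://bucket/key URL into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def try_download(url: str, dest: str) -> str:
    """Attempt a download with boto3 and return 'ok' or the S3 error code."""
    import boto3  # imported lazily so parse_s3_url works without boto3 installed
    from botocore.exceptions import ClientError

    bucket, key = parse_s3_url(url)
    try:
        boto3.resource("s3").Bucket(bucket).download_file(key, dest)
        return "ok"
    except ClientError as err:
        # e.g. "403" when listing is allowed but reading the object is not
        return err.response.get("Error", {}).get("Code", "unknown")
```

For example, try_download("s3://geomesa-docker/<key>", "/tmp/out") returning "403" would suggest a missing read permission on the objects rather than a network problem.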

2. We adapted a copy of the bootstrap script from geowave-geomesa-comparative-analysis/analyze/. The relevant changes were:


DOCKER_OPT="-d --net=host --restart=always"
if is_master ; then
    docker pull $IMAGE
    docker pull
    docker pull
    docker run $DOCKER_OPT --name=accumulo-master $DOCKER_ENV $IMAGE master --auto-init
    docker run $DOCKER_OPT --name=accumulo-monitor $DOCKER_ENV $IMAGE monitor
    docker run $DOCKER_OPT --name=accumulo-tracer $DOCKER_ENV $IMAGE tracer
    docker run $DOCKER_OPT --name=accumulo-gc $DOCKER_ENV $IMAGE gc
    docker run $DOCKER_OPT --name=geoserver
    docker run $DOCKER_OPT --name=jupyter
else # is worker
    docker pull $IMAGE
    docker run -d --net=host --name=accumulo-tserver $DOCKER_ENV $IMAGE tserver
fi
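To confirm that each container started by the script above actually came up on a node, one quick check is to compare the names reported by docker ps --format '{{.Names}}' with the names the script uses. This is a minimal sketch; the expected-name lists are copied from the excerpt, and the helper names are hypothetical.

```python
from __future__ import annotations

import subprocess

# Container names copied from the bootstrap excerpt.
MASTER_CONTAINERS = ["accumulo-master", "accumulo-monitor", "accumulo-tracer",
                     "accumulo-gc", "geoserver", "jupyter"]
WORKER_CONTAINERS = ["accumulo-tserver"]


def missing_containers(ps_output: str, expected: list[str]) -> list[str]:
    """Return expected container names absent from `docker ps` name output."""
    running = {line.strip() for line in ps_output.splitlines() if line.strip()}
    return [name for name in expected if name not in running]


def running_names() -> str:
    """Run `docker ps` and return its output, one container name per line."""
    return subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        check=True, capture_output=True, text=True,
    ).stdout
```

On the master node, missing_containers(running_names(), MASTER_CONTAINERS) should return an empty list if everything started; on a worker, use WORKER_CONTAINERS instead.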


3. The bootstrap configuration arguments were: -n=gis, -p=secret, -e=TSERVER_XMX=10G, -e=TSERVER_CACHE_DATA_SIZE=6G, -e=TSERVER_CACHE_INDEX_SIZE=2G
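As a quick sanity check on these memory settings, assuming TSERVER_XMX sets the tablet server's Java heap and the two cache variables feed Accumulo's data and index block caches (which live on that heap), the caches take 6G + 2G = 8G of the 10G heap, leaving 2G for everything else. The sketch below does that arithmetic from the -e=KEY=VALUE arguments; how the image maps these variables onto Accumulo settings is an assumption, not something confirmed in this thread, and the helper names are hypothetical.

```python
from __future__ import annotations

# Binary multipliers for Java-style size suffixes.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a size such as '10G' or '512M' to bytes."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def env_from_args(args: list[str]) -> dict[str, str]:
    """Turn bootstrap arguments like '-e=KEY=VALUE' into a {KEY: VALUE} dict."""
    env: dict[str, str] = {}
    for arg in args:
        if arg.startswith("-e="):
            key, _, val = arg[len("-e="):].partition("=")
            env[key] = val
    return env


def cache_share_of_heap(env: dict[str, str]) -> float:
    """Fraction of TSERVER_XMX used by the data + index caches (assumed on-heap)."""
    heap = parse_size(env["TSERVER_XMX"])
    caches = (parse_size(env["TSERVER_CACHE_DATA_SIZE"])
              + parse_size(env["TSERVER_CACHE_INDEX_SIZE"]))
    return caches / heap
```

With the arguments listed in item 3, cache_share_of_heap returns 0.8.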

4. Errors from the Jupyter startup log (IPs replaced with <IP>):

bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
[I <IP> NotebookApp] Kernel started: 83a4cb2d-8004-4c69-ad21-ad46ac2b4a48
Starting Spark Kernel with SPARK_HOME=/usr/local/spark
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
17/02/15 15:44:00 WARN toree.Main$$anon$1: No external magics provided to PluginManager!
17/02/15 15:44:04 WARN layer.StandardComponentInitialization$$anon$1: Locked to Scala interpreter with SparkIMain until decoupled!
17/02/15 15:44:04 WARN layer.StandardComponentInitialization$$anon$1: Unable to control initialization of REPL class server!
[W 15:44:04.777 NotebookApp] Notebook GDELT+Analysis.ipynb is not trusted
[W 15:44:04.803 NotebookApp] 404 GET /nbextensions/widgets/notebook/js/extension.js?v=20170215154320 (<IP>) 2.94ms referer=http://ec2<IP>
17/02/15 15:44:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[W 15:44:06.577 NotebookApp] Timeout waiting for kernel_info reply from 83a4cb2d-8004-4c69-ad21-ad46ac2b4a48
17/02/15 15:44:06 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/02/15 15:44:13 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.

Any feedback is greatly appreciated.
Byron, Texas A&M Transportation Institute

From: geomesa-users-bounces@xxxxxxxxxxxxxxxx <geomesa-users-bounces@xxxxxxxxxxxxxxxx> on behalf of Jim Hughes <jnh5y@xxxxxxxx>
Sent: Tuesday, February 14, 2017 4:15 PM
To: geomesa-users@xxxxxxxxxxxxxxxx
Subject: Re: [geomesa-users] GeoMesa Docker EMR - Jupyter Notebook help
Hi Byron,

As it happens, I'm setting up a GeoMesa demo, and I have a quick fix.

You'll want to connect to the Jupyter container (say, with 'docker exec -it jupyter /bin/sh') and edit this file: /var/lib/hadoop-hdfs/.local/share/jupyter/kernels/geomesa_spark_scala/kernel.json.

The line with the Toree Spark options should read:

    "__TOREE_SPARK_OPTS__": "--driver-java-options=-Xmx4096M --driver-java-options=-Dlog4j.logLevel=info --master yarn --jars  file:///opt/geomesa/dist/spark/geomesa-accumulo-spark-runtime_2.11-1.3.0.jar,file:///opt/geomesa/dist/spark/geomesa-spark-converter_2.11-1.3.0.jar,file:///opt/geomesa/dist/spark/geomesa-spark-geotools_2.11-1.3.0.jar",

One of the jars changed names (from geomesa-accumulo-spark_2.11-1.3.0-shaded.jar to geomesa-accumulo-spark-runtime_2.11-1.3.0.jar). That difference caused the issues; I need to sort out rebuilding the Docker images.
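For anyone applying the same edit on several clusters, here is a small sketch that makes the jar swap programmatically from inside the jupyter container. It assumes the options string sits under the "env" object of kernel.json, as in typical Toree kernel specs, and uses the path and jar names given in this message; the helper names are hypothetical. Restarting the kernel afterwards is probably needed for the new options to take effect.

```python
from __future__ import annotations

import json
from pathlib import Path

# Path and jar names as given in the message.
KERNEL_JSON = Path("/var/lib/hadoop-hdfs/.local/share/jupyter/kernels/"
                   "geomesa_spark_scala/kernel.json")
OLD_JAR = "geomesa-accumulo-spark_2.11-1.3.0-shaded.jar"
NEW_JAR = "geomesa-accumulo-spark-runtime_2.11-1.3.0.jar"


def fix_toree_opts(spec: dict, old: str = OLD_JAR, new: str = NEW_JAR) -> bool:
    """Swap the renamed jar in env.__TOREE_SPARK_OPTS__; return True if changed."""
    env = spec.get("env", {})
    opts = env.get("__TOREE_SPARK_OPTS__", "")
    if old not in opts:
        return False
    env["__TOREE_SPARK_OPTS__"] = opts.replace(old, new)
    return True


def fix_kernel_file(path: Path = KERNEL_JSON) -> bool:
    """Load kernel.json, apply the fix, and write it back only if it changed."""
    spec = json.loads(path.read_text())
    changed = fix_toree_opts(spec)
    if changed:
        path.write_text(json.dumps(spec, indent=2))
    return changed
```

Calling fix_kernel_file() a second time is harmless: once the old jar name is gone it returns False and leaves the file untouched.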

Let me know if that doesn't sort it out!



On 02/14/2017 04:07 PM, Byron Chigoy wrote:
Hi - this is probably pretty basic. We are able to get the Docker bootstrap tutorial working on AWS. We are pulling from .  Once it is started, we can ingest the GDELT example and get the descriptive.  We are also able to bring the GDELT example into GeoServer.

However, although the Jupyter container starts, the GeoMesa Spark - Scala kernel fails (it just reports that the kernel is busy). We started the notebook on another port to observe the error behavior and collect the messages, which are listed below. Any help or clues would be most appreciated.

17/02/14 20:50:29 WARN toree.Main$$anon$1: No external magics provided to PluginManager!
17/02/14 20:50:32 WARN layer.StandardComponentInitialization$$anon$1: Locked to Scala interpreter with SparkIMain until decoupled!
17/02/14 20:50:32 WARN layer.StandardComponentInitialization$$anon$1: Unable to control initialization of REPL class server!
17/02/14 20:50:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[W 20:50:34.769 NotebookApp] Timeout waiting for kernel_info reply from fe9c2776-f5d7-47bc-b5dd-d2769f631f2f
17/02/14 20:50:35 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/02/14 20:50:42 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.


