Re: [geowave-dev] Geowave analytics options

Ok Eric, thanks. Yes, the data was loaded by a GeoWave 0.8.9 build from a few weeks ago. Do you see any gotchas with removing the current data using the Accumulo shell and then running a new ingest? I guess I need to upload the latest GeoWave jar to the Accumulo classpath as well before re-ingesting.
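For anyone following along, the clean-and-reingest path Scott describes might look roughly like the sketch below. This is only an illustration: the table name pattern, jar name, classpath location, and ingest flags are assumptions pieced together from this thread (namespace "geowave.geolife", 0.8.9-SNAPSHOT build), not verified commands for any particular deployment.

```shell
# 1. Drop the old GeoWave tables from the Accumulo shell.
#    (deletetable supports -p for a regex pattern; the pattern below is a guess
#    at the namespace prefix -- check `tables` first to see the real names.)
accumulo shell -u root -e "tables"
accumulo shell -u root -e "deletetable -f -p geowave.geolife.*"

# 2. Refresh the GeoWave iterator jar on the Accumulo classpath in HDFS
#    (jar name and destination path are illustrative):
hadoop fs -put -f geowave-accumulo-0.8.9-SNAPSHOT-accumulo-singlejar.jar \
    /accumulo/classpath/

# 3. Re-run the ingest against the freshly built jar (flags abbreviated;
#    use whatever ingest options were used for the original load):
geowave -localingest -b ./geolife-data -n geowave.geolife ...
```

Restarting the Accumulo tablet servers (or waiting for classloader refresh, depending on how the classpath is configured) may also be needed before the new iterators take effect.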

Cheers,

Scott

On Fri, Sep 11, 2015 at 9:54 AM, Eric Robertson <rwgdrummer@xxxxxxxxx> wrote:
This looks to me like the data was loaded from an older version of GeoWave.  There is a statistic that is missing.
I am finishing up an adjustment that will handle this more gracefully along with a few other optimizations to the GeoWaveInputFormat.

I am also in the middle of a DBScan refactor, fixing a few bugs and adding optimizations.



On Fri, Sep 11, 2015 at 9:45 AM, Derek Yeager <dcy2003@xxxxxxxxx> wrote:


---------- Forwarded message ----------
From: Scott <sctevl@xxxxxxxxx>
Date: Fri, Sep 11, 2015 at 9:13 AM
Subject: Re: [geowave-dev] Geowave analytics options
To: geowave-dev <geowave-dev@xxxxxxxxxxxxxxxx>


Rich,

  Thanks for the help. I grabbed the latest, recompiled the analytics jar, and ran again. I got a little further but still hit an error. Is there a better way to do this?

15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:java.library.path=:/usr/hdp/2.2.6.0-2800/hadoop/lib/native/Linux-amd64-64:/data/hdp/2.2.6.0-2800/hadoop/lib/native
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-573.3.1.el6.x86_64
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:user.name=root
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/sclark/geowave/analytics/mapreduce/target/munged
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=gilmith.nicc.noblis.org:2181 sessionTimeout=30000 watcher=org.apache.accumulo.fate.zookeeper.ZooSession$ZooWatcher@398d81fe
15/09/11 08:19:00 INFO zookeeper.ClientCnxn: Opening socket connection to server gilmith.nicc.noblis.org/172.18.151.210:2181. Will not attempt to authenticate using SASL (unknown error)
15/09/11 08:19:00 INFO zookeeper.ClientCnxn: Socket connection established to gilmith.nicc.noblis.org/172.18.151.210:2181, initiating session
15/09/11 08:19:00 INFO zookeeper.ClientCnxn: Session establishment complete on server gilmith.nicc.noblis.org/172.18.151.210:2181, sessionid = 0x24fbb590bba0001, negotiated timeout = 30000
15/09/11 08:19:02 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
15/09/11 08:19:02 INFO compress.CodecPool: Got brand-new compressor [.bz2]
15/09/11 08:19:02 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
15/09/11 08:19:02 INFO compress.CodecPool: Got brand-new compressor [.gz]
15/09/11 08:19:02 INFO compress.CodecPool: Got brand-new compressor [.lz4]
15/09/11 08:19:02 INFO compress.CodecPool: Got brand-new compressor [.snappy]
15/09/11 08:19:02 WARN mapreduce.GeoWaveAnalyticJobRunner: Compression with class org.apache.hadoop.io.compress.SnappyCodec
15/09/11 08:19:04 INFO impl.TimelineClientImpl: Timeline service address: http://gilmith.nicc.noblis.org:8188/ws/v1/timeline/
15/09/11 08:19:04 INFO client.RMProxy: Connecting to ResourceManager at gilmith.nicc.noblis.org/172.18.151.210:8050
15/09/11 08:19:30 WARN metadata.AbstractAccumuloPersistence: Object 'ROW_RANGE_SPATIAL_VECTOR_IDX' not found
15/09/11 08:19:30 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/root/.staging/job_1441722061750_0001
15/09/11 08:19:30 ERROR analytic.AnalyticCLIOperationDriver: Unable to run analytic job
java.lang.NullPointerException
    at mil.nga.giat.geowave.datastore.accumulo.mapreduce.input.GeoWaveInputFormat.getRangeMax(GeoWaveInputFormat.java:452)
    at mil.nga.giat.geowave.datastore.accumulo.mapreduce.input.GeoWaveInputFormat.getIntermediateSplits(GeoWaveInputFormat.java:516)
    at mil.nga.giat.geowave.datastore.accumulo.mapreduce.input.GeoWaveInputFormat.getSplits(GeoWaveInputFormat.java:393)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
    at mil.nga.giat.geowave.analytic.mapreduce.ToolRunnerMapReduceIntegration.waitForCompletion(ToolRunnerMapReduceIntegration.java:43)
    at mil.nga.giat.geowave.analytic.mapreduce.GeoWaveAnalyticJobRunner.run(GeoWaveAnalyticJobRunner.java:272)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at mil.nga.giat.geowave.analytic.mapreduce.ToolRunnerMapReduceIntegration.submit(ToolRunnerMapReduceIntegration.java:31)
    at mil.nga.giat.geowave.analytic.mapreduce.GeoWaveAnalyticJobRunner.run(GeoWaveAnalyticJobRunner.java:184)
    at mil.nga.giat.geowave.analytic.mapreduce.nn.NNJobRunner.run(NNJobRunner.java:80)
    at mil.nga.giat.geowave.analytic.mapreduce.dbscan.DBScanJobRunner.run(DBScanJobRunner.java:180)
    at mil.nga.giat.geowave.analytic.mapreduce.dbscan.DBScanIterationsJobRunner.run(DBScanIterationsJobRunner.java:123)
    at mil.nga.giat.geowave.analytic.mapreduce.dbscan.DBScanIterationsJobRunner.run(DBScanIterationsJobRunner.java:259)
    at mil.nga.giat.geowave.analytic.AnalyticCLIOperationDriver.run(AnalyticCLIOperationDriver.java:62)
    at mil.nga.giat.geowave.core.cli.GeoWaveMain.main(GeoWaveMain.java:48)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)


On Wed, Sep 9, 2015 at 4:57 PM, Rich Fecher <rfecher@xxxxxxxxx> wrote:
Thanks for pointing that out, Scott. I actually haven't run DBScan myself using the shaded jar produced by activating the 'analytics-singlejar' profile, but it looks to me, at least regarding your experience with an unrecognized CLI operation, like there's an issue with running analytics through GeoWaveMain in 0.8.9-SNAPSHOT.  The analytics operations simply don't seem to be provided within META-INF/services.  It's an easy fix at least; I just committed the file with the SPI operation provider that should get you past this error.
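As a sanity check after rebuilding, one can confirm the SPI registration actually made it into the shaded jar by listing its META-INF/services entries. The exact provider interface file name isn't given in this thread, so the grep below is deliberately generic; the jar name matches the one from Scott's command.

```shell
# List the SPI registration files packaged in the shaded analytics jar.
# If the -dbscan operation's provider was shaded in, a services entry
# for the CLI operation provider interface should appear here:
unzip -l geowave-analytic-mapreduce-0.8.9-SNAPSHOT-analytics-singlejar.jar \
    | grep 'META-INF/services'

# Once the file name is known, inspect which provider classes it declares:
# unzip -p <jar> META-INF/services/<provider.interface.FQCN>
```

If the entry is missing, the shade plugin likely dropped or failed to merge the services file, which would reproduce the "Unrecognized option: -dbscan" symptom.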

I do know Eric's been running DBScan recently, but probably through a different means than this? Eric, should he just be using a different packaging?

We are trying to move toward everything including the analytics running through GeoWaveMain, but we aren't fully there yet (apparently).  Let us know if this works for you.

Rich

On Wed, Sep 9, 2015 at 4:09 PM, Scott <sctevl@xxxxxxxxx> wrote:
Hello,

  I tried out the sample command to run the dbscan analytic against the geolife tables I loaded into GeoWave. However, when I run the command: 

yarn jar geowave-analytic-mapreduce-0.8.9-SNAPSHOT-analytics-singlejar.jar -dbscan -n geowave.geolife -u geowave -p hadoop -z FQDN:2181 -i accumulo -emn 2 -emx 6 -pd 1000 -pc mil.nga.giat.geowave.analytic.partitioner.OrthodromicDistancePartitioner -cms 10 -orc 4 -hdfsbase /user/geolife -b bdb4 -eit geolife


I get the following: ERROR cli.GeoWaveMain: Unable to parse operation

org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -dbscan


Is dbscan a different option now? Is there a list of the proper options for all of the analytics (such as KMeans, Nearest Neighbors, etc)? Am I just overlooking something in the docs?

Thanks,

 Scott


_______________________________________________
geowave-dev mailing list
geowave-dev@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.locationtech.org/mailman/listinfo/geowave-dev







