Re: [geowave-dev] Geowave analytics options

Eric,

  Apologies for taking a bit to get back to this thread. I ended up rebuilding the namenode and re-initializing Accumulo. Anyway, I got everything up and running, built the GeoWave jars from a clone I made today, and ran an ingest of the Natural Earth data and a portion of the GeoLife data. I stopped the ingest at 56M records just to conserve resources, so if the end of the ingest involves calculating additional statistics, I wouldn't have those in this case. Let me know if that's a problem or if I'm missing any other steps.

  I ran -dbscan again and got the index error again. Should this index appear as a separate table in Accumulo?

  I have the following tables in Accumulo:

geowave.geolife_GEOWAVE_METADATA (3 entries)
geowave.geolife_SPATIAL_TEMPORAL_VECTOR_IDX (49.95M entries)
geowave.geolife_SPATIAL_TEMPORAL_VECTOR_IDX_GEOWAVE_ALT_INDEX (7.15M entries)


15/09/21 13:09:20 WARN metadata.AbstractAccumuloPersistence: Object 'geowave.geolife_ROW_RANGE_SPATIAL_TEMPORAL_VECTOR_IDX_SPATIAL_TEMPORAL_VECTOR_IDX' not found
15/09/21 13:09:20 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/root/.staging/job_1442850766093_0001
15/09/21 13:09:20 ERROR analytic.AnalyticCLIOperationDriver: Unable to run analytic job
java.lang.NullPointerException
    at mil.nga.giat.geowave.datastore.accumulo.mapreduce.input.GeoWaveInputFormat.getRangeMax(GeoWaveInputFormat.java:526)
    at mil.nga.giat.geowave.datastore.accumulo.mapreduce.input.GeoWaveInputFormat.getIntermediateSplits(GeoWaveInputFormat.java:565)
    at mil.nga.giat.geowave.datastore.accumulo.mapreduce.input.GeoWaveInputFormat.getSplits(GeoWaveInputFormat.java:415)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
    at mil.nga.giat.geowave.analytic.mapreduce.ToolRunnerMapReduceIntegration.waitForCompletion(ToolRunnerMapReduceIntegration.java:43)
    at mil.nga.giat.geowave.analytic.mapreduce.GeoWaveAnalyticJobRunner.run(GeoWaveAnalyticJobRunner.java:272)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at mil.nga.giat.geowave.analytic.mapreduce.ToolRunnerMapReduceIntegration.submit(ToolRunnerMapReduceIntegration.java:31)
    at mil.nga.giat.geowave.analytic.mapreduce.GeoWaveAnalyticJobRunner.run(GeoWaveAnalyticJobRunner.java:184)
    at mil.nga.giat.geowave.analytic.mapreduce.nn.NNJobRunner.run(NNJobRunner.java:80)
    at mil.nga.giat.geowave.analytic.mapreduce.dbscan.DBScanJobRunner.run(DBScanJobRunner.java:180)
    at mil.nga.giat.geowave.analytic.mapreduce.dbscan.DBScanIterationsJobRunner.run(DBScanIterationsJobRunner.java:123)
    at mil.nga.giat.geowave.analytic.mapreduce.dbscan.DBScanIterationsJobRunner.run(DBScanIterationsJobRunner.java:259)
    at mil.nga.giat.geowave.analytic.AnalyticCLIOperationDriver.run(AnalyticCLIOperationDriver.java:62)
    at mil.nga.giat.geowave.core.cli.GeoWaveMain.main(GeoWaveMain.java:48)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)


On Fri, Sep 11, 2015 at 11:26 AM, Eric Robertson <rwgdrummer@xxxxxxxxx> wrote:
Yes. Drop the table and re-ingest, placing the new GeoWave jar in the classpath beforehand.

On Fri, Sep 11, 2015 at 10:06 AM, Scott <sctevl@xxxxxxxxx> wrote:
OK Eric, thanks. Yes, the data was loaded by a GeoWave 0.8.9 build from a few weeks ago. Do you see any gotchas with removing the current data using the Accumulo shell and then running a new ingest? I guess I need to upload the latest GeoWave jar into the Accumulo classpath as well before re-ingesting.

Cheers,

Scott

On Fri, Sep 11, 2015 at 9:54 AM, Eric Robertson <rwgdrummer@xxxxxxxxx> wrote:
This looks to me like the data was loaded from an older version of GeoWave. There is a statistic that is missing.
I am finishing up an adjustment that will handle this more gracefully, along with a few other optimizations to the GeoWaveInputFormat.

I am also in the middle of a DBScan refactor, fixing a few bugs and adding optimizations.
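For anyone following along, the NullPointerException at GeoWaveInputFormat.getRangeMax is consistent with Eric's diagnosis: the input format looks up a row-range statistic persisted at ingest time, and data ingested by an older build never wrote it. The sketch below is hypothetical (not GeoWave's actual code; the class, keys, and fallback are all illustrative) but shows the fragile pattern and a more graceful alternative:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the failure mode: a split calculator that
// dereferences a persisted per-index statistic without a null check.
public class RangeStats {
    private final Map<String, Long> stats = new HashMap<>();

    public void put(String indexId, long max) {
        stats.put("ROW_RANGE_" + indexId, max);
    }

    // Fragile version: if the statistic was never persisted, get() returns
    // null and auto-unboxing to long throws a NullPointerException.
    public long getRangeMaxUnsafe(String indexId) {
        return stats.get("ROW_RANGE_" + indexId);
    }

    // Defensive version: fall back to a conservative full-range default
    // when the statistic is absent (e.g. data from an older ingest).
    public long getRangeMax(String indexId) {
        Long max = stats.get("ROW_RANGE_" + indexId);
        return (max != null) ? max : Long.MAX_VALUE;
    }
}
```

The fallback trades split quality for robustness: without the statistic the job can still run, just with less optimal input splits.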



On Fri, Sep 11, 2015 at 9:45 AM, Derek Yeager <dcy2003@xxxxxxxxx> wrote:


---------- Forwarded message ----------
From: Scott <sctevl@xxxxxxxxx>
Date: Fri, Sep 11, 2015 at 9:13 AM
Subject: Re: [geowave-dev] Geowave analytics options
To: geowave-dev <geowave-dev@xxxxxxxxxxxxxxxx>


Rich,

  Thanks for the help. I grabbed the latest, recompiled the analytics jar, and ran again. I got a little further but still hit an error. Is there a better way to do this?

15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:java.library.path=:/usr/hdp/2.2.6.0-2800/hadoop/lib/native/Linux-amd64-64:/data/hdp/2.2.6.0-2800/hadoop/lib/native
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-573.3.1.el6.x86_64
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:user.name=root
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/sclark/geowave/analytics/mapreduce/target/munged
15/09/11 08:18:59 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=gilmith.nicc.noblis.org:2181 sessionTimeout=30000 watcher=org.apache.accumulo.fate.zookeeper.ZooSession$ZooWatcher@398d81fe
15/09/11 08:19:00 INFO zookeeper.ClientCnxn: Opening socket connection to server gilmith.nicc.noblis.org/172.18.151.210:2181. Will not attempt to authenticate using SASL (unknown error)
15/09/11 08:19:00 INFO zookeeper.ClientCnxn: Socket connection established to gilmith.nicc.noblis.org/172.18.151.210:2181, initiating session
15/09/11 08:19:00 INFO zookeeper.ClientCnxn: Session establishment complete on server gilmith.nicc.noblis.org/172.18.151.210:2181, sessionid = 0x24fbb590bba0001, negotiated timeout = 30000
15/09/11 08:19:02 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
15/09/11 08:19:02 INFO compress.CodecPool: Got brand-new compressor [.bz2]
15/09/11 08:19:02 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
15/09/11 08:19:02 INFO compress.CodecPool: Got brand-new compressor [.gz]
15/09/11 08:19:02 INFO compress.CodecPool: Got brand-new compressor [.lz4]
15/09/11 08:19:02 INFO compress.CodecPool: Got brand-new compressor [.snappy]
15/09/11 08:19:02 WARN mapreduce.GeoWaveAnalyticJobRunner: Compression with class org.apache.hadoop.io.compress.SnappyCodec
15/09/11 08:19:04 INFO impl.TimelineClientImpl: Timeline service address: http://gilmith.nicc.noblis.org:8188/ws/v1/timeline/
15/09/11 08:19:04 INFO client.RMProxy: Connecting to ResourceManager at gilmith.nicc.noblis.org/172.18.151.210:8050
15/09/11 08:19:30 WARN metadata.AbstractAccumuloPersistence: Object 'ROW_RANGE_SPATIAL_VECTOR_IDX' not found
15/09/11 08:19:30 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/root/.staging/job_1441722061750_0001
15/09/11 08:19:30 ERROR analytic.AnalyticCLIOperationDriver: Unable to run analytic job
java.lang.NullPointerException
    at mil.nga.giat.geowave.datastore.accumulo.mapreduce.input.GeoWaveInputFormat.getRangeMax(GeoWaveInputFormat.java:452)
    at mil.nga.giat.geowave.datastore.accumulo.mapreduce.input.GeoWaveInputFormat.getIntermediateSplits(GeoWaveInputFormat.java:516)
    at mil.nga.giat.geowave.datastore.accumulo.mapreduce.input.GeoWaveInputFormat.getSplits(GeoWaveInputFormat.java:393)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
    at mil.nga.giat.geowave.analytic.mapreduce.ToolRunnerMapReduceIntegration.waitForCompletion(ToolRunnerMapReduceIntegration.java:43)
    at mil.nga.giat.geowave.analytic.mapreduce.GeoWaveAnalyticJobRunner.run(GeoWaveAnalyticJobRunner.java:272)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at mil.nga.giat.geowave.analytic.mapreduce.ToolRunnerMapReduceIntegration.submit(ToolRunnerMapReduceIntegration.java:31)
    at mil.nga.giat.geowave.analytic.mapreduce.GeoWaveAnalyticJobRunner.run(GeoWaveAnalyticJobRunner.java:184)
    at mil.nga.giat.geowave.analytic.mapreduce.nn.NNJobRunner.run(NNJobRunner.java:80)
    at mil.nga.giat.geowave.analytic.mapreduce.dbscan.DBScanJobRunner.run(DBScanJobRunner.java:180)
    at mil.nga.giat.geowave.analytic.mapreduce.dbscan.DBScanIterationsJobRunner.run(DBScanIterationsJobRunner.java:123)
    at mil.nga.giat.geowave.analytic.mapreduce.dbscan.DBScanIterationsJobRunner.run(DBScanIterationsJobRunner.java:259)
    at mil.nga.giat.geowave.analytic.AnalyticCLIOperationDriver.run(AnalyticCLIOperationDriver.java:62)
    at mil.nga.giat.geowave.core.cli.GeoWaveMain.main(GeoWaveMain.java:48)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)


On Wed, Sep 9, 2015 at 4:57 PM, Rich Fecher <rfecher@xxxxxxxxx> wrote:
Thanks for pointing that out, Scott. I actually haven't run DBScan myself using the shaded jar produced by activating the 'analytics-singlejar' profile, but at least regarding your experience with an unrecognized CLI operation, it looks like there's an issue with running analytics through GeoWaveMain in 0.8.9-SNAPSHOT. The analytics operations simply aren't provided within META-INF/services. It's a very easy fix, at least; I just committed the file with the SPI operation provider, which should get you past this error.
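For context on the fix Rich describes: GeoWaveMain discovers CLI operations via Java's ServiceLoader mechanism, which reads provider class names from files under META-INF/services. The self-contained demo below illustrates the mechanism only; the OperationProvider interface and DbScanProvider class are made-up names, not GeoWave's actual types:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.ServiceLoader;

// Minimal demonstration of the SPI pattern: without the
// META-INF/services registration file, ServiceLoader finds nothing,
// which is exactly why -dbscan was an "unrecognized option".
public class SpiDemo {
    public interface OperationProvider {
        String name();
    }

    public static class DbScanProvider implements OperationProvider {
        public String name() { return "dbscan"; }
    }

    public static List<String> discoverNames() throws Exception {
        // Simulate a jar's META-INF/services entry on disk.
        Path root = Files.createTempDirectory("spi-demo");
        Path services = root.resolve("META-INF/services");
        Files.createDirectories(services);
        Files.write(services.resolve(OperationProvider.class.getName()),
                Collections.singletonList(DbScanProvider.class.getName()));

        // Load services from a classpath that includes that directory.
        URLClassLoader loader = new URLClassLoader(
                new URL[] { root.toUri().toURL() },
                SpiDemo.class.getClassLoader());
        List<String> names = new ArrayList<>();
        for (OperationProvider p : ServiceLoader.load(OperationProvider.class, loader)) {
            names.add(p.name());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("discovered operations: " + discoverNames());
    }
}
```

The point is simply that the registration file, not the code itself, drives discovery: if the shaded jar omits it, the operation classes can be present yet invisible to the CLI.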

I do know Eric has been running DBScan recently, but probably through a different means than this. Eric, should he just be using a different packaging?

We are trying to move toward everything, including the analytics, running through GeoWaveMain, but we aren't fully there yet (apparently). Let us know if this works for you.

Rich

On Wed, Sep 9, 2015 at 4:09 PM, Scott <sctevl@xxxxxxxxx> wrote:
Hello,

  I tried out the sample command to run the DBScan analytic against the GeoLife tables I loaded into GeoWave. However, when I run the command:

yarn jar geowave-analytic-mapreduce-0.8.9-SNAPSHOT-analytics-singlejar.jar  -dbscan  -n geowave.geolife -u geowave -p hadoop -z FQDN:2181 -i accumulo -emn 2 -emx 6 -pd 1000 -pc mil.nga.giat.geowave.analytic.partitioner.OrthodromicDistancePartitioner -cms 10 -orc 4 -hdfsbase /user/geolife -b bdb4 -eit geolife


I get the following: ERROR cli.GeoWaveMain: Unable to parse operation
org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -dbscan


Is dbscan a different option now? Is there a list of the proper options for all of the analytics (such as KMeans, Nearest Neighbors, etc.)? Am I just overlooking something in the docs?

Thanks,

 Scott


_______________________________________________
geowave-dev mailing list
geowave-dev@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.locationtech.org/mailman/listinfo/geowave-dev


