Re: [geomesa-users] Geomesa Spark Java API

Hi, José.

It appears some of our META-INF/services files don't terminate with newlines. We haven't run into problems with this before, but it looks like it is what's causing your error.
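To illustrate what goes wrong, here is a minimal, self-contained sketch (the class names are invented for the demo) of two service files being concatenated when the first one lacks a trailing newline:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;

public class NewlineDemo {
    public static void main(String[] args) throws Exception {
        // Simulate a shade merge of two META-INF/services files where the
        // first file does not end with a newline.
        try (FileWriter w = new FileWriter("merged-services")) {
            w.write("com.example.FactoryA");   // no trailing '\n'
            w.write("com.example.FactoryB\n");
        }
        // ServiceLoader reads one provider class name per line, so the two
        // names fuse into a single name that resolves to no class.
        try (BufferedReader r = new BufferedReader(new FileReader("merged-services"))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);      // com.example.FactoryAcom.example.FactoryB
            }
        }
    }
}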

Would you mind sharing your development environment?

OS?
JDK?
Maven version?
maven-shade-plugin version?

Thanks,

Tom

On Mar 16, 2017, at 11:41 AM, Jose Bujalance <joseab56@xxxxxxxxx> wrote:

Actually, I already had that block in my shade plugin configuration, but I am still getting those long lines in my services files. The good part is that when I edit them manually after the jar has been generated, everything works perfectly, and I get the same result as with the Scala code.
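In case it helps anyone else, this is roughly how I do the manual fix (the jar name here is just an example):

> jar xf my-shaded-app.jar META-INF/services/org.geotools.filter.expression.PropertyAccessorFactory
(edit the extracted file so that each provider class name sits on its own line)
> jar uf my-shaded-app.jar META-INF/services/org.geotools.filter.expression.PropertyAccessorFactory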

2017-03-16 16:33 GMT+01:00 Jose Bujalance <joseab56@xxxxxxxxx>:
Hi Jim,

Thanks for your answer. You are right! This is what I found in the generated META-INF/services/org.geotools.filter.expression.PropertyAccessorFactory:

org.locationtech.geomesa.convert.cql.ArrayPropertyAccessorFactoryorg.locationtech.geomesa.features.kryo.json.JsonPropertyAccessorFactory
org.geotools.filter.expression.SimpleFeaturePropertyAccessorFactory
org.geotools.filter.expression.ThisPropertyAccessorFactory
org.geotools.filter.expression.DirectPropertyAccessorFactory

And now that you mention it, I had the exact same problem with META-INF/services/org.locationtech.geomesa.spark.SpatialRDDProvider, which I modify manually every time I generate the jar (you can guess I am not very good with Maven ^^).

So, is this solved 'automatically' by adding the following block to the shade plugin in my pom?

                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                            </transformers>
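For reference, this is roughly where that block sits in my shade plugin configuration (the plugin version and execution details below are illustrative, not copied verbatim from my pom):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.3</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <!-- merges same-named META-INF/services files from all
                         shaded dependencies instead of keeping only one -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>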
Thank you,
José

2017-03-16 15:46 GMT+01:00 Jim Hughes <jnh5y@xxxxxxxx>:
Hi José,

Does your project bundle a fat jar for use in Spark? You may need to add a block to the maven-shade-plugin (or whichever plugin you are using) to correctly merge the META-INF/services entries for org.geotools.filter.expression.PropertyAccessorFactory [1].

You can check out your jar with something like...

> jar xvf ./org/locationtech/geomesa/geomesa-convert-common_2.11/1.3.1/geomesa-convert-common_2.11-1.3.1.jar META-INF/services/org.geotools.filter.expression.PropertyAccessorFactory
> more META-INF/services/org.geotools.filter.expression.PropertyAccessorFactory
org.locationtech.geomesa.convert.cql.ArrayPropertyAccessorFactory

My guess is that your jar has one long line with two factories on it...
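If so, a sketch of the lookup that then fails (PropertyAccessorFactory is the real GeoTools interface, but the harness around it is illustrative):

import java.util.ServiceLoader;

import org.geotools.filter.expression.PropertyAccessorFactory;

public class LoaderCheck {
    public static void main(String[] args) {
        // Iterating forces ServiceLoader to parse each line of the merged
        // services file and resolve it with Class.forName; a fused provider
        // name raises the "Provider ... not found" ServiceConfigurationError.
        for (PropertyAccessorFactory f : ServiceLoader.load(PropertyAccessorFactory.class)) {
            System.out.println(f.getClass().getName());
        }
    }
}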

As another note, GeoMesa 1.3.x depends on GeoTools 15.1.  It might work with later versions of GeoTools.
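If you want to rule out version skew entirely, you could pin gt-main to the matching release, e.g. (only the version changes relative to the dependency you already declare):

<dependency>
    <groupId>org.geotools</groupId>
    <artifactId>gt-main</artifactId>
    <version>15.1</version>
</dependency>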

Cheers,

Jim

[1] https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-spark-runtime/pom.xml#L138-L140



On 03/16/2017 10:10 AM, Jose Bujalance wrote:
Hi again,

I am trying the new Java API for GeoMesa Spark provided in version 1.3.1, but I am having some trouble.

First of all, I have verified that everything works fine when querying my Accumulo datastore through the Spark shell using geomesa-accumulo-spark-runtime_2.11-1.3.1.jar. This is what my Scala code looks like:

import org.apache.hadoop.conf.Configuration
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import org.locationtech.geomesa.spark._
import org.geotools.data.{DataStoreFinder, Query}
import org.geotools.factory.CommonFactoryFinder
import org.geotools.filter.text.ecql.ECQL
import scala.collection.JavaConversions._

// Accumulo datastore params
val params = Map(
  "instanceId" -> "hdp-accumulo-instance",
  "user"       -> "root",
  "password"   -> "praxedo",
  "tableName"  -> "Geoloc_Praxedo"
)

// set the configuration on the existing SparkContext
val conf = sc.getConf
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", classOf[GeoMesaSparkKryoRegistrator].getName)
val sc = SparkContext.getOrCreate(conf)

// create RDD with a geospatial query using Geomesa functions
val spatialRDDProvider = GeoMesaSpark(params)
val filter = ECQL.toFilter("BBOX(coords, 2.249294, 48.815215, 2.419337, 48.904295)")
val query = new Query("history_1M", filter)
val resultRDD = spatialRDDProvider.rdd(new Configuration, sc, params, query)

resultRDD.count

This code works fine, giving the expected result.
Now I am trying to do the same thing in Java. This is what my code looks like:

package com.praxedo.geomesa.geomesa_spark;

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.geotools.data.Query;
import org.geotools.filter.text.cql2.CQLException;
import org.geotools.filter.text.ecql.ECQL;
import org.locationtech.geomesa.spark.api.java.*;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class Test {

    private final static String ACCUMULO_INSTANCE = "hdp-accumulo-instance";
    private final static String ACCUMULO_ZOOKEEPERS = "hdf-sb-a.praxedo.net:2181,hdf-sb-b.praxedo.net:2181";
    private final static String ACCUMULO_USER = "root";
    private final static String ACCUMULO_PASSWORD = "password";
    private final static String GEOMESA_CATALOG = "Geoloc_Praxedo";
    private final static String GEOMESA_FEATURE = "history_1M";

    public static void main(String[] args) throws IOException, CQLException {
        // Spark configuration
        SparkConf conf = new SparkConf().setAppName("MyAppName").setMaster("local[*]");
        conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        conf.set("spark.kryo.registrator", "org.locationtech.geomesa.spark.GeoMesaSparkKryoRegistrator");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // Datastore configuration
        Map<String, String> parameters = new HashMap<>();
        parameters.put("instanceId", ACCUMULO_INSTANCE);
        parameters.put("zookeepers", ACCUMULO_ZOOKEEPERS);
        parameters.put("user", ACCUMULO_USER);
        parameters.put("password", ACCUMULO_PASSWORD);
        parameters.put("tableName", GEOMESA_CATALOG);

        JavaSpatialRDDProvider provider = JavaGeoMesaSpark.apply(parameters);
        String predicate = "BBOX(coords, 2.249294, 48.815215, 2.419337, 48.904295)";
        Query query = new Query(GEOMESA_FEATURE, ECQL.toFilter(predicate));
        JavaSpatialRDD resultRDD = provider.rdd(new Configuration(), jsc, parameters, query);

        System.out.println("Number of records: " + resultRDD.count());
        System.out.println("First record: " + resultRDD.first());
    }
}

And here are the dependencies I am importing with Maven:

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>1.7.0</version>
</dependency>

<dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-fate</artifactId>
    <version>1.7.0</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.0</version>
</dependency>

<dependency>
    <groupId>org.locationtech.geomesa</groupId>
    <artifactId>geomesa-spark-core_2.11</artifactId>
    <version>1.3.1</version>
</dependency>

<dependency>
    <groupId>org.locationtech.geomesa</groupId>
    <artifactId>geomesa-spark-converter_2.11</artifactId>
    <version>1.3.1</version>
</dependency>

<dependency>
    <groupId>org.locationtech.geomesa</groupId>
    <artifactId>geomesa-security_2.11</artifactId>
    <version>1.3.1</version>
</dependency>

<dependency>
    <groupId>org.locationtech.geomesa</groupId>
    <artifactId>geomesa-spark-geotools_2.11</artifactId>
    <version>1.3.1</version>
</dependency>

<dependency>
    <groupId>org.locationtech.geomesa</groupId>
    <artifactId>geomesa-spark-sql_2.11</artifactId>
    <version>1.3.1</version>
</dependency>

<dependency>
    <groupId>org.locationtech.geomesa</groupId>
    <artifactId>geomesa-accumulo-datastore_2.11</artifactId>
    <version>1.3.1</version>
</dependency>

<dependency>
    <groupId>org.locationtech.geomesa</groupId>
    <artifactId>geomesa-accumulo-spark_2.11</artifactId>
    <version>1.3.1</version>
</dependency>

<dependency>
    <groupId>org.locationtech.geomesa</groupId>
    <artifactId>geomesa-utils_2.11</artifactId>
    <version>1.3.1</version>
</dependency>

<dependency>
    <groupId>org.locationtech.geomesa</groupId>
    <artifactId>geomesa-index-api_2.11</artifactId>
    <version>1.3.1</version>
</dependency>

<dependency>
    <groupId>org.geotools</groupId>
    <artifactId>gt-main</artifactId>
    <version>16.1</version>
</dependency>


I have successfully built the jar, but when I launch it on my cluster I get the following error:

Exception in thread "main" java.util.ServiceConfigurationError: org.geotools.filter.expression.PropertyAccessorFactory: Provider org.locationtech.geomesa.convert.cql.ArrayPropertyAccessorFactoryorg.locationtech.geomesa.features.kryo.json.JsonPropertyAccessorFactory not found
        at java.util.ServiceLoader.fail(ServiceLoader.java:239)
        at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
        at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:372)
        at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
        at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
        at org.geotools.filter.expression.PropertyAccessors.<clinit>(PropertyAccessors.java:51)
        at org.geotools.filter.AttributeExpressionImpl.evaluate(AttributeExpressionImpl.java:213)
        at org.geotools.filter.AttributeExpressionImpl.evaluate(AttributeExpressionImpl.java:189)
        at org.geotools.filter.FilterAttributeExtractor.visit(FilterAttributeExtractor.java:130)
        at org.geotools.filter.AttributeExpressionImpl.accept(AttributeExpressionImpl.java:340)
        at org.geotools.filter.visitor.DefaultFilterVisitor.visit(DefaultFilterVisitor.java:214)
        at org.geotools.filter.spatial.BBOXImpl.accept(BBOXImpl.java:224)
        at org.geotools.data.DataUtilities.propertyNames(DataUtilities.java:413)
        at org.locationtech.geomesa.filter.FilterHelper$.propertyNames(FilterHelper.scala:469)
        at org.locationtech.geomesa.filter.visitor.FilterExtractingVisitor.keep(FilterExtractingVisitor.scala:44)
        at org.locationtech.geomesa.filter.visitor.FilterExtractingVisitor.visit(FilterExtractingVisitor.scala:133)
        at org.geotools.filter.spatial.BBOXImpl.accept(BBOXImpl.java:224)
        at org.locationtech.geomesa.filter.visitor.FilterExtractingVisitor$.apply(FilterExtractingVisitor.scala:28)
        at org.locationtech.geomesa.index.strategies.SpatioTemporalFilterStrategy$class.getFilterStrategy(SpatioTemporalFilterStrategy.scala:37)
        at org.locationtech.geomesa.accumulo.index.Z3Index$.getFilterStrategy(Z3Index.scala:21)
        at org.locationtech.geomesa.index.api.FilterSplitter$$anonfun$5.apply(FilterSplitter.scala:122)
        at org.locationtech.geomesa.index.api.FilterSplitter$$anonfun$5.apply(FilterSplitter.scala:122)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
        at scala.collection.immutable.List.flatMap(List.scala:344)
        at org.locationtech.geomesa.index.api.FilterSplitter.org$locationtech$geomesa$index$api$FilterSplitter$$getSimpleQueryOptions(FilterSplitter.scala:122)
        at org.locationtech.geomesa.index.api.FilterSplitter.getQueryOptions(FilterSplitter.scala:104)
        at org.locationtech.geomesa.index.api.StrategyDecider$$anonfun$1.apply(StrategyDecider.scala:52)
        at org.locationtech.geomesa.index.api.StrategyDecider$$anonfun$1.apply(StrategyDecider.scala:52)
        at org.locationtech.geomesa.utils.stats.MethodProfiling$class.profile(MethodProfiling.scala:26)
        at org.locationtech.geomesa.index.api.StrategyDecider.profile(StrategyDecider.scala:18)
        at org.locationtech.geomesa.index.api.StrategyDecider.getFilterPlan(StrategyDecider.scala:52)
        at org.locationtech.geomesa.index.api.QueryPlanner$$anonfun$4.apply(QueryPlanner.scala:135)
        at org.locationtech.geomesa.index.api.QueryPlanner$$anonfun$4.apply(QueryPlanner.scala:114)
        at org.locationtech.geomesa.utils.stats.MethodProfiling$class.profile(MethodProfiling.scala:26)
        at org.locationtech.geomesa.index.api.QueryPlanner.profile(QueryPlanner.scala:43)
        at org.locationtech.geomesa.index.api.QueryPlanner.getQueryPlans(QueryPlanner.scala:114)
        at org.locationtech.geomesa.index.api.QueryPlanner.planQuery(QueryPlanner.scala:61)
        at org.locationtech.geomesa.index.geotools.GeoMesaDataStore.getQueryPlan(GeoMesaDataStore.scala:464)
        at org.locationtech.geomesa.accumulo.data.AccumuloDataStore.getQueryPlan(AccumuloDataStore.scala:108)
        at org.locationtech.geomesa.jobs.accumulo.AccumuloJobUtils$.getMultipleQueryPlan(AccumuloJobUtils.scala:117)
        at org.locationtech.geomesa.spark.accumulo.AccumuloSpatialRDDProvider.rdd(AccumuloSpatialRDDProvider.scala:107)
        at org.locationtech.geomesa.spark.api.java.JavaSpatialRDDProvider.rdd(JavaGeoMesaSpark.scala:37)
        at com.praxedo.geomesa.geomesa_spark.Test.main(Test.java:43)

Maybe a missing dependency? Any idea what the problem is?
Thanks for your time.

José.


_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users

