Re: [geomesa-users] Apply Geomesa function to standard Spark Dataset

Hi Jim,

Thanks a lot. I got it working now; I had to add a few filters as well to clean the data:

    // st_makePoint takes (x, y), i.e. (longitude, latitude)
    val prepedData = spark.sql("""SELECT *, st_makePoint(Actor1Geo_Long, Actor1Geo_Lat) as geom FROM ingested_data""")

    prepedData.show(10)

    val prepedDataFiltered = prepedData
      .filter("geom IS NOT NULL")
      // filter out invalid lat (beyond +/-90) and long (beyond +/-180)
      .filter("NOT(ABS(Actor1Geo_Lat) > 90.0 OR ABS(Actor1Geo_Long) > 180.0)")

    println("========= FINAL OUTPUT ===============")

    prepedDataFiltered.show(10)

    prepedDataFiltered
      .write
      .format("geomesa")
      .options(dsParams)
      .option("geomesa.feature","event")
      .save()
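
For reference, the range check in the filter can also be written as a plain Scala predicate. This is only a sketch: the object and method names (`CoordCheck`, `isValidLonLat`) are made up for illustration, and it assumes WGS84 bounds of +/-180 for longitude and +/-90 for latitude.

```scala
object CoordCheck {
  /** Returns true when (lon, lat) lies inside the assumed WGS84 bounds.
    * Comparisons involving NaN are false, so NaN inputs are rejected too.
    */
  def isValidLonLat(lon: Double, lat: Double): Boolean =
    math.abs(lon) <= 180.0 && math.abs(lat) <= 90.0
}
```

A predicate like this could be wrapped in a Spark UDF if the same rule is needed in several queries, although the SQL filter string is probably simpler for a one-off cleanup.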

I shall publish a blog post shortly on this topic.

Best regards,
Diethard

On Sat, Jul 8, 2017 at 12:32 AM, Jim Hughes <jnh5y@xxxxxxxx> wrote:
Hi Diethard,

Since a dataframe lines up with a feature/feature type/feature source, you'll see that error if you don't indicate which feature in the GeoMesa datastore to process.  To call it out separately, in our unit tests and sample code, we typically set it with a separate call to .option on the builder. (1)

The good news is that you don't seem to have hit any bugs yet.  If you do want to try the latest master, you can download a distribution bundle here (2).

Cheers,

Jim

1) https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-spark-runtime/src/test/scala/org/locationtech/geomesa/accumulo/spark/AccumuloSparkProviderTest.scala#L85
2) https://repo.locationtech.org/service/local/repositories/geomesa-snapshots/content/org/locationtech/geomesa/geomesa-accumulo-dist_2.11/1.3.2-SNAPSHOT/geomesa-accumulo-dist_2.11-1.3.2-20170707.204012-164-bin.tar.gz


On 7/7/2017 3:51 PM, Diethard Steiner wrote:
Thanks a lot Jim! I am getting the following error now:
Exception in thread "main" java.util.NoSuchElementException: key not found: geomesa.feature

This is with v1.3.1.

Is this the error you were expecting?
I can build from source to get the bug fix ... which tag should I check out?

Best regards,
Diethard

On Fri, Jul 7, 2017 at 5:52 PM, Jim Hughes <jnh5y@xxxxxxxx> wrote:
Hi Diethard,

I think the fix'll be easy.  Can you try this?

    prepedData
      .write
      .format("geomesa")
      .options(dsParams)
      .save()

I *think* you are just missing the call to 'save'.  We did recently fix a different bug with writing in Spark, so if that doesn't do it, let us know which version of GeoMesa you are using, etc.

As a more complete example, check out (1).

Cheers,

Jim

1. https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-spark-runtime/src/test/scala/org/locationtech/geomesa/accumulo/spark/AccumuloSparkProviderTest.scala#L86


On 07/06/2017 05:16 PM, Diethard Steiner wrote:
Thanks a lot Jim! It's working now. My next step is to write it out into Accumulo. I tried this:

    val prepedData = spark.sql("""SELECT *, st_makePoint(Actor1Geo_Long, Actor1Geo_Lat) as geom FROM ingested_data""")

    prepedData.show(10)

    prepedData
      .write
      .format("geomesa")
      .options(dsParams)

However, the `write` part does not seem to do anything. I get no error, but there is also no data in Accumulo. Can you please let me know how to resolve this?

I also have the feature definition available (partial code example):

  // GeoMesa Feature
  var geoMesaSchema = Lists.newArrayList(
    "GLOBALEVENTID:Integer",
    "SQLDATE:Date",
    "MonthYear:Integer",
    "Year:Integer",

Is there a way to add this as an option to the write function?

Best regards,
Diethard

On Thu, Jul 6, 2017 at 9:38 PM, Jim Hughes <jnh5y@xxxxxxxx> wrote:
Hi Diethard,

GeoMesa uses a private bit of the Spark API to add user-defined types and functions.  You'll want to make sure that the geomesa-spark-sql_2.11 jar is on the classpath, and then you can call

org.apache.spark.sql.SQLTypes.init(sqlContext)

Calling this function will add the geometric types, functions, and optimizations to the SQL Context.  As part of loading a GeoMesa dataset into Spark SQL, the code calls this function.  (This is why all these functions work when you use GeoMesa, etc.)
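
A minimal sketch of how that might look for a plain DataFrame that did not come from GeoMesa. It assumes Spark 2.x with the geomesa-spark-sql_2.11 jar on the classpath; the object name, view name, column names, and sample row are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SQLTypes, SparkSession}

object RegisterGeoMesaFunctions {
  /** Registers the GeoMesa types and functions, then uses st_makePoint
    * on an ordinary in-memory DataFrame (hypothetical sample data).
    */
  def withGeom(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Adds the geometric UDTs, st_* functions, and optimizations to this context
    SQLTypes.init(spark.sqlContext)

    Seq((1, 51.5, -0.12))
      .toDF("id", "lat", "lon")
      .createOrReplaceTempView("plain_points")

    // st_makePoint takes (x, y), i.e. (longitude, latitude)
    spark.sql("SELECT id, st_makePoint(lon, lat) AS geom FROM plain_points")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("geomesa-udf-sketch")
      .getOrCreate()
    try withGeom(spark).show() finally spark.stop()
  }
}
```

The init call only needs to happen once per SQL context, before the first query that references an st_* function.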

As another alternative, you can use the GeoMesa Converter library to load GDELT as a DataFrame.  You should be able to use a spark.read.format("geomesa").options(params) call to parse GDELT CSVs straight into SimpleFeatures.  That'd save needing to write SQL to munge columns into geometries, etc.

Cheers,

Jim


On 07/06/2017 04:11 PM, Diethard Steiner wrote:
Hi,

So I am sourcing some data with Spark SQL and now I want to use the GeoMesa function `st_makePoint`:

    val ingestedData = (
      spark
        .read
        .option("header", "false")
        .option("delimiter","\\t")
        .option("ignoreLeadingWhiteSpace","true")
        .option("ignoreTrailingWhiteSpace","true")
        .option("treatEmptyValuesAsNulls","true")
        .option("dateFormat","yyyyMMdd")
        .schema(gdeltSchema)
        .csv(ingestFile)
      )

    ingestedData.show(10)
    println(ingestedData.getClass)

    ingestedData.createOrReplaceTempView("ingested_data")
    val prepedData = spark.sql("""SELECT *, st_makePoint(Actor1Geo_Long, Actor1Geo_Lat) as geom FROM ingested_data""")

I get the following error:

 Exception in thread "main" org.apache.spark.sql.AnalysisException: Undefined function: 'st_makePoint'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.;

How do I resolve this?

Note: The other example on sourcing data directly from Accumulo via Spark SQL and using functions on it is working in my environment. So I assume I just need a way to convert a normal Spark Dataset into one that can use the GeoMesa functions.

Best regards,
Diethard


_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users


