Re: [geomesa-users] Apply Geomesa function to standard Spark Dataset

Thanks a lot, Jim! It's working now. My next step is to write the data out to Accumulo. I tried this:

    val prepedData = spark.sql("""SELECT *, st_makePoint(Actor1Geo_Lat, Actor1Geo_Long) as geom FROM ingested_data""")

    prepedData.show(10)

    prepedData
      .write
      .format("geomesa")
      .options(dsParams)

However, the `write` part does not seem to do anything. I get no error, but there is also no data in Accumulo. Can you please let me know how to resolve this?
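My understanding is that Spark's `DataFrameWriter` is lazy: `format` and `options` only configure the writer, and nothing executes until a terminal action such as `save()` is called. So presumably the chain needs to end with something like this (untested sketch):

    prepedData
      .write
      .format("geomesa")
      .options(dsParams)
      .save() // terminal action; without it the write is never triggered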

I also have the feature definition available (partial code example):

  // GeoMesa feature type definition (truncated)
  var geoMesaSchema = Lists.newArrayList(
    "GLOBALEVENTID:Integer",
    "SQLDATE:Date",
    "MonthYear:Integer",
    "Year:Integer"
    // ... remaining attributes omitted
  )
Is there a way to add this as an option to the write function?
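From what I can see in the GeoMesa Spark SQL docs, the writer is pointed at an existing feature type by name through the `geomesa.feature` option, rather than by passing the attribute list itself, so the schema would need to already exist in the data store. Something like this, where "gdelt" stands in for the actual feature type name (untested sketch):

    prepedData
      .write
      .format("geomesa")
      .options(dsParams)
      .option("geomesa.feature", "gdelt") // placeholder; must match a schema already in the store
      .save()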

Best regards,
Diethard

On Thu, Jul 6, 2017 at 9:38 PM, Jim Hughes <jnh5y@xxxxxxxx> wrote:
Hi Diethard,

GeoMesa uses a private bit of the Spark API to add user-defined types and functions.  You'll want to make sure that the geomesa-spark-sql_2.11 jar is on the classpath, and then you can call

org.apache.spark.sql.SQLTypes.init(sqlContext)

Calling this function will add the geometric types, functions, and optimizations to the SQL Context.  As part of loading a GeoMesa dataset into Spark SQL, the code calls this function.  (This is why all these functions work when you use GeoMesa, etc.)
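For example, in a standalone job the setup might look like this (a minimal sketch, assuming geomesa-spark-sql is on the classpath):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("geomesa-example")
      .getOrCreate()

    // Registers GeoMesa's geometry types, st_* functions, and query
    // optimizations on this SQL context before any spatial SQL runs
    org.apache.spark.sql.SQLTypes.init(spark.sqlContext)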

As another alternative, you can use the GeoMesa converter library to load GDELT as a DataFrame.  You should be able to use a spark.read.format("geomesa").options(params).load() call to parse the GDELT CSVs straight into SimpleFeatures.  That'd save you from having to write SQL to munge columns into geometries, etc.
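Roughly along these lines (a sketch; the exact option keys for naming the converter and the input files are assumptions on my part, so check the converter documentation for the real ones):

    // Hypothetical option keys -- consult the GeoMesa converter docs
    val params = Map(
      "geomesa.converter"   -> "gdelt",                   // a predefined converter name
      "geomesa.input.files" -> "hdfs:///data/gdelt/*.csv" // input location
    )

    val gdeltDf = spark.read
      .format("geomesa")
      .options(params)
      .load()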

Cheers,

Jim


On 07/06/2017 04:11 PM, Diethard Steiner wrote:
Hi,

So I am sourcing some data with Spark SQL, and now I want to use the GeoMesa function `st_makePoint`:

    val ingestedData = (
      spark
        .read
        .option("header", "false")
        .option("delimiter","\\t")
        .option("ignoreLeadingWhiteSpace","true")
        .option("ignoreTrailingWhiteSpace","true")
        .option("treatEmptyValuesAsNulls","true")
        .option("dateFormat","yyyyMMdd")
        .schema(gdeltSchema)
        .csv(ingestFile)
      )

    ingestedData.show(10)
    println(ingestedData.getClass)

    ingestedData.createOrReplaceTempView("ingested_data")
    val prepedData = spark.sql("""SELECT *, st_makePoint(Actor1Geo_Lat, Actor1Geo_Long) as geom FROM ingested_data""")

I get the following error:

 Exception in thread "main" org.apache.spark.sql.AnalysisException: Undefined function: 'st_makePoint'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.;

How do I resolve this?

Note: the other example, which sources data directly from Accumulo via Spark SQL and applies functions to it, works in my environment. So I assume I just need a way to convert a plain Spark Dataset into one that the GeoMesa functions can work with.

Best regards,
Diethard


_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users




