Re: [geomesa-users] Date Indexing, Stucked queries


From: Marcel <m.jacob@xxxxxxxxxxx>
Date: Wed, 2 Sep 2015 20:20:39 +0200

This is how I build my simple feature type (a slightly adapted version from the geomesa-gdelt project).

    private static SimpleFeatureType buildGDELTFeatureType(String featureName) throws SchemaException {
        String spec = Joiner.on(",").join(attributes);
        SimpleFeatureType featureType = DataUtilities.createType(featureName, spec);

        // This tells GeoMesa to use this attribute as the start time index
        featureType.getUserData().put(Constants.SF_PROPERTY_START_TIME, "SQLDATE");
        return featureType;
    }

    /**
     * List of GDELT attributes with their data types. *geom indicates
     * that this attribute will be the default geometry.
     */
    private static List<String> attributes = Lists.newArrayList(
            "GLOBALEVENTID:Integer", "SQLDATE:Date:index=full", "MonthYear:Integer",
            "Year:Integer", "FractionDate:Float",
            "Actor1Code:String", "Actor1Name:String", "Actor1CountryCode:String",
            "Actor1KnownGroupCode:String", "Actor1EthnicCode:String",
            "Actor1Religion1Code:String", "Actor1Religion2Code:String",
            "Actor1Type1Code:String", "Actor1Type2Code:String", "Actor1Type3Code:String",
            "Actor2Code:String", "Actor2Name:String", "Actor2CountryCode:String",
            "Actor2KnownGroupCode:String", "Actor2EthnicCode:String",
            "Actor2Religion1Code:String", "Actor2Religion2Code:String",
            "Actor2Type1Code:String", "Actor2Type2Code:String", "Actor2Type3Code:String",
            "IsRootEvent:Integer", "EventCode:String", "EventBaseCode:String",
            "EventRootCode:String", "QuadClass:Integer", "GoldsteinScale:Float",
            "NumMentions:Integer", "NumSources:Integer", "NumArticles:Integer", "AvgTone:Float",
            "Actor1Geo_Type:Integer", "Actor1Geo_FullName:String", "Actor1Geo_CountryCode:String",
            "Actor1Geo_ADM1Code:String", "Actor1Geo_Lat:Float", "Actor1Geo_Long:Float",
            "Actor1Geo_FeatureID:String",
            "Actor2Geo_Type:Integer", "Actor2Geo_FullName:String", "Actor2Geo_CountryCode:String",
            "Actor2Geo_ADM1Code:String", "Actor2Geo_Lat:Float", "Actor2Geo_Long:Float",
            "Actor2Geo_FeatureID:String",
            "ActionGeo_Type:Integer", "ActionGeo_FullName:String", "ActionGeo_CountryCode:String",
            "ActionGeo_ADM1Code:String", "ActionGeo_Lat:Float", "ActionGeo_Long:Float",
            "ActionGeo_FeatureID:String",
            "DATEADDED:Integer", "SourceUrl:String", "*geom:Point:srid=4326");
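
With this many attributes, a typo in the spec is easy to miss. The following is a minimal sketch, using only the Java standard library, of a sanity check that lists which attributes carry an "index" option before the spec is handed to DataUtilities.createType. The class name SpecCheck and both helper methods are hypothetical and not part of GeoMesa or GeoTools; the sketch only splits the "name:Type:opt=value" strings and says nothing about which option values GeoMesa accepts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of GeoMesa or GeoTools.
public class SpecCheck {

    /** Splits a "name:Type:opt=value" entry and returns its options (possibly empty). */
    static Map<String, String> optionsOf(String attribute) {
        String[] parts = attribute.split(":");
        Map<String, String> options = new LinkedHashMap<>();
        // parts[0] is the name, parts[1] the type; everything after that is an option
        for (int i = 2; i < parts.length; i++) {
            String[] kv = parts[i].split("=", 2);
            options.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return options;
    }

    /** Returns the names of all attributes that carry an "index" option. */
    static List<String> indexedAttributes(List<String> attributes) {
        List<String> result = new ArrayList<>();
        for (String attribute : attributes) {
            if (optionsOf(attribute).containsKey("index")) {
                result.add(attribute.split(":")[0]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened attribute list, for illustration only
        List<String> attributes = Arrays.asList(
                "GLOBALEVENTID:Integer", "SQLDATE:Date:index=full",
                "EventRootCode:String", "*geom:Point:srid=4326");
        System.out.println("spec:    " + String.join(",", attributes));
        System.out.println("indexed: " + indexedAttributes(attributes));
    }
}
```

Run against the full list above, this should report SQLDATE as the only attribute with an index option.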

I'm using GeoMesa 1.1.0-rc.4. Yes, I dropped all of my GeoMesa tables before re-ingesting the data.

These stuck queries and heap space errors only occur when executing geo-temporal queries like this one. I ingested a 1 GiB GDELT test file.

    /**
     * Find all events in Ukraine since 2010 (until 2015-06-30) in connection
     * with protests (eventrootcode = 14).
     */
    private static SimpleFeatureIterator getResultsForQuery13(Map<String, String> dsConf) {
        SimpleFeatureSource featureSource = SimpleFeatureSourceFactory.getSimpleFeatureSource(dsConf);

        FilterFactory2 ff = CommonFactoryFinder.getFilterFactory2();

        DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
        Date start = null;
        Date end = null;
        try {
            start = df.parse("2010-01-01");
            end = df.parse("2015-06-30");
        } catch (java.text.ParseException e) {
            e.printStackTrace();
        }

        Filter timeFilter = ff.between(ff.property(GDELTConstants.DATE), ff.literal(start), ff.literal(end));

        // bound query spatially to Ukraine
        Filter spatialFilter = null;
        try {
            spatialFilter = ECQL.toFilter(
                    "Contains(Polygon((34.01626 44.00715, ... ,34.01626 44.00715)), " + GDELTConstants.GEOM + ")");
        } catch (CQLException e) {
            e.printStackTrace();
        }

        // Now we can combine our time filter and our spatial filter using a
        // boolean and operator
        Filter timeSpatialFilter = ff.and(timeFilter, spatialFilter);

        Filter attributeFilter = ff.like(ff.property(GDELTConstants.EVENT_ROOT_CODE), "14");

        Filter completeFilter = ff.and(timeSpatialFilter, attributeFilter);

        Query query = new Query(dsConf.get(AccumuloDataStoreConfiguration.FEATURE_NAME), completeFilter,
                new String[] { GDELTConstants.GLOBAL_EVENTID, GDELTConstants.DATE });

        SimpleFeatureCollection sfCollection = null;
        try {
            sfCollection = featureSource.getFeatures(query);
        } catch (IOException e) {
            e.printStackTrace();
        }

        return sfCollection.features();
    }
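
A side note on the date handling in this method: SimpleDateFormat parses in the JVM's default time zone unless told otherwise, which would be consistent with the 22:00:00+00:00 timestamp in the log message quoted further down in this thread. The sketch below uses only the Java standard library and shows how the same day string yields different instants depending on the zone. The class name UtcDates and the helper parseDay are hypothetical, and "Europe/Berlin" is only an assumption based on the +0200 offset in the mail header. Whether the shifted bound matters for a given query is a separate question.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper for illustration; not part of GeoMesa or GeoTools.
public class UtcDates {

    /** Parses a yyyy-MM-dd string as midnight in the given time zone. */
    static Date parseDay(String day, TimeZone zone) {
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd");
        df.setLenient(false);
        df.setTimeZone(zone);
        try {
            return df.parse(day);
        } catch (ParseException e) {
            // Fail fast instead of continuing with a null date
            throw new IllegalArgumentException("Not a yyyy-MM-dd date: " + day, e);
        }
    }

    public static void main(String[] args) {
        Date utc = parseDay("2010-01-01", TimeZone.getTimeZone("UTC"));
        // Assumed local zone, for comparison only
        Date local = parseDay("2010-01-01", TimeZone.getTimeZone("Europe/Berlin"));

        long diffHours = (utc.getTime() - local.getTime()) / 3_600_000L;
        System.out.println("UTC midnight is " + diffHours + " hour(s) after local midnight");
    }
}
```

Throwing on a bad date string, rather than printing the stack trace and carrying on, also avoids building a filter around a null bound.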

Thanks,
Marcel Jacob.


On 01.09.2015 21:34, Emilio Lahr-Vivaz wrote:

Hi Marcel,
Could you provide your full simple feature type string? I'll try to reproduce the error you're seeing with the full table scan. Also, what version of GeoMesa are you currently using? Did you re-ingest your data using the new version? If not, what was the old version that you ingested the data with?
With regards to the queries not finishing - we try to optimize queries so that they only scan records that are likely to match. However, depending on the query, we can't always do that. If you're seeing the 'full table scan' warning, then the query won't completely return until it has scanned your entire dataset, even if none of the features actually match. In all cases, the scan should eventually return, but if you're getting memory errors you might need to bump up some settings somewhere. If Java gets low on memory and starts swapping to disk, it can slow things to a crawl. Where are you seeing the heap space errors?
Thanks,

Emilio
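
To help answer the question of where the memory runs out, one option is to log the heap limits of the client JVM that runs the query. The following is a minimal sketch using only java.lang.Runtime; the class name HeapInfo is hypothetical. It reports the client side only, since the Accumulo tablet servers run in their own JVMs with their own memory settings.

```java
// Hypothetical helper for illustration; reports the heap of the JVM it runs in.
public class HeapInfo {

    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use (typically set via -Xmx), in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently in use (allocated minus free), in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("used heap: " + usedHeapMiB() + " MiB");
    }
}
```

Printing these values before and after a query gives a rough idea of whether the client is the process that is short on memory.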

On 09/01/2015 11:58 AM, Marcel wrote:
Hello,
after some weeks of absence I have resumed working with GeoMesa. First of all, I updated to the new GeoMesa version, and some of my problems were solved. Unfortunately, others were not. My data was imported successfully on the cluster, but it seems that my Date attribute was not indexed. I used "SQLDATE:Date:index=full" for this attribute. But when executing a query using a temporal filter, the logger says: "Running full table scan for schema event with filter SQLDATE AFTER 1991-04-28T22:00:00+00:00". Is this the correct way to define that my attribute should be indexed?
Another problem seems to appear when there are 0 results for my query. These queries often don't finish. Sometimes even a heap space error occurs. Maybe this is connected to the date attribute not being indexed, so that the query scans over all records.
Best regards,
Marcel Jacob.
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users