Re: [geomesa-users] Date Indexing, Stucked queries

Hello,

Given that I've seen timeout/out-of-memory issues too, I just wanted to check a couple of things:

Is org.locationtech.geomesa.utils.interop.SimpleFeatureTypes equivalent to org.locationtech.geomesa.utils.geotools.SimpleFeatureTypes in 1.1.0 rc.2?

Is there a method that will return the underlying Accumulo ranges for a given query, or is a breakpoint my best bet?

Thanks!

Ben

-----Original Message-----
From: geomesa-users-bounces@xxxxxxxxxxxxxxxx [mailto:geomesa-users-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Emilio Lahr-Vivaz
Sent: Wednesday, September 02, 2015 7:07 PM
To: geomesa-users@xxxxxxxxxxxxxxxx
Subject: Re: [geomesa-users] Date Indexing, Stucked queries

Hi Marcel,

Your date is not being indexed because you are using the DataUtilities class to create your simple feature type. The index hints are a GeoMesa-specific feature, so to trigger them you have to use the following method instead:

org.locationtech.geomesa.utils.interop.SimpleFeatureTypes#createType

I think the memory issue comes from your query being fairly large (5+ years), which means that we end up creating a lot of ranges for Accumulo to scan. I used a small polygon, 4 by 5 degrees, with that date range, and the query resulted in 660261 ranges. To alleviate the problem, you may want to split your query up into smaller chunks (maybe 6 months at a time).
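
As a rough sketch of the chunking idea, the date range could be cut into consecutive, non-overlapping intervals and one query issued per interval. This assumes Java 8's java.time is available and that the date attribute has day granularity; the names DateRangeChunker, Interval and split are hypothetical and not part of GeoMesa or GeoTools.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a GeoMesa API): splits a closed date range
// into consecutive closed intervals of at most `months` months each.
final class DateRangeChunker {

    /** A closed date interval [start, end]. */
    static final class Interval {
        public final LocalDate start;
        public final LocalDate end;

        Interval(LocalDate start, LocalDate end) {
            this.start = start;
            this.end = end;
        }
    }

    static List<Interval> split(LocalDate start, LocalDate end, int months) {
        if (months <= 0) {
            throw new IllegalArgumentException("months must be positive");
        }
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end is before start");
        }
        List<Interval> chunks = new ArrayList<>();
        LocalDate chunkStart = start;
        for (long k = 1; !chunkStart.isAfter(end); k++) {
            // Offset from the original start so month-end dates do not drift.
            LocalDate nextStart = start.plusMonths(months * k);
            LocalDate chunkEnd = nextStart.minusDays(1);
            if (chunkEnd.isAfter(end)) {
                chunkEnd = end; // clamp the final chunk to the requested end
            }
            chunks.add(new Interval(chunkStart, chunkEnd));
            chunkStart = nextStart;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // The date range from the query in this thread, six months at a time.
        LocalDate from = LocalDate.of(2010, 1, 1);
        LocalDate to = LocalDate.of(2015, 6, 30);
        for (Interval i : split(from, to, 6)) {
            System.out.println(i.start + " .. " + i.end);
        }
    }
}
```

Each interval would then be converted to java.util.Date values and used in place of the single start/end pair in the temporal filter, with the results of the individual queries combined on the client side.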

I've created a ticket here to track the issue: 
https://geomesa.atlassian.net/browse/GEOMESA-905

Thanks,

Emilio


On 09/02/2015 02:20 PM, Marcel wrote:
> This is how I build my simple feature type (slightly adapted version 
> from geomesa-gdelt project).
>
>     private static SimpleFeatureType buildGDELTFeatureType(String featureName)
>             throws SchemaException {
>         String spec = Joiner.on(",").join(attributes);
>         SimpleFeatureType featureType = DataUtilities.createType(featureName, spec);
>         // This tells GeoMesa to use this attribute as the start time index
>         featureType.getUserData().put(Constants.SF_PROPERTY_START_TIME, "SQLDATE");
>         return featureType;
>     }
>
>     /**
>      * List of GDELT attributes with their data types. *geom indicates
>      * that this attribute will be the default geometry.
>      */
>     private static List<String> attributes = 
> Lists.newArrayList("GLOBALEVENTID:Integer", "SQLDATE:Date:index=full", 
> "MonthYear:Integer",
>             "Year:Integer", "FractionDate:Float", "Actor1Code:String", 
> "Actor1Name:String", "Actor1CountryCode:String",
>             "Actor1KnownGroupCode:String", "Actor1EthnicCode:String", 
> "Actor1Religion1Code:String",
>             "Actor1Religion2Code:String", "Actor1Type1Code:String", 
> "Actor1Type2Code:String", "Actor1Type3Code:String",
>             "Actor2Code:String", "Actor2Name:String", 
> "Actor2CountryCode:String", "Actor2KnownGroupCode:String",
>             "Actor2EthnicCode:String", "Actor2Religion1Code:String", 
> "Actor2Religion2Code:String",
>             "Actor2Type1Code:String", "Actor2Type2Code:String", 
> "Actor2Type3Code:String", "IsRootEvent:Integer",
>             "EventCode:String", "EventBaseCode:String", 
> "EventRootCode:String", "QuadClass:Integer",
>             "GoldsteinScale:Float", "NumMentions:Integer", 
> "NumSources:Integer", "NumArticles:Integer", "AvgTone:Float",
>             "Actor1Geo_Type:Integer", "Actor1Geo_FullName:String", 
> "Actor1Geo_CountryCode:String",
>             "Actor1Geo_ADM1Code:String", "Actor1Geo_Lat:Float", 
> "Actor1Geo_Long:Float", "Actor1Geo_FeatureID:String",
>             "Actor2Geo_Type:Integer", "Actor2Geo_FullName:String", 
> "Actor2Geo_CountryCode:String",
>             "Actor2Geo_ADM1Code:String", "Actor2Geo_Lat:Float", 
> "Actor2Geo_Long:Float", "Actor2Geo_FeatureID:String",
>             "ActionGeo_Type:Integer", "ActionGeo_FullName:String", 
> "ActionGeo_CountryCode:String",
>             "ActionGeo_ADM1Code:String", "ActionGeo_Lat:Float", 
> "ActionGeo_Long:Float", "ActionGeo_FeatureID:String",
>             "DATEADDED:Integer", "SourceUrl:String", 
> "*geom:Point:srid=4326");
>
> I'm using GeoMesa 1.1.0-rc.4. Yes, I dropped all of my GeoMesa tables 
> before re-ingesting them.
>
> These stuck queries and heap space errors only occur when executing 
> geotemporal queries like this one. I ingested a 1 GiB GDELT test file.
>
>     /**
>      * Find all events in Ukraine since 2010 (until 2015-06-30) in connection
>      * with protests (eventrootcode = 14).
>      */
>     private static SimpleFeatureIterator 
> getResultsForQuery13(Map<String, String> dsConf) {
>
>         SimpleFeatureSource featureSource = 
> SimpleFeatureSourceFactory.getSimpleFeatureSource(dsConf);
>
>         FilterFactory2 ff = CommonFactoryFinder.getFilterFactory2();
>
>         DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
>         Date start = null;
>         Date end = null;
>         try {
>             start = df.parse("2010-01-01");
>             end = df.parse("2015-06-30");
>         } catch (java.text.ParseException e) {
>             e.printStackTrace();
>         }
>
>         Filter timeFilter =
> ff.between(ff.property(GDELTConstants.DATE), ff.literal(start), 
> ff.literal(end));
>         // bound query spatially to ukraine
>         Filter spatialFilter = null;
>         try {
>             spatialFilter = ECQL.toFilter(
>                     "Contains(Polygon((34.01626 44.00715, ... 
> ,34.01626 44.00715)), " + GDELTConstants.GEOM + ")");
>         } catch (CQLException e) {
>             e.printStackTrace();
>         }
>
>         // Now we can combine our time filter and our spatial filter using a
>         // boolean AND operator
>         Filter timeSpatialFilter = ff.and(timeFilter, spatialFilter);
>         Filter attributeFilter =
> ff.like(ff.property(GDELTConstants.EVENT_ROOT_CODE), "14");
>         Filter completeFilter = ff.and(timeSpatialFilter, 
> attributeFilter);
>
>         Query query = new
> Query(dsConf.get(AccumuloDataStoreConfiguration.FEATURE_NAME),
> completeFilter,
>                 new String[] { GDELTConstants.GLOBAL_EVENTID, 
> GDELTConstants.DATE });
>         SimpleFeatureCollection sfCollection = null;
>         try {
>             sfCollection = featureSource.getFeatures(query);
>         } catch (IOException e) {
>             e.printStackTrace();
>         }
>
>         return sfCollection.features();
>     }
>
> Thanks,
> Marcel Jacob.
>
>
> Am 01.09.2015 21:34, schrieb Emilio Lahr-Vivaz:
>> Hi Marcel,
>>
>> Could you provide your full simple feature type string? I'll try to 
>> reproduce the error you're seeing with the full table scan. Also, 
>> what version of geomesa are you currently using? Did you re-ingest 
>> your data using the new version? If not, what was the old version 
>> that you ingested the data with?
>>
>> With regard to the queries not finishing - we try to optimize 
>> queries so that they only scan records that are likely to match.
>> However, depending on the query, we can't always do that. If you're 
>> seeing the 'full table scan' warning, then the query won't completely 
>> return until it has scanned your entire dataset, even if none of the 
>> features actually match. In all cases, the scan should eventually 
>> return, but if you're getting memory errors you might need to bump up 
>> some settings somewhere. If Java gets low on memory and starts 
>> swapping to disk, it can slow things to a crawl. Where are you seeing 
>> the heap space errors?
>>
>> Thanks,
>>
>> Emilio
>>
>> On 09/01/2015 11:58 AM, Marcel wrote:
>>> Hello,
>>> after a few weeks away I have resumed working with GeoMesa. 
>>> First of all, I updated to the new GeoMesa version, and some of my 
>>> problems were solved.
>>> Unfortunately, others were not. My data imported successfully on the 
>>> cluster, but it seems that my Date attribute was not indexed. I used 
>>> "SQLDATE:Date:index=full" for this attribute. But when executing a 
>>> query using a temporal filter the logger says: "Running full table 
>>> scan for schema event with filter SQLDATE AFTER 
>>> 1991-04-28T22:00:00+00:00". Is this the correct way to define that 
>>> my attribute should be indexed?
>>>
>>> Another problem seems to appear when there are 0 results for my 
>>> query. These queries often don't finish. Sometimes even a heap space 
>>> error occurs. Maybe this is connected to my date attribute not being 
>>> indexed, so that the query scans over all records.
>>>
>>> Best regards,
>>> Marcel Jacob.
>>> _______________________________________________
>>> geomesa-users mailing list
>>> geomesa-users@xxxxxxxxxxxxxxxx
>>> To change your delivery options, retrieve your password, or 
>>> unsubscribe from this list, visit 
>>> http://www.locationtech.org/mailman/listinfo/geomesa-users
>>
>
