
[geowave-dev] Split query results in chunks

Hello,
I wrote a query that needs a GROUP BY statement. Since this keyword is 
not supported in GeoWave, I use Spark for the aggregation.
This works fine for small data sizes of 1 to 3 GB. However, if I scale up to 
10 GB there is not enough heap space to answer the query, and I can't 
give more heap space to my mini cluster.

Iterator<SimpleFeature> intermediateResults;

This is the iterator over my intermediate results. Unfortunately its 
.remove() method is not supported. So I thought chunking up the results 
should save me space. A SimpleFeature is not serializable, so I have to 
encapsulate it in a custom object for use with Spark, like so:


while (intermediateResults.hasNext()) {
    SimpleFeature sf = intermediateResults.next();
    String countryCode1 = String.valueOf(sf.getAttribute("Actor1CountryCode"));
    String countryCode2 = String.valueOf(sf.getAttribute("Actor2CountryCode"));
    actorCountryList.add(new CountryNames(countryCode1, countryCode2));
}
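For reference, the wrapper class I use looks roughly like this (a minimal sketch; the field and getter names are just what I chose, nothing GeoWave-specific):

```java
import java.io.Serializable;

// Sketch of the serializable wrapper that replaces the non-serializable
// SimpleFeature before the data is handed to Spark.
public class CountryNames implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String countryCode1;
    private final String countryCode2;

    public CountryNames(String countryCode1, String countryCode2) {
        this.countryCode1 = countryCode1;
        this.countryCode2 = countryCode2;
    }

    public String getCountryCode1() { return countryCode1; }
    public String getCountryCode2() { return countryCode2; }
}
```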

CountryNames is serializable. This loop is my bottleneck and causes 
the error, because it runs on the client node. I added a counter, and every 
1 million results I process the list with Spark and clear it. Afterwards 
I merge the partial results into the final one. But this causes the same 
error, so the memory apparently could not be released. So I think the 
iterator is the main problem here. Is there another way to chunk the 
results? Or do you have an idea what else I could try?
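To be concrete, my chunking attempt looks roughly like the generic sketch below (the chunk size and the processChunk callback standing in for the Spark job are my own placeholders, not GeoWave API). The intent is that replacing the list after each hand-off drops the references so the processed objects become collectable:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedConsumer {
    // Drain the iterator in fixed-size chunks; each full chunk is handed
    // to processChunk (e.g. a Spark job), then the list reference is
    // replaced so the old chunk can be garbage collected.
    static <T> int processInChunks(Iterator<T> source, int chunkSize,
                                   Consumer<List<T>> processChunk) {
        List<T> chunk = new ArrayList<>(chunkSize);
        int chunksProcessed = 0;
        while (source.hasNext()) {
            chunk.add(source.next());
            if (chunk.size() == chunkSize) {
                processChunk.accept(chunk);          // hand off this batch
                chunk = new ArrayList<>(chunkSize);  // drop old references
                chunksProcessed++;
            }
        }
        if (!chunk.isEmpty()) {                      // final partial chunk
            processChunk.accept(chunk);
            chunksProcessed++;
        }
        return chunksProcessed;
    }
}
```

Even with this structure the heap fills up, which is why I suspect the iterator itself keeps the already-consumed results alive.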

Best regards,
Marcel Jacob.
