
[geowave-dev] Split query results in chunks

Hello,
I wrote a query that needs a GROUP BY statement. Since this keyword is 
not supported in GeoWave, I use Spark for the aggregation.
This works fine for small data sizes of 1 to 3 GB. However, if I scale up to 
10 GB there is not enough heap space to answer the query, and I can't 
give more heap space to my mini cluster.

Iterator<SimpleFeature> intermediateResults;

This is the iterator over my intermediate results. Unfortunately its 
.remove() method is not supported. So I thought chunking up the results 
should save me space. A SimpleFeature is not serializable, so I have to 
encapsulate it in a custom object for use with Spark, like so:


while (intermediateResults.hasNext()) {
    SimpleFeature sf = intermediateResults.next();
    String countryCode1 = String.valueOf(sf.getAttribute("Actor1CountryCode"));
    String countryCode2 = String.valueOf(sf.getAttribute("Actor2CountryCode"));
    actorCountryList.add(new CountryNames(countryCode1, countryCode2));
}
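For reference, the wrapper class I use looks roughly like this (a minimal sketch; the field and getter names are just what I chose, nothing GeoWave-specific):

```java
import java.io.Serializable;

// Sketch of the serializable wrapper that replaces the non-serializable
// SimpleFeature before the data is handed to Spark.
public class CountryNames implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String countryCode1;
    private final String countryCode2;

    public CountryNames(String countryCode1, String countryCode2) {
        this.countryCode1 = countryCode1;
        this.countryCode2 = countryCode2;
    }

    public String getCountryCode1() { return countryCode1; }
    public String getCountryCode2() { return countryCode2; }
}
```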

CountryNames is serializable. This loop is my bottleneck and causes 
the error, because it runs on the client node. I added a counter, and every 
1 million results I process the list with Spark and clear it. Afterwards 
I merge the partial results into the final one. But this causes the same 
error, so the memory apparently could not be released. So I think the 
iterator is the main problem here. Is there another way to chunk the 
results? Or do you have an idea what else I could try?
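To be concrete, my chunking attempt looks roughly like the generic sketch below (the chunk size and the processChunk callback standing in for the Spark job are my own placeholders, not GeoWave API). The intent is that replacing the list after each hand-off drops the references so the processed objects become collectable:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedConsumer {
    // Drain the iterator in fixed-size chunks; each full chunk is handed
    // to processChunk (e.g. a Spark job), then the list reference is
    // replaced so the old chunk can be garbage collected.
    static <T> int processInChunks(Iterator<T> source, int chunkSize,
                                   Consumer<List<T>> processChunk) {
        List<T> chunk = new ArrayList<>(chunkSize);
        int chunksProcessed = 0;
        while (source.hasNext()) {
            chunk.add(source.next());
            if (chunk.size() == chunkSize) {
                processChunk.accept(chunk);          // hand off this batch
                chunk = new ArrayList<>(chunkSize);  // drop old references
                chunksProcessed++;
            }
        }
        if (!chunk.isEmpty()) {                      // final partial chunk
            processChunk.accept(chunk);
            chunksProcessed++;
        }
        return chunksProcessed;
    }
}
```

Even with this structure the heap fills up, which is why I suspect the iterator itself keeps the already-consumed results alive.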

Best regards,
Marcel Jacob.
