[geowave-dev] Split query results in chunks
Hello,
I wrote a query which needs a group by statement. Since this keyword is
not supported in GeoWave, I use Spark. This works fine for small data
sizes of 1 to 3 GB. However, if I move to 10 GB there is not enough heap
space to answer the query, and I can't give more heap space to my mini
cluster.
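For context, the group by itself runs in Spark. Simplified, it looks
roughly like this (Spark setup is condensed, conf is a SparkConf, and
actorCountryList is the list of serializable wrappers I build from the
query results further down):

    import java.util.Map;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    JavaSparkContext sc = new JavaSparkContext(conf);
    // Ship the collected pairs to the cluster and count occurrences per pair
    JavaRDD<CountryNames> rdd = sc.parallelize(actorCountryList);
    Map<CountryNames, Long> counts = rdd
            .mapToPair(cn -> new Tuple2<>(cn, 1L))
            .reduceByKey(Long::sum)
            .collectAsMap();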
Iterator<SimpleFeature> intermediateResults;
This is the iterator over my intermediate results. Unfortunately its
.remove() method is not supported. So I thought chunking up the results
should save me space. A SimpleFeature is not serializable, so I have to
encapsulate it in a custom object for use with Spark, like so:
    List<CountryNames> actorCountryList = new ArrayList<>();
    while (intermediateResults.hasNext()) {
        SimpleFeature sf = intermediateResults.next();
        // Extract both actor country codes from the feature
        String countryCode1 = String.valueOf(sf.getAttribute("Actor1CountryCode"));
        String countryCode2 = String.valueOf(sf.getAttribute("Actor2CountryCode"));
        actorCountryList.add(new CountryNames(countryCode1, countryCode2));
    }
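CountryNames is serializable; it is just a small wrapper around the two
country codes, roughly like this (field names simplified; equals and
hashCode are needed so Spark can group identical pairs):

    import java.io.Serializable;
    import java.util.Objects;

    public class CountryNames implements Serializable {
        private final String countryCode1;
        private final String countryCode2;

        public CountryNames(String countryCode1, String countryCode2) {
            this.countryCode1 = countryCode1;
            this.countryCode2 = countryCode2;
        }

        // equals/hashCode let Spark treat identical pairs as one group key
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof CountryNames)) {
                return false;
            }
            CountryNames other = (CountryNames) o;
            return Objects.equals(countryCode1, other.countryCode1)
                    && Objects.equals(countryCode2, other.countryCode2);
        }

        @Override
        public int hashCode() {
            return Objects.hash(countryCode1, countryCode2);
        }
    }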
That while loop is my bottleneck and causes the error, because it runs
on the client node. I added a counter, and every 1 million results I
process them with Spark and clear my list. Afterwards I merge the
partial results into the final result. But this causes the same error,
so the memory is apparently not being released. So I think the iterator
is the main problem here.
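Simplified, my chunking attempt looks like this (it builds on the
snippets above; processChunkWithSpark stands in for the group-by step
and mergeResults for my merging code):

    List<CountryNames> chunk = new ArrayList<>();
    Map<CountryNames, Long> finalCounts = new HashMap<>();
    long counter = 0;
    while (intermediateResults.hasNext()) {
        SimpleFeature sf = intermediateResults.next();
        chunk.add(new CountryNames(
                String.valueOf(sf.getAttribute("Actor1CountryCode")),
                String.valueOf(sf.getAttribute("Actor2CountryCode"))));
        // Every million features, hand the chunk to Spark and drop it
        if (++counter % 1_000_000 == 0) {
            mergeResults(finalCounts, processChunkWithSpark(chunk));
            chunk.clear();
        }
    }
    if (!chunk.isEmpty()) {
        mergeResults(finalCounts, processChunkWithSpark(chunk));
    }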
Is there another way of chunking? Or do you have an idea what else I
could try?
Best regards,
Marcel Jacob.