Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [rdf4j-dev] BindingSet



On Fri, Nov 6, 2020, at 09:54, Håvard Ottestad wrote:

Hi

As part of optimizing sorting in RDF4J I discovered that we are spending a lot of time instantiating HashMaps within BindingSet and then a lot of time hashing strings.

Intriguing. I'm not against anything that improves performance of course, but I'd like to understand a little better what you're proposing to optimize here. You say we "spend a lot of time instantiating hashmaps"? When doing what, exactly?

I was considering looking into having internal variables names that allow for lookup directly in arrays instead of maps. Essentially having variables be bytes. Then mapping those bytes back into the variable names for the enduser. This would support up to 256 variables per BindingSet. So we would have to fall back to the old one if we need more. 

This sounds like a potential separate implementation of BindingSet.

I've got to admit that I'm not convinced that hashing of strings is such an expensive operation that we should be avoiding it. What makes you think that mapping from a string to a byte and then back to a string again (effectively doing our own "cheap" hash) is significantly more performant than just relying on standard string hashing? Especially since, once you have the hash, lookup in a Map is O(1), while array lookup is O(n). Not to mention the fact that you'll have to do your own bookkeeping to avoid clashes and detect overflows (e.g. > 256 variables).


For things like group by or count I think this will make a massive difference. Also for queries that do a lot of internal joining of data, but actually return very few results. 

If we can quantify the improvement in those terms, this might be very useful.

I am conscious of the fact that you are in the middle of a massive SHACL rewrite/refactor though. It's not up to me to tell you what to work on of course but are you sure you have the bandwidth to pick this up, and won't this go at the cost of finishing the new AST in time? 

Cheers,

Jeen

Back to the top