
Re: [rdf4j-dev] Collection Factory GH-3843 how to configure?
  • From: jerven Bolleman <jerven.bolleman@sib.swiss>
  • Date: Tue, 14 Jun 2022 10:26:43 +0200

Hi Jeen,


On 14/06/2022 00:18, Jeen Broekstra wrote:
Hi Jerven,

I haven't really found time yet to look at your PR in more detail, apologies for that. I'll try and put some time into this over the weekend (but can't promise anything, I'm in between house moves at the moment so it's all a little chaotic).
Thanks for the offer, but please take care of real life first. There is no urgency on my side.


Rough kneejerk thoughts on the idea:

 1.   I can totally understand that each Sail implementation can make
    more optimal choices regarding how to collect/persist its own
    values, but why does that need to be anything other than an internal
    matter for the sail? In other words: who are the intended /users/ of
    these publicly exposed collections and collection factories?
The challenge I have been having is the communication between the sail layer and the query evaluation layer. The benefits come from the evaluation layer using knowledge from the sail, but the sails already depend on the query evaluation. Classically this could be done by extending the EvaluationStrategy (e.g. LimitedSizeEvaluationStrategy) and then overriding every specific Iterator that uses a collection. That is not trivial and is quite likely to lead to more maintenance issues downstream.
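
To make the alternative concrete, here is a minimal sketch of the indirection, assuming nothing about the actual RDF4J or PR code: the evaluation step asks a factory for its collection instead of instantiating one itself, so a store can swap the implementation without any iterator being overridden. All names (SketchCollectionFactory, HeapCollectionFactory, CountingCollectionFactory, distinct) are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CollectionFactorySketch {

    /** Hypothetical factory: evaluation code asks it for collections instead of calling "new HashSet" itself. */
    public interface SketchCollectionFactory {
        <T> Set<T> createSet();
    }

    /** Default used when a store has nothing better to offer. */
    public static class HeapCollectionFactory implements SketchCollectionFactory {
        @Override
        public <T> Set<T> createSet() {
            return new HashSet<>();
        }
    }

    /** Stand-in for a store-specific factory; it counts the sets it hands out so the indirection is visible. */
    public static class CountingCollectionFactory implements SketchCollectionFactory {
        public int created = 0;

        @Override
        public <T> Set<T> createSet() {
            created++;
            return new LinkedHashSet<>();
        }
    }

    /** Stand-in for one evaluation step (a DISTINCT-like filter) that needs a collection. */
    public static <T> List<T> distinct(List<T> input, SketchCollectionFactory factory) {
        Set<T> seen = factory.createSet();
        List<T> out = new ArrayList<>();
        for (T item : input) {
            if (seen.add(item)) {
                out.add(item);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "a", "c", "b");
        System.out.println(distinct(rows, new HeapCollectionFactory())); // prints [a, b, c]

        CountingCollectionFactory storeFactory = new CountingCollectionFactory();
        System.out.println(distinct(rows, storeFactory)); // prints [a, b, c]
        System.out.println(storeFactory.created);         // prints 1
    }
}
```

The point of the sketch is only that distinct() never changes when the factory does; how the real factory is wired to the EvaluationStrategy is exactly the open configuration question.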

In terms of code organization, I think I fell into a local optimum. I have tried making the interfaces part of the query-evaluation package. The first time round I was not too pleased with it, but I am *not* committed to the code layout as it is now.

In any case, quite a bit of the performance benefit can be achieved without this factory. (I am aware that I am proposing a factory-factory in the RDF4J codebase, and while that is a crowning achievement of 15 years of Java software development, it is also fuel on the fire for Java haters.) A lot of the gain comes from improved logic for setting up MapDB and from more compact serializers.

 2. I am very wary of adding /any/ methods (even with default
    implementations) to the base Sail interfaces, as it adds
    implementation responsibilities on third parties, and also just
    makes the interfaces harder to understand. I'd much rather add a new
    behavioral interface with this method, and have our sail
    implementations derive from that new interface (alongside the Sail
    interface).
This should not need to affect the Sail interface, although it currently does. It should only affect the EvaluationStrategy and EvaluationStrategyFactory. That is part of the configuration problem I would like some help with.
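
As a rough sketch of the separate behavioral interface suggested in point 2, assuming only plain Java: the base interface stays untouched, stores opt in by implementing a second interface, and the caller probes for it and falls back to a default. BaseStore, SetSupplierProvider, PlainStore, TunedStore and newSetFor are hypothetical names, not RDF4J types.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

public class OptionalCapabilitySketch {

    /** Stand-in for the base store interface; it gains no new methods. */
    public interface BaseStore {
        String name();
    }

    /** Hypothetical opt-in interface: only stores that care implement it. */
    public interface SetSupplierProvider {
        Supplier<Set<String>> setSupplier();
    }

    /** A third-party style store that knows nothing about the new capability. */
    public static class PlainStore implements BaseStore {
        @Override
        public String name() {
            return "plain";
        }
    }

    /** A store that opts in and supplies its own collection implementation. */
    public static class TunedStore implements BaseStore, SetSupplierProvider {
        @Override
        public String name() {
            return "tuned";
        }

        @Override
        public Supplier<Set<String>> setSupplier() {
            return TreeSet::new;
        }
    }

    /** The calling side probes for the capability and falls back to a heap-based default. */
    public static Set<String> newSetFor(BaseStore store) {
        if (store instanceof SetSupplierProvider) {
            return ((SetSupplierProvider) store).setSupplier().get();
        }
        return new HashSet<>();
    }

    public static void main(String[] args) {
        System.out.println(newSetFor(new PlainStore()).getClass().getSimpleName()); // prints HashSet
        System.out.println(newSetFor(new TunedStore()).getClass().getSimpleName()); // prints TreeSet
    }
}
```

Existing implementations such as PlainStore compile and behave as before, which is the property point 2 asks for.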

Take the above with several grains of salt as (like I said) I haven't really had time to look at your PR in detail. I think the main thing for me is a better understanding of the motivation and use cases.

Another motivation is to provide a migration path for replacing MapDB in the GROUP BY logic. A second one is to make path queries scale to disk instead of only to memory.

Regards,
Jerven

Cheers,

Jeen

On Fri, 10 Jun 2022, at 20:57, jerven Bolleman wrote:
Hi All,

I wanted some feedback on GH-3843 [1]. Specifically on how best to
configure these kinds of factories.

So the idea of this issue is that we can build Collection classes that
use specific knowledge of a store's value implementations to be faster
and use less memory.
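
A minimal illustration of that idea, under the assumption (made here only for the example) that a store's values carry a small non-negative internal id: a membership set can then keep one bit per id instead of one object reference per value. IdValue and IdBackedSet are hypothetical names and are not taken from the pull request.

```java
import java.util.BitSet;

public class IdBackedSetSketch {

    /** Hypothetical store value that carries the store's internal numeric id (assumed non-negative). */
    public static final class IdValue {
        final int id;
        final String label;

        public IdValue(int id, String label) {
            this.id = id;
            this.label = label;
        }
    }

    /** Membership set that keeps one bit per id rather than a reference to each value object. */
    public static final class IdBackedSet {
        private final BitSet ids = new BitSet();

        /** Returns true if the value was not present before. */
        public boolean add(IdValue value) {
            if (ids.get(value.id)) {
                return false;
            }
            ids.set(value.id);
            return true;
        }

        public boolean contains(IdValue value) {
            return ids.get(value.id);
        }

        public int size() {
            return ids.cardinality();
        }
    }

    public static void main(String[] args) {
        IdBackedSet set = new IdBackedSet();
        System.out.println(set.add(new IdValue(7, "ex:a"))); // prints true
        System.out.println(set.add(new IdValue(7, "ex:a"))); // prints false, same id
        System.out.println(set.size());                      // prints 1
    }
}
```

A generic HashSet cannot make this shortcut because it knows nothing about the ids; a store-aware collection can.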

Now I think my pull request shows the benefits. But it also requires the
collection factory to be configurable, e.g. using MapDB or soon LMDB. It
also allows a better implementation of [2], or even its removal, as it is
no longer needed.

So regarding this configuration: I am not at all experienced with
the RDF files used to configure RDF4J Server etc., and I would like some
help/pointers on how best to implement this.

Regards,
Jerven

[1] https://github.com/eclipse/rdf4j/pull/3844
[2] https://github.com/eclipse/rdf4j/issues/3983
--

*Jerven Tjalling Bolleman*
Principal Software Developer
*SIB | Swiss Institute of Bioinformatics*
1, rue Michel Servet - CH 1211 Geneva 4 - Switzerland
t +41 22 379 58 85
Jerven.Bolleman@sib.swiss - www.sib.swiss

_______________________________________________
rdf4j-dev mailing list
rdf4j-dev@xxxxxxxxxxx
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/rdf4j-dev






