Aggregator, scalability, and validation [message #502746] Wed, 09 December 2009 14:49
Thomas Hallgren
Hi,
I've been thinking a lot about huge repositories and scalability lately. The p2 planner brings all meta-data involved in
an installation into memory. Its SAT solver needs that in order to come up with a plan. This will soon become a problem
for our Aggregator. The approach of validating repository consistency by asking the p2 planner to make a plan that
includes everything simply won't fly as repositories grow larger. I think we need some way to scope the consistency
check. Perhaps that's a matter of creating sub-meta-data repositories that are internally consistent.
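
As a very rough sketch of what such a scoped check could look like (hypothetical types and names, nothing like the real
p2 planner or its SAT solver; versions and optional requirements are ignored):

    import java.util.*;

    // Hypothetical, greatly simplified model: a unit and the ids it requires.
    final class Unit {
        final String id;
        final Set<String> requires;
        Unit(String id, Set<String> requires) { this.id = id; this.requires = requires; }
    }

    final class ScopedValidator {
        // Validate one scope in isolation: every requirement must be satisfied by
        // a unit inside the same scope, so memory use is proportional to the
        // scope rather than to the whole aggregated repository.
        static List<String> validate(Collection<Unit> scope) {
            Set<String> present = new HashSet<>();
            for (Unit u : scope)
                present.add(u.id);
            List<String> problems = new ArrayList<>();
            for (Unit u : scope)
                for (String req : u.requires)
                    if (!present.contains(req))
                        problems.add(u.id + " is missing requirement " + req);
            return problems;
        }

        public static void main(String[] args) {
            List<Unit> scope = List.of(
                    new Unit("org.example.core", Set.of()),
                    new Unit("org.example.feature", Set.of("org.example.core")));
            System.out.println(validate(scope)); // an empty list means the scope is internally consistent
        }
    }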

Another thing that I think we should improve is to support the use-case where you'd like a mirror of something that
isn't necessarily consistent. You simply want what's out there, and you don't really care if it's consistent or not.
Perhaps you do that as a first step in order to combine with other things and finally create something that indeed is
consistent. So perhaps the aggregation is one thing and consistency validation another.

A third improvement that I'd like to see is to enable all aggregations to share the same artifact repository. There's
never a reason to copy an artifact with a given MD5 checksum twice. If you have it, you have it.
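
A minimal sketch of that sharing, assuming a store laid out by checksum on disk (invented names, not the p2 artifact
repository API):

    import java.nio.file.*;
    import java.security.MessageDigest;
    import java.util.HexFormat;

    // Hypothetical shared artifact store keyed by MD5 checksum; purely illustrative.
    final class SharedArtifactStore {
        private final Path root;

        SharedArtifactStore(Path root) { this.root = root; }

        // Copy the artifact only if no artifact with the same checksum is already
        // present: "if you have it, you have it".
        Path add(Path artifact) throws Exception {
            byte[] bytes = Files.readAllBytes(artifact);
            String md5 = HexFormat.of().formatHex(MessageDigest.getInstance("MD5").digest(bytes));
            Path target = root.resolve(md5);
            if (Files.notExists(target))
                Files.write(target, bytes);
            return target;
        }
    }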

What do others think?

Regards,
Thomas Hallgren
Re: Aggregator, scalability, and validation [message #502958 is a reply to message #502746] Thu, 10 December 2009 12:34
Karel Brezina
The aggregator might have three validation levels:

- full consistency validation
- contribution based consistency validation (each contribution subtree
is validated separately)
- no consistency validation
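
Sketched as an enum (hypothetical names, not necessarily what the Aggregator model would call them):

    // Hypothetical enum capturing the three proposed levels.
    enum ValidationLevel {
        FULL,          // one plan across the entire aggregation
        CONTRIBUTION,  // each contribution subtree validated separately
        NONE           // mirror as-is, no consistency check
    }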

Karel


Re: Aggregator, scalability, and validation [message #502982 is a reply to message #502746] Thu, 10 December 2009 14:13
Filip Hrbek
Thomas Hallgren wrote:
> Hi,
> I've been thinking a lot about huge repositories and scalability lately.
> The p2 planner brings all meta-data involved in an installation into
> memory. Its SAT solver needs that in order to come up with a plan. This
> will soon become a problem for our Aggregator. The approach of validating
> repository consistency by asking the p2 planner to make a plan that
> includes everything simply won't fly as repositories grow larger. I
> think we need some way to scope the consistency check. Perhaps that's a
> matter of creating sub-meta-data repositories that are internally
> consistent.
>

Yes, this may become a problem. Perhaps we should start thinking about
something like "Consistency Rules" that specify how a contribution should
be consistent with the rest (i.e. other contributions or everything). Perhaps
the same rules could be applied at various levels (top, contribution, mapped repository).
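
For illustration only, such a rule might carry two pieces of information -
the level at which it applies and what it is checked against (all names
invented, nothing from the actual Aggregator model):

    // Hypothetical "Consistency Rule"; names invented for illustration.
    record ConsistencyRule(Level level, Target checkAgainst) {
        enum Level { TOP, CONTRIBUTION, MAPPED_REPOSITORY }         // where the rule applies
        enum Target { SELF_ONLY, OTHER_CONTRIBUTIONS, EVERYTHING }  // what to be consistent with
    }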

An open question is what happens if every contributor demands full
consistency but running that validation is not feasible.

> Another thing that I think we should improve is to support the use-case
> where you'd like a mirror of something that isn't necessarily
> consistent. You simply want what's out there, and you don't really care
> if it's consistent or not. Perhaps you do that as a first step in order
> to combine with other things and finally create something that indeed is
> consistent. So perhaps the aggregation is one thing and consistency
> validation another.
>

If we implement some kind of "Consistency Rules" as mentioned above, we could
also support intentionally inconsistent content. Sometimes it even makes
sense to provide more than just the latest version, so that users can choose
their preferred version, or for the case where resolution conflicts leave no
solution other than picking an older version.

> A third improvement that I'd like to see is to enable all aggregations to
> share the same artifact repository. There's never a reason to copy an
> artifact with a given MD5 checksum twice. If you have it, you have it.
>

A good idea - is there any obstacle to implementing this? I hope not.

Filip
Re: Aggregator, scalability, and validation [message #502990 is a reply to message #502958] Thu, 10 December 2009 14:31
Thomas Hallgren
On 2009-12-10 13:34, Karel Brezina wrote:
> The aggregator might have three validation levels:
>
> - full consistency validation
> - contribution based consistency validation (each contribution subtree
> is validated separately)
> - no consistency validation
>
Yes, those would be relatively easy to both implement and explain.

But what if I, for instance, want to create a combination of things? Let's say I have Galileo + Helios + a whole bunch of
LGPL'ed stuff (Subversion, JBoss tooling, etc.). The resulting repository will be fairly large, and no single
contribution can be validated on its own. Not even Helios validates today, since old versions are kept in the repository
when new milestones are released.

Nevertheless, when given a set of features from Helios, I can validate that the set is consistent. I can then add
Subversion etc. to that set, and it still validates. But adding Helios as a whole will cause the validation to fail. So
in this case, a contribution as such is obviously not the 'validation unit' that I'm looking for.
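
Reusing the hypothetical Unit/ScopedValidator sketch from my first message (heliosFeatures, subversionUnits and
allOfHelios are placeholders standing for collections of such units), the workflow above would be something like:

    Set<Unit> set = new HashSet<>(heliosFeatures);   // a hand-picked set of Helios features: validates
    assert ScopedValidator.validate(set).isEmpty();

    set.addAll(subversionUnits);                     // the grown set still validates
    assert ScopedValidator.validate(set).isEmpty();

    set.addAll(allOfHelios);                         // Helios as a whole: validation fails
    assert !ScopedValidator.validate(set).isEmpty();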

- thomas
Re: Aggregator, scalability, and validation [message #502999 is a reply to message #502982] Thu, 10 December 2009 14:38
Thomas Hallgren
On 2009-12-10 15:13, Filip Hrbek wrote:
> Thomas Hallgren wrote:
>> Hi,
>> I've been thinking a lot about huge repositories and scalability
>> lately. The p2 planner brings all meta-data involved in an
>> installation into memory. Its SAT solver needs that in order to come
>> up with a plan. This will soon become a problem for our Aggregator.
>> The approach of validating repository consistency by asking the p2
>> planner to make a plan that includes everything simply won't fly as
>> repositories grow larger. I think we need some way to scope the
>> consistency check. Perhaps that's a matter of creating sub-meta-data
>> repositories that are internally consistent.
>>
>
> Yes, this may become a problem. Perhaps we should start thinking about
> something like "Consistency Rules" that specify how a contribution should
> be consistent with the rest (i.e. other contributions or everything).
> Perhaps the same rules could be applied at various levels (top,
> contribution, mapped repository).
>
> An open question is what happens if every contributor demands full
> consistency but running that validation is not feasible.
>
I'm leaning more towards an approach where we can define units that can contain things from the aggregation, collected
freely and not bound by any contribution boundaries: some way to tag things so that they fit together in one unit to
which you can apply a rule (like validation). A contribution may of course be a tag in itself, which would support the
use-cases you mention.
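
A minimal sketch of such tagging, with invented names (note that a contribution name could simply be used as one of the
tags):

    import java.util.*;

    // Hypothetical free-form tagging: a rule such as validation is applied per
    // tag, not per contribution.
    final class TaggedAggregation {
        private final Map<String, Set<String>> unitsByTag = new HashMap<>();

        void tag(String unitId, String tag) {
            unitsByTag.computeIfAbsent(tag, t -> new HashSet<>()).add(unitId);
        }

        // The set of unit ids handed to the validator for one tag.
        Set<String> validationUnit(String tag) {
            return unitsByTag.getOrDefault(tag, Set.of());
        }
    }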

>> Another thing that I think we should improve is to support the
>> use-case where you'd like a mirror of something that isn't necessarily
>> consistent. You simply want what's out there, and you don't really
>> care if it's consistent or not. Perhaps you do that as a first step in
>> order to combine with other things and finally create something that
>> indeed is consistent. So perhaps the aggregation is one thing and
>> consistency validation another.
>>
>
> If we implement some kind of "Consistency Rules" as mentioned above, we
> could also support intentionally inconsistent content. Sometimes it even
> makes sense to provide more than just the latest version, so that users can
> choose their preferred version, or for the case where resolution conflicts
> leave no solution other than picking an older version.
>
Exactly. Several repositories now also keep their full history so that users can roll back to previous installations,
etc. It's also a much better story when the repository is used as input for builds. Some builds (maintenance builds
especially) often need access to old versions.

>> A third improvement that I'd like to see is to enable all aggregations
>> to share the same artifact repository. There's never a reason to copy an
>> artifact with a given MD5 checksum twice. If you have it, you have it.
>>
>
> A good idea - is there any obstacle to implementing this? I hope not.
>
No problem, it's just a matter of how to make it easy for the user to manage.

- thomas