Eclipse Community Forums: B3 » repositories

repositories - design discussion [message #528804]

Wed, 21 April 2010 15:24

Eclipse User

Hi,
today, Thomas Hallgren and I were looking at making b3 use Buckminster
as a build engine and we discussed the best way to handle different
aspects of what a "top level specification of a project's build" needs
to contain.

This made us look again at the repositories specification. The current
b3 implementation is influenced by how the RMAP works in Buckminster,
and we questioned if the current model is the best way to capture the
information - the issue being that the Buckminster way describes a
bottom up approach where each component (i.e. project) is checked out
separately - when it would be much faster to check out "everything under
a root". This also coincides with how things are typically done when
running continuous integration.

The current b3 implementation of "repositories" actually describes a lot
more than just the repositories; it is really a "resolution strategy"
since it includes first-found, best-found, and switch-case selection.
Just as in Buckminster it is also likely that some information needs to
be duplicated when the same repository is involved in several places in
the "resolution strategy description".

What we concluded was that these two concepts should be declared
separately; "repositories" thus becomes a list of named repositories
that are referenced in the "resolution strategy".

For repositories - the common set of features across all types of
repositories are:
- type (e.g. svn, cvs, git, p2) (class)
- identifier (ID)
- remote location (URI) and credentials.
- local location (URI)
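
As a rough illustration of the common features listed above, here is a
minimal Python sketch (not b3 syntax). The class name `RepositoryDecl`, its
field names, the helper `index_by_id`, and the example URIs are all
hypothetical. It keeps the repositories in a table keyed by identifier, so
that a resolution strategy could refer to them by name instead of restating
them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RepositoryDecl:
    """Hypothetical record of the features common to all repository types."""
    repo_type: str                     # e.g. "svn", "cvs", "git", "p2"
    identifier: str                    # name used to reference the repository
    remote: Optional[str] = None       # remote location (URI), if any
    local: Optional[str] = None        # local location (URI), if any
    credentials: Optional[str] = None  # placeholder; real handling would differ


def index_by_id(decls):
    """Build the named-repository table, rejecting duplicate identifiers."""
    table = {}
    for decl in decls:
        if decl.identifier in table:
            raise ValueError(f"duplicate repository id: {decl.identifier}")
        table[decl.identifier] = decl
    return table


# Example data is invented for illustration only.
repositories = index_by_id([
    RepositoryDecl("svn", "main-svn", remote="https://example.org/svn/root"),
    RepositoryDecl("p2", "target-p2", remote="https://example.org/p2"),
])

# A resolution strategy would then refer to repositories by identifier only.
strategy_order = ["main-svn", "target-p2"]
resolved = [repositories[name] for name in strategy_order]
```

Both locations are optional here because, as noted below, some repositories
do not have both a local and a remote location.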

Then it gets a bit trickier, as not all types of repositories have
the same traits:
- there is a big difference between multidimensional repositories
(e.g. SCMs having concepts like branches, tags, revisions, and
timestamps), and non-dimensional ones (e.g. file system, p2).
- some repositories do not have both a local and a remote location (a
repository may already be local, the implementation of a repository may
decide on its own where it caches remote data, etc.).
- SCMs differ in their capability to locally represent multiple versions
of the same component. Using git, a local repository clone can easily
contain all branches (or selected branches) but for CVS or SVN there is
no such local representation - what is checked out has a reference to
the branch/tag in the remote repository.

Why does this matter? The idea is that when something is needed from a
repository this should trigger a checkout of a specified set of content.
It should also be possible to specify that when a component is needed it
should be looked up using a "search path" (if not available on branch
"lazy-3.5.2", check branch "lazy-3.5.1", then check tag "release-3.5.0").

If using git, we would simply get everything in the repository when we
clone, and resolution can select from the wanted branch and bind that to
the workspace.
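
The branch/tag "search path" idea can be sketched in a few lines of Python.
This is only an illustration under assumptions: the function name
`resolve_with_search_path`, the `available` mapping, and the component names
are hypothetical, and a real implementation would query the repository (or
the local clone) rather than a dictionary.

```python
def resolve_with_search_path(component, search_path, available):
    """Return the first branch/tag in search_path that contains component.

    `available` maps a branch or tag name to the set of component names
    found there; it stands in for a real repository query.
    """
    for ref in search_path:
        if component in available.get(ref, set()):
            return ref
    return None


# Invented contents, reusing the branch/tag names from the example above.
available = {
    "lazy-3.5.2": {"org.example.core"},
    "lazy-3.5.1": {"org.example.core", "org.example.ui"},
    "release-3.5.0": {"org.example.core", "org.example.ui", "org.example.doc"},
}
search_path = ["lazy-3.5.2", "lazy-3.5.1", "release-3.5.0"]

found_in = resolve_with_search_path("org.example.ui", search_path, available)
missing = resolve_with_search_path("org.example.none", search_path, available)
```

Here `found_in` is "lazy-3.5.1", since the component is not on the first
branch in the path, and `missing` is `None`.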

When using CVS or SVN we need to do this differently as we cannot check
out from multiple branches and tags in one operation. Instead, everything
under the wanted branches and tags must be checked out to separate
locations in the local file system. The
resolution can select from the wanted branch/tag and bind that to the
workspace. This introduces an issue as it is possible to change the
repository mapping for the component (project X under lazy-3.5.1 can be
bound to lazy-3.5.2 in the REMOTE repository) thus royally screwing up
the local representation of the remote branches. Or is this simply not
true? (since a root was checked out everything under that root will need
to be switched, so a user trying to switch one project would simply not
have this option)?

With non-Git-like repositories there is also the need to specify what to
check out when something from the repository is needed. Some repositories
contain lots of material under a root that should not be made available
locally.

Anyway - it seems to be a good idea to let the branch/tag search path,
and specification of what to make available locally, be a concern of the
"repository", as the implementation of a particular repository type will
be the best judge of how the repository is represented locally, and how
it can be bound to the workspace. Alternatively, there may be
restrictions for non-git repositories so that they can only specify a
single branch/tag per "repository" - this has the drawback that common
information needs to be restated (or that the grammar needs to be made
more complicated to allow references from one "repository" to another, or
that specifications are nested as "sub repositories").

We then have the two main types of repositories to consider (with or
without tags/branches). An implementation could be made where the b3
grammar allows specification of a search path that includes a list of ID
or STRING. Validation checks whether the specified repository type is
an SCM type, and flags use of a search path with a non-SCM type as an
error. The interpretation of the ID/STRING list is up to the repository
implementation (if they are syntactically correct names of
branches/tags, represent a revision or timestamp, etc.). Alternatively
we need to model the various branch/tag naming rules (git is quite
different as there are numerous ways to refer to a commit, including the
full SHA hash or a shortened form of the hash (say, the first 6
characters or so) or using "friendly" syntax like HEAD, HEAD^, HEAD^^,
HEAD~3, and so on).
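
The first alternative (generic ID/STRING entries, validated only against
the kind of repository) might look roughly like this in Python. The set
`SCM_TYPES` and the function `validate_search_path` are assumptions made
for the sketch; whether an entry is a legal branch, tag, revision, or
timestamp is deliberately not checked here, since that is left to the
repository implementation.

```python
SCM_TYPES = {"svn", "cvs", "git"}  # assumed set of multidimensional types


def validate_search_path(repo_type, search_path):
    """Return a list of error messages; an empty list means valid."""
    errors = []
    if search_path and repo_type not in SCM_TYPES:
        errors.append(
            f"repository type '{repo_type}' is not an SCM; "
            "a branch/tag search path is not allowed"
        )
    for entry in search_path:
        # Only a generic shape check; interpretation is up to the repository.
        if not isinstance(entry, str) or not entry.strip():
            errors.append(f"invalid search path entry: {entry!r}")
    return errors


git_errors = validate_search_path("git", ["HEAD~3", "lazy-3.5.2"])
p2_errors = validate_search_path("p2", ["lazy-3.5.2"])
```

With this shape, `git_errors` is empty while `p2_errors` holds one message,
because p2 is treated as a non-SCM type.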

branch/tag before module
- - -
It seems natural that the settings that control the order of ROOT,
MODULE, and COMPONENT should also be part of the "repository" (as
opposed to being part of the resolution strategy). The only possible
downside is that if a repository is organized differently in different
parts of the repository then several repository entries are needed. This
is however far less likely than having multiple resolution strategies
using the same repository.

Resolution Strategy
- - -
We then come to the resolution strategy. This now describes how a request
(a capability in a namespace, with a version range) is looked up using
first-found, best-found, or switch-case in the specified repositories
and what relative paths from the repository root to try (and in what
order). There are several alternatives:
- a node per match is used and the order is controlled with
first/best/switch
- a node has a list of relative paths (first found is used)
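
To make the difference between first-found and best-found concrete, here is
a small Python sketch. Everything in it is hypothetical: the function names,
the repository ids, the use of tuples as versions, and the half-open range
are assumptions for illustration, not b3 or Buckminster behavior.

```python
def first_found(repo_ids, lookup, in_range):
    """Return (repo_id, version) from the first repository with a match."""
    for repo_id in repo_ids:
        matches = [v for v in lookup(repo_id) if in_range(v)]
        if matches:
            return repo_id, max(matches)
    return None


def best_found(repo_ids, lookup, in_range):
    """Return the (repo_id, version) with the highest matching version."""
    best = None
    for repo_id in repo_ids:
        for version in lookup(repo_id):
            if in_range(version) and (best is None or version > best[1]):
                best = (repo_id, version)
    return best


# Invented data: versions of one requested capability per repository.
versions = {"repo-a": [(1, 0, 0), (1, 1, 0)], "repo-b": [(1, 2, 0), (2, 0, 0)]}


def lookup(repo_id):
    return versions.get(repo_id, [])


def in_range(version):
    return (1, 0, 0) <= version < (2, 0, 0)  # assumed range [1.0.0, 2.0.0)


first = first_found(["repo-a", "repo-b"], lookup, in_range)
best = best_found(["repo-a", "repo-b"], lookup, in_range)
```

With this data, `first` stops at "repo-a" with version (1, 1, 0), while
`best` searches every repository and picks (1, 2, 0) from "repo-b".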

Question is how much processing to perform - from simplest case to more
elaborate:
- the name of the component can be looked up directly in the repository
- component name can be used, but a search path is needed (e.g.
features/, plugins/, examples/)
- component name needs massage (with or without paths)
- path(s) are derived from the name - but name can be used directly
- path(s) are derived from the name, name needs massage
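
The simpler cases in this list (direct lookup, a search path of prefixes,
and name massage) could be expressed along these lines in Python. The
function names and the ".feature" suffix rule are made up for the example;
the prefixes are the ones mentioned above.

```python
def candidate_paths(name, prefixes=("",), massage=None):
    """List relative paths to try for a component, in order."""
    looked_up = massage(name) if massage else name
    return [f"{prefix}{looked_up}" for prefix in prefixes]


def strip_feature_suffix(name):
    """Hypothetical massage rule: drop a trailing '.feature'."""
    suffix = ".feature"
    return name[: -len(suffix)] if name.endswith(suffix) else name


prefixes = ("features/", "plugins/", "examples/")

direct = candidate_paths("org.example.ui")
with_paths = candidate_paths("org.example.ui", prefixes)
massaged = candidate_paths("org.example.ui.feature", prefixes, strip_feature_suffix)
```

A purely declarative grammar would cover `direct` and `with_paths`; the
`massage` callable marks the point where expressions would be needed.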

Suggestions are welcome regarding the balance between declarative style
and using expressions. Also, whether the select-first/best/switch
structure should be used as the selection mechanism, or whether each
entry should have additional structure.

Regards
- henrik
