Re: [tycho-dev] [discuss] tycho repository layout and metadata format

See inline

--
Regards,
Igor

On 11-05-31 4:54 AM, Sievers, Jan wrote:
* focus on artifact deploy and dependency resolution during the
build. To help us manage scope of this work, I want to explicitly
exclude IU categories and other user-facing features from the
scope.

We do support anything that is published by tycho in p2artifacts.xml/p2content.xml though? I am thinking e.g. about feature root files or source bundles.

Yes, we need to support these, which I think means that each Tycho project should be able to deploy multiple artifacts and multiple IUs and Tycho repository layout has to support this.


If we want to exclude other features, we have to make this very explicit as people will expect the repo to behave just like any plain p2 repo. So effectively are we saying this is a repo format for tycho and at build time only, it's not something you can paste into the p2 update UI and install/update from it?

The problem is IU categorization and filtering. For example, if all Helios and Indigo artifacts were deployed in a Tycho repository, how do we let the client choose between the two? I am not saying this is not possible or not desirable, I just want to concentrate on build-related use cases for now.


Another requirement that's important to us to ensure build reproducibility is good support for dependency management, i.e. exact control over which IUs are in the search scope of a build and which are not.

With Import-Package and repositories the size of 200K+ IUs, requirements
are bound to become ambiguous. Also, there is widespread use of
dependency ranges but there is often an implicit assumption that this
dependency is resolved against a certain eclipse release train repo.

What we need IMHO is a concept of restricted "view" on the
repository. In contrast to maven POM dependency management, for a
build against released versions I would like to have a "white list"
or "bill of materials" of IU versions that define the resolution
scope.
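
A minimal sketch of such a "white list" view, assuming the bill of materials is simply a list of exact (id, version) pairs. The IU type, the function name and the sample ids are hypothetical and only illustrate the idea; this is not p2 or Tycho API.

```python
from typing import NamedTuple


class IU(NamedTuple):
    """Hypothetical stand-in for an installable unit: just an id and an exact version."""
    id: str
    version: str


def restrict_to_bom(available, bom):
    """Return only the IUs whose exact (id, version) pair is listed in the bill of materials."""
    allowed = set(bom)
    return [iu for iu in available if iu in allowed]


# Made-up repository content: two versions of one bundle plus one other bundle.
repo = [
    IU("org.example.core", "1.0.0"),
    IU("org.example.core", "1.1.0"),
    IU("org.example.ui", "2.0.0"),
]
bom = [IU("org.example.core", "1.0.0"), IU("org.example.ui", "2.0.0")]

# Only the white-listed versions remain in the resolution scope.
scope = restrict_to_bom(repo, bom)
```

With a scope like this, a range such as [1.0.0,2.0.0) on org.example.core can only match 1.0.0, whatever else the repository contains.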

Not sure I understand. Does this "bill of materials" list all target platform IUs or does it provide additional constraints to help the resolver choose among multiple versions?

One aspect of build reproducibility is the ability to make small changes
to the project, to produce a bugfix for an old release for example. This
implies the ability to make small changes to the target platform definition and
expect corresponding small changes to the resolved target platform. I don't
think a resolved target platform "snapshot" provides this property.


Don't know whether dependency management should be done on client or server side or maybe both. If it's on the server side, I could imagine e.g. "helios" and "indigo" views on the same repo. Each view would have a distinct URL similar to nexus repo groups. If dependency management is done on the client side, we don't want everybody to define their own or copy-paste the target definition. So we would need a concept for reuse and composability of the target definition.

We, too, thought about repository "views" and we actually implemented something like this as part of one of Sonatype's commercial products. The way I see it, "views" can be built on top of "raw" repositories used by the build and do filtering, aggregation, categorization and anything else necessary to expose build results to the end user. This is why I am suggesting we push user-facing behaviour out of scope for now, and deal with it separately from build-related use cases.

I do not think these "views" can be used for build reproducibility,
however. To really guarantee a reproducible build, everything
interesting/important should happen on the client, and the server should
just be a relatively dumb metadata store without any smarts. Servers get
moved and server software gets updated, and we need to make sure build
results stay the same even if these results are technically "wrong".
Here I assume that the exact version of Tycho is part of the client.

And yes, pervasive use of dependency version ranges makes reproducible
builds much harder to guarantee. In p2 3.5, it was possible to tell whether
a given p2 resolution result was completely locked down or there was some
wiggle room, so it was possible to validate whether a project's target
platform was reproducible or some additional constraints were needed. After
the introduction of query-based requirements in 3.6, this became a much
harder question to answer.
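
To illustrate what "locked down" versus "wiggle room" means for plain version ranges, here is a rough sketch. It assumes OSGi-style range strings and treats only a strict range of the form [v,v] as locked down. The function names and sample requirements are hypothetical, versions are compared as plain strings (so "1.0" and "1.0.0" count as different), and query-based requirements are not covered at all.

```python
def is_locked_down(version_range: str) -> bool:
    """True only for a strict range such as "[1.0.0,1.0.0]" (simplified string comparison)."""
    r = version_range.strip()
    if not (r.startswith("[") and r.endswith("]")):
        # Open-ended ranges, exclusive bounds and bare versions all leave wiggle room.
        return False
    parts = [p.strip() for p in r[1:-1].split(",")]
    return len(parts) == 2 and parts[0] != "" and parts[0] == parts[1]


def unlocked_requirements(requirements):
    """Return the names of requirements whose range still allows more than one version."""
    return [name for name, rng in requirements.items() if not is_locked_down(rng)]


# Made-up requirements: one pinned, two with wiggle room.
reqs = {
    "org.example.core": "[1.0.0,1.0.0]",
    "org.example.ui": "[2.0.0,3.0.0)",
    "org.example.util": "1.2.0",  # a bare version means "1.2.0 or newer" in OSGi
}
needs_constraints = unlocked_requirements(reqs)
```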



* long term metadata compatibility strategy, i.e. artifacts
deployed with tycho 1.0 should be consumable by tycho 2.3.1

For this we need to be clear whether we reuse the p2 metadata format or we define our own. Since we use the p2 publishers, we have a dependency on the p2 metadata format. AFAIK the p2 metadata format is not API.


IUSerializer and IUDeserializer introduced in P2 3.7 are API and were introduced to support this exact use case. They, however, only provide forward compatibility, i.e. future versions of P2 will be able to read metadata generated by older versions but not vice versa. The question here is what we do if an older tycho runs into metadata generated by a newer tycho: ignore it? Fail the build? This is another aspect of build reproducibility.
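
The two options can be sketched roughly as follows, assuming each deployed entry carried an integer metadata format version. That version field, the Policy enum and all names here are hypothetical and are not part of p2 or Tycho.

```python
from enum import Enum


class Policy(Enum):
    IGNORE = "ignore"  # silently skip entries the client cannot read
    FAIL = "fail"      # abort the build on the first unreadable entry


class UnsupportedMetadataError(Exception):
    """Raised when the client meets metadata newer than it understands."""


def select_readable(entries, supported_format, policy):
    """Return the names of entries this client can read, applying the given policy to the rest."""
    readable = []
    for name, fmt in entries:
        if fmt <= supported_format:
            readable.append(name)
        elif policy is Policy.FAIL:
            raise UnsupportedMetadataError(
                f"{name} uses metadata format {fmt}, client supports up to {supported_format}"
            )
        # Policy.IGNORE: fall through and leave the entry out.
    return readable


# Made-up repository content: (name, metadata format version).
entries = [("bundle.a", 1), ("bundle.b", 2)]
ignored = select_readable(entries, 1, Policy.IGNORE)
```

Note that with IGNORE the resolution result of an old client can change silently as newer metadata appears, while FAIL keeps the result stable at the cost of breaking the build.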

Also, the p2 publisher does NOT depend on the repository metadata format AFAIK,
so we are free to use whatever format we choose as long as we implement
IMetadataRepository and IArtifactRepository. I am not
saying we should invent our own format, but we can if we decide we need to.


Regards Jan



-----Original Message-----
From: tycho-dev-bounces@xxxxxxxxxxx [mailto:tycho-dev-bounces@xxxxxxxxxxx] On Behalf Of Igor Fedorenko
Sent: Friday, 27 May 2011 18:17
To: tycho-dev@xxxxxxxxxxx
Subject: [tycho-dev] [discuss] tycho repository layout and metadata format

I think I am finally ready to give TYCHO-335 [1] another try. For
the uninitiated, this is about being able to share artifacts and
corresponding p2 metadata via a repository. Before I do anything I'd
like us to discuss and agree on high-level requirements for this:

* synchronous or near-synchronous metadata update after deploy. So if
one Hudson (or cli) build deploys artifacts, the next build is
expected to be able to consume the artifacts

* efficiently support both deploy-only (i.e. RELEASE) and
deploy-remove (i.e. SNAPSHOT) repositories. Although desirable, it is
not a hard requirement to support both usage patterns with a single
format, we can define two formats if needed.

* scale to 200K+ artifacts/installable units. To put this in context,
  - there are ~160K artifacts in maven central [2]
  - indigo M7 repo had ~11K IUs and was ~4.6M in size when jarred
  - assuming comparable compression level, 200K IUs will be >65M jarred

* long term metadata compatibility strategy, i.e. artifacts deployed
with tycho 1.0 should be consumable by tycho 2.3.1

* focus on artifact deploy and dependency resolution during the
build. To help us manage scope of this work, I want to explicitly
exclude IU categories and other user-facing features from the scope.

* providing "simple" or "composite" p2 repository layouts is
explicitly NOT a requirement. Likewise, using Maven2 repository
layout is NOT a requirement. Let's keep our options open.
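
The scale estimate in the list above can be reproduced with a quick linear extrapolation. This sketch assumes the jarred size grows proportionally with the number of IUs and uses only the approximate figures quoted there; the variable names are made up.

```python
# Approximate figures quoted in the list above.
indigo_m7_ius = 11_000      # ~11K IUs in the indigo M7 repo
indigo_m7_size_mb = 4.6     # ~4.6M when jarred
target_ius = 200_000        # scale target

# Linear extrapolation, assuming a comparable compression level.
estimated_size_mb = indigo_m7_size_mb * target_ius / indigo_m7_ius
print(f"estimated jarred size: ~{estimated_size_mb:.0f}M")
```

Under these assumptions the estimate lands in the low 80s of megabytes, which is consistent with the ">65M" lower bound.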

For bonus points

* allow efficient caching-proxy and aggregating repositories

* allow efficient implementation of fine-grained access control

* possibility to interact with maven-bundle-plugin and other maven
OSGi tools


Did I miss anything? Comments, ideas?


[1] https://issues.sonatype.org/browse/TYCHO-335
[2] http://search.maven.org/#stats