Eclipse 3.0 builder issues

Last modified: November 13, 2003

This document outlines the current pressing builder problems, maps out the solution space, and will eventually propose concrete changes for Eclipse 3.0 and if necessary for a 2.1.x maintenance release. Since this is not a committed plan item, no promises are being made at this point.

Problems

P1. Dynamic classpaths, static project references

There are many different PDE self-hosting styles. Some users build and run against plugins outside the workspace, some use binary or source projects in the workspace, and some use a combination of the two. Every time a user switches from building against external libraries to building against internal projects, the classpaths of all downstream plugins need to be updated. When the set of referenced projects on the classpath change, so do the platform project references. These references are used by the platform when computing the default project build order.

Before the Eclipse 2.1 release, JDT introduced the notion of classpath containers. Classpath containers abstract away volatile sets of references so that the .classpath file does not need to change every time these reference sets change. The two common uses of containers are for JRE libraries, and PDE project references. This approach solves the problem of .classpath files constantly changing as the user's self-hosting style changes (or when the JRE changes). The remaining problem is that the project references, manifested in the .project file, do not have a corresponding dynamic container facility. Therefore the .project file continues to change whenever the user switches between external and internal project references.

P2. Builders need more context

Individual project builders are not supplied with enough context information when they are invoked. In particular, they do not know whether the current build is a single project build or a workspace build. Also, there is no consistent hook that builder implementors can use to be notified when the build process is starting and finishing. If this information was available, builder implementors would have more flexibility in optimizing multiple project builds.

Without going into too much detail, consider a simple case of projects A and B, where B depends on A and so must be built after it. When A is built, it could leave behind a "cache" that describes in more detail exactly what changes builder A made. Builder B could then read the cached information and optimize its build accordingly. However, with manual project builds, project A could be built several times before project B is built. In this simple case, builder A could incrementally update this cache until B is finally built. However, if there is also a project C that depends on A, it could be built at a different time. In the general case, it is impossible for a builder to leave behind a cache for dependent builders to read, because it can never know the appropriate times to update or flush that cache.

To further complicate matters, other builders may have been installed on projects A or B that run in between the two builders that are trying to optimize downstream dependency changes. Builders cannot reliably determine if other builders have been run that affect them, or whether changes they see in a delta have been caused by their own builder or some other builder.

P3. Build scopes

Some Eclipse users work with massive workspaces of highly interconnected resources. These workspaces often take a very long time to build when a resource with a large number of downstream references changes. Builder implementors are able to optimize these cases to some extent, but in cases where thousands of referenced resources are genuinely affected by a change, these optimizations can only go so far.

One coping strategy that users have adopted is to only build small subsets of the workspace frequently, and perform full workspace builds less often. This essentially batches changes so that a single long build can replace frequent long builds, thus freeing up more time for users.

The current Eclipse build facilities are not well suited to this approach. Autobuild and the incremental build command (invoked with Ctrl+B), work only at a workspace scope. To build smaller sets of projects, users have to perform cumbersome manual steps involving multi-selecting the set of projects, and invoking "Build Project" from the "Project" context menu. Autobuild has established a high expectation about the transparency of building, making these manual steps unacceptable.

Solutions

S1. Workspace builders

Currently a workspace build runs roughly as follows:

   for each project P(i) in workspace {
      for each builder B(j) installed on P(i) {
         run builder P(i)B(j)
      }
   }

This process could be inverted by allowing builders to be installed on the entire workspace that have complete control over how the projects they care about are built. The build process would then run as follows:

   for each workspace builder B(i) {
      run builder B(i)
   }

And a workspace builder B(i) would have a loop within its build like this:

   for each project P(j) in workspace {
      if (P(j) has nature for builder B(i) {
         build project builder B(i) on project P(j)
      }
   }

The order in which builders run would be established by declarative rules in the workspace builder extension point. These rules would be resolved by platform as a topological graph ordering with cycles treated as unrecoverable errors.

Pros: With this approach, each builder has complete control over the build order it uses. Project references are no longer needed, so problem P1 goes away. The builder has all the context information it needs, so the optimizations discussed in problem P2 are now possible. It is also no longer possible for other builders to run interleaved with a particular builder as it iterates over the projects. Thus the builder has reasonably complete knowledge of what changes have happened to each project, and what the content of those changes are.

Cons: This is a breaking change to the workspace build order. If project P1 and P2 have builders B1 and B2, the old order is:

P1B1, P1B2, P2B1, P2B2

And the new order is:

P1B1, P2B1, P1B2, P2B2

If there are projects relying on this order, they will be broken by this change. It is true that since the platform has always exposed API to run an individual builder on a single project, that all possible build orders have always been permitted and all builders should be able to handle it. However, this is a breaking change to autobuild and to the behaviour of the API method IWorkspace.build.

Second, there is a large class of builders that piggyback on other builders, and rely on those builders to establish the build order for them. For example, a client may install a builder that acts as a pre- or post-processor on the java builder, generating extra source files or copying/mutating class files. These builders typically have no knowledge of the correct build project order. In a world of workspace builders, the workspace builder for these secondary builders would need a mechanism for computing a reasonable project build order. The java builder and other "primary" builders would need to expose such an API and all secondary builders would use this (non generic) API to compute a reasonable build order.

In the general case, it would be impossible to provide a backwards compatibility mode for builders implemented against Eclipse 2.1. For example, say a 2.1 workspace is opened in 3.0, and it has projects with builder order {B1, B2}. Say B2 is now a workspace builder that defines its own project build order. B2 was previously using project references, but it immediately removes all project references when its plugin is first loaded since these are no longer used (if it did not do this it would not be solving problem P1 in our list). How does the platform determine an appropriate project order for running builder B1? Since B2 has not run, it can not infer an order from B2. B2 would have to expose some method that pre-declares the order it intends to use. B1, in the absence of its own build order, would use this declared order for its own projects. Now say there are two workspace builders (B2 and B3) on each project, and each declares a different project order. It is now impossible to infer an appropriate build order for B1.

S2. Pluggable build strategy

A less ambitious proposal is to allow pluggable build strategies to determine build order, but then build each project in the same manner as Eclipse 2.1. A build strategy extension point would allow plugins to provide a subclass of BuildStrategy:

public abstract class BuildStrategy {
	public abstract void build(IProject[] projects, IProgressMonitor monitor);
	public final void build(IProject project, IProgressMonitor monitor) {}
	public final int getKind() {...}
	... other methods to provide additional context if needed ... 
}

A build strategy is a first class participant in the build progress. Each installed strategy is invoked for every single build (workspace, project, incremental, full, autobuild, etc). A strategy instance is given an array of projects to build, and it must call BuildStrategy.build(IProject, IProgressMonitor) on the projects that it is interested in building. This callback will then invoke all the builders on that project in the order specified on the project build spec (same as previous Eclipse releases). The strategy may build zero or more projects in the array, and it may build the same project multiple times (in the case of cycles).

If multiple strategies are present, they are invoked in no particular order. Strategies are only given the "left over" projects that no previous strategy chose to build. Finally, a default platform strategy runs to build the remaining unbuilt projects, using project references to determine the order.

(Note: I am calling this solution "build strategy" to avoid confusing it with S1. If we go with this route, it could easily be renamed to "workspace builder" since everyone seems to like that term.)

Pros: A build strategy can be used as an alternative to project references for setting the project build order. Since this mechanism is dynamic, it fits with the JDT dynamic build path story and avoids modifying the project description file every time the build path changes (solving problem P1). This hook also gives a build strategy implementor all the context information they need, including the complete set of projects being built, and an opportunity to add pre- and post-processing on the entire build cycle (solving problem P2).

This approach is also backwards compatible. If no builders made any changes, then the default build strategy would continue to determine build order from project references as it did in previous releases. Each builder that was influencing the build order by changing project references can now supply a build strategy extension if they want to compute build order dynamically.

Cons: If there are multiple strategies installed, they may only operate on disjoint sets of projects. This is the same limitation that existed with project references, since multiple builders setting the project references prior to Eclipse 3.0 would have been overwriting each other's references.

This does not solve the final problem discussed in P2, the interleaving of other builders between two invocations of a given builder extension. This problem cannot be solved without breaking changes to the workspace build order.

Another drawback is that this puts responsibility for complex aspects of the build infrastructure into the hands of strategy implementors. For example, handling cycles in the build order, and handling a user specified build order. This increase in delegated responsibility comes with the usual cost of decreased flexibility to make future changes to the build infrastructure.

S3. Working set build

The two build actions that currently surface in the UI are project build and workspace build. A third option, working set build, could be introduced for building a subset of the workspace. This option would only be visible after entering some advanced mode (for example from a preference page) to avoid confusion for new users. This option would only run builders on the specified set of projects, avoiding builds of either prerequisite or downstream dependent projects. This option could be made available in both autobuild and manual build modes. While a manual working set build can be done using current API, it would be beneficial to introduce a new core API method for working set builds. This would ensure correct computation of build order, and allow for participation by other proposed mechanisms such as workspace builders or pluggable build strategies.

This solution aims to address only problem P3 listed above. It may work in conjunction with other solutions listed in this document.

Pros: Allows user with large, heavily interconnected workspaces to build smaller sets of projects more often, and defer building the entire workspace until a larger number of changes have been made.

Cons: Working set autobuild in particular would introduce significant complexity. To avoid compatibility problems it would have to be treated as a disjoint mode from workspace autobuild. I.e., IWorkspace.isAutoBuilding would return false when in working set autobuild mode, and some new flag could be used to find out if the workspace is in working set autobuild mode. This mode would be difficult to explain to both API clients and users. Workspace autobuild is simple to explain: when in this mode, all projects are always in a fully built state. Working set autobuild is more complex: when in this mode, and when all prerequisites of projects in the working set are in a built state, then all projects in the working set will remain in a built state, but all projects outside of the working set will not be built. This story would be easier to explain if we required that the working set be internally consistent (not referencing projects outside the working set), but this assertion is difficult to verify if we introduce a dynamic build order such as workspace builders or pluggable build strategies. Similar difficulties exist with manual working set build, but the notion of manual build at a different scope already exists.

S4. Disabling builders on a per-project basis

An alternative to working set build is to add new API for disabling build on a per-project (or even per-builder) basis. The setting would apply to all forms of builds (incremental ,full, auto, project, workspace). The value would be persisted by the platform. Builds scoped at the workspace level (including autobulid) would simply omit projects for which building is disabled.

This solution is essentially equivalent in functionality to working set build. It answers the question "what not to build?" rather than the question "what to build?". This is perhaps an easier story to explain to users (similar concept in some ways to project close/open). With working sets there is confusion about what types of builds it will apply to, in addition to overloading the working set concept currently used in the UI for filtering views.

If this solution was carefully implemented, it could also solve the long standing enhancement request to be able to turn off the java builder (in order to replace it with javac or other compiler).

Proposed action

TBD