Large-scale development issues

Last modified: February 24, 2005

One of the major development themes for Eclipse 3.1 is to improve support for "Large-scale development" in Eclipse. This includes improving collaboration for large, distributed teams, but it also encompasses support for large workspaces. This document captures requirements submitted in bug reports, mailing lists, and other discussions from people using Eclipse for large-scale project development. Not all of these issues are committed to be solved in Eclipse 3.1, but this list presents, in no particular order, a problem scope from which work items can be chosen. Some of these items are already present on the Eclipse 3.1 plan, but are included here for completeness.

1. Memory footprint

Eclipse imposes a significant RAM footprint when working with a large workspace. Identify principal areas of memory consumption and explore opportunities to reduce current footprint.

Extension registry footprint. Eclipse maintains the registry of plug-ins, extensions, and extension points in memory. As the number of plug-ins and extensions grows, so does this footprint. Convert the registry to a cache structure that stores infrequently referenced portions on disk and brings extensions into memory in a lazy and transient manner.
Workspace tree footprint. The workspace is represented in memory as a tree containing various data such as resource names, attributes, markers, etc. Explore reducing the amount of data stored in memory for each resource, and other optimizations such as uniquification of strings.
Team/CVS metadata footprint. The CVS plug-ins store significant information in memory about the synchronization and "dirty" state of each resource. Explore reducing the footprint of this data or using lazy caching to only bring this information into memory when needed.
Message bundles (bug 37712). Most plug-ins store translated strings in ResourceBundle objects. These bundles are not space-efficient, and often use lengthy string-based keys for message lookup. Explore a more efficient representation, integer-based keys, or a disk-based bundle for infrequently used messages.
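
The "uniquification of strings" mentioned for the workspace tree can be sketched as a small pool that keeps one canonical instance per distinct string value, so that resource names repeated across the tree share storage. This is only an illustration under assumed names: `StringPool` and `unique` are hypothetical and are not taken from the Eclipse code base.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pool that hands out one shared instance per distinct string value. */
final class StringPool {
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the canonical instance equal to s, registering s on first sight. */
    String unique(String s) {
        if (s == null) {
            return null;
        }
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Number of distinct string values currently held. */
    int size() {
        return pool.size();
    }
}
```

Passing every resource name through `unique` before storing it means that, for example, the thousands of folders named `src` or `bin` in a large workspace would all reference a single `String` object. A pool like this would typically be discarded once the tree has been processed, so that it does not itself pin strings in memory.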

2. Performance of I/O-bound operations

Large teams often store development artifacts (code, diagrams, documentation) on a network file system in order to increase reliability, facilitate backup and restoration of data, and simplify integration and building. I/O-bound operations in Eclipse are typically much slower in such environments. Explore optimizing I/O-bound operations and moving lengthy operations into a background thread.

Project creation. Creating a project at a file system location that contains a large number of existing files and folders requires significant I/O to discover all the files and to gather local information such as time stamps. This project discovery can be moved into a background thread.
Resource copy, move, and delete. Most operations that act on trees of resources still cannot be run in the background. When these operations take a long time, the user is forced to wait until they complete. These should be converted to "user" jobs that can optionally be run in the background.
[>3.1] Recursive deletion (bug 10628). Java provides no API for recursively deleting a directory containing files and other directories. This means deleting a large resource tree requires two native I/O calls per directory (one to list the children, one to delete), and one native I/O call per file. This particularly impacts compilation, which often needs to delete large trees of resources in the output (bin) folder. Consider adding a native method to improve recursive deletion performance.
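
To make the cost of recursive deletion concrete, here is a minimal plain-Java sketch using only `java.io.File`. The class and method names (`RecursiveDelete`, `deleteTree`) are hypothetical, and this is not the Eclipse implementation. Each directory needs one call to list its children and one to delete it; the exact number of calls per file depends on how an implementation tells files from directories (this sketch simply relies on `listFiles()` returning `null` for non-directories).

```java
import java.io.File;

/** Hypothetical helper showing the per-entry calls a pure-Java recursive delete needs. */
final class RecursiveDelete {

    /**
     * Deletes the given file or directory tree.
     * Returns true only if every entry, including root, was removed.
     * Note: symbolic links are not treated specially in this sketch.
     */
    static boolean deleteTree(File root) {
        // Listing call; yields null when root is not a directory or cannot be read.
        File[] children = root.listFiles();
        boolean childrenDeleted = true;
        if (children != null) {
            for (File child : children) {
                childrenDeleted &= deleteTree(child);
            }
        }
        // Delete call for the entry itself, issued after its children are gone.
        return root.delete() && childrenDeleted;
    }
}
```

On a network file system every one of these calls can involve a round trip, which is why a single native call that removes a whole tree could be noticeably faster for large output folders.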

3. Project interchange

Eclipse has always emphasized first-class support for integration of repository tools, and has treated repositories as the primary vehicle for code sharing among team members. This leaves behind groups that either don't use a repository, or don't use a repository that has Eclipse integration plug-ins. Such groups typically use the Import/Export wizards to share code. Some improvements to the import and export tools would make them more powerful as a project interchange (sharing) mechanism.

Import multiple projects at once (bug 22698). If you have unzipped, untarred, or checked out a large group of projects from a repository, there is no way to load them all into a workspace at once. The current "existing project" import wizard only allows importing one project at a time.
Import project inside zip file (bugs 66798, 67808). The Export wizard allows you to export an entire project into a ZIP or JAR file. The corresponding Import wizards don't allow you to import that ZIP or JAR file back into a workspace as a top-level project. The user has to unzip the file and import as existing project, or create a new project with the same name and import the contents. This should be made easier. Similarly, it should be possible to import a ZIP containing multiple projects.
[>3.1] Rename project on import (bug 40493). The "import existing project" wizard doesn't allow you to import a project but pick a different name for the project in the workspace. This is often needed by users who check out projects from a repository into the file system, and then want to call them something different in the workspace (one common example is when working on multiple streams of the same project in a single workspace).
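
As an illustration of what importing multiple projects at once involves, the sketch below scans a directory tree for folders that contain a `.project` file, the project description file Eclipse writes into each project root. The class name `ProjectScanner`, the method `findProjectRoots`, and the depth limit are assumptions made for this example; the sketch only discovers candidate folders and does not call any Eclipse API to create the projects.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical scanner that finds candidate project folders below a root directory. */
final class ProjectScanner {

    /**
     * Returns all directories at most maxDepth levels below root (root included)
     * that directly contain a ".project" file, in sorted order.
     */
    static List<Path> findProjectRoots(Path root, int maxDepth) throws IOException {
        try (Stream<Path> paths = Files.walk(root, maxDepth)) {
            return paths
                    .filter(Files::isDirectory)
                    .filter(dir -> Files.isRegularFile(dir.resolve(".project")))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

An import wizard built along these lines could present the returned folders as a checklist and import the selected ones in a single operation, instead of asking the user to run the wizard once per project.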

4. Support for non-incremental builders

The workspace builder infrastructure is designed primarily with efficient incremental compilers in mind. Auto-build is turned on by default, and this is only realistic for fast builders. The workspace should have support for inherently non-incremental or slower builders (such as C compilers and Ant-based builders). In particular, we need to support users working in a heterogeneous environment with some fast incremental builders and some slow non-incremental builders, sometimes with both on the same project (bug 60803). Read the proposal.

5. Improved working sets

With very large workspaces, working sets are often used to filter the amount of information shown in various views, and to scope long-running tasks such as builds and searches. The current working set support has some problems:

Shared notion of a working set (bug 22328). Each view has to be explicitly and manually scoped to a given working set. Each long-running search or build also needs to have the working set manually chosen. One particularly bad example is the Java browsing perspective, which has four views, each of which needs to have its working set specified manually. Consider adding a global notion of a current working set, or a current working set per window.
Dynamic working sets. A working set is defined as a static list of elements. Consider adding mechanisms to make working sets more flexible, such as wildcards (bug 62646), exclusion (bug 22362), and tracking project creations (bug 15941) and moves (bug 15938). Aggregating or nesting working sets would also be useful for very large scale workspaces.
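
One way to picture a dynamic working set is as a membership test built from wildcard include and exclude patterns rather than a stored list of elements. The following is a minimal sketch under that assumption; `PatternWorkingSet` and its methods are hypothetical names, only `*` is supported as a wildcard, and nothing here reflects the actual working set API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical working set defined by wildcard patterns instead of a static list. */
final class PatternWorkingSet {
    private final List<Pattern> includes = new ArrayList<>();
    private final List<Pattern> excludes = new ArrayList<>();

    PatternWorkingSet include(String wildcard) {
        includes.add(compile(wildcard));
        return this;
    }

    PatternWorkingSet exclude(String wildcard) {
        excludes.add(compile(wildcard));
        return this;
    }

    /** A name belongs to the set if some include matches it and no exclude does. */
    boolean contains(String projectName) {
        boolean included = includes.stream().anyMatch(p -> p.matcher(projectName).matches());
        boolean excluded = excludes.stream().anyMatch(p -> p.matcher(projectName).matches());
        return included && !excluded;
    }

    /** Translates a wildcard where '*' matches any run of characters into a regex. */
    private static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        String[] literals = wildcard.split("\\*", -1);
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }
}
```

With `new PatternWorkingSet().include("org.eclipse.ui*").exclude("*.tests")`, a project named `org.eclipse.ui.workbench` is a member while `org.eclipse.ui.tests` is not. Because membership is computed on demand, a newly created or renamed project that matches the patterns would fall into the set without any manual update.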

Legend
[icon]	item is under development.
[icon]	item is under investigation.
[icon]	item is finished.
( )	item is time permitted.
[>3.1]	item is deferred.
[icon]	new