Last modified: Feb 16, 2005
Note: this document has been updated/extended to depict the solution actually implemented in the 3.0 release and the changes made during the 3.1 cycle. The original proposal written during the 3.0 development cycle, as well as all previous versions of this document, are also available.
Plan item description: Content-type-based editor lookup. The choice of editor is currently based on file name pattern. This is not very flexible, and breaks down when fundamentally different types of content are found in files with undistinguished file names or internal formats. For example, many different models with specialized editors get stored in XML format files named *.xml. Eclipse should support a notion of content type for files and resources, and use these to drive decisions like which editor to use. This feature would also be used by team providers when doing comparisons based on file type. The several existing file-type registries in Eclipse should be consolidated. [Platform Core, Platform UI] [Theme: User experience] (bug 37668, 51791, 52784)
This plan item is about two important features:
Content types determine many properties of files, and actions related to them, such as encoding and associated editors. Automatic content type determination enables content-type-specific actions without requiring the user to manually define the content type of each file. Content type detection is based on:
Content type determination based on file name/extension ("file selection specs") is the easiest to compute. Each content type has a set of file selection specs associated with it. Determining the content type corresponding to a file selection spec is a simple lookup in the catalog.
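The catalog lookup described above can be sketched as two maps, one keyed by exact file name and one keyed by file extension. This is a minimal illustration under our own assumptions: the class FileSpecCatalog and its methods are hypothetical and are not the Eclipse IContentTypeManager API, matching is exact (case-sensitive), and the "extension" is whatever follows the last dot.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical catalog mapping file selection specs to content type ids.
// Not the Eclipse API; a sketch of "a simple lookup in the catalog".
class FileSpecCatalog {
    private final Map<String, List<String>> byName = new HashMap<>();
    private final Map<String, List<String>> byExtension = new HashMap<>();

    void addFileName(String contentTypeId, String fileName) {
        byName.computeIfAbsent(fileName, k -> new ArrayList<>()).add(contentTypeId);
    }

    void addFileExtension(String contentTypeId, String extension) {
        byExtension.computeIfAbsent(extension, k -> new ArrayList<>()).add(contentTypeId);
    }

    // Returns all content types whose specs match: file name matches first,
    // then matches on the text after the last dot (if any).
    List<String> lookup(String fileName) {
        List<String> result = new ArrayList<>(byName.getOrDefault(fileName, List.of()));
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0) {
            result.addAll(byExtension.getOrDefault(fileName.substring(dot + 1), List.of()));
        }
        return result;
    }
}
```

For example, after registering a hypothetical "xml" type for the xml extension and a "pluginManifest" type for the plugin.xml file name, looking up "plugin.xml" yields both candidates, while "build.xml" yields only "xml". No file contents are read at any point, which is why this kind of detection is cheap.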
Content type determination based on file contents is more complex and requires examining the contents. Since we are dealing with an open set of possible content types, this examination requires delegating to content type detectors contributed by other plug-ins (content describers).
The proposed API contains 4 new interfaces in a new package called
The following is a brief description of each of them.
Represents a content type in the platform.
are provided by the platform, built from extensions to the
extension point. Relevant properties for
org.eclipse.core.runtime.content.IContentDescriber), a class that knows how to recognize whether a given stream of bytes contains data compatible with the content type, and how to extract other content-type-specific information from the stream.
IContentType provides methods that check whether a given file name is matched by this content type's file selection specs, or whether this content type is a subtype of another content type.
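The two checks described above can be sketched on a simplified model. SimpleContentType is a hypothetical stand-in written for illustration, not the org.eclipse.core.runtime.content.IContentType interface; we assume each type holds a single optional base type, and the method names are chosen only for readability.

```java
import java.util.Set;

// Hypothetical, simplified content type used only to illustrate the two checks.
class SimpleContentType {
    final String id;
    final SimpleContentType baseType;   // null for a root type
    final Set<String> fileNames;        // exact file name specs
    final Set<String> fileExtensions;   // file extension specs

    SimpleContentType(String id, SimpleContentType baseType,
                      Set<String> fileNames, Set<String> fileExtensions) {
        this.id = id;
        this.baseType = baseType;
        this.fileNames = fileNames;
        this.fileExtensions = fileExtensions;
    }

    // True if one of this type's own file selection specs matches the file name.
    boolean isAssociatedWith(String fileName) {
        if (fileNames.contains(fileName)) return true;
        int dot = fileName.lastIndexOf('.');
        return dot >= 0 && fileExtensions.contains(fileName.substring(dot + 1));
    }

    // True if this type is 'other' or a descendant of it (walks the base-type chain).
    boolean isKindOf(SimpleContentType other) {
        for (SimpleContentType t = this; t != null; t = t.baseType) {
            if (t == other) return true;
        }
        return false;
    }
}
```

With a hypothetical chain text <- xml <- pluginManifest, the plug-in manifest type is associated with "plugin.xml" but not with "build.xml", and it is a kind of both xml and text; the reverse checks are false.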
Represents the content type registry. Provides methods for obtaining the content type associated with a file name, and for discovering the content type corresponding to a stream of bytes.
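One way a registry could combine the two lookups is to narrow the candidates by file name and then let each candidate's describer examine the bytes. The sketch below assumes this strategy; SketchRegistry, Detection and Describer are hypothetical names, the three-way result mirrors the valid/indeterminate/invalid outcomes discussed for content describers, and treating a type without a describer as indeterminate is our own assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical outcome of a content describer.
enum Detection { INVALID, INDETERMINATE, VALID }

// Hypothetical describer: inspects raw contents and reports a Detection.
interface Describer {
    Detection describe(byte[] contents);
}

class SketchRegistry {
    // Content type id -> describer. Types without an entry are matched by file name only.
    private final Map<String, Describer> describers = new HashMap<>();

    void register(String contentTypeId, Describer describer) {
        describers.put(contentTypeId, describer);
    }

    // Picks a content type among name-based candidates: the first VALID answer wins;
    // otherwise the first INDETERMINATE candidate; INVALID candidates are discarded.
    String findContentType(List<String> candidates, byte[] contents) {
        String fallback = null;
        for (String id : candidates) {
            Describer d = describers.get(id);
            Detection result = (d == null) ? Detection.INDETERMINATE : d.describe(contents);
            if (result == Detection.VALID) return id;
            if (result == Detection.INDETERMINATE && fallback == null) fallback = id;
        }
        return fallback; // null if every candidate rejected the contents
    }
}
```

This is what makes the *.xml case from the plan item tractable: two files both named *.xml can end up with different content types, because only the describer that recognizes the contents answers VALID.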
IContentTypeManager allows clients
Content-based content type detection and content description rely on specialized content detectors associated with content types. When a content type is contributed to the platform, a content describer class may be provided. Content describers are able to detect whether a given stream of bytes conforms to the content type's file format, and may also be able to extract important properties from the contents, such as the charset used to encode them (for text files) and any content-type-specific information that may be required.
The main method in
int describe(InputStream contents, IContentDescription description, QualifiedName options) throws IOException;
The first thing implementations of this method must do is check whether the contents represent a valid sample of their content type's file format. If not (or if this cannot be determined), the method should return immediately, reporting the contents as either invalid or indeterminate, depending on how strict the file format is. Otherwise, the method should return IContentDescriber.VALID, but only after trying to provide all required information (according to the specified options, if any) by reading the contents and filling in the content description.
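The contract above can be illustrated with a small XML describer. This is a sketch under several assumptions: XmlDescriberSketch and its constants are hypothetical, a plain Map stands in for IContentDescription, the "charset" key is our own choice, and the options parameter is left out for brevity. It checks the sample first, returns immediately when the contents are not clearly XML, and only fills in the description before answering VALID.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical describer following the describe() contract; not the Eclipse API.
class XmlDescriberSketch {
    static final int INVALID = 0;
    static final int INDETERMINATE = 1;
    static final int VALID = 2;

    private static final int PREFIX_LENGTH = 256;
    private static final Pattern ENCODING =
            Pattern.compile("encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    int describe(InputStream contents, Map<String, String> description) throws IOException {
        // Read only a small prefix; the declaration, if any, is at the very start.
        byte[] buffer = new byte[PREFIX_LENGTH];
        int length = contents.readNBytes(buffer, 0, buffer.length);
        // ASCII decoding is enough to inspect the declaration (byte order marks are not handled).
        String head = new String(buffer, 0, length, StandardCharsets.US_ASCII);

        if (!head.startsWith("<?xml")) {
            // The XML declaration is optional, so markup without it cannot be ruled out;
            // anything that does not even start with '<' is rejected.
            return head.trim().startsWith("<") ? INDETERMINATE : INVALID;
        }

        // Valid sample: fill in what we can before answering VALID.
        if (description != null) {
            int end = head.indexOf("?>");
            String declaration = end >= 0 ? head.substring(0, end) : head;
            Matcher m = ENCODING.matcher(declaration);
            description.put("charset", m.find() ? m.group(1) : "UTF-8"); // XML default
        }
        return VALID;
    }
}
```

A stricter format with a mandatory signature would return INVALID in both early-exit branches; the choice between the two early results is exactly the "how strict the file format is" judgment mentioned above.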
Note: for this mechanism to work well, it is essential that the platform can execute content describers without activating the plug-ins that provide them. In the Eclipse 3.0 runtime, plug-ins that have built-in bundle manifests will be able to selectively enable/disable auto-activation on a per-package basis (for more information, see bug 52393). Although the platform will not enforce this, content describers must be self-contained and must not trigger auto-activation.
Content descriptions are obtained by calling
method. A content description contains useful information (such as encoding) about an arbitrary stream of bytes. This information is filled in partly by the platform and partly by the selected content describer (if any).
Content types are managed by the platform, but plug-ins are in charge of contributing them. While this provides good flexibility, it also opens opportunities for conflicts. There are a few scenarios where conflicts may arise:
Generally by using the same content type registry and sharing the same concept of content/file type. Other examples are:
To allow important properties to be inherited by new specialized content types:
The content type (and consequently any descendants) will be deemed invalid and ignored.
Not so far. The main reason is that not every file format with a content type in Eclipse will have a MIME type, so we could not use MIME types as the main association mechanism between content types and applications. We considered keeping MIME types as an optional property of each content type, and providing a method findContentTypesByMIMEType (or something like that), but decided to remove it since there was no sound use case for it (and the idea in the initial proposal was to keep only the essential stuff to avoid distractions).
By associating additional file specs to existing content types.
Not so far. It is up to the plugin provider to determine whether a content type describer will be provided.
One of the content types will be chosen as the preferred one (the others will still be associated with the file). Priorities are taken into account; among otherwise equal candidates, the choice is arbitrary.
As seen above, the only way this can happen is when two different file specs (for instance, a file name and a file extension) accept the same file name (for instance, one content type is associated with the "xml" file extension, and another with the "plugin.xml" file name). File name specs have priority over file extension specs (so plugin.xml is a plug-in manifest before being an XML document). Normally, the content type that defines a file name spec is based on the content type that defines a file extension spec (a plug-in manifest is a kind of XML document). This ensures that actions applicable to general XML documents will also be applicable to a plug-in manifest.
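The preference rules above can be sketched as a stable sort over the name-based candidates. The ordering of criteria here is our own assumption for illustration: file name matches come before file extension matches, higher priority comes next, and remaining ties keep catalog order (standing in for the "arbitrary" choice). CandidateRanking and its numeric priority scale are hypothetical, not how the Eclipse registry is implemented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical ranking of content types that match the same file name.
class CandidateRanking {
    // Declaration order is preference order: file name specs beat extension specs.
    enum SpecKind { FILE_NAME, FILE_EXTENSION }

    static final class Candidate {
        final String id;
        final SpecKind matchedBy;
        final int priority; // higher wins; the scale is hypothetical

        Candidate(String id, SpecKind matchedBy, int priority) {
            this.id = id;
            this.matchedBy = matchedBy;
            this.priority = priority;
        }
    }

    // Returns candidate ids from most to least preferred. List.sort is stable,
    // so candidates that tie on both criteria keep their original order.
    static List<String> rank(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparing((Candidate c) -> c.matchedBy)
                .thenComparing(Comparator.comparingInt((Candidate c) -> c.priority).reversed()));
        List<String> ids = new ArrayList<>();
        for (Candidate c : sorted) ids.add(c.id);
        return ids;
    }
}
```

For plugin.xml, a plug-in manifest type matched by file name therefore ranks ahead of any type matched only by the xml extension, whatever their priorities. Because the manifest type is normally based on the XML type, choosing it as the preferred type does not hide XML-wide actions.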
It is a mechanism to prevent conflicts. When multiple plug-ins contribute content types associated with the same file specs, we have a conflict. Conflicts are bad because they introduce ambiguity (which is the right content type?). Most of the time, such conflicts arise when independently developed plug-ins try to contribute the same content type (semantically speaking). Imagine a plug-in com.examples.foo that wants to be associated with the Java Source content type (org.eclipse.jdt.core.javaSource) provided by org.eclipse.jdt.core, but that does not require org.eclipse.jdt.core to be present to work. Such a plug-in can contribute its own Java Source content type (com.examples.foo.java) and mark it as an alias for the content type provided by JDT/Core. If JDT/Core is present, com.examples.foo.java will be omitted from the content type registry, and all references to it will be automatically interpreted as references to org.eclipse.jdt.core.javaSource instead.
Sometimes a plug-in A does not depend on plug-in B, but declares a content type that is intended to be a specialization of a content type declared by B. To prevent the content type declared by A from being disabled:
If the originally intended base type is available, your base type will be treated as just an alias, and your specialized content type will be properly attached to the intended content type. Otherwise, the placeholder will be enabled, and although things might not work as well as intended (actions associated with the original content type will not be available), your content type will still be enabled.
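Alias resolution and the placeholder-base technique described above can be sketched together. AliasResolver is a hypothetical class, not the Eclipse registry; it handles only one level of aliasing, and the sample base type "org.eclipse.core.runtime.text" for the placeholder, as well as the id com.examples.foo.special for the specialized type, are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of alias resolution; single-level aliases only.
class AliasResolver {
    // Declared content type id -> declared base type id (null for a root type).
    private final Map<String, String> baseOf = new HashMap<>();
    // Alias id -> id of the content type it stands in for.
    private final Map<String, String> aliasFor = new HashMap<>();

    void declare(String id, String baseId, String aliasTarget) {
        baseOf.put(id, baseId);
        if (aliasTarget != null) aliasFor.put(id, aliasTarget);
    }

    // An alias is replaced by its target only if the target is installed;
    // otherwise the alias stays an ordinary, enabled content type.
    String resolve(String id) {
        String target = aliasFor.get(id);
        return (target != null && baseOf.containsKey(target)) ? target : id;
    }

    // Base type of 'id' after alias resolution (null for a root type).
    String effectiveBase(String id) {
        String base = baseOf.get(id);
        return base == null ? null : resolve(base);
    }
}
```

With JDT/Core installed, the placeholder com.examples.foo.java resolves to org.eclipse.jdt.core.javaSource, and the specialized type declared on the placeholder is attached to the JDT type. Without JDT/Core, both the placeholder and the specialized type stay enabled, attached to each other.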
New content types should be created only if there is no existing content type with the semantics required. Otherwise, when only additional file specs must be provided, file associations are the way to go.
Only if none is specified in the sub type.
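Read together with the earlier point that important properties should be inherited by specialized content types, the rule "only if none is specified in the subtype" amounts to walking up the base-type chain until a value is found. The sketch below uses a default charset as the inherited property; that choice, the class InheritingType, and the sample type ids are our own illustrative assumptions.

```java
// Hypothetical sketch of property inheritance along the base-type chain.
class InheritingType {
    final String id;
    final InheritingType baseType;   // null for a root type
    final String defaultCharset;     // null when this type does not specify one

    InheritingType(String id, InheritingType baseType, String defaultCharset) {
        this.id = id;
        this.baseType = baseType;
        this.defaultCharset = defaultCharset;
    }

    // A subtype's own value wins; otherwise the nearest ancestor's value is used.
    String getDefaultCharset() {
        for (InheritingType t = this; t != null; t = t.baseType) {
            if (t.defaultCharset != null) return t.defaultCharset;
        }
        return null; // nobody in the chain specifies one
    }
}
```

A plug-in manifest type that declares no charset would thus report the charset of its XML base type, while a subtype that declares its own value keeps it.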
The proposed approach is to check whether the file's content type is a kind of the "org.eclipse.core.runtime.text" content type, which is intended to be the ancestor of all text-oriented content types. If this turns out to be a very frequent idiom, we might consider providing a convenience API for it.
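The proposed check can be sketched as a helper that walks from a content type up through its base types looking for "org.eclipse.core.runtime.text". TextContentTypes is a hypothetical helper, not the convenience API mentioned above; it models the registry as a plain map from content type id to base type id and assumes the base-type chain is acyclic. The sample ids in the usage note are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: registry modeled as content type id -> base type id.
class TextContentTypes {
    static final String TEXT = "org.eclipse.core.runtime.text";

    private final Map<String, String> baseOf = new HashMap<>();

    void declare(String id, String baseId) {
        baseOf.put(id, baseId);
    }

    // True if 'id' is the text content type or one of its descendants.
    // Assumes the base-type chain contains no cycles.
    boolean isText(String id) {
        for (String t = id; t != null; t = baseOf.get(t)) {
            if (TEXT.equals(t)) return true;
        }
        return false;
    }
}
```

If org.eclipse.jdt.core.javaSource is declared with the text type as its base, then Java sources and any type derived from them answer true, while an unrelated binary type (or an unknown id) answers false.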
No, although if the file format has an identifiable signature, providing a content describer is recommended, because it improves the overall quality of content-based content type lookups.
Note: comments are encouraged. Any questions/concerns not addressed here should be discussed in the platform-core-dev list, or bug 37668.
The solution described above was implemented and was relatively successful. Some components took advantage of the new content type infrastructure, but in many cases file association is still done in an ad hoc manner. Also, no UI was provided for customizing content types (such as changing the default encoding or adding file associations), so the user has no control over how the content type detection mechanism works. Thus, the main issues to be addressed in the 3.1 cycle are:
Ensure the content type support works for the SDK plug-ins and for products built on top of Eclipse (see bug 78654).
Platform/UI - file/editor association
Content type-editor association is definitely the most important use case for the content type support. The basic idea is that for a given file or stream of data, the UI should be able to:
Items 1, 2 and 4 are currently supported by the existing file/editor association mechanism. Item 3 is being requested by users and, like item 4, is orthogonal to the content type support provided by the runtime.
Content types add a level of indirection between files and editors. At first glance, there is no reason why changing the default editor should affect which content type is assigned to a file, so users should be able to pick any editor without affecting content type detection.
Platform/Team - binary vs ascii files
The Team plug-in keeps a catalog of file extensions and their expected content type (either binary or ASCII). If content types were broadly adopted throughout the rest of Eclipse (so that most files dealt with by users have a content type), couldn't the Team plug-in use content type based encoding determination to figure out a good default for this setting? (see bug 85490)
Ensure users have a means to customize how content type detection works for them. Provide UI for content types. This UI may also show related objects for a given content type (editors, views, comparators, etc.). Users cannot provide content type detection code, so user-defined content types would be useful only where content type detection is based on file names (such as plain text files like source files, configuration files, etc.).
A simple plug-in that allows users to add/remove file associations is available on the downloads page.
Ensure content type detection works (or can be made to work) appropriately when incompatible products are deployed together.
See also corresponding PR 78654 - content type should be used universally