Language Support in Eclipse WTP/SSE

Position Paper for Eclipse Language Workshop
October, 2005

David Williams
(and team: Nitin Dahyabhai, Phil Avery, Amy Wu)

In Eclipse WTP we provide editing-related support for XML, DTD, HTML, XHTML, CSS, JavaScript, and JSPs. I'd like to relate some of our experiences and plans that might be useful to the larger "language community". And -- shameless plug -- if you like what you see, maybe you'll be convinced to make some contributions! :)

ContentType based

Things like XML and JSP are good use cases for the importance of content type based function, since obvious "file extensions" are not always used.
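
To make the point concrete, here is a minimal sketch in Java of deciding a type from the content first and the file name second. It is not the Eclipse content type API (org.eclipse.core.runtime.content); the class ContentSniffer, the method guess, and the returned labels are hypothetical and chosen only for illustration.

```java
import java.util.Locale;

final class ContentSniffer {
    /** Guess a (hypothetical) content type label: look at the text first, the file name second. */
    static String guess(String fileName, String text) {
        String head = text.trim();
        // Content-based checks: these work even when the extension says nothing useful.
        if (head.startsWith("<?xml")) return "xml";
        if (head.startsWith("<%@") || head.startsWith("<jsp:")) return "jsp";

        // Fallback: the file extension, when there is an obvious one.
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".xml")) return "xml";
        if (lower.endsWith(".jsp")) return "jsp";
        if (lower.endsWith(".html") || lower.endsWith(".htm")) return "html";
        return "plain";
    }

    public static void main(String[] args) {
        // A file with an unhelpful extension is still recognized by its content.
        System.out.println(guess("web.config", "<?xml version=\"1.0\"?><settings/>"));
        System.out.println(guess("header.inc", "<%@ page language=\"java\" %>"));
    }
}
```

A real describer would of course look at more than the first few characters; the sketch only shows why the extension alone is not enough.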

Mixed Languages

For us, all content is "mixed language" to some extent; even simple HTML can contain JavaScript or CSS.

JSPs can contain not only text/html, but also text/xml, text/css, etc.

Editors should be extensible, by extension point

SourceViewerConfiguration

The structured text viewer configuration customizes various aspects of the Structured text editor. It defines syntax highlighting, content assist, and more. Clients must subclass org.eclipse.wst.sse.ui.StructuredTextViewerConfiguration to provide a custom viewer configuration for their content type. Clients contributing a new configuration for a new content type can utilize an existing configuration in one of two ways. They can subclass the existing configuration to build on top of it, or they can create a new instance of the existing configuration inside their own configuration and call the methods within it. The StructuredTextViewerConfigurationJSP, for example, creates new instances of the StructuredTextViewerConfigurationHTML, StructuredTextViewerConfigurationXML, and JavaSourceViewerConfiguration, and retrieves their processors to be reused in JSP content.
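
The second option (creating instances of existing configurations and reusing their processors) can be sketched roughly as follows. This is a self-contained illustration, not the actual SSE classes: Config, Processor, the three configuration classes, and the partition type names are all hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

final class ViewerConfigSketch {
    /** Hypothetical stand-in for a per-partition processor (for example, content assist). */
    interface Processor {
        String propose(String prefix);
    }

    /** Hypothetical stand-in for a viewer configuration: maps partition types to processors. */
    static class Config {
        protected final Map<String, Processor> processors = new HashMap<>();

        Processor getProcessor(String partitionType) {
            return processors.get(partitionType);
        }
    }

    static class HtmlConfig extends Config {
        HtmlConfig() {
            processors.put("html", prefix -> "<" + prefix + ">");
        }
    }

    static class JavaConfig extends Config {
        JavaConfig() {
            processors.put("java", prefix -> prefix + "()");
        }
    }

    /** Composition: instantiate existing configurations and reuse their processors. */
    static class JspConfig extends Config {
        JspConfig() {
            processors.put("html", new HtmlConfig().getProcessor("html"));
            processors.put("java", new JavaConfig().getProcessor("java"));
            // Only the JSP-specific partition needs a processor of its own.
            processors.put("jsp-directive", prefix -> "<%@ " + prefix + " %>");
        }
    }

    public static void main(String[] args) {
        Config jsp = new JspConfig();
        System.out.println(jsp.getProcessor("java").propose("toString"));
        System.out.println(jsp.getProcessor("jsp-directive").propose("page"));
    }
}
```

Composition keeps the JSP configuration free to mix processors from several unrelated configurations, which single inheritance alone would not allow.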

Discussion Point: Is there need to separate out highlighting, content assist, etc?

OutlineConfiguration

The outline view configuration customizes the editor's outline view. It defines how the current editor's input maps to elements in a Tree control, as well as selection filtering, toolbar/menu contributions, etc. Clients must subclass org.eclipse.wst.sse.ui.views.contentoutline.ContentOutlineConfiguration to provide a custom outline view for their content type.

PropertySheet Configuration

The properties view configuration customizes the editor's properties view. It defines how the current editor's input maps to a list of properties in a Table control, as well as toolbar/menu contributions, etc. Clients must subclass org.eclipse.wst.sse.ui.views.properties.PropertySheetConfiguration to provide a custom properties view for their content type.

Source Page Validation

Source page validation allows participants to provide an org.eclipse.wst.validation.core.IValidator for source validation (as-you-type) via the org.eclipse.wst.sse.ui.extensions.sourcevalidation extension point. The enablement ("what" the validators can operate on) of the source validators can be specified by content type ( org.eclipse.core.runtime.content.IContentType ) and partition types ( org.eclipse.jface.text.ITypedRegion.getType() ) within each content type. [This is likely the same org.eclipse.wst.validation.core.IValidator used for "batch" validation via the org.eclipse.wst.validation.validator extension.] The validation messages are displayed to the user on the source page as "temporary" annotations. These show up in the text as "squiggles" beneath the text and in the overview ruler as rectangles. The validation message itself is displayed by hovering over the squiggle or rectangle.
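
The enablement idea (a validator applies only to certain partition types within a certain content type) can be sketched as a simple lookup. The class ValidatorEnablement, its Entry type, and all ids below are hypothetical; the real enablement is declared in plugin.xml through the extension point rather than registered in code like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ValidatorEnablement {
    /** Hypothetical description of one contributed source validator and what it may operate on. */
    static final class Entry {
        final String validatorId;
        final String contentTypeId;
        final Set<String> partitionTypes;

        Entry(String validatorId, String contentTypeId, String... partitionTypes) {
            this.validatorId = validatorId;
            this.contentTypeId = contentTypeId;
            this.partitionTypes = new HashSet<>(Arrays.asList(partitionTypes));
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void register(Entry entry) {
        entries.add(entry);
    }

    /** Ids of the validators enabled for this content type and partition type, in registration order. */
    List<String> enabledFor(String contentTypeId, String partitionType) {
        List<String> ids = new ArrayList<>();
        for (Entry e : entries) {
            if (e.contentTypeId.equals(contentTypeId) && e.partitionTypes.contains(partitionType)) {
                ids.add(e.validatorId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        ValidatorEnablement registry = new ValidatorEnablement();
        registry.register(new Entry("htmlSyntax", "example.html", "html.default"));
        registry.register(new Entry("scriptCheck", "example.html", "html.script"));
        System.out.println(registry.enabledFor("example.html", "html.script"));
    }
}
```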

Target ID Resolution

Clients can target their editor configuration to a specific content type, a specific editor, or both. In the event there are conflicts as to which configuration should be used, below is the resolution policy.

configuration with matching editor id
configuration with matching editor id + ".source" in multipage editors
configuration with matching content type id
configuration for content type's base content type
default Structured Text Editor configuration

If more than one configuration is defined for an editor type or content type, the first one defined is the first one served.

Discussion Point: How much and when is there a need for "user choice" in resolution?
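
A minimal sketch of this policy in Java follows. It assumes the list above is in order of precedence, and it walks the whole chain of base content types rather than a single level; both are assumptions made for the sketch. ConfigResolver and all ids are hypothetical and not part of SSE.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ConfigResolver {
    // Target id (editor id or content type id) -> configurations, in the order they were defined.
    private final Map<String, List<String>> configsByTarget = new HashMap<>();
    // Content type id -> its base content type id.
    private final Map<String, String> baseTypeOf = new HashMap<>();
    private final String defaultConfig;

    ConfigResolver(String defaultConfig) {
        this.defaultConfig = defaultConfig;
    }

    void define(String targetId, String configName) {
        configsByTarget.computeIfAbsent(targetId, k -> new ArrayList<>()).add(configName);
    }

    void setBaseType(String contentTypeId, String baseContentTypeId) {
        baseTypeOf.put(contentTypeId, baseContentTypeId);
    }

    String resolve(String editorId, String contentTypeId) {
        List<String> targets = new ArrayList<>();
        targets.add(editorId);                 // 1. matching editor id
        targets.add(editorId + ".source");     // 2. editor id + ".source" (multipage editors)
        // 3 and 4. content type, then its base types (assumption: whole chain; stops on a cycle).
        for (String t = contentTypeId; t != null && !targets.contains(t); t = baseTypeOf.get(t)) {
            targets.add(t);
        }
        for (String target : targets) {
            List<String> configs = configsByTarget.get(target);
            if (configs != null) {
                return configs.get(0);         // first one defined is the first one served
            }
        }
        return defaultConfig;                  // 5. default configuration
    }

    public static void main(String[] args) {
        ConfigResolver resolver = new ConfigResolver("default");
        resolver.setBaseType("example.svg", "example.xml");
        resolver.define("example.xml", "xmlConfig");
        System.out.println(resolver.resolve("example.editor", "example.svg"));
    }
}
```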

SSE Model/Document Concepts

Structured Documents

One central concept is the IStructuredDocument. It is intended to be similar to an IDocument, but with a few significant differences.

It has an associated parser and re-parser, which in our experience is how most programmers think of text models.
It also does not depend on a "reconciler thread" to reconcile the model. Instead, it notifies the model immediately of changes. Our philosophy is that it is the model itself that should decide if there would be a delay or a separate thread required to reconcile itself. For small and fast models, there is no need to do this on a separate thread and the user's experience will appear much more responsive. For other cases a longer time may be required, but when handled correctly, the model can still give immediate feedback to the user that a longer operation is in progress.

The IStructuredDocument is conceptually a stream of characters made up of (divided into) IStructuredDocumentRegions. The main constraint on what types of languages are appropriate for structured documents and structured regions is whether or not the language has the concept of a syntactically determined end. This is used to know how to correctly handle reparsing (deciding what is a "safe start" and "safe end" for the reparse operation).

Parsers and re-parsers

Parsers and reparsers can (will) be associated to a StructuredDocument via extension point. The parser/reparser pair has a few conceptual requirements. They must be able to handle any text (legal or not) and must return regions that completely cover the input text (for example, whitespace can not simply be ignored). Another constraint, more difficult to implement, is that for any subset of text, the reparser must give the same results that the parser would if the whole document were re-parsed. (Granted, a legal fall-back is just to re-parse the whole document, but that can be "expensive".)

It's important to handle ill-formed text. Heuristics are often used to make these decisions, so sometimes we can and do "guess right" (based on what a user might be in the process of typing) and sometimes not -- in other words, it's not always as easily predictable as it is for "valid" text, but is based on doing a reasonable job which would not invalidate subsequent reparsing.
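
These requirements can be illustrated with a toy example. The sketch below is not the SSE parser: ToyReparser, its two region types (TAG and TEXT), and its rules are invented for illustration. It accepts any input, returns regions that cover the text completely, and its reparse keeps only the regions that end strictly before the edit, then rescans from that "safe start". As a simplification it rescans to the end of the document instead of looking for a "safe end".

```java
import java.util.ArrayList;
import java.util.List;

final class ToyReparser {
    /** A typed span [start, end) of the document text. */
    static final class Region {
        final String type;
        final int start;
        final int end;

        Region(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + start + "," + end + ")";
        }
    }

    /** Parse from 'from' to the end. Never fails: ill-formed input still yields regions. */
    static List<Region> parse(String text, int from) {
        List<Region> regions = new ArrayList<>();
        int i = from;
        while (i < text.length()) {
            int end;
            String type;
            if (text.charAt(i) == '<') {
                int close = text.indexOf('>', i);
                end = (close < 0) ? text.length() : close + 1; // unclosed tag runs to end of input
                type = "TAG";
            } else {
                int open = text.indexOf('<', i);
                end = (open < 0) ? text.length() : open;
                type = "TEXT";
            }
            regions.add(new Region(type, i, end));
            i = end;
        }
        return regions;
    }

    /** Reparse after an edit at 'offset': keep regions ending strictly before it, rescan the rest. */
    static List<Region> reparse(List<Region> oldRegions, String newText, int offset) {
        List<Region> result = new ArrayList<>();
        int safeStart = 0;
        for (Region r : oldRegions) {
            if (r.end >= offset) {
                break; // this region touches the edit, so it may change
            }
            result.add(r);
            safeStart = r.end;
        }
        result.addAll(parse(newText, safeStart));
        return result;
    }

    /** True if the regions are contiguous, non-empty, and cover the whole text. */
    static boolean covers(List<Region> regions, String text) {
        int pos = 0;
        for (Region r : regions) {
            if (r.start != pos || r.end <= r.start) {
                return false;
            }
            pos = r.end;
        }
        return pos == text.length();
    }

    public static void main(String[] args) {
        List<Region> old = parse("<a>hello<b>world", 0);
        String edited = "<a>hellob>world"; // the '<' at offset 8 was deleted
        System.out.println(reparse(old, edited, 8));
        System.out.println(parse(edited, 0));
    }
}
```

Note that the text region ending exactly at the edit offset is not kept: deleting the "<" that terminated it makes it grow, which is why the safe start has to lie before any region that touches the edit.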

Structured Document Events

The StructuredDocumentEvents are similar to IDocument's DocumentEvents, but provide much more information about the nature of the change in terms of IStructuredDocumentRegions (and the ITextRegions of which they are composed). Listeners, such as DOM parsers, can make use of this to minimize the amount of reparsing/rebuilding they have to do.

Content Models

These are abstract "descriptions" of what an instance document could or should look like. We use these for content assist, validation, and in some cases to actually parse a document to its "model" (or tree structure). We compute content models for arbitrary schemas, DTDs, and taglibs, and have a "hard coded" one for HTML 4.01.

Structured Models

Structured models are mostly interesting due to their extended types and implementers. They exist as an abstraction to provide a consistent way to manage shared models and also to access the underlying structured document.

Node Notifiers

In addition to IAdaptable, many "Node" structures in SSE related models make use of a finer level of adaptation which includes notification. This mechanism can be used to have improved UI updates or can be used to keep related models "in synch" (for example, a DOM model change can cause a change in an EMF model, and vice versa).
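
As a rough sketch of "adaptation which includes notification", the example below attaches an adapter to a node; the adapter can be looked up by type and is told about every change, which it uses to keep a second model in synch. The names Node, NodeAdapter, and MirrorAdapter are hypothetical simplifications and do not reproduce the SSE interfaces.

```java
import java.util.ArrayList;
import java.util.List;

final class NotifierSketch {
    /** Hypothetical adapter: can be looked up by type and is notified of node changes. */
    interface NodeAdapter {
        boolean isAdapterForType(Object type);

        void notifyChanged(Node node, Object oldValue, Object newValue);
    }

    /** Hypothetical node that both hands out adapters and notifies them. */
    static final class Node {
        private final List<NodeAdapter> adapters = new ArrayList<>();
        private String value = "";

        void addAdapter(NodeAdapter adapter) {
            adapters.add(adapter);
        }

        NodeAdapter getAdapterFor(Object type) {
            for (NodeAdapter a : adapters) {
                if (a.isAdapterForType(type)) {
                    return a;
                }
            }
            return null;
        }

        String getValue() {
            return value;
        }

        void setValue(String newValue) {
            String oldValue = value;
            value = newValue;
            // Iterate over a copy so adapters may add or remove adapters while being notified.
            for (NodeAdapter a : new ArrayList<>(adapters)) {
                a.notifyChanged(this, oldValue, newValue);
            }
        }
    }

    /** Keeps a second, independent "model" (here just a field) in synch with the node. */
    static final class MirrorAdapter implements NodeAdapter {
        String mirroredValue = "";

        @Override
        public boolean isAdapterForType(Object type) {
            return type == MirrorAdapter.class;
        }

        @Override
        public void notifyChanged(Node node, Object oldValue, Object newValue) {
            mirroredValue = String.valueOf(newValue);
        }
    }

    public static void main(String[] args) {
        Node node = new Node();
        MirrorAdapter mirror = new MirrorAdapter();
        node.addAdapter(mirror);
        node.setValue("title");
        System.out.println(mirror.mirroredValue);
    }
}
```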

Model Management

Another important contribution of sse.core is the IModelManager. Its purpose is to provide a StructuredModel, appropriate to contentType, that can be shared between many clients at runtime. This increases efficiency since each client doesn't have to (re)create its own, but just as importantly, it is an easy way for clients to all stay "in synch" -- a change in the model made by one client will be known by all other clients. The other motivation for this is that it allows looser coupling, since clients who may not even know about each other can share the model without passing it around to each other.
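
The sharing idea can be sketched with a small reference-counting manager. This is only an illustration under simplifying assumptions: SharedModelManager, its Model class, and the methods getModel, release, and isLoaded are hypothetical, models are keyed by a plain string id, and the choice of model by content type is left out.

```java
import java.util.HashMap;
import java.util.Map;

final class SharedModelManager {
    /** Hypothetical shared model: one instance per id, however many clients ask for it. */
    static final class Model {
        final String id;
        String text = "";
        int refCount;

        Model(String id) {
            this.id = id;
        }
    }

    private final Map<String, Model> models = new HashMap<>();

    /** Returns the shared model for this id, creating it on first request. */
    synchronized Model getModel(String id) {
        Model model = models.computeIfAbsent(id, Model::new);
        model.refCount++;
        return model;
    }

    /** Each client releases the model when done; the last release discards it. */
    synchronized void release(Model model) {
        if (--model.refCount == 0) {
            models.remove(model.id);
        }
    }

    synchronized boolean isLoaded(String id) {
        return models.containsKey(id);
    }

    public static void main(String[] args) {
        SharedModelManager manager = new SharedModelManager();
        Model editorView = manager.getModel("/site/index.html");
        Model outlineView = manager.getModel("/site/index.html");
        editorView.text = "<html/>";
        // The second client sees the change without the two clients knowing about each other.
        System.out.println(outlineView.text);
        manager.release(editorView);
        manager.release(outlineView);
    }
}
```

The two clients in main never reference each other; the manager is the only thing they have in common, which is the looser coupling described above.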