A Seed for an Eclipse Language Toolkit

Martin Aeschlimann, IBM Research Zurich

Position paper for the Eclipse Language Symposium in Ottawa (October 2005)

My background

I have been a member of the Eclipse JDT UI team since its beginning. My responsibilities include the quick fix feature and the underlying code rewriting infrastructure. I work closely with the JDT model and DOM-AST and have helped evolve these APIs. I am also the author of a three-week C plug-in prototype that later became the seed for the current CDT plug-in.


The approach taken when implementing this CDT prototype was to follow the concepts and structure of the JDT plug-in. Eclipse provides comprehensive support for integrating with the platform (e.g. building, launching, compare and search) and for reusing or extending existing components such as editors with outline views. Still, some parts had to be copied, most notably the language model infrastructure.

Several other language plug-in implementations took the same route and duplicated the same code again. This led to the desire that Eclipse should provide a common infrastructure for language models to avoid code duplication.

API is always costly to create and especially to maintain. It introduces new dependencies that can hinder innovation, as you get locked into solutions. JDT has therefore been very conservative about investing work in this area. However, more infrastructure has significant benefits: developers save time, users get language plug-ins faster and perhaps with a more consistent design, and the Eclipse platform gains by providing new infrastructure.

To summarize, the goals are:

  • Distill the JDT / Platform proven concepts so they can also be used for new language plug-ins.

  • Take a fresh look at the problem and make it as simple as possible. Start fresh, without the goal of bringing existing plug-ins into the framework. Gain more experience with new language plug-ins in order to create new infrastructure and to improve existing plug-ins in this direction.

  • Provide a seed that is meant to be grown by the community.


In my proposal for a seed, I would like to look at the problem from a compiler perspective rather than from an editor or explorer perspective. A compiler's inputs are tokens, and it uses parsers to build syntax trees. Most of the tooling for a language plug-in (e.g. refactoring, quick fix) works directly on ASTs. The proposal therefore 'structures' the language model along these lines and uses tokens, AST nodes, token streams and parsers as the foundation for a language-independent toolkit.

The interesting question is: what kind of concepts are needed as a foundation to implement as much concrete functionality as possible?
AST nodes and tokens alone will not be enough. For example, it would not be possible to show an outline (of declarations) in a generic way. The solution is to attach additional information, such as structural information, to AST nodes. This additional information can be annotations on the node itself, or it can be provided by visitors that calculate it.

This is a sketch of the abstractions; concrete type names are used for illustration only:

  • ILanguageModel: access point to the language's infrastructure
    • getTokenStream(ISourceElement) : ITokenStream
    • createAST(ITokenStream) : IAST
  • ITokenStream
    • getNextToken(): ITokenNode
  • IAST
    • getASTRoot(): IASTNode
    • findNode(position)
The types ITokenNode and IASTNode could be unified under a common type INode.
  • ITokenNode: abstract base class for tokens
  • IASTNode: abstract base class for AST nodes: has a parent (an AST node) and children (nodes: AST nodes and tokens), provides functionality to traverse its children and to access its node type
  • INodeType: meta information describing token types and AST node types
    • which children a node of this type has: which token types, which node types, and whether they are mandatory
    • which nodes are declarations
Languages often have the concept of declarations and references to these declarations. A resolver connects references to declarations.
  • IResolver
    • Given an AST node, find the corresponding declaration (format to be discussed)
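To illustrate why node type meta information matters, here is a minimal Java sketch of a language-neutral outline built only from nodes and their types. It is not part of the proposal: NodeType, Node, TokenNode, ASTNode and GenericOutline are hypothetical, simplified stand-ins for INodeType, INode, ITokenNode and IASTNode, and the 'text' label on nodes is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Meta information for a token type or AST node type (hypothetical shape). */
final class NodeType {
    final String name;
    final boolean declaration; // answers "which nodes are declarations"
    NodeType(String name, boolean declaration) {
        this.name = name;
        this.declaration = declaration;
    }
}

/** Common supertype unifying tokens and AST nodes (the INode idea). */
abstract class Node {
    final NodeType type;
    final String text; // source text for tokens, a label for AST nodes (assumption)
    ASTNode parent;
    Node(NodeType type, String text) {
        this.type = type;
        this.text = text;
    }
}

final class TokenNode extends Node {
    TokenNode(NodeType type, String text) { super(type, text); }
}

final class ASTNode extends Node {
    final List<Node> children = new ArrayList<>();
    ASTNode(NodeType type, String text) { super(type, text); }
    /** Appends a child (AST node or token) and sets its parent link. */
    ASTNode add(Node child) {
        child.parent = this;
        children.add(child);
        return this;
    }
}

final class GenericOutline {
    /** Labels of all declaration nodes in document order, indented by declaration nesting depth. */
    static List<String> of(ASTNode root) {
        List<String> out = new ArrayList<>();
        collect(root, 0, out);
        return out;
    }

    private static void collect(Node node, int depth, List<String> out) {
        if (node.type.declaration) {
            out.add("  ".repeat(depth) + node.text);
            depth++;
        }
        if (node instanceof ASTNode) {
            for (Node child : ((ASTNode) node).children) {
                collect(child, depth, out);
            }
        }
    }
}
```

The outline code never inspects a concrete language; it relies only on the declaration flag carried by the node type, which is the kind of structural information the meta model would have to supply.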
The following list describes what language-neutral infrastructure can be built using these abstractions:
  • Handle based elements: ASTs are usually expensive to build and to keep in memory. There is a need for a more lightweight element that can be used e.g. by viewers. These elements are created from the AST, but don't hold a reference to it. They can rebuild the underlying AST whenever required.
The handles usually cover only a subset of the existing AST nodes, typically all declarations.
  • NodeHandle
    • Inexpensive representation of an AST node that can be shown in a viewer. Does not hold on to the full AST, but can rebuild the AST if necessary
    • The element a handle refers to might not exist anymore (or not yet)
    • can be stored/restored using mementos
  • Search index: Indexing declarations and references
  • AST flattening and code formatting using the AST node type properties
  • AST rewriting infrastructure using AST flattening and code formatting
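The handle idea can be sketched in Java as follows. This is only an illustration under stated assumptions: the Decl class stands in for a declaration node of a freshly built AST, the memento format ("source#name/name") is invented for the example, and the AST builder is passed in as a function rather than obtained from a language model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Stand-in for a declaration node in a freshly built AST (hypothetical). */
final class Decl {
    final String name;
    final List<Decl> members = new ArrayList<>();
    Decl(String name, Decl... members) {
        this.name = name;
        this.members.addAll(List.of(members));
    }
}

/** Lightweight handle: stores only how to find the node again, never the AST itself. */
final class NodeHandle {
    private final String source;     // identifies the source element, e.g. a file path
    private final List<String> path; // declaration names from the root downwards

    NodeHandle(String source, List<String> path) {
        this.source = source;
        this.path = List.copyOf(path);
    }

    /** Serializes the handle; assumes the source contains no '#' and names contain no '/'. */
    String toMemento() {
        return source + "#" + String.join("/", path);
    }

    static NodeHandle fromMemento(String memento) {
        int sep = memento.indexOf('#');
        if (sep < 0) {
            throw new IllegalArgumentException("not a handle memento: " + memento);
        }
        String rest = memento.substring(sep + 1);
        List<String> path = rest.isEmpty() ? List.of() : List.of(rest.split("/"));
        return new NodeHandle(memento.substring(0, sep), path);
    }

    /** Rebuilds the AST via astBuilder and walks the path; empty if the element does not exist. */
    Optional<Decl> resolve(Function<String, Decl> astBuilder) {
        Decl current = astBuilder.apply(source);
        for (String name : path) {
            if (current == null) {
                return Optional.empty();
            }
            Decl next = null;
            for (Decl member : current.members) {
                if (member.name.equals(name)) {
                    next = member;
                    break;
                }
            }
            current = next;
        }
        return Optional.ofNullable(current);
    }
}
```

A viewer could keep thousands of such handles cheaply and call resolve only for the element the user actually opens. A handle restored from a memento may resolve to nothing, which models the "might not exist anymore (or not yet)" case.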

On the user interface front it is now possible to implement a 'generic editor':
  • syntax highlighting from token streams
  • structure views (Outlines) from handle based elements
  • code resolve ('Open Declaration F3') using the resolver
  • structured selection expansion (Source > Expand Selection) and bracket matching using AST nodes
  • mark occurrences and local (linked) rename with the resolver
  • auto-indent using the formatter
  • ...
It should be noted that such a generic editor is a good way to get quick results for new plug-ins. Language plug-ins will likely use the generic editor as a base and extend it with language-specific features.
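As one example of a generic editor feature, structured selection expansion needs nothing more than source ranges and parent links on the nodes. The following Java sketch shows one possible approach; RangeNode and SelectionExpander are hypothetical names, and half-open [start, end) offsets are an assumption of the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal node with a source range [start, end) and parent/child links (hypothetical). */
final class RangeNode {
    final int start;
    final int end;
    RangeNode parent;
    final List<RangeNode> children = new ArrayList<>();

    RangeNode(int start, int end, RangeNode... kids) {
        this.start = start;
        this.end = end;
        for (RangeNode kid : kids) {
            kid.parent = this;
            children.add(kid);
        }
    }

    boolean covers(int s, int e) {
        return start <= s && e <= end;
    }
}

final class SelectionExpander {
    /** Deepest node whose range covers [s, e), or null if even the root does not cover it. */
    static RangeNode findCovering(RangeNode root, int s, int e) {
        if (!root.covers(s, e)) {
            return null;
        }
        RangeNode current = root;
        boolean descended = true;
        while (descended) {
            descended = false;
            for (RangeNode child : current.children) {
                if (child.covers(s, e)) {
                    current = child;
                    descended = true;
                    break;
                }
            }
        }
        return current;
    }

    /** Next larger selection: the covering node, or its nearest ancestor with a different range. */
    static int[] expand(RangeNode root, int s, int e) {
        RangeNode node = findCovering(root, s, e);
        while (node != null && node.start == s && node.end == e) {
            node = node.parent;
        }
        return node == null ? new int[] {s, e} : new int[] {node.start, node.end};
    }
}
```

Repeated calls to expand grow the selection from an identifier to its expression, statement and so on, until the root range is reached, after which the selection stays unchanged.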

Optional opportunities (extra plug-ins):
  • Generation of compatible scanners and ASTs from a grammar