My background
I have been a member of the Eclipse JDT UI team since its beginning. I am responsible for the quick fix feature and the underlying code rewriting infrastructure. I work closely with the JDT model and the DOM-AST and have also helped evolve these APIs. I was also the author of a three-week C plug-in prototype that later became the seed for the current CDT plug-in.
Motivation
The approach taken when implementing this CDT prototype was to follow the concepts and structure of the JDT plug-in. Eclipse gives you comprehensive support for integrating with the platform (e.g. building, launching, compare and search) and for reusing or extending existing components such as editors with outline views. Still, some parts had to be copied, most notably the language model infrastructure.
Several other language plug-in implementations took the same path and duplicated the same code again. This led to the wish that Eclipse should provide a common infrastructure for language models to avoid code duplication.
API is always costly to create and especially to maintain. It introduces new dependencies that can hinder innovation, as you get locked into existing solutions. JDT was therefore very conservative about investing work in this topic. However, more infrastructure has significant benefits: for developers it is a time saver, users get language plug-ins faster and possibly with a more consistent design, and the Eclipse platform wins by providing new infrastructure.
To summarize, the goals are:
- Distill the proven JDT / Platform concepts so they can also be used for new language plug-ins.
- Take a fresh look at the problem and make it as simple as possible. Start fresh and don't aim to bring existing plug-ins into the framework. Gain more experience with new language plug-ins to create new infrastructure, and improve existing plug-ins in this direction.
- Provide a seed that is meant to be grown by the community.
Proposal
In my proposal for a seed, I would like to look at the problem from a compiler's perspective rather than from an editor's or explorer's. A compiler splits its input into tokens and uses a parser to build syntax trees from them. Most of the tooling in a language plug-in (e.g. refactoring, quick fix) works directly on ASTs. The proposal therefore structures the language model along these lines and uses tokens, AST nodes, token streams and parsers as the foundation for a language-independent toolkit.
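A minimal sketch of the first step in this compiler view, assuming a hypothetical toy language with identifiers, numbers and single-character punctuation: a hand-written scanner turns source text into a stream of tokens, each carrying its kind, its text and its offset. `SimpleTokenStream`, `Token` and `TokenKind` are illustrative names, not part of any existing API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token kinds for a toy language.
enum TokenKind { IDENT, NUMBER, PUNCT, EOF }

// A token: its kind, its text and the offset where it starts in the source.
record Token(TokenKind kind, String text, int offset) {}

// Minimal scanner that exposes its result as a token stream.
final class SimpleTokenStream {
    private final List<Token> tokens = new ArrayList<>();
    private int index = 0;

    SimpleTokenStream(String source) {
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {  // whitespace separates tokens but is not one
                i++;
                continue;
            }
            int start = i;
            TokenKind kind;
            if (Character.isLetter(c)) {
                while (i < source.length() && Character.isLetterOrDigit(source.charAt(i))) i++;
                kind = TokenKind.IDENT;
            } else if (Character.isDigit(c)) {
                while (i < source.length() && Character.isDigit(source.charAt(i))) i++;
                kind = TokenKind.NUMBER;
            } else {
                i++;  // any other character is a single-character punctuation token
                kind = TokenKind.PUNCT;
            }
            tokens.add(new Token(kind, source.substring(start, i), start));
        }
        tokens.add(new Token(TokenKind.EOF, "", source.length()));
    }

    // Returns the next token; once the end is reached, keeps returning the EOF token.
    Token getNextToken() {
        Token t = tokens.get(index);
        if (t.kind() != TokenKind.EOF) index++;
        return t;
    }
}
```

For the input `var x = 42;` the stream yields `var`, `x`, `=`, `42` and `;`, followed by an EOF token at offset 11. A parser would consume this stream to build the AST.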
The interesting question is: what concepts are needed as a foundation to implement as much concrete functionality as possible? AST nodes and tokens alone will not be enough. For example, it would not be possible to show an outline of declarations in a generic way. The solution is to add information to AST nodes, such as structural information. This information can either be stored as annotations on the node itself or be computed by visitors.
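To illustrate the second option, the following sketch computes an outline with a visitor instead of storing it in the AST. It assumes a minimal generic `Node` (type name, optional name, children) and a hypothetical set of node type names that the language marks as declarations; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Minimal generic AST node: a node type name, an optional name and children (illustrative only).
final class Node {
    final String type;
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String type, String name, Node... kids) {
        this.type = type;
        this.name = name;
        for (Node kid : kids) children.add(kid);
    }

    // Pre-order traversal; the visitor decides whether to descend into the children.
    void accept(NodeVisitor visitor) {
        if (visitor.visit(this)) {
            for (Node child : children) child.accept(visitor);
        }
    }
}

interface NodeVisitor {
    boolean visit(Node node);
}

// Computes structural information the AST does not store itself: the list of
// declarations, based on which node types the language marks as declarations.
final class OutlineVisitor implements NodeVisitor {
    private final Set<String> declarationTypes;
    final List<String> outline = new ArrayList<>();

    OutlineVisitor(Set<String> declarationTypes) {
        this.declarationTypes = declarationTypes;
    }

    @Override
    public boolean visit(Node node) {
        if (declarationTypes.contains(node.type)) outline.add(node.name);
        return true;  // always descend, so nested declarations are found too
    }
}
```

Visiting a tree `CompilationUnit > Class Point > (Field x, Method move)` with the declaration types `Class`, `Field` and `Method` yields the outline `[Point, x, move]`.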
This is a sketch of the abstractions; the concrete type names are for illustration only:
- ILanguageModel: access point to the language's infrastructure
  - getTokenStream(ISourceElement) : ITokenStream
  - createAST(ITokenStream) : IAST
- ITokenStream
  - getNextToken() : ITokenNode
- IAST
  - getASTRoot() : IASTNode
  - findNode(position)
The types ITokenNode and IASTNode could be unified under a common INode type:
- ITokenNode: abstract base class for tokens
- IASTNode: abstract base class for AST nodes; has a parent (an AST node) and children (nodes: AST nodes and tokens), and provides functionality to traverse the children and to access the node type
- INodeType: meta information describing token types and AST node types
  - which children a node of this type has: which token types and node types, and whether they are mandatory
  - which nodes are declarations
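The abstractions above can be rendered as Java interfaces roughly as follows. This is a sketch under assumptions: `ISourceElement`, the `covers` helper and the exact signatures are illustrative, tokens and nodes are reduced to typed source ranges, and the small `NodeType`, `Token` and `ASTNode` classes exist only to make it executable. It also shows that `findNode(position)` can be implemented once, language-neutrally, by walking down to the innermost node whose range contains the position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical access to source text (e.g. a file); not defined in the sketch above.
interface ISourceElement { String getContents(); }

interface ITokenStream { ITokenNode getNextToken(); }  // assumption: null at the end

interface ILanguageModel {
    ITokenStream getTokenStream(ISourceElement element);
    IAST createAST(ITokenStream tokens);
}

// Meta information shared by all tokens or AST nodes of one type.
interface INodeType {
    String getName();
    boolean isDeclaration();
}

// Common base of tokens and AST nodes: a typed source range.
interface INode {
    INodeType getNodeType();
    int getOffset();
    int getLength();

    default boolean covers(int position) {
        return getOffset() <= position && position < getOffset() + getLength();
    }
}

interface ITokenNode extends INode { String getText(); }

interface IASTNode extends INode {
    IASTNode getParent();       // null for the root
    List<INode> getChildren();  // AST nodes and tokens, in source order
}

interface IAST {
    IASTNode getASTRoot();

    // Language-neutral findNode: walk down to the innermost node covering the position.
    default INode findNode(int position) {
        INode current = getASTRoot();
        if (current == null || !current.covers(position)) return null;
        boolean descended = true;
        while (descended && current instanceof IASTNode) {
            descended = false;
            for (INode child : ((IASTNode) current).getChildren()) {
                if (child.covers(position)) {
                    current = child;
                    descended = true;
                    break;
                }
            }
        }
        return current;
    }
}

// Minimal implementations, only to make the sketch executable.
final class NodeType implements INodeType {
    private final String name;
    private final boolean declaration;
    NodeType(String name, boolean declaration) { this.name = name; this.declaration = declaration; }
    public String getName() { return name; }
    public boolean isDeclaration() { return declaration; }
}

final class Token implements ITokenNode {
    private final INodeType type;
    private final int offset;
    private final String text;
    Token(INodeType type, int offset, String text) { this.type = type; this.offset = offset; this.text = text; }
    public INodeType getNodeType() { return type; }
    public int getOffset() { return offset; }
    public int getLength() { return text.length(); }
    public String getText() { return text; }
}

// An inner node spans from its first to its last child (children must not be empty).
final class ASTNode implements IASTNode {
    private final INodeType type;
    private final List<INode> children = new ArrayList<>();
    private IASTNode parent;

    ASTNode(INodeType type, INode... kids) {
        this.type = type;
        for (INode kid : kids) {
            children.add(kid);
            if (kid instanceof ASTNode) ((ASTNode) kid).parent = this;
        }
    }
    public INodeType getNodeType() { return type; }
    public IASTNode getParent() { return parent; }
    public List<INode> getChildren() { return children; }
    public int getOffset() { return children.get(0).getOffset(); }
    public int getLength() {
        INode last = children.get(children.size() - 1);
        return last.getOffset() + last.getLength() - getOffset();
    }
}
```

For `x = f(y)`, `findNode(6)` returns the token `y`, while a position in the whitespace between tokens returns the innermost enclosing AST node.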
Languages often have the concept of declarations and references to these declarations. A resolver connects references to declarations:
- IResolver
  - given an AST node, find the corresponding declaration (the format is still to be discussed)
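As a sketch of a resolver, assume a hypothetical block-structured language in which a name refers to the nearest preceding declaration in the same block or an enclosing one. The proposal leaves the result format open; here the resolver simply returns the declaring AST node, or `null` for an unresolved reference. `Node`, `Kind` and `BlockScopeResolver` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node kinds for a tiny block-structured language.
enum Kind { BLOCK, DECLARATION, REFERENCE }

final class Node {
    final Kind kind;
    final String name;  // null for blocks
    Node parent;
    final List<Node> children = new ArrayList<>();

    Node(Kind kind, String name, Node... kids) {
        this.kind = kind;
        this.name = name;
        for (Node kid : kids) {
            kid.parent = this;
            children.add(kid);
        }
    }
}

interface IResolver {
    // Returns the declaration a reference refers to, or null if it cannot be resolved.
    Node resolve(Node reference);
}

// Lexical scoping: search the nodes preceding the reference in each enclosing block,
// innermost block first and nearest declaration first, so inner declarations shadow outer ones.
final class BlockScopeResolver implements IResolver {
    @Override
    public Node resolve(Node reference) {
        Node child = reference;
        for (Node scope = reference.parent; scope != null; child = scope, scope = scope.parent) {
            for (int i = scope.children.indexOf(child) - 1; i >= 0; i--) {
                Node candidate = scope.children.get(i);
                if (candidate.kind == Kind.DECLARATION && candidate.name.equals(reference.name)) {
                    return candidate;
                }
            }
        }
        return null;
    }
}
```

The same lookup serves several features in the list below: code resolve jumps to the returned node, and mark occurrences collects all references that resolve to the same declaration.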
The following items describe what language-neutral infrastructure can be built using these abstractions:
- Handle-based elements: ASTs are usually expensive to build and to keep in memory. There is a need for a more lightweight element that can be used, e.g., by viewers. These elements are created from the AST but don't hold a reference to it; they can rebuild the underlying AST whenever required. The handles usually cover only a subset of the AST nodes, typically all declarations.
  - NodeHandle
    - inexpensive representation of an AST node that can be shown in a viewer; does not hold on to the full AST, but can rebuild it if necessary
    - handles might not exist anymore (or not yet)
    - can be stored/restored using mementos
- Search index: indexing declarations and references
- AST flattening and code formatting using the AST node type properties
- AST rewriting infrastructure using AST flattening and code formatting
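A minimal sketch of a handle-based element, assuming the AST can be rebuilt on demand through a `Supplier` (for example by re-parsing the file) and that a declaration can be identified by the path of names leading to it. `Decl`, `NodeHandle` and the `/`-separated memento format are illustrative assumptions, not an existing API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Stand-in for a full AST: only declarations and their members (illustrative only).
final class Decl {
    final String name;
    final List<Decl> members = new ArrayList<>();

    Decl(String name, Decl... kids) {
        this.name = name;
        members.addAll(Arrays.asList(kids));
    }
}

// Lightweight handle: stores the path of declaration names below the root, not the AST.
final class NodeHandle {
    private final Supplier<Decl> astBuilder;  // rebuilds the AST, e.g. by re-parsing the file
    private final List<String> path;          // must not be empty

    NodeHandle(Supplier<Decl> astBuilder, List<String> path) {
        this.astBuilder = astBuilder;
        this.path = List.copyOf(path);
    }

    // A label for viewers, available without building the AST.
    String getLabel() {
        return path.get(path.size() - 1);
    }

    // Rebuilds the AST and looks the declaration up again; null if it does not exist (anymore).
    Decl resolve() {
        Decl current = astBuilder.get();
        for (String segment : path) {
            Decl next = null;
            for (Decl member : current.members) {
                if (member.name.equals(segment)) {
                    next = member;
                    break;
                }
            }
            if (next == null) return null;
            current = next;
        }
        return current;
    }

    boolean exists() {
        return resolve() != null;
    }

    // Memento: a string form that can be persisted, e.g. to restore a viewer's selection.
    // Simplification: assumes names never contain '/'.
    String toMemento() {
        return String.join("/", path);
    }

    static NodeHandle fromMemento(Supplier<Decl> astBuilder, String memento) {
        return new NodeHandle(astBuilder, Arrays.asList(memento.split("/")));
    }
}
```

A viewer can show `getLabel()` without building the AST. After an edit that removes or renames the declaration, `exists()` returns false, which covers the "might not exist anymore" case.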
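The search index can also be kept language-neutral: the language plug-in reports the declarations and references it finds (for references, with the help of the resolver) and the index maps them to locations. This sketch keys entries by simple name and keeps everything in memory; a real index would need qualified keys and persistence. `SearchIndex` and `Occurrence` are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One index entry: where a name is declared or referenced.
record Occurrence(String file, int offset, boolean isDeclaration) {}

final class SearchIndex {
    private final Map<String, List<Occurrence>> byName = new HashMap<>();

    void add(String name, Occurrence occurrence) {
        byName.computeIfAbsent(name, k -> new ArrayList<>()).add(occurrence);
    }

    // Drops all entries of a file, e.g. before re-indexing it after a change.
    void removeFile(String file) {
        byName.values().forEach(list -> list.removeIf(o -> o.file().equals(file)));
    }

    List<Occurrence> declarations(String name) { return query(name, true); }

    List<Occurrence> references(String name) { return query(name, false); }

    private List<Occurrence> query(String name, boolean declarations) {
        List<Occurrence> result = new ArrayList<>();
        for (Occurrence o : byName.getOrDefault(name, List.of())) {
            if (o.isDeclaration() == declarations) result.add(o);
        }
        return result;
    }
}
```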
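AST flattening and formatting only need per-token-type properties. In the sketch below, a hypothetical `TokenType` record states whether a token wants a space before it, a line break after it, and whether it changes the indentation; the flattener walks the tokens in source order and applies these rules. Rewriting then means changing the tree and flattening it again. This is a simplification: a production rewriter such as JDT's `ASTRewrite` keeps the original text of unchanged nodes and formats only what was modified.

```java
import java.util.ArrayList;
import java.util.List;

// Formatting properties of a token type (a hypothetical property set).
// indentChange > 0 indents the lines after the token; < 0 dedents starting with the token itself.
record TokenType(String name, boolean spaceBefore, boolean newlineAfter, int indentChange) {}

interface Element {}

record Token(TokenType type, String text) implements Element {}

// Generic AST node: an ordered list of tokens and nested nodes.
final class Node implements Element {
    final List<Element> children = new ArrayList<>();

    Node(Element... kids) {
        for (Element kid : kids) children.add(kid);
    }

    // AST rewriting: swap a direct child (it must be present), then flatten again.
    void replace(Element oldChild, Element newChild) {
        children.set(children.indexOf(oldChild), newChild);
    }
}

final class Flattener {
    static String flatten(Node root) {
        List<Token> tokens = new ArrayList<>();
        collect(root, tokens);
        StringBuilder out = new StringBuilder();
        int indent = 0;
        boolean atLineStart = true;
        for (Token t : tokens) {
            TokenType type = t.type();
            if (type.indentChange() < 0) indent += type.indentChange();
            if (atLineStart) {
                out.append("    ".repeat(indent));  // indentation replaces the space rule
            } else if (type.spaceBefore()) {
                out.append(' ');
            }
            out.append(t.text());
            if (type.indentChange() > 0) indent += type.indentChange();
            atLineStart = type.newlineAfter();
            if (atLineStart) out.append('\n');
        }
        return out.toString();
    }

    // Depth-first walk that yields the tokens in source order.
    private static void collect(Element e, List<Token> tokens) {
        if (e instanceof Token) {
            tokens.add((Token) e);
        } else {
            for (Element child : ((Node) e).children) collect(child, tokens);
        }
    }
}
```

For a tree of the hypothetical input `do { x = 1; y = x; }`, flattening produces one statement per line indented by four spaces inside the braces; after replacing the second statement node with `z = 2;`, only that line of the output changes.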
On the user interface side, it is now possible to implement a 'generic editor':
- syntax highlighting from token streams
- structure views (outlines) from handle-based elements
- code resolve ('Open Declaration', F3) using the resolver
- structured selection expansion (Source > Expand Selection) and bracket matching using AST nodes
- mark occurrences and local (linked) rename with the resolver
- auto-indent using the formatter
- ...
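Two of these editor features can be sketched language-neutrally on a minimal AST with source ranges. Structured selection expansion looks for the smallest node that strictly encloses the current selection; bracket matching relies on the parser having put both brackets of a pair into the same node, so nesting is already resolved. The `Node` class, the bracket table and `EditorSupport` are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal AST node (illustrative): leaves are tokens with text, inner nodes span their children.
final class Node {
    final String text;  // token text, or null for inner nodes
    final int start;
    final int end;      // exclusive
    Node parent;
    final List<Node> children = new ArrayList<>();

    // Leaf (token) constructor.
    Node(String text, int start) {
        this.text = text;
        this.start = start;
        this.end = start + text.length();
    }

    // Inner node constructor: the range runs from the first to the last child.
    Node(Node... kids) {
        this.text = null;
        this.start = kids[0].start;
        this.end = kids[kids.length - 1].end;
        for (Node kid : kids) {
            kid.parent = this;
            children.add(kid);
        }
    }
}

final class EditorSupport {
    private static final Map<String, String> PAIRS = Map.of("(", ")", "[", "]", "{", "}");

    // Structured selection: the range of the smallest node strictly enclosing [start, end).
    static int[] expandSelection(Node root, int start, int end) {
        Node n = root;
        boolean descended = true;
        while (descended) {
            descended = false;
            for (Node child : n.children) {
                if (child.start <= start && end <= child.end) {
                    n = child;
                    descended = true;
                    break;
                }
            }
        }
        // If the node equals the current selection, grow to the first larger ancestor.
        while (n.parent != null && n.start == start && n.end == end) n = n.parent;
        return new int[] { n.start, n.end };
    }

    // Bracket matching: the partner is a sibling token in the same AST node. Returns its offset or -1.
    static int matchingBracket(Node root, int offset) {
        Node n = root;
        while (!n.children.isEmpty()) {  // descend to the token at the offset
            Node next = null;
            for (Node child : n.children) {
                if (child.start <= offset && offset < child.end) {
                    next = child;
                    break;
                }
            }
            if (next == null) return -1;
            n = next;
        }
        if (n.parent == null) return -1;
        List<Node> siblings = n.parent.children;
        int i = siblings.indexOf(n);
        if (PAIRS.containsKey(n.text)) {  // opening bracket: search forward
            for (int j = i + 1; j < siblings.size(); j++) {
                if (PAIRS.get(n.text).equals(siblings.get(j).text)) return siblings.get(j).start;
            }
        } else if (PAIRS.containsValue(n.text)) {  // closing bracket: search backward
            for (int j = i - 1; j >= 0; j--) {
                Node s = siblings.get(j);
                if (s.text != null && n.text.equals(PAIRS.get(s.text))) return s.start;
            }
        }
        return -1;
    }
}
```

For `(a*(b+c))`, expanding the selection `b` (offsets 4 to 5) gives `b+c` (4 to 7), and expanding again gives `(b+c)` (3 to 8); the bracket at offset 3 matches the one at offset 7.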
It should be noted that such a generic editor is good for getting quick results with new plug-ins. Language plug-ins will likely use the generic editor as a base but extend it with language-specific features.
Optional opportunities (extra plug-ins):
- Generation of compatible scanners and ASTs from a grammar