CDT Parser Framework for 1.2 and Beyond
This document gives an overview of the work being done to support language-related features in CDT 1.2 and 2.0.
Author : John Camelon
Revision Date : 07/14/2003 - Version: 1.0

Table of Contents

1. Introduction
1.1 Overview
2. New Requirements
3. Design Overview
3.1 External Interface
3.2 Internals
3.3 C/C++ Language Variants
3.4 Selection Search/Code Assist
4. References


1. Introduction

To better communicate the changes under way in the Parser/CDOM architecture, I have been asked to put together a document describing the problems with the 1.1 parser architecture and where we are going with the parser framework in CDT 1.2 and beyond. This document serves mostly as a plan of record, a catch-all for requirement and design discussions that have, for the most part, happened exclusively at IBM Canada.

1.1 Overview

The CDT Parser improved significantly between releases 1.0 and 1.1. While this was a significant achievement, the language support provided was only at a coarse structural level: features such as the Outline View and Structure Compare rely only on a structural view of the file, without parsing into function bodies or maintaining strict cross-reference information.

We also discovered several serious problems with our Parser/IParserCallback strategy:

  1. IParserCallback was far too granular, which made it difficult for clients to use. Since we cannot always look ahead far enough in the token stream, the parser would sometimes try a particular path of execution, find it to be a dead end, and then ask the client to roll back all of the callbacks it had received. The net result was that only one IParserCallback was implemented outside of the cdt.core.parser packages: the DOMBuilder.
  2. The DOM does not have a public interface and, unfortunately, stored tokens as it was built. Since ITokens are linked to one another to allow for rollback within the parser, once one token was stored, all of them were retained and could not be garbage collected. Then, as the CModelBuilder or StructureComparator built their own meta-models off the DOM, we would run out of memory on source files larger than 3 MB.
  3. Since this was discovered in quick parse mode, it follows that the same thing would happen once we started to follow inclusions. We cannot keep the entire DOM or AST in memory; otherwise we will run out of memory on every search.
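
The retention problem described in item 2 can be illustrated with a minimal Java sketch. The names LinkedToken and TokenRetentionSketch are hypothetical and are not the CDT's token classes; the sketch only assumes that each token holds a forward link to the next one, which is enough for a single stored token to keep the rest of the chain reachable.

```java
// Hypothetical stand-in for a scanner token; not the CDT's actual token class.
final class LinkedToken {
    final String image;
    LinkedToken next; // forward link that lets a parser back up and re-read tokens

    LinkedToken(String image) {
        this.image = image;
    }
}

final class TokenRetentionSketch {
    // Builds a forward-linked chain of tokens and returns the first one.
    static LinkedToken scan(String... images) {
        LinkedToken head = null;
        LinkedToken tail = null;
        for (String image : images) {
            LinkedToken token = new LinkedToken(image);
            if (head == null) {
                head = token;
            } else {
                tail.next = token;
            }
            tail = token;
        }
        return head;
    }

    // Counts the tokens kept alive by holding a reference to 'stored'.
    static int reachableFrom(LinkedToken stored) {
        int count = 0;
        for (LinkedToken t = stored; t != null; t = t.next) {
            count++;
        }
        return count;
    }

    // Copying the text instead of keeping the token breaks the chain of references.
    static String detach(LinkedToken token) {
        return token.image;
    }
}
```

A node that stores the token returned by scan() keeps every following token reachable, whereas a node that stores only the result of detach() keeps a single string.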

2. New Requirements

  1. We must be able to maintain both a quick parse mode (structural elements only) and a full parse mode (with full cross-reference information).
  2. We must keep the same performance requirements as stated for CDT 1.1, but be more conscious of our memory usage and provide a callback strategy that allows for more garbage collection.
  3. We must simplify the client interface to the parser so that clients do not need to use middleware (i.e. our DOM) to use the CDT Parser.
  4. We must provide some rudimentary support for non-ANSI variants such as GNU in CDT 1.2, since that is our preferred compiler as a community.

3. Design Overview

3.1 External Interface

IParserCallback has been removed (mostly) and replaced with an interface named ISourceElementRequestor. This interface is modeled after the interface of the same name in the JDT, which provides coarse-granularity callbacks for code elements and cross references. The structure of the AST (and thus of the callbacks) follows UML-esque constructs (IASTMethod, IASTVariable, etc.) rather than mirroring how those elements are represented in the grammar.

 

ISourceElementRequestor has two categories of callback methods. acceptXXXX-style methods are provided for completely parsed units that do not relate to scopes or inclusions. enterXXXX- and exitXXXX-style methods are provided for when we enter and exit a scope. It is the client's responsibility to keep a stack of scopes so that it always knows the current scope/inclusion level.
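
The scope-stack responsibility can be sketched as follows. This is a minimal illustration in Java, not the real ISourceElementRequestor: the SketchRequestor interface, its three method names, and the use of plain strings in place of AST nodes are all assumptions made to keep the example self-contained.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, heavily reduced stand-in for a requestor interface.
interface SketchRequestor {
    void enterScope(String name);        // models an enterXXXX-style callback
    void exitScope(String name);         // models an exitXXXX-style callback
    void acceptDeclaration(String name); // models an acceptXXXX-style callback
}

// A client that tracks the current scope itself, as the text requires.
final class QualifiedNameCollector implements SketchRequestor {
    private final Deque<String> scopes = new ArrayDeque<>();
    private final List<String> qualifiedNames = new ArrayList<>();

    @Override
    public void enterScope(String name) {
        scopes.addLast(name); // innermost scope sits at the tail
    }

    @Override
    public void exitScope(String name) {
        String top = scopes.pollLast();
        if (!name.equals(top)) {
            throw new IllegalStateException("unbalanced exit for scope: " + name);
        }
    }

    @Override
    public void acceptDeclaration(String name) {
        // Copy what is needed right away instead of keeping the callback argument.
        StringBuilder qualified = new StringBuilder();
        for (String scope : scopes) { // iterates outermost to innermost
            qualified.append(scope).append("::");
        }
        qualifiedNames.add(qualified.append(name).toString());
    }

    List<String> qualifiedNames() {
        return qualifiedNames;
    }

    int depth() {
        return scopes.size();
    }
}
```

Feeding the collector enterScope("ns"), enterScope("Widget"), acceptDeclaration("size"), and then the matching exits records the name ns::Widget::size and leaves the stack empty.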

Clients of ISourceElementRequestor are expected not to hold on to nodes once they have been processed in the callback. In ParserMode.COMPLETE_PARSE mode, local variable definitions are garbage collected when they go out of scope; holding on to these references would prevent that and do the CDT a disservice.

All Parser implementations can be instantiated using static methods on the ParserFactory class. Please do not instantiate Parser-internal classes from outside the Parser package.

3.2 Internals

Our parser uses an IASTFactory to create the AST nodes that are passed back through ISourceElementRequestor. This way, the Parser can remain agnostic with regard to the specific implementation of an IAST interface.

Thus, we can have different IASTFactory implementations that allow for different levels of semantic validation. In particular, our ParserSymbolTable (which implements the ANSI C++ scoping and lookup rules) resides exclusively in the FullParseFactory.

In the COMPLETE_PARSE mode, the AST Nodes serve as facades to the ParserSymbolTable Declaration-nodes. In QUICK_PARSE mode, the AST Nodes have simple implementations.
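
The factory arrangement can be sketched in Java as follows. Every name here (SketchVariable, SketchNodeFactory, QuickFactory, CompleteFactory, SketchParser) is hypothetical, and a plain map stands in for the ParserSymbolTable; the sketch only shows how one parser can produce either simple nodes or symbol-table facades depending on the factory it is given.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node interface seen by clients.
interface SketchVariable {
    String name();
    boolean isResolved(); // true only when backed by a symbol-table entry
}

// Hypothetical factory interface; the parser depends only on this.
interface SketchNodeFactory {
    SketchVariable createVariable(String name, String typeName);
}

// Quick mode: nodes are simple value holders with no semantic backing.
final class QuickFactory implements SketchNodeFactory {
    @Override
    public SketchVariable createVariable(String name, String typeName) {
        return new SketchVariable() {
            @Override public String name() { return name; }
            @Override public boolean isResolved() { return false; }
        };
    }
}

// Complete mode: nodes are facades over entries in a symbol table.
final class CompleteFactory implements SketchNodeFactory {
    // Stand-in for a real symbol table: maps a declared name to its type name.
    private final Map<String, String> symbols = new HashMap<>();

    @Override
    public SketchVariable createVariable(String name, String typeName) {
        symbols.put(name, typeName);
        return new SketchVariable() {
            @Override public String name() { return name; }
            @Override public boolean isResolved() { return symbols.containsKey(name); }
        };
    }
}

// The parser never names a concrete node class.
final class SketchParser {
    private final SketchNodeFactory factory;

    SketchParser(SketchNodeFactory factory) {
        this.factory = factory;
    }

    // Accepts only the toy form "<type> <name>;".
    SketchVariable parseDeclaration(String source) {
        String[] parts = source.trim().replace(";", "").split("\\s+");
        return factory.createVariable(parts[1], parts[0]);
    }
}
```

Swapping QuickFactory for CompleteFactory changes what the returned node can answer without touching the parser, which is the point of routing node creation through a factory.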

 

3.3 C/C++ Language Variants

It is undecided at this point to what extent we shall support non-ISO variants in the 1.2 time frame. If substantial support is deemed necessary, we will have to add Parser variants to the CDT Target Model.

3.4 Selection Search/Code Assist

It will be necessary to provide another interface on IParser that allows the client to specify an offset to parse up to. In this mode, ISourceElementRequestor callbacks will not be provided; instead, parse() will return a node that tells the client what the cursor/selection refers to in the overall AST.
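
Since this interface is not yet defined, the following Java sketch only illustrates the kind of result such a mode might produce: given a tree of nodes with source ranges, it returns the innermost node that covers an offset. SketchNode and OffsetLookup are hypothetical names, and the sketch works on an already built tree rather than stopping a parse at the offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical AST node carrying a half-open source range [start, end).
final class SketchNode {
    final String label;
    final int start; // inclusive
    final int end;   // exclusive
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String label, int start, int end) {
        this.label = label;
        this.start = start;
        this.end = end;
    }

    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }

    boolean contains(int offset) {
        return offset >= start && offset < end;
    }
}

final class OffsetLookup {
    // Returns the innermost node covering 'offset', or null if the root does not cover it.
    static SketchNode innermostAt(SketchNode root, int offset) {
        if (!root.contains(offset)) {
            return null;
        }
        SketchNode current = root;
        boolean descended = true;
        while (descended) {
            descended = false;
            for (SketchNode child : current.children) {
                if (child.contains(offset)) {
                    current = child; // step down into the child that covers the offset
                    descended = true;
                    break;
                }
            }
        }
        return current;
    }
}
```

For a file node spanning 0 to 40 that contains a function spanning 5 to 30, which in turn contains a statement spanning 10 to 20, an offset of 12 yields the statement and an offset of 25 yields the function.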

 

4. References

  1. http://dev.eclipse.org/viewcvs/index.cgi/~checkout~/cdt-core-home/docs/rationalProposals/Scanner/Scanner.html?cvsroot=Tools_Project
  2. http://dev.eclipse.org/viewcvs/index.cgi/~checkout~/cdt-core-home/docs/rationalProposals/Parser.html?cvsroot=Tools_Project
  3. ISO C++ Standard, 1998 (ISO/IEC 14882:1998)
  4. ISO C Standard, 1999 (ISO/IEC 9899:1999)

Last Modified on July 14, 2003