On 23 Feb 2009, at 22:40, Mike Kucera wrote:
Did you take a look at the LR C parser? It's not part of the CDT core; it's in a separate source folder in the CVS repository named lrparser. It uses an LALR parser generator called LPG. I designed it because we needed to add support to CDT for the UPC language, which is a simple extension of C99. The DOM C parser in CDT doesn't lend itself to extension, so instead of writing a UPC parser from scratch we decided to create an extensible C parser and then extend it to support UPC. The UPC plugins are also in the CDT repository, and you can take a look at them to see how the parser is extended. We felt it was a more forward-thinking approach and that it would help others who want to extend CDT. It could be extended to support Objective-C in the same manner.
Al, I suggest you take a look at this approach before rolling your own parser from scratch. It would save a chunk of effort. And really, this is exactly the situation that the LR parsers were designed for.
That sounds great. I hadn't spent much time looking at the parsers yet (though I'd wondered why there were so many ...) and I definitely had this on my 'todo' list. But it does look, at first glance, as though it might be more amenable than the GNU parser (at least in terms of extensibility). However, the implementation returns IToken keys, and for Objective-C it might make sense for those to include other keywords (@class, @interface). But perhaps the grammar could be used to implement the same thing without affecting the tokens.
Also, before you dive in head first, please understand that parsing C-like languages in CDT is actually more difficult than you would expect. First of all, the language is ambiguous: something as simple as (x * y;) could mean x times y, or it could mean the declaration of a pointer variable y of type x.
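To make that ambiguity concrete, here is a minimal C sketch. The function names are made up for illustration and are not taken from CDT or the LR parser; the point is only that the same tokens `x * y` parse differently depending on whether `x` currently names a variable or a type.

```c
/* Reading 1: x and y are ordinary variables in scope,
   so "x * y" is a multiplication expression. */
int as_expression(void)
{
    int x = 6, y = 7;
    return x * y;
}

/* From here on, x names a type rather than a variable. */
typedef int x;

/* Reading 2: x is a typedef name, so "x * y;" is a declaration
   of y with type "pointer to x". */
int as_declaration(void)
{
    x value = 5;
    x * y;          /* declares y; nothing is multiplied */
    y = &value;
    return *y;
}
```

A parser that sees only the token stream cannot tell the two apart without knowing what `x` was declared as, which is why a purely grammar-driven parser needs some extra scheme for such cases.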
Yeah, I suspect Objective-C is going to be worse at this, since [foo bar] is the standard method call construct, which is going to throw its own spanner in the works for array parsing. There were a few earlier attempts at an ANTLR grammar that never seem to have made it, so it's not likely to be an easy job.
The LR C parser already has a scheme for handling these ambiguities. Please take a look.
Will do. Thanks for the pointers.