Eclipse Community Forums
IMP support for multiple ASTs [message #28003] Sun, 14 June 2009 16:46
Eclipse User
Originally posted by: mattias.felt.removethis.esu.edu

Hello,

In my IDE, each compilation unit has its own AST. But each project consists of
several compilation units (i.e. files), so to do any kind of semantic analysis,
several ASTs must be analysed. I can't see how IMP (LPG?) can help me with
this.

Any thoughts or ideas would be appreciated.

regards,

M
Re: IMP support for multiple ASTs [message #29017 is a reply to message #28003] Tue, 14 July 2009 16:31
Robert M. Fuhrer
Messages: 294
Registered: July 2009
Senior Member
Mattias Felt wrote:
> Hello,
>
> In my IDE, each compilation unit has its own AST. But each project consists
> of several compilation units (i.e. files), so to do any kind of semantic
> analysis, several ASTs must be analysed. I can't see how IMP (LPG?) can help
> me with this.
>
> Any thoughts or ideas would be appreciated.

Hi there,

The standard (scalable) way of doing this is to process each AST in turn
to produce some sort of "digested" data structures that describe the
salient characteristics of each compilation unit/type/function/etc. This
representation should be space-efficient enough to be able to fit
however many compilation units you might find in a user's workspace, or
at least as many compilation units as you need to process for the
particular analysis in question. You then run the core of your analysis
over this digested representation. One needs to take care to save enough
information in the "digested" form to correlate back to the ASTs/source.
[Saving source extents for the relevant entities is typically enough.]
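
To make that a bit more concrete, a per-compilation-unit digest can be as
simple as the following sketch. This is plain Java with invented names, not
any actual IMP/PDB API; the point is only that each entry carries a source
extent so you can correlate it back to the code.

import java.util.ArrayList;
import java.util.List;

// One "digest" entry: a declared entity plus the source extent of its
// declaration, so analysis results can later be mapped back to the editor.
class DigestEntry {
    final String name;    // e.g. "Foo.bar"
    final String kind;    // e.g. "type", "function", "variable"
    final String path;    // workspace-relative path of the compilation unit
    final int offset;     // start offset of the declaration in the source
    final int length;     // length of the declaration's source extent

    DigestEntry(String name, String kind, String path, int offset, int length) {
        this.name = name;
        this.kind = kind;
        this.path = path;
        this.offset = offset;
        this.length = length;
    }
}

// The digest for one compilation unit: small and self-contained, so it is
// cheap to keep one per unit for the whole workspace.
class CompilationUnitDigest {
    final String path;
    final List<DigestEntry> declarations = new ArrayList<DigestEntry>();

    CompilationUnitDigest(String path) {
        this.path = path;
    }

    void addDeclaration(String name, String kind, int offset, int length) {
        declarations.add(new DigestEntry(name, kind, path, offset, length));
    }
}

You would typically populate one of these in a single pass over each AST, and
from then on the analysis never needs the AST again.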

The representation might be a dictionary of types, a list of defined or
referenced variables/functions, 3-address code for function bodies, or
whatever you need.

To help with this, IMP's "PDB" (Program DataBase) plugin
org.eclipse.imp.pdb.values provides persistable representations and
operators for a variety of types that are useful for program analysis,
e.g. sets, relations, maps, and so on.

The PDB is intended to be fairly CPU- and memory-efficient, and can be
used, for example, to persist indices in support of indexed search.
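
As a rough illustration of the kind of index you might persist, here is a
sketch using plain Java collections and serialization; the PDB's sets, maps
and relations play this role in a more structured way, and every name below
is made up.

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A toy cross-reference index: each declared name maps to the set of
// "path:offset:length" extents where it is referenced.
class ReferenceIndex implements Serializable {
    private final Map<String, Set<String>> referencesByName =
        new HashMap<String, Set<String>>();

    void addReference(String name, String path, int offset, int length) {
        Set<String> extents = referencesByName.get(name);
        if (extents == null) {
            extents = new HashSet<String>();
            referencesByName.put(name, extents);
        }
        extents.add(path + ":" + offset + ":" + length);
    }

    Set<String> referencesTo(String name) {
        Set<String> extents = referencesByName.get(name);
        return extents == null ? Collections.<String>emptySet() : extents;
    }

    // Persist the index so that a later search can run without re-parsing.
    void save(File file) throws IOException {
        ObjectOutputStream out =
            new ObjectOutputStream(new FileOutputStream(file));
        try {
            out.writeObject(this);
        } finally {
            out.close();
        }
    }
}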

We're actively working on the PDB (particularly with our colleagues at
CWI).

The other area of active development in this space is the declarative
language "Rascal", the result of some nice work at CWI, which we intend
to make available as part of IMP in the not-too-distant future. Rascal
combines both AST pattern matching and computations over the kinds of
data structures that the PDB supports. As a result, it can be used for
both "fact extraction" and analysis per se.

I don't believe Rascal is ready for prime time yet, but it's getting
there. In the meantime you can write code against the PDB API.

If you tell us more about the kind of analysis you're interested in,
I might be able to provide more specific guidance.

--
Cheers,
-- Bob

--------------------------------
Robert M. Fuhrer
Research Staff Member
Programming Technologies Dept.
IBM T.J. Watson Research Center

IDE Meta-tooling Platform Project Lead (http://www.eclipse.org/imp)
X10: Productive High-Performance Parallel Programming (http://x10.sf.net)
Re: IMP support for multiple ASTs [message #29280 is a reply to message #29017] Thu, 16 July 2009 13:22
Eclipse User
Originally posted by: mattias.felt.removethis.esu.edu

Hi,

The IDE that we are working on supports a proprietary DSL (partly influenced
by languages like Pascal and Ada).

The semantic analysis required in the first stage is type checking and scope
analysis (the language supports inheritance, nested methods, etc.), but in the
near future we want to add warnings for unused (unreferenced) variables and
the like. The analysis is needed to make correct color highlighting, symbol
linking, etc. possible in the UI, but we anticipate that users will want more
direct feedback from the editor, to avoid round-trips to the build system.

So, as I understand your description, the scenario is: each CompilationUnit
has its corresponding AST; each AST is then condensed into a "symbol table"
that is stored using the PDB API [the AST is obsolete after this and can be
released/deleted?]. The semantic analysis is then performed [in the
background] against the PDB API.

(It has been some years since I took a compiler construction course, and I
noticed that the Dragon book is still used :-). I recall something about a
Concrete Syntax Tree (CST) being involved in semantic analysis, but perhaps
that is just different terminology.)

I just took a quick look but could not find much documentation or sample code
about the PDB; perhaps I can find it on the Rascal home page?

BR

M
Re: IMP support for multiple ASTs [message #29356 is a reply to message #29280] Fri, 17 July 2009 16:22
Robert M. Fuhrer
Messages: 294
Registered: July 2009
Senior Member
Mattias Felt wrote:
> Hi,
>
> The IDE that we are working on supports a proprietary DSL (partly influenced
> by languages like Pascal and Ada).
>
> The semantic analysis required in the first stage is type checking and scope
> analysis (the language supports inheritance, nested methods, etc.), but in
> the near future we want to add warnings for unused (unreferenced) variables
> and the like. The analysis is needed to make correct color highlighting,
> symbol linking, etc. possible in the UI, but we anticipate that users will
> want more direct feedback from the editor, to avoid round-trips to the build
> system.

Ok, that's fairly standard/sensible stuff.

For name binding and type-checking, it depends on the particulars of
the language as to whether the requisite analysis is completely local
or not. If it's entirely local, then you may be able to perform the
analysis on the AST, augmented/decorated with some simple data
structures. This is the case for the trivial demo language "LEG" we
supply with IMP. If it's not (as is the case with Java and C/C++), then you
may want/need something a little more sophisticated.
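
For the purely local case, those "simple data structures" can be as little as
a symbol table hung off each scope-introducing AST node, with lookup walking
outward through the enclosing scopes. A sketch, with invented names, since
your AST classes will obviously differ:

import java.util.HashMap;
import java.util.Map;

// A per-scope symbol table chained to its enclosing scope. Attaching one of
// these to each scope-introducing AST node (block, method, class, ...) is
// often all the "decoration" a purely local name/type analysis needs.
class Scope {
    private final Scope parent;              // null for the outermost scope
    private final Map<String, String> symbols =
        new HashMap<String, String>();       // name -> type name (or a richer
                                             // symbol object in practice)

    Scope(Scope parent) {
        this.parent = parent;
    }

    void declare(String name, String type) {
        symbols.put(name, type);
    }

    // Resolve a name by walking outward through the enclosing scopes;
    // returns null if the name is undeclared (a binding error to report).
    String resolve(String name) {
        for (Scope s = this; s != null; s = s.parent) {
            String type = s.symbols.get(name);
            if (type != null) {
                return type;
            }
        }
        return null;
    }
}

Nested methods then fall out naturally: each nested method gets a Scope whose
parent is the enclosing method's Scope.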

One option is to produce an "object file" that provides a dictionary
of declared entities in a suitable form at the front of the "object
file", so that clients need not rifle through the whole file when
searching for that info.

Another option is to produce a more global database of declared
entities and references. For this, we'd advocate using the PDB.
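
To sketch what such a database might boil down to for, say, your
unused-variable warnings (plain Java again; the PDB would give you proper
relations and persistence, and the names here are only illustrative):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A toy global database: declared names mapped to their declaration extents,
// plus a relation-like multimap of names to reference sites. An "unused
// variable" is then simply a declared name with no entry on the reference side.
class GlobalSymbolDatabase {
    private final Map<String, String> declarationExtent =
        new HashMap<String, String>();       // qualified name -> "path:offset:length"
    private final Map<String, List<String>> referenceExtents =
        new HashMap<String, List<String>>(); // qualified name -> reference extents

    void declare(String qualifiedName, String extent) {
        declarationExtent.put(qualifiedName, extent);
    }

    void reference(String qualifiedName, String extent) {
        List<String> extents = referenceExtents.get(qualifiedName);
        if (extents == null) {
            extents = new ArrayList<String>();
            referenceExtents.put(qualifiedName, extents);
        }
        extents.add(extent);
    }

    // Declared but never referenced: candidates for an "unused" warning, each
    // with the source extent needed to place a marker in the editor.
    Map<String, String> unusedDeclarations() {
        Map<String, String> unused =
            new HashMap<String, String>(declarationExtent);
        unused.keySet().removeAll(referenceExtents.keySet());
        return unused;
    }
}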

As for editor highlighting and the like, the normal scenario is to
use the AST for information that drives views that are directly
associated with an open editor buffer. Other, more global, views
would again appeal to a global database, such as can be created
and persisted using the PDB.

Finally, if you need to perform a whole-program analysis, such as
global data-flow or type analysis, again, the PDB and its data
structures and operators should prove useful.

> So, as I understand your description, the scenario is: each CompilationUnit
> has its corresponding AST; each AST is then condensed into a "symbol table"
> that is stored using the PDB API [the AST is obsolete after this and can be
> released/deleted?]. The semantic analysis is then performed [in the
> background] against the PDB API.

Yes, that's right.

> (It has been some years since I took a compiler construction course, and I
> noticed that the Dragon book is still used :-). I recall something about a
> Concrete Syntax Tree (CST) being involved in semantic analysis, but perhaps
> that is just different terminology.)

Actually, the term "concrete syntax" usually refers to a syntax tree
that retains *all* of the original source info, including whitespace,
comments, tokens for "binary" syntactic entities (e.g. modifiers that
are either present or not), and so on. The purpose of the abstraction
in an abstract syntax tree is partly to remove some of this in order
to provide a simpler API for analyses to use.
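
For example, for a declaration like "var x : Integer; { counter }", a concrete
syntax tree would keep the "var" and ":" tokens, the semicolon and the comment,
whereas an abstract syntax tree might reduce the whole thing to a VarDecl node
holding just the name, the type, and a source extent.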

> I just took a quick look but could not find much documentation or sample
> code about the PDB; perhaps I can find it on the Rascal home page?

You're right that there's not much documentation yet. I'll ping Jurgen,
who's leading the PDB development, to see what we can put in place.

--
Cheers,
-- Bob

--------------------------------
Robert M. Fuhrer
Research Staff Member
Programming Technologies Dept.
IBM T.J. Watson Research Center

IDE Meta-tooling Platform Project Lead (http://www.eclipse.org/imp)
X10: Productive High-Performance Parallel Programming (http://x10.sf.net)