Eclipse Community Forums
Using DTP to analyze data flow [message #52316] Wed, 14 January 2009 14:00
James Synge
Last summer Brian Fitzpatrick asked the question "How are YOU using DTP?"
on his blog. I finally have a potentially novel/interesting answer. :-)

I needed to know which columns of base tables were being read by a set of
several hundred database views, and to which view columns they
contributed. To answer this I wrote a tool that builds on DTP, and does
the following:

a) Reads the schema from a text file and creates corresponding DTP
objects (i.e. Database, Schema, Table, Column).
I didn't have access to the database (Teradata at a customer
site), so I couldn't make a connection to it in order to get
this data.

b) Read a file of records, each containing a DDL statement defining
a VIEW.

c) For each view definition, I did a very trivial parse to get the
query within the DDL. In addition, I had to hack on the query
to replace SQL3 and Teradata syntax with expressions that DTP
could parse.

d) After parsing the query with DTP's SQL Parser, the tool then
walks the resulting tree, determining all the columns that
contribute to each result column, transitively, which requires
handling nested and combined queries. This is by far the most
complicated part, due to challenges 1 and 2 below.
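For the curious, the transitive part of step (d) can be sketched in plain Java, independent of DTP. This is not my actual tool code; the column names and the shape of the dependency map are invented for illustration. The idea: treat each view column as a node whose direct inputs are known, and walk until only base-table columns (columns with no definition of their own) remain.

```java
import java.util.*;

// Sketch: given each view column's directly referenced columns, compute the
// base-table columns it depends on transitively. Columns are plain strings
// like "view1.a"; a column absent from the map is treated as a base column.
public class Lineage {
    static Set<String> baseColumns(Map<String, Set<String>> deps, String column) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        pending.push(column);
        while (!pending.isEmpty()) {
            String c = pending.pop();
            if (!seen.add(c)) continue;       // guard against cyclic definitions
            Set<String> direct = deps.get(c);
            if (direct == null) {
                result.add(c);                // no definition: a base-table column
            } else {
                pending.addAll(direct);       // a view column: follow its inputs
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = new HashMap<>();
        deps.put("v2.total", Set.of("v1.amount", "v1.tax"));
        deps.put("v1.amount", Set.of("orders.amount"));
        deps.put("v1.tax", Set.of("orders.amount", "orders.rate"));
        // v2.total traces back to orders.amount and orders.rate
        System.out.println(baseColumns(deps, "v2.total"));
    }
}
```

The hard part in practice is building that map from the parse tree (nested and combined queries), not the closure itself.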

The key challenges were:

1) Very little documentation exists, so I had to explore the parsed
queries to understand the data structures. It would be helpful
if the (EMF?) model from which the data structures were generated
had at least a sentence about each field.

2) Walking the parse tree required a lot of down-casting; i.e. lots of:

if (o instanceof X) { X x = (X)o; handleX(x); }
else if (o instanceof Y) ...

I'm not clear on how best to handle this, but this pattern is one
of those "smells" that says a redesign might be in order.
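For what it's worth, EMF-generated model packages usually include a generated *Switch class in their util package, which dispatches on model type without explicit casts. And in newer Java (21+), a sealed hierarchy with a pattern-matching switch removes these chains entirely. A self-contained sketch with invented node types (not the DTP model):

```java
// Sketch: sealed node hierarchy + pattern-matching switch (Java 21+).
// The compiler checks the switch is exhaustive, so no instanceof chains
// and no default branch hiding a forgotten case. Node names are invented.
public class TreeWalk {
    sealed interface Node permits Select, ColumnRef, Join {}
    record Select(Node from) implements Node {}
    record ColumnRef(String name) implements Node {}
    record Join(Node left, Node right) implements Node {}

    static String describe(Node n) {
        return switch (n) {
            case Select s    -> "select over " + describe(s.from());
            case ColumnRef c -> "column " + c.name();
            case Join j      -> "join(" + describe(j.left()) + ", " + describe(j.right()) + ")";
        };
    }

    public static void main(String[] args) {
        Node tree = new Select(new Join(new ColumnRef("a"), new ColumnRef("b")));
        System.out.println(describe(tree));
        // prints: select over join(column a, column b)
    }
}
```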

3) The parser couldn't handle the SQL3 syntax in the view definitions,
so I had to "pre-process" the SELECT statement to replace such
syntax with something simpler. Note that I didn't specify the type
of database (Teradata) as I didn't have the appropriate vendor-specific
plug-in.

I don't know whether the vendor-specific plug-ins can extend or
replace the SQL parser with vendor extensions.
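A rough sketch of the trivial parse plus one such rewrite. The extraction is deliberately naive (everything after the first "AS" that starts a SELECT); the rewrite shown is Teradata's SEL abbreviation for SELECT, which is real, but the replacements my tool actually needed were more involved than this.

```java
import java.util.regex.*;

// Sketch: pull the query out of a CREATE VIEW statement with a trivial
// parse, then normalize one piece of Teradata syntax before handing the
// text to a stricter SQL parser.
public class ViewPreprocess {
    static String extractQuery(String ddl) {
        // Naive: grab everything after the first standalone AS that begins
        // a SELECT (or Teradata's SEL). Real DDL needs more care.
        Matcher m = Pattern.compile("(?is)\\bAS\\s+(SEL(?:ECT)?\\b.*)").matcher(ddl);
        if (!m.find()) throw new IllegalArgumentException("no query found");
        return m.group(1).trim();
    }

    static String normalize(String query) {
        // Expand Teradata's SEL keyword; \b keeps SELECT itself untouched.
        return query.replaceAll("(?i)\\bSEL\\b", "SELECT");
    }

    public static void main(String[] args) {
        String ddl = "CREATE VIEW v AS SEL a, b FROM t";
        System.out.println(normalize(extractQuery(ddl)));
        // prints: SELECT a, b FROM t
    }
}
```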

James Synge
Re: Using DTP to analyze data flow [message #52342 is a reply to message #52316] Thu, 15 January 2009 09:34
Brian Fitzpatrick

Wow! That *IS* a novel approach to using DTP.

For #1, on the documentation side, Hemant just added some additional
documentation for Model Base... Check out
http://www.eclipse.org/datatools/project_modelbase/ and scroll to the
bottom to look for the documentation.


Unfortunately I can't really speak to #2 or #3, but perhaps someone on the
SQL Dev Tools team can.

Very cool use of DTP though... Thanks for sharing! It's great to know that
folks are finding uses for DTP beyond the core!

--Fitz

Re: Using DTP to analyze data flow [message #52394 is a reply to message #52316] Thu, 15 January 2009 12:57
Brian Payton
Regarding walking the model generated from the query, did you explore
the methods in the classes in the "helper" package of
org.eclipse.datatools.modelbase.sql.query? In particular, the
StatementHelper class has many methods that help with analyzing a query.

Extending the parser and model for a particular database dialect is
non-trivial, but do-able. (We have an extension for IBM databases
in-house.) If you would like to create one for Teradata, let me know.
