Data Tools Project
This project proposal is in the Proposal Phase and is posted here to solicit community feedback, additional project participation, and ways the project can be leveraged from the Eclipse membership-at-large. You are invited to comment on and/or join the project. Please send all feedback to the http://www.eclipse.org/newsportal/thread.php?group=eclipse.dtp newsgroup.
Project Organization
The Data Tools Project (DTP) is a proposed open-source Top Level Project of eclipse.org. The Charter describes the scope, the organization of the project, the roles and responsibilities of the participants, and the top-level development process for the project.
Overview
“Data Tools” is a vast domain, yet there is a fairly small number of foundational requirements when developing with or managing data-centric systems. A developer is interested in an environment that is easy to configure, one in which the challenges of application development are due to the problem domain, not the complexity of the tools employed. Data management tooling, whether used by a developer working on an application or by an administrator maintaining or monitoring a production system, should likewise provide a consistent, highly usable environment that works well with associated technologies.
Such an environment starts with key frameworks designed both for use and extensibility. Examples include location and management of data source drivers, and configurations for access to particular data source instances. Once a connection is successfully made, the next task often is to explore the data source, making changes as required. Some of these operations might be carried out by GUI actions, others directly through commands. For example, users – both developers and administrators – typically will create, edit, and test SQL for these commands. Assistance in editing SQL through code completion, formatting, and dialect specialization greatly enhances productivity. Further, the ability to execute or debug commands, both SQL and stored procedures, rounds out the rapid development process that Eclipse supports so well. Finally, bridging chasms, whether between relational, object, or other structures, presents challenges that data management tooling should address.
Project Principles
Founded in the spirit of open-source, community-driven principles guiding eclipse.org itself, this project will concentrate on several key ideals:
Vendor neutrality: We intend to provide data management frameworks and tools not biased toward any vendor. Our intention is that DTP be leveraged to provide the Eclipse community with the widest range of choices possible. To this end, we seek community involvement in formulating key framework interfaces, so that the largest possible constituency is represented.
Extensibility: We recognize both the common need for data tooling infrastructure and the desire to extend the offerings in new and innovative ways. To support these efforts, our components will be designed for, and make good use of, extensibility mechanisms supported by Eclipse.
Community Involvement: Success for DTP, as with other eclipse.org projects, is as much a factor of community involvement as the technical merit of its components. We strongly believe that DTP will achieve its full potential only as the result of deep and broad cooperation with the Eclipse membership-at-large. Thus, we will make every effort to accommodate collaboration, reach acceptable compromises, and provide a project management infrastructure that includes all contributors, regardless of their affiliation, location, interests, or level of involvement. Regular meetings covering all aspects of DTP, open communication channels, and equal access to process will be key areas in driving successful community involvement.
Transparency: As with all projects under the eclipse.org banner, key information and discussions at every level – such as requirements, design, implementation, and testing – will be easily accessible to the Eclipse membership-at-large.
Agile development: We will strive to incorporate into our planning process both the innovations that arise once a project is underway and the feedback from our user community on our achievements to date. We think an agile planning and development process, in which progress is incremental, near-term deliverables are focused, and long-term planning is flexible, will be the best way to achieve this.
Project Scope
Data-centric applications are those having a connection to a data source, and a mapping from a data source to an in-memory model. The distinguishing characteristic of such applications is that their domain of capabilities is no more specific than data-centric. For instance, while a Java source file and its in-memory representation could be considered data, the domain of Java development is clearly present, and is more specific than just “data.” Thus, data-centric delineates an abstract, foundational domain, which is superseded by more specific domains when the application manipulates something more than just “data.” An application with a more specific domain is data-dependent rather than data-centric. Data-dependent applications are not within the scope of this project.
Using a model-driven approach, the Data Tools Project (DTP) consists of extensible frameworks and exemplary tools for data-centric applications. These include:
In-memory representation: Models providing a domain-based interaction with data, such as database definitions, query models, result sets, and objects. These models provide the basis upon which all other DTP components are constructed.
Connectivity: Specification and configuration of data source drivers.
Management: Administration of data sources including both generic and vendor-specific configuration options. Examples include adding and removing tables from a database, setting type information for contained data, and setting performance parameters.
Data-centric model transformation: Changing data from one format to another is a common task in data-centric application development. For example, there are several popular dialects of the SQL standard. A query using vendor-specific-dialect extensions is useless when used with another vendor’s data source. Hence, there is a need to translate between the two dialects.
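To make the dialect problem concrete, the following is a minimal Java sketch, not DTP code. It renders one small query model into three row-limiting syntaxes: a trailing LIMIT clause, a leading TOP clause, and the FETCH FIRST clause defined by the SQL standard. The names SelectModel, LimitStyle, and DialectRenderer are hypothetical and exist only for this illustration.

```java
import java.util.List;

/** Hypothetical in-memory model of a very simple SELECT statement. */
final class SelectModel {
    final List<String> columns;
    final String table;
    final int maxRows; // 0 or less means "no row limit"

    SelectModel(List<String> columns, String table, int maxRows) {
        this.columns = columns;
        this.table = table;
        this.maxRows = maxRows;
    }
}

/** Three common ways a dialect expresses a row limit (illustrative only). */
enum LimitStyle { LIMIT, TOP, FETCH_FIRST }

/** Renders the same model as text for a given row-limiting style. */
final class DialectRenderer {
    static String render(SelectModel query, LimitStyle style) {
        StringBuilder sql = new StringBuilder("SELECT ");
        boolean limited = query.maxRows > 0;

        // TOP-style dialects place the limit before the column list.
        if (limited && style == LimitStyle.TOP) {
            sql.append("TOP ").append(query.maxRows).append(' ');
        }
        sql.append(String.join(", ", query.columns));
        sql.append(" FROM ").append(query.table);

        // The other two styles place the limit at the end of the statement.
        if (limited && style == LimitStyle.LIMIT) {
            sql.append(" LIMIT ").append(query.maxRows);
        } else if (limited && style == LimitStyle.FETCH_FIRST) {
            sql.append(" FETCH FIRST ").append(query.maxRows).append(" ROWS ONLY");
        }
        return sql.toString();
    }
}
```

Because the translation starts from a model rather than from vendor-specific text, supporting another dialect in this sketch means adding a rendering rule, not rewriting existing queries.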
In addition to these areas, we expect that future versions of DTP will include:
Extract-Transform-Load: Obtaining data from and supplying data to data sources, typically using large-scale batch movements and involving data checking and validation. This often includes operations on the extracted data before it is loaded into the target data source.
Data Mapping: Mapping between data source and in-memory representation, used for bridging between domains such as object, relational, hierarchical, and multi-dimensional data structures.
Although the scope of DTP includes exemplary connectors for popular open source and commercial data sources, these are not necessarily intended to be the definitive connectors. Instead, they serve two purposes. First, they enable users to work with these data sources immediately, although possibly without exploiting all their features. Second, they serve as examples to both commercial and open source developers who want to integrate data sources into Eclipse. It is consistent with the goals of this project that the exemplary connectors be superseded by more complete implementations provided by third parties, both commercial and open source.
Projects
Initially, DTP will contain the following three projects and will emphasize relational data sources and structures. Because the core frameworks and tools are data source agnostic, this initial emphasis on relational data does not preclude support for other data source types, whether contributed in later releases of DTP or by the community.
Model Base
The Model Base project provides the foundation for DTP. Using industry best practices such as model-driven development with UML, and taking advantage of the Eclipse Modeling Framework (EMF), the project initially includes models for:
Driver definition
Database definition
SQL
SQL Query
Key features supported and benefits provided include:
Support for change management: Models can be version controlled through Eclipse team support.
Broad editing support: Visual and other means of model editing, integrated with EMF, enabling seamless editing and EMF model generation.
Published models with documentation (Javadoc) to support DTP consumers.
Extensible and database-agnostic models.
Compliance with the latest standards, such as SQL.
Support for JDBC and other connectivity standards.
Sample code/plug-ins: To demonstrate model usage.
Connectivity
The Connectivity project includes components for defining, connecting to, and working with data sources. These include:
Driver Management Framework
Access to the appropriate drivers is a prerequisite for programmatic interaction with data sources. The Driver Management Framework (DMF) supplies an Eclipse preference page enabling users to create driver definitions based on supplied templates. A number of templates are provided in the base installation, and additional templates can be added by component developers contributing to DMF extension points.
Connection Management Framework
The Connection Management Framework (CMF) is the foundation upon which specific connection types are created. The connection types, called Connection Profiles (CP), are contributed to the CMF through extension points. Users then connect to data source instances by creating and configuring a CP for that data source type. Data source-standard configuration parameters, such as the connection URL, user name, and password, are provided on CP instance creation and stored as secure meta-data for the CP. CPs support host connectivity checks (“ping”), connection, auto-connect on CP startup, and disconnect. Further, CP Extensions enable additional functionality and content to be added to a CP. For reuse of CP instance configuration, base import/export functionality is provided by the CMF and surfaced in tools such as the Data Source Explorer (see below). Data source CPs then become the connection providers through which other DTP tooling accesses data source instances.
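The relationship between contributed connection types and user-created profile instances can be sketched in plain Java as follows. This is only an illustration under stated assumptions: ProfileProvider, Handle, Profile, and ProfileManager are hypothetical names, an in-memory map stands in for the Eclipse extension registry, and secure storage of parameters is not modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical handle to an open connection. */
interface Handle {
    void close();
}

/** Hypothetical stand-in for a connection type contributed via an extension point. */
interface ProfileProvider {
    String typeId();
    Handle open(Properties params);
}

/** A configured profile instance for one data source. */
final class Profile {
    final String name;
    final Properties params; // e.g. URL, user name, password
    private final ProfileProvider provider;
    private Handle handle;   // null while disconnected

    Profile(String name, ProfileProvider provider, Properties params) {
        this.name = name;
        this.provider = provider;
        this.params = params;
    }

    boolean isConnected() { return handle != null; }

    void connect() {
        if (handle == null) {
            handle = provider.open(params);
        }
    }

    void disconnect() {
        if (handle != null) {
            handle.close();
            handle = null;
        }
    }
}

/** Keeps track of known connection types and the profiles created from them. */
final class ProfileManager {
    private final Map<String, ProfileProvider> providers = new LinkedHashMap<>();
    private final Map<String, Profile> profiles = new LinkedHashMap<>();

    void registerProvider(ProfileProvider provider) {
        providers.put(provider.typeId(), provider);
    }

    Profile createProfile(String name, String typeId, Properties params) {
        ProfileProvider provider = providers.get(typeId);
        if (provider == null) {
            throw new IllegalArgumentException("Unknown profile type: " + typeId);
        }
        Profile profile = new Profile(name, provider, params);
        profiles.put(name, profile);
        return profile;
    }

    Profile getProfile(String name) { return profiles.get(name); }
}
```

In this sketch, other tooling would ask the manager for a profile by name and call connect() on it, never dealing with driver details directly, which mirrors the role of CPs as connection providers described above.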
JDBC connection support
DTP will include a JDBC driver template and CP, as a means of enabling database connectivity, and serving as an example for further CP development. Database-specific capabilities can then be surfaced as CP extensions, allowing for specialization and presentation of differentiating database functionality directly in that database's CP.
Data Source Explorer
The Data Source Explorer (DSE) is an Eclipse view housing CP instances. From this view, CP capabilities are surfaced, and data source content is presented. The type and level of detail for any one instance is constrained only by the CP itself. The DSE also provides CP instance data to clients through mechanisms such as drag and drop and API calls. This allows data tooling that requires connection management to use the DSE as a mediator to CP instances.
Open Data Access
The Open Data Access (ODA) component is an open and flexible data access framework that allows applications to access data from both standard and custom data sources. It enables data connectivity between data consumers and data source providers through published run-time and design-time interfaces. In addition, the framework also includes an ODA driver management package that helps an ODA consumer application to manage diverse behavior of individual ODA data drivers.
A data driver is created simply by implementing the run-time interfaces defined by the framework. The run-time interfaces include support for establishing a connection, accessing meta-data, and executing queries to retrieve data. A driver can define internal data source connection profiles and/or work with the CMF's Connection Profiles extensions. Once developed, the driver can be registered through an extension point with individual ODA consumer components to enable data connectivity. The framework also provides design-time interfaces to integrate custom query builders within an application designer tool.
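The three run-time responsibilities named above (establishing a connection, accessing meta-data, and executing queries) can be pictured with a small Java sketch. The interface names SketchDriver and SketchConnection and the InMemoryDriver class are invented for this illustration and are not the ODA interfaces themselves. The "query" is reduced to selecting a single column from an in-memory table.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical run-time contract: a driver hands out connections. */
interface SketchDriver {
    SketchConnection connect();
}

/** Hypothetical connection offering meta-data access and query execution. */
interface SketchConnection {
    List<String> columnNames();               // meta-data
    List<String> queryColumn(String column);  // query execution
    void close();
}

/** A custom "data source" backed by rows held in memory. */
final class InMemoryDriver implements SketchDriver {
    private final List<String> columns;
    private final List<List<String>> rows;

    InMemoryDriver(List<String> columns, List<List<String>> rows) {
        this.columns = columns;
        this.rows = rows;
    }

    @Override
    public SketchConnection connect() {
        return new SketchConnection() {
            private boolean open = true;

            @Override
            public List<String> columnNames() {
                ensureOpen();
                return columns;
            }

            @Override
            public List<String> queryColumn(String column) {
                ensureOpen();
                int index = columns.indexOf(column);
                if (index < 0) {
                    throw new IllegalArgumentException("Unknown column: " + column);
                }
                // Collect the requested column's value from every row.
                List<String> values = new ArrayList<>();
                for (List<String> row : rows) {
                    values.add(row.get(index));
                }
                return values;
            }

            @Override
            public void close() {
                open = false;
            }

            private void ensureOpen() {
                if (!open) {
                    throw new IllegalStateException("Connection is closed");
                }
            }
        };
    }
}
```

A consumer written against SketchDriver and SketchConnection would not need to know whether the data came from memory, a file, or a database, which is the kind of decoupling between data consumers and data source providers that the framework aims for.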
SQL Development Tools
The SQL Development Tools project provides frameworks and tools for deep and broad SQL support. The frameworks include:
Routines Editor Framework
An extensible framework for editing database routines and SQL statements. Vendor-specific extensions enable specialized support for particular databases.
Routines Debugger Framework
While routine debug support varies widely in existing database offerings, the Routines Debugger Framework will provide an extensible base enabling debug support for specific cases in a manner consistent with existing Eclipse debug infrastructure.
SQL Query Parser
Although SQL is defined by a standard, several major dialects exist. Thus, while the standard must be supported, practicality also demands flexibility in adjustment to dialects. The SQL Query Parser meets these needs by providing an extensible framework, enabling dialect-aware SQL components and tools.
SQL Execution Plan Framework
The ability to understand how a SQL evaluation engine will execute a given query is vital in tuning queries to optimize performance. The SQL Execution Plan Framework will provide a means for capturing and presenting execution plans in a generic fashion, enabling extenders to customize support for specific SQL execution engines.
The tools include:
SQL Editor
The SQL Editor will provide an exemplary tool for standard text-based editing of SQL statements. Providing content assist tied to the SQL Model, syntax colorization, and multiple statement support, this editor will provide an essential tool for data-centric development.
Visual SQL Builder
The Visual SQL Builder allows for graphical editing of SQL, raising the level of abstraction, increasing developer productivity, and making query construction possible for a wider user base.
Results View
The Results View displays the output of routine or SQL statement execution in a tabular form typical of SQL result sets. These results can be exported to persistent storage in a variety of formats and reloaded at a later time into the results view.
Script History
Typically, a number of scripts will be executed repeatedly during the course of data-centric development. The ability to retain a history of these scripts, and thereby quickly re-execute them and view their results, increases productivity. The Script History is a view meeting these needs, based on a development session.