COmmunity Systems Management Open Source (COSMOS) Project Proposal

The project has been created. Please visit the project page.


Introduction

The COSMOS Project is proposed for incubation under the supervision of the Technology PMC at Eclipse.org; once it reaches full project status, it will become a top-level project. This proposal is in the Proposal Phase (as defined in the Eclipse Development Process document) and is written to declare the project's intent and scope and to solicit additional participation and input from the Eclipse community. You are invited to comment on and/or join the project. Once announced, please send all feedback to the eclipse.technology.cosmos newsgroup. The project will follow the structure and guidelines as outlined by the Technology PMC.

Background

In the IT industry there is a long history of tools being developed for specific roles or tasks. Often these tools did not integrate with each other, even though they all worked with the same end resource. Systems management is no different. In the past, there have been efforts to declare common APIs or data formats to facilitate integration and open communications. These efforts generally focused on a particular subset of resource types, execution environments, or management domains. However, the activities involved in managing the entire life cycle of an IT resource span a wide set of domains. Viewed holistically, systems management encompasses many disciplines and user roles that can be divided into several high-level categories, such as:

  1. Health monitoring

  2. Inventory tracking

  3. Resource modeling

  4. Deployment modeling

  5. Deployment

  6. Dynamic configuration

  7. Configuration change tracking

COSMOS is being proposed to facilitate the first steps in the next evolution of integration in the area of systems management.

Project Overview

The mission of the Eclipse COSMOS Project is to build generic, extensible, standards-based components for a tools platform upon which software developers can create specialized, differentiated, and interoperable tools for systems management.

It is important to focus the initial project efforts on specific problems. The domain of system management as shown above is very large and there is plenty of work to be done. Initially the key problem areas COSMOS intends to target can be described as follows.

Resource Monitoring

In the area of resource monitoring there are already many ways to observe information about common resources. In fact, many resource providers support several of the current standard APIs in an effort to publish information to the widest audience possible. However, there is a constant trade-off between performance and other factors on the one hand and providing all the data to anyone requesting it on the other. This has led to the standardization of purpose-driven APIs that try to limit the amount of data being requested. Yet even where APIs have been agreed upon and implemented, the data made available via these interfaces is not always consistent. The units of measure as well as the meaning of the terms are critical. For example, is an "available memory" value of 2000 a measure of megabytes of RAM or gigabytes of disk space?

Historically, each data consumer leveraged some mechanism, often a local agent, to manage its access to the resource through some published API. These agents would conflict with each other, causing customers to pick specific vendor solutions which implicitly excluded others. Because the use of the data varies greatly and vendors typically do not cover every use case, this is a significant problem for end users. Users were also left with the cost of ownership and administration of these agents, as they had to mix and match agents with run times depending on the tools they wanted to use. End users are now demanding agentless environments when environments provide native collection systems.

In order to address this set of problems, COSMOS intends to provide a data staging server that will exploit existing infrastructure and APIs to access data and normalize it into a single agreed form. In cases where local agents are required, existing infrastructure will be used to access the data. COSMOS will promote standards-based data collection systems and work with the community to select the optimal standards to follow.
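The normalization step can be illustrated with a minimal sketch. It is not part of the COSMOS design; the Measurement record, the normalize function, the field names, and the choice of bytes (with binary multiples) as the canonical unit are all assumptions made for illustration. The point it shows is that a reading without an explicit unit, such as the "available memory" of 2000 mentioned above, is rejected rather than guessed at.

```python
from dataclasses import dataclass

# Hypothetical conversion table for this sketch: source unit -> factor to bytes.
# Binary multiples (1 KB = 1024 B) are an assumption, not a COSMOS decision.
_TO_BYTES = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


@dataclass(frozen=True)
class Measurement:
    """A reading in the (assumed) single agreed form."""
    resource: str
    metric: str   # e.g. "available_memory"
    value: float  # always expressed in the canonical unit
    unit: str     # always "B" after normalization


def normalize(raw: dict) -> Measurement:
    """Convert a raw provider reading into the canonical form (bytes)."""
    unit = raw.get("unit")
    if unit not in _TO_BYTES:
        # A bare number is ambiguous, so refuse to store it.
        raise ValueError(
            f"reading for {raw.get('metric')!r} has no recognized unit: {unit!r}"
        )
    return Measurement(
        resource=raw["resource"],
        metric=raw["metric"],
        value=raw["value"] * _TO_BYTES[unit],
        unit="B",
    )


reading = normalize(
    {"resource": "host-a", "metric": "available_memory", "value": 2000, "unit": "MB"}
)
```

Two providers that report the same quantity in different units would end up with directly comparable values after this step.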

In order to provide a complete out-of-the-box user experience, and to demonstrate the value of the data that can be collected, COSMOS will also provide a BIRT-based web interface as well as an RCP client. These user interfaces are being provided as reusable examples for browser or thick-client tools.

Transaction monitoring and correlation

There are times when data from multiple resources needs to be correlated in order to analyze one resource in the context of another. A typical example is transaction decomposition of a distributed application. Standards such as ARM from The Open Group are intended to address this need; however, ARM is an example of the previously mentioned purpose-driven APIs. As a run time component it provides a way to define correlation tokens and present them in logged events, but each implementation is unique and the run time needed to pass the correlation data is duplicated by each provider.

COSMOS will leverage the work done by various run times and the TPTP project to advocate a standard correlation mechanism that can be used in distributed decomposition scenarios. COSMOS will consume this correlation data and support the consumption and provisioning of ARM events using this data. This will be done in the data collection component.
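As a rough illustration of what consuming correlation data involves, the sketch below groups events from several resources by a shared correlation token and orders each group by time. It does not use the ARM API or any TPTP interface; the event fields (token, resource, timestamp, message) and the correlate function are hypothetical names chosen for this example.

```python
from collections import defaultdict


def correlate(events):
    """Group events by correlation token, ordering each group by timestamp."""
    by_token = defaultdict(list)
    for event in events:
        by_token[event["token"]].append(event)
    return {
        token: sorted(group, key=lambda e: e["timestamp"])
        for token, group in by_token.items()
    }


# Hypothetical events logged by two different resources.
events = [
    {"token": "tx-1", "resource": "db", "timestamp": 3, "message": "query executed"},
    {"token": "tx-1", "resource": "web", "timestamp": 1, "message": "request received"},
    {"token": "tx-2", "resource": "web", "timestamp": 2, "message": "request received"},
]

trace = correlate(events)
for event in trace["tx-1"]:
    print(event["resource"], event["message"])
```

Each group then reads as the path of one transaction across resources, which is the view needed for distributed decomposition.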

To allow data providers, be they existing resources or new applications, to more easily develop any new code needed to expose data through the standard interfaces, COSMOS will provide developer-targeted tools to simplify the task where such tools do not already exist. COSMOS will depend on tools and services in other open source projects when they exist rather than develop new ones. When COSMOS does need to develop enablement technology, it will be done in the Build to Manage component.

Resource Modeling

In most tools that monitor resources there is an awareness of some of the more static features of a resource as well. For example, the physical location and the IP address are often known in addition to the actual data being extracted from the resource. Yet each tool has its own representation of this information. With the growing presence of distributed and componentized applications, this information also includes the overall topology of the application. The same information is duplicated between the tool used to debug a performance bottleneck, the tool used to deploy components of the application, and the tool doing load testing and monitoring of the application. Just as with the data collection scenarios discussed earlier, there are also purpose-driven formats for this kind of information. An example is the SCA composition model, which is intended to describe the composition and relationship of the components making up the application, but is not intended to model the asset tags used in the inventory of a data center or abstract design views of the application. This redundancy can once again be resolved by having a canonical representation of the resources. To that end, SML is proposed as a constrained extension of XML to be used as the language to interchange this kind of information. COSMOS will provide a reference implementation of a Service Modeling Language Interchange Format (SML-IF) storage and validation system as well as basic tools to manipulate such data.

Organization

The project will contain several components: monitoring user interface, data collection and server, build to manage, and resource modeling.

  • Monitoring user interface
    This component will develop a web-based and a stand-alone RCP-based user experience for monitoring data.
  • Data collection and server
    This component will exploit instrumentation and wrappers developed in the Build to Manage component and other existing infrastructure. Its primary purpose is to normalize and store data received from various data collection and management systems. In other words, it acts as a client of those systems that normalizes and stores the data.
  • Build to Manage
    This component will be dependent on other open source instrumentation and on wrappers of applications and run times to enable data collection. The focus is enablement tooling, and advocating capabilities that already exist by being dependent on them.
  • Resource Modeling
    This component will be the home for a reference implementation of the emerging SML specification in the form of validators as well as common model instances and manipulation tools.

Scope

The goal of this proposal is to generate interest among companies involved in systems management so that the project can be defined by the community with an agreed upon scope. However, it is key to practically set the limits of the project at the outset. This project will deliver support for each of the categories described earlier, and do this via the components previously outlined. This is a significantly large domain and the COSMOS end user function will be limited to monitoring and data collection components needed to support application health monitoring and debugging as well as initial support for validating and manipulating Service Modeling Language (SML) based models.

In building an ecosystem and fostering community growth, it is important to provide a fundamental level of management function that offers immediate "out of the box" value to consumers and development tooling for vendors, resource providers, and system administrators. This is the objective of the first release of COSMOS. Subsequent releases will add value for further categories of use at the framework and end user function level.

Each component is complemented by a set of development tools that facilitate the creation of extensions or the authoring of key artifacts, e.g. resource models, agent configurations, etc. For example, the BIRT tooling can be used to create "canned" reports that can be deployed in the Web UI or re-used within the RCP client. The components of COSMOS will expose extension points to allow for both run time extension and tooling value add.

COSMOS as a Tool Platform

COSMOS is intended to provide a fundamental level of management functionality to its users.

It will provide immediate "out-of-the-box" value with workable solutions for resource monitoring and modeling built upon open standards. These capabilities are useful in themselves for monitoring and basic management needs, but are even more useful as building blocks for constructing higher-level management functions. The combination of tooling and run time offers a distinct advantage for COSMOS, as no other open source offering effectively covers multiple aspects of the management life cycle. Integration with other Eclipse projects adds to the application life cycle already supported by those projects. This unique advantage over other open source projects directly benefits users who will extend the framework and tooling.

Initial Release Plan

COSMOS will follow a 7-week iteration cycle leading to a 1.0 release in March and a 1.5 release in June 2007 aligned with the Eclipse platform. The initial commitment will be to deliver the systems monitoring support and SML validation described earlier. As more committers are confirmed, the PMC will determine if more function can be provided on the same schedule. The next priority after systems monitoring support will be resource modeling, beginning with support for the emerging SML standard. The first iteration, due in Oct 06, will be focused on establishing the project infrastructure. Transferring function from other Eclipse projects, such as TPTP, will be discussed as a possible deliverable in 1.5. The following are the current iteration completion dates.

Iteration      Dates (dd/mm/yy)       Comments
I0             11/09/06 - 27/10/06    Establish infrastructure
I1             30/10/06 - 15/12/06
1.0 Shutdown   15/01/07 - 02/03/07    1.0 GA
I2             15/01/07 - 02/03/07    1.5 Feature/API freeze
I3             05/03/07 - 20/04/07    Documentation freeze
I4             23/04/07 - 01/06/07    Candidate GA
1.5 Shutdown   02/06/07 - 29/06/07    1.5 GA

Initial Community

The following companies are helping to shape the project and may contribute committers to get the project started:

Interested parties

The following companies have expressed interest in the project: