Eclipse Community Forums
COSMOS for high-performance computing [message #9527] Thu, 15 November 2007 00:07
Eclipse User
Originally posted by: randal.lanl.gov

I lead a project at Los Alamos National Laboratory charged with revamping
(replacing) our current monitoring infrastructure for HPC systems. Our
environment consists of several Linux clusters of 1,000 to 10,000 nodes each,
and our definition of monitoring is real-time alerting, system event
investigation, and regular reporting of system interrupts in some detail.

Our requirements documentation identifies several concepts in common with
the COSMOS project--the importance of a system model, for instance.
However, we're having trouble pulling out the details from current
documentation, and the June 2008 general release date is problematic for
us.

We're currently talking with GroundWork and Zenoss (only one of whom seems
to be involved with COSMOS) about extending one of their
infrastructures to meet our needs. Is COSMOS release 0.4 something we
should consider as a basis for a project that needs to provide software
used in a production HPC environment, or should we just not spend the time?

Regardless of the answer to that question, what is the proper mechanism
for understanding the core COSMOS principles (other than what we glean
from the Eclipse site)? For instance, is an HPC environment an eventual
potential target, or is the focus on networks and application servers?
There seem to be some biases (each piece of data is atomically relevant,
with little room for higher-level correlations, for instance) in all the
products/infrastructures we've surveyed, and I personally would like to
understand whether the biases are real or if we just misunderstand some
underlying concepts.

Thanks,
Rand
Re: COSMOS for high-performance computing [message #9570 is a reply to message #9527] Wed, 21 November 2007 16:22
Hi Randal,

There is currently an enhancement open to integrate Nagios with the COSMOS
framework:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=188390

The purpose of the enhancement is to provide a comprehensive solution for
easily monitoring a set of hosts and services. A design document is
underway to spell out the requirements and implementation details. Assuming
the enhancement is approved, there should be something tangible by the end
of the first COSMOS release. Feel free to add yourself to the CC list of
the enhancement. Help and guidance from the community will always be
appreciated.

As for a system model, COSMOS uses a project-specific model based on the
Service Model Language (SML) to represent resources. This was developed to
illustrate the COSMOS framework. It will likely be modified as the project
evolves. It is not at a stage to be used in production by other vendors.
The plan is to eventually have the model derived from the Common Model
Library (CML), which is still under development.

A lot of the work currently going on in COSMOS revolves around the CMDBf
specification. Here are some links that you may find useful:

1) Read about CMDBf here: http://cmdbf.org/
2) COSMOS commitment to CMDBf:
http://wiki.eclipse.org/images/b/b3/Cmdbf-cosmos-deliverables-v0.07.zip
3) Providing a CMDBf Query and Registration Service:
http://wiki.eclipse.org/Providing_a_CMDBf_Query_and_Registration_Service
4) COSMOS Programming Model:
http://wiki.eclipse.org/COSMOS_Programming_Model

There are also many design documents, relevant links, and architectural
meeting minutes that you can find here:
http://wiki.eclipse.org/COSMOS_Architecture_Meetings
You'll need to do some data mining to make sense of what is included in the
architectural meeting minutes.

I hope that helps.
Thanks,

Ali Mehregani