Anyone know why the CVS commit lists don't list WHO did the commit?
Even just their user name on the server? I'd like to know these things! :)
Just committed a pile of code that lets you switch your monitoring
system from the simulated one over to the ORTE/OMPI one. In the
preferences, when you select one and click 'OK', the ModelManager is
now told to swap out the universe / model that it has cached. The UI
then refreshes, etc.
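
For anyone who hasn't looked at the commit, the switch-over described
above can be pictured roughly like this. It is only a minimal sketch,
not the committed code: every name in it (MonitoringSystem,
ModelManagerSketch, ModelListener, the machine names) is made up for
illustration, and the ORTE side is stubbed out rather than going
through JNI.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical interface: anything that can report what is in the universe.
interface MonitoringSystem {
    String name();
    List<String> listMachines();
}

// Stand-in for the simulated monitoring system.
class SimulatedMonitoringSystem implements MonitoringSystem {
    public String name() { return "Simulated"; }
    public List<String> listMachines() {
        return Arrays.asList("sim-machine-0", "sim-machine-1");
    }
}

// Stand-in for the ORTE/OMPI monitoring system. A real one would query
// the runtime through the JNI layer; here it just returns fixed data.
class OrteMonitoringSystem implements MonitoringSystem {
    public String name() { return "ORTE/OMPI"; }
    public List<String> listMachines() {
        return Arrays.asList("orte-machine-0");
    }
}

// Hypothetical callback the UI would register to learn about model changes.
interface ModelListener {
    void modelChanged(List<String> machines);
}

// Hypothetical manager: holds the cached model and swaps its source on request.
class ModelManagerSketch {
    private MonitoringSystem monitor;
    private List<String> cachedMachines = new ArrayList<>();
    private final List<ModelListener> listeners = new ArrayList<>();

    ModelManagerSketch(MonitoringSystem initial) {
        setMonitoringSystem(initial);
    }

    void addListener(ModelListener listener) {
        listeners.add(listener);
    }

    // Called when the user picks a monitoring system in the preferences:
    // replace the cached model, then tell every listener to refresh.
    void setMonitoringSystem(MonitoringSystem next) {
        monitor = next;
        cachedMachines = new ArrayList<>(next.listMachines());
        for (ModelListener listener : listeners) {
            listener.modelChanged(cachedMachines);
        }
    }

    List<String> machines() { return cachedMachines; }
    String monitorName() { return monitor.name(); }
}
```

The idea is that the preferences page only ever calls
setMonitoringSystem(), and the UI only ever hears about the change
through its listener, so neither needs to know which system is behind
the model.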
There's a tiny bug right now where if you go FROM the simulation TO
ORTE it bombs out. This is because the simulation is simulating running
processes / jobs and its threads aren't told to shut down cleanly. I'm
going to fix this (by removing those Threads, as Greg and I discussed
today), so in the meantime just be sure to go from ORTE TO Simulated (if
you feel like testing / playing).
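
To make the failure mode concrete: the planned fix is to remove the
threads altogether, but for comparison, here is a rough sketch of what
"telling a thread to shut down cleanly" would look like. SimulatedJob
and its methods are hypothetical names, not classes from the codebase.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical simulated job that runs on its own thread until told to stop.
class SimulatedJob implements Runnable {
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final Thread thread = new Thread(this, "simulated-job");

    void start() {
        thread.start();
    }

    @Override
    public void run() {
        while (running.get()) {
            try {
                Thread.sleep(10); // pretend the simulated process is doing work
            } catch (InterruptedException e) {
                return; // shutdown() interrupted the sleep, so exit right away
            }
        }
    }

    // Ask the thread to stop, wake it if it is sleeping, and wait for it to exit.
    void shutdown() {
        running.set(false);
        thread.interrupt();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
    }

    boolean isAlive() {
        return thread.isAlive();
    }
}
```

Calling shutdown() on every simulated job before swapping the model
would leave no simulation threads running against the new universe.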
I still need to resolve the bit where the preferences page screws up the
first time you load it (when there are no default preferences yet). Over
the next few days I'll be working on that, and on passing the runtime
arguments down to the runtime system layer (the JNI layer).
FYI for those who weren't involved: We've decided to drop the generic
term 'runtime environment' or 'runtime system' and replace it with two
systems: a monitoring system (a.k.a. state of health, or SoH, monitoring
system) paired with a control system. The control system is responsible
for starting jobs, stopping jobs, etc. The SoH monitoring system (or
just monitoring system) is responsible for determining the status of the
universe as you see it: What machines are there? How many nodes on each
machine? What jobs are running? Who owns them? Etc.
These two components were previously linked into a single 'runtime
system' but are now being broken out. This lets us set the monitoring
system through preferences and refresh / populate the runtime model
without requiring the user to choose, at that point, how they want to
actually control jobs. While a system like OMPI will (in time) have the
control and monitoring systems interlinked, a system like MPICH might
require MPICH for job control but another system (perhaps Supermon or
Ganglia) for monitoring. Things to think about.
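
In code terms, the split could look something like the sketch below.
This is just one way to picture it, not the actual design: all of the
type names (JobController, StateOfHealthMonitor, Session, and so on)
are invented for illustration, and FakeCluster stands in for whatever
real machines the two systems would be talking to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real cluster both systems talk to.
class FakeCluster {
    final List<String> jobs = new ArrayList<>();
}

// Control system: starts and stops jobs.
interface JobController {
    void startJob(String name);
    void stopJob(String name);
}

// Monitoring (state of health) system: reports what is out there.
interface StateOfHealthMonitor {
    List<String> runningJobs();
}

// A runtime that provides both roles in one object.
class CombinedRuntime implements JobController, StateOfHealthMonitor {
    private final FakeCluster cluster;
    CombinedRuntime(FakeCluster cluster) { this.cluster = cluster; }
    public void startJob(String name) { cluster.jobs.add(name); }
    public void stopJob(String name) { cluster.jobs.remove(name); }
    public List<String> runningJobs() { return new ArrayList<>(cluster.jobs); }
}

// A launcher that can only control jobs, with no monitoring of its own.
class LauncherOnlyController implements JobController {
    private final FakeCluster cluster;
    LauncherOnlyController(FakeCluster cluster) { this.cluster = cluster; }
    public void startJob(String name) { cluster.jobs.add(name); }
    public void stopJob(String name) { cluster.jobs.remove(name); }
}

// A separate monitoring tool that only observes the cluster.
class ExternalMonitor implements StateOfHealthMonitor {
    private final FakeCluster cluster;
    ExternalMonitor(FakeCluster cluster) { this.cluster = cluster; }
    public List<String> runningJobs() { return new ArrayList<>(cluster.jobs); }
}

// The monitor is required up front; the controller can be chosen later.
class Session {
    final StateOfHealthMonitor monitor;
    JobController control; // null until the user decides how to control jobs

    Session(StateOfHealthMonitor monitor) { this.monitor = monitor; }
}
```

With this shape, a Session can populate the model from its monitor
before any controller has been picked, and the combined case is simply
one object passed in for both roles.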
Just wanted to drop an update.
--
-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard@xxxxxxxx
---------------------------------------------------------------------