Eclipse PTP: Supporting Software Engineering for Computational Science

Computational science through modeling and simulation is now a well-established field that aims to solve complex problems using advanced computing capabilities. The last few years have seen the computational power of high performance computing (HPC) systems reach petascale capacity (10¹⁵ floating point operations per second), and there is already a concerted effort to push computing into the exascale era. Projects such as the Department of Energy’s Exascale Computing Project (ECP) aim to develop productive exascale computing by 2023. Developing modeling and simulation applications for these machines is an ever-increasing challenge, and exemplary software engineering tools, such as Eclipse, are going to be critical to achieving the potential that computational science has to offer.

Parallel programming is the primary programming model used for application development in HPC, but few integrated development environments target this area. The Eclipse Parallel Tools Platform (PTP) project was established in 2004 as an entry point for enabling the development of parallel applications using Eclipse. Since then, the PTP project has been through a number of iterations, which have culminated in a range of core features that are integral to supporting Eclipse-based software engineering for HPC. These include:

Target System Configuration
Synchronized Projects
Support for Parallel Programming Models
Parallel Debugging
Remote System Framework

The PTP project also provides an “Eclipse for Parallel Application Developers” package that incorporates all these features and that can be downloaded from the main Eclipse download page.

Application development for HPC systems is unlike most software development practices. The most significant difference is that computational resources are limited, so they are typically located remotely from the developer’s desktop. While many software engineers are familiar with developing applications that are remote from the user (such as web applications accessed through a browser), it is rare for developers themselves to be remote from the resources required to build and run (or at least test) their application code. This presents a number of challenges that are not addressed by other Eclipse projects, such as the C/C++ Development Tools (CDT) project, but that must be solved in order to develop applications for HPC systems. To solve the remote location issues, PTP provides a target system configuration framework for remote job launching and monitoring, and synchronized projects to enable a remote build capability. Both of these features use services provided by the remote system framework. PTP also provides other features that help developers create applications for HPC systems, such as support for parallel programming models, parallel debugging, and integration with performance tuning tools.

Target System Configuration

The Target System Configuration framework supports running, debugging, and monitoring jobs on remote systems. HPC centers typically employ a job scheduler to control access to limited computational resources. Users must submit their jobs to the scheduler, which then determines when (and where) each job will be run, usually once sufficient resources become available. There are many different types of job schedulers, each with its own user interface. PTP provides a generic interface that can be used to interact with any type of job scheduler.
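One way to picture such a generic interface is as a scheduler-neutral job description that is translated into each scheduler's own submit command. The sketch below is illustrative only and does not show how PTP is implemented: `JobRequest` and `submit_command` are hypothetical names, the script name is made up, and the Slurm (`sbatch`) and Torque-style PBS (`qsub`) options are just a small, commonly used subset.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class JobRequest:
    """Scheduler-neutral description of a batch job (hypothetical type)."""
    script: str     # path to the batch script on the remote system
    nodes: int      # number of nodes requested
    walltime: str   # wall-clock limit as HH:MM:SS


def submit_command(scheduler: str, job: JobRequest) -> list[str]:
    """Translate a generic request into a scheduler-specific command line."""
    if scheduler == "slurm":
        return ["sbatch", f"--nodes={job.nodes}", f"--time={job.walltime}", job.script]
    if scheduler == "pbs":
        # Torque-style resource list; other PBS variants use different syntax.
        return ["qsub", "-l", f"nodes={job.nodes},walltime={job.walltime}", job.script]
    raise ValueError(f"unsupported scheduler: {scheduler}")


# The same request yields a different command for each scheduler.
job = JobRequest(script="run_sim.sh", nodes=4, walltime="01:00:00")
slurm_cmd = submit_command("slurm", job)
pbs_cmd = submit_command("pbs", job)
```

The caller only ever builds a `JobRequest`; supporting another scheduler in this sketch would mean adding one more translation branch rather than changing the calling code.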

Synchronized Projects

In addition to launching and monitoring jobs, developers also need to be able to work on their source code from within Eclipse. Most Eclipse projects assume that the source code is located on the same machine that runs Eclipse. This is a problem for HPC, since these systems typically have a complicated environment (compilers, libraries, programming models, etc.) that must be available when the application is built. CDT provides some support for cross-compiling, but this is not adequate for HPC. Instead, the application must usually be built on a system that is specifically allocated for this purpose. Many HPC centers also provide separate systems for building and running applications, which further complicates the development environment. To overcome these difficulties, PTP provides a new project type called a synchronized project. This looks like a regular project to Eclipse, because the source files are located locally, but the files are also automatically mirrored onto a remote system. When the application needs to be built, the appropriate commands are sent to the remote system rather than being run locally, allowing the build to happen in the expected environment. Similarly, when the application is to be run, commands are sent to the remote machine to submit a job to the native job scheduler.
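The mirror-then-build-remotely workflow can be sketched as two command lines: one that copies the local project to the remote system, and one that runs the build there. This is a minimal sketch under stated assumptions, not PTP's actual synchronization mechanism: `rsync` and `ssh` are used here only as stand-ins, the function names are hypothetical, and the host and paths are placeholders. The functions only construct the commands; they do not execute anything.

```python
import shlex


def sync_command(local_dir: str, host: str, remote_dir: str) -> list:
    """Build an rsync command that mirrors local_dir onto host:remote_dir."""
    # The trailing slash tells rsync to copy the directory contents,
    # and --delete removes remote files that no longer exist locally.
    source = local_dir.rstrip("/") + "/"
    return ["rsync", "-az", "--delete", source, f"{host}:{remote_dir}"]


def remote_build_command(host: str, remote_dir: str, build_cmd: str = "make") -> list:
    """Build an ssh command that runs the build inside the remote mirror."""
    return ["ssh", host, f"cd {shlex.quote(remote_dir)} && {build_cmd}"]


# Placeholder host and paths, for illustration only.
steps = [
    sync_command("proj", "login.example.org", "/scratch/proj"),
    remote_build_command("login.example.org", "/scratch/proj"),
]
```

Running the two steps in order (for example with `subprocess.run`) would give the effect described above: editing happens locally, while the build runs in the remote system's own environment.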

[Figure: Synchronized Project]

Parallel Programming Model Support

In order to obtain the performance required to solve some of the world’s most challenging problems, HPC systems employ a range of programming models that are not normally seen in more conventional software engineering practices. Most developers would be familiar with parallelism in the form of threads, and game developers are probably also familiar with developing code to run on graphics processing units (GPUs). However, HPC has introduced a wide variety of programming models in order to eke the maximum performance out of the hardware. The most common models are a combination of the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, but there are many variations of these, as well as different approaches to working with GPUs and hardware acceleration technology. PTP attempts to simplify the development of applications by providing tools for the most common models (MPI and OpenMP), as well as some less well-known models, such as OpenSHMEM.

[Figure: Menu Options]

Parallel Debugging

Debugging applications for HPC systems is an enormous challenge. Not only are the systems and applications extremely large and complex, but the use of parallelism combined with a variety of different programming models means that it can be very difficult even to identify where a problem is occurring, let alone isolate the cause. There are a number of commercial debuggers that attempt to solve some of the simpler problems with debugging HPC applications, but even these have significant limitations. Eclipse users have access to two debuggers that can help. The CDT debugger is useful for multi-threaded applications and has recently been improved to support large numbers of threads. PTP provides a simple parallel debugger that builds on many of the CDT debugger features by adding support for the parallel programming paradigm. Using PTP, developers can debug MPI and OpenMP programs on a variety of different platforms.

Remote System Framework

Many of the remote features provided by PTP are underpinned by a generic remote system framework. This framework allows Eclipse-based plugins to access remote systems using a variety of protocols, such as ssh and telnet, and is general enough to allow support for other remote technologies to be easily added. Unlike the Remote System Explorer (RSE) project (part of the Target Management project), PTP’s remote system framework is primarily driven through a set of application programming interfaces (APIs). This allows downstream plugins to easily provide access to remote systems, and isolates all the remote-specific code into a small number of common plugins.

[Figure: New Connection]

Future Work

Although PTP is now a mature project, there are two exciting areas of new development that are being undertaken: a proxy-based remote protocol and a new parallel debugger.

Proxy-based Remote Protocol

This is a simple protocol that will run over virtually any type of connection to provide remote services such as process startup, process control, and file access. The protocol is also extensible so that additional services can be added for a specific implementation. Although conceptually similar to the Target Communication Framework (TCF) protocol, the remote protocol offers a number of advantages. First, it is simple enough that the remote proxy service can be written in a variety of languages (currently a Java version is provided), which allows a zero-installation approach. The first time a connection is established, the proxy is automatically copied to the remote target and started there. This removes any need for the user or system administrator to install software on the remote target. The second advantage is that the protocol employs a multiplexed channel mechanism, so multiple virtual connections can be established over a single physical connection. This enables, for example, a single SSH connection to be used for all the remote access requirements of an Eclipse development session, and will greatly improve the ability of Eclipse to be used in many complex HPC environments.
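The idea of multiplexing several virtual connections over one physical connection can be illustrated with a simple framing scheme, in which every message is prefixed with a channel number and a payload length. The wire format below is an assumption made purely for illustration and is not the actual PTP remote protocol; the names `encode_frame`, `decode_frames`, and `demultiplex` are hypothetical, and the sketch assumes the stream contains only complete frames.

```python
import struct

# Hypothetical frame header: 2-byte channel id, 4-byte payload length,
# both in network (big-endian) byte order.
HEADER = struct.Struct("!HI")


def encode_frame(channel: int, payload: bytes) -> bytes:
    """Prefix a payload with its channel id and length."""
    return HEADER.pack(channel, len(payload)) + payload


def decode_frames(stream: bytes) -> list:
    """Split a byte stream of complete frames into (channel, payload) pairs."""
    frames = []
    offset = 0
    while offset < len(stream):
        channel, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        frames.append((channel, stream[offset:offset + length]))
        offset += length
    return frames


def demultiplex(stream: bytes) -> dict:
    """Reassemble the bytes sent on each virtual channel, in arrival order."""
    channels = {}
    for channel, payload in decode_frames(stream):
        channels[channel] = channels.get(channel, b"") + payload
    return channels


# Two virtual channels interleaved on one byte stream: channel 1 carries
# process input, channel 2 carries file data.
wire = (
    encode_frame(1, b"make all\n")
    + encode_frame(2, b"file-chunk-A")
    + encode_frame(1, b"exit\n")
)
per_channel = demultiplex(wire)
```

Because each frame names its channel, the receiver can interleave process control and file transfer traffic on one connection and still hand each service its own ordered byte stream.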

Parallel Debugger

PTP’s existing parallel debugger was originally developed when the largest HPC system comprised 1000 cores. With current systems exceeding 1 million cores, the debugger’s capabilities are now somewhat limited. Work is underway to create a debugging framework that will scale not only to current system and application sizes, but also to exascale and beyond. This development will combine the capabilities of a range of existing debugging technologies with new techniques for debugging at large scale. Eclipse will be used to provide the user interface for the new debugger, as well as an extensive visualization capability.

Trying It Out

The fastest way to get started with PTP is to download the Eclipse for Parallel Application Developers package from the main Eclipse download site. This package includes all the components discussed in this article. Information on creating a synchronized project, running and monitoring jobs on HPC systems, and using the parallel debugger can be found in the integrated help or in the online documentation.

About the Authors

Greg Watson
Oak Ridge National Laboratory
