STEM, The Spatiotemporal Epidemiological Modeler

The project has been created.

This document is a proposal to create a new Eclipse Project under the Eclipse Technology Project called the Spatiotemporal Epidemiological Modeler or STEM. STEM is currently a component of the Open Healthcare Framework (OHF) project, which is also under the Eclipse Technology Project.

This proposal is presented in accordance with the Eclipse Development Process and is written to declare the project's intent and scope as well as to solicit additional participation and feedback from the Eclipse community. You are invited to join the project and to provide feedback using the http://www.eclipse.org/newsportal/thread.php?group=eclipse.technology.ohf newsgroup.

Background

In the past year (2008), all of the OHF's components other than STEM have migrated, either to other parts of Eclipse (e.g., SODA) or to the external Open Healthcare Tools (OHT) organization. This has left STEM as the only active component in the OHF. The consensus of the four STEM committers is that they are very happy with Eclipse and how it is managed, and they wish the project to remain with the Foundation. However, given that the rest of the OHF project has moved, it makes sense to move STEM out of the OHF and place it directly under the Eclipse Technology Project. The STEM project members are looking forward to being a project separate from the OHF to better establish their "brand" and facilitate communications with users, contributors and adopters. One of the challenges of being a component of the OHF was that STEM was mixed in with many different, completely unrelated projects; this included sharing a project web page, newsgroup, developer mailing list and project metadata. This sharing tended to dilute STEM's effectiveness in communicating with and developing its community. For instance, the OHF project metadata is out of date and has been for some time; the STEM project has been unable to correct that issue.

STEM began life as a platform for the collaborative development of mathematical models that characterize the spread of infectious diseases in both time and space. STEM was originally developed by IBM Research at its Almaden Research Center to be part of IBM's Global Pandemic Initiative (GPI), a working group of global healthcare "players" (WHO, UN, etc.) that IBM formed to help plan for, and combat, the threat of global pandemic influenza. As part of its contribution to the GPI, IBM donated the source code for STEM to the Eclipse Foundation in May 2007.

STEM enables epidemiologists and other researchers to develop disease models quickly and collaboratively. STEM includes the basic data sets that define the political geography, demographic data and transportation infrastructure for the entire planet, saving modelers the need to collect this data on their own. It also includes configurable "text book" disease models they can use immediately, and extensive editors and wizards that ease model creation. STEM includes built-in views to visualize the geographic spread of diseases as well as an interface to Google Earth. Each disease model is composed of a set of interchangeable components that supply different aspects of the model; these include data sets as well as mathematics. These components can be created by different researchers and easily shared, thereby fostering cooperation and collaboration. As a disease modeling system, STEM has an active and growing community.

Technical Scope

At its core, STEM is a framework for composing arbitrary graphs (nodes, edges, labels) from different "parts" and then managing computations that use the graph both as a source of data and as a place to record state information. One of the main innovations provided by STEM is that it allows the graph used during a simulation to be composed from different parts that represent different aspects of the eventual simulation. For instance, sets of labeled nodes that represent geographic locations can be combined with sets of labels that provide population data for those nodes for a particular time period (e.g., 1918). Similarly, different sets of edges can be added to the graph to incorporate different kinds of relationships, such as transportation infrastructure or simple physical relationships such as sharing a common border. Computation is added to the mix through a similar, well-defined interface to the graph. These different parts can be aggregated and saved for reuse in multiple models. They can also be exported and distributed to other users. It is this aggregation and reuse that promotes collaboration, as different components can be created by different parties and easily shared.
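The composition idea described above can be illustrated with a minimal sketch. This is not STEM's API (STEM is generated from EMF Ecore models); the names `Part` and `compose`, the node identifiers, and the population figures are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """A reusable graph fragment (hypothetical type, not a STEM class)."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)     # (kind, node_a, node_b) tuples
    labels: dict = field(default_factory=dict)  # (node, label_name) -> value


def compose(*parts):
    """Merge parts into one graph; later parts override labels with the same key."""
    graph = Part()
    for part in parts:
        graph.nodes |= part.nodes
        graph.edges |= part.edges
        graph.labels.update(part.labels)
    # Reject edges or labels that refer to a node no part supplied.
    for _, a, b in graph.edges:
        if a not in graph.nodes or b not in graph.nodes:
            raise ValueError(f"edge references unknown node: {a!r}-{b!r}")
    for node, _ in graph.labels:
        if node not in graph.nodes:
            raise ValueError(f"label references unknown node: {node!r}")
    return graph


# Each part could come from a different contributor. All values are made up.
geography = Part(nodes={"A", "B", "C"})
population = Part(labels={("A", "population"): 1000,
                          ("B", "population"): 200,
                          ("C", "population"): 50})
borders = Part(edges={("border", "A", "B"), ("border", "B", "C")})
roads = Part(edges={("road", "A", "C")})

graph = compose(geography, population, borders, roads)
```

Swapping `population` for a part holding data from another time period, or leaving out `roads`, would yield a different graph without touching the other parts, which is the kind of reuse the paragraph describes.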

Having such a general framework enables a variety of other kinds of applications, not all of which are simulations. It is possible, for instance, to run STEM in "real-time," where it uses "wall-clock" time when manipulating the state of the underlying graph and accesses external data sources as part of that process. Integrating real-time weather information or other real-time environmental data into a model in STEM is an example. This ability allows STEM to be applied to decision support applications that require the integration of "situational awareness" and analytics; examples would be disaster planning and response, securities trading and risk management, and logistical planning. The integration of external data sources through SOA and RSS feeds is a future step being considered for the project.

The disease modeling framework, built upon the core, has well-established functionality, but is deliberately designed to be extensible so that it can absorb new mathematics and other aspects of disease models. The project itself, however, aims to provide a refined but limited set of built-in "text book" disease model mathematics, as well as a set of advanced experimental models that result from project members' own research.
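As an illustration of what "text book" disease model mathematics can look like, the sketch below steps the standard SIR (susceptible, infectious, recovered) compartment model forward in time with a simple forward-Euler update. The proposal does not say which models STEM ships, so SIR is an assumption here, and the function names and parameter values are hypothetical.

```python
def sir_step(s, i, r, beta, gamma, dt=1.0):
    """Advance an SIR population by one forward-Euler step of size dt.

    beta is the transmission rate and gamma the recovery rate; both are
    illustrative parameters, not values taken from STEM.
    """
    n = s + i + r
    if n == 0:
        return s, i, r  # empty population: nothing can change
    new_infections = beta * s * i / n * dt
    new_recoveries = gamma * i * dt
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)


def simulate(s, i, r, beta, gamma, steps):
    """Return the list of (S, I, R) states, including the initial one."""
    history = [(s, i, r)]
    for _ in range(steps):
        s, i, r = sir_step(s, i, r, beta, gamma)
        history.append((s, i, r))
    return history


# Made-up scenario: 1000 people, 10 initially infectious.
history = simulate(990.0, 10.0, 0.0, beta=0.3, gamma=0.1, steps=100)
```

In a graph-based setting such as the one described earlier, a step like this would run per node, reading and updating that node's labels; coupling between nodes along edges is omitted here to keep the sketch short.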

The incorporation of real-time data sources into STEM is an area for future development. Its scope and breadth are uncertain and likely depend on the particular application domains used as examples.

There are two aspects to STEM: the core for developing simulation frameworks, and the actual simulation frameworks. The mandates for developing these two aspects tend to define and govern their growth. The core is somewhat organically constrained, as features are only added to it to support the needs of actual simulation frameworks such as disease modeling. The base comprises some nine EMF Ecore models that are used to generate a significant portion of the core code; no new models are anticipated at this time. The remainder of the work on the core is to polish and refine the aspects exposed to users, such as model editors, wizards and other parts of the GUI.

STEM is more than a disease modeling system, however. The attributes that make a good collaborative system for disease modeling are the same ones that facilitate other kinds of model development. To this end, STEM was designed and implemented from its very inception to be a more versatile platform and framework, with disease modeling being a very complete example "application."

Organization

Mentors

  • Ed Merks
  • Chris Aniszczyk

Committers

STEM currently has four existing and active Eclipse Committers.
  • Daniel Ford (daford att almaden.ibm.com)

    Daniel was the initial Eclipse Committer for STEM. His contributions to the system include the initial concept of a composable graph framework and the general architecture and organization of STEM. He also designed the UML models that underpin STEM's implementation and is responsible for their implementation using the Eclipse Modeling Framework (EMF). Daniel wrote the initial versions of most of the components that constituted the original STEM contribution, and continues to maintain a significant number of them today. Daniel created the initial CQ for STEM's source code and a second one for STEM's data sets. He worked closely with Barb Cochrane on the Eclipse IP process to quickly "clear" the original STEM source code contribution. He also worked on the initial part of the (much) longer IP processing of the STEM data sets (later passing that responsibility to James Kaufman). Daniel received his Ph.D. in Computer Science from the University of Waterloo and is now a Research Staff Member (RSM) at the IBM Almaden Research Center in San Jose, CA.

  • James Kaufman (kaufman att almaden.ibm.com)

    James founded the STEM project with Daniel Ford and was the project's second Eclipse committer. James has made a wide range of contributions to the OHF and the STEM project under Eclipse. He initiated the formation of the OHF and, as the IBM manager of a number of internal IBM Research healthcare-related projects, pursued the legal and organizational challenges that led to their donation to Eclipse to form the initial OHF code base. Later, James followed the same path with STEM and moved it from an internal IBM Research project to an open source project under Eclipse. James is also an active and critical contributor to the STEM code base, with primary responsibility for the development and implementation of mathematical models that characterize disease propagation and of mathematical tools for epidemiological data analysis in STEM. James also worked closely with the IBM and Eclipse legal teams to "clear" the STEM contribution to Eclipse. James received his Ph.D. in Physics from UCSB and is a Manager and Research Staff Member at the IBM Almaden Research Center in San Jose, CA.

  • Stefan Edlund (sedlund att us.ibm.com)

    Stefan is an Eclipse Committer who has contributed to STEM since August 2008. Stefan has been working on the logging component in STEM, dramatically improving its performance. Stefan has also been contributing to the analytics perspective as well as the mathematics for STEM disease models. Stefan is a Senior Software Engineer at the IBM Almaden Research Center in San Jose, CA.

  • Yossi Mesika (mesika att il.ibm.com)

    Yossi is an Eclipse Committer who has contributed several new features and improvements to the STEM project. Yossi worked on the graphical rendering of the geographical maps within STEM and added useful features such as the presentation of graph edges and the use of color providers. Yossi also used performance testing tools to find memory leaks and improve the overall performance of STEM. Yossi is also the release engineer of STEM and is responsible for the automated process that generates weekly builds and publishes them on the Web site. Yossi has also made major contributions to the Eclipse Open Healthcare Framework (OHF). Yossi is a Research Staff Member at the IBM Haifa Research Labs in Israel.

Collaborations

The STEM project is working on the development of its community. The project is doing well in developing a core set of Committers and Users (listed below). The project is working hard on developing relationships with government, industry and academia. In academia, the project is nurturing graduate students in both Epidemiology and Operations Research to create a natural class of Adopters to join the project.

Tentative Plan

  • 2009-06 V0.4: Base release for Disease modeling
  • 2009-12 V0.5: Non-disease modeling example application
  • 2010-06 V0.6: Integrated situational awareness and analytics