STEM, The Spatiotemporal Epidemiological Modeler
This document is a proposal to create a new Eclipse Project, called the Spatiotemporal Epidemiological Modeler (STEM), under the Eclipse Technology Project. STEM is currently a component of the Open Healthcare Framework (OHF) project, which is also under the Eclipse Technology Project.
This proposal is presented in accordance with the Eclipse Development Process and is written to declare the project's intent and scope as well as to solicit additional participation and feedback from the Eclipse community. You are invited to join the project and to provide feedback using the http://www.eclipse.org/newsportal/thread.php?group=eclipse.technology.ohf newsgroup.
Background
In the past year (2008), the OHF has seen all of its components,
other than STEM, migrate either to other parts of Eclipse
(e.g.,
STEM began life as a platform for the collaborative development of mathematical models that characterize the spread of infectious diseases in both time and space. STEM was originally developed by IBM Research at its Almaden Research Center to be part of IBM's Global Pandemic Initiative (GPI), a working group of global healthcare "players" (WHO, UN, etc.) that IBM formed to help plan for, and combat, the threat of global pandemic influenza. As part of its contribution to the GPI, IBM donated the source code for STEM to the Eclipse Foundation in May 2007.
STEM enables epidemiologists and other researchers to develop disease models quickly and collaboratively. STEM includes the basic data sets that define the political geography, demographic data and transportation infrastructure for the entire planet, so modelers do not need to collect this data on their own. It also includes configurable "text book" disease models that they can use immediately, and extensive editors and wizards that ease model creation. STEM includes built-in views to visualize the geographic spread of diseases as well as an interface to Google Earth. Each disease model is composed of a set of interchangeable components that supply different aspects of the model; these include data sets as well as mathematics. These components can be created by different researchers and easily shared, thereby fostering cooperation and collaboration. As a disease modeling system, STEM has an active and growing community.
Technical Scope
At its core, STEM is a framework for composing arbitrary graphs (nodes, edges, labels) from different "parts" and then managing computations that use the graph both as a source of data and as a place to record state information. One of the main innovations provided by STEM is that it allows the graph used during a simulation to be composed from different parts that represent different aspects of the eventual simulation. For instance, sets of labeled nodes that represent geographic locations can be combined with sets of labels that provide population data for those nodes for a particular time period (e.g., 1918). Similarly, different sets of edges can be added to the graph to incorporate different kinds of relationships, such as transportation infrastructure or simple physical relationships such as sharing a common border. Computation is added to the mix through a similar well-defined interface to the graph. These different parts can be aggregated and saved for reuse in multiple models. They can also be exported and distributed to other users. It is this aggregation and reuse that promotes collaboration, as different components can be created by different parties and easily shared.
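The composition idea described above can be illustrated with a minimal sketch. This is not STEM's actual API (STEM is implemented in Java on EMF); the `Graph` class, the `compose` function, the region identifiers and the population figures are all hypothetical and chosen only for illustration. Each "part" is modeled as a function that contributes nodes, labels, edges or a computation to a shared graph.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical composable graph: nodes, typed edges, and per-node labels."""
    nodes: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)   # frozenset({a, b}) -> set of edge kinds
    labels: dict = field(default_factory=dict)  # node -> {label name: value}

    def add_nodes(self, node_ids):
        self.nodes.update(node_ids)

    def add_labels(self, name, values):
        """Attach one named label set (e.g. population) to existing nodes."""
        for node, value in values.items():
            if node not in self.nodes:
                raise KeyError(f"label {name!r} refers to unknown node {node!r}")
            self.labels.setdefault(node, {})[name] = value

    def add_edges(self, kind, pairs):
        """Add one kind of relationship (e.g. roads) between existing nodes."""
        for a, b in pairs:
            if a not in self.nodes or b not in self.nodes:
                raise KeyError(f"edge {kind!r} refers to unknown node in {(a, b)!r}")
            self.edges.setdefault(frozenset((a, b)), set()).add(kind)


def compose(*parts):
    """Build a graph by applying independently authored parts in order."""
    graph = Graph()
    for part in parts:
        part(graph)
    return graph


# Each part could come from a different contributor (all data below is made up).
def geography(g):
    g.add_nodes(["R1", "R2", "R3"])

def population_1918(g):
    g.add_labels("population", {"R1": 1000, "R2": 500, "R3": 250})

def borders(g):
    g.add_edges("common_border", [("R1", "R2"), ("R2", "R3")])

def roads(g):
    g.add_edges("road", [("R1", "R2")])

def record_degree(g):
    """A computation part: reads the edges and records state back as a label."""
    for node in g.nodes:
        g.labels.setdefault(node, {})["degree"] = sum(1 for pair in g.edges if node in pair)


graph = compose(geography, population_1918, borders, roads, record_degree)
```

Because every part goes through the same small interface, swapping `population_1918` for a different period's label set, or adding another edge set, does not require changes to the other parts.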
Having such a general framework enables a variety of other kinds of applications, not all of which are simulations. It is possible, for instance, to run STEM in "real-time" where it uses "wall-clock" time when manipulating the state of the underlying graph and have it access external data sources as part of that process. Integrating real-time weather information or other real-time environmental data into a model in STEM is an example. This ability allows STEM to be applied to decision support applications that require the integration of "situational awareness" and analytics; examples would be disaster planning and response, securities trading and risk management, and logistical planning. The integration of external data sources through SOA and RSS feeds is a future step being considered for the project.
The disease modeling framework, built upon the core, has well-established functionality but is deliberately designed to be extensible, so that it can absorb new mathematics and other aspects of disease models. The project itself, however, aims to provide a refined but limited set of built-in "text book" disease model mathematics, as well as a second set of advanced experimental models that result from project members' own research.
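The proposal does not say which "text book" models are included. As a representative example only, the sketch below assumes the classic SIR (susceptible, infectious, recovered) compartment model, advanced with a simple forward-Euler step. The function names, the parameter values `beta` and `gamma`, and the initial population are illustrative assumptions and are not taken from STEM.

```python
def sir_step(s, i, r, beta, gamma, dt=1.0):
    """Advance an SIR model by one forward-Euler step of length dt.

    beta  : transmission rate per unit time (assumed value, for illustration)
    gamma : recovery rate per unit time (assumed value, for illustration)
    """
    n = s + i + r
    new_infections = beta * s * i / n * dt
    new_recoveries = gamma * i * dt
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)


def simulate(s, i, r, beta, gamma, steps, dt=1.0):
    """Return the list of (s, i, r) states, including the initial state."""
    history = [(s, i, r)]
    for _ in range(steps):
        s, i, r = sir_step(s, i, r, beta, gamma, dt)
        history.append((s, i, r))
    return history


# Made-up population of 1000 with 10 initial infections.
history = simulate(990.0, 10.0, 0.0, beta=0.3, gamma=0.1, steps=100)
```

In a spatial setting, a step function of this kind would be applied to the population label of each node, with edges contributing terms for movement between locations; that extension is omitted here.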
The incorporation of real-time data sources into STEM is an area for future development, the scope and breadth of which are uncertain and likely to depend on the particular application domains used as examples.
There are two aspects to STEM: the core for developing simulation frameworks, and the actual simulation frameworks. The mandates for developing both of these aspects tend to define and govern their growth. The core for simulation frameworks is somewhat organically constrained, as features are only added to it to support the needs of actual simulation frameworks such as disease modeling. The base comprises some nine EMF Ecore models that are used to generate a significant portion of the core code; no new models are anticipated at this time. The remainder of the work on the core is to polish and refine the aspects of the core that are exposed to users, such as model editors, wizards and other parts of the GUI.
STEM is more than a disease modeling system, however. The attributes that make a good collaborative system for disease modeling are the same ones that facilitate other kinds of model development. To this end, STEM was designed and implemented from its very inception to be a more versatile platform and framework, with disease modeling being a very complete example "application."
Organization
Mentors
- Ed Merks
- Chris Aniszczyk
Committers
STEM currently has four active Eclipse Committers.
Daniel Ford (daford att almaden.ibm.com)
Daniel was the initial Eclipse Committer for STEM. His contributions to the system include the initial concept of a composable graph framework and the general architecture and organization of STEM. He also designed the UML models that underpin STEM's implementation and is responsible for their implementation using the Eclipse Modeling Framework (EMF). Daniel wrote the initial versions of most of the components that constituted the original STEM contribution, and continues to maintain a significant number of them today. Daniel created the initial CQ for STEM's source code and a second one for STEM's data sets. He worked closely with Barb Cochrane on the Eclipse IP process to quickly "clear" the original STEM source code contribution. He also worked on the initial part of the (much) longer IP processing of the STEM data sets (later passing that responsibility to James Kaufman). Daniel received his Ph.D. in Computer Science from the University of Waterloo and is now a Research Staff Member (RSM) at the IBM Almaden Research Center in San Jose, CA.
James Kaufman (kaufman att almaden.ibm.com)
James founded the STEM project with Daniel Ford and was the project's second Eclipse committer. James has made a wide range of contributions to the OHF and the STEM project under Eclipse. He initiated the formation of the OHF and, as the IBM manager of a number of internal IBM Research healthcare-related projects, pursued the legal and organizational challenges that led to their donation to Eclipse to form the initial OHF code base. Later, James followed the same path with STEM and moved it from an internal IBM Research project to an open source project under Eclipse. James is also an active and critical contributor to the STEM code base, with primary responsibility for the development and implementation of mathematical models that characterize disease propagation and of mathematical tools for epidemiological data analysis in STEM. James also worked closely with the IBM and Eclipse legal teams to "clear" the STEM contribution to Eclipse. James received his Ph.D. in Physics from UCSB and is a Manager and Research Staff Member at the IBM Almaden Research Center in San Jose, CA.
Stefan Edlund (sedlund att us.ibm.com)
Stefan is an Eclipse Committer who has contributed to STEM since August 2008. Stefan has been working on the logging component in STEM, dramatically improving its performance. Stefan has also been contributing to the analytics perspective as well as the mathematics for STEM disease models. Stefan is a Senior Software Engineer at the IBM Almaden Research Center in San Jose, CA.
Yossi Mesika (mesika att il.ibm.com)
Yossi is an Eclipse Committer who has contributed several new features and improvements to the STEM project. Yossi worked on the graphical rendering of the geographical maps within STEM and added useful features such as the presentation of graph edges and the use of color providers. Yossi also used performance testing tools to find memory leaks and improve the overall performance of STEM. Yossi is also the release engineer of STEM, responsible for the automated process that generates weekly builds and publishes them on the Web site. Yossi has also made major contributions to the Eclipse Open Healthcare Framework (OHF). Yossi is a Research Staff Member at the IBM Haifa Research Labs in Israel.
Collaborations
The STEM project is working on the development of its community. The project is doing well in developing a core set of Committers and Users (listed below). The project is working hard on developing relationships with government, industry and academia. In academia, the project is nurturing graduate students in both Epidemiology and Operations Research to create a natural class of Adopters to join the project.
USAF: STEM currently has multi-year funding from the United States Air Force for the general development of the framework as well as for research into specialized analytics for "reverse engineering" disease model configurations from incident data.
University of Vermont: STEM has a successful ongoing collaboration with researchers Dr. Charles Hulse (Charles.Hulse at uvm.edu) and Joanna Conant (Joanna.Conant at uvm.edu) at the University of Vermont. This collaboration has resulted in one paper with more likely to come in the future.
Kaufman, J., Conant, J., Ford, D.A., Kirhata, W., Jones, B.A., Douglas, J.V., "Assessing the Accuracy of Spatiotemporal Epidemiological Models," BioSecure 2008, Raleigh, North Carolina, Dec. 2, 2008.
Johns Hopkins: The STEM project also collaborates with the Johns Hopkins Bloomberg School of Public Health, where we work with Epidemiology Ph.D. graduate student Justin Lessler on the validation of the mathematics of our disease models. (Full disclosure: IBM pays Justin Lessler for this work.)
MIT: STEM has an ongoing collaboration with Professor Dick Larson of the Operations Research Department at MIT for the development of advanced disease models.
Middle East Consortium for Infectious Disease Surveillance (MECIDS): An organization supported by the Nuclear Threat Initiative (NTI) and Search for Common Ground, which includes the Israeli, Jordanian, and Palestinian public health departments, as well as Tel Aviv University and Al Quds University.
University of Pittsburgh Graduate School of Public Health and School of Medicine: Prof. Bruce Lee.
Tentative Plan
- 2009-06 V0.4: Base release for Disease modeling
- 2009-12 V0.5: Non-disease modeling example application
- 2010-06 V0.6: Integrated situational awareness and analytics