STEM Plan for Version 2 [message #492980] Thu, 22 October 2009 10:59
James Kaufman
Let's start a thread here about an update to the website reflecting our plan for Version 2.

We might want to update the plan web page to reflect what we definitely have on track as well as a wish list of features where we would especially welcome/encourage new committers.
In particular:
Matthias Filter wrote:
> Dear Dan,
> yes you were right, the thing I was looking for was a graph editor. A possibility to edit or modify preinstalled graphs could be quite useful, especially for users who are not familiar with programming or who experience technical problems when following the developer instructions. So if it would be possible to implement that without too much effort, it would for sure be a big plus.
> Concerning the graph visualization tool: I can imagine that there are a lot of other tasks on the "wish" list. So maybe one could describe these higher level goals on the project plan website and thereby further increase the support for the development team. I myself have the possibility to invest some of my resources into this project next year, and for planning this such information would be very helpful as well.
>
> Matthias
Re: STEM Plan for Version 2 [message #494268 is a reply to message #492980] Thu, 29 October 2009 16:54
Werner Keil
There may have been something like that in the ESE poster session yesterday
evening...

The whole event was very constructive and full of interesting presentations
as well as some good meetings btw.

I mentioned STEM where content translation was involved. "Vertical"
healthcare tracks are one thing that Oisin (a committer, e.g. in the WTP
Incubator), who is gathering the committee for EclipseCon next year, sounded
excited about, so STEM and other Eclipse healthcare projects could be welcome
there again. If the European patient management tool I'll present near Rome
next week also matches this topic, I am very likely to speak about it. This
may also be a chance to meet some of you in person (EclipseCon is in
California, so hopefully those of you in Almaden can easily attend).

More during our next call(s)

Werner
Features in Version 2 - some ideas [message #501357 is a reply to message #492980] Tue, 01 December 2009 17:51
Matthias Filter
First of all: on http://www.eclipse.org/stem/ there is a menu option "Feature Requests" that is currently linked to the newsgroup. I personally think it would be reasonable to separate the new-feature discussion from the other discussions in the newsgroup, especially as I could imagine that it will be necessary to structure the discussion on each suggested feature, since there will surely be comments and responses to each of the features requested.
Anyway, to start the discussion, here are some ideas:

- allow nodes without GIS information to be incorporated into the simulation: Example: if I want to model the cattle population in a country, I cannot assume that the cattle are evenly distributed over the local districts. Instead I have to incorporate farms into the model, but usually the exact location of these farms is not known. What may be known is the distribution of farm size and the number of farms within each district. In general, for many questions to which STEM is applied, the exact location of a node is not that relevant as long as the geographic relationship to other nodes is supplied. Of course a visualization in the map view is then not adequate and not needed.

- sensitivity analysis of simulation parameters:
To support the decision making process in case of epidemics it might be useful to integrate a function that evaluates the influence of each model parameter on the simulation response variable. This kind of analysis might help to identify those parameters where a sound parameter estimate is essential for a verifiable simulation.

- a quality score on model parameters and on data used within simulations should be assigned, e.g. by a traffic light principle:
Motivation: This could help to discriminate verified data or model components (e.g. that have been reviewed by several users) from components that have been more or less automatically generated. The quality scores of all the parameters used within a simulation should be part of the simulation result documentation.

- an extended map editor where one can visualize and edit all canonical graph features independently from a simulation.

- possibility to generate graph elements at runtime: E.g. add an edge to an existing graph

- in connection with the suggestion on GIS-independent nodes - the possibility to import network structures from other file formats, e.g. pajek, dot, gml etc. (see also http://netwiki.amath.unc.edu/DataFormats/Formats ).
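
As a rough illustration of the import idea above, here is a minimal Python sketch that reads the basic sections of a Pajek .net file into plain node and edge lists. It is not STEM code: the function name read_pajek is hypothetical, it assumes the common layout (*Vertices, then *Arcs for directed or *Edges for undirected links, with an optional weight in the third column), it assumes that lines starting with % are comments, and it ignores coordinates and all other Pajek sections.

```python
import shlex


def read_pajek(text):
    """Parse the basic sections of a Pajek .net file.

    Returns (labels, edges): labels maps vertex id -> label,
    edges is a list of (source, target, weight, directed) tuples.
    """
    labels = {}
    edges = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and (assumed) comment lines
        if line.startswith("*"):
            # Section header such as "*Vertices 3", "*Arcs" or "*Edges"
            section = line[1:].split()[0].lower()
            continue
        if section == "vertices":
            parts = shlex.split(line)  # keeps quoted labels together
            labels[int(parts[0])] = parts[1] if len(parts) > 1 else parts[0]
        elif section in ("arcs", "edges"):
            parts = line.split()
            weight = float(parts[2]) if len(parts) > 2 else 1.0
            edges.append((int(parts[0]), int(parts[1]), weight, section == "arcs"))
    return labels, edges
```

Nodes read this way carry no coordinates at all, which is exactly the "nodes without GIS information" case from the first suggestion.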

Regards,

Matthias
Re: Features in Version 2 - some ideas [message #501595 is a reply to message #501357] Wed, 02 December 2009 18:49
Stefan Edlund
Location: IBM
Hi Matthias,

these are all very good suggestions. Let me try to comment on some of them, but I'm sure others have some thoughts on them as well.

As for the feature requests, that's what we originally wanted to do, but there really isn't any other suitable forum provided by Eclipse where anybody can easily provide input, unless we write the web app ourselves :-) A wiki page was suggested, but that might be too difficult to use for a general audience.

I'm not sure I understand your first suggestion. If you know the distribution of farm size and the number of farms in each district, wouldn't you be able to roughly estimate the number of cattle in each district? That could then easily be incorporated into a STEM model.

We do not have a "parameter sensitivity" estimate right now, but I can see it being incorporated into the Experiment features in STEM. Experiments allow you to run many simulations varying all the values of the disease model parameters (either automatically or manually). I think it is a good suggestion. If you could give us an indication of the "importance" of such a feature to a public health person such as yourself it would be very helpful to us. It is actually possible to manually calculate the sensitivity of each parameter right now by looking at the log files generated when running experiments, but that would require some skills.
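
To make the idea of a per-parameter sensitivity concrete, here is a minimal Python sketch of a one-at-a-time analysis. It is only an illustration under stated assumptions: the toy discrete-time SIR model (peak_infected) stands in for a real disease model, the names and parameter values are hypothetical, and nothing here uses STEM's Experiment feature or its log files.

```python
def peak_infected(beta, gamma, days=200, s0=0.99, i0=0.01):
    """Toy discrete-time SIR model; returns the peak infected fraction."""
    s, i = s0, i0
    peak = i
    for _ in range(days):
        new_infections = beta * s * i
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        peak = max(peak, i)
    return peak


def oat_sensitivity(model, baseline, rel_step=0.05):
    """One-at-a-time sensitivity around a baseline parameter set.

    For each parameter, returns (relative change in output) divided by
    (relative change in parameter), with all other parameters held fixed.
    """
    base_out = model(**baseline)
    result = {}
    for name, value in baseline.items():
        perturbed = dict(baseline)
        perturbed[name] = value * (1 + rel_step)
        result[name] = ((model(**perturbed) - base_out) / base_out) / rel_step
    return result


baseline = {"beta": 0.3, "gamma": 0.1}  # hypothetical values
print(oat_sensitivity(peak_infected, baseline))
```

Because the estimate is local, it only describes the model near the chosen baseline; for strongly nonlinear models the ranking of parameters can change elsewhere in parameter space.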


I don't know what the traffic light principle is. STEM allows you to compare the outcome of one simulation with another, and even compare simulation results with actual reference data if available. Perhaps you can explain this feature a little more?

The ability to add/remove nodes and edges from within the application has been requested many times so we need to do it for sure. One complication is that models in STEM point to graphs in the standard STEM library that cannot be modified. So one alternative is to make a copy of all the nodes/edges in the STEM library when you incorporate them into your model so you can edit them. However, the STEM libraries can be HUGE (like every region on the planet) so that might not be the best solution. A better solution is to keep a "change" log where the modifications done by the user are applied on top of the STEM library ones.
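
The "change" log idea could look roughly like the following Python sketch. It is a hypothetical illustration, not STEM's data model: the class name OverlayGraph and its methods are made up, nodes are plain identifiers and edges are (source, target) pairs. The point it shows is that the library graph is never copied or modified; user edits are recorded in a log and consulted on every lookup.

```python
class OverlayGraph:
    """User edits layered on top of a read-only base graph."""

    def __init__(self, base_nodes, base_edges):
        self._base_nodes = base_nodes   # shared library data, never copied
        self._base_edges = base_edges
        self.log = []                   # ordered record of user edits
        self._added_nodes, self._removed_nodes = set(), set()
        self._added_edges, self._removed_edges = set(), set()

    def add_node(self, node):
        self.log.append(("add_node", node))
        self._removed_nodes.discard(node)
        if node not in self._base_nodes:
            self._added_nodes.add(node)

    def remove_node(self, node):
        self.log.append(("remove_node", node))
        self._added_nodes.discard(node)
        if node in self._base_nodes:
            self._removed_nodes.add(node)

    def add_edge(self, src, dst):
        self.log.append(("add_edge", (src, dst)))
        self._removed_edges.discard((src, dst))
        if (src, dst) not in self._base_edges:
            self._added_edges.add((src, dst))

    def remove_edge(self, src, dst):
        self.log.append(("remove_edge", (src, dst)))
        self._added_edges.discard((src, dst))
        if (src, dst) in self._base_edges:
            self._removed_edges.add((src, dst))

    def has_node(self, node):
        in_base = node in self._base_nodes and node not in self._removed_nodes
        return node in self._added_nodes or in_base

    def has_edge(self, src, dst):
        edge = (src, dst)
        in_base = edge in self._base_edges and edge not in self._removed_edges
        present = edge in self._added_edges or in_base
        # An edge is only visible while both of its endpoints still exist.
        return present and self.has_node(src) and self.has_node(dst)
```

The memory cost grows with the number of edits rather than with the size of the library, and the log could be saved with the user's model and replayed later.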

Importing other file formats into STEM is easier for some formats than others right now. We support ESRI shape files for GIS data for instance, but other formats would require more work. Is there any particular format that you encounter frequently that would be useful?

Regards,
/ Stefan
Re: STEM Plan for Version 2 [message #502400 is a reply to message #492980] Mon, 07 December 2009 20:00
Werner Keil
I guess multiple or other diseases were already a topic before.

I found some very detailed information also on a country by country level
for HIV/Aids:
http://www.who.int/hiv/countries/en/

Is it possible to enhance or create a model for the spread of HIV,
especially in countries that are extremely affected?
If at least a draft (that for Flu is also more precise in some areas than
others ;-) was possible beyond STEM V1, then I'd love to propose this for
Aids Vienna 2010! Unlike MedInfo in SA I live here and would love to present
it. DemoCamp did not leave much space to mention other diseases except in
the discussions afterwards, but I also mentioned the option to the Health
Secretary for Vienna. I invited her and her sister to the DemoCamp as we
went to the same school, that may help for the conference, if other factors
work before the submission deadline in Feb...

Werner
Re: Features in Version 2 - some ideas [message #502521 is a reply to message #501595] Tue, 08 December 2009 11:46
Matthias Filter
Hi Stefan,

To Stefan's comments:



to the first point - nodes without GIS information:
Maybe the example was not the best. Of course you could work around the issue if you don't have the exact position of a node, but this is extra work for a user who simply wants to apply STEM to e.g. a simple edge list. In the end this might prevent potential users from applying STEM. E.g. in the area I'm working in, we have the situation that we don't want to locate the farm or the production site on a map, because this would cause problems with data privacy. So for me the actual question is whether it would take much effort to open the system to that kind of data, because the simulation infrastructure, e.g. the epidemiological models and the components connected with them, is as far as I understand not directly affected by the exact location of the node.
So if this (nodes without GIS) were possible, one could also extend the STEM functionality with features from the graph theory community that help to describe the network used in a simulation, e.g. the distribution of the indegrees or outdegrees of nodes etc. One solution might be to manually assign the same coordinates to all nodes whose exact locations are missing and have the system apply some jittering when it displays the network. But this is for sure just one possibility.
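
Both ideas in this paragraph can be sketched in a few lines of Python. This is an illustration only, with hypothetical function names and no connection to STEM's API: degree_distributions summarizes a directed edge list, and jitter offsets a shared placeholder coordinate by a small random amount, purely for display.

```python
import random
from collections import Counter


def degree_distributions(edges):
    """Return (in_dist, out_dist) for a directed edge list.

    Each result maps a degree to the number of nodes having that degree.
    """
    indeg, outdeg = Counter(), Counter()
    nodes = set()
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
        nodes.update((src, dst))
    in_dist = Counter(indeg[n] for n in nodes)    # missing keys count as 0
    out_dist = Counter(outdeg[n] for n in nodes)
    return in_dist, out_dist


def jitter(lat, lon, radius_deg=0.05, rng=random):
    """Shift a placeholder coordinate randomly within +/- radius_deg."""
    return (lat + rng.uniform(-radius_deg, radius_deg),
            lon + rng.uniform(-radius_deg, radius_deg))
```

The jittered positions would only be used for drawing; the simulation itself would keep working on the edge list.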

- parameter sensitivity issue:
your idea is very good, the Experiment features should be able to cover this. For me personally a sensitivity analysis would be a must if I had to base a decision on a simulation (unless I can verify my simulation with independent historical real-world data). But to be honest I don't have to make that kind of decision here.


- traffic light principle means a simple three colour coding scheme, red is bad and green is good.

I will comment on the other points later.

Matthias
Re: Features in Version 2 - some ideas [message #502584 is a reply to message #502521] Tue, 08 December 2009 17:26
Matthias Filter
An amendment to the traffic light system:
In many European countries we have traffic lights with the following rules:
red means you have to stop, yellow = be prepared to stop or to go, green = you can go.
This principle has been transferred to the food sector and has been implemented in Great Britain to make it easier for customers to judge whether a food product is nutritionally "good" or "bad" - for details see: http://www.eatwell.gov.uk/foodlabels/trafficlights/
By analogy one could create a scheme to describe the quality of the data that a scenario is based on. In connection with the sensitivity analysis discussed above, one could then analyze whether highly influential model parameters have the necessary quality. E.g. if the data on air traffic between two countries is weak, then this parameter would be assigned the red label. If this very same parameter then shows up as very influential in a sensitivity analysis of a scenario, one should be careful in interpreting the results of the simulation.
So ideally this analysis would be intrinsically tied to the simulation and the simulation results will always be presented together with this information.
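
A minimal Python sketch of how the two pieces of information could be combined is given below. Everything in it is an assumption for illustration: the Light enum, the parameter names, the sensitivity values and the threshold of 0.5 are hypothetical and not part of STEM.

```python
from enum import Enum


class Light(Enum):
    RED = "red"
    YELLOW = "yellow"
    GREEN = "green"


def flag_parameters(quality, sensitivity, threshold=0.5):
    """List parameters that are influential but not rated green.

    quality maps parameter name -> Light, sensitivity maps parameter
    name -> sensitivity value. Red parameters come first, then yellow,
    each group ordered by decreasing absolute sensitivity.
    """
    severity = {Light.RED: 0, Light.YELLOW: 1}
    flagged = [
        name for name, light in quality.items()
        if light is not Light.GREEN
        and abs(sensitivity.get(name, 0.0)) >= threshold
    ]
    return sorted(
        flagged,
        key=lambda n: (severity[quality[n]], -abs(sensitivity.get(n, 0.0))),
    )
```

A list like this could be attached to the simulation results so that a weakly founded but influential parameter is visible to whoever interprets them.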

the add/remove nodes and edges issue: the idea to use a log-file based system sounds great to me, but I'm for sure not deep enough into the coding business to be able to judge.
What I noticed was that there is the wiki page: http://wiki.eclipse.org/Composing_a_Graph
where it seems that it was planned to provide a solution also to the last point "Defining a New Graph Edge".

Concerning the issue of file formats to import: I did not realize until now that it is possible to import ESRI shape files. Can you give me a short description of how to do that within the pre-built version?

Matthias
Re: Features in Version 2 - some ideas [message #503835 is a reply to message #502584] Tue, 15 December 2009 17:56
Stefan Edlund
Location: IBM
Hi Matthias,

to clarify, we do not require the exact location of farms, people etc. in STEM since we know that would be a privacy problem. For instance, all the public health data we use to evaluate our models in STEM must be de-identified and contain aggregate information only, for example how many new cases of flu were reported in ZIP code xyz on a given day.

So in your example, we don't want the exact lat/lon position of the farm; rather, all we'd need to know is what STEM region it is located in. Right now, the finest granularity regions we have in STEM are at admin level 2, which typically is county. So disclosing the county a farm is in should be okay, right?

Using your concept of "traffic lights" to indicate the confidence we have in the input to the model is an interesting idea. One thing we do know about is the year we have population data for a country, so if that year is far in the past we would be less confident in those numbers. It would take some effort to implement such a feature; right now I would put a higher priority on being able to handle zoonotic diseases, multi-serotype disease models and new improved stochastic models in STEM.

Jamie knows more about the ESRI shapefile import into STEM, Jamie can you let Matthias know how we support that?

Regards,
/ Stefan
Re: Features in Version 2 - some ideas [message #504259 is a reply to message #503835] Thu, 17 December 2009 11:59
Matthias Filter
Hi Stefan,

thanks a lot for the comments and the feedback.
To the location issue:
OK, then I misinterpreted the available possibilities.
I assumed that I could create my own property files (e.g. DEU_3_node.properties) and the corresponding SPATIAL_URI (e.g. DEU_3_MAP.xml) for my specific issues, and that after recompiling I could run STEM on my own property files.

So if this is not possible, well then this would be another "nice to have".

Concerning the "traffic light" issue - well, it is clear that development resources are limited and that not all wishes will come true, even though it is close to Christmas. I just want to mention that this feature could be applied to every model you create with STEM, while e.g. the possibility to model multi-serotype diseases would "only" affect a limited number of diseases. Of course, if you generate models just for fundamental research then the documentation and evaluation of model assumptions might not be that critical, but if you want to base a decision on a model, you would want to know that. At least this is the feedback I get from responsible risk managers here in Germany. So for the prioritization of development tasks one might pose the question: Who is the main target group for STEM? Is it the scientific community, the risk assessor or the risk manager?

By the way - a traffic light score on a certain parameter could also be generated by the community. So e.g. if several STEM users independently assign a good score / green label to a parameter (e.g. the population size in a region), this would increase my confidence in the data as well.
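
One simple way such a community score could be computed is sketched below in Python. The rule is an assumption made up for illustration (a strict majority wins, mixed opinions give yellow, and fewer than min_votes ratings give "unrated"); the function name and the labels as plain strings are hypothetical as well.

```python
from collections import Counter


def community_light(votes, min_votes=3):
    """Aggregate independent user ratings into a single label.

    votes is a list of "red", "yellow" or "green" strings.
    """
    if len(votes) < min_votes:
        return "unrated"  # too few independent reviews to say anything
    label, count = Counter(votes).most_common(1)[0]
    # Only a strict majority decides; otherwise opinions are mixed.
    return label if count > len(votes) / 2 else "yellow"
```

Other rules are equally plausible, for example taking the worst rating so that a single red vote is never hidden.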

Regards,

Matthias
Re: Features in Version 2 - some ideas [message #506696 is a reply to message #501595] Fri, 08 January 2010 15:23
James Kaufman
Matthias,
I'd like to second Stefan's note about your good suggestions. Regarding submission of feature requests, the newsgroup is usually ideal because it also supports some discussion. If there is a very specific feature request or task that requires no discussion, you can always submit it to Bugzilla as well, but please put "feature request" in the title along with a description. Usually a newsgroup item is a good idea before doing that, as it is possible we are working on a feature but have not yet documented the work in progress on the wiki.
Re: Features in Version 2 - some ideas [message #506697 is a reply to message #503835] Fri, 08 January 2010 15:26
James Kaufman
Matthias,
Per this week's STEM call, I look forward to your paper on the traffic light idea. I think we need a general framework that represents not only confidence in a parameter (red, yellow, green) but also a measure of how important the parameter is to the model. This is difficult as the models are all nonlinear.

Dublin Core was supposed to accomplish this, but it does not address individual model parameters. It does, however, describe the source and validity of denominator data.
Re: Features in Version 2 - some ideas [message #507210 is a reply to message #506697] Tue, 12 January 2010 04:21
Matthias Filter
Hi Jamie,

here are some references on the subject of sensitivity analysis:

first some documents that are quite specific in the calculation and that might serve as a starting point for STEM:
- http://www.adb.org/documents/handbooks/water_supply_projects/Chap7-r6.PDF
- http://www.adb.org/Documents/Handbooks/Health_Sector_Projects/chap_07.pdf
- http://www.treeplan.com/download/SensItGuide144.pdf

Here are two more general documents:
- "Uncertainty and data quality in exposure assessment. Part 1: Guidance document on characterizing and communicating uncertainty in exposure assessment. Part 2: Hallmarks of data quality in chemical exposure assessment":
http://www.who.int/ipcs/methods/harmonization/areas/uncertainty%20.pdf

- "Recommended Practice Regarding Selection of Sensitivity Analysis Methods Applied to Microbial Food Safety Process Risk Models":
http://www.informaworld.com/smpp/content~content=a713989286~db=all

Concerning the traffic light scoring scheme - I will post the relevant documents from the mentioned in-house research project as soon as it is published.

Matthias