Re: [stem-dev] stem-dev Digest, Vol 117, Issue 1

Hi all,

I forgot to respond to this to say what the solution was, but thought I should as it might be helpful for others in future.

As Jamie suggested, I wrote a loop in R that read in a single file from a single run (e.g. I_0.csv from one run of the simulation), summarised the data from that file, appended the result to a data frame holding all the summarised data, deleted the original file, and then moved on to the I_0.csv file from the next run.
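For readers who prefer Python, a loop of the kind described above might look roughly like this. It is only a sketch and not the R code mentioned in this message. It assumes each run's file is comma-separated with an iteration column, a time column, and then one column per node; the function names and the choice of summary (the total across nodes per iteration) are made up for illustration.

```python
import csv
import os


def summarise_run(path):
    """Return one summary row per iteration for a single run's wide-format file."""
    rows = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip header (assumed: iteration, time, one column per node)
        for record in reader:
            values = [float(v) for v in record[2:]]
            rows.append({
                "iteration": int(record[0]),
                "time": record[1],
                "total": sum(values),  # illustrative summary; swap in whatever is needed
            })
    return rows


def summarise_runs(paths, delete_after=False):
    """Summarise each run in turn so only one raw file is held in memory at a time."""
    summary = []
    for run_index, path in enumerate(paths):
        for row in summarise_run(path):
            row["run"] = run_index
            summary.append(row)
        if delete_after:
            os.remove(path)  # frees disk space, but cannot be undone
    return summary
```

Deleting the raw file is off by default here because it is irreversible; it is worth checking the summaries on a copy of the data first.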

If anyone uses R and needs to summarise many runs of a stochastic model then I am happy to share my code with you.

Thanks for your suggestions and help Jamie.

Best wishes,

Emily



On 8 Jan 2019, at 20:59, James Kaufman <jhkauf@xxxxxxxxxx> wrote:

Hi Emily,
Thank you for this suggestion. We do have instructions on the wiki for users who might want to create their own custom logger. The URL is:
https://wiki.eclipse.org/STEM_Create_EMF_Project
The example is for a JSON logger but you could do a new delimited file logger by a similar set of steps.

Some things to consider. For very long and/or large runs, the format you suggest below would create much larger log files, because the strings used to represent the node UIDs would be repeated for each node at each time step.
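To get a feel for the size difference, a back-of-envelope estimate can be sketched as below. All of the numbers are assumptions chosen for illustration: 30 characters per node UID (the length of stem://org.eclipse.stem/node/1), about 4 characters per value, and about 14 characters per row for the iteration and time fields. Nothing here is measured from a real STEM log.

```python
def estimate_sizes(n_nodes, n_steps, uid_chars=30, value_chars=4, prefix_chars=14):
    """Rough character counts for the wide and long layouts of one log file."""
    # Wide: UIDs appear once in the header; each row holds the
    # iteration/time prefix plus one value (and a separator) per node.
    wide = n_nodes * (uid_chars + 1) + n_steps * (prefix_chars + n_nodes * (value_chars + 1))
    # Long: every row repeats the prefix and the node UID next to a single value.
    long_ = n_steps * n_nodes * (prefix_chars + uid_chars + 1 + value_chars + 1)
    return wide, long_


wide_chars, long_chars = estimate_sizes(n_nodes=1000, n_steps=100)
print(wide_chars, long_chars)
```

Under these made-up figures the long layout comes out several times larger than the wide one, and the gap widens as the UID strings get longer relative to the values.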

IMHO the simplest thing to try first (simpler than creating a new logger) is to write a small script in either R or Python that reads the current STEM log format and exports it to the format you need for your other software. The code could be very simple, and you could test it on a trial run of your STEM scenario using a small graph and just a few time steps. You might try a Pandas data frame if you choose to go with Python.
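A conversion script of this kind could be sketched as follows. It uses Python's standard csv module rather than Pandas so that the file is streamed row by row and never loaded whole, which may matter for logs with thousands of node columns. It assumes the log is comma-separated with the iteration and time in the first two columns and one column per node after that; the function name and the output column names nodeID and Incidence are illustrative choices, not part of STEM.

```python
import csv


def wide_to_long(src_path, dst_path, value_name="Incidence"):
    """Stream a wide log (one column per node) into one row per node per time step."""
    rows_written = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        node_ids = header[2:]  # assumed: everything after iteration and time is a node
        writer.writerow([header[0], header[1], "nodeID", value_name])
        for record in reader:
            iteration, time = record[0], record[1]
            for node_id, value in zip(node_ids, record[2:]):
                writer.writerow([iteration, time, node_id, value])
                rows_written += 1
    return rows_written
```

Calling wide_to_long("Incidence.csv", "Incidence_long.csv") on a small test run (both file names are placeholders) is a quick way to check the output before trying a full-size log.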







From:        stem-dev-request@xxxxxxxxxxx
To:        stem-dev@xxxxxxxxxxx
Date:        01/07/2019 09:00 AM
Subject:        stem-dev Digest, Vol 117, Issue 1
Sent by:        stem-dev-bounces@xxxxxxxxxxx




Send stem-dev mailing list submissions to
                stem-dev@xxxxxxxxxxx

To subscribe or unsubscribe via the World Wide Web, visit
               
https://www.eclipse.org/mailman/listinfo/stem-dev
or, via email, send a message with subject or body 'help' to
                stem-dev-request@xxxxxxxxxxx

You can reach the person managing the list at
                stem-dev-owner@xxxxxxxxxxx

When replying, please edit your Subject line so it is more specific
than "Re: Contents of stem-dev digest..."


Today's Topics:

  1. Re: stem-dev Digest, Vol 116, Issue 6 (Emily Nixon)


----------------------------------------------------------------------

Message: 1
Date: Mon, 7 Jan 2019 14:47:53 +0000
From: Emily Nixon <emily.nixon@xxxxxxxxxxxxx>
To: STEM developer mailing list <stem-dev@xxxxxxxxxxx>
Subject: Re: [stem-dev] stem-dev Digest, Vol 116, Issue 6
Message-ID:
                <VI1PR0601MB235277FAF5ECF729960985FEBF890@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
               
Content-Type: text/plain; charset="us-ascii"

Hi Jamie,


That makes sense; I can definitely see why you don't produce summary data anymore.


I do often change the log interval so that the recorded simulation files are smaller. However, for what I am trying to do (calculating how many farms have an incidence >1 per iteration), I need every iteration to be recorded.
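The per-iteration count described here can be computed without loading the whole file, for example with a generator like the one below. This is a sketch under assumptions: the file is comma-separated, the first two columns are the iteration and the time, and every remaining column is one farm (node). The function name and the default threshold are illustrative.

```python
import csv


def farms_above_threshold(path, threshold=1.0):
    """Yield (iteration, number of node columns whose value exceeds threshold)."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        for record in reader:
            count = sum(1 for value in record[2:] if float(value) > threshold)
            yield int(record[0]), count
```

Because each row is reduced to a single number as it is read, memory use stays roughly constant however many node columns the file has.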


What about my other suggestion, to change the format in which STEM records results in the csv files in the recorded simulations folder? Would it be difficult to change it so that, instead of each node having its own column, there is a single column that specifies the node? This would greatly reduce the number of columns and increase the number of rows instead, which would let me read even very large files into the software I am using to analyse the data.


So instead of the current format:


Iteration  time        stem://org.eclipse.stem/node/1  stem://org.eclipse.stem/node/2
0          2018-02-01  1                               1


We would have:


Iteration  time        nodeID                          Incidence
0          2018-02-01  stem://org.eclipse.stem/node/1  1
0          2018-02-01  stem://org.eclipse.stem/node/2  1


On much smaller files it is, of course, possible to reshape the output into this format myself. However, because of the sheer number of nodes I am using, and therefore the sheer number of columns, I cannot load the data into my database management systems in order to reshape it.


I think this format would be useful to others too, as it is more compatible with a number of data analysis tools and languages.


How difficult do you think it would be to change the format in which STEM records the data to the one I've suggested above?


Best wishes,


Emily


Emily Nixon
PhD Student

Demonstrator


School of Biological Sciences
University of Bristol
Bristol Life Sciences Building
24 Tyndall Avenue
Bristol
BS8 1TQ
Tel +44 (0)117 394 1389
________________________________
From: stem-dev-bounces@xxxxxxxxxxx <stem-dev-bounces@xxxxxxxxxxx> on behalf of James Kaufman <jhkauf@xxxxxxxxxx>
Sent: 20 December 2018 17:44:05
To: stem-dev@xxxxxxxxxxx
Subject: Re: [stem-dev] stem-dev Digest, Vol 116, Issue 6

Hi Emily,
The current logger allows you to select which compartments to log, but it does not integrate over the spatial nodes. STEM used to have this feature (it was called an integrating disease model), but it was removed for a few reasons.
In order to integrate (e.g.) level 3 nodes up to level 0 (country) nodes, it is necessary to form a graph with all hierarchical admin levels and connect them by containment edges. The containment edges still exist in STEM, so this feature could be reimplemented, but it has several issues.
1) It adds a lot more nodes and edges to your graph.
2) It adds a lot of computational overhead.
3) You need to rethink the decoration of the graph itself. You don't want to propagate the disease at all levels, just aggregate above the lowest node. But what if the graph is not uniform in depth? Computing this adds to the overhead as well.

We decided it was best to keep things simple. The aggregation feature caused users to make too many errors in composing their graphs. As an alternative, you might consider changing the log time interval if the logs are too big. The log time interval is not dependent on your integration time interval.
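For anyone who still needs aggregated values, one option is to roll them up outside STEM after the run. The sketch below is not how the old integrating disease model worked. It simply sums the lowest-level values into every ancestor, given a child-to-parent mapping that you would have to supply yourself. The mapping, the node names and the function name are all hypothetical, and the containment links are assumed to form a tree with no cycles.

```python
from collections import defaultdict


def aggregate_up(leaf_values, parent_of):
    """Sum each leaf value into the leaf itself and every ancestor above it."""
    totals = defaultdict(float)
    for node, value in leaf_values.items():
        totals[node] += value
        current = node
        while current in parent_of:  # walk up until a node with no parent is reached
            current = parent_of[current]
            totals[current] += value
    return dict(totals)


# Hypothetical hierarchy of uneven depth: a1 and a2 sit under A, b sits directly under root.
parents = {"a1": "A", "a2": "A", "A": "root", "b": "root"}
print(aggregate_up({"a1": 1.0, "a2": 2.0, "b": 3.0}, parents))
```

Walking up from each leaf means a hierarchy that is not uniform in depth needs no special handling, at the cost of one pass up the tree per leaf.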






From:        stem-dev-request@xxxxxxxxxxx
To:        stem-dev@xxxxxxxxxxx
Date:        12/20/2018 03:58 AM
Subject:        stem-dev Digest, Vol 116, Issue 6
Sent by:        stem-dev-bounces@xxxxxxxxxxx
________________________________





Today's Topics:

 1. Request for a new way of outputting data from STEM (Emily Nixon)


----------------------------------------------------------------------

Message: 1
Date: Thu, 20 Dec 2018 11:58:01 +0000
From: Emily Nixon <emily.nixon@xxxxxxxxxxxxx>
To: developer mailing list STEM <stem-dev@xxxxxxxxxxx>
Subject: [stem-dev] Request for a new way of outputting data from STEM
Message-ID:
               <VI1PR0601MB2352EDD4E8CA3D4513E36CD1BFBF0@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>

Content-Type: text/plain; charset="iso-8859-1"

Hi all,


As some of you may know, I am running scenarios on STEM that contain graphs with thousands of nodes and edges.


Therefore, the output csv files I get for each compartment of the disease are all very large, which makes them difficult to work with, and they take up a lot of space.


I am trying to develop ways in R to summarise this data so that I have some meaningful results. However, my supervisor thought it might be worth checking with the STEM developers whether it is possible to have a new logger, or an option on an existing logger, that outputs only summary data. I know you are very busy at the moment, but she thought that it might not take too long, depending on how the logger code is currently written, and that it could be useful for other people too. Perhaps it could be a new feature in STEM4?


Specifically, what I would like to have is something like the following for the Incidence and I_0 csv files:

[Attached image: pastedImage.png]

*Or the equivalent for I_0


However, if producing this summary data would be too difficult, then, alternatively, having the normal output in a different format would help me a lot with producing the summary data myself in R.

[Attached image: pastedImage.png]

So instead of having a column for every node, there would be a nodeID column containing the node names. This would give the file more rows but fewer columns, which would make it much easier to work with.


If anyone has any thoughts or suggestions, if anyone else would find this useful or if anyone thinks they could help with implementing this in STEM, then I would greatly appreciate it!


Best wishes,


Emily











-------------- next part --------------
A non-text attachment was scrubbed...
Name: pastedImage.png
Type: image/png
Size: 11058 bytes
Desc: pastedImage.png
URL: <
https://www.eclipse.org/mailman/private/stem-dev/attachments/20181220/5c683fb4/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pastedImage.png
Type: image/png
Size: 12221 bytes
Desc: pastedImage.png
URL: <
https://www.eclipse.org/mailman/private/stem-dev/attachments/20181220/5c683fb4/attachment-0001.png>

------------------------------

_______________________________________________
stem-dev mailing list
stem-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.eclipse.org/mailman/listinfo/stem-dev

End of stem-dev Digest, Vol 116, Issue 6
****************************************





------------------------------


End of stem-dev Digest, Vol 117, Issue 1
****************************************



