Scripting Excel Spreadsheets using Epsilon¶

Spreadsheets are commonly used in software and systems engineering processes to capture and analyse structured data, and can be sources of valuable information for model-based software engineering activities.

Epsilon provides built-in support for querying and transforming Excel spreadsheets through an Apache POI-based EMC driver. This article discusses how you can configure an Epsilon program to query and modify an Excel spreadsheet, and the video below demonstrates the driver in action.

Citing the Excel EMC driver in a publication?

If you are referring to Epsilon's Excel EMC driver in a publication, please cite this paper instead of the website URL.

Regression in Epsilon 2.2

Due to a regression, the Excel driver is broken in Epsilon 2.2. The driver works well in previous versions (e.g. 2.1) as well as in 2.3. Support for column datatypes and for writing to Excel spreadsheets has improved substantially in 2.3.

Worksheets, Columns and Rows¶

Essentially, in the Excel driver, by default, worksheets are treated as model element types (e.g. Student, Staff, Module and Mark in the spreadsheet below), columns as their properties (e.g. Mark has student, module and mark properties), and rows are treated as model elements (i.e. there are two students, two members of staff, three modules and two marks in the spreadsheet below).

StudentStaffModuleMark

	A	B	C	D	E	F
1	id	firstname	lastname	age	supervisor	modules
2	jt501	Joe	Thompson	23	mt506	MSD,RQE
3	js502	Jane	Smith	22	mt506	MSD,HCI

	A	B	C	D	E	F
1	id	firstname	lastname	teaches
2	mt506	Matthew	Thomas	MSD,RQE
3	dj503	Daniel	Jackson	HCI

	A	B	C	D	E	F
1	id	title	term
2	MSD	Modelling and System Design	Autumn
3	HCI	Human Computer Interaction	Spring
4	RQE	Requirements Engineering	Spring

	A	B	C	D	E	F
1	student	module	mark
2	jt501	MSD	62
3	js502	HCI	74

References and Column Types¶

The driver supports specifying additional configuration information (e.g. column data types, references between columns) about a spreadsheet in the form of an external XML document, that can be attached to the spreadsheet in Epsilon's run configuration dialog.

For our example spreadsheet, above, the configuration file below specifies the types of the age and mark columns of the spreadsheet, the multiplicity of the teaches column, as well as references between the Student.supervisor and Staff.id, and the Staff.teaches and Module.id columns.

<spreadsheet>
  <worksheet name="Student">
    <column name="age" datatype="integer"/>
    <column name="modules" many="true"/>
  </worksheet>
  <worksheet name="Mark">
    <column name="mark" datatype="integer"/>
  </worksheet>
  <worksheet name="Staff">
    <column name="teaches" many="true" delimiter=","/>
  </worksheet>
  <reference source="Student->supervisor"
             target="Staff->id"/>
  <reference source="Student->modules"
             target="Module->id"/>           
  <reference source="Staff->teaches"
             target="Module->id"/>
  <reference source="Mark->module"
             target="Module->id"/>
  <reference source="Mark->student"
             target="Student->id"/>
</spreadsheet>

The format of the XML configuration document is as follows:

Worksheet¶

Each worksheet can have an optional name (if a name is not provided, the name of the worksheet on the spreadsheet is used) and acts as a container for column elements.

Column¶

Each column needs to specify at least either its zero-based index in the context of the worksheet it belongs to, or its name (if a name is not provided, the cell value in the first row is used, as discussed above). Additionally, a column can have:

alias: an alternative name for the column.
datatype: the type of the values in the column. As of Epsilon 2.5.0, the valid options (ignoring case) are string, integer, boolean, double, or float. The default datatype is string.
many: it can be true or false. If true, then its cells will be considered to contain unbounded collections, separated by the delimeter(see below). Columns are single-valued by default.
delimeter: the delimiter used to separate values if many is true. The default delimeter is ,.

As of 2.6.0, numeric values are converted to various datatypes as follows:

string: the textual representation of the cell's value is used, applying its declared format if possible. If the cell format is not supported, a format similar to the Excel General format will be used: # for whole numbers and #.########## for decimal numbers (see the Apache POI DateFormatter documentation).
integer, double, and float: the cell value is computed (using the formula if one is present), and then cast to the appropriate datatype.
boolean: zero values will be treated as false, and non-zero values will be treated as true.

Issues with numeric cell values in Epsilon 2.5.0 and earlier

In Epsilon 2.5.0 and earlier releases, the Excel driver only supported the integer, double, and float datatypes for cells with numeric values. string and boolean datatypes were ignored, and treated as double. See issue #89.

Reference¶

In a configuration document we can also specify ID-based references to capture relationships between columns belonging to potentially different worksheets. Each reference has a source and a target column, an optional name (if a name is not specified, the name of the source column is used to navigate the reference), a cardinality (many attribute), and specifies whether updates to cells of the target column should be propagated automatically (cascadeUpdates attribute) to the respective cells in the source column to preserve referential integrity.

Querying and Modifying Spreadsheets¶

Having specified the configuration document above, we can now query the spreadsheet with EOL as follows.

// Returns all students supervised by Matthew Thomas
Student.all.select(s|s.supervisor?.lastname = "Thomas");

// Returns the modules taught by Daniel Jackson
Module.all.select(m|
    Staff.all.exists(s|
        s.firstname="Daniel" and s.teaches.includes(m)));

Creating Rows¶

As discussed above, worksheets are treated as types and rows as their instances. As such, to create a new row in the Student worksheet, EOL's new operation can be used.

var student : new Student;

Deleting Rows¶

To delete a row from a worksheet, EOL's delete operator can be used. When a row is deleted, all the rows that contain cells referring to it through cascade-update references also need to be recursively deleted.

var student = Student.all.selectOne(s|s.id = "js502");
// deletes row 2 of the Student worksheet
// also deletes row 3 of the Mark worksheet
delete student;

Modifying Cell Values¶

If a cell is single-valued, a type-conforming assignment can be used to edit its value. For example, the following listing demonstrates modifying the age and the supervisor of a particular student.

var student : Student = ...;
var supervisor : Staff = ...;
student.age = 24;
student.supervisor = supervisor;

If on the other hand the cell is multi-valued, then its values should be handled as a collection. Adding/removing values from property collections has no effect on the spreadsheet; you need to re-assign values instead.

// Moves a module between two members of staff
var from : Staff = ...;
var to : Staff = ...;
var module : Module = ...;
// Neither of these will work
// from.teaches.remove(module);
// to.teaches.add(module);
// ... but these will
from.teaches = from.teaches.excluding(module);
to.teaches = to.teaches.including(module);

Updating the value of a cell can have side effects to other cells that are linked to it through cascade-update references to preserve referential integrity. For example, updating the value of cell A3 in the Module worksheet, should trigger appropriate updates in cells D2 and F2 of the Staff and Student worksheets respectively.

Validating and Transforming Spreadsheets¶

Of course, we can also validate spreadsheets using EVL, transform them into other models using ETL, and into text using EGL, generate graphical views using Picto etc.

context Staff {
    constraint NotOverloaded {
        check: self.teaches.size() <= 4
        message: "Member of staff" + self.firstname +
          " " + self.lastname + " is overloaded"
    }
}

Creating Spreadsheets¶

To create a spreadsheet from scratch (e.g. when it is produced by an ETL transformation), we also need to specify an index for each column in the XML mapping file. Below is an EOL program that creates the spreadsheet above from scratch, and the mapping file for it. The complete example is in Epsilon's Git repo.

create-spreadsheet.eolmapping.xml

// Create the modules
var MSD = new Module(id="MSD", 
  title="Modelling and System Design", term="Autumn");

var HCI = new Module(id="HCI", 
  title="Human Computer Interaction", term="Spring");

var RQE = new Module(id="RQE", 
  title="Requirements Engineering", term="Spring");

// Create the staff  
var matthew = new Staff(id="mt506", firstname="Matthew",
  lastname="Thomas", teaches=Sequence{MSD, RQE});

var matthew = new Staff(id="dj503", firstname="Daniel",
  lastname="Jackson", teaches=Sequence{HCI});

// Create the students
var joe = new Student(id="jt501", firstname="Joe", 
  lastname="Thompson", age="23", supervisor=matthew, modules=Sequence{MSD, RQE});

var jane = new Student(id="js502", firstname="Jane", 
  lastname="Smith", age="22", supervisor=matthew, modules=Sequence{MSD, HCI});

// Create the marks
new Mark(student=joe, module=MSD, mark=62);
new Mark(student=jane, module=HCI, mark=74);

<spreadsheet>
  <worksheet name="Student">
    <column index="0" name="id"/>
    <column index="1" name="firstname"/>
    <column index="2" name="lastname"/>
    <column index="3" name="age" datatype="integer"/>
    <column index="4" name="supervisor"/>
    <column index="5" name="modules" many="true"/>
  </worksheet>
  <worksheet name="Mark">
    <column index="0" name="student"/>
    <column index="1" name="module"/>    
    <column index="2" name="mark" datatype="integer"/>
  </worksheet>
  <worksheet name="Staff">
    <column index="0" name="id"/>
    <column index="1" name="firstname"/>
    <column index="2" name="lastname"/>
    <column index="3" name="teaches" many="true" delimiter=","/>
  </worksheet>
  <worksheet name="Module">
    <column index="0" name="id"/>
    <column index="1" name="title"/>
    <column index="2" name="term"/>
  </worksheet>
  <reference source="Student->supervisor"
             target="Staff->id"/>
  <reference source="Student->modules"
             target="Module->id"/>           
  <reference source="Staff->teaches"
             target="Module->id"/>
  <reference source="Mark->module"
             target="Module->id"/>
  <reference source="Mark->student"
             target="Student->id"/>                      
</spreadsheet>

Working with Formulas¶

To set the value of a cell to a formula, start its value with = as shown below. The complete example is in Epsilon's Git repo.

create-spreadsheet-with-formulas.eolmapping.xml

var calc : new Calc;
calc.a = 1;
calc.b = 2;
calc.sum = "=A2+B2";

calc.sum.println(); // Prints 3

<spreadsheet>
  <worksheet name="Calc">
    <column index="0" name="a" datatype="integer"/>
    <column index="1" name="b" datatype="integer"/>
    <column index="2" name="sum" datatype="integer"/>
  </worksheet>
</spreadsheet>

Reflective Access¶

To iterate over all the worksheets, columns and rows of a speadsheet without referring to them by name, we can use the following statements (assuming that our Excel spreadsheet is named M in the run configuration). Additional methods of interest for this mode of access can be found in the Javadoc of the underlying ExcelModel and SpreadsheetModel classes.

// Iterate over all worksheets
for (w in M.worksheets) {
  w.name.println();

  // Iterate over all columns
  // of the worksheet
  for (c in w.header.columns) {
    c.name.println("\t");
  }

  // Iterate over all rows
  // of the worksheet
  for (r in w.rows) {
    r.println("\t");
  }
}

Resources¶

This article shows how to use Excel spreadsheets in ANT/Gradle/Maven builds.