[tsf-dev] TSF process feedback, part 2: evaluating statements
Hello,
As promised yesterday, here's part 2 of my reflections on our use of TSF
in the CTRL project.
In CTRL we use the following things that you know from TSF:
1. A set of Statements which present an assurance case, which is
managed with the trudag tool.
2. References as evidence for the statements.
3. Evaluation of the statements by subject matter experts (SMEs).
4. Evaluation of the statements using validator scripts.
5. Aggregate scores.
6. Monitoring when evidence changes.
7. Modifying the statements via Gitlab.
8. The RAFIA process and STPA.
This mail covers items 3 and 4, which follow on from yesterday's mail
that covered writing statements and adding evidence. Today I'll talk
about how we evaluate them, i.e. deciding how true they are.
Evaluation by SME
-----------------
We use scoring, as documented at:
<https://pages.eclipse.dev/eclipse/tsf/tsf/model/scoring/implementation.html>.
For individual statements, it can be nice for an expert to record how
close we are to achieving some goal, by putting e.g. 0.5 against the
statement to show that we're halfway there.
This is a matter of debate though. Elsewhere we say that the score means
"confidence that a statement is true", which is different to measuring
progress against a goal. If a statement says "All documentation is in
Git", but I know that half of the documentation lives in Google Docs,
how should I score that statement? If it's a measure of *progress*, I
might score 0.5 as a way of representing "We are halfway through putting
the docs in Git." But if it's my confidence that the statement is true,
then my score must be 0.0. The statement says all the docs are in Git,
I know for a fact they are not, and so I'm completely confident that the
statement is false. The TSF docs don't make it clear which is the right
interpretation at present.
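To make the difference concrete, here's a toy sketch of the two
interpretations using the docs-in-Git example (plain Python for
illustration only; these function names are invented and have nothing to
do with trudag's actual API):

```python
def progress_score(docs_in_git: int, docs_total: int) -> float:
    """Score as *progress*: what fraction of the docs are in Git."""
    return docs_in_git / docs_total

def confidence_score(docs_in_git: int, docs_total: int) -> float:
    """Score as *confidence* that "ALL documentation is in Git" is true.

    Any doc known to live elsewhere makes the statement simply false.
    """
    return 1.0 if docs_in_git == docs_total else 0.0

# Half of 10 docs are in Git:
print(progress_score(5, 10))    # 0.5 -- "we're halfway there"
print(confidence_score(5, 10))  # 0.0 -- the statement is false
```

Same facts, very different numbers, which is why agreeing on the
interpretation up front matters.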
We also hit issues when statements with SME reviews are *modified*. Each
time a statement changes, or any reference linked to that statement
changes, you need to distract the SME to check whether their score is
still valid. That gets more time-consuming the more statements and
references you have. More on that tomorrow when I discuss 'Monitoring
when evidence changes'.
Lesson: Agree what the scores mean within your team!
Evaluation by validator scripts
-------------------------------
We've had a lot of success writing validator plugins to generate a score
for a particular statement. For example, we might have a statement that
says:
> If Foo receives invalid input, it raises an error.
Most likely we'd have some automated tests which exercise Foo with some
invalid inputs and check that it raises the error we expect. We would then
have a validator plugin which checks the CI for Foo and generates a
score based on how many tests passed.
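A minimal sketch of that idea (hypothetical names throughout; this is
not trudag's real plugin interface, just the passed/total scoring logic
described above):

```python
# Hypothetical validator sketch: score a statement by the fraction of
# passing tests in a set of CI results for Foo.

def validate(test_results: dict[str, bool]) -> float:
    """Return a score in [0.0, 1.0] from test name -> pass/fail results."""
    if not test_results:
        return 0.0  # no test evidence at all
    passed = sum(1 for ok in test_results.values() if ok)
    return passed / len(test_results)

# e.g. three invalid-input tests for Foo, one of which failed:
score = validate({
    "test_foo_rejects_empty_input": True,
    "test_foo_rejects_negative_size": True,
    "test_foo_rejects_bad_encoding": False,
})
print(score)  # ~0.67 (2 of 3 tests passed)
```

Whether 2 passing tests out of 3 should really score ~0.67 rather than
0.0 is exactly the open question raised below.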
This is a huge improvement on a paper-based assessment that makes
claims about Foo without any evidence, or perhaps includes just a
snippet of test code. Here we *run* the test and you can jump from
the report right to the test result.
That said, it's important to remember the limits of this approach:
* Does the testing cover all possible invalid inputs?
* When did those tests actually last run? How old is the report you're
reading?
* How do failing tests affect the score? Should it go to zero if a
test failed? (See the related question in the previous section).
* Is the validator script doing what you expect? Maybe it has a bug.
These are questions you should always ask about automated testing, but
TSF currently lacks a way to record our answers. One suggestion was to
record an SME review *as well as* automated validation for a statement,
so we know when someone last asked these questions.
One TSF feature that would help a lot here is allowing validator plugins
to generate content that appears in the report. If a validator generates
a score less than 1.0, you should be able to see *why* and what that
means. For example: which test(s) failed? Did the tests run once, or 100
times? and so on.
Another thing to be careful of: the more advanced your testing, the more
complex generating the report becomes. As with external references, if
your validator plugins pull data from external sources then you need to
worry about availability and access control.
In our case we have trudag fail gracefully here: if a validator plugin
cannot fetch data, it returns a score of 0.0.
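That fallback behaviour can be sketched like this (a hypothetical
wrapper with invented names, not trudag's actual code): a fetch failure
becomes a score of 0.0 rather than crashing the whole report.

```python
# Sketch: treat an unreachable evidence source as score 0.0.

def score_with_fallback(fetch, validate) -> float:
    """Run `validate` on fetched evidence; score 0.0 if fetching fails."""
    try:
        data = fetch()
    except OSError as err:  # network / availability problems
        print(f"warning: could not fetch evidence: {err}")
        return 0.0
    return validate(data)

# An unreachable evidence source falls back to 0.0:
def broken_fetch():
    raise OSError("connection refused")

print(score_with_fallback(broken_fetch, lambda data: 1.0))  # 0.0
```

The nice property is that a flaky external service degrades the
aggregate score instead of breaking report generation.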
Lesson: Use validator plugins, but review them regularly.
---
Again, thanks for reading, and I'd love to read your feedback on
everything mentioned here.
Best regards,
Sam
--
Sam Thursfield (he/him), Software Engineer
Codethink Ltd. http://www.codethink.co.uk/