Re: [tsf-dev] TSF process feedback, part 4: Change management, RAFIA
Hello,
Thank you for the much appreciated feedback.
The discussions in your linked emails have been thoughtful, and I'd like
to highlight two points:
1. The score
2. Complexity of the graph and how it can be managed
## The score
Since you asked for links, there have been numerous discussions about
the score, some of which are captured in
https://gitlab.eclipse.org/eclipse/tsf/tsf/-/issues/74. That said, I
would still emphasise that the discussions in the email threads related
to your feedback have been accurate. Namely, there can be little value
in the aggregated scores if the project using them is not clear on how
it is managing the graph (more on that in the second point).
The clearest way to understand how to manage the graph in terms of its
score is to understand who the score is for, as was rightly discussed
in those threads. This leads to another open topic: how to use TSF to aid
decision-making, which also has an issue written up in
https://gitlab.eclipse.org/eclipse/tsf/tsf/-/issues/33.
I'd like to clarify that the algorithm for the score is not complicated;
the expressions in the documentation can look intimidating if users
choose to expand them into more formal mathematical and logical
notation, but the underlying idea is simple.
It is not quite Markov chain-like, as was discussed, but rather a simple
recursive weighted aggregation that averages child values in a DAG.
There are two caveats: completeness and correctness (explained in the
roadmap at https://pages.eclipse.dev/eclipse/tsf/tsf/model/scoring/roadmap.html)
should be multiplied together before aggregating, which the tooling does
not yet expose as well as it should, and weighting is being explored in
https://gitlab.eclipse.org/eclipse/tsf/tsf/-/merge_requests/591. The
algorithms in the documentation
(https://pages.eclipse.dev/eclipse/tsf/tsf/model/scoring/implementation.html)
are essentially implementation details for performing this aggregation
with better performance. They used to show how we performed the
aggregation through finite matrix powers to utilise BLAS acceleration;
more recently we have switched to a dynamic programming approach, as it
is more appropriate.
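To make that concrete, here is a minimal sketch of that kind of
aggregation. The toy graph, field names, and equal weighting are
illustrative assumptions for this email, not the actual trudag data
model or scoring code:

```python
# Minimal sketch of recursive weighted aggregation over a DAG
# (illustrative only: node structure, field names, and equal weighting
# are assumptions, not the trudag data model or scoring implementation).
from functools import lru_cache

# A toy DAG: each node has completeness and correctness in [0, 1] plus children.
graph = {
    "TA-01": {"completeness": 1.0, "correctness": 1.0, "children": ["ST-01", "ST-02"]},
    "ST-01": {"completeness": 0.9, "correctness": 0.8, "children": []},
    "ST-02": {"completeness": 1.0, "correctness": 0.6, "children": ["ST-03"]},
    "ST-03": {"completeness": 0.7, "correctness": 1.0, "children": []},
}

@lru_cache(maxsize=None)
def score(node_id: str) -> float:
    """A leaf's score is completeness * correctness; a parent's score is
    its own factor times the (equally weighted) average of its children's
    scores."""
    node = graph[node_id]
    own = node["completeness"] * node["correctness"]
    children = node["children"]
    if not children:
        return own
    return own * sum(score(c) for c in children) / len(children)

print(round(score("TA-01"), 3))
```

The memoisation is what makes this the dynamic programming view: each
node's score is computed once, no matter how many parents reference it.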
I'd also like to point out that sensitivity analysis can provide useful
insights into the score, and it is now being enabled by default in the
tooling: https://gitlab.eclipse.org/eclipse/tsf/tsf/-/issues/74 (where
the implementation became more performant thanks to the algorithm
change).
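As a rough illustration of what such sensitivity analysis can look
like (again an assumption-laden sketch built on the toy `score()`
above, not the tooling's implementation), one can perturb each leaf
and observe how much the root score moves:

```python
# Naive one-at-a-time sensitivity sketch built on the score() function
# above (illustrative only; not how the tooling implements it).
def sensitivity(root: str, delta: float = 0.05) -> dict[str, float]:
    """Perturb each leaf's correctness by `delta` and report how much
    the root score moves. Larger movement means the root score is more
    sensitive to that leaf."""
    baseline = score(root)
    impact = {}
    for node_id, node in graph.items():
        if node["children"]:
            continue  # only perturb leaves in this toy example
        original = node["correctness"]
        node["correctness"] = min(1.0, original + delta)
        score.cache_clear()  # memoised values are stale after the change
        impact[node_id] = score(root) - baseline
        node["correctness"] = original
    score.cache_clear()
    return impact

print(sensitivity("TA-01"))
```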
## Graph complexity
After clarifying the scoring mechanism and following the discussions,
you'll see that the score does not drive how TSF should be used. It is
quite easy to go overboard with TSF: any statement, link, reference, or
validator you add increases the maintenance burden, as you've
highlighted. It is up to adopters to justify that maintenance burden
with the value TSF provides (and, as you've noted, TSF does still
justify its use with the value it brings).
As discussed, it is easy to keep adding more and more content to TSF. It
is up to the project (and the people using and consuming it) to decide
what the most critical aspects are to improve next (not only with regard
to TSF use but also how to balance TSF improvements against improvements
in the software that is being developed).
One interesting way the graph can indicate what to improve next is if
references and processes are set up such that TSF acts as an alarm
system. However, I don't think this is always optimal, as those alarm
triggers could often be handled earlier in the pipeline, with TSF
outputs then generated in CI once they pass. That said, how you organise
TSF generation (per MR, daily, weekly, etc.) is ultimately up to each
project.
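As one possible shape for such an alarm, a project could gate CI on a
score threshold. The file name, JSON format, and threshold below are
purely hypothetical conventions a project might choose for its own
pipeline; trudag does not produce this artefact in this form:

```python
# Sketch of a CI "alarm" gate (illustrative; the score file name and
# JSON layout are assumptions about how a project might publish its
# own aggregated score, not something trudag emits as such).
import json
import sys

THRESHOLD = 0.75               # project-chosen minimum acceptable root score
SCORE_FILE = "tsf-score.json"  # hypothetical artefact written earlier in the pipeline

def main() -> int:
    with open(SCORE_FILE) as f:
        current = json.load(f)["root_score"]
    if current < THRESHOLD:
        print(f"TSF alarm: root score {current:.2f} fell below {THRESHOLD:.2f}")
        return 1  # non-zero exit fails the CI job
    print(f"TSF root score {current:.2f} is within the acceptable range")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```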
TSF could provide more guidance upstream on how to use it effectively as
contributors gain more experience. A simple starting point could be to
follow general principles related to Claim–Argument–Evidence and
Mutually Exclusive, Collectively Exhaustive heuristics, which are
broadly useful for structuring arguments. More sophisticated frameworks,
such as Goal Structuring Notation, could also be used to reason about
whether the data tracked via TSF is improving. Before we recommend any
of that upstream, though, I think it would be best to also reason about
why a particular piece of guidance on how to use TSF is appropriate and
better than the known alternatives.
At the start of this year, TSF began exploring migrating the tooling to
Rust: https://pages.eclipse.dev/eclipse/tsf/tsf/tools/rust-tooling.html.
This is ongoing, as we are also exploring ways to build more guidance
and guardrails into new tools (an example is in
https://gitlab.eclipse.org/eclipse/tsf/tsf/-/merge_requests/596), while
continuing to maintain the existing Python `trudag` tool and gaining
further experience on how best to use TSF.
Best wishes,
Kaspar
On 2026-04-24 11:40, Sam Thursfield via tsf-dev wrote:
Hello,
Here's the final instalment of my reflections on using TSF's processes.
In CTRL we use the following things that you know from TSF:
1. A set of Statements which present an assurance case, which is
managed with the trudag tool.
2. References as evidence for the statements.
3. Evaluation of the statements by subject matter experts (SMEs).
4. Evaluation of the statements using validator scripts.
5. Aggregate scores.
6. Monitoring when evidence changes.
7. Modifying the statements via Gitlab.
8. The RAFIA process and STPA.
This mail covers items 7 and 8. If you missed the first three parts,
you can find them here in the archives:
https://www.eclipse.org/lists/tsf-dev/msg00037.html
https://www.eclipse.org/lists/tsf-dev/msg00040.html
https://www.eclipse.org/lists/tsf-dev/msg00043.html
Modifying the statements via Gitlab
-----------------------------------
All of these mails are building to the key selling point of TSF and the
reason we use it: our assurance case is stored in Git, alongside the
product we are building, and the two evolve together.
As a software engineer, I consider this approach best practice for
maintaining documentation, but I understand that in the world of
compliance it's still something of a stretch goal. We do manage to
release our product every month with an up-to-date assurance case,
and of course I recommend everyone try this approach.
We use Gitlab to review changes and we follow the
approach documented here:
<https://pages.eclipse.dev/eclipse/tsf/tsf/extensions/management.html>.
Here are some of the lessons we've learned along the way.
Firstly, just keeping something in Git doesn't mean engineers will
update it, especially for things like images and diagrams that aren't
easily reviewable or searchable as plain text. As a maintainer you need
to keep your eye on these as always.
We use Gitlab to review changes, and our CI runs `trudag manage lint`
on each MR to highlight any suspect items or links. In fact, we had
to extend this. We also run a `trudag-diff` job which does the
following:
* Lint the graph on the 'main' branch
* Lint the graph on the merge request candidate branch
* Display errors from candidate branch that are *not* present in
'main'.
This is needed because of external references: some statements may
be marked as Suspect through no fault of the MR author. And, since
Trudag fetches the external references each time it runs, the output
of `trudag manage lint` may be different each time you run it, even
on the same Git commit! This is counter-intuitive and the `trudag-diff`
job is a helpful workaround.
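For anyone curious, the core of the job is small. The sketch below is a
minimal illustration of the idea rather than our actual job; it assumes
lint findings can be compared one line at a time, which may not match
the real `trudag manage lint` output format exactly, and it takes the
refs to compare as explicit arguments:

```python
# Minimal sketch of the trudag-diff idea: report only the lint findings
# that appear on the candidate branch but not on the target branch.
import subprocess
import sys

def lint_findings(ref: str) -> set[str]:
    """Check out `ref` and collect the lint output as a set of lines."""
    subprocess.run(["git", "checkout", "--quiet", ref], check=True)
    result = subprocess.run(
        ["trudag", "manage", "lint"], capture_output=True, text=True
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}

def main() -> int:
    target_ref, candidate_ref = sys.argv[1], sys.argv[2]
    baseline = lint_findings(target_ref)
    new_findings = lint_findings(candidate_ref) - baseline
    for finding in sorted(new_findings):
        print(finding)
    return 1 if new_findings else 0  # fail the job only on *new* findings

if __name__ == "__main__":
    sys.exit(main())
```

In a GitLab CI merge request pipeline the refs could be passed as, for
example, `origin/$CI_MERGE_REQUEST_TARGET_BRANCH_NAME` and
`$CI_COMMIT_SHA`.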
The `trudag-diff` job is helpful to show where re-review is needed.
For example, an engineer changes a config file that is referenced in
a statement, and the statement has an SME evaluation. Trudag flags
the statement as suspect: the SME needs to review it and confirm
the statement is still true.
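The mechanism behind this is easy to picture as fingerprinting: the
review records a fingerprint of the referenced evidence, and a later
mismatch means re-review is needed. The snippet below is only a generic
illustration with invented field names and content; it is not the
`.dotstop.dot` format or Trudag's implementation:

```python
# Generic illustration of the "suspect" mechanism: a statement's review
# records a fingerprint of the referenced evidence; if the evidence
# changes, the fingerprint no longer matches and re-review is needed.
# Field names and content are invented for this example.
import hashlib

def fingerprint(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

# Hypothetical evidence as it looked when the SME reviewed the statement,
# and as it looks now after an engineer's change.
reviewed_evidence = b"timeout = 30\n"
current_evidence = b"timeout = 60\n"

statement = {
    "id": "ST-42",  # hypothetical statement id
    "reviewed_fingerprint": fingerprint(reviewed_evidence),
}

def is_suspect(stmt: dict, evidence: bytes) -> bool:
    """A statement becomes suspect when its referenced evidence no longer
    matches the fingerprint recorded at review time."""
    return fingerprint(evidence) != stmt["reviewed_fingerprint"]

print(is_suspect(statement, current_evidence))  # True: SME re-review needed
```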
Since Trudag doesn't integrate with Gitlab, we do this as follows
for internal references:
1. First the engineer marks the statement as reviewed (even though
they are NOT necessarily the subject-matter expert), and records
that in the `.dotstop.dot` file.
2. Then, a Trustable reviewer checks the `trudag-diff` job and notices
   that a statement with an SME review has changed. They tag the SME
   on Gitlab.
3. The SME reviews the evidence and the statement and replies on
Gitlab.
This approach has the positive effect that our graph is reviewed
regularly, and the SME reviews give additional confidence on code
changes, so we rarely have to revert bad or unwanted changes later.
The downside is it does slow down development. Sometimes an SME
is unavailable, and you have to decide whether to wait for them to
return, or have someone else take over their score. It can lead
to four reviewers being called in over a one-line change that just
removes whitespace. And large changes that touch many files can
be much more expensive to land.
This is a problem that all software projects have, and my only advice
is to be flexible. TSF does allow merging changes *before* the
corresponding statements are reviewed, with a corresponding score of
0 for those changes. Sometimes that might be the right choice.
And, of course, the smaller your graph and the fewer references, the
lower the cost. It pays to be minimal!
RAFIA and STPA
--------------
My colleagues published an excellent article on this topic recently, so
instead of repeating that here I'll just share the link:
https://www.codethink.co.uk/articles/building-on-stpa/
---
So, this ends my "lessons learned" mail; I'm interested to hear
feedback from others. I know some of these issues are documented in
Gitlab already; in fact, I'd be grateful if readers could reply with
links to any relevant issues they're aware of.
Best regards,
Sam