Re: [tsf-dev] TSF process feedback, part 4: Change management, RAFIA
Hello,
Thank you for the much appreciated feedback.
The discussions in your linked emails have been thoughtful, and I'd like
to highlight two points:
1. The score
2. Complexity of the graph and how it can be managed
## The score
Since you asked for links, there have been numerous discussions about
the score, some of which are captured in
https://gitlab.eclipse.org/eclipse/tsf/tsf/-/issues/74. That said, I
would still emphasise that the discussions in the email threads related
to your feedback have been accurate. Namely, there can be little value
in the aggregated scores if the project using them is not clear on how
it is managing the graph (more on that in the second point).
The clearest way to understand how to manage the graph in terms of its
score is to understand who the score is for, as was rightly discussed
in those threads. This leads to another open topic: how to use TSF to aid
decision-making, which also has an issue written up in
https://gitlab.eclipse.org/eclipse/tsf/tsf/-/issues/33.
I'd like to clarify that the algorithm for the score is not complicated;
the expressions in the documentation can look intimidating if users
choose to expand them into more formal mathematical and logical
notation, but the underlying idea is simple.
It is not quite Markov chain-like, as was discussed, but rather a simple
recursive weighted aggregation that averages child values in a DAG.
There are two caveats: completeness and correctness (explained in the
roadmap at https://pages.eclipse.dev/eclipse/tsf/tsf/model/scoring/roadmap.html)
should be multiplied together before aggregating, which the tooling does
not yet expose as well as it should, and weighting is being explored in
https://gitlab.eclipse.org/eclipse/tsf/tsf/-/merge_requests/591. The
algorithms in the documentation
(https://pages.eclipse.dev/eclipse/tsf/tsf/model/scoring/implementation.html)
are essentially implementation details for performing this aggregation
with better performance. They used to show how we performed the
aggregation through finite matrix powers to utilise BLAS acceleration;
more recently we have switched to a dynamic programming approach, as it
is more appropriate.
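To make that concrete, here is a minimal sketch of that kind of
aggregation. The toy graph, field names, and equal weighting are
illustrative assumptions for this email, not the actual trudag data
model or scoring code:

```python
# Minimal sketch of recursive weighted aggregation over a DAG
# (illustrative only: node structure, field names, and equal weighting
# are assumptions, not the trudag data model or scoring implementation).
from functools import lru_cache

# A toy DAG: each node has completeness and correctness in [0, 1] plus children.
graph = {
    "TA-01": {"completeness": 1.0, "correctness": 1.0, "children": ["ST-01", "ST-02"]},
    "ST-01": {"completeness": 0.9, "correctness": 0.8, "children": []},
    "ST-02": {"completeness": 1.0, "correctness": 0.6, "children": ["ST-03"]},
    "ST-03": {"completeness": 0.7, "correctness": 1.0, "children": []},
}

@lru_cache(maxsize=None)
def score(node_id: str) -> float:
    """A leaf's score is completeness * correctness; a parent's score is
    its own factor times the (equally weighted) average of its children's
    scores."""
    node = graph[node_id]
    own = node["completeness"] * node["correctness"]
    children = node["children"]
    if not children:
        return own
    return own * sum(score(c) for c in children) / len(children)

print(round(score("TA-01"), 3))
```

The memoisation is what makes this the dynamic programming view: each
node's score is computed once, no matter how many parents reference it.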
I'd also like to point out that sensitivity analysis can provide useful
insights into the score, and it is now being enabled by default in the
tooling: https://gitlab.eclipse.org/eclipse/tsf/tsf/-/issues/74 (where
the implementation became more performant thanks to the algorithm
change).
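As a rough illustration of what such sensitivity analysis can look
like (again an assumption-laden sketch built on the toy `score()`
above, not the tooling's implementation), one can perturb each leaf
and observe how much the root score moves:

```python
# Naive one-at-a-time sensitivity sketch built on the score() function
# above (illustrative only; not how the tooling implements it).
def sensitivity(root: str, delta: float = 0.05) -> dict[str, float]:
    """Perturb each leaf's correctness by `delta` and report how much
    the root score moves. Larger movement means the root score is more
    sensitive to that leaf."""
    baseline = score(root)
    impact = {}
    for node_id, node in graph.items():
        if node["children"]:
            continue  # only perturb leaves in this toy example
        original = node["correctness"]
        node["correctness"] = min(1.0, original + delta)
        score.cache_clear()  # memoised values are stale after the change
        impact[node_id] = score(root) - baseline
        node["correctness"] = original
    score.cache_clear()
    return impact

print(sensitivity("TA-01"))
```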
## Graph complexity
After clarifying the scoring mechanism and following the discussions,
you'll see that the score does not drive how TSF should be used. It is
quite easy to go overboard with TSF: any statement, link, reference, or
validator you add increases the maintenance burden, as you've
highlighted. It is up to adopters to justify that maintenance burden
with the value TSF provides (and, as you've noted, TSF does still
justify its use with the value it brings).
As discussed, it is easy to keep adding more and more content to TSF. It
is up to the project (and the people using and consuming it) to decide
what the most critical aspects are to improve next (not only with regard
to TSF use but also how to balance TSF improvements against improvements
in the software that is being developed).
One interesting way the graph can indicate what to improve next is if
references and processes are set up such that TSF acts as an alarm
system. However, I don't think this is always optimal, as those alarm
triggers could often be handled earlier in the pipeline, with TSF
outputs then generated in CI once they pass. That said, how you organise
TSF generation (per MR, daily, weekly, etc.) is ultimately up to each
project.
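As one possible shape for such an alarm, a project could gate CI on a
score threshold. The file name, JSON format, and threshold below are
purely hypothetical conventions a project might choose for its own
pipeline; trudag does not produce this artefact in this form:

```python
# Sketch of a CI "alarm" gate (illustrative; the score file name and
# JSON layout are assumptions about how a project might publish its
# own aggregated score, not something trudag emits as such).
import json
import sys

THRESHOLD = 0.75               # project-chosen minimum acceptable root score
SCORE_FILE = "tsf-score.json"  # hypothetical artefact written earlier in the pipeline

def main() -> int:
    with open(SCORE_FILE) as f:
        current = json.load(f)["root_score"]
    if current < THRESHOLD:
        print(f"TSF alarm: root score {current:.2f} fell below {THRESHOLD:.2f}")
        return 1  # non-zero exit fails the CI job
    print(f"TSF root score {current:.2f} is within the acceptable range")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```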
TSF could provide more guidance upstream on how to use it effectively as
contributors gain more experience. A simple starting point could be to
follow general principles related to Claim–Argument–Evidence and
Mutually Exclusive, Collectively Exhaustive heuristics, which are
broadly useful for structuring arguments. More sophisticated frameworks,
such as Goal Structuring Notation, could also be used to reason about
whether the data tracked via TSF is improving. Before we recommend any
of that upstream, though, I think it would be best to also reason about
why a particular piece of guidance on how to use TSF is appropriate and
better than the known alternatives.
At the start of this year, TSF began exploring migrating the tooling to
Rust: https://pages.eclipse.dev/eclipse/tsf/tsf/tools/rust-tooling.html.
This is ongoing, as we are also exploring ways to build more guidance
and guardrails into new tools (an example is in
https://gitlab.eclipse.org/eclipse/tsf/tsf/-/merge_requests/596), while
continuing to maintain the existing Python `trudag` tool and gaining
further experience on how best to use TSF.
Best wishes,
Kaspar
On 2026-04-24 11:40, Sam Thursfield via tsf-dev wrote:
Hello,
Here's the final instalment of my reflections on using TSF's processes.
In CTRL we use the following things that you know from TSF:
1. A set of Statements which present an assurance case, which is
managed with the trudag tool.
2. References as evidence for the statements.
3. Evaluation of the statements by subject matter experts (SMEs).
4. Evaluation of the statements using validator scripts.
5. Aggregate scores.
6. Monitoring when evidence changes.
7. Modifying the statements via Gitlab.
8. The RAFIA process and STPA.
This mail covers items 7 and 8. If you missed the first three parts,
you can find them here in the archives:
https://www.eclipse.org/lists/tsf-dev/msg00037.html
https://www.eclipse.org/lists/tsf-dev/msg00040.html
https://www.eclipse.org/lists/tsf-dev/msg00043.html
Modifying the statements via Gitlab
-----------------------------------
All of these mails are building to the key selling point of TSF and the
reason we use it: our assurance case is stored in Git, alongside the
product we are building, and the two evolve together.
As a software engineer, I consider this approach best practice for
maintaining documentation, but I understand that in the world of
compliance it's still something of a stretch goal. We do manage to
release our product every month with an up-to-date assurance case,
and of course I recommend everyone try this approach.
We use Gitlab to review changes and we follow the
approach documented here:
<https://pages.eclipse.dev/eclipse/tsf/tsf/extensions/management.html>.
Here are some of the lessons we've learned along the way.
Firstly, just keeping something in Git doesn't mean engineers will
update it, especially for things like images and diagrams that aren't
easily reviewable or searchable as plain text. As a maintainer you need
to keep your eye on these as always.
We use Gitlab to review changes, and our CI runs `trudag manage lint`
on each MR to highlight any suspect items or links. In fact, we had
to extend this. We also run a `trudag-diff` job which does the
following:
* Lint the graph on the 'main' branch
* Lint the graph on the merge request candidate branch
* Display errors from candidate branch that are *not* present in
'main'.
This is needed because of external references: some statements may
be marked as Suspect through no fault of the MR author. And, since
Trudag fetches the external references each time it runs, the output
of `trudag manage lint` may be different each time you run it, even
on the same Git commit! This is counter-intuitive and the `trudag-diff`
job is a helpful workaround.
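For anyone curious, the core of the job is small. The sketch below is a
minimal illustration of the idea rather than our actual job; it assumes
lint findings can be compared one line at a time, which may not match
the real `trudag manage lint` output format exactly, and it takes the
refs to compare as explicit arguments:

```python
# Minimal sketch of the trudag-diff idea: report only the lint findings
# that appear on the candidate branch but not on the target branch.
import subprocess
import sys

def lint_findings(ref: str) -> set[str]:
    """Check out `ref` and collect the lint output as a set of lines."""
    subprocess.run(["git", "checkout", "--quiet", ref], check=True)
    result = subprocess.run(
        ["trudag", "manage", "lint"], capture_output=True, text=True
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}

def main() -> int:
    target_ref, candidate_ref = sys.argv[1], sys.argv[2]
    baseline = lint_findings(target_ref)
    new_findings = lint_findings(candidate_ref) - baseline
    for finding in sorted(new_findings):
        print(finding)
    return 1 if new_findings else 0  # fail the job only on *new* findings

if __name__ == "__main__":
    sys.exit(main())
```

In a GitLab CI merge request pipeline the refs could be passed as, for
example, `origin/$CI_MERGE_REQUEST_TARGET_BRANCH_NAME` and
`$CI_COMMIT_SHA`.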
The `trudag-diff` job is helpful to show where re-review is needed.
For example, an engineer changes a config file that is referenced in
a statement, and the statement has an SME evaluation. Trudag flags
the statement as suspect: the SME needs to review it and confirm
the statement is still true.
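The mechanism behind this is easy to picture as fingerprinting: the
review records a fingerprint of the referenced evidence, and a later
mismatch means re-review is needed. The snippet below is only a generic
illustration with invented field names and content; it is not the
`.dotstop.dot` format or Trudag's implementation:

```python
# Generic illustration of the "suspect" mechanism: a statement's review
# records a fingerprint of the referenced evidence; if the evidence
# changes, the fingerprint no longer matches and re-review is needed.
# Field names and content are invented for this example.
import hashlib

def fingerprint(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

# Hypothetical evidence as it looked when the SME reviewed the statement,
# and as it looks now after an engineer's change.
reviewed_evidence = b"timeout = 30\n"
current_evidence = b"timeout = 60\n"

statement = {
    "id": "ST-42",  # hypothetical statement id
    "reviewed_fingerprint": fingerprint(reviewed_evidence),
}

def is_suspect(stmt: dict, evidence: bytes) -> bool:
    """A statement becomes suspect when its referenced evidence no longer
    matches the fingerprint recorded at review time."""
    return fingerprint(evidence) != stmt["reviewed_fingerprint"]

print(is_suspect(statement, current_evidence))  # True: SME re-review needed
```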
Since Trudag doesn't integrate with Gitlab, we do this as follows
for internal references:
1. First the engineer marks the statement as reviewed (even though
they are NOT necessarily the subject-matter expert), and records
that in the `.dotstop.dot` file.
2. Then, a Trustable reviewer checks the `trudag-diff` job and notices
   that a statement with an SME review has changed. They tag the SME
   on Gitlab.
3. The SME reviews the evidence and the statement and replies on
Gitlab.
This approach has the positive effect that our graph is reviewed
regularly, and the SME reviews give additional confidence on code
changes, so we rarely have to revert bad or unwanted changes later.
The downside is it does slow down development. Sometimes an SME
is unavailable, and you have to decide whether to wait for them to
return, or have someone else take over their score. It can lead
to four reviewers being called in over a one-line change that just
removes whitespace. And large changes that touch many files can
be much more expensive to land.
This is a problem that all software projects have, and my only advice
is to be flexible. TSF does allow merging changes *before* the
corresponding statements are reviewed, with a corresponding score of
0 for those changes. Sometimes that might be the right choice.
And, of course, the smaller your graph and the fewer references, the
lower the cost. It pays to be minimal!
RAFIA and STPA
--------------
My colleagues published an excellent article on this topic recently, so
instead of repeating that here I'll just share the link:
https://www.codethink.co.uk/articles/building-on-stpa/
---
So, this ends my "lessons learned" mail; I'm interested to hear
feedback from others. I know some of these issues are documented in
Gitlab already; in fact, I'd be grateful if readers could reply with
links to any relevant issues they're aware of.
Best regards,
Sam