Strategies for shutting down complex component hierarchy [message #1836986] |
Wed, 20 January 2021 10:19  |
Eclipse User |
Dear TITAN community,
At Osmocom, we have one problem that hits us again and again over the years: how to safely shut down a complex hierarchy of components exchanging messages, without running into Dynamic Test Case Errors during teardown.
In one of our typical test cases, we set up dozens to sometimes hundreds of components, most of them with internal ports connected between components. Maybe only half of those components are actual test cases; the others are just emulating some underlying protocol stack, or in some way facilitating the connection between the IUT and the tests.
At some point, the actual test case components are terminating, and let's assume the test concluded successfully. All components up to now have either verdict "none" or "pass".
Then, the runtime starts stopping the various components, in whatever order. At this point it may happen that one of the components is still processing some message and sends it to another component that has already been shut down -> boom. A Dynamic Test Case Error is reported, and the overall verdict becomes "error", even though all of the tests had passed before.
Originally I had hoped that it is sufficient to simply make sure that all external ports like IPL4asp are closed first, so no external messages can be received anymore. That helps, but...
This can also be triggered by timeouts, as many protocol layers have internal timers for sending keep-alives. If such a timer fires while whatever is the next component has already been terminated -> boom.
We already tried to do an explicit "all component.stop()", but it doesn't help either.
The most obvious solution to this problem would be to either have
* some way to "lock" the current verdict, i.e. whatever happens beyond this point can no longer affect the overall verdict. The test author could simply put that "lock" instruction at the point where he knows that anything relevant to his test has finished, and everything else happening thereafter is irrelevant for the overall verdict and must be ignored
or
* some way to reliably prevent all components from sending further messages through their ports
I'm somewhat surprised that there appears to be no obvious solution to the problem.
Sure, in "textbook TTCN3" you would probably normally have all of this complex non-testcase code as part of the "system simulator" which resides outside of your TTCN3 runtime and hence on the "other" side of the test ports.
However, in TITAN with its capacity for "internal" ports between components, it is much more likely that there is a lot of code that just provides underlying logic/transport for the actual test cases. Even Ericsson has released such code like the SCCP_Emulation - so it seems to be an acceptable programming paradigm.
I also thought it would be possible to designate entire component (types) as "not relevant to the final verdict", but that does not help. You still want to be able to catch violations of lower-layer protocols inside those "emulation" components and do a setverdict(fail) if you encounter such a problem.
How do other people solve this? How do you suggest to proceed?
Thanks,
Harald
[Updated on: Tue, 30 November 2021 05:24] by Moderator
Re: Strategies for shutting down complex component hierarchy [message #1837003 is a reply to message #1836990] |
Wed, 20 January 2021 15:12   |
Eclipse User |
Gábor Szalai wrote on Wed, 20 January 2021 16:43:
In our internal simulator, solved the issue using a controller component.
Yes, it is obviously possible to do this. But it is a *lot* of effort to implement such a port in every component, and repeat the related code over and over again. To me, that is a very expensive work-around for something that sounds relatively easy to do within the runtime.
We often have relatively complex/deep hierarchies, so it would be very difficult for one central component to even know all other components / component references. There is normally no need for that kind of knowledge. Yes, one could delegate the task of forwarding/flooding this "shutdown" request across the hierarchy. But really? Repeating the same code again and again?
Furthermore, not all components we use are developed by Osmocom. Take for example the SCCP_Emulation we use from the TITAN project. Adding such a "SHUTDOWN" interface would mean we'd have to fork every 3rd party component we use, maintain our own patchset on top, ...
If there was some generic support by the TITAN runtime, one would reduce a lot of explicit extra code in every TITAN component out there and avoid all of the extra effort.
I still have a hard time believing that this has not been solved before in a "one size fits all" approach, for everyone. I hoped I simply had missed this feature. Given that you also seem to create elaborate work-arounds, it seems like it really doesn't exist :(
Do you think it's feasible to implement either of the approaches I suggested? I'm not familiar with the TITAN internals, but I might be able to have a look at how verdicts are collected and try to implement the "verdict freeze".
Re: Strategies for shutting down complex component hierarchy [message #1848388 is a reply to message #1848380] |
Tue, 30 November 2021 09:05   |
Eclipse User |
Hi Olaf,
Olaf Bergengruen wrote on Tue, 30 November 2021 13:18:
Hi all,
For this reason, at the beginning of each test case we ask the user to switch off the UE (and/or take the batteries out, clean the SIM card) and switch it on again, and the complete TTCN executable is started again and all HW is reset.
To clarify: This topic is not about resetting state in the IUT. It is about errors occurring in the ATS (TTCN3 test suite) after the actual testcases have completed. Those errors occur with a certain probability due to a complex component hierarchy with e.g. timers triggering messages between ATS components. So while the MTC starts to stop, or even during "all component.stop", the usual recipient of some internal message suddenly no longer exists, as the recipient has been stopped before the sender. -> boom.
As there is no way to "atomically" stop all components (and the shutdown order of components being non-deterministic) there is always a certain probability that one of those components is creating a DTE at some point during the shutdown process.
We have been seeing this ever since we started to use TITAN years ago, and it is the single most constant annoyance during all those years.
If the ATS has reached a state where the end of the MTC is reached in "pass", then nothing happening during shutdown should still negatively affect the test result. It is guaranteed to be a non-issue.
I think the same problem must appear in any reasonably complex test suite with a component hierarchy where parts of the ATS implement various layers of protocol stacks. There are timers, asynchronous messaging, etc. happening in this stack, and without a way to atomically shut all of them down, it is impossible to guarantee that no problem will happen during shutdown.
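The race described in this post can be illustrated with a minimal sketch. It assumes TITAN internal ports; every name in it (`ShutdownRaceSketch`, `MsgPT`, `Node_CT`, `f_sender`, `f_receiver`, `TC_race`) is hypothetical and not taken from any Osmocom or TITAN module. Whether the error actually shows up in a given run depends on timing.

```ttcn3
module ShutdownRaceSketch {

// Hypothetical internal port carrying plain strings between components.
type port MsgPT message { inout charstring; } with { extension "internal" }

type component Node_CT {
  port MsgPT PEER;
  timer T_keepalive := 0.1;
}

type component MTC_CT {}

// Stands in for a protocol layer with a keep-alive timer.
function f_sender() runs on Node_CT {
  T_keepalive.start;
  while (true) {
    T_keepalive.timeout;
    // If the peer has already terminated, this send can raise a DTE.
    PEER.send("KEEPALIVE");
    T_keepalive.start;
  }
}

// Stands in for the actual test logic: passes after the first message.
function f_receiver() runs on Node_CT {
  PEER.receive(charstring:?);
  setverdict(pass);
}

testcase TC_race() runs on MTC_CT {
  var Node_CT vc_tx := Node_CT.create("tx");
  var Node_CT vc_rx := Node_CT.create("rx");
  connect(vc_tx:PEER, vc_rx:PEER);
  vc_rx.start(f_receiver());
  vc_tx.start(f_sender());
  vc_rx.done;
  setverdict(pass);
  // The test is logically over here, but "tx" keeps running until the
  // runtime stops it. A keep-alive fired in that window has no recipient.
}

}
```

In this sketch every local verdict is "pass" at the moment the receiver finishes; only the leftover keep-alive can still turn the overall verdict into "error".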
Re: Strategies for shutting down complex component hierarchy [message #1848572 is a reply to message #1848545] |
Wed, 08 December 2021 06:27   |
Eclipse User |
Hi Olaf,
thanks a lot for your follow-up, it is much appreciated. I was not aware of the "all component.stop" and "any component.killed" constructs. We will investigate it.
From what I can understand, the only way this construct would improve the situation is if the notification of stopping/killing a component were processed at higher priority than any other messages on internal test ports between components.
I would think the severity of the problem depends heavily on the depth and complexity of the component hierarchy. Particularly if you have a lot of "non-testcase logic" implemented in your ATS, i.e. entire protocol stacks as intermediate layers inside the ATS, the probability increases that some timer somewhere expires, causing a message to be sent on an internal test port between components, which in turn may fail because the recipient component was killed in a race condition just before the sender component was killed.
What I'm wondering is: Whatever mechanism is the "best common practice" out there: Why is it not implemented in the official TITAN components, such as for example the titan.ProtocolEmulations.SCCP or .M3UA? I would appreciate if anyone from the TITAN project could comment on that. How is one supposed to prevent any race condition during component shutdown when using those as-is?
Best Regards,
Harald
Re: Strategies for shutting down complex component hierarchy [message #1848620 is a reply to message #1848616] |
Thu, 09 December 2021 16:33  |
Eclipse User |
Hi Gabor,
thanks a lot for your feedback.
Gábor Szalai wrote on Thu, 09 December 2021 19:36:
The shutdown of a complex system should be designed from the beginning.
This seems to be the message I'm getting from various parties here. I still have a bit of a hard time wrapping my head around the _why_. Why would one spend a lot of time on something as mundane as the component shutdown signaling/ordering? After all, at some point the MTC has concluded the test is "over" as it completes. Why can all the other PTCs not simply be terminated automatically/implicitly in any random order, without further impact on the verdict?
I'm trying to understand the benefit of requiring everyone to have complex explicit code for nothing else but making sure no stray message on some random internal test port causes a DTE _after the actual test case (MTC) has concluded_.
From my humble user's point of view, I would expect it to be the task of the language and runtime to enable the test developer to be as productive as possible, spending no unneeded time writing complex code for what happens after the actual test has succeeded (or failed).
Gábor Szalai wrote on Thu, 09 December 2021 19:36:
Please note that the M3UA and SCCP Emulation are designed as standalone components connected with a simple test case. Also they were written about 15 years ago. They can be extended and modified to support more complex systems. The easiest modification is to use a try-catch block to avoid the DTE.
Some thoughts:
* The age of a component/module should not matter, unless the language / runtime has introduced the danger of DTE during shutdown only recently. In fact, an older component could very well be more evolved/mature.
* If every library/module has to deal with the shutdown order, and if libraries/modules/components are supposed to be re-usable across projects and entities, then there must be some kind of standardization for orderly shutdown. Otherwise, no single library/module could ever be re-used in another project, as everyone would come up with their own incompatible strategy of "orderly shutdown".
* in general, irrespective of the programming language, I have a strong resistance against modifying upstream libraries/modules. This introduces additional maintenance for keeping out-of-mainline patches, they need to be forward-ported and re-tested whenever upstream changes -> maintenance nightmare.
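For reference, the try-catch modification quoted above could look roughly like the following sketch. It assumes TITAN's `@try` / `@catch` extension for catching dynamic test case errors, which is not part of standard TTCN-3. The module, port, component and function names are hypothetical and do not come from the SCCP or M3UA emulation.

```ttcn3
module TryCatchSketch {

// Hypothetical internal port carrying plain strings between components.
type port MsgPT message { inout charstring; } with { extension "internal" }

type component Emu_CT {
  port MsgPT PEER;
  timer T_keepalive := 1.0;
}

// Hypothetical keep-alive loop of an emulation component.
function f_emu_keepalive() runs on Emu_CT {
  var boolean v_running := true;
  T_keepalive.start;
  while (v_running) {
    alt {
    [] T_keepalive.timeout {
        @try {
          // May raise a DTE if the peer component is already gone.
          PEER.send("KEEPALIVE");
          T_keepalive.start;
        }
        @catch (err) {
          // err holds the error text; treat a failed send as "shutting down".
          log("keep-alive not sent, assuming shutdown: ", err);
          v_running := false;
        }
      }
    [] PEER.receive(charstring:?) {
        // Normal message handling would go here.
      }
    }
  }
}

}
```

Note that a block like this swallows every dynamic error raised by the send, including ones that occur while the test is still running, and it has to be repeated around each send in each component, which is the duplication objected to above.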
Gábor Szalai wrote on Thu, 09 December 2021 19:36:
Also the DTE during the shutdown can be avoided by using alive-type components. The test ports of alive components are not disconnected/unmapped when the component finishes, only when it is killed or the MTC terminates.
Interesting idea. Maybe that could be a workaround, will investigate.
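A minimal sketch of what the alive-component idea might look like, using the standard `create ... alive` and `kill` operations. All names (`AliveSketch`, `MsgPT`, `Emu_CT`, `Test_CT`, `f_emu`, `TC_alive`) are hypothetical, and whether this removes the shutdown race in a deep hierarchy is exactly what would need investigating.

```ttcn3
module AliveSketch {

// Hypothetical internal port carrying plain strings between components.
type port MsgPT message { inout charstring; } with { extension "internal" }

type component Emu_CT {
  port MsgPT UPPER;
}

type component Test_CT {
  port MsgPT LOWER;
  timer T_guard := 5.0;
}

// Hypothetical emulation layer: echoes one message, then its behaviour ends.
function f_emu() runs on Emu_CT {
  var charstring v_msg;
  UPPER.receive(charstring:?) -> value v_msg;
  UPPER.send(v_msg);
}

testcase TC_alive() runs on Test_CT {
  // "alive" keeps the component in existence after f_emu() returns, so
  // (per the quoted statement) its port connections stay in place.
  var Emu_CT vc_emu := Emu_CT.create("emu") alive;
  connect(self:LOWER, vc_emu:UPPER);
  vc_emu.start(f_emu());

  LOWER.send("PING");
  T_guard.start;
  alt {
  [] LOWER.receive("PING") { setverdict(pass); }
  [] T_guard.timeout { setverdict(fail, "no echo received"); }
  }

  // Explicit teardown once the verdict is settled.
  vc_emu.kill;
}

}
```

The only change compared to a normal PTC is the `alive` keyword at creation and the explicit `kill` at the end, so third-party behaviour functions would not need to be modified for this.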
I still think that some kind of language or runtime support [like the "freezing of verdicts"] would avoid a lot of extra complexity that every developer has to write, and as stated above, which even impairs the effective re-use of existing modules due to lack of standardized ways of handling the orderly shutdown.