Hello,
Currently, bridge connections use a constant delay before retrying when something goes wrong. I would like to help by adding a backoff mechanism, to improve the situation where several thousand Mosquitto instances bridge to the same broker in the cloud. If the cloud broker disconnects all of them at the same time, and they all try to reconnect after "restart_timeout", they generate a load peak (TLS handshakes, for example, are expensive).
Right now I am using "restart_timeout" for the configuration: if one value is given, it behaves as before (constant delay); if two values are given, it uses backoff.
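To make the two modes concrete, here is a minimal sketch in C under assumptions the post does not state: the two values are read as a base delay and a maximum delay in seconds (as in a hypothetical `restart_timeout 5 120`), and the backoff is exponential with "full jitter", i.e. each retry waits a random time between 0 and min(cap, base × 2^attempt). The struct `restart_timeout_cfg` and the function `next_restart_delay()` are hypothetical names, not existing Mosquitto code, and this is not necessarily the algorithm in the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical parsed form of restart_timeout.
 * Assumption: one value -> base only (cap == 0, constant delay);
 * two values -> base delay and maximum delay, both in seconds. */
struct restart_timeout_cfg {
    unsigned int base;
    unsigned int cap; /* 0 means "no backoff" */
};

/* Delay in seconds before reconnection attempt `attempt` (0 = first retry).
 * With backoff enabled this is assumed "full jitter" exponential backoff:
 * a random value in [0, min(cap, base * 2^attempt)]. */
unsigned int next_restart_delay(const struct restart_timeout_cfg *cfg,
                                unsigned int attempt)
{
    unsigned long long ceiling;

    assert(cfg != NULL);
    if (cfg->cap == 0) {
        return cfg->base; /* single value: same behavior as today */
    }

    ceiling = cfg->base;
    /* Double the ceiling once per attempt, stopping as soon as it reaches
     * the cap so the value cannot overflow. */
    for (unsigned int i = 0; i < attempt && ceiling < cfg->cap; i++) {
        ceiling *= 2;
    }
    if (ceiling > cfg->cap) {
        ceiling = cfg->cap;
    }

    /* Uniform in [0, ceiling]; the small modulo bias of rand() is harmless
     * here, since the goal is only to spread reconnects out. */
    return (unsigned int)((unsigned long long)rand() % (ceiling + 1));
}
```

The jitter is what targets the load peak: even if thousands of bridges are disconnected at the same moment, their retries are spread across the whole interval instead of all landing after the same fixed delay.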
I want to contribute this code to the project, but I would like your opinion on:
- Is this a good algorithm choice?
- Is "restart_timeout" the proper place to put the configuration?
- Any suggestions on how to document it?
- I tested it manually, but is there an easy way to test this part? If I'm not mistaken, the current tests run the real broker, so automated testing would mean dealing with waits on the order of seconds, as well as randomness.
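On the testing question, one option that avoids both the real broker and the waits is to keep the delay computation in a pure function and inject the random source, so that a unit test can substitute a deterministic one. This is a minimal sketch assuming such a refactoring; `rand_upto_fn`, `backoff_delay()`, the test doubles and `test_backoff_growth()` are hypothetical names, and the formula is the same assumed "full jitter" variant (base and cap in seconds), not necessarily the one in the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical random source: returns a value in [0, upper].
 * Production code uses rand(); tests plug in something deterministic. */
typedef unsigned int (*rand_upto_fn)(unsigned int upper);

unsigned int rand_upto_libc(unsigned int upper)
{
    return (unsigned int)((unsigned long long)rand() %
                          ((unsigned long long)upper + 1));
}

/* Test double: always picks the upper bound, which exposes the ceiling. */
unsigned int rand_upto_max(unsigned int upper)
{
    return upper;
}

/* Assumed "full jitter" backoff with the randomness injected:
 * returns rnd(min(cap, base * 2^attempt)). */
unsigned int backoff_delay(unsigned int base, unsigned int cap,
                           unsigned int attempt, rand_upto_fn rnd)
{
    unsigned long long ceiling = base;

    for (unsigned int i = 0; i < attempt && ceiling < cap; i++) {
        ceiling *= 2;
    }
    if (ceiling > cap) {
        ceiling = cap;
    }
    return rnd((unsigned int)ceiling);
}

/* Deterministic unit test: no broker, no sleeping, no randomness. */
void test_backoff_growth(void)
{
    assert(backoff_delay(5, 120, 0, rand_upto_max) == 5);
    assert(backoff_delay(5, 120, 1, rand_upto_max) == 10);
    assert(backoff_delay(5, 120, 4, rand_upto_max) == 80);
    assert(backoff_delay(5, 120, 5, rand_upto_max) == 120);  /* capped */
    assert(backoff_delay(5, 120, 50, rand_upto_max) == 120); /* stays capped */
}
```

With this split, the tests that run the real broker would only need to check that the bridge eventually reconnects, while the timing behavior is covered by fast, deterministic unit tests.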
Best,
Abilio Marques