Re: [mosquitto-dev] Bridge connection backoff

Hi Abilio,

This sounds like a sensible approach to me, thanks for suggesting it.
I haven't seen the decorrelated jitter approach before, but the
explanation makes sense.

To answer your other questions: yes, restart_timeout seems the best
place to put it. Documentation would be best in both the example
mosquitto.conf and in man/mosquitto.conf.5.xml. It just needs to cover
the old mode, describe the new mode, and explain how to use it. I don't
think the description needs much detail, just enough to give the user
an idea of what is happening.

On the testing front, if you can make it work in the style of the
current integration-style tests that would be good, but perhaps tricky,
as you suggest. I'm also gradually working towards more unit-like
testing, which might suit this better. There are tests of this sort
in the mqtt5 branch on GitHub.

Regards,

Roger


On Sat, 3 Nov 2018 at 10:42, Abilio Marques <abiliojr@xxxxxxxxx> wrote:
>
> Hello,
>
> Currently, bridge connections use a constant value to retry when something goes wrong. I would like to contribute a backoff mechanism to improve the situation where several thousand mosquitto instances bridge to another broker in the cloud. If the cloud broker disconnects all of them at the same time and they all try to reconnect after "restart_timeout", they generate a load peak (TLS handshakes, for example, are expensive).
>
> To avoid this, I experimented with “Decorrelated Jitter”, as explained in https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/ . It was a simple change, and it can be made configurable.
>
> Right now I am using "restart_timeout" for configuration. If one value is passed, it behaves as before. If two values are passed, it runs using backoff.
>
> I want to contribute this code to the project, but I would like your opinion on:
> - Is this a good algorithm choice?
> - Is "restart_timeout" the proper place to put the configuration?
> - For documentation purposes, any suggestion on how to do it?
> - I tested it manually, but is there an easy way to test this part automatically? If I'm not mistaken, the current tests run the real broker, so automated testing would mean dealing with waits on the order of seconds, and with randomness.
>
> Best,
> Abilio Marques
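For readers following the thread, here is a minimal sketch in C of the scheme discussed above. It is not the code that was contributed. It assumes that a single "restart_timeout" value means a constant delay and that two values mean a base and a cap for the backoff, which is one reading of the proposal. The delay formula, sleep = min(cap, random_between(base, sleep * 3)), is the "Decorrelated Jitter" formula from the linked AWS article. All type and function names below are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names throughout; none are taken from the mosquitto source. */
struct backoff {
    int base;    /* minimum delay in seconds */
    int cap;     /* maximum delay in seconds; cap == base gives a constant delay */
    int current; /* most recently chosen delay */
};

/* Parse "N" (constant delay) or "BASE CAP" (backoff), as assumed above.
 * Returns 0 on success, -1 on malformed or inconsistent input. */
int backoff_parse(const char *value, struct backoff *b)
{
    int base = 0, cap = 0;
    int n = sscanf(value, "%d %d", &base, &cap);

    if (n < 1 || base < 1) return -1;
    if (n == 1) cap = base;      /* one value: old constant behaviour */
    if (cap < base) return -1;

    b->base = base;
    b->cap = cap;
    b->current = base;
    return 0;
}

/* Random integer in [lo, hi]. rand() with modulo is slightly biased and
 * limited by RAND_MAX, which is acceptable for spreading reconnects. */
long long random_between(long long lo, long long hi)
{
    assert(lo <= hi);
    return lo + rand() % (hi - lo + 1);
}

/* Decorrelated jitter: sleep = min(cap, random_between(base, sleep * 3)). */
int backoff_next(struct backoff *b)
{
    long long next = random_between(b->base, (long long)b->current * 3);

    if (next > b->cap) next = b->cap;
    b->current = (int)next;
    return b->current;
}

/* Call after a successful connection so the next failure starts from base. */
void backoff_reset(struct backoff *b)
{
    b->current = b->base;
}
```

A caller would seed the generator once (for example with srand()), call backoff_next() after each failed connection attempt to get the number of seconds to wait, and call backoff_reset() once the bridge connects. Because each broker draws its own random delays, thousands of bridges disconnected at the same moment no longer retry in lockstep.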

