Re: [mosquitto-dev] Bridge connection backoff

Hi Roger,

Thanks for your prompt replies. It motivates me to collaborate with the project.

I have implemented this as a couple of functions, backoff_reset and backoff_step, plus a helper that generates a random number between two values. I put them in bridge.c. Right now the functions update bridge->restart_timeout. They could be called from loop.c, in the same way as the place where restart_t is set to 0. Alternatively, they could be static functions called from within bridge.c itself, depending on the error results of the different calls, which would hide that logic from loop.c entirely. What is your suggestion in this case?
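
For illustration, here is a minimal sketch of how the two functions might look, assuming the decorrelated jitter rule from the AWS article: next = min(cap, random_between(base, previous * 3)). The struct backoff_state and its fields are hypothetical stand-ins, not the actual mosquitto bridge struct, and the helper name random_between is also made up for this example. Only backoff_reset, backoff_step and the idea of updating the restart timeout come from the description above.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the relevant bridge fields (assumption,
 * not the real mosquitto struct). All values are in seconds. */
struct backoff_state {
    int base;    /* lower bound, e.g. first value of restart_timeout */
    int cap;     /* upper bound, e.g. second value of restart_timeout */
    int current; /* delay to use for the next reconnect attempt */
};

/* Helper: random integer in [lo, hi]. Uses rand(), so it has a slight
 * modulo bias, which is acceptable for spreading out reconnects. */
static int random_between(int lo, int hi)
{
    if (hi <= lo) {
        return lo;
    }
    return lo + rand() % (hi - lo + 1);
}

/* Call after a successful connection: go back to the base delay. */
void backoff_reset(struct backoff_state *b)
{
    b->current = b->base;
}

/* Call after a failed attempt: decorrelated jitter step,
 * next = min(cap, random_between(base, previous * 3)). */
void backoff_step(struct backoff_state *b)
{
    int next = random_between(b->base, b->current * 3);

    if (next > b->cap) {
        next = b->cap;
    }
    b->current = next;
}
```

Whether these end up static in bridge.c or callable from loop.c does not change the logic; the sketch only assumes that the caller knows when a connection attempt has failed or succeeded.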

Best,
Abilio M

On Mon, Nov 5, 2018 at 11:35 PM Roger Light <roger@xxxxxxxxxx> wrote:
Hi Abilio,

This sounds like a sensible approach to me, thanks for suggesting it.
I haven't seen the decorrelated jitter approach before, but the
explanation makes sense.

To answer your other questions, yes restart_timeout seems the best
place to put it. Documentation would be best in both the example
mosquitto.conf and in man/mosquitto.conf.5.xml. It just needs to cover
the old mode and give a description of the new mode and how to use it.
No need to go into big detail on the description I don't think - just
something to give the user an idea of what is happening.

On the testing front, if you can make it work in the style of the
current integration type tests that would be good, but perhaps tricky
as you suggest. I'm also working towards more unit like testing
gradually, which might suit this better. There are tests of this sort
in the mqtt5 branch on github.

Regards,

Roger


On Sat, 3 Nov 2018 at 10:42, Abilio Marques <abiliojr@xxxxxxxxx> wrote:
>
> Hello,
>
> Currently the bridge connection uses a constant delay before retrying when something goes wrong. I would like to contribute a backoff mechanism to improve the situation where several thousand mosquitto instances try to bridge to another broker in the cloud. If the cloud broker disconnects all of them at the same time and they all try to reconnect after "restart_timeout", they generate a load peak (TLS, for example, being an expensive operation).
>
> To avoid this, I experimented with "Decorrelated Jitter", as explained in https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/ . It was a simple change, and it can be configurable.
>
> Right now I am using "restart_timeout" for configuration. If one value is passed, it acts as normal. If 2 values are passed, it then runs using backoff.
>
> I want to contribute this code to the project, but I would like your opinion on:
> - Is this a good algorithm choice?
> - Is "restart_timeout" the proper place to put the configuration?
> - For documentation purposes, any suggestion on how to do it?
> - I tested it manually, but is there an easy way to test this part? If I'm not mistaken, the current tests run the real broker, so automated testing would mean dealing with waits on the order of seconds, and with randomness.
>
> Best,
> Abilio Marques
> _______________________________________________
> mosquitto-dev mailing list
> mosquitto-dev@xxxxxxxxxxx
> To change your delivery options, retrieve your password, or unsubscribe from this list, visit
> https://www.eclipse.org/mailman/listinfo/mosquitto-dev