[mosquitto-dev] Time-related bug (of sorts) on Develop branch

Hi,

 

I’ve been working on a project that uses Mosquitto on an embedded ARM system with no RTC directly attached to the Linux system. This has revealed an interesting bug of sorts, introduced by a recent change to the db__new_msg_id() function in src/database.c.

 

To explain: the Linux system has no RTC device. It can obtain the time, but not until a user-space application starts up and establishes a link with another device. Until then, the kernel runs with the clock set according to some very obscure rules that essentially come down to the modification timestamp of one of the files in the kernel source tree at the time the kernel was built.

 

In this instance, that date is December 16th, 2020.

 

In the db__new_msg_id() function, a message id is generated from the system clock (realtime, at nanosecond resolution) so that it is reasonably unique. The problem begins when MOSQ_UUID_EPOCH is subtracted from the seconds value (the current unixtime). MOSQ_UUID_EPOCH is #defined as a unixtime value that corresponds to November 17th 2021.

 

This subtraction results in a very large number, because MOSQ_UUID_EPOCH is greater than the current unixtime. At this point there is no issue, because no previous db_id exists.
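
A minimal sketch of why subtracting a later epoch from an earlier time can give a huge value, assuming the subtraction is done in unsigned 64-bit arithmetic (I have not confirmed the exact types used in db__new_msg_id()). The constants and the seconds_since_epoch() helper are hypothetical: the timestamps are midnight UTC on the dates mentioned above, not the real MOSQ_UUID_EPOCH value, and the real function may combine the result with other fields, so the magnitude it produces will differ.

```c
#include <stdint.h>

/* Illustrative values only: midnight UTC on the dates mentioned above.
 * The real MOSQ_UUID_EPOCH constant in Mosquitto may differ. */
#define EXAMPLE_EPOCH     UINT64_C(1637107200) /* 2021-11-17 00:00:00 UTC */
#define EXAMPLE_BOOT_TIME UINT64_C(1608076800) /* 2020-12-16 00:00:00 UTC */

/* Hypothetical helper: seconds elapsed since the epoch, computed with
 * plain unsigned subtraction. */
static uint64_t seconds_since_epoch(uint64_t now, uint64_t epoch)
{
    /* When now < epoch the result wraps modulo 2^64 and becomes a very
     * large value instead of a negative one. */
    return now - epoch;
}
```

With these values, seconds_since_epoch(EXAMPLE_BOOT_TIME, EXAMPLE_EPOCH) wraps to a value close to the top of the 64-bit range, while any time after the epoch gives a small elapsed-seconds count.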

 

Some time after the startup of Mosquitto, our system connects to the clock source and sets the system time. Now the clock jumps forward from December 16th 2020 to (e.g.) December 8th 2021.

 

Now the calculation is different: MOSQ_UUID_EPOCH is /less/ than the current unixtime and the result is a small number. This in itself is not a problem, but then the function executes this while loop:

 

while ( id <= db.last_db_id ){
    id++;
}

 

This loop goes through about 17 quadrillion iterations trying to increment id from around 15,000,000 to 17,000,000,000,000,000. On a 500MHz single-core embedded processor, this takes a long time. In fact, it looks a lot like a lock-up.

 

I’ve made a temporary change in our local build system to replace the loop above with the following:

 

if ( id <= db.last_db_id )
{
    id = db.last_db_id + 1;
}

 

This has the same effect, but executes in considerably fewer instruction cycles. Since Mosquitto runs in a single thread, there are no concurrency issues around db.last_db_id. I suspect the real problem is the subtraction of MOSQ_UUID_EPOCH, but without knowing the consequences of making more drastic changes I wasn’t willing to poke that bear.
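
To make the equivalence concrete, here is a small sketch that puts the two approaches side by side as standalone functions. The function names are hypothetical and last_db_id is passed as a parameter rather than read from db, so this is an illustration under those assumptions and not the actual Mosquitto code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the original loop: step one at a time until
 * the candidate id exceeds the last id handed out. */
static uint64_t bump_id_loop(uint64_t id, uint64_t last_db_id)
{
    while (id <= last_db_id) {
        id++;
    }
    return id;
}

/* Constant-time version: jump straight past the last id. */
static uint64_t bump_id_direct(uint64_t id, uint64_t last_db_id)
{
    if (id <= last_db_id) {
        id = last_db_id + 1;
    }
    return id;
}
```

Both functions return the same value whenever last_db_id is below UINT64_MAX, but the loop's running time grows with the gap between the two ids, while the direct version is a single compare and add. At last_db_id == UINT64_MAX they differ: the loop never terminates and the direct version wraps to 0.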

 

The change above resolves the apparent lock.

 

Regards

 

Rebecca Gellman

 

 

 

