Re: [paho-dev] python: implicit loop_stop on disconnect, multiple clients, watchdogs

I have been working on adding MQTT support to a project that has used AMQP for years, and
the error recovery has been stumping me.  I am going to check whether this issue is biting me too.
Thanks for bringing this up.

It would make more sense (principle of least surprise) to me that the loop_* controls would be independent of connection state. I don't think connect/disconnect should mess with the loop at all.  One of the features I need is having active subscriptions to more than one broker at a time, so this is doubly relevant.  If one broker is broken, I don't want the disconnect to break the other.  I will get this working with AMQP first, but then immediately turn to do the same with MQTT.



On Mon, Feb 17, 2025 at 10:12 AM Greg Troxel via paho-dev <paho-dev@xxxxxxxxxxx> wrote:
I'm using paho.mqtt.python 2.1.0 (NetBSD 10 amd64, python 3.12).
Generally things work well, but dealing with flaky networks has been a
little challenging.  Reading the docs and code, I'm having trouble
following.  The README points at this list for discussion.

I have a python program to monitor a UPS.  It connects to one broker
(not 5) and sends a json payload (voltage/load etc.) every minute, or on
demand if the payload is interesting.  That's ingested into a bunch of
Home Assistant sensors.  There's nothing interesting MQTT-wise about
this.

I chose loop_start as it seemed the simplest.  So connect_async, then
loop_start, and things are as expected.

I have had some network flakiness, as one would expect from time to
time.  Due to my general paranoia I wrote a watchdog which is cleared by
getting a PUBACK, to try to validate what I care about.  And in part, it
was due to not realizing that there is a built in mechanism that when
the TCP connection times out, paho.mqtt.python will tear down the socket
and make a new one.
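
A minimal sketch of the kind of watchdog described above, not the
author's actual code.  The class and method names are hypothetical.  It
assumes messages are published with qos=1, so that paho's on_publish
callback corresponds to receiving the PUBACK, and that arm() is called
after each publish and clear() from on_publish.

```python
import threading


class PubackWatchdog:
    """Calls on_timeout if clear() is not called within `timeout` seconds of arm()."""

    def __init__(self, timeout, on_timeout):
        self._timeout = timeout
        self._on_timeout = on_timeout
        self._lock = threading.Lock()
        self._timer = None

    def arm(self):
        """Start the countdown unless one is already pending (call after publish)."""
        with self._lock:
            if self._timer is None:
                self._timer = threading.Timer(self._timeout, self._fire)
                self._timer.daemon = True
                self._timer.start()

    def clear(self):
        """Cancel the pending countdown (call from the on_publish callback)."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None

    def _fire(self):
        # Runs in the timer thread: forget the expired timer, then notify.
        with self._lock:
            self._timer = None
        self._on_timeout()
```

Because arm() does nothing while a countdown is pending, the deadline
is measured from the oldest unacknowledged publish rather than being
pushed back by every new one.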

On firing, the watchdog calls disconnect and then connect_async.  I
found that the program never recovered, and have only just realized
that, according to the documentation, disconnect calls loop_stop.

Reading code, I am finding a few things hard to follow:

  I don't understand why loop_stop is called on disconnect.  To me,
  connect/disconnect is logically separate from running the event loop.
  It might be nice to make this louder, changing the 1-line disconnect
  description to
    `Use disconnect() to disconnect from the broker and stop loop processing.`
  to make it more likely people absorb this.  But perhaps it's only me.

  I can't find in the code where loop_stop is invoked on disconnect.
  Grepping for loop_stop in *.py, I see only the definition (plus a
  comment).

  It is now clear to me that loop_start is called on a client object,
  not at the higher level of the whole library.  I thus wonder:

    - Can you create multiple mqtt clients, loop_start on each, and have
      that all work?  (e.g. a program that talks to two brokers.)  The
      documentation implies yes, but it doesn't say it.

    - If you allow the client to become unreferenced and gc happens,
      does this cause the thread to cleanly exit?  Perhaps the thread
      class itself does that?  Is having a client gc'd an ok thing to
      do?  (It really seems like it should be, but it also doesn't seem
      trivial to make it work right.)
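
For the two-broker case, a sketch under the assumption that each client
object runs its own loop thread, which is what a per-client loop_start
suggests.  The helper names and the make_client factory are
hypothetical; only connect_async, loop_start, disconnect and loop_stop
are paho calls.

```python
def start_clients(make_client, hosts, port=1883):
    """Create one client per broker host and start its own network loop."""
    clients = {}
    for host in hosts:
        client = make_client()
        client.connect_async(host, port)
        client.loop_start()  # one background thread per client object
        clients[host] = client
    return clients


def stop_client(clients, host):
    """Shut down a single broker connection without touching the others."""
    client = clients.pop(host)
    client.disconnect()
    client.loop_stop()


# With paho-mqtt 2.x, make_client could be (not executed here):
#   import paho.mqtt.client as mqtt
#   make_client = lambda: mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
```

Holding the clients in a dict also keeps them referenced, which
sidesteps the garbage-collection question rather than answering it.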

  I would expect that the callbacks happen in the loop thread, and this
  creates a thread-safety hazard when they use data structures that are
  also used in the main thread.  The docs don't mention the need for
  synchronization.
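
One common way to avoid sharing mutable state between the loop thread
and the main thread is to hand data over through a queue.Queue, which
does its own locking.  A sketch, assuming callbacks do run in the loop
thread; the names inbox and drain are hypothetical, while the
on_message signature is paho's.

```python
import queue

inbox = queue.Queue()  # thread-safe handoff between loop thread and main thread


def on_message(client, userdata, message):
    """Runs in the loop thread: enqueue the data instead of touching shared state."""
    inbox.put((message.topic, message.payload))


def drain(q):
    """Runs in the main thread: collect everything queued so far without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items
```

The main thread would assign client.on_message = on_message before
starting the loop and call drain(inbox) from its own loop.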

I am now calling loop_start after connect_async when the watchdog fires,
and expect things to go much better.
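
The recovery sequence described above can be sketched as follows.  The
function name is hypothetical, and the explicit loop_stop is an added
assumption (to make sure any previous loop thread has finished before a
new one is started), not something the post itself does.

```python
def recover(client, host, port=1883, keepalive=60):
    """Watchdog action: drop the connection, then reconnect with a running loop."""
    client.disconnect()
    client.loop_stop()   # assumption: wait for any old loop thread to finish
    client.connect_async(host, port, keepalive)
    client.loop_start()  # without this, nothing drives the new connection
```

This is meant to be called from the watchdog's own thread rather than
from inside a paho callback.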

Thanks,
Greg
_______________________________________________
paho-dev mailing list
paho-dev@xxxxxxxxxxx
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/paho-dev
