Hi Roger, Tatsuzo, Tifaifai, and anyone who is interested in the mosquitto cluster,
I've tested the mosquitto cluster with an appropriate number of simultaneous users and connect/subscribe/publish TPS, so that the system was put under proper pressure (carrying as much TPS as possible without causing latency).
A plain benchmark without any comparison is meaningless, so I've also tested a mosquitto bridge under the same scenario, which is quite similar to the architecture and workload of our smart home service platform (10 brokers, 20k subscribers, 1k publishes from 10 publishers which use persistent TCP from an HTTP server):
9 brokers run on 3 OpenStack VMs (4 cores, 8G RAM each). After 30k persistent subscribers are set up, the publishers send 10k publishes per second (actually only 2.5k publishes/s due to a client-side bottleneck; each client sends one publish over a non-persistent TCP connection), with a payload length of 744 bytes.
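Each publish client does essentially the following (a minimal libmosquitto sketch of the one-publish-per-connection pattern; the host and topic are placeholders, not the values used in the test):

/* one_shot_pub.c: connect, publish once, disconnect.
 * Build: gcc one_shot_pub.c -lmosquitto -o one_shot_pub */
#include <mosquitto.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    char payload[744];                 /* 744-byte payload, as in the test */
    memset(payload, 'x', sizeof(payload));

    mosquitto_lib_init();

    /* clean_session=true: the publisher keeps no persistent state */
    struct mosquitto *mosq = mosquitto_new(NULL, true, NULL);
    if(!mosq) return 1;

    if(mosquitto_connect(mosq, "broker.example.com", 1883, 60) != MOSQ_ERR_SUCCESS){
        fprintf(stderr, "connect failed\n");
        return 1;
    }

    /* QoS=1: the broker has to store and acknowledge the message */
    mosquitto_publish(mosq, NULL, "home/device/42/state",
            sizeof(payload), payload, 1, false);

    /* run the network loop briefly so PUBLISH/PUBACK can complete */
    for(int i = 0; i < 10; i++){
        mosquitto_loop(mosq, 100, 1);
    }

    mosquitto_disconnect(mosq);
    mosquitto_destroy(mosq);
    mosquitto_lib_cleanup();
    return 0;
}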
With QoS=0, both the cluster and the bridge (topic # both 0) work as normal. With QoS=1 (topic # both 1), the CPU usage of each broker stabilises at 65%-75% in the cluster, but reaches 100% on the bridge broker during the publish phase, and meanwhile 30% of the messages are lost due to the bridge broker's overload (see appendix). More detailed test reports, including connect/request response times, network throughput, and server monitoring, are available at https://github.com/hui6075/mosquitto/tree/develop/benchmark .
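(For clarity, 'topic # both 0/1' refers to the bridge topic directive in mosquitto.conf; each bridged broker is configured with something like the following, where the connection name and address are placeholders:)

connection bridge-peer
address 192.0.2.10:1883
topic # both 1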
I believe the situation will be worse for the bridge under QoS=2, but will not deteriorate for the cluster, since publish messages are forwarded with their original QoS but processed with QoS=0 inside the cluster.
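(A hypothetical sketch of that forwarding rule; all the names below, including peer_send, are made up for illustration and are not the actual cluster code:)

#include <stdint.h>
#include <stdio.h>

struct cluster_msg {
    uint8_t origin_qos;    /* QoS used by the external publisher */
    const char *topic;
    const char *payload;
};

/* stub standing in for the inter-broker send */
static void peer_send(const struct cluster_msg *msg, int hop_qos)
{
    printf("forward '%s' hop_qos=%d origin_qos=%d\n",
           msg->topic, hop_qos, msg->origin_qos);
}

static void cluster_forward(const struct cluster_msg *msg)
{
    /* the inter-broker hop runs at QoS 0 (no PUBACK/PUBREC between
     * peers); origin_qos travels with the message so the remote
     * broker can still honour it for its local subscribers */
    peer_send(msg, 0);
}

int main(void)
{
    struct cluster_msg m = { 2, "home/device/42/state", "x" };
    cluster_forward(&m);    /* prints hop_qos=0 origin_qos=2 */
    return 0;
}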
The mosquitto cluster equalises the load across all brokers and, unlike a bridge, appears to external clients as one logical MQTT broker: duplicate client IDs are eliminated, a persistent session can be inherited after a client reconnects, and, most importantly, it is an autonomous system that keeps providing service under a single point of failure, which a bridge cannot do. So I sincerely hope that you can comment, review the code, run performance tests under your own scenarios, etc., to make the mosquitto cluster better.
Thanks!
BRs,
Jianhui
PS. An OProfile report is attached in the appendix; it shows that more efficient timer management should be introduced, to save the CPU cycles that come from expiration polling.
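What I have in mind is something like the following (a sketch only, not code from the mosquitto tree): keep pending expirations in a min-heap and derive the poll timeout from the earliest deadline, instead of re-checking every context on every loop pass. All names here are made up for illustration:

#include <stdio.h>
#include <time.h>

#define MAX_TIMERS 1024

struct timer { time_t deadline; int context_id; };

static struct timer heap[MAX_TIMERS];
static int heap_len = 0;

static void swap(int a, int b){ struct timer t = heap[a]; heap[a] = heap[b]; heap[b] = t; }

/* O(log n) insert, instead of touching every context per loop pass */
static void timer_push(time_t deadline, int context_id)
{
    int i = heap_len++;
    heap[i].deadline = deadline;
    heap[i].context_id = context_id;
    while(i > 0 && heap[(i-1)/2].deadline > heap[i].deadline){
        swap(i, (i-1)/2);
        i = (i-1)/2;
    }
}

/* pop the earliest timer once it is due */
static struct timer timer_pop(void)
{
    struct timer top = heap[0];
    heap[0] = heap[--heap_len];
    int i = 0;
    for(;;){
        int l = 2*i+1, r = 2*i+2, m = i;
        if(l < heap_len && heap[l].deadline < heap[m].deadline) m = l;
        if(r < heap_len && heap[r].deadline < heap[m].deadline) m = r;
        if(m == i) break;
        swap(i, m);
        i = m;
    }
    return top;
}

/* poll/epoll timeout: sleep exactly until the earliest deadline */
static int next_timeout_ms(time_t now)
{
    if(heap_len == 0) return -1;            /* nothing pending: block */
    if(heap[0].deadline <= now) return 0;   /* something already due */
    return (int)(heap[0].deadline - now) * 1000;
}

int main(void)
{
    time_t now = time(NULL);
    timer_push(now + 30, 1);    /* e.g. keepalive check for context 1 */
    timer_push(now + 5, 2);
    printf("sleep for %d ms\n", next_timeout_ms(now));               /* -> 5000 */
    printf("first expiry: context %d\n", timer_pop().context_id);    /* -> 2 */
    return 0;
}

With this, the main loop would wake only when a timer is actually due; in the annotated report below, the per-context time check (the time_count branch) alone accounts for about 29% of the samples.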
Appendix:
Bridge CPU usage snapshot (PID 18225 is the bridge):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18225 mosquitt 20 0 45232 5576 1776 R 100.0 0.1 1:59.91 mosquitto(bridge)
18224 mosquitt 20 0 44704 5052 1760 R 91.0 0.1 1:28.08 mosquitto
18223 mosquitt 20 0 44708 5076 1760 R 82.7 0.1 1:30.28 mosquitto
4869 mosquitt 20 0 44708 5008 1764 R 79.4 0.1 1:41.62 mosquitto
4875 mosquitt 20 0 44720 5004 1764 R 78.0 0.1 1:38.25 mosquitto
4876 mosquitt 20 0 44724 5008 1764 R 75.7 0.1 1:38.51 mosquitto
2900 mosquitt 20 0 24480 4892 1572 R 71.4 0.1 1:25.55 mosquitto
2898 mosquitt 20 0 24480 4872 1572 S 68.1 0.1 1:26.22 mosquitto
2899 mosquitt 20 0 24488 4860 1572 R 66.1 0.1 1:25.79 mosquitto
Cluster CPU usage snapshot:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19715 mosquitt 20 0 47632 7968 1796 R 73.4 0.1 2:17.25 mosquitto
19716 mosquitt 20 0 47660 7996 1796 R 72.4 0.1 2:20.55 mosquitto
19717 mosquitt 20 0 47512 7868 1796 R 70.7 0.1 2:16.84 mosquitto
6574 mosquitt 20 0 47796 8148 1800 R 64.4 0.1 2:14.92 mosquitto
6573 mosquitt 20 0 47928 8180 1800 S 63.1 0.1 2:17.60 mosquitto
6572 mosquitt 20 0 47808 8100 1800 R 62.8 0.1 2:15.44 mosquitto
3580 mosquitt 20 0 27364 7728 1604 R 62.8 0.1 1:48.19 mosquitto
3581 mosquitt 20 0 27552 7936 1604 R 62.1 0.1 1:48.50 mosquitto
3582 mosquitt 20 0 27824 8260 1604 S 60.8 0.1 1:47.63 mosquitto
OProfile report:
CPU: Intel Haswell microarchitecture, speed 3500 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 6000
samples % linenr info image name symbol name
5786584 52.8127 loop.c:101 mosquitto mosquitto_main_loop
2749353 25.0926 subs.c:388 mosquitto sub__search
546670 4.9893 database.c:856 mosquitto db__message_write
267198 2.4386 subs.c:692 mosquitto retain__search.isra.2
...
:int mosquitto_main_loop(struct mosquitto_db *db, mosq_sock_t *listensock, int listensock_count, int listener_max)
:{
/* mosquitto_main_loop total: 5786584 52.8127 */
26163 0.2388 : HASH_ITER(hh_sock, db->contexts_by_sock, context, ctxt_tmp){
3211672 29.3121 : if(time_count > 0){
...
540500 4.9330 : context->pollfd_index = -1;
...
439382 4.0101 : if(context->events & EPOLLOUT) {
...
691792 6.3138 : if(context->current_out_packet || context->state == mosq_cs_connect_pending || context->ws_want_write){
From: jianhui zhan
Sent: Friday, December 29, 2017 9:34
To: General development discussions for the mosquitto project
Subject: Re: [mosquitto-dev] A non-centralize Mosquitto cluster design.
Yes, the 2000 PUB/SUBs test is more of a functional test than a stress test; I will do some more testing to verify the performance.
From: mosquitto-dev-bounces@xxxxxxxxxxx <mosquitto-dev-bounces@xxxxxxxxxxx> on behalf of Tatsuzo Osawa <tatsuzo.osawa@xxxxxxxxx>
Sent: Friday, December 29, 2017 9:12
To: General development discussions for the mosquitto project
Subject: Re: [mosquitto-dev] A non-centralize Mosquitto cluster design.
Hi Jianhui,
Thank you for the further information, but I'm not sure the cluster can scale the performance.
The amount of '2000 PUB/SUBs' seems too small; it could be handled by a single broker.
Could you simplify the scenarios and show how the performance changes with the number of brokers?
Regards,
Tatsuzo