Re: [hono-dev] [Hono-Scalability]: Hono-JMeter plugin testing challenges


From: "Pellmann Marc (INST/ECS4)" <Marc.Pellmann@xxxxxxxxxxxx>
Date: Wed, 6 Sep 2017 09:45:43 +0000

Hi Siva,

I just tested the standard test plan with our default Docker Swarm setup on my development laptop. Interestingly, I also lost a lot of messages initially. It gets much better when I change the prefetch of the receiver to e.g. 2000. This value influences the credits that the consumer initially gives to the Dispatch Router.

With this value I got 68,567 sent and 67,423 received messages (throughput of 6,812/sec) on my laptop with Docker Swarm.
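To put the two counts in relation, the loss can be worked out as below. This is only a sketch of the arithmetic; it assumes that the dots in the figures above are thousands separators (German notation), and the function name is made up for illustration.

```python
def loss_stats(sent: int, received: int) -> tuple[int, float]:
    """Return (lost message count, loss as a percentage of sent)."""
    lost = sent - received
    return lost, 100.0 * lost / sent


# Figures from the run above, read as 68,567 sent and 67,423 received
# (assumption: the dots are thousands separators).
lost, loss_pct = loss_stats(68_567, 67_423)
print(f"lost {lost} messages ({loss_pct:.2f} % of sent)")
```

Under that reading, 1,144 messages (roughly 1.7 % of those sent) did not reach the receiver.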

All messages are shown as processed telemetry in Grafana, so they have been processed by Hono Messaging. It seems that they have been discarded by the Dispatch Router. So my next step would be to change the config of the Dispatch Router. Remember that it is okay for telemetry to be discarded if there is too much load. On the other hand, with the credit system we want the sender to wait until the system is able to process the messages: the JMeter plugin does not send if no credits are available, and in this test setup there are no other senders or receivers.

My answers:

> PFA Grafana screenshot; the discarded messages widget is not showing any dropped message count. Where can I find it?

Hono Messaging logs the following at debug level: "no downstream credit available for link [{}], discarding message [{}]".
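If debug logging is enabled, the discards could be counted from the log roughly as follows. This is a sketch only: it assumes that the two "{}" placeholders are rendered as a link name and a message id in square brackets, and the sample lines and the function name are hypothetical, not taken from a real Hono log.

```python
import re
from collections import Counter

# Assumed shape of the rendered log line: the two "{}" placeholders are
# replaced by a link name and a message id at runtime.
DISCARD_RE = re.compile(
    r"no downstream credit available for link \[(?P<link>[^\]]*)\], "
    r"discarding message \[(?P<msg>[^\]]*)\]"
)


def count_discards(lines):
    """Count discarded messages per link in an iterable of log lines."""
    per_link = Counter()
    for line in lines:
        match = DISCARD_RE.search(line)
        if match:
            per_link[match.group("link")] += 1
    return per_link


# Hypothetical sample lines for illustration only.
sample = [
    "DEBUG no downstream credit available for link [link-1], discarding message [m-1]",
    "DEBUG no downstream credit available for link [link-1], discarding message [m-2]",
    "INFO something unrelated",
]
print(count_discards(sample))
```

For a real run, the list of sample lines would be replaced by the open log file of the Hono Messaging container.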

> Is there any empirical formula that one can come up with for the receiver? Something like receiver lifetime = no. of senders * test execution time + lag

As explained, the sender only sends if credits are available (which are given by the receiver), so the lag should only be the message processing time from sender → Hono Messaging → Dispatch Router → receiver. This might depend on the setup.

> [Siva]: If there were a provision for the user to control how many messages to send in a batch (alias MAX_MESSAGES_PER_BATCH_SEND) and at what rate/frequency from the sender sampler in JMeter, then the user could predict the total number of messages that a sender will send in the test execution time. AFAIK, the batch sending frequency is not controlled in the present plugin? HTH.

The idea is that the sender only sends messages when there are credits. If there are no credits (and the local send queue is full), it will not send all of these 300 messages; it will wait for credits. So the frequency depends on the credits available from the consumers.
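The behaviour described above can be pictured with a toy model. It is not the plugin's code: the class and method names are invented for illustration, and the local send queue is left out for brevity.

```python
class CreditGatedSender:
    """Toy model of a sender that may only send while it holds credits."""

    def __init__(self, batch_size: int = 300):
        self.batch_size = batch_size  # upper bound per batch (300 in the thread)
        self.credits = 0              # credits currently granted to the sender
        self.sent = 0                 # total messages sent so far

    def grant(self, credits: int) -> None:
        """Credits arrive from downstream (ultimately from the consumer)."""
        self.credits += credits

    def send_batch(self) -> int:
        """Send up to batch_size messages, but never more than the credits."""
        n = min(self.batch_size, self.credits)
        self.credits -= n
        self.sent += n
        return n


sender = CreditGatedSender()
sender.grant(250)
print(sender.send_batch())  # limited by the credits, not by the batch size
print(sender.send_batch())  # no credits left, so nothing is sent
sender.grant(1000)
print(sender.send_batch())  # now limited by the batch size
```

In this model the number of messages per batch is whichever is smaller, the batch limit or the credits on hand, which is why a fixed sending frequency cannot be guaranteed.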

> [Siva]: Okay. Here’s my understanding, correct me if I’m wrong: essentially, hono-adapter will have 250 credits irrespective of the no. of senders/devices.

> So, to enhance device throughput capability, the user has the following options with the current state of Hono:

> a) run multiple instances of the adapter and point batches of devices to different adapters

> b) increase the credit count.

The 250 are only the default initial credits from the Dispatch Router (this can be configured). Afterwards, the flow is regulated by the credits that the consumer gives to the Dispatch Router. From the Qpid mailing list (http://mail-archives.apache.org/mod_mbox/qpid-users/201707.mbox/browser):

"The router provides 250 (linkCapacity) credits to a newly attached sender and that sender can then immediately transfer 250 deliveries. The rate at which the credit is replenished will match the rate at which the slower consumer settles deliveries. The link capacity only affects the number of outstanding deliveries per sender that can be buffered in the router network."
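A small discrete-time simulation can illustrate the quoted statement. It is a deliberately simplified model, not the router's actual algorithm: the function name, the tick-based timing and the rates are all assumptions chosen for illustration.

```python
def simulate(link_capacity: int, sender_rate: int, consumer_rate: int, ticks: int):
    """Return (delivered, max_in_flight) after `ticks` time steps.

    Rates are messages per tick; the sender is assumed to always have
    messages ready to send.
    """
    credit = link_capacity   # initial credit handed to the newly attached sender
    in_flight = 0            # unsettled deliveries buffered in the router
    delivered = 0
    max_in_flight = 0
    for _ in range(ticks):
        sent = min(credit, sender_rate)
        credit -= sent
        in_flight += sent
        max_in_flight = max(max_in_flight, in_flight)
        settled = min(in_flight, consumer_rate)
        in_flight -= settled
        credit += settled    # credit is replenished as the consumer settles
        delivered += settled
    return delivered, max_in_flight


# Fast sender (1000/tick), slow consumer (100/tick), 60 ticks.
print(simulate(250, 1000, 100, 60))
print(simulate(2000, 1000, 100, 60))
```

In both runs the same number of messages is delivered, because the slower consumer sets the pace; the larger link capacity only raises the number of deliveries buffered along the way. So, in this model at least, 250 credits do not translate into a limit of 250 messages per second.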

> [Siva]: I’m using the default configuration for the dispatch router. I’m not sure what you mean by “fanout scenario - dispatch router to be the bottle neck if you have more consumers” and “telemetry scenario - single receiver might always be the bottle neck”. Please explain.

The Dispatch Router can be configured either to multicast each message on an address (e.g. telemetry/DEFAULT_TENANT) to all consumers of that address, or to balance each message to just one of the consumers, so that the solution can scale horizontally.

If there is only one instance of the Dispatch Router and maybe 100 consumers, then with multicast the Dispatch Router has to deliver every single message to all 100 consumers. That is a lot of work for the Dispatch Router, while most of the consumers will be idle.

On the other hand, if there is only one consumer and many senders, the consumer needs to be able to process all of these messages and will slow down the overall sending (through the backpressure mechanism of granting new credits slowly).
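The difference in router workload between the two distribution modes can be estimated with a few lines. This is a back-of-the-envelope sketch: the function name is hypothetical, the mode labels are just strings used here, and the balanced case assumes a perfectly even spread across consumers.

```python
def router_deliveries(messages: int, consumers: int, mode: str) -> dict:
    """Estimate deliveries the router performs and messages each consumer sees."""
    if mode == "multicast":
        # Every message is copied to every consumer.
        total = messages * consumers
        per_consumer = float(messages)
    elif mode == "balanced":
        # Every message goes to exactly one consumer (assumed evenly spread).
        total = messages
        per_consumer = messages / consumers
    else:
        raise ValueError(f"unknown mode: {mode}")
    return {"router_total": total, "per_consumer": per_consumer}


print(router_deliveries(1000, 100, "multicast"))
print(router_deliveries(1000, 100, "balanced"))
```

With 100 consumers, multicast makes the router perform 100 times as many deliveries for the same input, whereas balancing keeps the router's work constant and divides the load among the consumers.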

Best regards

Marc Pellmann
INST/ECS4

From: hono-dev-bounces@xxxxxxxxxxx [mailto:hono-dev-bounces@xxxxxxxxxxx] On Behalf Of Katru, Siva Prasad
Sent: Wednesday, 6 September 2017 06:44
To: hono developer discussions <hono-dev@xxxxxxxxxxx>
Subject: Re: [hono-dev] [Hono-Scalability]: Hono-JMeter plugin testing challenges

Hi Marc; Thanks for the quick response. Please see my comments inline.

@All; While on this thread, are there any scalability studies performed for Mosquitto server? Please share.

regards

Siva

From: hono-dev-bounces@xxxxxxxxxxx [mailto:hono-dev-bounces@xxxxxxxxxxx] On Behalf Of Pellmann Marc (INST/ECS4)
Sent: Tuesday, September 05, 2017 8:38 PM
To: hono developer discussions
Subject: Re: [hono-dev] [Hono-Scalability]: Hono-Jmeter plugin testing challenges

Hi Siva,

1 and 2)

The test should run, and the receiver should receive all messages from the sender (at least if you give it more time). This should be true if you selected “Wait for credits” in the sender (as is the default in the test case). If messages were discarded by Hono, it would be visible in Grafana. Are there errors in the JMeter logs?

[Siva]: I’ve allocated double the amount of time for the receiver compared to the sender, as you can observe from the tests (rows 4 and 5 in the table further down the mail chain). During the time when the sender has finished its job and only the receiver thread is alive, I observed the following log messages, meaning the receiver is not busy and is still unable to capture the telemetry. “Wait for credits” in the sender is enabled in all the tests. There were no errors in the JMeter log.

JMeter log:

INFO o.e.h.j.c.HonoReceiver: Receiver Thread Group 1-1: received batch of 0 messages in 1037 milliseconds

INFO o.e.h.j.c.HonoReceiver: Receiver Thread Group 1-1: received batch of 0 messages in 987 milliseconds

………..

INFO o.e.h.j.c.HonoReceiver: Receiver Thread Group 1-1: received batch of 0 messages in 1017 milliseconds

INFO o.e.h.j.c.HonoReceiver: Receiver Thread Group 1-1: received batch of 0 messages in 1014 milliseconds

Is there any empirical formula that one can come up with for the receiver? Something like receiver life time = no. of senders * test execution time + lag

PFA grafana screenshot, discarded messages widget not showing any dropped message count. Where can I find same?

I do not really understand what you mean here. What do you want to achieve?

[Siva]: If there were a provision for the user to control how many messages to send in a batch (alias MAX_MESSAGES_PER_BATCH_SEND) and at what rate/frequency from the sender sampler in JMeter, then the user could predict the total number of messages that a sender will send in the test execution time. AFAIK, the batch sending frequency is not controlled in the present plugin? HTH.

Based on your response to 4), I will create an issue for this provision.

The reason for sending (at most) 300 messages in one JMeter sample is that, when sending only one message per sample, a lot of time is spent in JMeter's sample management. I agree that it would be nice to make this configurable (maybe you could create an issue?). But these 300 have nothing to do with the initial link capacity of 250. These 250 credits are given initially by the Dispatch Router as the intermediary. After that, credit is exchanged in separate frames once the remaining credit falls below a certain limit. But the message flow and credit handling are complex, and I would need to dive deeper into them to be sure.

[Siva]: I understand.

The handling of the number of devices in the test is not necessarily exactly the same as in an adapter. The test opens a connection and a link for each thread (= sender = device). With our adapters, for example, all devices would be handled over the same connection. But I do not think this will lead to very different numbers, for the moment.

[Siva]: Okay. Here’s my understanding, correct me if I’m wrong: essentially, hono-adapter will have 250 credits irrespective of the no. of senders/devices.

So, to enhance device throughput capability, the user has the following options with the current state of Hono:

a) run multiple instances of the adapter and point batches of devices to different adapters

b) increase the credit count.

In our default configuration, the Dispatch Router is configured to multicast (fan out) to all consumers. In this configuration all consumers need to get the message, so I would expect the Dispatch Router to be the bottleneck if you have more consumers.

For the telemetry test it might also be a good idea to reconfigure the router to balance between the receivers (and use more of them); otherwise the single receiver might always be the bottleneck.

[Siva]: I’m using the default configuration for the dispatch router. I’m not sure what you mean by “fanout scenario - dispatch router to be the bottle neck if you have more consumers” and “telemetry scenario - single receiver might always be the bottle neck”. Please explain.

Further test cases below;

Best regards

Marc Pellmann

Bosch Software Innovations GmbH

INST/ECS4

Schöneberger Ufer 89-91

10785 Berlin

GERMANY

marc.pellmann@xxxxxxxxxxxx

Registered office: Berlin, Register court: Amtsgericht Charlottenburg, HRB 148411 B

Executives: Dr. Ing. Rainer Kallenbach, Michael Hahn

From: hono-dev-bounces@xxxxxxxxxxx [mailto:hono-dev-bounces@xxxxxxxxxxx] On Behalf Of Katru, Siva Prasad
Sent: Tuesday, 5 September 2017 15:52
To: hono developer discussions <hono-dev@xxxxxxxxxxx>
Subject: [hono-dev] [Hono-Scalability]: Hono-Jmeter plugin testing challenges

Hi; I’m trying to perform Hono scalability studies using the hono-jmeter plugin.

I would like to simulate the following situations for a better understanding of Hono's scalability:

· Latency test: Generate a constant throughput and measure the latency, CPU and RAM usage

· Telemetry test: Generate a constant throughput with multiple hono senders and a single receiver - measure the CPU & RAM usage

· Fan-out test: Multiple hono consumers are consuming incoming telemetry from a single sender – measure the CPU & RAM usage

I have started exploring the provided hono-jmeter test case, i.e. hono_jmeter_runtime.jmx. I’m able to execute the test case successfully with 1 Hono sender and 1 Hono receiver by configuring the respective ports and certificates. I understand that the sender waits until the receiver is available, then sends a batch of 300 messages (hardcoded in the source code) in one go, and that message sending continues until the test execution time (honoTestRuntime) is over.

In an attempt to create a constant throughput for longer durations (minimum 10 min), I have carried out the tests below with different input parameters.

[Inline image image001.jpg: table of test runs; not preserved in the archive]

In this regard, I have the following questions:

1. How can I create a constant throughput? In principle, it should be proportional to the no. of Hono devices (when talking about sender throughput). But I have not observed this.

2. Although the receiver is started before the sender, why is the receiver unable to receive all the messages (even with increased receiver lag time)? How can I capture these metrics? Neither the JMeter Summary Report nor the Grafana UI displays the missing messages.

3. It would be nice if there were a provision to define the periodicity of the messages to be sent.

4. I observed that there’s a credit of 250 messages per telemetry link, whereas the default sender sends batches of 300 messages. To avoid confusion, I changed the MAX_MESSAGES_PER_BATCH_SEND value from 300 to 250. Also, I assume this limit of 250 credits will limit the honoDevice sending rate to below 250 messages/sec, i.e. the max throughput per telemetry link is 250. Is this number 250 defined by design?

Appreciate your inputs on this issue.

Best

Siva

