Re: [hono-dev] Metrics for message processing


From: "Hudalla Kai (INST/ECS4)" <kai.hudalla@xxxxxxxxxxxx>
Date: Wed, 22 Aug 2018 13:27:01 +0000

On Wed, 2018-08-22 at 09:58 +0200, Jens Reimann wrote:
> 
> 
> On Tue, Aug 21, 2018 at 6:01 PM, Marc Pellmann <pellmann@xxxxxxxxx> wrote:
> > Hi Kai,
> > 
> > adapting the messaging metrics to our new setup, focused on adapters and
> > modifying them according to feedback from ops colleagues is a good thing!
> > 
> > We should also remove the meter/counter etc. in the name of the metric. This
> > was there to allow Spring Boot metrics without a dependency on a specific
> > library. It seems that Spring Boot has given up this approach with Micrometer
> > and it doesn't make that much sense either.
> > 
> 
> I would really appreciate dropping the meter/counter prefix.
>  
> > According to [1], the naming of metrics and tags is a good match. So to sum
> > it up we have
> > 
> > hono.messages.processed (received from device and successfully forwarded)
> > hono.messages.unprocessable (received from device but something is wrong with
> > the data)
> > hono.messages.undeliverable (successfully received and processed from device
> > but could not be forwarded)
> > hono.messages.capacity (link credits between adapter and AMQP network)
> > 
> > with tags for host, tenant, type, protocol
> > 
> > [1] https://micrometer.io/docs/concepts#_naming_meters
> > 

If we also want to account for the adapters' ability to interact with other
services then we end up with

hono.messages.processed (received from device and successfully forwarded)
hono.messages.unprocessable (received from device but something is wrong with
the data)
hono.messages.undeliverable (successfully received and processed from device
but could not be forwarded)
hono.messages.capacity (link credits between adapter and AMQP network)
hono.tenant.capacity (link credits between adapter and Tenant service)
hono.credentials.capacity (link credits between adapter and Credentials service)
hono.registration.capacity (link credits between adapter and Device Registration
service)

with tags for host, tenant, type, protocol
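
To make the proposal concrete, here is a minimal sketch of how an adapter
could map the outcome of handling a message to one of the three message
metrics and attach the four tags. It is plain Python and deliberately does not
use the Micrometer API; the names Outcome, MessageMetrics and record, as well
as the host and tenant values, are made up for illustration only. The
"UNKNOWN" tenant follows the suggestion from the original mail.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    """Hypothetical outcomes of handling one message from a device."""
    FORWARDED = "forwarded"          # received and successfully forwarded
    MALFORMED = "malformed"          # something is wrong with the data
    NOT_FORWARDED = "not_forwarded"  # valid, but could not be forwarded


# Metric names as proposed in this thread.
METRIC_FOR_OUTCOME = {
    Outcome.FORWARDED: "hono.messages.processed",
    Outcome.MALFORMED: "hono.messages.unprocessable",
    Outcome.NOT_FORWARDED: "hono.messages.undeliverable",
}


class MessageMetrics:
    """In-memory stand-in for a metrics registry (illustration only)."""

    def __init__(self, host, protocol):
        self.host = host
        self.protocol = protocol
        self.counters = Counter()

    def record(self, outcome, tenant, msg_type):
        # Fall back to "UNKNOWN" when the tenant cannot be determined.
        tags = (
            ("host", self.host),
            ("tenant", tenant or "UNKNOWN"),
            ("type", msg_type),
            ("protocol", self.protocol),
        )
        key = (METRIC_FOR_OUTCOME[outcome], tags)
        self.counters[key] += 1
        return key


metrics = MessageMetrics(host="adapter-1", protocol="mqtt")
metrics.record(Outcome.FORWARDED, "acme", "telemetry")
metrics.record(Outcome.FORWARDED, "acme", "telemetry")
metrics.record(Outcome.MALFORMED, None, "event")
```

Keeping the reason (device vs. back end) in the metric name and everything
else in tags means ops can aggregate per tenant or per protocol without
needing a separate metric for each combination.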

I also want to point out that we are discussing the names of the metrics we use
with Micrometer.
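
As a small illustration of the renaming discussed above, the following sketch
drops the leading "meter." or "counter." from the old names. The helper names
strip_type_prefix and to_snake_case are hypothetical. The second helper only
hints at the kind of translation a monitoring backend that prefers
underscores might apply to the dotted names; it is not Micrometer's actual
conversion code.

```python
TYPE_PREFIXES = ("meter.", "counter.")


def strip_type_prefix(name):
    """Drop a leading 'meter.' or 'counter.' from a metric name, if present."""
    for prefix in TYPE_PREFIXES:
        if name.startswith(prefix):
            return name[len(prefix):]
    return name


def to_snake_case(name):
    """Illustrative conversion for a backend that prefers underscores."""
    return name.replace(".", "_")


old_names = [
    "meter.hono.messages.processed",
    "meter.hono.messages.unprocessable",
    "meter.hono.messages.undeliverable",
]
new_names = [strip_type_prefix(n) for n in old_names]
```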

Can we agree on these metrics?

> > Marc
> > 
> > 
> > On Mon, Aug 20, 2018 at 5:10 PM Hudalla Kai (INST/ECS4) <kai.hudalla@bosch-si.com> wrote:
> > > Hi list,
> > > 
> > > I am currently thinking about the metrics that we maintain for the
> > > messages we process. In the original design we had Hono Messaging as the
> > > central component that all protocol adapters were connected to and to
> > > which the adapters had to send all downstream messages. It therefore felt
> > > like the right place to implement the messaging metrics and, e.g., count
> > > the number of messages that have been forwarded successfully vs. the
> > > messages that had to be discarded due to a lack of credit.
> > > 
> > > With the deprecation of Hono Messaging, we are now maintaining the
> > > metrics in the protocol adapters directly. IMHO this is a good
> > > opportunity to think a little about the metrics we are maintaining as
> > > well.
> > > 
> > > Currently, we record metrics for "processed", "discarded" and
> > > "undeliverable" messages. However, we have never clearly defined these
> > > terms. That was probably because the only place where it was relevant
> > > was Hono Messaging and the way it was implemented there served as the
> > > "definition".
> > > 
> > > As such, we currently use something like the following in Hono Messaging:
> > > 
> > > "processed": message from device complies with all requirements and has
> > > been successfully forwarded to the downstream consumer
> > > 
> > > "discarded": message has been sent pre-settled (by the adapter) and
> > > there is no credit available for the message to be forwarded. The
> > > message is then silently discarded, i.e. the sender is not informed
> > > about the failure to deliver the message. The device cannot distinguish
> > > this case from the "processed" case.
> > > 
> > > "undeliverable": message has been sent unsettled (by the adapter) and
> > > there is no credit available for the message to be forwarded. The
> > > message is then released and the adapter will signal the failure to
> > > deliver to the device (if the transport protocol allows it to do so).
> > > The device may or may not be able to distinguish this case from the
> > > "processed" case.
> > > 
> > > The first metric clearly is of interest in order to see the current
> > > throughput of the system. The other two metrics, however, are harder to
> > > understand in the context of a particular protocol adapter because they
> > > require an understanding of how the transport protocol is mapped to
> > > AMQP 1.0. For example, a telemetry message that is published for a
> > > tenant using QoS 0 and for which no consumer is connected will end up
> > > in the "discarded" metric whereas the same message published using
> > > QoS 1 would end up in the "undeliverable" metric, despite the fact that
> > > the reason for the failure to deliver is the same in both cases: no
> > > credit.
> > > 
> > > After some discussion about this with our operations team, it became
> > > clear that from their perspective it is actually more interesting to
> > > get an indication of the reason for a problem in the metric itself. In
> > > particular, it is of interest to distinguish between cases where
> > > messages cannot be processed due to errors caused by the device, e.g.
> > > malformed headers, versus errors where a message cannot be processed
> > > due to problems in the back end infrastructure, e.g. a service not
> > > being available or the aforementioned lack of credit. In the former
> > > case we need to advise device developers how to fix the problem, in the
> > > latter case the ops team needs to get going themselves.
> > > 
> > > In addition to this coarse distinction, it is still helpful to know the
> > > ratio of credit used vs. credit available because this may serve as an
> > > indicator for scaling the infrastructure up or down.
> > > 
> > > I would therefore like to introduce additional (adapter specific)
> > > metrics that are better suited to cover these requirements. These
> > > metrics should be tagged with the protocol, host, tenant and message
> > > type (e.g. telemetry, event ...) if possible, e.g. a message might be
> > > unprocessable because it lacks tenant information. In such a case the
> > > problem could be recorded using the "UNKNOWN" tenant ...
> > > 
> > > "meter.hono.messages.processed" - message has been successfully processed.
> > > 
> > > "meter.hono.messages.unprocessable" - message cannot be processed
> > > because the message does not contain all required information, e.g.
> > > malformed topic name, missing header, not authorized etc. This metric
> > > is used by an adapter to record a message that it either discards
> > > silently or rejects (signaling the problem to the device). In no case
> > > will the message be processed.
> > > 
> > > "meter.hono.messages.undeliverable" - message cannot be processed
> > > because of a problem not caused by the sender of the message (the
> > > device), e.g. Tenant service is not available, no credit available,
> > > etc. This metric is used by an adapter regardless of whether the
> > > transport protocol allows for signaling back the problem to the device
> > > or not. For instance, an MQTT message published using QoS 0 doesn't
> > > allow signaling back the failure whereas HTTP allows sending back a
> > > status code in the HTTP response. In no case will the message be
> > > processed.
> > > 
> > > "counter|meter.hono.messages.capacity" - the number of credits
> > > remaining for sending messages. TODO determine if a counter or a meter
> > > is more reasonable to use.
> > > 
> > > 
> > > We could then deprecate the existing protocol adapter specific
> > > metric(s) and eventually remove them together with Hono Messaging.
> > > 
> > > 
> > > WDYT?
> > > 
> > > -- 
> > > Mit freundlichen Grüßen / Best regards
> > > 
> > > Kai Hudalla
> > > Chief Software Architect
> > > 
> > > Bosch Software Innovations GmbH
> > > Ullsteinstr. 128
> > > 12109 Berlin
> > > GERMANY
> > > www.bosch-si.com
> > > 
> > > Registered Office: Berlin, Registration Court: Amtsgericht
> > > Charlottenburg; HRB 148411 B
> > > Chairman of the Supervisory Board: Dr.-Ing. Thorsten Lücke; Managing
> > > Directors: Dr. Stefan Ferber, Michael Hahn
> > > _______________________________________________
> > > hono-dev mailing list
> > > hono-dev@xxxxxxxxxxx
> > > To change your delivery options, retrieve your password, or unsubscribe
> > > from this list, visit
> > > https://dev.eclipse.org/mailman/listinfo/hono-dev

