Monitoring the DevWorkspace Operator

You can configure the OpenShift in-cluster monitoring stack to scrape metrics exposed by the DevWorkspace Operator.

Collecting DevWorkspace Operator metrics

Use the in-cluster Prometheus instance to collect, store, and query metrics about the DevWorkspace Operator.

Prerequisites
  • Your organization’s instance of Che is installed and running in Red Hat OpenShift.

  • An active oc session with administrative permissions to the destination OpenShift cluster. See Getting started with the CLI.

  • The devworkspace-controller-metrics Service is exposing metrics on port 8443. This is preconfigured by default.

Procedure
  1. Create the ServiceMonitor for detecting the DevWorkspace Operator metrics Service.

    Example 1. ServiceMonitor
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: devworkspace-controller
      namespace: eclipse-che # (1)
    spec:
      endpoints:
        - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
          interval: 10s # (2)
          port: metrics
          scheme: https
          tlsConfig:
            insecureSkipVerify: true
      namespaceSelector:
        matchNames:
          - openshift-operators
      selector:
        matchLabels:
          app.kubernetes.io/name: devworkspace-controller
    (1) The Che namespace. The default is eclipse-che.
    (2) The rate at which a target is scraped.
  2. Create a Role and RoleBinding to allow Prometheus to view the metrics.

    Example 2. Role
    kind: Role
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: prometheus-k8s
      namespace: openshift-operators
    rules:
      - verbs:
          - get
          - list
          - watch
        apiGroups:
          - ''
        resources:
          - services
          - endpoints
          - pods
    Example 3. RoleBinding
    kind: RoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: view-che-openshift-monitoring-prometheus-k8s
      namespace: openshift-operators
    subjects:
      - kind: ServiceAccount
        name: prometheus-k8s
        namespace: openshift-monitoring
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: prometheus-k8s
  3. Allow the in-cluster Prometheus instance to detect the ServiceMonitor in the Che namespace. The default Che namespace is eclipse-che.

    $ oc label namespace eclipse-che openshift.io/cluster-monitoring=true
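
The procedure does not show how to create the resources from the examples. The following is a minimal sketch, assuming that you saved the three examples as servicemonitor.yaml, role.yaml, and rolebinding.yaml in the current directory. These file names and the function name apply_monitoring_manifests are hypothetical and chosen only for illustration. Remove any callout markers such as (1) from the YAML before applying it.

```shell
# Hypothetical file names: save the ServiceMonitor, Role, and RoleBinding
# examples under these names before running the function.
MANIFESTS="servicemonitor.yaml role.yaml rolebinding.yaml"

apply_monitoring_manifests() {
  local f
  for f in $MANIFESTS; do
    # Stop at the first manifest that fails to apply.
    oc apply -f "$f" || return 1
  done
  # Confirm that each resource exists in the namespace set in its manifest.
  oc get servicemonitor devworkspace-controller -n eclipse-che &&
    oc get role prometheus-k8s -n openshift-operators &&
    oc get rolebinding view-che-openshift-monitoring-prometheus-k8s -n openshift-operators
}
```

Run apply_monitoring_manifests in an active oc session. If Che is installed in a namespace other than eclipse-che, adjust the namespace in the manifest and in the function.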
Verification
  1. For a fresh installation of Che, generate metrics by creating a Che workspace from the Dashboard.

  2. In the Administrator view of the OpenShift web console, go to Observe → Metrics.

  3. Run a PromQL query to confirm that the metrics are available. For example, enter devworkspace_started_total and click Run queries.

    For more metrics, see DevWorkspace-specific metrics.
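
As an alternative to the web console, you might run the same query from the command line. The following sketch assumes that the cluster exposes the thanos-querier route in the openshift-monitoring namespace and that your user is allowed to query it. The function name query_dwo_metric is hypothetical.

```shell
query_dwo_metric() {
  # $1: a PromQL expression, for example devworkspace_started_total
  local host token
  # Assumption: the monitoring stack exposes the thanos-querier route.
  host=$(oc get route thanos-querier -n openshift-monitoring \
    -o jsonpath='{.spec.host}') || return 1
  # Token of the current oc session.
  token=$(oc whoami -t) || return 1
  # -G sends the URL-encoded query as a GET parameter.
  # -k skips TLS verification; drop it if your workstation trusts the router CA.
  curl -sk -G "https://${host}/api/v1/query" \
    -H "Authorization: Bearer ${token}" \
    --data-urlencode "query=$1"
}
```

For example, query_dwo_metric devworkspace_started_total prints the JSON response of the Prometheus query API. An empty result list suggests that no workspace has started yet or that the metrics are not being scraped.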

To troubleshoot missing metrics, view the Prometheus container logs for possible RBAC-related errors:

  1. Get the name of the Prometheus pod:

    $ oc get pods -l app.kubernetes.io/name=prometheus -n openshift-monitoring -o=jsonpath='{.items[*].metadata.name}'
  2. Print the last 20 lines of the Prometheus container logs from the Prometheus pod from the previous step:

    $ oc logs --tail=20 <prometheus_pod_name> -c prometheus -n openshift-monitoring
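
The two steps above can be combined so that every Prometheus pod is checked. This is a sketch under two assumptions: RBAC problems appear in the logs as Kubernetes API errors that contain the word "forbidden", and the last 200 lines are enough to catch them. The function name show_prometheus_rbac_errors is hypothetical.

```shell
show_prometheus_rbac_errors() {
  local pods pod
  # The jsonpath expression returns all matching pod names, separated by spaces.
  pods=$(oc get pods -l app.kubernetes.io/name=prometheus -n openshift-monitoring \
    -o=jsonpath='{.items[*].metadata.name}') || return 1
  for pod in $pods; do
    echo "== ${pod} =="
    # Assumption: RBAC denials are logged with the word "forbidden".
    oc logs --tail=200 "$pod" -c prometheus -n openshift-monitoring \
      | grep -i 'forbidden' || echo "(no matching lines)"
  done
}
```

If the output mentions the prometheus-k8s service account, recheck the Role and RoleBinding in the openshift-operators namespace.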

DevWorkspace-specific metrics

The following tables describe the DevWorkspace-specific metrics exposed by the devworkspace-controller-metrics Service.

Table 1. Metrics
Name                                Type       Description                                                       Labels
----------------------------------  ---------  ----------------------------------------------------------------  --------------------
devworkspace_started_total          Counter    Number of DevWorkspace starting events.                           source, routingclass
devworkspace_started_success_total  Counter    Number of DevWorkspaces successfully entering the Running phase.  source, routingclass
devworkspace_fail_total             Counter    Number of failed DevWorkspaces.                                   source, reason
devworkspace_startup_time           Histogram  Total time taken to start a DevWorkspace, in seconds.             source, routingclass

Table 2. Labels
Name          Description                                                               Values
------------  ------------------------------------------------------------------------  ------------------------------------------
source        The controller.devfile.io/devworkspace-source label of the DevWorkspace.  string
routingclass  The spec.routingClass of the DevWorkspace.                                "basic|cluster|cluster-tls|web-terminal"
reason        The workspace startup failure reason.                                     "BadRequest|InfrastructureFailure|Unknown"

Table 3. Startup failure reasons
Name                   Description
---------------------  ----------------------------------------------------------------------------------------------------------------
BadRequest             Startup failure due to an invalid devfile used to create a DevWorkspace.
InfrastructureFailure  Startup failure due to the following errors: CreateContainerError, RunContainerError, FailedScheduling, FailedMount.
Unknown                Unknown failure reason.
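
The metrics and labels in the tables above can be combined into PromQL expressions. The following sketch prints three illustrative expressions that you can paste into the query field of the OpenShift web console. The one-hour window and the variable names are arbitrary choices, and the expressions are not necessarily the ones that the dashboard uses. The _sum and _count series are the standard series that Prometheus exposes for a histogram.

```shell
# Average DevWorkspace startup time in seconds, computed from the
# histogram's _sum and _count series.
AVG_STARTUP_TIME='sum(rate(devworkspace_startup_time_sum[1h])) / sum(rate(devworkspace_startup_time_count[1h]))'

# Share of DevWorkspace starting events that ended in a failure.
FAILURE_RATIO='sum(increase(devworkspace_fail_total[1h])) / sum(increase(devworkspace_started_total[1h]))'

# Failures broken down by the reason label
# (BadRequest, InfrastructureFailure, Unknown).
FAILURES_BY_REASON='sum by (reason) (increase(devworkspace_fail_total[1h]))'

printf '%s\n' "$AVG_STARTUP_TIME" "$FAILURE_RATIO" "$FAILURES_BY_REASON"
```

To restrict an expression to one kind of workspace, add a label matcher such as {routingclass="web-terminal"} to each metric name.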

Viewing DevWorkspace Operator metrics from an OpenShift web console dashboard

After configuring the in-cluster Prometheus instance to collect DevWorkspace Operator metrics, you can view the metrics on a custom dashboard in the Administrator perspective of the OpenShift web console.

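Prerequisites
  • The in-cluster Prometheus instance is collecting DevWorkspace Operator metrics. See Collecting DevWorkspace Operator metrics.
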
Procedure
  • Create a ConfigMap for the dashboard definition in the openshift-config-managed namespace and apply the necessary label.

    1. $ oc create configmap grafana-dashboard-dwo \
        --from-literal=dwo-dashboard.json="$(curl https://raw.githubusercontent.com/devfile/devworkspace-operator/main/docs/grafana/openshift-console-dashboard.json)" \
        -n openshift-config-managed
    2. $ oc label configmap grafana-dashboard-dwo console.openshift.io/dashboard=true -n openshift-config-managed
      Note: The dashboard definition is based on Grafana 6.x dashboards. Not all Grafana 6.x dashboard features are supported in the OpenShift web console.
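
When the download runs inside a command substitution, a failed download can go unnoticed and leave the ConfigMap with empty or partial content. The following sketch is one possible variant that downloads the dashboard definition to a temporary file first and creates the ConfigMap only if the download succeeds. The function name create_dwo_dashboard is hypothetical; the URL, ConfigMap name, key, and label are the ones used in the procedure above.

```shell
create_dwo_dashboard() {
  local url="https://raw.githubusercontent.com/devfile/devworkspace-operator/main/docs/grafana/openshift-console-dashboard.json"
  local tmp rc
  tmp=$(mktemp) || return 1
  # -f makes curl exit with an error on HTTP failures, so a failed
  # download never becomes the ConfigMap content.
  curl -fsSL -o "$tmp" "$url" || { rm -f "$tmp"; return 1; }
  # Create the ConfigMap from the downloaded file, then label it so that
  # the web console treats it as a dashboard.
  oc create configmap grafana-dashboard-dwo \
    --from-file=dwo-dashboard.json="$tmp" \
    -n openshift-config-managed &&
    oc label configmap grafana-dashboard-dwo \
      console.openshift.io/dashboard=true -n openshift-config-managed
  rc=$?
  rm -f "$tmp"
  return $rc
}
```

After the function returns successfully, oc get configmap grafana-dashboard-dwo -n openshift-config-managed --show-labels lists the ConfigMap together with the console.openshift.io/dashboard=true label.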
Verification steps
  1. In the Administrator view of the OpenShift web console, go to Observe → Dashboards.

  2. Go to Dashboard → Che Server JVM and verify that the dashboard panels contain data.

Dashboard for the DevWorkspace Operator

The OpenShift web console custom dashboard is based on Grafana 6.x and displays the following metrics from the DevWorkspace Operator.

Not all Grafana 6.x dashboard features are supported in an OpenShift web console dashboard.

DevWorkspace metrics

The DevWorkspace-specific metrics are displayed in the DevWorkspace Metrics panel.

Grafana dashboard panels that contain metrics related to DevWorkspace startup
Figure 1. The DevWorkspace Metrics panel
Average workspace start time

The average workspace startup duration.

Workspace starts

The number of successful and failed workspace startups.

DevWorkspace successes and failures

A comparison between successful and failed DevWorkspace startups.

DevWorkspace failure rate

The ratio of failed workspace startups to total workspace startups.

DevWorkspace startup failure reasons

A pie chart that displays the distribution of workspace startup failures:

  • BadRequest

  • InfrastructureFailure

  • Unknown

Operator metrics

The Operator-specific metrics are displayed in the Operator Metrics panel.

Grafana dashboard panels that contain Operator metrics
Figure 2. The Operator Metrics panel
Webhooks in flight

A comparison between the number of different webhook requests.

Work queue depth

The number of reconcile requests that are in the work queue.

Memory

Memory usage for the DevWorkspace controller and the DevWorkspace webhook server.

Average reconcile counts per second (DWO)

The average number of reconciliations per second for the DevWorkspace controller.