Monitoring the Dev Workspace Operator
You can configure an example monitoring stack to process metrics exposed by the Dev Workspace Operator.
Collecting Dev Workspace Operator metrics with Prometheus
To use Prometheus to collect, store, and query metrics about the Dev Workspace Operator:
Prerequisites

- The `devworkspace-controller-metrics` Service is exposing metrics on port `8443`. This is preconfigured by default.
- The `devworkspace-webhookserver` Service is exposing metrics on port `9443`. This is preconfigured by default.
- Prometheus 2.26.0 or later is running. The Prometheus console is running on port `9090` with a corresponding Service. See First steps with Prometheus.
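The first two prerequisites can be checked from the command line. The following is a minimal sketch, not part of the example monitoring stack: the function name `check_dwo_metrics_services` and the `KUBECTL` override variable are hypothetical, and the argument stands for the namespace where the Dev Workspace Operator is installed.

```shell
# Hypothetical helper: confirm that both metrics Services exist in the
# namespace where the Dev Workspace Operator is installed.
# KUBECTL is an assumed override so that another client (or a dry run
# with `echo`) can be substituted; it defaults to kubectl.
check_dwo_metrics_services() {
  dwo_namespace="$1"
  for service in devworkspace-controller-metrics devworkspace-webhookserver; do
    # `kubectl get service` exits non-zero when the Service is missing.
    "${KUBECTL:-kubectl}" get service "$service" -n "$dwo_namespace" || return 1
  done
}
```

For example, `check_dwo_metrics_services <DWO_namespace>` lists both Services with their ports, or stops at the first one that is missing.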
Procedure

- Create a ClusterRoleBinding to bind the ServiceAccount associated with Prometheus to the `devworkspace-controller-metrics-reader` ClusterRole. For the example monitoring stack, the name of the ServiceAccount to be used is `prometheus`.

  Without the ClusterRoleBinding, you cannot access Dev Workspace metrics because access is protected with role-based access control (RBAC).

  Example 1. ClusterRoleBinding

      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: devworkspace-controller-metrics-binding
      subjects:
        - kind: ServiceAccount
          name: prometheus
          namespace: monitoring
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: devworkspace-controller-metrics-reader
- Configure Prometheus to scrape metrics from port `8443` exposed by the `devworkspace-controller-metrics` Service and from port `9443` exposed by the `devworkspace-webhookserver` Service.

  The example monitoring stack already creates the `prometheus-config` ConfigMap with an empty configuration. To provide the Prometheus configuration details, edit the `data` field of the ConfigMap.

  Example 2. Prometheus configuration

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: prometheus-config
        namespace: monitoring
      data:
        prometheus.yml: |-
          global:
            scrape_interval: 5s # (1)
            evaluation_interval: 5s # (2)
          scrape_configs: # (3)
            - job_name: 'DevWorkspace'
              scheme: https
              authorization:
                type: Bearer
                credentials_file: '/var/run/secrets/kubernetes.io/serviceaccount/token'
              tls_config:
                insecure_skip_verify: true
              static_configs:
                - targets: ['devworkspace-controller-metrics.<DWO_namespace>:8443'] # (4)
            - job_name: 'DevWorkspace webhooks'
              scheme: https
              authorization:
                type: Bearer
                credentials_file: '/var/run/secrets/kubernetes.io/serviceaccount/token'
              tls_config:
                insecure_skip_verify: true
              static_configs:
                - targets: ['devworkspace-webhookserver.<DWO_namespace>:9443'] # (5)

  - (1) The rate at which a target is scraped.
  - (2) The rate at which the recording and alerting rules are re-checked.
  - (3) The resources that Prometheus monitors. In the default configuration, two jobs, `DevWorkspace` and `DevWorkspace webhooks`, scrape the time series data exposed by the `devworkspace-controller-metrics` and `devworkspace-webhookserver` Services.
  - (4) The scrape target for the metrics from port `8443`. Replace `<DWO_namespace>` with the namespace where the `devworkspace-controller-metrics` Service is located.
  - (5) The scrape target for the metrics from port `9443`. Replace `<DWO_namespace>` with the namespace where the `devworkspace-webhookserver` Service is located.

- Scale the `Prometheus` Deployment down and up to read the updated ConfigMap from the previous step.

      $ kubectl scale --replicas=0 deployment/prometheus -n monitoring && kubectl scale --replicas=1 deployment/prometheus -n monitoring
- Use port forwarding to access the `Prometheus` Service locally:

      $ kubectl port-forward svc/prometheus 9090:9090 -n monitoring
- Verify that all targets are up by viewing the targets endpoint at `localhost:9090/targets`.
- Use the Prometheus console to view and query metrics:
  - View metrics at `localhost:9090/metrics`.
  - Query metrics from `localhost:9090/graph`.

    For more information, see Using the expression browser.
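The target check in the procedure can also be scripted. The following is a minimal sketch, assuming the port forwarding from the earlier step is still active and `curl` is available. The Prometheus HTTP API serves target state at `/api/v1/targets`, and each active target carries a `health` field. The helper name `count_targets_with_health` is hypothetical, and the text match assumes the compact JSON that Prometheus returns; a JSON parser such as `jq` would be more robust.

```shell
# Hypothetical helper: count targets in a Prometheus /api/v1/targets
# response (read from stdin) whose health equals the given state
# ("up", "down", or "unknown").
# Assumes compact JSON, for example "health":"up" with no spaces.
count_targets_with_health() {
  state="$1"
  # grep exits non-zero when nothing matches; `|| true` keeps the
  # pipeline usable under strict shell options.
  { grep -o "\"health\":\"$state\"" || true; } | wc -l | tr -d ' '
}
```

With port forwarding active, `curl -s localhost:9090/api/v1/targets | count_targets_with_health down` prints the number of targets reported as down. A result of `0` together with a non-zero count for `up` suggests that both scrape jobs are working.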
Dev Workspace-specific metrics
The following tables describe the Dev Workspace-specific metrics exposed by the `devworkspace-controller-metrics` Service.
| Name | Type | Description | Labels |
|---|---|---|---|
| | Counter | Number of Dev Workspace starting events. | |
| | Counter | Number of Dev Workspaces successfully entering the | |
| | Counter | Number of failed Dev Workspaces. | |
| | Histogram | Total time taken to start a Dev Workspace, in seconds. | |
| Name | Description | Values |
|---|---|---|
| | The | |
| | The | |
| | The workspace startup failure reason. | `BadRequest`, `InfrastructureFailure`, `Unknown` |
| Name | Description |
|---|---|
| `BadRequest` | Startup failure due to an invalid devfile used to create a Dev Workspace. |
| `InfrastructureFailure` | Startup failure due to the following errors: |
| `Unknown` | Unknown failure reason. |
Viewing Dev Workspace Operator metrics on Grafana dashboards
To view the Dev Workspace Operator metrics on Grafana with the example dashboard:
Prerequisites

- Prometheus is collecting metrics. See Collecting Dev Workspace Operator metrics with Prometheus.
- Grafana version 7.5.3 or later.
- Grafana is running on port `3000` with a corresponding Service. See Installing Grafana.
Procedure

- Add the data source for the Prometheus instance. See Creating a Prometheus data source.
- Import the example `grafana-dashboard.json` dashboard.
- Use the Grafana console to view the Dev Workspace Operator metrics dashboard. See Grafana dashboard for the Dev Workspace Operator.
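The data source step can also be performed through the Grafana HTTP API instead of the console. The following is a minimal sketch under several assumptions: Grafana is reachable on `localhost:3000` (for example through port forwarding), the Prometheus Service is `prometheus` in the `monitoring` namespace as in the example monitoring stack, and the credentials passed to `curl` are permitted to create data sources. The helper name `prometheus_datasource_payload` and the data source name `Prometheus` are hypothetical.

```shell
# Hypothetical helper: print the JSON body for Grafana's
# POST /api/datasources endpoint, describing a Prometheus data source.
# $1 is the Prometheus URL as seen from Grafana; the default assumes
# the Service `prometheus` in the `monitoring` namespace on port 9090.
prometheus_datasource_payload() {
  prometheus_url="${1:-http://prometheus.monitoring.svc:9090}"
  cat <<EOF
{
  "name": "Prometheus",
  "type": "prometheus",
  "url": "$prometheus_url",
  "access": "proxy"
}
EOF
}
```

One way to send it is `prometheus_datasource_payload | curl -s -u <user>:<password> -H 'Content-Type: application/json' -d @- http://localhost:3000/api/datasources`, where `<user>` and `<password>` are placeholders for your Grafana credentials.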
Grafana dashboard for the Dev Workspace Operator
The example Grafana dashboard based on `grafana-dashboard.json` displays the following metrics from the Dev Workspace Operator.
The Dev Workspace-specific metrics panel

- Average workspace start time: The average workspace startup duration.
- Workspace starts: The number of successful and failed workspace startups.
- Workspace startup duration: A heatmap that displays workspace startup duration.
- Dev Workspace successes / failures: A comparison between successful and failed Dev Workspace startups.
- Dev Workspace failure rate: The ratio between the number of failed workspace startups and the number of total workspace startups.
- Dev Workspace startup failure reasons: A pie chart that displays the distribution of workspace startup failures:
  - `BadRequest`
  - `InfrastructureFailure`
  - `Unknown`
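The arithmetic behind two of these panels is simple enough to sketch. The helpers below assume that the failure rate is failed startups divided by total startups, as described above, and that the average start time is the histogram's sum of observed durations divided by its observation count, which is the usual way to average a Prometheus histogram. The function names are hypothetical, and the dashboard's own queries may apply time windows that this sketch ignores.

```shell
# Hypothetical helper: ratio of failed to total workspace startups.
# Prints "NaN" when no startups were recorded.
failure_rate() {
  awk -v failed="$1" -v total="$2" 'BEGIN {
    if (total == 0) { print "NaN"; exit }
    printf "%.2f\n", failed / total
  }'
}

# Hypothetical helper: average startup duration in seconds, computed
# from a histogram's sum of durations and its number of observations.
average_start_seconds() {
  awk -v sum="$1" -v count="$2" 'BEGIN {
    if (count == 0) { print "NaN"; exit }
    printf "%.2f\n", sum / count
  }'
}
```

For example, 5 failures out of 20 startups gives `failure_rate 5 20`, which prints `0.25`, and 90 seconds spread over 12 startups gives `average_start_seconds 90 12`, which prints `7.50`.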
The Operator metrics panel (part 1)

- Webhooks in flight: A comparison between the number of different webhook requests.
- Work queue duration: A heatmap that displays how long the reconcile requests stay in the work queue before they are handled.
- Webhooks latency (/mutate): A heatmap that displays the `/mutate` webhook latency.
- Reconcile time: A heatmap that displays the reconcile duration.
The Operator metrics panel (part 2)

- Webhooks latency (/convert): A heatmap that displays the `/convert` webhook latency.
- Work queue depth: The number of reconcile requests that are in the work queue.
- Memory: Memory usage for the Dev Workspace controller and the Dev Workspace webhook server.
- Reconcile counts (DWO): The average number of reconciles per second for the Dev Workspace controller.