YuniKorn uses Prometheus to record metrics. The metrics system tracks the scheduler's critical execution paths to reveal potential performance bottlenecks. Currently, the metrics fall into two categories:
- scheduler: generic metrics of the scheduler, such as allocation latency, number of applications, etc.
- queue: each queue has its own metrics sub-system, tracking queue status.
all metrics are declared in
YuniKorn metrics are collected through the Prometheus client library and exposed via the scheduler's RESTful service. Once the scheduler has started, they can be accessed at the endpoint http://localhost:9080/ws/v1/metrics.
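As a quick check that the endpoint is serving data, the following Python sketch fetches it and parses the Prometheus text exposition format into samples. It assumes the scheduler is reachable at the URL above. The names `parse_metrics` and `fetch_metrics` are hypothetical helpers, and the parser is deliberately simplified: it does not handle commas or escaped quotes inside label values, nor trailing timestamps.

```python
import urllib.request

# Endpoint taken from the text above; adjust host and port for your deployment.
METRICS_URL = "http://localhost:9080/ws/v1/metrics"


def parse_metrics(text):
    """Parse Prometheus text-format lines into (name, labels, value) tuples.

    Simplified on purpose: comment lines are skipped, and label values are
    assumed to contain no commas or escaped quotes.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The sample value is the last space-separated field.
        series, _, value = line.rpartition(" ")
        name, labels = series, {}
        if series.endswith("}") and "{" in series:
            name, _, body = series[:-1].partition("{")
            for pair in body.split(","):
                if pair:
                    key, _, val = pair.partition("=")
                    labels[key.strip()] = val.strip().strip('"')
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Fetch the metrics page and return the parsed samples."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running scheduler returns one tuple per exposed sample, which is handy for scripted smoke tests before Prometheus is wired in.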
Aggregate Metrics to Prometheus
It is simple to set up a Prometheus server to scrape YuniKorn metrics periodically. Follow these steps:
- Set up Prometheus (read more in the Prometheus docs)
- Configure Prometheus rules. A sample configuration:
scrape_configs:
  - job_name: 'yunikorn'
    metrics_path: '/ws/v1/metrics'
    static_configs:
      - targets: ['docker.for.mac.host.internal:9080']
- Start Prometheus:
docker pull prom/prometheus:latest
docker run -p 9090:9090 -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
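To confirm that Prometheus is actually scraping the scheduler, one option is to ask the Prometheus HTTP API for its active targets. The sketch below is a minimal example, assuming Prometheus is published on localhost:9090 as in the command above and that the scrape job is named `yunikorn`. `job_health` and `fetch_job_health` are hypothetical helper names.

```python
import json
import urllib.request

# Prometheus HTTP API endpoint listing scrape targets (port from the docker command).
TARGETS_URL = "http://localhost:9090/api/v1/targets"


def job_health(payload, job="yunikorn"):
    """Return {scrapeUrl: health} for the active targets belonging to `job`."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return {
        t.get("scrapeUrl", ""): t.get("health", "unknown")
        for t in targets
        if t.get("labels", {}).get("job") == job
    }


def fetch_job_health(url=TARGETS_URL, job="yunikorn", timeout=5):
    """Query Prometheus and report the health of the given scrape job."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return job_health(json.load(resp), job)
```

A health value of `up` means the most recent scrape of that target succeeded. An empty result suggests the job name in the configuration does not match.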
In the scrape target, use docker.for.mac.host.internal instead of localhost if you are running Prometheus in a local Docker container on macOS. Once started, open the Prometheus web UI: http://localhost:9090/graph. You'll see all available metrics from