Support K8s Predicates
Design
Predicates are a set of pre-registered functions in K8s. The scheduler invokes these functions to check whether a pod is eligible to be allocated onto a node. Common predicates include node-selector and pod affinity/anti-affinity. To support these predicates in YuniKorn, we do not intend to re-implement everything on our own; instead, we reuse the core predicates code as much as possible.
YuniKorn-core is agnostic about the underlying RMs, so the predicate functions are implemented in the K8s-shim as a SchedulerPlugin.
SchedulerPlugin is a way to plug in or extend scheduler capabilities. A shim can implement such a plugin and register itself with yunikorn-core, so that the plugged-in function can be invoked by the scheduler core. All supported plugins are listed in types.
Workflow
First, the RM needs to register itself with yunikorn-core and advertise which scheduler plugin interfaces it supports. For example, an RM could implement the PredicatePlugin interface and register itself with yunikorn-core. yunikorn-core then calls the PredicatePlugin API to run predicates before making allocation decisions.
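The registration step can be sketched in Go as follows. This is a minimal illustration, not the actual yunikorn-core API: the `PredicateArgs` type, the `Predicates` method signature, the `pluginRegistry` type, and the `fakeShim` implementation are all assumptions made for this example. It shows the core detecting, via a type assertion, that the registering RM supports the predicate interface, and then calling it before an allocation decision.

```go
package main

import (
	"errors"
	"fmt"
)

// PredicateArgs identifies the pod/node pair to evaluate (hypothetical type).
type PredicateArgs struct {
	PodName  string
	NodeName string
}

// PredicatePlugin is the interface a shim implements to expose predicates.
// The method signature here is an assumption for illustration.
type PredicatePlugin interface {
	Predicates(args PredicateArgs) error
}

// pluginRegistry stands in for the core-side plugin bookkeeping (hypothetical).
type pluginRegistry struct {
	predicate PredicatePlugin
}

// Register inspects the RM callback and records every plugin interface it supports.
func (r *pluginRegistry) Register(rm interface{}) {
	if p, ok := rm.(PredicatePlugin); ok {
		r.predicate = p
	}
}

// CheckNode runs the registered predicates; with no plugin, every node is eligible.
func (r *pluginRegistry) CheckNode(pod, node string) error {
	if r.predicate == nil {
		return nil
	}
	return r.predicate.Predicates(PredicateArgs{PodName: pod, NodeName: node})
}

// fakeShim is a toy RM that rejects a fixed set of nodes.
type fakeShim struct {
	unschedulable map[string]bool
}

// Predicates returns an error when the node cannot host the pod.
func (s *fakeShim) Predicates(args PredicateArgs) error {
	if s.unschedulable[args.NodeName] {
		return errors.New("pod " + args.PodName + " does not fit node " + args.NodeName)
	}
	return nil
}

func main() {
	reg := &pluginRegistry{}
	reg.Register(&fakeShim{unschedulable: map[string]bool{"node-2": true}})

	for _, node := range []string{"node-1", "node-2"} {
		if err := reg.CheckNode("pod-a", node); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("eligible:", node)
	}
}
```

Using a type assertion at registration time keeps the core free of any K8s-specific types: an RM that does not implement the interface simply schedules without predicate checks.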
The following workflow demonstrates what allocation looks like when predicates are involved.
pending pods: A, B
shim sends requests to core, including A, B
core starts to schedule A, B
  partition -> queue -> app -> request
    schedule A (1)
      run predicates (3)
        generate predicates metadata (4)
        run predicate functions one by one with the metadata
        success
        proposal: A->N
    schedule B (2)
      run predicates (calling shim API)
        generate predicates metadata
        run predicate functions one by one with the metadata
        success
        proposal: B->N
commit the allocation proposal for A and notify k8s-shim
commit the allocation proposal for B and notify k8s-shim
shim binds pod A to N
shim binds pod B to N
(1) and (2) run in parallel.
(3) yunikorn-core calls a schedulerPlugin API to run predicates; this API is implemented on the k8s-shim side.
(4) K8s-shim generates the metadata based on the current scheduler cache; the metadata includes some intermediate states of nodes and pods.
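The workflow above can be approximated with the following Go sketch. It is only an illustration under stated assumptions: `predicateFunc`, `proposal`, `schedule`, and `errNoFit` are hypothetical names, the metadata generation step is omitted, and the real core walks partition, queue, and application before reaching a request. Each pending pod is evaluated in its own goroutine, mirroring notes (1) and (2), and a proposal is produced only when the predicates succeed.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// errNoFit is returned by a predicate when a pod cannot run on a node.
var errNoFit = errors.New("pod does not fit node")

// predicateFunc stands in for the shim-side predicates API (hypothetical).
type predicateFunc func(pod, node string) error

// proposal records a tentative pod-to-node assignment.
type proposal struct {
	Pod  string
	Node string
}

// schedule evaluates every pending pod concurrently and proposes the first
// node that passes the predicates. Pods with no eligible node get no proposal.
func schedule(pods, nodes []string, predicates predicateFunc) []proposal {
	var (
		mu        sync.Mutex
		wg        sync.WaitGroup
		proposals []proposal
	)
	for _, pod := range pods {
		wg.Add(1)
		go func(pod string) {
			defer wg.Done()
			for _, node := range nodes {
				if err := predicates(pod, node); err != nil {
					continue // predicates failed, try the next node
				}
				mu.Lock()
				proposals = append(proposals, proposal{Pod: pod, Node: node})
				mu.Unlock()
				return
			}
		}(pod)
	}
	wg.Wait()
	// Sort by pod name so the commit order is deterministic.
	sort.Slice(proposals, func(i, j int) bool { return proposals[i].Pod < proposals[j].Pod })
	return proposals
}

func main() {
	allowAll := func(pod, node string) error { return nil }
	for _, p := range schedule([]string{"A", "B"}, []string{"N"}, allowAll) {
		// Committing a proposal is where the core would notify the shim,
		// which then binds the pod to the node.
		fmt.Printf("commit proposal %s->%s; shim binds pod %s to %s\n", p.Pod, p.Node, p.Pod, p.Node)
	}
}
```

Note that this sketch does not model resource accounting between concurrent proposals; it only shows where the predicate call sits relative to the proposal and commit steps.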
Predicates White-list
Intentionally, we only support a white-list of predicates, mainly for two reasons:
- Predicate functions are time-consuming and have a negative impact on scheduler performance. Supporting only the necessary predicates minimizes that impact. This will be configurable via CLI options;
- The implementation depends heavily on the K8s default scheduler code. Although we reused some unit tests, coverage is still a problem. We'll continue to improve the coverage when adding new predicates.
The white-list is currently defined in DefaultSchedulerPolicy.
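As a rough illustration of the white-list idea, the Go sketch below filters a list of available predicate names against an allowed set. The function `filterPredicates` and the specific predicate names used are assumptions for this example only; they are not taken from DefaultSchedulerPolicy, and the shim's actual white-list handling may differ.

```go
package main

import "fmt"

// filterPredicates splits the available predicate names into those that are
// white-listed (and will be run) and those that are skipped. Input order is kept.
func filterPredicates(available []string, whitelist map[string]bool) (enabled, skipped []string) {
	for _, name := range available {
		if whitelist[name] {
			enabled = append(enabled, name)
		} else {
			skipped = append(skipped, name)
		}
	}
	return enabled, skipped
}

func main() {
	// Illustrative predicate names; the real white-list may contain others.
	available := []string{"MatchNodeSelector", "MatchInterPodAffinity", "CheckVolumeBinding"}
	whitelist := map[string]bool{
		"MatchNodeSelector":     true,
		"MatchInterPodAffinity": true,
	}

	enabled, skipped := filterPredicates(available, whitelist)
	fmt.Println("enabled:", enabled)
	fmt.Println("skipped:", skipped)
}
```

Keeping the white-list as data rather than code is one way to make it adjustable through options later, at the cost of needing validation for unknown names.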