Quota Preemption

Queues can be configured with a quota, and the quota for a queue can be changed while the system is running. A changed quota takes effect in the next scheduling cycle. The impact depends on the type of change.

For a queue that had its quota increased: no impact. The queue could not have used more than its old quota, and the new, higher quota makes more resources available to the workloads running in the queue.

For a queue that had its quota decreased there are two cases.

The new, lowered quota is larger than the current usage in the queue: no impact. Workloads will be allocated until the new quota is reached. All running workloads are unaffected.
The new, lowered quota is smaller than the current usage in the queue: the queue is impacted. Any workloads that were pending in the queue will need to wait until resources become available. Running workloads will keep on running until they are done.

This second case is what quota preemption targets. Quota preemption gives the administrator the option to intervene in the running workloads when lowering a quota. This document guides users through setting up the preemption delay. For more details on the design, please refer to the design doc.
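As an illustration of the two lowered-quota cases, the fragment below sketches a quota change on a hypothetical queue named analytics. The queue name, the quota values and the usage numbers in the comments are assumptions made for this example only.

```yaml
queues:
  - name: root
    queues:
      - name: analytics            # hypothetical queue name
        resources:
          max:
            {memory: 10T, vcore: 600}   # vcore lowered from an assumed 1000
# Case 1: usage at the time of the change is 400 vcore (below 600).
#   No impact: workloads are allocated until 600 vcore is reached.
# Case 2: usage at the time of the change is 800 vcore (above 600).
#   Pending workloads wait; running workloads continue unless
#   quota preemption is configured for the queue.
```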

Global configuration

Quota preemption is available in YuniKorn 1.8 or later and turned off by default.

To use quota preemption, it must first be turned on globally at the partition level in the YuniKorn config:

partitions:
  - name: <name of the partition>
    preemption:
      quotapreemptionenabled: <boolean value>

The default value for quotapreemptionenabled is false. Allowed values are true or false; any other value will cause a parse error.

When quota preemption is turned on at the partition level, a change to a queue quota could trigger preemption.

Queue configuration

With the global configuration turned on, each queue must be configured to opt in to quota preemption. A queue can opt in by setting the quota.preemption.delay property on the queue.

queues:
  - name: default
    properties:
      quota.preemption.delay: <delay string>

The delay, when not specified, defaults to 0. A delay value of 0, whether explicitly set or left at the default, will prevent a quota change on the queue from triggering preemption. Any non-zero delay is added to the time the quota change was applied to the queue. The resulting timestamp defines the trigger point for quota preemption. Quota preemption will only be triggered if the queue, at the point in time of the change, is above the new quota. In all other cases the standard scheduling quota enforcement immediately enforces the new quota and no further preemption actions are needed.

The scheduler will not trigger quota preemption until the delay has passed. If at that point in time the queue usage has dropped below the quota set, no actions will be taken. The quota preemption tracking information will be cleaned up in that case.
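The opt-in and the trigger point described above can be sketched as follows. The queue names batch and adhoc and the times in the comments are hypothetical, and the partition is assumed to have quotapreemptionenabled set to true.

```yaml
queues:
  - name: root
    queues:
      - name: batch                # hypothetical queue, opted in
        properties:
          quota.preemption.delay: 15m
      - name: adhoc                # hypothetical queue, not opted in:
                                   # no delay set, so it defaults to 0
# Assumed timeline for "batch":
#   10:00  quota lowered while usage is above the new quota
#   10:15  trigger point (10:00 + 15m)
#          usage still above the quota: quota preemption can start
#          usage below the quota: no action, tracking is cleaned up
```

A quota change on adhoc never triggers quota preemption in this sketch, regardless of its usage at the time of the change.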

To prevent multiple quota changes from impacting each other, quota preemption works top down in the queue hierarchy. If a quota change has triggered preemption on a queue, none of the children of that queue will be able to trigger quota preemption. This prevents complex victim selection interactions if multiple changes are made.

Victims for quota preemption can come from any queue below the queue that triggered the quota preemption. Quota preemption follows the same rules for victim selection as normal preemption: it cannot cause a queue to go below its guaranteed resources, allocations are sorted based on priority, and allocations can opt out. See the description in the preemption documentation for details.
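To make the hierarchy rules concrete, here is a sketch with a parent queue and two children. The child queue names etl and web and all resource values are hypothetical. The guaranteed entry is assumed to follow the same layout as max.

```yaml
partitions:
  - name: default
    preemption:
      quotapreemptionenabled: true
    queues:
      - name: root
        queues:
          - name: prod
            parent: true
            resources:
              max:
                {memory: 10T, vcore: 1000}
            properties:
              quota.preemption.delay: 15m
            queues:
              - name: etl          # hypothetical child queue
                resources:
                  guaranteed:
                    {memory: 2T, vcore: 200}
                properties:
                  quota.preemption.delay: 15m
              - name: web          # hypothetical child queue
                properties:
                  quota.preemption.delay: 15m
```

In this sketch, if lowering the quota on prod triggers quota preemption, victims can be selected from both etl and web, but etl is not pushed below its guaranteed resources. While prod is the triggering queue, a quota change on etl or web does not trigger quota preemption of its own.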

An example configuration turning on quota preemption and setting a delay of 15 minutes on the "prod" queue:

partitions:
  - name: default
    preemption:
      quotapreemptionenabled: true
    queues:
      - name: root
        queues:
          - name: prod
            parent: false
            resources:
              max:
                {memory: 10T, vcore: 1000}
            properties:
              quota.preemption.delay: 15m

Dynamic Queues

Dynamic queues do not support quota preemption.

Inheritance

The current configuration does not support inheritance of the quota.preemption.delay value. YUNIKORN-3208 has been logged to support that functionality.

Recommendations

Quota preemption should be used with care. Using short delays is not recommended. Although no minimum delay is enforced, any delay below a minute (60 seconds) should not be used.

If a queue mainly runs service type workloads, scaling deployments up or down should be considered when changing quotas. Service workloads will not exit automatically and will be restarted if preempted, because the controller will try to recreate the workload. Using quota preemption could leave the service in a degraded state.

When running batch workloads, the delay should be based on the runtime of the workloads. Preempting workloads should be a last resort. A workload that finishes on its own lowers the queue usage and does not need to be re-run. A preempted workload has already used resources and must be run again, which makes it more expensive overall.
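As a sketch of these recommendations, assume a hypothetical batch queue whose jobs usually finish within about 45 minutes, and a hypothetical service queue. The queue names, the runtime and the 60m value are illustrative assumptions; the value follows the same format as the 15m example above.

```yaml
queues:
  - name: root
    queues:
      - name: nightly-batch        # hypothetical batch queue
        properties:
          # Longer than the assumed 45 minute job runtime, so most
          # running jobs finish on their own before the trigger point.
          quota.preemption.delay: 60m
      - name: services             # hypothetical service queue
        # No delay set: quota changes never trigger preemption here.
        # Deployments are scaled down by hand before lowering the quota.
```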

Tip: for consistency, until inheritance is provided, a delay should not be set on a parent queue unless all children below it are also updated with the same delay.
