converged-computing/fluxqueue

Fluxqueue

Under development!

I'm still thinking over improvements to fluxnetes, fluence, and related projects, and this is the direction I'm currently taking. I've been thinking of a design where Flux works as a controller, as follows:

  1. The controller has an admission webhook that intercepts jobs and pods as they are submitted. Jobs are suspended; for all other abstractions, scheduling gates are used.
  2. Each job is wrapped as a FluxJob, parsed into a Flux job spec, and passed to a part of the controller, the Flux Queue.
  3. The Flux Queue, which runs in a loop, moves through the queue and interacts with a Fluxion service to schedule work.
  4. When a job is scheduled, it is unsuspended and/or targeted for the fluxqueue custom scheduler plugin, which will assign it to exactly the nodes it was intended for.
  5. We will need an equivalent cleanup process that is notified when pods are done, tells fluxion, and updates the queue. Likely those will be done in the same operation.
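
The five steps above can be sketched as a small in-memory simulation. This is only an illustration under loud assumptions: `FluxJob`, `FakeFluxion`, and `FluxQueue` below are hypothetical stand-ins written for this sketch, not the project's types or the Fluxion API, and the real design uses a Postgres-backed queue and a separate Fluxion service.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FluxJob:
    """Hypothetical stand-in for a wrapped job or pod (not the real CRD)."""
    name: str
    nodes_needed: int
    gated: bool = True  # suspended or schedule gated at admission
    assigned: list = field(default_factory=list)


class FakeFluxion:
    """Toy scheduler that hands out free node names (assumed, not Fluxion's API)."""

    def __init__(self, nodes):
        self.free = list(nodes)

    def match(self, count):
        # Return `count` nodes if available, otherwise None (the job must wait).
        if count > len(self.free):
            return None
        granted, self.free = self.free[:count], self.free[count:]
        return granted

    def cancel(self, nodes):
        # Return nodes to the free pool when a job finishes.
        self.free.extend(nodes)


class FluxQueue:
    def __init__(self, fluxion):
        self.fluxion = fluxion
        self.pending = deque()

    def admit(self, job):
        # Steps 1-2: the job arrives gated and is placed in the queue.
        job.gated = True
        self.pending.append(job)

    def schedule_once(self):
        # Step 3: one pass over the queue, asking the scheduler about each job.
        still_waiting, scheduled = deque(), []
        while self.pending:
            job = self.pending.popleft()
            nodes = self.fluxion.match(job.nodes_needed)
            if nodes is None:
                still_waiting.append(job)  # does not fit yet, keep it queued
                continue
            # Step 4: record the exact nodes and lift the gate.
            job.assigned, job.gated = nodes, False
            scheduled.append(job)
        self.pending = still_waiting
        return scheduled

    def complete(self, job):
        # Step 5: cleanup releases the job's nodes back to the scheduler.
        self.fluxion.cancel(job.assigned)
        job.assigned = []


queue = FluxQueue(FakeFluxion(["node-1", "node-2"]))
a, b = FluxJob("a", 2), FluxJob("b", 1)
queue.admit(a)
queue.admit(b)
first = queue.schedule_once()   # "a" takes both nodes, "b" keeps waiting
queue.complete(a)
second = queue.schedule_once()  # "b" is scheduled once nodes are free again
```

Doing the release and the queue update in one `complete` call mirrors the note in step 5 that cleanup and the queue update will likely happen in the same operation.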

This project comes out of fluxnetes, which was similar in design but did the implementation entirely inside of Kubernetes. fluxnetes was a combination of Kubernetes and Fluence, both of which use Fluxion, an HPC-grade scheduler, to schedule pod groups to nodes. For our queue, we use river backed by a Postgres database. The database is deployed alongside fluence and could be customized to use an operator instead.

Important: This is an experiment and is under development. I will change this design a million times - it's how I tend to learn and work. I'll share updates when there is something to share. It deploys but does not work yet! See the docs for some detail on design choices.

Design

Fluxqueue builds three primary containers:

  • ghcr.io/converged-computing/fluxqueue: contains the webhook and operator with a flux queue for pods and groups that interacts with fluxion
  • ghcr.io/converged-computing/fluxqueue-scheduler: (TBA) will provide a simple scheduler plugin
  • ghcr.io/converged-computing/fluxqueue-postgres: holds the worker queue and provisional queue tables

And we use ghcr.io/converged-computing/fluxion for the fluxion service.

Deploy

Create a kind cluster. You need at least one worker node in addition to the control plane.

kind create cluster --config ./examples/kind-config.yaml

Install the certificate manager:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.1/cert-manager.yaml

Then you can deploy as follows:

./hack/quick-build-kind.sh

You'll then have the fluxqueue service running and a postgres database (for the job queue), along with (TBA) the scheduler plugins controller, which we currently need in order to use PodGroup.

$ kubectl get pods -n fluxqueue-system
NAME                                                 READY   STATUS    RESTARTS   AGE
fluxqueue-chart-controller-manager-6dd6f95c6-z9qdk   0/1     Running   0          9s
postgres-5dc8c6b49d-llv2s                            0/1     Running   0          9s

You can then create a job or a pod:

kubectl apply -f test/job.yaml
kubectl apply -f test/pod.yaml

Each will currently be suspended (job) or schedule gated (pod) to prevent scheduling. A FluxJob to wrap each one is also created:

$ kubectl get fluxjobs.jobs.converged-computing.org 
NAME      AGE
job-pod   4s
pod-pod   6s
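
To make "suspended" and "schedule gated" concrete, here is a minimal sketch of the JSON Patch operations a mutating webhook could return for each kind. It relies on two standard Kubernetes fields, `spec.suspend` on a Job and `spec.schedulingGates` on a Pod. The gate name `fluxqueue` and the helper names `gate_patch` and `encode_patch` are assumptions made for illustration, not taken from the project.

```python
import base64
import json

# Hypothetical gate name; the name the project actually uses is not stated here.
GATE_NAME = "fluxqueue"


def gate_patch(obj):
    """Return JSON Patch operations that keep `obj` from being scheduled."""
    kind = obj.get("kind")
    if kind == "Job":
        # Jobs are held with the built-in spec.suspend field.
        return [{"op": "add", "path": "/spec/suspend", "value": True}]
    if kind == "Pod":
        gate = {"name": GATE_NAME}
        gates = obj.get("spec", {}).get("schedulingGates")
        if gates is None:
            # No gate list yet: create it with our gate.
            return [{"op": "add", "path": "/spec/schedulingGates", "value": [gate]}]
        if gate in gates:
            return []  # already gated, nothing to change
        # Append to the existing list ("-" means end of array in JSON Patch).
        return [{"op": "add", "path": "/spec/schedulingGates/-", "value": gate}]
    return []  # other kinds are left untouched in this sketch


def encode_patch(ops):
    """Base64-encode the patch, the form an AdmissionReview response carries."""
    return base64.b64encode(json.dumps(ops).encode()).decode()


job_ops = gate_patch({"kind": "Job", "spec": {}})
pod_ops = gate_patch({"kind": "Pod", "spec": {}})
```

Releasing the work later is the inverse: set `spec.suspend` to false on the Job, or remove the gate from the Pod's `schedulingGates` list.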

Next I'm going to figure out how we can add a queue that receives these jobs and asks to schedule with fluxion.

Development

Debugging Postgres

It is often helpful to shell into the postgres container to see the database directly:

kubectl exec -n fluxqueue-system -it postgres-597db46977-9lb25 -- bash
psql -U postgres

# Connect to database 
\c

# list databases
\l

# show tables
\dt

# test a query
SELECT group_name, group_size FROM pods_provisional;

TODO

  • Figure out how to add queue
  • Figure out how to add fluxion
  • kubectl plugin to get fluxion state?

License

HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.

See LICENSE, COPYRIGHT, and NOTICE for details.

SPDX-License-Identifier: (MIT)

LLNL-CODE-842614
