Under development!
I'm still thinking over improvements to fluxnetes, fluence, and related projects, and this is the direction I'm currently taking. I've been thinking of a design where Flux works as a controller, as follows:
- The controller has an admission webhook that intercepts jobs and pods as they are submitted. Jobs are suspended; all other abstractions get scheduling gates.
- The jobs are wrapped as `FluxJob`, parsed into Flux job specs, and passed to a part of the controller, the Flux Queue.
- The Flux Queue, which runs in a loop, moves through the queue and interacts with a Fluxion service to schedule work.
- When a job is scheduled, it is unsuspended and/or targeted for the fluxqueue custom scheduler plugin, which assigns it exactly to the nodes it was intended for.
- We will need an equivalent cleanup process that is notified when pods are done, tells Fluxion, and updates the queue. Those steps will likely happen in the same operation.
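As an illustration of the admission step above, this is roughly what a scheduling-gated Pod looks like after interception. This is a sketch only: the gate name shown is hypothetical, and fluxqueue's actual gate name may differ.

```yaml
# Illustrative only: a Pod after admission, held by a scheduling gate.
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  schedulingGates:
    # Hypothetical gate name; fluxqueue's real gate may be named differently
    - name: fluxqueue.converged-computing.org/scheduled
  containers:
    - name: app
      image: busybox
```

The kube-scheduler will not consider a Pod for scheduling until all of its scheduling gates are removed, which is what lets the controller hold work until Fluxion grants an allocation.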
This project comes out of fluxnetes, which was similar in design but did the implementation entirely inside of Kubernetes. Fluxnetes was a combination of Kubernetes and Fluence, both of which use the HPC-grade Fluxion scheduler to schedule pod groups to nodes. For our queue, we use River backed by a Postgres database. The database is deployed alongside fluxqueue and could be customized to use an operator instead.
**Important:** This is an experiment, and is under development. I will change this design a million times - it's how I tend to learn and work. I'll share updates when there is something to share. It deploys but does not work yet! See the docs for some detail on design choices.
Fluxqueue builds three primary containers:

- `ghcr.io/converged-computing/fluxqueue`: contains the webhook and operator with a flux queue for pods and groups that interacts with fluxion
- `ghcr.io/converged-computing/fluxqueue-scheduler`: (TBA) will provide a simple scheduler plugin
- `ghcr.io/converged-computing/fluxqueue-postgres`: holds the worker queue and provisional queue tables

And we use `ghcr.io/converged-computing/fluxion` for the fluxion service.
Create a kind cluster. You need more than a control plane; the kind config adds worker nodes.

```bash
kind create cluster --config ./examples/kind-config.yaml
```
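For reference, a minimal kind config with worker nodes might look like the following. This is a sketch: the actual contents of `examples/kind-config.yaml` in the repository may differ in node count or settings.

```yaml
# Hypothetical kind config: one control plane plus worker nodes
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker
```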
Install the certificate manager:

```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.1/cert-manager.yaml
```
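Before deploying, you may want to block until cert-manager is ready, since the webhook depends on its certificates. This uses the standard `kubectl wait` command; the `cert-manager` namespace is the default for the manifest above.

```shell
# Wait for all cert-manager deployments to become available
kubectl wait --for=condition=Available deployment --all \
    -n cert-manager --timeout=120s
```

If this times out, check `kubectl get pods -n cert-manager` for image pull or scheduling issues before continuing.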
Then you can deploy as follows:

```bash
./hack/quick-build-kind.sh
```
You'll then have the fluxqueue service running and a Postgres database (for the job queue), along with (TBA) the scheduler-plugins controller, which we currently need for PodGroup.
```console
$ kubectl get pods -n fluxqueue-system
NAME                                                 READY   STATUS    RESTARTS   AGE
fluxqueue-chart-controller-manager-6dd6f95c6-z9qdk   0/1     Running   0          9s
postgres-5dc8c6b49d-llv2s                            0/1     Running   0          9s
```
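To block until both pods from the example output above are ready rather than polling by hand, you can again use `kubectl wait`:

```shell
# Wait for everything in the fluxqueue-system namespace to be ready
kubectl wait --for=condition=Ready pods --all \
    -n fluxqueue-system --timeout=180s
```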
You can then create a job or a pod:

```bash
kubectl apply -f test/job.yaml
kubectl apply -f test/pod.yaml
```
Each will currently be suspended (job) or scheduling-gated (pod) to prevent scheduling. A FluxJob to wrap each one is also created:
```console
$ kubectl get fluxjobs.jobs.converged-computing.org
NAME      AGE
job-pod   4s
pod-pod   6s
```
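To confirm the webhook did its work, you can inspect the intercepted objects directly. The `job-pod` name below comes from the example output above; the jsonpath queries just list fields across all objects in the default namespace, so they don't assume particular names.

```shell
# Jobs should have been suspended by the admission webhook (prints "true")
kubectl get jobs -o jsonpath='{.items[*].spec.suspend}'
echo
# Pods should carry a scheduling gate
kubectl get pods -o jsonpath='{.items[*].spec.schedulingGates}'
echo
# The generated FluxJob can be inspected as YAML
kubectl get fluxjobs.jobs.converged-computing.org job-pod -o yaml
```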
Next, I'm going to figure out how we can add a queue that receives these jobs and asks Fluxion to schedule them.
It is often helpful to shell into the Postgres container to see the database directly:

```bash
kubectl exec -n fluxqueue-system -it postgres-597db46977-9lb25 -- bash
psql -U postgres
```

Then, inside psql:

```sql
-- Connect to the database
\c
-- List databases
\l
-- Show tables
\dt
-- Test a query
SELECT group_name, group_size FROM pods_provisional;
```
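Alternatively, you can run one-off queries without an interactive shell. This assumes the Postgres Deployment is named `postgres` (consistent with the pod names shown above, but an assumption nonetheless):

```shell
# List tables in the default database non-interactively
kubectl exec -n fluxqueue-system deploy/postgres -- \
    psql -U postgres -c '\dt'
```

Using `deploy/postgres` avoids having to look up the current pod's hashed name after each restart.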
- Figure out how to add the queue
- Figure out how to add fluxion
- kubectl plugin to get fluxion state?
HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.
See LICENSE, COPYRIGHT, and NOTICE for details.
SPDX-License-Identifier: (MIT)
LLNL-CODE-842614