Resource requests should be at executor level. #208

Open
claymcleod opened this issue Sep 10, 2024 · 2 comments

@claymcleod

It seems strange that resource requests are specified at the task level instead of the executor level, where images are actually specified. The lack of flexibility around resource allocation greatly inhibits one of the major potential benefits (the biggest potential benefit?) of the executors abstraction: you can't save on resources for commands that don't require much CPU/RAM/disk.
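For concreteness, a TES 1.x CreateTask body looks roughly like the sketch below (field names from the v1 schema, values purely illustrative): resources sits alongside the executors array, so a single request covers every executor regardless of what each container actually needs.

    # Rough sketch of a TES 1.x task body (illustrative values only).
    # "resources" applies to the whole task; executors carry only image/command.
    task = {
        "name": "align-then-index",
        "resources": {          # one task-level request covers both executors below
            "cpu_cores": 16,
            "ram_gb": 64,
            "disk_gb": 200,
        },
        "executors": [
            {   # heavyweight step that actually needs the 16 cores / 64 GB
                "image": "my-aligner:latest",
                "command": ["align", "ref.fa", "reads.fq"],
            },
            {   # lightweight step that still inherits the same allocation
                "image": "my-indexer:latest",
                "command": ["index", "aln.bam"],
            },
        ],
    }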

@kellrott
Member

Executors are meant to be run sequentially on a single machine allocated to a task. In most deployments, the TES service allocates a machine (either a VM or an HPC node), starts up a runner, and the runner moves all files into place and then invokes each executor one after the other. AWS charges you for the full VM for the full time that you use it, even if you are only using half of it for some of the processes. The only way to change the allocation size is to request a different-sized VM and move the tasks there, which would be equivalent to issuing two different tasks. The same goes for HPC systems like SLURM.

@claymcleod
Author

claymcleod commented Sep 11, 2024

Thanks for the context @kellrott, and I think it does make sense for the specific instances you bring up here. I already knew that executors run sequentially, in order, from this part of the spec:

executors

An array of executors to be run. Each of the executors will run one at a time sequentially. Each executor is a different command that will be run, and each can utilize a different docker image. But each of the executors will see the same mapped inputs and volumes that are declared in the parent CreateTask message.

That being said, the idea that executors were intended to run on a single machine is new to me (at least from my reading of the documentation). That might be something to clarify in the spec, if it isn't already there and I just missed it.


Above notwithstanding, I still think it makes sense to consider this change for situations where:

  • All of the executors run on one machine, but the backend can better schedule containers from several concurrent tasks because it knows the limits to expect from each individual container (things like the local Docker daemon on your laptop or Kubernetes).
  • Backends that abstract away the concept of a VM and simply let you specify units of compute at the container level (most container services and, from an HPC perspective, something like bsub-ing Singularity jobs to LSF).

In my mind, specifying resources at the executor level would subsume the use cases you listed (e.g., by looking across all executors before requisitioning and picking the maximum resource usage), while also serving the cases I list above, which aren't possible in the current state of affairs.
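As a rough illustration of that folding step (the per-executor resources field below is hypothetical, not part of the current spec), a backend that still schedules whole machines could derive today's task-level request by taking the per-dimension maximum across executors, while container-level backends could pass each request through directly:

    # Hypothetical shape: a per-executor "resources" block (not in the current TES spec).
    executors = [
        {"image": "heavy-step:latest",
         "command": ["do-heavy-work"],
         "resources": {"cpu_cores": 16, "ram_gb": 64, "disk_gb": 200}},
        {"image": "light-step:latest",
         "command": ["summarize"],
         "resources": {"cpu_cores": 1, "ram_gb": 2, "disk_gb": 10}},
    ]

    def task_level_request(executors):
        """Fold per-executor requests into a single task-level request by taking
        the maximum of each dimension, which is what a whole-machine backend
        (VM or HPC node) would need to run the executors sequentially."""
        dims = ("cpu_cores", "ram_gb", "disk_gb")
        return {d: max(e["resources"].get(d, 0) for e in executors) for d in dims}

    # A VM- or SLURM-style backend still allocates one machine sized at the maximum,
    # while a Kubernetes- or container-service-style backend could pass each
    # executor's request through per container.
    print(task_level_request(executors))  # {'cpu_cores': 16, 'ram_gb': 64, 'disk_gb': 200}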

@vsmalladi added this to the Next milestone on Oct 31, 2024