I would like to propose a set of changes to the teleprobe architecture that, if accepted, should allow it to scale for running multiple jobs:
Split teleprobe-server into 2 parts: teleprobe-api (one) and teleprobe-worker (many)
The teleprobe-api accepts requests to run a job
A job includes a list of binaries, each with associated tags that identify which targets the binary should run on.
Maintains an in-memory queue of jobs and schedules them across workers.
Is public facing and authenticates requests to run jobs
The teleprobe-worker runs a binary and reports the result and logs back to teleprobe-api.
A worker is configured with a list of targets. Each target contains the same information as today, plus a set of tags/labels.
At startup, each worker announces its identity to teleprobe-api, along with the list of targets it supports and their tags/labels.
Workers poll the teleprobe-api for binaries to run (long-polling with a timeout) and run those binaries. A worker can run several binaries in parallel, and the api tracks whether a worker is busy.
Workers report logs/results back to the teleprobe-api
Are not public facing and are assumed to reach the teleprobe-api over an internal network
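To make the proposed flow concrete, here is a rough sketch of how the api side could match queued binaries to worker targets by tags. It is written in Python purely for illustration and is not teleprobe code: the names `Scheduler`, `BinaryTask`, `WorkerInfo`, `register`, `submit`, `poll` and `report`, as well as the example tags, are all hypothetical. It also assumes a binary may run on a target when the target's tags are a superset of the binary's tags, and it uses a plain non-blocking `poll` in place of real long-polling over HTTP.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BinaryTask:
    """One binary from a job, plus the tags a target must carry to run it."""
    job_id: str
    name: str
    required_tags: frozenset


@dataclass
class WorkerInfo:
    """What a worker announces at startup: its identity and its targets."""
    worker_id: str
    targets: dict  # target name -> frozenset of tags
    busy_targets: set = field(default_factory=set)


class Scheduler:
    """In-memory queue and tag-based matching (hypothetical api-side state)."""

    def __init__(self):
        self.queue = deque()  # pending BinaryTask items, oldest first
        self.workers = {}     # worker_id -> WorkerInfo
        self.results = {}     # (job_id, binary name) -> (ok, logs)

    def register(self, worker_id, targets):
        """Worker startup announcement: identity plus targets with tags."""
        tagged = {name: frozenset(tags) for name, tags in targets.items()}
        self.workers[worker_id] = WorkerInfo(worker_id, tagged)

    def submit(self, job_id, binaries):
        """Queue a job given as a list of (binary name, tags) pairs."""
        for name, tags in binaries:
            self.queue.append(BinaryTask(job_id, name, frozenset(tags)))

    def poll(self, worker_id):
        """Return (task, target name) for the oldest runnable task, or None.

        Assumption: a target qualifies if it is idle and its tags include
        every tag the binary asks for.
        """
        worker = self.workers[worker_id]
        for task in self.queue:
            for target, tags in worker.targets.items():
                idle = target not in worker.busy_targets
                if idle and task.required_tags <= tags:
                    self.queue.remove(task)  # safe: we return right away
                    worker.busy_targets.add(target)
                    return task, target
        return None

    def report(self, worker_id, target, task, ok, logs):
        """Worker reports result and logs; the target becomes idle again."""
        self.results[(task.job_id, task.name)] = (ok, logs)
        self.workers[worker_id].busy_targets.discard(target)


# Usage: two workers with one target each, one job with two binaries.
sched = Scheduler()
sched.register("worker-a", {"board-1": {"nrf52840", "ble"}})
sched.register("worker-b", {"board-2": {"stm32f4"}})
sched.submit("job-1", [("blinky.elf", {"stm32f4"}), ("ble_test.elf", {"ble"})])

task, target = sched.poll("worker-a")
sched.report("worker-a", target, task, ok=True, logs="example log output")
```

Because busy state is tracked per target rather than per worker, a worker with several idle targets can pick up several binaries in parallel, which matches the "api knows if worker is busy" point above. Everything lives in memory, so queued jobs are lost on restart.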
A further improvement could be to split the teleprobe-api into an api part and a scheduler part. That would allow job information to persist across restarts, multiple API instances to run for failover, and so on, but it would require introducing persistence and some form of coordination. I therefore consider it a future step and a natural evolution of the above, should the need arise.
I'm happy to fork teleprobe for this capability instead, but there seems to be a lot of overlap in the use case.