Proposal: teleprobe server split into api and worker #28

Open
lulf opened this issue Dec 2, 2024 · 0 comments
I would like to propose a set of changes to the teleprobe architecture that, if accepted, should allow it to scale to running multiple jobs:

  • Split teleprobe-server into 2 parts: teleprobe-api (one instance) and teleprobe-worker (many instances)
    • The teleprobe-api accepts requests to run a job.
      • A job includes a list of binaries and associated tags that identify which target each binary should run on.
      • Maintains an in-memory queue of jobs and schedules them across workers.
      • Is public facing and authenticates requests to run jobs.
    • The teleprobe-worker runs a binary and reports results and logs back to the teleprobe-api.
      • A worker is configured with a list of targets. Each target contains the same information as today, plus a set of tags/labels.
      • At startup, each worker announces its identity and the list of targets (with tags/labels) it supports to the teleprobe-api.
      • Workers poll the teleprobe-api for binaries to run (long-polling with a timeout) and run them. A worker can run multiple binaries in parallel, and the API knows whether a worker is busy.
      • Workers report logs/results back to the teleprobe-api.
      • Workers are not public facing and are assumed to reach the teleprobe-api over an internal network.

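To make the tag-based scheduling concrete, here is a minimal Rust sketch of the data model described above and of matching a job's binary to an announced target. All names (`Binary`, `Target`, `schedule`, the tag values) are hypothetical illustrations, not the actual teleprobe types; the assumption is that a binary is runnable on a target when every one of its tags appears among the target's labels.

```rust
use std::collections::HashSet;

// Hypothetical: a binary submitted as part of a job, with the tags
// selecting which target it should run on.
#[derive(Debug, Clone)]
struct Binary {
    name: String,
    tags: HashSet<String>,
}

// Hypothetical: a target a worker announced at startup, with its labels.
#[derive(Debug, Clone)]
struct Target {
    name: String,
    labels: HashSet<String>,
}

// A binary matches a target when all of its tags are among the target's labels.
fn matches(binary: &Binary, target: &Target) -> bool {
    binary.tags.is_subset(&target.labels)
}

// Pick the first announced target that satisfies the binary's tags.
fn schedule<'a>(binary: &Binary, targets: &'a [Target]) -> Option<&'a Target> {
    targets.iter().find(|t| matches(binary, t))
}

fn main() {
    let tags = |xs: &[&str]| xs.iter().map(|s| s.to_string()).collect::<HashSet<String>>();
    let bin = Binary { name: "blinky".into(), tags: tags(&["nrf52840"]) };
    let targets = vec![
        Target { name: "board-1".into(), labels: tags(&["stm32f4"]) },
        Target { name: "board-2".into(), labels: tags(&["nrf52840", "usb"]) },
    ];
    let chosen = schedule(&bin, &targets).expect("no matching target");
    println!("{}", chosen.name); // prints "board-2"
}
```

In a real implementation the API would also have to track which targets are busy and queue binaries whose tags no idle target currently satisfies; the subset check above is just the matching core.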
A further improvement would be to split the teleprobe-api itself into an API part and a scheduler part, allowing job information to persist across restarts, running multiple API instances for failover, and so on. That would require introducing persistence and some form of coordination, so I consider it a future step and a natural evolution of the above, should the need arise.

I'm happy to fork teleprobe for this capability instead, but the use cases seem to overlap a lot.
