
Refactor SchedulePolicy to improve code organization #2571

Open
wants to merge 6 commits into main
Conversation

@libratiger (Contributor) commented Dec 25, 2024

Motivation

When I tried to dig into the Zero-Overhead Batch Scheduler, I found it hard to get a clear picture of the scheduling logic and hard to implement a new scheduling policy. This PR refactors SchedulePolicy to make it easier for me and others to add new policies.

Modifications

  1. Move sorting logic into separate static methods for better maintainability (a minimal sketch of this structure follows the list)
  2. Improve policy validation and adjustment logic
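For illustration, here is a minimal sketch of the kind of structure this refactor aims for: one static sorting helper per policy, with validation done once at construction. All names below (the module name `schedule_policy_sketch`, `Policy`, `Req`, and the `_sort_by_*` helpers) are hypothetical simplifications, not the actual sglang API.

```python
# schedule_policy_sketch.py
# Hypothetical, simplified stand-in for the refactored SchedulePolicy.
# Class, method, and field names here are illustrative assumptions,
# not the actual sglang API.
import random
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import List


@dataclass
class Req:
    """Minimal request stub with only the fields the sorting helpers need."""
    rid: str
    prefix_len: int = 0  # matched prefix tokens in the radix cache (assumed)
    arrival_time: float = field(default_factory=time.monotonic)


class Policy(str, Enum):
    FCFS = "fcfs"      # first come, first served
    LPM = "lpm"        # longest prefix match first
    RANDOM = "random"


class SchedulePolicy:
    def __init__(self, policy: str):
        # Validate and normalize the policy once at construction time,
        # so a bad config fails early instead of inside the scheduling loop.
        try:
            self.policy = Policy(policy)
        except ValueError:
            raise ValueError(f"Unknown schedule policy: {policy!r}") from None

    def calc_priority(self, waiting_queue: List[Req]) -> None:
        # Dispatch to one static sorting helper per policy; adding a new
        # policy means adding one enum member and one helper method.
        if self.policy is Policy.FCFS:
            self._sort_by_fcfs(waiting_queue)
        elif self.policy is Policy.LPM:
            self._sort_by_longest_prefix(waiting_queue)
        elif self.policy is Policy.RANDOM:
            self._sort_randomly(waiting_queue)

    @staticmethod
    def _sort_by_fcfs(waiting_queue: List[Req]) -> None:
        waiting_queue.sort(key=lambda r: r.arrival_time)

    @staticmethod
    def _sort_by_longest_prefix(waiting_queue: List[Req]) -> None:
        waiting_queue.sort(key=lambda r: r.prefix_len, reverse=True)

    @staticmethod
    def _sort_randomly(waiting_queue: List[Req]) -> None:
        random.shuffle(waiting_queue)
```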

Testing

Add a new test file test_schedule_policy.py with basic unit tests.
Cover policy initialization and FCFS scheduling validation (an illustrative sketch of such a test follows).
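As a sketch of what such tests could check, the snippet below exercises the hypothetical `schedule_policy_sketch` module from above; the actual test_schedule_policy.py in this PR targets the real SchedulePolicy, whose constructor and method names may differ.

```python
# test_schedule_policy_sketch.py
# Illustrative only: exercises the hypothetical sketch above, not the real
# SchedulePolicy in sglang, whose API may differ.
import unittest

from schedule_policy_sketch import Req, SchedulePolicy


class TestSchedulePolicySketch(unittest.TestCase):
    def test_init_rejects_unknown_policy(self):
        # Construction should fail fast on an invalid policy name.
        with self.assertRaises(ValueError):
            SchedulePolicy("not-a-policy")

    def test_fcfs_sorts_by_arrival_time(self):
        # FCFS should order the waiting queue by arrival time.
        policy = SchedulePolicy("fcfs")
        queue = [
            Req(rid="b", arrival_time=2.0),
            Req(rid="a", arrival_time=1.0),
            Req(rid="c", arrival_time=3.0),
        ]
        policy.calc_priority(queue)
        self.assertEqual([r.rid for r in queue], ["a", "b", "c"])


if __name__ == "__main__":
    unittest.main()
```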

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@libratiger (Author) commented:

python -m sglang.bench_one_batch --model-path Qwen/Qwen2.5-3B-Instruct

with the following result:

max_total_num_tokens=1802895
Warmup ...
Prefill. latency: 0.04046 s, throughput:  25310.52 token/s
Decode.  latency: 0.00728 s, throughput:    137.29 token/s
Decode.  latency: 0.00711 s, throughput:    140.58 token/s
Decode.  latency: 0.00711 s, throughput:    140.65 token/s
Decode.  latency: 0.00711 s, throughput:    140.62 token/s
Decode.  latency: 0.00710 s, throughput:    140.76 token/s
Decode.  median latency: 0.00711 s, median throughput:    140.62 token/s
Total. latency:  0.090 s, throughput:  11406.91 token/s
Benchmark ...
Prefill. latency: 0.03336 s, throughput:  30699.61 token/s
Decode.  latency: 0.00716 s, throughput:    139.63 token/s
Decode.  latency: 0.00711 s, throughput:    140.73 token/s
Decode.  latency: 0.00710 s, throughput:    140.84 token/s
Decode.  latency: 0.00710 s, throughput:    140.85 token/s
Decode.  latency: 0.00709 s, throughput:    140.99 token/s
Decode.  median latency: 0.00710 s, median throughput:    140.85 token/s
Total. latency:  0.140 s, throughput:   7431.72 token/s
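(As a sanity check on the numbers: 25310 token/s × 0.04046 s ≈ 1024 prefill tokens, and 1 / 0.00711 s ≈ 140.6 token/s, so this appears to be a single-sequence run decoding one token per step.)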

@libratiger (Author) commented:

cc @merrymercy

@libratiger (Author) commented:

ping @hnyls2002 for your review feedback 😄

@merrymercy (Contributor) commented:

@hnyls2002 please take a look
