
Commit e0625b0

authored
[Docs] add benchmark refer (#358)
* docs:add benchmark refer polish polish * polish
1 parent 9639a52 commit e0625b0


3 files changed

+74
-5
lines changed


docs/basic_usage/benchmarking.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

docs/benchmarks/benchmark.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
# Benchmarking for Speculative Decoding

## Overview

We provide a unified script for benchmarking speculative decoding with the EAGLE3 algorithm on multiple datasets. Follow the steps below to run the benchmarks.

## Run Benchmarks

### Launch SGLang and Benchmarker Concurrently

`bench_eagle3.py` launches an SGLang server process and a benchmarking process concurrently. You don't have to launch the SGLang server manually; the script automatically handles the SGLang launch under each speculative decoding configuration. Some important arguments are:

- `--model-path`: the path to the target model.
- `--speculative-draft-model-path`: the path to the draft model.
- `--port`: the port on which to launch the SGLang server.
- `--trust-remote-code`: trust remote code when loading the model.
- `--mem-fraction-static`: the fraction of memory used for static allocation.
- `--tp-size`: the tensor parallelism size.
- `--attention-backend`: the attention backend.
- `--config-list`: the list of speculative decoding configurations to test, each in the format `<batch-size>,<num-steps>,<topk>,<num-draft-tokens>`.
- `--benchmark-list`: the list of benchmarks to test, each in the format `<benchmark-name>:<num-prompts>:<subset>`. As the example below shows, trailing fields can be omitted (`mtbench`, `gsm8k:5`).

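
The two list formats can be illustrated with a small parser. This is a minimal sketch that assumes only the formats stated above; `SpecConfig`, `BenchmarkSpec`, `parse_config`, and `parse_benchmark` are hypothetical names for illustration and are not taken from `bench_eagle3.py`, whose actual parsing may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpecConfig:
    """One `--config-list` entry (hypothetical container)."""
    batch_size: int
    num_steps: int
    topk: int
    num_draft_tokens: int


@dataclass
class BenchmarkSpec:
    """One `--benchmark-list` entry (hypothetical container)."""
    name: str
    num_prompts: Optional[int] = None  # None when omitted, as in `mtbench`
    subset: Optional[str] = None       # None when omitted, as in `gsm8k:5`


def parse_config(entry: str) -> SpecConfig:
    """Parse `<batch-size>,<num-steps>,<topk>,<num-draft-tokens>`."""
    parts = entry.split(",")
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated integers, got {entry!r}")
    return SpecConfig(*(int(p) for p in parts))


def parse_benchmark(entry: str) -> BenchmarkSpec:
    """Parse `<benchmark-name>:<num-prompts>:<subset>`; trailing fields optional."""
    parts = entry.split(":")
    if not parts[0] or len(parts) > 3:
        raise ValueError(f"expected name[:num-prompts[:subset]], got {entry!r}")
    num_prompts = int(parts[1]) if len(parts) > 1 else None
    subset = parts[2] if len(parts) > 2 else None
    return BenchmarkSpec(parts[0], num_prompts, subset)


print(parse_config("1,3,1,4"))
print(parse_benchmark("ceval:5:accountant"))
```

Read this way, `1,3,1,4` is batch size 1, 3 speculative steps, top-k 1, and 4 draft tokens, and `ceval:5:accountant` is 5 prompts from the `accountant` subset of `ceval`.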
```shell
python3 bench_eagle3.py \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --speculative-draft-model-path lmsys/sglang-EAGLE3-LLaMA3.1-Instruct-8B \
    --port 30000 \
    --trust-remote-code \
    --mem-fraction-static 0.8 \
    --tp-size 1 \
    --attention-backend fa3 \
    --config-list 1,0,0,0 1,3,1,4 \
    --benchmark-list mtbench gsm8k:5 ceval:5:accountant \
    --dtype bfloat16
```
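
When several configurations are tested in one run, the `--config-list` entries can be generated rather than typed by hand. The following is a minimal sketch that assumes only the `<batch-size>,<num-steps>,<topk>,<num-draft-tokens>` format described above; `config_strings` is a hypothetical helper, not part of `bench_eagle3.py`, and it does not check whether a given combination is accepted by the server.

```python
from itertools import product
from typing import Iterable, List


def config_strings(
    batch_sizes: Iterable[int],
    num_steps: Iterable[int],
    topks: Iterable[int],
    num_draft_tokens: Iterable[int],
) -> List[str]:
    """Return one `bs,steps,topk,draft_tokens` string per combination."""
    return [
        ",".join(str(value) for value in combo)
        for combo in product(batch_sizes, num_steps, topks, num_draft_tokens)
    ]


# Hypothetical sweep: vary only the batch size around the example's 1,3,1,4.
configs = config_strings([1, 2, 4], [3], [1], [4])
print(" ".join(configs))  # prints: 1,3,1,4 2,3,1,4 4,3,1,4
```

The printed string can be pasted after `--config-list` in the command above.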

### Launch Benchmarker Independently

If you want to launch the SGLang server independently, you can use the following command.

```shell
# Launch the SGLang server with EAGLE3 speculative decoding enabled
python3 -m sglang.launch_server \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path lmsys/sglang-EAGLE3-LLaMA3.1-Instruct-8B \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4 \
    --mem-fraction-static 0.75 \
    --cuda-graph-max-bs 1 \
    --tp 1 \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000 \
    --dtype bfloat16
```
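
Because the server is started separately in this mode, it can help to confirm that its port is accepting connections before running the benchmark. The following is a minimal sketch using only the Python standard library; `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0,
                  interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` elapses.

    Returns True once a connection is accepted, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the server command above, a call such as `wait_for_port("127.0.0.1", 30000)` would block until port 30000 is reachable or five minutes have passed.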

Then start benchmarking. Use the same host and port as the SGLang server, and pass `--skip-launch-server` so that the script does not launch a server of its own.

```shell
python3 bench_eagle3.py \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --port 30000 \
    --config-list 1,3,1,4 \
    --benchmark-list mtbench:5 ceval:5:accountant gsm8k:5 humaneval:5 math500:5 aime:1 \
    --skip-launch-server
```

docs/index.rst

Lines changed: 7 additions & 2 deletions
@@ -1,5 +1,5 @@
 SpecForge Documentation
-====================
+=======================
 
 SpecForge is an ecosystem project developed by the SGLang team. It is a framework for training speculative decoding models so that you can smoothly port them over to the SGLang serving framework to speed up your inference.
 
@@ -25,7 +25,6 @@ SpecForge is an ecosystem project developed by the SGLang team. It is a framewor
 
    basic_usage/data_preparation.md
    basic_usage/training.md
-   basic_usage/benchmarking.md
 
 .. toctree::
    :maxdepth: 1
@@ -39,3 +38,9 @@ SpecForge is an ecosystem project developed by the SGLang team. It is a framewor
 
    examples/llama3-eagle3-online.md
    examples/llama3-eagle3-offline.md
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Benchmarks
+
+   benchmarks/benchmark.md
