Merged
3 changes: 0 additions & 3 deletions docs/basic_usage/benchmarking.md

This file was deleted.

67 changes: 67 additions & 0 deletions docs/benchmarks/benchmark.md
@@ -0,0 +1,67 @@
# Benchmarking for Speculative Decoding

## Overview

We provide a unified script to benchmark speculative decoding with the EAGLE3 algorithm on multiple datasets. Follow the steps below to run the benchmarks.

## Run Benchmarks

### Launch SGLang and Benchmarker Concurrently

`bench_eagle3.py` launches an SGLang server process and a benchmarking process concurrently, so you don't have to start the SGLang server manually: the script handles the server launch automatically for each speculative decoding configuration. Some important arguments are:
- `--model-path`: the path to the target model.
- `--speculative-draft-model-path`: the path to the draft model.
- `--port`: the port on which to launch the SGLang server.
- `--trust-remote-code`: trust the remote code.
- `--mem-fraction-static`: the memory fraction for the static memory.
- `--tp-size`: the tensor parallelism size.
- `--attention-backend`: the attention backend.
- `--config-list`: the list of speculative decoding configurations to test; each entry has the format `<batch-size>,<num-steps>,<topk>,<num-draft-tokens>`.
- `--benchmark-list`: the list of benchmarks to test; each entry has the format `<benchmark-name>:<num-prompts>:<subset>`.

```shell
python3 bench_eagle3.py \
--model-path meta-llama/Llama-3.1-8B-Instruct \
--speculative-draft-model-path lmsys/sglang-EAGLE3-LLaMA3.1-Instruct-8B \
--port 30000 \
--trust-remote-code \
--mem-fraction-static 0.8 \
--tp-size 1 \
--attention-backend fa3 \
--config-list 1,0,0,0 1,3,1,4 \
--benchmark-list mtbench gsm8k:5 ceval:5:accountant \
--dtype bfloat16
Comment on lines +23 to +33
Contributor

high

The path to the bench_eagle3.py script is missing. Assuming users run commands from the repository root, the script will not be found. The path should be updated to benchmarks/bench_eagle3.py for the command to execute correctly.

Suggested change
python3 benchmarks/bench_eagle3.py \
--model-path meta-llama/Llama-3.1-8B-Instruct \
--speculative-draft-model-path lmsys/sglang-EAGLE3-LLaMA3.1-Instruct-8B \
--port 30000 \
--trust-remote-code \
--mem-fraction-static 0.8 \
--tp-size 1 \
--attention-backend fa3 \
--config-list 1,0,0,0 1,3,1,4 \
--benchmark-list mtbench gsm8k:5 ceval:5:accountant \
--dtype bfloat16

```
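
The entry formats for `--config-list` and `--benchmark-list` described above can be illustrated with a small parser. This is only a sketch of the documented formats, not the parsing code used by `bench_eagle3.py`: the names `SpecConfig`, `BenchmarkSpec`, `parse_config`, and `parse_benchmark` are hypothetical, and treating the trailing `<num-prompts>` and `<subset>` fields as optional is an assumption based on the example entries `mtbench` and `gsm8k:5`.

```python
from typing import NamedTuple, Optional


class SpecConfig(NamedTuple):
    """One --config-list entry (hypothetical container, for illustration)."""
    batch_size: int
    num_steps: int
    topk: int
    num_draft_tokens: int


class BenchmarkSpec(NamedTuple):
    """One --benchmark-list entry (hypothetical container, for illustration)."""
    name: str
    num_prompts: Optional[int] = None  # None means the field was omitted
    subset: Optional[str] = None       # None means the field was omitted


def parse_config(entry: str) -> SpecConfig:
    """Parse '<batch-size>,<num-steps>,<topk>,<num-draft-tokens>'."""
    parts = entry.split(",")
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated integers, got {entry!r}")
    return SpecConfig(*(int(p) for p in parts))


def parse_benchmark(entry: str) -> BenchmarkSpec:
    """Parse '<benchmark-name>[:<num-prompts>[:<subset>]]'.

    Assumption: trailing fields may be left out, as in 'mtbench' or 'gsm8k:5'.
    """
    parts = entry.split(":")
    if not parts[0] or len(parts) > 3:
        raise ValueError(f"malformed benchmark entry: {entry!r}")
    num_prompts = int(parts[1]) if len(parts) > 1 else None
    subset = parts[2] if len(parts) > 2 else None
    return BenchmarkSpec(parts[0], num_prompts, subset)


if __name__ == "__main__":
    for entry in ["1,0,0,0", "1,3,1,4"]:
        print(parse_config(entry))
    for entry in ["mtbench", "gsm8k:5", "ceval:5:accountant"]:
        print(parse_benchmark(entry))
```

Keeping each entry as a named tuple makes it harder to mix up the positional fields when several configurations are passed at once.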

### Launch Benchmarker Independently

If you want to launch the SGLang server independently, you can use the following command.

```shell
# you can launch a server
python3 -m sglang.launch_server \
--model meta-llama/Llama-3.1-8B-Instruct \
--speculative-algorithm EAGLE3 \
--speculative-draft-model-path lmsys/sglang-EAGLE3-LLaMA3.1-Instruct-8B \
--speculative-num-steps 3 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4 \
--mem-fraction-static 0.75 \
--cuda-graph-max-bs 1 \
--tp 1 \
--trust-remote-code \
--host 0.0.0.0 \
--port 30000 \
--dtype bfloat16
```
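
The three speculative flags in the command above carry the same values as the last three fields of the `--config-list` entry `1,3,1,4`. The sketch below shows that correspondence. The function name `split_config` is hypothetical, and the batch size is returned separately because this document does not say which server flag, if any, it corresponds to.

```python
import shlex
from typing import List, Tuple


def split_config(entry: str) -> Tuple[int, List[str]]:
    """Split '<batch-size>,<num-steps>,<topk>,<num-draft-tokens>' into
    the batch size and the matching sglang.launch_server flags.

    Only flags that appear in the launch command above are emitted.
    """
    batch_size, num_steps, topk, num_draft_tokens = (
        int(field) for field in entry.split(",")
    )
    flags = [
        "--speculative-num-steps", str(num_steps),
        "--speculative-eagle-topk", str(topk),
        "--speculative-num-draft-tokens", str(num_draft_tokens),
    ]
    return batch_size, flags


if __name__ == "__main__":
    batch_size, flags = split_config("1,3,1,4")
    print("batch size:", batch_size)
    print(shlex.join(flags))
```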

Then start benchmarking. Use the same host and port as the running SGLang server, and pass `--skip-launch-server` so that the script does not launch a server of its own.

```bash
python bench_eagle3.py \
--model-path meta-llama/Llama-3.1-8B-Instruct \
--port 30000 \
--config-list 1,3,1,4 \
--benchmark-list mtbench:5 ceval:5:accountant gsm8k:5 humaneval:5 math500:5 mtbench:5 aime:1 \
--skip-launch-server
Comment on lines +61 to +66
Contributor

high

This command example has several issues that should be addressed:

  • The path to bench_eagle3.py is missing. It should be benchmarks/bench_eagle3.py.
  • For consistency with other examples in this file, python3 should be used instead of python.
  • The indentation of arguments is inconsistent with other code blocks. Using a standard 4-space indent improves readability.
  • The benchmark mtbench:5 is listed twice in --benchmark-list, which is redundant.
Suggested change
python3 benchmarks/bench_eagle3.py \
--model-path meta-llama/Llama-3.1-8B-Instruct \
--port 30000 \
--config-list 1,3,1,4 \
--benchmark-list mtbench:5 ceval:5:accountant gsm8k:5 humaneval:5 math500:5 aime:1 \
--skip-launch-server

```
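
When the server is launched independently, the benchmarker needs it to be reachable on the chosen port. A minimal sketch of a readiness check using only the Python standard library follows. The helper name `wait_for_port` and the timeout values are assumptions for illustration. An open TCP port only shows that something is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int,
                  timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Return True once a TCP connection to (host, port) succeeds,
    or False if none succeeds before timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means some process is listening.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


# Example (assumes the server from this guide was started on port 30000):
#   if not wait_for_port("127.0.0.1", 30000):
#       raise SystemExit("SGLang server is not reachable on port 30000")
```

The server in this guide binds to `0.0.0.0`, so a client on the same machine would typically connect through `127.0.0.1`.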
9 changes: 7 additions & 2 deletions docs/index.rst
@@ -1,5 +1,5 @@
SpecForge Documentation
-====================
+=======================

SpecForge is an ecosystem project developed by the SGLang team. It is a framework for training speculative decoding models so that you can smoothly port them over to the SGLang serving framework to speed up your inference.

@@ -25,7 +25,6 @@ SpecForge is an ecosystem project developed by the SGLang team. It is a framewor

basic_usage/data_preparation.md
basic_usage/training.md
-basic_usage/benchmarking.md

.. toctree::
:maxdepth: 1
@@ -39,3 +38,9 @@ SpecForge is an ecosystem project developed by the SGLang team. It is a framewor

examples/llama3-eagle3-online.md
examples/llama3-eagle3-offline.md

.. toctree::
:maxdepth: 1
:caption: Benchmarks

benchmarks/benchmark.md