# Fast mode (beta: research preview)

Higher output speed for Claude Opus 4.6 and Claude Opus 4.7, with significantly faster token generation for latency-sensitive and agentic workflows.

---

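Fast mode provides significantly faster output token generation for Claude Opus 4.6 and Claude Opus 4.7. By setting `speed: "fast"` in your API request, you get up to 2.5x higher output tokens per second from the same model, at a higher price.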

<Note>
Fast mode is in beta (research preview). [Join the waitlist](https://claude.com/fast-mode) to request access. Availability is limited while Anthropic gathers feedback.
</Note>

<Note>
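This feature is eligible for [Zero Data Retention (ZDR)](/docs/en/build-with-claude/api-and-data-retention). When your organization has a ZDR agreement, data sent through this feature is not stored after the API response is returned.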
</Note>

## Supported models

Fast mode is supported on the following models:

- Claude Opus 4.7（`claude-opus-4-7`）
- Claude Opus 4.6（`claude-opus-4-6`）
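
Because only these two models accept `speed: "fast"`, a client that serves several models may want to gate the parameter before sending a request. The following is a minimal sketch, not part of any SDK: `FAST_MODE_MODELS` and `apply_fast_mode` are hypothetical names, the set has to be maintained by hand, and it only matches the exact model IDs listed above.

```python
# Hypothetical helper: model IDs copied from the supported models list above.
FAST_MODE_MODELS = frozenset({"claude-opus-4-7", "claude-opus-4-6"})


def apply_fast_mode(params: dict) -> dict:
    """Return a copy of params with speed="fast" only when the model supports it."""
    updated = dict(params)  # leave the caller's dict untouched
    if updated.get("model") in FAST_MODE_MODELS:
        updated["speed"] = "fast"
    else:
        # Drop any stale speed setting so unsupported models do not return an error.
        updated.pop("speed", None)
    return updated


supported = apply_fast_mode({"model": "claude-opus-4-7", "max_tokens": 1024})
unsupported = apply_fast_mode({"model": "some-other-model", "speed": "fast"})
print(supported)
print(unsupported)
```

The helper only shapes the request parameters; the fast mode beta header is still required on the request itself.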

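## How fast mode works

Fast mode runs the same model with a faster inference configuration. There is no change to intelligence or capabilities.

- Up to 2.5x higher output tokens per second compared to standard speed
- Speed benefits are concentrated in output tokens per second (OTPS), not time to first token (TTFT)
- Same model weights and behavior (not a different model)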
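
A rough latency model shows why the gain depends on response length: total time is approximately TTFT plus output tokens divided by OTPS. The sketch below uses made-up baseline numbers (a 1.0 s TTFT and 40 output tokens per second at standard speed). Only the 2.5x multiplier comes from this page, and it is an upper bound rather than a guarantee.

```python
def estimate_latency_seconds(output_tokens: int, ttft_s: float, otps: float) -> float:
    """Approximate end-to-end latency: time to first token plus generation time."""
    return ttft_s + output_tokens / otps


# Hypothetical baseline numbers, chosen only for illustration.
STANDARD_OTPS = 40.0
TTFT_S = 1.0
FAST_MULTIPLIER = 2.5  # "up to 2.5x" from this page; the real speedup may be lower

standard = estimate_latency_seconds(4000, TTFT_S, STANDARD_OTPS)
fast = estimate_latency_seconds(4000, TTFT_S, STANDARD_OTPS * FAST_MULTIPLIER)
print(f"standard: {standard:.1f}s, fast: {fast:.1f}s")  # standard: 101.0s, fast: 41.0s
```

Under these assumptions a 4,000-token response drops from about 101 s to about 41 s, while a 40-token response only moves from 2.0 s to 1.4 s because TTFT dominates.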

## Basic usage

<CodeGroup>
```bash cURL
curl https://api.anthropic.com/v1/messages \
    --header "x-api-key: $ANTHROPIC_API_KEY" \
    --header "anthropic-version: 2023-06-01" \
    --header "anthropic-beta: fast-mode-2026-02-01" \
    --header "content-type: application/json" \
    --data '{
        "model": "claude-opus-4-7",
        "max_tokens": 4096,
        "speed": "fast",
        "messages": [{
            "role": "user",
            "content": "Refactor this module to use dependency injection"
        }]
    }'
```

```bash CLI
ant beta:messages create \
  --beta fast-mode-2026-02-01 \
  --transform 'content.0.text' --raw-output <<'YAML'
model: claude-opus-4-7
max_tokens: 4096
speed: fast
messages:
  - role: user
    content: Refactor this module to use dependency injection
YAML
```

```python Python nocheck hidelines={1..2}
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    speed="fast",
    betas=["fast-mode-2026-02-01"],
    messages=[
        {"role": "user", "content": "Refactor this module to use dependency injection"}
    ],
)

print(response.content[0].text)
```

```typescript TypeScript hidelines={1..2}
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.beta.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 4096,
  speed: "fast",
  betas: ["fast-mode-2026-02-01"],
  messages: [
    {
      role: "user",
      content: "Refactor this module to use dependency injection"
    }
  ]
});

const textBlock = response.content.find(
  (block): block is Anthropic.Beta.Messages.BetaTextBlock => block.type === "text"
);
console.log(textBlock?.text);
```

```csharp C# hidelines={1..5}
using Anthropic;
using Anthropic.Models.Beta.Messages;

AnthropicClient client = new();

var response = await client.Beta.Messages.Create(new MessageCreateParams
{
    Model = "claude-opus-4-7",
    MaxTokens = 4096,
    Speed = Speed.Fast,
    Betas = ["fast-mode-2026-02-01"],
    Messages = [
        new() { Role = Role.User, Content = "Refactor this module to use dependency injection" }
    ],
});

Console.WriteLine(response);
```

```go Go hidelines={1..11,-1}
package main

import (
	"context"
	"fmt"
	"log"

	anthropic "github.com/anthropics/anthropic-sdk-go"
)

func main() {
	client := anthropic.NewClient()

	response, err := client.Beta.Messages.New(context.TODO(), anthropic.BetaMessageNewParams{
		Model:     anthropic.ModelClaudeOpus4_7,
		MaxTokens: 4096,
		Speed:     anthropic.BetaMessageNewParamsSpeedFast,
		Betas:     []anthropic.AnthropicBeta{anthropic.AnthropicBetaFastMode2026_02_01},
		Messages: []anthropic.BetaMessageParam{
			anthropic.NewBetaUserMessage(anthropic.NewBetaTextBlock("Refactor this module to use dependency injection")),
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(response.Content[0].AsText().Text)
}
```

```java Java hidelines={1..8,-1}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.beta.AnthropicBeta;
import com.anthropic.models.beta.messages.BetaMessage;
import com.anthropic.models.beta.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;

void main() {
    AnthropicClient client = AnthropicOkHttpClient.fromEnv();

    BetaMessage response = client.beta().messages().create(
            MessageCreateParams.builder()
                    .model(Model.CLAUDE_OPUS_4_7)
                    .maxTokens(4096L)
                    .speed(MessageCreateParams.Speed.FAST)
                    .addBeta(AnthropicBeta.FAST_MODE_2026_02_01)
                    .addUserMessage("Refactor this module to use dependency injection")
                    .build());

    IO.println(response.content().get(0).text().get().text());
}
```

```php PHP hidelines={1..4}
<?php

use Anthropic\Client;

$client = new Client();

$response = $client->beta->messages->create(
    model: 'claude-opus-4-7',
    maxTokens: 4096,
    speed: 'fast',
    betas: ['fast-mode-2026-02-01'],
    messages: [
        ['role' => 'user', 'content' => 'Refactor this module to use dependency injection'],
    ],
);

echo $response->content[0]->text;
```

```ruby Ruby hidelines={1..2}
require "anthropic"

client = Anthropic::Client.new

response = client.beta.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 4096,
  speed: "fast",
  betas: ["fast-mode-2026-02-01"],
  messages: [{role: "user", content: "Refactor this module to use dependency injection"}]
)

puts response.content[0].text
```

</CodeGroup>

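## Pricing

Fast mode is priced at 6x standard Opus rates across the full context window, including requests with more than 200K input tokens. The following table shows fast mode pricing for Claude Opus 4.6 and Claude Opus 4.7:

| Input | Output |
|:------|:-------|
| $30 / MTok | $150 / MTok |

Fast mode pricing stacks with other pricing modifiers:

- [Prompt caching multipliers](/docs/en/about-claude/pricing#prompt-caching) apply on top of fast mode pricing
- [Data residency](/docs/en/manage-claude/data-residency) multipliers apply on top of fast mode pricing

For complete pricing details, see the [pricing page](/docs/en/about-claude/pricing#fast-mode-pricing).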
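
As a worked example of the base rates in the table above, the sketch below estimates the cost of a single fast mode request. `fast_mode_cost_usd` is a hypothetical helper; it ignores prompt caching and data residency multipliers, which would change the result.

```python
from decimal import Decimal

# Fast mode rates from the pricing table, in USD per million tokens (MTok).
FAST_INPUT_PER_MTOK = Decimal("30")
FAST_OUTPUT_PER_MTOK = Decimal("150")
MTOK = Decimal(1_000_000)


def fast_mode_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Base fast mode cost, before any caching or data residency multipliers."""
    return (
        Decimal(input_tokens) * FAST_INPUT_PER_MTOK
        + Decimal(output_tokens) * FAST_OUTPUT_PER_MTOK
    ) / MTOK


# 200,000 input tokens cost $6.00 and 4,000 output tokens cost $0.60.
print(fast_mode_cost_usd(200_000, 4_000))  # 6.6
```

`Decimal` is used instead of floats so that per-request costs can be summed without rounding drift.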

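## Rate limits

Fast mode has a dedicated rate limit that is separate from standard Opus rate limits. When your fast mode rate limit is exceeded, the API returns a `429` error with a `retry-after` header indicating when capacity will be available.

The response includes headers that indicate your fast mode rate limit status:

| Header | Description |
|:-------|:------------|
| `anthropic-fast-input-tokens-limit` | Maximum fast mode input tokens per minute |
| `anthropic-fast-input-tokens-remaining` | Remaining fast mode input tokens |
| `anthropic-fast-input-tokens-reset` | Time when the fast mode input token limit resets |
| `anthropic-fast-output-tokens-limit` | Maximum fast mode output tokens per minute |
| `anthropic-fast-output-tokens-remaining` | Remaining fast mode output tokens |
| `anthropic-fast-output-tokens-reset` | Time when the fast mode output token limit resets |

For tier-specific rate limits, see the [rate limits page](/docs/en/api/rate-limits).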
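
The six headers share one naming pattern, so they can be collected into a small structure for logging or throttling decisions. The following is a minimal sketch that works on any mapping of header names to values. `fast_rate_limit_status` is a hypothetical helper, the sample values are invented, and how you obtain the headers depends on your HTTP client or SDK.

```python
from typing import Dict, Mapping

FAST_HEADER_PREFIX = "anthropic-fast-"


def fast_rate_limit_status(headers: Mapping[str, str]) -> Dict[str, dict]:
    """Group fast mode rate limit headers by direction ("input" or "output")."""
    status: Dict[str, dict] = {}
    for name, value in headers.items():
        key = name.lower()  # HTTP header names are case-insensitive
        if not key.startswith(FAST_HEADER_PREFIX):
            continue
        # "anthropic-fast-input-tokens-limit" -> ["input", "tokens", "limit"]
        parts = key[len(FAST_HEADER_PREFIX):].split("-")
        if len(parts) != 3 or parts[1] != "tokens":
            continue
        direction, _, field = parts
        # Limit and remaining are token counts; the reset time is kept as a string.
        parsed = int(value) if field in ("limit", "remaining") else value
        status.setdefault(direction, {})[field] = parsed
    return status


# Invented sample values, for illustration only.
sample_headers = {
    "anthropic-fast-input-tokens-limit": "100000",
    "anthropic-fast-input-tokens-remaining": "75000",
    "anthropic-fast-output-tokens-reset": "2026-02-01T00:01:00Z",
    "content-type": "application/json",
}
print(fast_rate_limit_status(sample_headers))
```

Headers that are absent from the response are simply omitted from the result, so callers should not assume every field is present.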

## Checking which speed was used

The response `usage` object includes a `speed` field that indicates which speed was used, either `"fast"` or `"standard"`:

<CodeGroup>
```bash cURL
curl https://api.anthropic.com/v1/messages \
    --header "x-api-key: $ANTHROPIC_API_KEY" \
    --header "anthropic-version: 2023-06-01" \
    --header "anthropic-beta: fast-mode-2026-02-01" \
    --header "content-type: application/json" \
    --data '{
        "model": "claude-opus-4-7",
        "max_tokens": 1024,
        "speed": "fast",
        "messages": [{"role": "user", "content": "Hello"}]
    }'
```

```bash CLI
ant beta:messages create --beta fast-mode-2026-02-01 \
  --transform usage.speed --raw-output <<'YAML'
model: claude-opus-4-7
max_tokens: 1024
speed: fast
messages:
  - role: user
    content: Hello
YAML
```

```python Python nocheck
response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    speed="fast",
    betas=["fast-mode-2026-02-01"],
    messages=[{"role": "user", "content": "Hello"}],
)

print(response.usage.speed)  # "fast" or "standard"
```

```typescript TypeScript
const response = await client.beta.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  speed: "fast",
  betas: ["fast-mode-2026-02-01"],
  messages: [{ role: "user", content: "Hello" }]
});

console.log(response.usage.speed); // "fast" or "standard"
```

```csharp C# hidelines={1..5}
using Anthropic;
using Anthropic.Models.Beta.Messages;

AnthropicClient client = new();

var response = await client.Beta.Messages.Create(new MessageCreateParams
{
    Model = "claude-opus-4-7",
    MaxTokens = 1024,
    Speed = Speed.Fast,
    Betas = ["fast-mode-2026-02-01"],
    Messages = [new() { Role = Role.User, Content = "Hello" }],
});

Console.WriteLine(response.Usage.Speed);  // "fast" or "standard"
```

```go Go hidelines={1..11,-1}
package main

import (
	"context"
	"fmt"
	"log"

	anthropic "github.com/anthropics/anthropic-sdk-go"
)

func main() {
	client := anthropic.NewClient()

	response, err := client.Beta.Messages.New(context.TODO(), anthropic.BetaMessageNewParams{
		Model:     anthropic.ModelClaudeOpus4_7,
		MaxTokens: 1024,
		Speed:     anthropic.BetaMessageNewParamsSpeedFast,
		Betas:     []anthropic.AnthropicBeta{anthropic.AnthropicBetaFastMode2026_02_01},
		Messages: []anthropic.BetaMessageParam{
			anthropic.NewBetaUserMessage(anthropic.NewBetaTextBlock("Hello")),
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(response.Usage.Speed) // "fast" or "standard"
}
```

```java Java hidelines={1..8,-1}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.beta.AnthropicBeta;
import com.anthropic.models.beta.messages.BetaMessage;
import com.anthropic.models.beta.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;

void main() {
    AnthropicClient client = AnthropicOkHttpClient.fromEnv();

    MessageCreateParams params = MessageCreateParams.builder()
            .model(Model.CLAUDE_OPUS_4_7)
            .maxTokens(1024L)
            .speed(MessageCreateParams.Speed.FAST)
            .addBeta(AnthropicBeta.FAST_MODE_2026_02_01)
            .addUserMessage("Hello")
            .build();

    BetaMessage response = client.beta().messages().create(params);
    IO.println(response.usage().speed());  // "fast" or "standard"
}
```

```php PHP hidelines={1..4}
<?php

use Anthropic\Client;

$client = new Client();

$response = $client->beta->messages->create(
    model: 'claude-opus-4-7',
    maxTokens: 1024,
    speed: 'fast',
    betas: ['fast-mode-2026-02-01'],
    messages: [['role' => 'user', 'content' => 'Hello']],
);

echo $response->usage->speed;  // "fast" or "standard"
```

```ruby Ruby nocheck
response = client.beta.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 1024,
  speed: "fast",
  betas: ["fast-mode-2026-02-01"],
  messages: [{ role: "user", content: "Hello" }]
)

puts(response.usage.speed)  # "fast" or "standard"
```
</CodeGroup>

```json Output hidelines={5..8}
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{ "type": "text", "text": "Hello!" }],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 8,
    "output_tokens": 12,
    "speed": "fast"
  }
}
```

To track fast mode usage and costs across your organization, see the [Usage and Cost API](/docs/en/manage-claude/usage-cost-api).

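## Retries and fallback

### Automatic retries

When fast mode rate limits are exceeded, the API returns a `429` error with a `retry-after` header. The Anthropic SDKs automatically retry these requests up to 2 times by default (configurable via `max_retries`), waiting for the server-specified delay before each retry. Because fast mode uses continuous token replenishment, the `retry-after` delay is typically short, and requests succeed once capacity is available.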
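
If you call the API without an SDK, the same behavior can be approximated by hand. The sketch below is illustrative and is not how the SDKs are implemented: `call_with_retry_after` is a hypothetical helper, `send` stands in for whatever function issues your HTTP request and returns `(status, headers, body)`, and the `retry-after` value is assumed to be a number of seconds, with a 1-second wait as an arbitrary fallback when the header is missing.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], object]


def call_with_retry_after(
    send: Callable[[], Response],
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); on a 429, wait for retry-after seconds and try again."""
    attempt = 0
    while True:
        status, headers, body = send()
        if status != 429 or attempt >= max_retries:
            return status, headers, body
        # Assumes retry-after is expressed in seconds, not as an HTTP date.
        sleep(float(headers.get("retry-after", "1")))
        attempt += 1


# Demo with a fake transport: one 429, then a success. No real waiting happens
# because the sleep function is replaced with a list recorder.
fake_responses = iter([(429, {"retry-after": "2"}, None), (200, {}, "ok")])
recorded_waits = []
final = call_with_retry_after(lambda: next(fake_responses), sleep=recorded_waits.append)
print(final[0], recorded_waits)  # 200 [2.0]
```

Injecting `sleep` keeps the helper testable; in production code the default `time.sleep` applies.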

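### Falling back to standard speed

If you prefer to fall back to standard speed rather than wait for fast mode capacity, catch the rate limit error and retry without `speed: "fast"`. Set `max_retries` to `0` on the initial fast request to skip automatic retries and fail immediately on a rate limit error.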

<Note>
Falling back from fast mode to standard speed results in a [prompt cache](/docs/en/build-with-claude/prompt-caching) miss. Requests at different speeds do not share cached prefixes.
</Note>

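Because setting `max_retries` to `0` also disables retries for other transient errors (overloaded, internal server errors), the examples below re-issue the original request with default retries in those cases.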

<CodeGroup>
```bash CLI
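# `ant` auto-retries 429/5xx and has no per-request max_retries override,
# so the fallback on a fast mode 429 runs after the built-in retries are exhausted.
# --transform-error surfaces error.type for branching.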
create_message_with_fast_fallback() {
  local speed="$1" max_attempts="${2:-3}" body out
  body=${3:-$(cat)}
  out=$(
    ant beta:messages create --beta fast-mode-2026-02-01 \
      ${speed:+--speed "$speed"} \
      --transform-error error.type --format-error yaml <<<"$body" 2>/dev/null
  ) && { printf '%s\n' "$out"; return; }
  case "$out" in
    rate_limit_error)
      if [[ -n "$speed" ]]; then
        create_message_with_fast_fallback "" "$max_attempts" "$body"
        return
      fi ;;
    overloaded_error | api_error | "")
      if (( max_attempts > 1 )); then
        create_message_with_fast_fallback "$speed" $((max_attempts - 1)) "$body"
        return
      fi ;;
  esac
  printf '%s\n' "${out:-connection_error}" >&2
  return 1
}

MESSAGE=$(
  create_message_with_fast_fallback fast <<'YAML'
model: claude-opus-4-7
max_tokens: 1024
messages:
  - role: user
    content: Hello
YAML
)
```

```python Python nocheck hidelines={1..2}
import anthropic

client = anthropic.Anthropic()


def create_message_with_fast_fallback(max_retries=None, max_attempts=3, **params):
    try:
        return client.beta.messages.create(**params, max_retries=max_retries)
    except anthropic.RateLimitError:
        if params.get("speed") == "fast":
            del params["speed"]
            return create_message_with_fast_fallback(**params)
        raise
    except (
        anthropic.APIStatusError,
        anthropic.APIConnectionError,
    ) as error:
        if isinstance(error, anthropic.APIStatusError) and error.status_code < 500:
            raise
        if max_attempts > 1:
            return create_message_with_fast_fallback(
                max_attempts=max_attempts - 1, **params
            )
        raise


message = create_message_with_fast_fallback(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    betas=["fast-mode-2026-02-01"],
    speed="fast",
    max_retries=0,
)
```

```typescript TypeScript hidelines={1..3,-1}
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
(async () => {
  async function createMessageWithFastFallback(
    params: Anthropic.Beta.MessageCreateParams,
    requestOptions?: Anthropic.RequestOptions,
    maxAttempts: number = 3
  ): Promise<Anthropic.Beta.Messages.BetaMessage> {
    try {
      return (await client.beta.messages.create(
        params,
        requestOptions
      )) as Anthropic.Beta.Messages.BetaMessage;
    } catch (e) {
      if (e instanceof Anthropic.RateLimitError && params.speed === "fast") {
        const { speed, ...rest } = params;
        return createMessageWithFastFallback(rest);
      }
      if (
        e instanceof Anthropic.InternalServerError ||
        e instanceof Anthropic.APIConnectionError
      ) {
        if (maxAttempts > 1) {
          return createMessageWithFastFallback(params, undefined, maxAttempts - 1);
        }
      }
      throw e;
    }
  }

  const message = await createMessageWithFastFallback(
    {
      model: "claude-opus-4-7",
      max_tokens: 1024,
      messages: [{ role: "user", content: "Hello" }],
      betas: ["fast-mode-2026-02-01"],
      speed: "fast"
    },
    { maxRetries: 0 }
  );
})();
```

```csharp C# hidelines={1..6}
using Anthropic;
using Anthropic.Exceptions;
using Anthropic.Models.Beta.Messages;

AnthropicClient client = new();

async Task<BetaMessage> CreateMessageWithFastFallback(
    MessageCreateParams parameters,
    int? maxRetries = null,
    int maxAttempts = 3)
{
    try
    {
        var requestClient = maxRetries is int retries
            ? client.WithOptions(options => options with { MaxRetries = retries })
            : client;
        return await requestClient.Beta.Messages.Create(parameters);
    }
    catch (AnthropicRateLimitException)
    {
        if (parameters.Speed is not null)
        {
            return await CreateMessageWithFastFallback(
                parameters with { Speed = null });
        }
        throw;
    }
    catch (Anthropic5xxException)
    {
        if (maxAttempts > 1)
        {
            return await CreateMessageWithFastFallback(
                parameters, maxAttempts: maxAttempts - 1);
        }
        throw;
    }
}

var message = await CreateMessageWithFastFallback(
    new MessageCreateParams
    {
        Model = "claude-opus-4-7",
        MaxTokens = 1024,
        Messages = [new() { Role = Role.User, Content = "Hello" }],
        Betas = ["fast-mode-2026-02-01"],
        Speed = Speed.Fast,
    },
    maxRetries: 0);
```

```go Go hidelines={1..11}
package main

import (
	"context"
	"errors"
	"fmt"

	anthropic "github.com/anthropics/anthropic-sdk-go"
	"github.com/anthropics/anthropic-sdk-go/option"
)

func createMessageWithFastFallback(
	ctx context.Context,
	client *anthropic.Client,
	params anthropic.BetaMessageNewParams,
	maxAttempts int,
	opts ...option.RequestOption,
) (*anthropic.BetaMessage, error) {
	message, err := client.Beta.Messages.New(ctx, params, opts...)
	if err != nil {
		var apierr *anthropic.Error
		if errors.As(err, &apierr) && apierr.StatusCode == 429 && params.Speed != "" {
			params.Speed = ""
			return createMessageWithFastFallback(ctx, client, params, maxAttempts)
		}
		if (errors.As(err, &apierr) && apierr.StatusCode >= 500) || !errors.As(err, &apierr) {
			if maxAttempts > 1 {
				return createMessageWithFastFallback(ctx, client, params, maxAttempts-1)
			}
		}
		return nil, err
	}
	return message, nil
}

func main() {
	client := anthropic.NewClient()
	message, err := createMessageWithFastFallback(
		context.TODO(),
		&client,
		anthropic.BetaMessageNewParams{
			Model:     anthropic.ModelClaudeOpus4_7,
			MaxTokens: 1024,
			Messages: []anthropic.BetaMessageParam{
				anthropic.NewBetaUserMessage(anthropic.NewBetaTextBlock("Hello")),
			},
			Speed: "fast",
			Betas: []anthropic.AnthropicBeta{anthropic.AnthropicBetaFastMode2026_02_01},
		},
		3,
		option.WithMaxRetries(0),
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(message)
}
```

```java Java hidelines={1..2,5..10}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.errors.InternalServerException;
import com.anthropic.errors.RateLimitException;
import com.anthropic.models.beta.AnthropicBeta;
import com.anthropic.models.beta.messages.BetaMessage;
import com.anthropic.models.beta.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;
import java.util.Optional;

// Disable SDK auto-retries so the fallback logic below handles them
AnthropicClient client =
        AnthropicOkHttpClient.builder().fromEnv().maxRetries(0).build();

BetaMessage createMessageWithFastFallback(
        MessageCreateParams params, int maxAttempts) {
    try {
        return client.beta().messages().create(params);
    } catch (RateLimitException e) {
        if (params.speed().isPresent()) {
            MessageCreateParams retryParams = params.toBuilder()
                    .speed(Optional.empty())
                    .build();
            return createMessageWithFastFallback(retryParams, maxAttempts);
        }
        throw e;
    } catch (InternalServerException e) {
        if (maxAttempts > 1) {
            return createMessageWithFastFallback(params, maxAttempts - 1);
        }
        throw e;
    }
}

void main() {
    BetaMessage message = createMessageWithFastFallback(
            MessageCreateParams.builder()
                    .model(Model.CLAUDE_OPUS_4_7)
                    .maxTokens(1024L)
                    .addUserMessage("Hello")
                    .addBeta(AnthropicBeta.FAST_MODE_2026_02_01)
                    .speed(MessageCreateParams.Speed.FAST)
                    .build(),
            3);
    IO.println(message.content().get(0).text().get().text());
}
```

```php PHP hidelines={1..3,8}
<?php

use Anthropic\Client;
use Anthropic\Core\Exceptions\APIConnectionException;
use Anthropic\Core\Exceptions\InternalServerException;
use Anthropic\Core\Exceptions\RateLimitException;
use Anthropic\RequestOptions;

$client = new Client();

function createMessageWithFastFallback(
    Client $client,
    array $params,
    ?RequestOptions $requestOptions = null,
    int $maxAttempts = 3,
) {
    try {
        return $client->beta->messages->create(
            ...$params,
            requestOptions: $requestOptions,
        );
    } catch (RateLimitException $e) {
        if (isset($params['speed'])) {
            unset($params['speed']);
            return createMessageWithFastFallback($client, $params);
        }
        throw $e;
    } catch (InternalServerException | APIConnectionException $e) {
        if ($maxAttempts > 1) {
            return createMessageWithFastFallback(
                $client, $params, maxAttempts: $maxAttempts - 1
            );
        }
        throw $e;
    }
}

$message = createMessageWithFastFallback(
    $client,
    [
        'model' => 'claude-opus-4-7',
        'maxTokens' => 1024,
        'messages' => [['role' => 'user', 'content' => 'Hello']],
        'betas' => ['fast-mode-2026-02-01'],
        'speed' => 'fast',
    ],
    RequestOptions::with(maxRetries: 0),
);
```

```ruby Ruby nocheck hidelines={1..2}
require "anthropic"

anthropic = Anthropic::Client.new

def create_message_with_fast_fallback(client, request_options: {}, max_attempts: 3, **params)
  client.beta.messages.create(**params, request_options: request_options)
rescue Anthropic::Errors::RateLimitError
  raise unless params[:speed] == "fast"
  params.delete(:speed)
  create_message_with_fast_fallback(client, **params)
rescue Anthropic::Errors::InternalServerError, Anthropic::Errors::APIConnectionError
  raise unless max_attempts > 1
  create_message_with_fast_fallback(client, max_attempts: max_attempts - 1, **params)
end

message = create_message_with_fast_fallback(
  anthropic,
  model: "claude-opus-4-7",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello" }],
  betas: ["fast-mode-2026-02-01"],
  speed: "fast",
  request_options: { max_retries: 0 }
)
```
</CodeGroup>

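## Considerations

- **Prompt caching:** Switching between fast and standard speed invalidates the prompt cache. Requests at different speeds do not share cached prefixes.
- **Supported models:** Fast mode is supported on Claude Opus 4.6 and Claude Opus 4.7. Sending `speed: "fast"` to an unsupported model returns an error.
- **TTFT:** Fast mode's benefits are concentrated in output tokens per second (OTPS), not time to first token (TTFT).
- **Batch API:** Fast mode is not available with the [Batch API](/docs/en/build-with-claude/batch-processing).
- **Priority Tier:** Fast mode is not available with [Priority Tier](/docs/en/api/service-tiers).
- **Claude Platform on AWS:** Fast mode is not currently available on [Claude Platform on AWS](/docs/en/build-with-claude/claude-platform-on-aws).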

## Next steps

<CardGroup>
  <Card title="Pricing" icon="dollar-sign" href="/docs/en/about-claude/pricing#fast-mode-pricing">
    View detailed fast mode pricing information.
  </Card>
  <Card title="Rate limits" icon="gauge" href="/docs/en/api/rate-limits">
    Check rate limit tiers for fast mode.
  </Card>
  <Card title="Effort parameter" icon="sliders" href="/docs/en/build-with-claude/effort">
    Control token usage with the effort parameter.
  </Card>
</CardGroup>
