Flex processing | OpenAI API

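Flex processing lowers the cost of Responses or Chat Completions requests in exchange for slower response times and occasional resource unavailability. It is well suited to non-production or lower-priority tasks such as model evaluations, data enrichment, and asynchronous workloads.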

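Flex processing is currently in beta and is available for a limited set of models. Check which models are supported before enabling it.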

API usage

To use Flex processing, set the service_tier parameter to flex in your API request:

Flex processing example (Responses API)

JavaScript

import OpenAI from "openai";
const client = new OpenAI({
    timeout: 15 * 1000 * 60, // Increase default timeout to 15 minutes
});

const response = await client.responses.create({
    model: "gpt-5.5",
    instructions: "List and describe all the metaphors used in this book.",
    input: "<very long text of book here>",
    service_tier: "flex",
}, { timeout: 15 * 1000 * 60 });

console.log(response.output_text);

Python

from openai import OpenAI
client = OpenAI(
    # increase default timeout to 15 minutes (from 10 minutes)
    timeout=900.0
)

# you can override the max timeout per request as well
response = client.with_options(timeout=900.0).responses.create(
    model="gpt-5.5",
    instructions="List and describe all the metaphors used in this book.",
    input="<very long text of book here>",
    service_tier="flex",
)

print(response.output_text)

curl

curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "instructions": "List and describe all the metaphors used in this book.",
    "input": "<very long text of book here>",
    "service_tier": "flex"
  }'

Flex processing example (Chat Completions API)

JavaScript

import OpenAI from "openai";
const client = new OpenAI({
    timeout: 15 * 1000 * 60,
});

const response = await client.chat.completions.create({
    model: "gpt-5.5",
    messages: [
        { role: "developer", content: "List and describe all the metaphors used in this book." },
        { role: "user", content: "<very long text of book here>" },
    ],
    service_tier: "flex",
}, { timeout: 15 * 1000 * 60 });

console.log(response.choices[0].message.content);

Python

from openai import OpenAI
client = OpenAI(
    timeout=900.0
)

response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "developer", "content": "List and describe all the metaphors used in this book."},
        {"role": "user", "content": "<very long text of book here>"},
    ],
    service_tier="flex",
    timeout=900.0,
)

print(response.choices[0].message.content)

curl

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-5.5",
    "messages": [
      {"role": "developer", "content": "List and describe all the metaphors used in this book."},
      {"role": "user", "content": "<very long text of book here>"}
    ],
    "service_tier": "flex"
  }' --max-time 900

API request timeouts

Because Flex processing runs more slowly, requests are more likely to time out. Keep the following in mind when handling timeouts:

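Default timeout: Requests made with an official OpenAI SDK time out after 10 minutes by default. For lengthy prompts or complex tasks, you may need to increase this timeout.
Configuring the timeout: Each SDK provides a parameter for increasing the timeout. In the Python and JavaScript SDKs this is timeout, as shown in the code samples above.
Automatic retries: The OpenAI SDKs automatically retry requests that fail with a 408 Request Timeout error code twice before throwing an exception.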
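
The retry behavior in the last item can be pictured with a short sketch. This is not the SDK's implementation: RequestTimeout and call_with_timeout_retries are hypothetical names introduced only for illustration, and send stands for any function that performs one API request. The default of two retries mirrors the behavior described above.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class RequestTimeout(Exception):
    """Hypothetical stand-in for a 408 Request Timeout error."""


def call_with_timeout_retries(send: Callable[[], T], max_retries: int = 2) -> T:
    """Call send(); if it times out, retry up to max_retries more times."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RequestTimeout:
            # No retries left: let the timeout reach the caller.
            if attempt == max_retries:
                raise
    raise AssertionError("unreachable")
```

With the default of two retries, a request is attempted at most three times, so the worst-case wait is roughly three times the configured timeout. That is worth keeping in mind when raising the timeout to 15 minutes.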

Resource unavailable errors

Flex processing may sometimes lack the resources to handle your request, which results in a 429 Resource Unavailable error code. You are not charged when this happens.

Consider the following strategies for handling resource unavailable errors:

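Retry requests with exponential backoff: Exponential backoff suits workloads that can tolerate delays and aim to minimize cost, because your request can eventually complete once more capacity is available. For implementation details, see this cookbook.
Retry requests with standard processing: If occasional higher costs are worth it to ensure your requests complete, retry with standard processing when you receive a resource unavailable error. To do so, set service_tier to auto in the retried request, or remove the service_tier parameter to use the project's default mode.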
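
The two strategies can also be combined: retry on flex with exponential backoff a few times, then fall back to standard processing. The sketch below is one possible shape for that, under stated assumptions. ResourceUnavailable, backoff_delays, and run_flex_with_fallback are hypothetical names, send is assumed to be a function that takes a service_tier value and performs one request, and the base delay, cap, and jitter amount are arbitrary choices rather than recommended values.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class ResourceUnavailable(Exception):
    """Hypothetical stand-in for a 429 Resource Unavailable error."""


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Delay before each retry: base * 2**i seconds, never more than cap."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def run_flex_with_fallback(
    send: Callable[[str], T],
    flex_attempts: int = 3,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try the flex tier with backoff, then retry once with service_tier="auto"."""
    # Waits happen only between flex attempts, so there is one fewer delay.
    delays = backoff_delays(max(0, flex_attempts - 1), base, cap)
    for attempt in range(flex_attempts):
        try:
            return send("flex")
        except ResourceUnavailable:
            if attempt < len(delays):
                delay = delays[attempt]
                # Up to 10% random jitter keeps parallel workers from retrying in lockstep.
                sleep(delay + random.uniform(0, 0.1 * delay))
    # Flex capacity is still unavailable: fall back to standard processing.
    return send("auto")
```

Passing sleep as a parameter keeps the helper easy to test without real waiting. For a workload that should never pay standard rates, the final send("auto") call could be replaced by re-raising the error.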
