## Overview
The streaming API lets you receive chat completion responses in real time as they are generated, rather than waiting for the entire response. This is particularly useful for chat interfaces, where progressive responses create a better user experience.
## How to Use Streaming

To use streaming, set the `stream` parameter to `true` in your request to the chat completions endpoint:
```http
POST /v1/chat/completions HTTP/1.1
Host: localhost:11435
Content-Type: application/json

{
  "model": "your-model-id",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Tell me a story about a robot."
    }
  ],
  "temperature": 0.7,
  "stream": true
}
```
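A minimal Python client for the request above might look like the following sketch. It uses only the standard library; the host, port, and model ID are the placeholders from the example, so substitute the values for your deployment.

```python
# Sketch of a streaming client using only the Python standard library.
# The host, port, and model ID mirror the request example above and are
# placeholders; adjust them for your deployment.
import json
import urllib.request

def stream_chat(messages, model="your-model-id",
                base_url="http://localhost:11435"):
    """Yield content fragments from a streaming chat completion."""
    body = json.dumps({
        "model": model,
        "messages": messages,
        "stream": True,
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw_line in resp:
            line = raw_line.decode("utf-8").strip()
            if not line.startswith("data:"):
                continue  # skip blank lines between events
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                break  # end-of-stream sentinel
            chunk = json.loads(payload)
            delta = chunk["choices"][0]["delta"]
            if "content" in delta:
                yield delta["content"]
```

A caller can then print fragments as they arrive, e.g. `for fragment in stream_chat([{"role": "user", "content": "Hi"}]): print(fragment, end="", flush=True)`.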
## Stream Format

When streaming is enabled, the API returns Server-Sent Events (SSE). Each event contains a JSON object representing a token or a small piece of the full response.
Note: The response is streamed as a series of data chunks, each prefixed with `data: ` and terminated with two newlines (`\n\n`). The final chunk is `data: [DONE]`.
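The framing described in the note can be handled with a small incremental parser. This is a sketch, not a full SSE implementation: it assumes every event follows the `data: ...` plus blank-line shape described above, and it ignores other SSE fields such as `event:`, `id:`, and `retry:`.

```python
# A minimal incremental parser for the data:/blank-line framing described
# above. It buffers partial network reads and returns only complete
# events; other SSE fields (event:, id:, retry:) are ignored.
def parse_sse(buffer: str, new_data: str):
    """Append new_data to buffer; return (remaining_buffer, payloads)."""
    buffer += new_data
    payloads = []
    while "\n\n" in buffer:
        raw_event, buffer = buffer.split("\n\n", 1)
        for line in raw_event.splitlines():
            if line.startswith("data:"):
                payloads.append(line[len("data:"):].strip())
    return buffer, payloads

# An event split across two network reads is held until its terminator
# arrives in a later read.
buf, events = parse_sse("", 'data: {"x": 1}\n\ndata: [DO')
print(events)  # → ['{"x": 1}']
buf, events = parse_sse(buf, "NE]\n\n")
print(events)  # → ['[DONE]']
```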
## Streaming Response Structure

Each chunk in the stream will contain the following structure:

| Field | Type | Description |
|---|---|---|
| `id` | string | A unique identifier for the completion. Each chunk has the same ID. |
| `object` | string | Always `"chat.completion.chunk"`. |
| `created` | integer | The Unix timestamp (in seconds) of when the completion was created. Each chunk has the same timestamp. |
| `model` | string | The model used for the completion. |
| `choices` | array | An array containing incremental message content, with one choice for each requested completion. |
| `system_fingerprint` | string | A fingerprint representing the backend configuration that the model runs with. |
| `service_tier` | string | The service tier used to process the request (if applicable). |
| `usage` | object | Token usage information. Only included in the final chunk when `stream_options.include_usage` is `true`. |
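For example, a client can read the token counts from the final chunk. This sketch assumes OpenAI-style `usage` keys (`prompt_tokens`, `completion_tokens`, `total_tokens`); your backend may report different fields, and the counts shown are made up for illustration.

```python
import json

# Reading token counts from the final chunk (sent only when
# stream_options.include_usage is true). The usage keys below are the
# OpenAI-style names; your backend may differ. Counts are illustrative.
final_chunk = json.loads(
    '{"id": "chatcmpl-123456789abcdef", "object": "chat.completion.chunk",'
    ' "created": 1677858242, "model": "your-model-id", "choices": [],'
    ' "usage": {"prompt_tokens": 25, "completion_tokens": 120, "total_tokens": 145}}'
)
usage = final_chunk.get("usage")  # absent on all non-final chunks
if usage is not None:
    print(f"total tokens: {usage['total_tokens']}")  # → total tokens: 145
```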
## Streaming Choice Object

Each choice in the `choices` array contains:

| Field | Type | Description |
|---|---|---|
| `index` | integer | The index of the choice in the `choices` array. |
| `delta` | object | Contains the incremental content for the message. |
| `finish_reason` | string | The reason the completion finished. Only present in the final chunk of a choice once it has finished. Values include `"stop"`, `"length"`, etc. |
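As a sketch of how a client might react to `finish_reason` (assuming the conventional meanings: `"stop"` for a natural end, `"length"` for a completion cut off by the token limit):

```python
# Sketch of finish_reason handling. "stop" conventionally means the
# model ended naturally; "length" conventionally means the completion
# hit the token limit, so the text may be truncated.
def is_finished(choice) -> bool:
    reason = choice.get("finish_reason")
    if reason == "length":
        print("warning: response truncated by token limit")
    return reason is not None

print(is_finished({"index": 0, "delta": {"content": ""}, "finish_reason": "stop"}))   # → True
print(is_finished({"index": 0, "delta": {"content": "hi"}, "finish_reason": None}))   # → False
```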
## Delta Object

The `delta` object can contain:

| Field | Type | Description |
|---|---|---|
| `role` | string | The role of the message author. Only included in the first chunk. |
| `content` | string | The content fragment of the message. May be empty if the chunk carries no new content. |
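Putting the chunk, choice, and delta structures together, a client reconstructs the full message by taking the `role` from the first delta that carries one and concatenating the `content` fragments. A minimal sketch:

```python
# Reconstructing the full message from a stream of chunks: take the role
# from the first delta that carries one and concatenate the content
# fragments, per the delta structure described above.
def accumulate(chunks):
    message = {"role": None, "content": ""}
    for chunk in chunks:
        for choice in chunk["choices"]:
            delta = choice["delta"]
            if "role" in delta:
                message["role"] = delta["role"]
            message["content"] += delta.get("content") or ""
    return message

# These chunks mirror the example streaming response in this document.
chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Once"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": " upon"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": ""}, "finish_reason": "stop"}]},
]
print(accumulate(chunks))  # → {'role': 'assistant', 'content': 'Once upon'}
```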
## Example Streaming Response

Here's how a streaming response might look. Each `data:` line carries one complete JSON chunk and is followed by a blank line (the `\n\n` terminator):

```
data: {"id": "chatcmpl-123456789abcdef", "object": "chat.completion.chunk", "created": 1677858242, "model": "your-model-id", "choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": null}]}

data: {"id": "chatcmpl-123456789abcdef", "object": "chat.completion.chunk", "created": 1677858242, "model": "your-model-id", "choices": [{"index": 0, "delta": {"content": "Once"}, "finish_reason": null}]}

data: {"id": "chatcmpl-123456789abcdef", "object": "chat.completion.chunk", "created": 1677858242, "model": "your-model-id", "choices": [{"index": 0, "delta": {"content": " upon"}, "finish_reason": null}]}

...more chunks...

data: {"id": "chatcmpl-123456789abcdef", "object": "chat.completion.chunk", "created": 1677858242, "model": "your-model-id", "choices": [{"index": 0, "delta": {"content": ""}, "finish_reason": "stop"}]}

data: [DONE]
```
## Best Practices
- Always handle potential errors in your streaming code.
- Be prepared for network interruptions during streaming.
- Buffer the incoming tokens and consider updating your UI at reasonable intervals rather than for every token.
- When implementing a chat interface, make sure to display typing indicators while waiting for the stream to start.
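The buffering advice above can be sketched as a small helper that batches tokens before handing them to the UI. `render` here is a hypothetical callback standing in for your actual UI update; a real implementation might flush on a timer instead of a fixed batch size.

```python
# Sketch of buffered UI updates: collect incoming tokens and flush them
# to the UI every `batch_size` tokens instead of on every token.
# `render` is a hypothetical callback standing in for a UI update.
class TokenBuffer:
    def __init__(self, render, batch_size=8):
        self.render = render
        self.batch_size = batch_size
        self.pending = []

    def feed(self, token):
        self.pending.append(token)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.render("".join(self.pending))
            self.pending = []

updates = []
buf = TokenBuffer(updates.append, batch_size=3)
for tok in ["Once", " upon", " a", " time"]:
    buf.feed(tok)
buf.flush()  # flush the remainder when the stream ends
print(updates)  # → ['Once upon a', ' time']
```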
## Related Resources
- Chat Completion API - Documentation for the standard chat completion endpoint
- Examples - Additional code examples