## Endpoint

POST /v1/chat/completions
## Description
Creates a model response for the given chat conversation. This endpoint is compatible with the OpenAI Chat API and can be used as a drop-in replacement for applications that currently use OpenAI's services.
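Because the endpoint mirrors the OpenAI Chat API, existing OpenAI client libraries can usually be pointed at it by overriding the base URL. A minimal sketch using the official openai Python package — the host/port, API key, and model ID below are placeholder assumptions; substitute the values for your deployment:

```python
# Minimal sketch: point the official OpenAI Python client at this server.
# base_url, api_key, and the model ID are placeholders for your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11435/v1",  # this server instead of api.openai.com
    api_key="not-needed-locally",          # some deployments ignore the key
)

response = client.chat.completions.create(
    model="your-model-id",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
)
print(response.choices[0].message.content)
```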
## Request Parameters
Parameter | Type | Required | Description |
---|---|---|---|
model | string | Yes | ID of the model to use. Check with your administrator for available models. |
messages | array | Yes | A list of messages comprising the conversation so far. |
frequency_penalty | number | No | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
logprobs | boolean | No | Whether to return log probabilities of the output tokens or not. |
top_logprobs | integer | No | Number of most likely tokens to return at each token position, along with their log probabilities. Requires logprobs to be set to true. |
max_tokens | integer | No | The maximum number of tokens to generate in the chat completion. |
n | integer | No | How many chat completion choices to generate for each input message. |
presence_penalty | number | No | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
response_format | object | No | An object specifying the format that the model must output. Setting {"type": "json_object"} enables JSON mode, which constrains the model to emit valid JSON. |
seed | integer | No | If specified, the system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. |
stop | string or array | No | Up to 4 sequences where the API will stop generating further tokens. |
stream | boolean | No | If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. See the streaming documentation for details, and the client sketch after this table. |
temperature | number | No | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |
top_p | number | No | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. |
tools | array | No | A list of tools the model may call. Currently, only functions are supported as tools. |
tool_choice | string or object | No | Controls which (if any) function is called by the model. Can be "none", "auto", or an object specifying a specific function to call. |
user | string | No | A unique identifier representing your end-user, which can help the API to monitor and detect abuse. |
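When stream is true, tokens arrive as data-only server-sent events. A sketch of consuming them with the client from the earlier example — the chunk shape is assumed to follow the OpenAI streaming format, where each delta carries the next fragment of the message:

```python
# Streaming sketch: the SDK parses each data-only SSE event into a chunk
# object whose delta holds the next content fragment.
stream = client.chat.completions.create(
    model="your-model-id",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
    stream=True,
)
for chunk in stream:
    # Guard against chunks with no choices or an empty delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```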
### Message Object
Each message in the messages array should have the following format:
Field | Type | Required | Description |
---|---|---|---|
role | string | Yes | The role of the message author. Must be one of "system", "user", "assistant", or "tool". |
content | string | Yes | The content of the message. |
name | string | No | The name to use for the message author. Only used when role is "tool". |
tool_call_id | string | No | The ID of the tool call that this message is responding to. Only used when role is "tool". |
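To illustrate the fields above, a sketch of a messages array for a conversation that includes a tool result — the function name and tool_call_id are placeholders; in practice the ID echoes the one from the assistant's tool call:

```python
# Sketch of a messages array covering the roles described above.
# The function name and call ID are illustrative placeholders.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Paris?"},
    # An assistant turn requesting a tool call would normally appear here;
    # the tool message below answers it, echoing that call's ID.
    {
        "role": "tool",
        "name": "get_weather",          # placeholder function name
        "tool_call_id": "call_abc123",  # placeholder ID from the assistant's tool call
        "content": "{\"temp_c\": 18, \"sky\": \"overcast\"}",
    },
    {"role": "assistant", "content": "It's about 18 °C and overcast in Paris."},
]
```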
### Function Object (for Tools)
When using functions as tools, each function should have the following format:
Field | Type | Required | Description |
---|---|---|---|
name | string | Yes | The name of the function to be called. |
description | string | No | A description of what the function does, used by the model to choose when and how to call the function. |
parameters | object | No | The parameters the function accepts, described as a JSON Schema object. |
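Putting these fields together, a sketch of a tools array and a matching tool_choice. The function name and schema are illustrative, and the {"type": "function", "function": {...}} wrapper is assumed to follow the OpenAI tool format this endpoint is compatible with:

```python
# Sketch of a tool definition: "type": "function" wraps the function
# object described above; parameters is a JSON Schema object.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # placeholder name
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# Force this specific function rather than letting the model decide ("auto").
tool_choice = {"type": "function", "function": {"name": "get_weather"}}
```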
## Response Format
The response will contain the following fields:
Field | Type | Description |
---|---|---|
id | string | A unique identifier for the completion. |
object | string | Always "chat.completion". |
created | integer | The Unix timestamp (in seconds) of when the completion was created. |
model | string | The model used for the completion. |
choices | array | A list of completion choices the model generated for the input prompt. |
usage | object | Usage statistics for the completion request. |
### Choice Object
Each choice in the choices array contains the following fields:
Field | Type | Description |
---|---|---|
index | integer | The index of the choice in the choices array. |
message | object | The message generated by the model. Contains role (usually "assistant") and content fields. |
finish_reason | string | The reason the model stopped generating further tokens. Can be "stop", "length", "content_filter", or "tool_calls". |
### Usage Object
The usage object contains the following fields:
Field | Type | Description |
---|---|---|
prompt_tokens | integer | Number of tokens in the prompt. |
completion_tokens | integer | Number of tokens in the generated completion. |
total_tokens | integer | Total number of tokens used in the request (prompt + completion). |
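A sketch of reading these fields from a completed (non-streaming) response, reusing the client from the earlier example:

```python
# Sketch: pull the generated text, stop reason, and token accounting
# out of a completed response object.
choice = response.choices[0]
print(choice.message.content)  # the generated text
print(choice.finish_reason)    # "stop", "length", "content_filter", or "tool_calls"

usage = response.usage
print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)
```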
## Example Request

```http
POST /v1/chat/completions HTTP/1.1
Host: localhost:11435
Content-Type: application/json

{
  "model": "your-model-id",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello, who are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 150
}
```
## Example Response

```json
{
  "id": "chatcmpl-123456789abcdef",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "your-model-id",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm an AI assistant designed to be helpful, harmless, and honest. I can answer questions, provide information, assist with tasks, and engage in conversation on a wide range of topics. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 42,
    "total_tokens": 65
  }
}
```
## Related Resources
- Streaming API - Learn how to use streaming responses
- Examples - Additional code examples