Chat Completion API

Create model responses for chat conversations

Endpoint

POST /v1/chat/completions

Description

Creates a model response for the given chat conversation. This endpoint is compatible with the OpenAI Chat API and can be used as a drop-in replacement for applications that currently use OpenAI's services.
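
Because of this compatibility, applications built on the official OpenAI Python SDK can usually be pointed at this endpoint by overriding the base URL. A minimal sketch, assuming the server from the example below (localhost:11435) and that it accepts a placeholder API key:

from openai import OpenAI

# Point the official SDK at this server instead of api.openai.com.
# The base URL and placeholder API key are assumptions; adjust for
# your deployment and authentication setup.
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="your-model-id",  # replace with a model ID from your administrator
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, who are you?"},
    ],
)
print(response.choices[0].message.content)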

Request Parameters

model (string, required): ID of the model to use. Check with your administrator for available models.
messages (array, required): A list of messages comprising the conversation so far.
frequency_penalty (number, optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
logprobs (boolean, optional): Whether to return log probabilities of the output tokens.
top_logprobs (integer, optional): Number of most likely tokens to return at each token position, along with their log probabilities. Used together with logprobs set to true.
max_tokens (integer, optional): The maximum number of tokens to generate in the chat completion.
n (integer, optional): How many chat completion choices to generate for each input message.
presence_penalty (number, optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
response_format (object, optional): An object specifying the format that the model must output. Can be used to request JSON output from the model (e.g. {"type": "json_object"} in the OpenAI format).
seed (integer, optional): If specified, the system makes a best effort to sample deterministically, so that repeated requests with the same seed and parameters return the same result.
stop (string or array, optional): Up to 4 sequences where the API will stop generating further tokens.
stream (boolean, optional): If set, partial message deltas are sent as data-only server-sent events as tokens become available, with the stream terminated by a data: [DONE] message. See the streaming documentation for details, and the sketch after this parameter list.
temperature (number, optional): What sampling temperature to use, between 0 and 2. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic.
top_p (number, optional): An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens comprising the top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
tools (array, optional): A list of tools the model may call. Currently, only functions are supported as tools.
tool_choice (string or object, optional): Controls which (if any) function is called by the model. Can be "none", "auto", or an object specifying a specific function to call.
user (string, optional): A unique identifier representing your end user, which can help the API monitor and detect abuse.
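
With stream set, tokens arrive as data-only server-sent events; the OpenAI Python SDK exposes them as an iterator of chunks. A sketch, reusing the client from the first example:

# Each chunk carries a partial message delta; content may be None
# (e.g. for the initial role-only delta and the final chunk).
stream = client.chat.completions.create(
    model="your-model-id",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()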

Message Object

Each message in the messages array should have the following format:

role (string, required): The role of the message author. Must be one of "system", "user", "assistant", or "tool".
content (string, required): The content of the message.
name (string, optional): The name to use for the message author. Only used when role is "tool".
tool_call_id (string, optional): The ID of the tool call that this message is responding to. Only used when role is "tool".
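
Following the fields above, a conversation that hands a tool result back to the model might be assembled as in this sketch; the tool name, call ID, and weather payload are hypothetical placeholders:

# The assistant turn that requested the tool call is omitted for brevity;
# in a real exchange it would precede the "tool" message below.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "tool",
        "name": "get_weather",          # hypothetical tool name; per the fields above, only for role "tool"
        "tool_call_id": "call_abc123",  # hypothetical ID of the call being answered
        "content": '{"temp_c": 18, "condition": "cloudy"}',
    },
]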

Function Object (for Tools)

When using functions as tools, each function should have the following format:

name (string, required): The name of the function to be called.
description (string, optional): A description of what the function does, used by the model to choose when and how to call the function.
parameters (object, optional): The parameters the function accepts, described as a JSON Schema object.
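
Assuming the server follows the OpenAI tools format, each function object is wrapped in a tool entry of type "function". A sketch with a single hypothetical get_weather function, reusing the client from the first example; the function name and JSON Schema are illustrative, not part of this API:

# One tool entry wrapping the function object described above.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="your-model-id",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the function
)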

Response Format

The response will contain the following fields:

id (string): A unique identifier for the completion.
object (string): Always "chat.completion".
created (integer): The Unix timestamp (in seconds) of when the completion was created.
model (string): The model used for the completion.
choices (array): A list of completion choices the model generated for the input prompt.
usage (object): Usage statistics for the completion request.

Choice Object

Each choice in the choices array contains the following fields:

index (integer): The index of the choice in the choices array.
message (object): The message generated by the model. Contains role (usually "assistant") and content fields.
finish_reason (string): The reason the model stopped generating further tokens. Can be "stop", "length", "content_filter", or "tool_calls".

Usage Object

The usage object contains the following fields:

prompt_tokens (integer): Number of tokens in the prompt.
completion_tokens (integer): Number of tokens in the generated completion.
total_tokens (integer): Total number of tokens used in the request (prompt + completion).
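
With the SDK, these fields map directly onto attributes of the response object. A short sketch, reusing the client from the first example:

response = client.chat.completions.create(
    model="your-model-id",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
)
choice = response.choices[0]
print(choice.message.role)          # usually "assistant"
print(choice.message.content)       # the generated text
print(choice.finish_reason)         # "stop", "length", "content_filter", or "tool_calls"
print(response.usage.total_tokens)  # prompt_tokens + completion_tokens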

Example Request

POST /v1/chat/completions HTTP/1.1
Host: localhost:11435
Content-Type: application/json

{
  "model": "your-model-id",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello, who are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 150
}

Example Response

{
  "id": "chatcmpl-123456789abcdef",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "your-model-id",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm an AI assistant designed to be helpful, harmless, and honest. I can answer questions, provide information, assist with tasks, and engage in conversation on a wide range of topics. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 42,
    "total_tokens": 65
  }
}

Related Resources