## Endpoint

POST /v1/chat/completions
## Description
Creates a model response for the given chat conversation. This endpoint is compatible with the OpenAI Chat API and can be used as a drop-in replacement for applications that currently use OpenAI's services.
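Because the endpoint mirrors the OpenAI Chat API, existing OpenAI client libraries can usually be pointed at it by overriding the base URL. A minimal sketch using the official openai Python package — the host/port, API key, and model ID below are placeholder assumptions; substitute the values for your deployment:

```python
# Minimal sketch: point the official OpenAI Python client at this server.
# base_url, api_key, and the model ID are placeholders for your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11435/v1",  # this server instead of api.openai.com
    api_key="not-needed-locally",          # some deployments ignore the key
)

response = client.chat.completions.create(
    model="your-model-id",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
)
print(response.choices[0].message.content)
```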
## Request Parameters
Parameter | Type | Required | Description |
---|---|---|---|
model | string | Yes | ID of the model to use. Check with your administrator for available models. |
messages | array | Yes | A list of messages comprising the conversation so far. |
frequency_penalty | number | No | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
logprobs | boolean | No | Whether to return log probabilities of the output tokens or not. |
top_logprobs | integer | No | Number of most likely tokens to return at each token position, along with their log probabilities. Requires logprobs to be set to true. |
max_tokens | integer | No | The maximum number of tokens to generate in the chat completion. |
n | integer | No | How many chat completion choices to generate for each input message. |
presence_penalty | number | No | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
response_format | object | No | An object specifying the format that the model must output. Setting {"type": "json_object"} enables JSON mode, which constrains the model to emit valid JSON. |
seed | integer | No | If specified, the system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. |
stop | string or array | No | Up to 4 sequences where the API will stop generating further tokens. |
stream | boolean | No | If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. See the streaming documentation for details, and the client sketch after this table. |
temperature | number | No | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. |
top_p | number | No | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. |
tools | array | No | A list of tools the model may call. Currently, only functions are supported as tools. |
tool_choice | string or object | No | Controls which (if any) function is called by the model. Can be "none", "auto", or an object specifying a specific function to call. |
user | string | No | A unique identifier representing your end-user, which can help the API to monitor and detect abuse. |
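When stream is true, tokens arrive as data-only server-sent events. A sketch of consuming them with the client from the earlier example — the chunk shape is assumed to follow the OpenAI streaming format, where each delta carries the next fragment of the message:

```python
# Streaming sketch: the SDK parses each data-only SSE event into a chunk
# object whose delta holds the next content fragment.
stream = client.chat.completions.create(
    model="your-model-id",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
    stream=True,
)
for chunk in stream:
    # Guard against chunks with no choices or an empty delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```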
### Message Object
Each message in the messages array should have the following format:
Field | Type | Required | Description |
---|---|---|---|
role | string | Yes | The role of the message author. Must be one of "system", "user", "assistant", or "tool". |
content | string | Yes | The content of the message. |
name | string | No | The name to use for the message author. Only used when role is "tool". |
tool_call_id | string | No | The ID of the tool call that this message is responding to. Only used when role is "tool". |
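To illustrate the fields above, a sketch of a messages array for a conversation that includes a tool result — the function name and tool_call_id are placeholders; in practice the ID echoes the one from the assistant's tool call:

```python
# Sketch of a messages array covering the roles described above.
# The function name and call ID are illustrative placeholders.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Paris?"},
    # An assistant turn requesting a tool call would normally appear here;
    # the tool message below answers it, echoing that call's ID.
    {
        "role": "tool",
        "name": "get_weather",          # placeholder function name
        "tool_call_id": "call_abc123",  # placeholder ID from the assistant's tool call
        "content": "{\"temp_c\": 18, \"sky\": \"overcast\"}",
    },
    {"role": "assistant", "content": "It's about 18 °C and overcast in Paris."},
]
```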
### Function Object (for Tools)
When using functions as tools, each function should have the following format:
Field | Type | Required | Description |
---|---|---|---|
name | string | Yes | The name of the function to be called. |
description | string | No | A description of what the function does, used by the model to choose when and how to call the function. |
parameters | object | No | The parameters the function accepts, described as a JSON Schema object. |
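Putting these fields together, a sketch of a tools array and a matching tool_choice. The function name and schema are illustrative, and the {"type": "function", "function": {...}} wrapper is assumed to follow the OpenAI tool format this endpoint is compatible with:

```python
# Sketch of a tool definition: "type": "function" wraps the function
# object described above; parameters is a JSON Schema object.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # placeholder name
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# Force this specific function rather than letting the model decide ("auto").
tool_choice = {"type": "function", "function": {"name": "get_weather"}}
```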
## Response Format
The response will contain the following fields:
Field | Type | Description |
---|---|---|
id | string | A unique identifier for the completion. |
object | string | Always "chat.completion". |
created | integer | The Unix timestamp (in seconds) of when the completion was created. |
model | string | The model used for the completion. |
choices | array | A list of completion choices the model generated for the input prompt. |
usage | object | Usage statistics for the completion request. |
### Choice Object
Each choice in the choices array contains the following fields:
Field | Type | Description |
---|---|---|
index | integer | The index of the choice in the choices array. |
message | object | The message generated by the model. Contains role (usually "assistant") and content fields. |
finish_reason | string | The reason the model stopped generating further tokens. Can be "stop", "length", "content_filter", or "tool_calls". |
### Usage Object
The usage object contains the following fields:
Field | Type | Description |
---|---|---|
prompt_tokens | integer | Number of tokens in the prompt. |
completion_tokens | integer | Number of tokens in the generated completion. |
total_tokens | integer | Total number of tokens used in the request (prompt + completion). |
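A sketch of reading these fields from a completed (non-streaming) response, reusing the client from the earlier example:

```python
# Sketch: pull the generated text, stop reason, and token accounting
# out of a completed response object.
choice = response.choices[0]
print(choice.message.content)  # the generated text
print(choice.finish_reason)    # "stop", "length", "content_filter", or "tool_calls"

usage = response.usage
print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)
```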
## Example Request

```http
POST /v1/chat/completions HTTP/1.1
Host: localhost:11435
Content-Type: application/json

{
  "model": "your-model-id",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello, who are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 150
}
```
## Example Response

```json
{
  "id": "chatcmpl-123456789abcdef",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "your-model-id",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm an AI assistant designed to be helpful, harmless, and honest. I can answer questions, provide information, assist with tasks, and engage in conversation on a wide range of topics. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 42,
    "total_tokens": 65
  }
}
```
## Related Resources
- Streaming API - Learn how to use streaming responses
- Examples - Additional code examples