vllm.entrypoints.openai.cli_args
This file contains the command-line arguments for vLLM's OpenAI-compatible server. It is kept in a separate file for documentation purposes.
FrontendArgs
¶
Arguments for the OpenAI-compatible frontend server.
Source code in vllm/entrypoints/openai/cli_args.py
allow_credentials
class-attribute
instance-attribute
¶
allow_credentials: bool = False
Allow credentials.
allowed_headers
class-attribute
instance-attribute
¶
Allowed headers.
allowed_methods
class-attribute
instance-attribute
¶
Allowed methods.
allowed_origins
class-attribute
instance-attribute
¶
Allowed origins.
api_key
class-attribute
instance-attribute
¶
If provided, the server will require this key to be presented in the header.
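A client then sends this key on every request. A minimal sketch of the headers a client would attach, assuming the server checks a standard Bearer token (the helper name is illustrative, not part of vLLM):

```python
def auth_headers(api_key: str) -> dict:
    """Illustrative helper (not part of vLLM): build the request headers
    a client would send to a server started with --api-key, assuming the
    server expects a standard Bearer token."""
    return {"Authorization": f"Bearer {api_key}"}

print(auth_headers("s3cret"))
```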
chat_template
class-attribute
instance-attribute
¶
The file path to the chat template, or the template in single-line form for the specified model.
chat_template_content_format
class-attribute
instance-attribute
¶
chat_template_content_format: ChatTemplateContentFormatOption = "auto"
The format to render message content within a chat template.
- "string" will render the content as a string. Example:
"Hello World"
- "openai" will render the content as a list of dictionaries, similar to OpenAI
schema. Example:
[{"type": "text", "text": "Hello world!"}]
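The difference between the two options can be illustrated in plain Python. The helper below is a hypothetical sketch, not vLLM's implementation:

```python
def render_content(text: str, fmt: str):
    """Illustrative helper (not vLLM's implementation): show the shape of
    message content under each chat_template_content_format option."""
    if fmt == "string":
        # "string": content is passed to the template as a plain string.
        return text
    if fmt == "openai":
        # "openai": content is a list of typed part dictionaries,
        # matching the OpenAI chat schema.
        return [{"type": "text", "text": text}]
    raise ValueError(f"unknown format: {fmt}")

print(render_content("Hello world!", "string"))
print(render_content("Hello world!", "openai"))
```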
disable_fastapi_docs
class-attribute
instance-attribute
¶
disable_fastapi_docs: bool = False
Disable FastAPI's OpenAPI schema, Swagger UI, and ReDoc endpoint.
disable_frontend_multiprocessing
class-attribute
instance-attribute
¶
disable_frontend_multiprocessing: bool = False
If specified, will run the OpenAI frontend server in the same process as the model serving engine.
disable_uvicorn_access_log
class-attribute
instance-attribute
¶
disable_uvicorn_access_log: bool = False
Disable uvicorn access log.
enable_auto_tool_choice
class-attribute
instance-attribute
¶
enable_auto_tool_choice: bool = False
Enable auto tool choice for supported models. Use --tool-call-parser
to specify which parser to use.
enable_force_include_usage
class-attribute
instance-attribute
¶
enable_force_include_usage: bool = False
If set to True, include usage information in every response.
enable_prompt_tokens_details
class-attribute
instance-attribute
¶
enable_prompt_tokens_details: bool = False
If set to True, enable prompt_tokens_details in usage.
enable_request_id_headers
class-attribute
instance-attribute
¶
enable_request_id_headers: bool = False
If specified, API server will add X-Request-Id header to responses. Caution: this hurts performance at high QPS.
enable_server_load_tracking
class-attribute
instance-attribute
¶
enable_server_load_tracking: bool = False
If set to True, enable tracking server_load_metrics in the app state.
enable_ssl_refresh
class-attribute
instance-attribute
¶
enable_ssl_refresh: bool = False
Refresh the SSL context when SSL certificate files change.
enable_tokenizer_info_endpoint
class-attribute
instance-attribute
¶
enable_tokenizer_info_endpoint: bool = False
Enable the /get_tokenizer_info endpoint. May expose chat templates and other tokenizer configuration.
log_config_file
class-attribute
instance-attribute
¶
log_config_file: Optional[str] = VLLM_LOGGING_CONFIG_PATH
Path to logging config JSON file for both vllm and uvicorn
lora_modules
class-attribute
instance-attribute
¶
lora_modules: Optional[list[LoRAModulePath]] = None
LoRA module configurations, in either 'name=path' format, JSON format, or JSON list format.
Example (old format): 'name=path'
Example (new format): {"name": "name", "path": "lora_path", "base_model_name": "id"}
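Both accepted formats can be sketched with a small parser. This is illustrative only; vLLM's actual parsing is handled by LoRAParserAction, documented below:

```python
import json

def parse_lora_module(spec: str) -> dict:
    """Illustrative sketch (not vLLM's implementation) of parsing one
    --lora-modules value in either accepted format."""
    spec = spec.strip()
    if spec.startswith("{"):
        # New format: a JSON object with name/path and an optional
        # base_model_name.
        return json.loads(spec)
    # Old format: 'name=path'.
    name, path = spec.split("=", 1)
    return {"name": name, "path": path}

print(parse_lora_module("sql-lora=/adapters/sql"))
print(parse_lora_module('{"name": "sql-lora", "path": "/adapters/sql"}'))
```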
max_log_len
class-attribute
instance-attribute
¶
Maximum number of prompt characters or prompt token IDs printed in the log. The default of None means unlimited.
middleware
class-attribute
instance-attribute
¶
Additional ASGI middleware to apply to the app. We accept multiple --middleware arguments. The value should be an import path. If a function is provided, vLLM will add it to the server using @app.middleware('http'). If a class is provided, vLLM will add it to the server using app.add_middleware().
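Resolving such a dotted import path can be sketched with the stdlib importlib module. The helper below is illustrative, not vLLM's code:

```python
import importlib

def resolve_import_path(path: str):
    """Illustrative sketch (not vLLM's implementation): resolve a dotted
    import path like 'package.module.attr' to the named object."""
    module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)

# Example with a stdlib object: resolves to the json.JSONDecoder class.
decoder_cls = resolve_import_path("json.JSONDecoder")
print(decoder_cls.__name__)
```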
prompt_adapters
class-attribute
instance-attribute
¶
prompt_adapters: Optional[list[PromptAdapterPath]] = None
Prompt adapter configurations in the format name=path. Multiple adapters can be specified.
response_role
class-attribute
instance-attribute
¶
response_role: str = 'assistant'
The role name to return if request.add_generation_prompt=true.
return_tokens_as_token_ids
class-attribute
instance-attribute
¶
return_tokens_as_token_ids: bool = False
When --max-logprobs is specified, represents single tokens as strings of the form 'token_id:{token_id}' so that tokens that are not JSON-encodable can be identified.
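The representation described above can be sketched as follows (a hypothetical helper, not vLLM's implementation):

```python
def tokens_as_token_ids(token_ids):
    """Illustrative sketch (not vLLM's implementation) of the
    'token_id:{token_id}' string form described above."""
    return [f"token_id:{tid}" for tid in token_ids]

print(tokens_as_token_ids([15339, 1917]))
# -> ['token_id:15339', 'token_id:1917']
```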
root_path
class-attribute
instance-attribute
¶
FastAPI root_path when the app is behind a path-based routing proxy.
ssl_ca_certs
class-attribute
instance-attribute
¶
The CA certificates file.
ssl_cert_reqs
class-attribute
instance-attribute
¶
Whether a client certificate is required (see the stdlib ssl module's CERT_* constants).
ssl_certfile
class-attribute
instance-attribute
¶
The file path to the SSL cert file.
ssl_keyfile
class-attribute
instance-attribute
¶
The file path to the SSL key file.
tool_call_parser
class-attribute
instance-attribute
¶
Select the tool call parser depending on the model that you're using. This is used to parse the model-generated tool call into OpenAI API format. Required for --enable-auto-tool-choice. You can choose any option from the built-in parsers or register a plugin via --tool-parser-plugin.
tool_parser_plugin
class-attribute
instance-attribute
¶
tool_parser_plugin: str = ''
Specify the tool parser plugin used to parse model-generated tool calls into OpenAI API format. Parsers registered by this plugin can be selected via --tool-call-parser.
uvicorn_log_level
class-attribute
instance-attribute
¶
uvicorn_log_level: Literal[
"debug", "info", "warning", "error", "critical", "trace"
] = "info"
Log level for uvicorn.
add_cli_args
staticmethod
¶
add_cli_args(
parser: FlexibleArgumentParser,
) -> FlexibleArgumentParser
LoRAParserAction
¶
Bases: Action
__call__
¶
__call__(
parser: ArgumentParser,
namespace: Namespace,
values: Optional[Union[str, Sequence[str]]],
option_string: Optional[str] = None,
)
PromptAdapterParserAction
¶
Bases: Action
__call__
¶
__call__(
parser: ArgumentParser,
namespace: Namespace,
values: Optional[Union[str, Sequence[str]]],
option_string: Optional[str] = None,
)
create_parser_for_docs
¶
create_parser_for_docs() -> FlexibleArgumentParser
log_non_default_args
¶
log_non_default_args(args: Namespace)
make_arg_parser
¶
make_arg_parser(
parser: FlexibleArgumentParser,
) -> FlexibleArgumentParser
Create the CLI argument parser used by the OpenAI API server.
We rely on the helper methods of FrontendArgs and AsyncEngineArgs to register all arguments instead of manually enumerating them here. This avoids code duplication and keeps the argument definitions in one place.
validate_parsed_serve_args
¶
validate_parsed_serve_args(args: Namespace)
Quick checks for model serve args that raise prior to loading.