vllm.v1.utils
APIServerProcessManager
Manages a group of API server processes.
Handles creation, monitoring, and termination of API server worker processes. Also monitors extra processes to check if they are healthy.
Source code in vllm/v1/utils.py
__init__
__init__(
    target_server_fn: Callable,
    listen_address: str,
    sock: Any,
    args: Namespace,
    num_servers: int,
    input_addresses: list[str],
    output_addresses: list[str],
    stats_update_address: Optional[str] = None,
)
Initialize and start API server worker processes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target_server_fn | Callable | Function to call for each API server process | required |
listen_address | str | Address to listen for client connections | required |
sock | Any | Socket for client connections | required |
args | Namespace | Command line arguments | required |
num_servers | int | Number of API server processes to start | required |
input_addresses | list[str] | Input addresses for each API server | required |
output_addresses | list[str] | Output addresses for each API server | required |
stats_update_address | Optional[str] | Optional stats update address | None |
ConstantList
copy_slice
Copy the first length elements of a tensor into another tensor in a non-blocking manner.
Used to copy pinned CPU tensor data to pre-allocated GPU tensors.
Returns the sliced target tensor.
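The behavior described above can be sketched in a few lines of PyTorch. This is a minimal illustration and not necessarily how vLLM implements it: the function name `copy_slice_sketch` and its parameter names are hypothetical, and the demo runs on CPU tensors, where `non_blocking=True` has no asynchronous effect.

```python
import torch


def copy_slice_sketch(src: torch.Tensor, dst: torch.Tensor, length: int) -> torch.Tensor:
    """Hypothetical stand-in: copy the first `length` elements of src into dst."""
    # Tensor.copy_ writes in place and returns the tensor it wrote into,
    # which here is the dst[:length] view. non_blocking=True only makes the
    # copy asynchronous when src is pinned CPU memory and dst lives on a GPU.
    return dst[:length].copy_(src[:length], non_blocking=True)


src = torch.arange(8, dtype=torch.float32)  # stands in for a pinned CPU tensor
dst = torch.zeros(8, dtype=torch.float32)   # stands in for a pre-allocated GPU tensor
out = copy_slice_sketch(src, dst, 3)
```

Because the returned tensor is a view of the target, the caller can pass it on directly without slicing the pre-allocated buffer again.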
get_engine_client_zmq_addr
Assign a new ZMQ socket address.
If local_only is True, participants are colocated and so a unique IPC address will be returned.
Otherwise, the provided host and port will be used to construct a TCP address (port == 0 means assign an available port).
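The two branches described above can be sketched with the standard library alone. The names `make_zmq_addr_sketch` and `get_open_port_sketch`, the parameter order, and the choice of a UUID-named path in the temp directory are all assumptions made for illustration; the actual vLLM function may differ on each of these points.

```python
import os
import socket
import tempfile
import uuid


def get_open_port_sketch(host: str = "127.0.0.1") -> int:
    """Hypothetical helper: ask the OS for a currently free TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 lets the OS pick an available port
        return s.getsockname()[1]


def make_zmq_addr_sketch(local_only: bool, host: str, port: int = 0) -> str:
    """Hypothetical stand-in for the address assignment described above."""
    if local_only:
        # Colocated participants: a unique filesystem path yields a unique
        # IPC endpoint (assumed layout: a UUID file name in the temp dir).
        path = os.path.join(tempfile.gettempdir(), str(uuid.uuid4()))
        return f"ipc://{path}"
    if port == 0:
        port = get_open_port_sketch(host)
    return f"tcp://{host}:{port}"


ipc_addr = make_zmq_addr_sketch(True, "127.0.0.1")
tcp_fixed = make_zmq_addr_sketch(False, "127.0.0.1", 5555)
tcp_auto = make_zmq_addr_sketch(False, "127.0.0.1")
```

Note that probing for a free port and binding to it later leaves a short window in which another process can take the port, so a caller of a scheme like this should be ready to retry.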
report_usage_stats
report_usage_stats(
    vllm_config,
    usage_context: UsageContext = ENGINE_CONTEXT,
) -> None
Report usage statistics if enabled.
shutdown
shutdown(procs: list[BaseProcess])
wait_for_completion_or_failure
wait_for_completion_or_failure(
    api_server_manager: APIServerProcessManager,
    engine_manager: Optional[
        Union[CoreEngineProcManager, CoreEngineActorManager]
    ] = None,
    coordinator: Optional[DPCoordinator] = None,
) -> None
Wait for all processes to complete or detect if any fail.
Raises an exception if any process exits with a non-zero status.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
api_server_manager | APIServerProcessManager | The manager for API servers. | required |
engine_manager | Optional[Union[CoreEngineProcManager, CoreEngineActorManager]] | The manager for engine processes. If CoreEngineProcManager, it manages local engines; if CoreEngineActorManager, it manages all engines. | None |
coordinator | Optional[DPCoordinator] | The coordinator for data parallel. | None |
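The wait-or-fail pattern described for this function can be illustrated with a small, self-contained sketch. It is not the vLLM implementation: it uses `subprocess.Popen` objects as stand-ins for the managed processes, polls instead of waiting on process sentinels, and the names `wait_or_fail_sketch` and `spawn_python` are hypothetical. Terminating the remaining processes when one fails is also an assumption of this sketch.

```python
import subprocess
import sys
import time


def spawn_python(snippet: str) -> subprocess.Popen:
    """Hypothetical helper: run a Python snippet in a child process."""
    return subprocess.Popen([sys.executable, "-c", snippet])


def wait_or_fail_sketch(
    procs: dict[str, subprocess.Popen], poll_interval: float = 0.05
) -> None:
    """Return once every process exits with status 0; raise on the first failure."""
    pending = dict(procs)
    try:
        while pending:
            for name, proc in list(pending.items()):
                code = proc.poll()  # None while the process is still running
                if code is None:
                    continue
                if code != 0:
                    raise RuntimeError(f"{name} exited with status {code}")
                del pending[name]  # clean exit: stop tracking this process
            if pending:
                time.sleep(poll_interval)
    finally:
        # Assumed cleanup: after a failure, stop whatever is still running.
        for proc in pending.values():
            if proc.poll() is None:
                proc.terminate()
        for proc in pending.values():
            proc.wait()
```

For example, `wait_or_fail_sketch({"api-0": spawn_python("pass")})` returns normally, while a child that calls `sys.exit(3)` causes a `RuntimeError` naming that process and its exit status.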