## Critical Vulnerability Information

### Vulnerability Overview

- **Vulnerability Type**: SSRF bypass in `RecursiveUrlLoader` via insufficient URL origin validation
- **CVE ID**: CVE-2026-26019
- **Severity**: Moderate (CVSS v3 score: 4.1/10)

### Affected Scope

- **Package**: `@langchain/community`
- **Affected Versions**: <= 1.1.13
- **Fixed Version**: 1.1.14

### Description

The `RecursiveUrlLoader` class in `@langchain/community` is a web crawler that recursively follows links starting from a base URL. The `preventOutside` option (enabled by default) is intended to restrict crawling to the same site as the base URL.

- **Issue**: The implementation compares URLs with `String.startsWith()` and performs no semantic URL validation. An attacker who controls the content of a crawled page can include links to domains that merely share a string prefix with the base URL (for example, `https://example.com.attacker.test` when the base URL is `https://example.com`), causing the crawler to request attacker-controlled or internal infrastructure.
- **Additional Issue**: No validation is performed against private or reserved IP addresses.

### Impact

An attacker who can influence the content of crawled pages can lead the crawler to:

- Retrieve cloud instance metadata (AWS, GCP, Azure)
- Access internal services on private networks
- Connect to localhost services
- Exfiltrate response data via attacker-controlled redirect chains

### Solution

- Replace the `startsWith` checks with a strict origin comparison using the URL API, so that scheme, hostname, and port are all validated.
- Add SSRF validation to all fetch operations, blocking access to cloud metadata endpoints, private IP ranges, and their IPv6 equivalents.

### Mitigation

Users who cannot upgrade should avoid running `RecursiveUrlLoader` on untrusted or user-influenced content, or should run crawlers in a network environment that has no access to cloud metadata endpoints or internal services.
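
### Illustrative Sketch

The prefix bypass and the origin-based check described above can be sketched as follows. This is a minimal illustration, not the actual `@langchain/community` code or its patch: the function names (`isInsidePrefix`, `isSameOrigin`, `isPrivateHost`, `isAllowed`) and the host `example.com.attacker.test` are hypothetical, and the private-address list is deliberately non-exhaustive.

```typescript
// Hypothetical helpers for illustration only; not the library's implementation.

// Vulnerable pattern: plain string-prefix comparison of URLs.
function isInsidePrefix(baseUrl: string, link: string): boolean {
  return link.startsWith(baseUrl);
}

// Stricter pattern: compare parsed origins (scheme + hostname + port).
function isSameOrigin(baseUrl: string, link: string): boolean {
  try {
    return new URL(link).origin === new URL(baseUrl).origin;
  } catch {
    return false; // unparseable URLs are rejected
  }
}

// Non-exhaustive check for literal loopback, private, and link-local hosts.
function isPrivateHost(hostname: string): boolean {
  // URL.hostname wraps IPv6 literals in brackets, so strip them first.
  const host = hostname.toLowerCase().replace(/^\[|\]$/g, "");
  if (host === "localhost" || host === "::1" || host === "::") return true;
  // IPv6 unique-local (fc00::/7) and link-local (fe80::/10) ranges.
  if (/^f[cd][0-9a-f]{2}:/.test(host) || /^fe[89ab][0-9a-f]:/.test(host)) {
    return true;
  }
  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) || // link-local, includes 169.254.169.254
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Combined gate: same origin as the base URL and not a private host.
function isAllowed(baseUrl: string, link: string): boolean {
  if (!isSameOrigin(baseUrl, link)) return false;
  return !isPrivateHost(new URL(link).hostname);
}

const base = "https://example.com";
const evil = "https://example.com.attacker.test/page";
console.log(isInsidePrefix(base, evil)); // prefix check accepts the link
console.log(isSameOrigin(base, evil)); // origin check rejects it
```

A literal-address check like `isPrivateHost` does not cover hostnames that resolve to private addresses via DNS, nor redirects to internal targets, so a production fix would likely also need to validate resolved addresses and each redirect hop.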