We Audited 4 Major AI SDKs — 200 Critical Findings in LangChain, 17 in Vercel AI SDK
We ran static analysis across four of the most widely used AI SDK repositories — vercel/ai, LangChain.js, openai-node, and the official MCP Servers collection. What we found was not random noise. The same three failure modes appear in every codebase, suggesting something structural about how AI SDK code is written and reviewed.
A note on methodology and intent
All four repositories are maintained by skilled teams building infrastructure that millions of developers depend on. This audit is not a criticism of any team. It documents patterns that emerge at scale in any codebase — especially in a fast-moving ecosystem where velocity is often prioritized over hardening. The findings are the output of automated static analysis and have not been individually verified through runtime testing or exploit development. We publish this to help developers who build on top of these libraries understand the security posture of their dependency graph.
The Audit Scope
We selected four repositories that represent the current AI SDK ecosystem — a streaming integration layer, an agent orchestration framework, the official OpenAI client, and the reference implementation for the MCP protocol. Together they form the dependency stack that most AI applications in production sit on top of today.
| Repository | Files | Critical | High | Medium | Low | Total |
|---|---|---|---|---|---|---|
| vercel/ai | 2,900 | 17 | 468 | 3,949 | 6,026 | 10,460 |
| langchain-ai/langchainjs | 2,129 | 200 | 493 | 3,048 | 4,909 | 8,650 |
| openai/openai-node | 294 | 2 | 93 | 390 | 620 | 1,105 |
| modelcontextprotocol/servers | 58 | 1 | 12 | 103 | 24 | 140 |
| Total | 5,381 | 220 | 1,066 | 7,490 | 11,579 | 20,355 |

20,355 findings across 5,381 files. 220 critical-severity issues. Before interpreting that number, it is worth asking the right questions: where are the findings concentrated, what do they actually represent, and what is the cross-cutting pattern?
The Headline Finding: 200 Critical Vulnerabilities in LangChain
The number that stands out immediately: 200 critical findings in LangChain.js, compared to 17 in vercel/ai, 2 in openai-node, and 1 in the MCP servers. That is not a marginal difference. It requires explanation.
All 200 critical findings in LangChain fall into a single category: hardcoded credentials. API keys, authentication tokens, and connection strings embedded directly in source files — not in .env files, not in CI secrets, but in committed TypeScript and JavaScript files that ship to npm and are cloned by developers worldwide.
Hardcoded Credentials — Integration Tests and Examples
The pattern is consistent across the 200 occurrences: API keys and authentication tokens committed into integration test files, example scripts, and notebook-style documentation. These include provider API keys for OpenAI, Anthropic, Google, and several vector database providers.
The keys are typically placeholders — strings that follow the correct format but are not live credentials. However, the pattern creates two risks: (1) developers copy examples as starting points and inadvertently retain the structure, replacing placeholder names with real values; (2) any live key committed even briefly creates a permanent record in git history. Once rotated, it still exists in any clone made before rotation.
This is a documentation and example code problem, not a production SDK problem. The LangChain SDK code itself — the parts that ship and run in user applications — does not have 200 hardcoded credentials. But the distinction matters less than it appears: the repository is the first thing developers clone when evaluating the library. Example patterns propagate into production code. This is how supply chain contamination starts — not through malicious injection, but through accidental imitation.
Repository Breakdown
vercel/ai
2,900 files — ~1.5M weekly npm downloads
Severity breakdown: 17 critical · 468 high · 3,949 medium · 6,026 low
The 1,217 hardcoded-secret findings come from a single fixture file containing synthetic LangGraph test data — 32-character hex strings that match Heroku API key patterns but are not real credentials. After excluding that fixture file, the medium-severity count drops by 31%. The production criticals — prototype pollution in the Anthropic provider and command injection in the codemod tool — are the findings that warrant remediation.
langchain-ai/langchainjs
2,129 files — agent orchestration framework
Severity breakdown: 200 critical · 493 high · 3,048 medium · 4,909 low
LangChain.js is a large, fast-moving monorepo with integrations for hundreds of AI providers. The 200 critical findings are concentrated in integration test files and example notebooks — not in the core agent runtime. The more operationally significant finding is 2,849 instances of missing error handling in a framework that orchestrates multi-step AI operations. When an individual agent step fails silently, debugging multi-agent pipelines becomes significantly harder.
openai/openai-node
294 files — official OpenAI TypeScript client
Severity breakdown: 2 critical · 93 high · 390 medium · 620 low
The openai-node client is the smallest repository in this audit and shows one of the lowest finding densities — 3.76 findings per file versus 4.06 for LangChain. The 2 critical findings are hardcoded credentials in example files, consistent with the pattern seen across all four repositories. The 317 any-type-usage findings are notable given that TypeScript's primary value proposition is type safety — any suppresses that guarantee without a compiler warning.
modelcontextprotocol/servers
58 files — reference MCP server implementations
Severity breakdown: 1 critical · 12 high · 103 medium · 24 low
The official MCP server collection is the smallest and cleanest codebase in this audit — 2.41 findings per file, the lowest density of the four. The critical finding is a hardcoded credential in an example configuration. What is more interesting here is the concentration of null check and error handling findings in a security-sensitive context: MCP servers handle tool calls from AI models that may receive untrusted or malformed arguments. Missing null checks in tool handlers are not cosmetic issues — they are the precondition for crashes that could be triggered by a prompt injection attack.
Three Patterns That Appear in Every Repository
Individual findings are less interesting than the patterns that persist across repositories. Three categories appear in the top findings of all four codebases, regardless of size, maturity, or maintainer team. That is not a coincidence.
Pattern 1: Missing Error Handling
Findings per repository: vercel/ai 2,493 · langchain 2,849 · openai-node 156 · mcp-servers 19
AI SDK code is asynchronous by nature — API calls, streaming responses, tool executions. Unhandled promise rejections are endemic in async JavaScript, and the pattern compounds in AI applications: a failed intermediate step in a multi-agent chain can propagate silently, producing incorrect output rather than a surfaced error. This is both a reliability issue and a security issue — unhandled errors can expose stack traces, partial state, or unexpected fallback behavior.
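To make the failure mode concrete, here is a minimal sketch — the function names (`runStep`, `runStepSafe`) are invented for illustration and do not come from any of the audited SDKs. The anti-pattern fires off an async step without awaiting it, so a rejection never reaches the caller; the fix awaits every step and converts failures into an explicit result the caller must inspect.

```typescript
type StepResult =
  | { ok: true; value: string }
  | { ok: false; error: string };

// A stand-in for any async SDK operation (API call, tool execution).
async function runStep(input: string, shouldFail: boolean): Promise<string> {
  if (shouldFail) throw new Error(`step failed on input: ${input}`);
  return input.toUpperCase();
}

// Anti-pattern: fire-and-forget. The rejection escapes as an unhandled
// promise rejection and downstream steps proceed with missing state.
function runStepUnsafe(input: string, shouldFail: boolean): void {
  void runStep(input, shouldFail); // nobody observes this promise
}

// Fix: await the step and surface failure as data the caller must handle,
// so a failed intermediate step cannot pass silently through a chain.
async function runStepSafe(
  input: string,
  shouldFail: boolean
): Promise<StepResult> {
  try {
    return { ok: true, value: await runStep(input, shouldFail) };
  } catch (err) {
    return {
      ok: false,
      error: err instanceof Error ? err.message : String(err),
    };
  }
}
```

The result-object style is one option; rethrowing a wrapped error works equally well. The point is that every async step must be observed somewhere.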
Pattern 2: Missing Null Checks at API Boundaries
Findings per repository: vercel/ai 1,355 · langchain 2,097 · openai-node 214 · mcp-servers 66
AI model responses are optional by nature — fields may be absent, null, or differently structured than the schema predicts, especially across model versions and providers. Code that destructures response objects without null guards throws TypeError: Cannot read properties of undefined in production when a model returns an unexpected shape. In the context of MCP servers, where tool handler inputs arrive from AI models that may receive adversarially crafted prompts, null checks are not defensive coding — they are the input validation layer.
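A short sketch of the difference, using a response shape loosely modeled on chat-completion APIs — the field names here are illustrative, not tied to any specific SDK version:

```typescript
// Hypothetical response shape: every level may be absent or null.
interface ModelResponse {
  choices?: Array<{ message?: { content?: string | null } }>;
}

// Anti-pattern: non-null assertions crash with
// "TypeError: Cannot read properties of undefined" on an unexpected shape.
function extractUnsafe(res: ModelResponse): string {
  return res.choices![0].message!.content!;
}

// Fix: optional chaining plus an explicit fallback turns a schema
// mismatch into a handled condition instead of a production crash.
function extractSafe(res: ModelResponse): string | null {
  return res.choices?.[0]?.message?.content ?? null;
}
```

A `null` return still has to be handled by the caller, but it is handled in your code, on your terms, rather than as an uncaught TypeError.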
Pattern 3: Hardcoded Credentials in Examples and Tests
Findings per repository: vercel/ai 17 critical · langchain 200 critical · openai-node 2 critical · mcp-servers 1 critical
Every repository in this audit has critical findings for hardcoded credentials. In most cases these are placeholder strings in example files — strings that look like real API keys but are not live. The risk is not the placeholder itself; it is the pattern. Developers clone these repositories to understand how to use the SDK, then adapt the examples for their own applications. Credential-handling patterns from examples tend to persist in production code. The correct pattern — environment variable injection with a clear process.env.API_KEY — should be universal in examples, not optional.
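A minimal sketch of that pattern — the helper name `requireEnv` and the variable name are placeholders, not conventions from any of the audited SDKs. Failing fast at startup when the variable is missing beats shipping a string that merely looks like a live key:

```typescript
// Read a required secret from the environment, failing loudly if absent.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In an example file, this is the whole credential story:
// const apiKey = requireEnv("MY_PROVIDER_API_KEY"); // hypothetical name
```

Copied into production, this pattern carries no secret with it — only the requirement that one be injected.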
What This Means if You Build on These Libraries
The security posture of your application is not just a function of your own code. It is a function of everything you import. These four libraries are likely somewhere in your dependency tree if you are building AI-enabled software. That has practical implications.
Inherited error handling gaps compound your own
If the SDK you call does not propagate errors cleanly, your error handling code may never receive a meaningful error object — just undefined, or a generic caught exception. Defense-in-depth means handling errors at your layer regardless of what the library does. Assume async calls can fail silently.
SDK response shapes change across model versions
AI model providers update their response schemas more often than SDK versions are pinned. Missing null checks in the SDK mean your application can break when a model update changes which fields are guaranteed to be present. Add your own defensive destructuring for any response field your application logic depends on.
Example code is not production-ready by default
Every repository in this audit has hardcoded credential patterns in example files. When you adapt example code, treat credential handling as the first thing to replace — not the last. Credential rotation after accidental commit is more expensive than doing it right the first time.
MCP tool handlers need explicit input validation
If you are running MCP servers, the inputs to your tool handlers come from AI models that may receive adversarially crafted prompts. Missing null checks and missing error type checks in the reference implementations suggest that input validation is not yet a cultural norm in MCP development. It needs to be.
Reading Findings Density, Not Raw Count
Raw finding counts are misleading when repositories have very different sizes. Findings per file gives a better signal of underlying quality:
| Repository | Files | Total Findings | Findings / File |
|---|---|---|---|
| modelcontextprotocol/servers | 58 | 140 | 2.41 |
| vercel/ai | 2,900 | 10,460 | 3.61 |
| openai/openai-node | 294 | 1,105 | 3.76 |
| langchain-ai/langchainjs | 2,129 | 8,650 | 4.06 |
On a per-file basis, the four repositories are within the same order of magnitude. The MCP servers collection scores best (2.41), the reference OpenAI client and vercel/ai are comparable (3.61–3.76), and LangChain.js is the densest (4.06). None of these numbers indicate a catastrophically insecure codebase — they are typical of actively developed TypeScript projects in the 2-3 year age range.
Three Actions Worth Taking
01
Scan your own codebase first
The SDK findings described here are inherited risk. Your own codebase is the variable you control. The same three patterns — missing error handling, missing null checks, hardcoded credentials — appear in most application code built on top of these libraries.
02
Treat AI response shapes as untrusted input
Model providers update response schemas. SDK versions lag behind. Defensive null checking on any field your application depends on costs 10 lines and prevents production crashes when upstream schemas change.
03
Validate MCP tool handler inputs explicitly
If you are building MCP servers, every tool handler receives inputs from an AI model that may have been manipulated. Treat tool inputs the same way you treat user input from a web form: validate type, range, and presence before acting on them.
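A hedged sketch of what "validate type, range, and presence" looks like for one tool handler. The tool shape (`city`, `days`) is invented for illustration; the same structure applies to any handler, and a schema library can replace the manual checks:

```typescript
interface ForecastArgs {
  city: string;
  days: number;
}

// Validate untrusted tool-call arguments before acting on them.
// Presence, type, and range are each checked explicitly.
function validateForecastArgs(input: unknown): ForecastArgs {
  if (typeof input !== "object" || input === null) {
    throw new Error("tool arguments must be an object");
  }
  const { city, days } = input as Record<string, unknown>;
  if (typeof city !== "string" || city.length === 0) {
    throw new Error("'city' must be a non-empty string");
  }
  if (
    typeof days !== "number" ||
    !Number.isInteger(days) ||
    days < 1 ||
    days > 14
  ) {
    throw new Error("'days' must be an integer between 1 and 14");
  }
  return { city, days };
}
```

Rejecting malformed input with a clear error also gives the calling model something actionable to correct, instead of an opaque crash.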
Scan Your AI Application Code
See what CodeSlick surfaces in your own codebase. The same checks that flagged these findings across 4 major AI SDKs run in the WebTool in under a minute.
Scan data collected March 2026 against the main branches of vercel/ai, langchain-ai/langchainjs, openai/openai-node, and modelcontextprotocol/servers (shallow clones). Static analysis only — no runtime execution, no installed dependencies, no exploit verification. Finding counts reflect the output of automated tooling and include both confirmed vulnerabilities and patterns that require manual triage. The intent of this article is to document ecosystem-wide patterns, not to characterize the security posture of any individual project or team.