
We Audited 4 Major AI SDKs — 200 Critical Findings in LangChain, 17 in Vercel AI SDK

We ran static analysis across four of the most widely used AI SDK repositories — vercel/ai, LangChain.js, openai-node, and the official MCP Servers collection. What we found was not random noise. The same three failure modes appear in every codebase, suggesting something structural about how AI SDK code is written and reviewed.

A note on methodology and intent

All four repositories are maintained by skilled teams building infrastructure that millions of developers depend on. This audit is not a criticism of any team. It is a documentation of patterns that emerge at scale in any codebase — especially in a fast-moving ecosystem where velocity is often prioritized over hardening. The findings are the output of automated static analysis and have not been individually verified through runtime testing or exploit development. We publish this to help developers who build on top of these libraries understand the security posture of their dependency graph.

The Audit Scope

We selected four repositories that represent the current AI SDK ecosystem — a streaming integration layer, an agent orchestration framework, the official OpenAI client, and the reference implementation for the MCP protocol. Together they form the dependency stack that most AI applications in production sit on top of today.

Aggregate results — March 2026
Repository          Files    Critical   High     Medium    Low      Total
─────────────────────────────────────────────────────────────────────────
vercel/ai           2,900        17      468      3,949    6,026   10,460
langchain-ai/js     2,129       200      493      3,048    4,909    8,650
openai/openai-node    294         2       93        390      620    1,105
modelcontextprotocol   58         1       12        103       24      140
─────────────────────────────────────────────────────────────────────────
TOTAL               5,381       220    1,066      7,490   11,579   20,355

20,355 findings across 5,381 files. 220 critical-severity issues. Before interpreting that number, it is worth asking the right questions: where are the findings concentrated, what do they actually represent, and what is the cross-cutting pattern?

The Headline Finding: 200 Critical Vulnerabilities in LangChain

The number that stands out immediately: 200 critical findings in LangChain.js, compared to 17 in vercel/ai, 2 in openai-node, and 1 in the MCP servers. That is not a marginal difference. It requires explanation.

All 200 critical findings in LangChain fall into a single category: hardcoded credentials. API keys, authentication tokens, and connection strings embedded directly in source files — not in .env files, not in CI secrets, but in committed TypeScript and JavaScript files that ship to npm and are cloned by developers worldwide.

200 Critical — langchain-ai/langchainjs

Hardcoded Credentials — Integration Tests and Examples

The pattern is consistent across the 200 occurrences: API keys and authentication tokens committed into integration test files, example scripts, and notebook-style documentation. These include provider API keys for OpenAI, Anthropic, Google, and several vector database providers.

The keys are typically placeholders — strings that follow the correct format but are not live credentials. However, the pattern creates two risks: (1) developers copy examples as starting points and inadvertently retain the structure, replacing placeholder names with real values; (2) any live key committed even briefly creates a permanent record in git history. Once rotated, it still exists in any clone made before rotation.

This is a documentation and example code problem, not a production SDK problem. The LangChain SDK code itself — the parts that ship and run in user applications — does not have 200 hardcoded credentials. But the distinction matters less than it appears: the repository is the first thing developers clone when evaluating the library. Example patterns propagate into production code. This is how supply chain contamination starts — not through malicious injection, but through accidental imitation.

Repository Breakdown

vercel/ai

2,900 files — ~1.5M weekly npm downloads

17 Critical · 468 High · 3,949 Medium · 6,026 Low

Top finding categories

console-log (debug output in production paths): 3,062
missing-error-handling: 2,493
missing-null-checks: 1,355
hardcoded-secret-heroku-api-key (fixture data): 1,217

The 1,217 hardcoded-secret findings come from a single fixture file containing synthetic LangGraph test data — 32-character hex strings that match Heroku API key patterns but are not real credentials. After excluding that fixture file, the medium-severity count drops by 31%. The production criticals — prototype pollution in the Anthropic provider and command injection in the codemod tool — are the findings that warrant remediation.

langchain-ai/langchainjs

2,129 files — agent orchestration framework

200 Critical · 493 High · 3,048 Medium · 4,909 Low

Top finding categories

missing-error-handling: 2,849
missing-null-checks: 2,097
console-log: 1,123
any-type-usage (TypeScript type safety erosion): 937

LangChain.js is a large, fast-moving monorepo with integrations for hundreds of AI providers. The 200 critical findings are concentrated in integration test files and example notebooks — not in the core agent runtime. The more operationally significant finding is 2,849 instances of missing error handling in a framework that orchestrates multi-step AI operations. When an individual agent step fails silently, debugging multi-agent pipelines becomes significantly harder.

openai/openai-node

294 files — official OpenAI TypeScript client

2 Critical · 93 High · 390 Medium · 620 Low

Top finding categories

any-type-usage: 317
missing-null-checks: 214
missing-error-handling: 156

The openai-node client shows the second-lowest finding density in this audit: 3.76 findings per file versus 4.06 for LangChain. The 2 critical findings are hardcoded credentials in example files, consistent with the pattern seen across all four repositories. The 317 any-type-usage findings are notable given that TypeScript's primary value proposition is type safety — any suppresses that guarantee without a compiler warning.
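To make the `any`-erosion point concrete, here is a minimal sketch. The response shape is hypothetical, not openai-node's actual types: it just shows that `any` compiles silently and crashes at runtime, while `unknown` forces the narrowing that `any` skips.

```typescript
// With `any`, this compiles without complaint and throws a TypeError at
// runtime the first time `usage` is absent from a response.
function countTokensUnsafe(response: any): number {
  return response.usage.total_tokens;
}

// With `unknown`, the compiler refuses direct property access, so the
// narrowing has to be written out explicitly.
function countTokensSafe(response: unknown): number | null {
  const usage = (response as { usage?: { total_tokens?: unknown } } | null)?.usage;
  const total = usage?.total_tokens;
  return typeof total === "number" ? total : null;
}
```

Swapping `any` for `unknown` at API boundaries is a low-cost change that converts an entire class of runtime crashes into compile-time errors.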

modelcontextprotocol/servers

58 files — reference MCP server implementations

1 Critical · 12 High · 103 Medium · 24 Low

Top finding categories

missing-null-checks: 66
missing-error-handling: 19
missing-error-type-check: 11

The official MCP server collection is the smallest and cleanest codebase in this audit — 2.41 findings per file, the lowest density of the four. The critical finding is a hardcoded credential in an example configuration. What is more interesting here is the concentration of null check and error handling findings in a security-sensitive context: MCP servers handle tool calls from AI models that may receive untrusted or malformed arguments. Missing null checks in tool handlers are not cosmetic issues — they are the precondition for crashes that could be triggered by a prompt injection attack.

Three Patterns That Appear in Every Repository

Individual findings are less interesting than the patterns that persist across repositories. Three categories appear in the top findings of all four codebases, regardless of size, maturity, or maintainer team. That is not a coincidence.

Pattern 1: Missing Error Handling

vercel/ai: 2,493 · langchain: 2,849 · openai-node: 156 · mcp-servers: 19

AI SDK code is asynchronous by nature — API calls, streaming responses, tool executions. Unhandled promise rejections are endemic in async JavaScript, and the pattern compounds in AI applications: a failed intermediate step in a multi-agent chain can propagate silently, producing incorrect output rather than a surfaced error. This is both a reliability issue and a security issue — unhandled errors can expose stack traces, partial state, or unexpected fallback behavior.

Pattern 2: Missing Null Checks at API Boundaries

vercel/ai: 1,355 · langchain: 2,097 · openai-node: 214 · mcp-servers: 66

AI model responses are optional by nature — fields may be absent, null, or differently structured than the schema predicts, especially across model versions and providers. Code that destructures response objects without null guards throws TypeError: Cannot read properties of undefined in production when a model returns an unexpected shape. In the context of MCP servers, where tool handler inputs arrive from AI models that may receive adversarially crafted prompts, null checks are not defensive coding — they are the input validation layer.
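A minimal sketch of the defensive version. The shape below loosely mirrors chat-completion-style payloads but is illustrative, not any SDK's actual type definitions.

```typescript
// Every field is optional because any of them may be absent depending on
// provider, model version, or error condition.
interface ModelResponse {
  choices?: Array<{ message?: { content?: string | null } }>;
}

// Anti-pattern: `response.choices[0].message.content` throws
// "Cannot read properties of undefined" the first time a provider
// returns an empty or missing `choices` array.

// Optional chaining plus an explicit fallback turns every unexpected
// shape into a handled case instead of a production crash.
function extractText(response: ModelResponse): string | null {
  return response.choices?.[0]?.message?.content ?? null;
}
```

The caller then handles `null` deliberately — retry, fall back, or surface an error — rather than crashing on whatever shape the provider happened to return.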

Pattern 3: Hardcoded Credentials in Examples and Tests

vercel/ai: 17 crit · langchain: 200 crit · openai-node: 2 crit · mcp-servers: 1 crit

Every repository in this audit has critical findings for hardcoded credentials. In most cases these are placeholder strings in example files — strings that look like real API keys but are not live. The risk is not the placeholder itself; it is the pattern. Developers clone these repositories to understand how to use the SDK, then adapt the examples for their own applications. Credential-handling patterns from examples tend to persist in production code. The correct pattern — environment variable injection with a clear process.env.API_KEY — should be universal in examples, not optional.
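A sketch of what that universal pattern might look like in example code. `requireEnv` and the variable names are illustrative helpers, not any SDK's API; the point is that the example fails loudly without a configured environment instead of "working" with a committed secret.

```typescript
// Read a credential from the environment and fail loudly when it is
// missing, so copied example code can never ship with an inline key.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In an example file (SomeClient is a placeholder):
//   const client = new SomeClient({ apiKey: requireEnv("PROVIDER_API_KEY") });
// instead of:
//   const client = new SomeClient({ apiKey: "sk-abc123..." });  // never this
```

Even placeholder keys are worth avoiding: an example that throws "Missing required environment variable" teaches the right habit, while one with an inline string teaches the wrong one.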

What This Means if You Build on These Libraries

The security posture of your application is not just a function of your own code. It is a function of everything you import. These four libraries are likely somewhere in your dependency tree if you are building AI-enabled software. That has practical implications.

Inherited error handling gaps compound your own

If the SDK you call does not propagate errors cleanly, your error handling code may never receive a meaningful error object — just undefined, or a generic caught exception. Defense-in-depth means handling errors at your layer regardless of what the library does. Assume async calls can fail silently.

SDK response shapes change across model versions

AI model providers update their response schemas more often than SDK versions are pinned. Missing null checks in the SDK mean your application can break when a model update changes which fields are guaranteed to be present. Add your own defensive destructuring for any response field your application logic depends on.

Example code is not production-ready by default

Every repository in this audit has hardcoded credential patterns in example files. When you adapt example code, treat credential handling as the first thing to replace — not the last. Credential rotation after accidental commit is more expensive than doing it right the first time.

MCP tool handlers need explicit input validation

If you are running MCP servers, the inputs to your tool handlers come from AI models that may receive adversarially crafted prompts. Missing null checks and missing error type checks in the reference implementations suggest that input validation is not yet a cultural norm in MCP development. It needs to be.
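One way to make that norm concrete is a parse function at the top of every tool handler. The `SearchArgs` shape and its limits below are hypothetical, not the MCP SDK's API; the technique — validate type, presence, and range before acting — is what matters.

```typescript
interface SearchArgs {
  query: string;
  limit: number;
}

// Treat tool arguments like form input from an untrusted user: reject
// anything that is not exactly the expected shape before touching it.
function parseSearchArgs(input: unknown): SearchArgs {
  if (typeof input !== "object" || input === null) {
    throw new Error("tool arguments must be an object");
  }
  const { query, limit } = input as Record<string, unknown>;
  if (typeof query !== "string" || query.length === 0 || query.length > 1000) {
    throw new Error("query must be a non-empty string of at most 1000 characters");
  }
  if (typeof limit !== "number" || !Number.isInteger(limit) || limit < 1 || limit > 100) {
    throw new Error("limit must be an integer between 1 and 100");
  }
  return { query, limit };
}
```

In practice a schema library (zod is common in the TypeScript ecosystem) replaces the hand-written checks, but the handler contract is the same: no tool logic runs until the arguments have been validated.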

Reading Findings Density, Not Raw Count

Raw finding counts are misleading when repositories have very different sizes. Findings per file gives a better signal of underlying quality:

Repository                      Files    Total Findings    Findings / File
──────────────────────────────────────────────────────────────────────────
modelcontextprotocol/servers       58               140               2.41
openai/openai-node                294             1,105               3.76
vercel/ai                       2,900            10,460               3.61
langchain-ai/langchainjs        2,129             8,650               4.06

On a per-file basis, the four repositories are within the same order of magnitude. The MCP servers collection scores best (2.41), the official OpenAI client and vercel/ai are comparable (3.61–3.76), and LangChain.js is the densest (4.06). None of these numbers indicate a catastrophically insecure codebase — they are typical of actively developed TypeScript projects in the 2–3 year age range.

Three Actions Worth Taking

01 · Scan your own codebase first

The SDK findings described here are inherited risk. Your own codebase is the variable you control. The same three patterns — missing error handling, missing null checks, hardcoded credentials — appear in most application code built on top of these libraries.

02 · Treat AI response shapes as untrusted input

Model providers update response schemas. SDK versions lag behind. Defensive null checking on any field your application depends on costs 10 lines and prevents production crashes when upstream schemas change.

03 · Validate MCP tool handler inputs explicitly

If you are building MCP servers, every tool handler receives inputs from an AI model that may have been manipulated. Treat tool inputs the same way you treat user input from a web form: validate type, range, and presence before acting on them.

Scan Your AI Application Code

See what CodeSlick surfaces in your own codebase. The same checks that flagged these findings across 4 major AI SDKs run in the WebTool in under a minute.

Security Research · AI SDKs · Open Source Audit · SAST · LangChain · Vercel AI · MCP Security · Supply Chain

Scan data collected March 2026 against the main branches of vercel/ai, langchain-ai/langchainjs, openai/openai-node, and modelcontextprotocol/servers (shallow clones). Static analysis only — no runtime execution, no installed dependencies, no exploit verification. Finding counts reflect the output of automated tooling and include both confirmed vulnerabilities and patterns that require manual triage. The intent of this article is to document ecosystem-wide patterns, not to characterize the security posture of any individual project or team.