We Scanned a Popular AI SDK — Here's What Every Large Codebase Looks Like
Security debt is not a sign of bad engineering. It is a sign of a codebase that has been used, extended, and shipped by real teams under real constraints. Every large, active project accumulates it. The question is: what does it look like, and what actually matters?
A note on this article
We chose vercel/ai because it is well-maintained, widely respected, and actively developed — not despite those qualities, but because of them. This is not a criticism of the team. It is a demonstration that security debt is structural, not personal. The findings we describe are representative of what appears in any codebase of this scale and age. We hope this is useful.
Why We Picked This Repository
The vercel/ai SDK is the integration layer between AI models and JavaScript applications. It powers streaming, tool calling, and multi-model support for a significant fraction of AI-enabled web applications. With around 1.5 million weekly npm downloads, it is one of the most widely deployed JavaScript packages in the AI ecosystem.
That reach is exactly what makes it interesting for this kind of analysis. Security issues in infrastructure code do not stay contained — they propagate to every application that depends on it. And the team behind it is genuinely skilled; their code quality is high. If security debt accumulates here, it accumulates everywhere.
The Scan
We cloned the repository and ran codeslick scan across the full monorepo — 2,900 files, five languages (TypeScript, JavaScript, TSX, Svelte, Vue). Static analysis only; no runtime execution, no installed dependencies.
```
$ codeslick scan --path ./vercel-ai --all --json --quick
Scanning 2,900 files across 5 languages...

Files scanned:        2,900
Files with findings:  1,725

Critical:      17
High:         458
Medium:     3,949
Low:        6,026
Total:     10,460

Scan completed in 44.2s
```

Raw numbers rarely tell the right story. Before drawing any conclusions, we need to ask three questions: Where are the findings? What do they actually mean? And how much of this is noise?
Reading the Signal — Where the Findings Actually Are
The vercel/ai repository is a monorepo with two distinct zones: packages/ (the SDK code that ships to npm) and examples/ (demo applications and reference implementations). These have very different risk profiles.
| Zone | Critical | High | Why it matters |
|---|---|---|---|
| packages/ — ships to npm | 3 | 239 | This is the code that runs inside ~1.5M weekly dependents |
| examples/ — reference code | 14 | 219 | Not published to npm, but widely copied by developers |
The first thing worth noticing: production code has fewer criticals than example code (3 vs 14). That ratio is actually a positive signal. It means the team maintains a higher bar for production packages than for demo applications — which is exactly the right priority. The production criticals still need attention, but the ratio shows discipline.
The Noise Problem: 31% of Mediums from One File
Before the meaningful findings, the noise. The raw count shows 3,949 medium-severity issues. Of those, 1,212 — about 31% — come from a single file:
```javascript
// This is synthetic LangGraph test data.
// The 32-character hex strings in this file
// match the regex pattern for Heroku API keys.
// They are not real credentials.
//
// CodeSlick flagged all 1,212 occurrences.
// Every scanner would.
```

This is how secrets detection works at scale: pattern matching against entropy and format, without runtime context. The fixture data is synthetic, but it looks like keys. A scanner cannot know the difference without being explicitly told.
This is not a criticism of the scanner. It is an honest description of the tradeoff every secrets detection tool makes. The practical fix is a .codeslickignore entry scoping __fixtures__/ out of secrets detection. One line eliminates 31% of the noise. We mention it because suppression configuration is where scanner adoption usually stalls — and it should not.
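The practical shape of that entry, assuming .codeslickignore follows gitignore-style glob syntax (the exact format depends on the scanner's configuration documentation):

```
# .codeslickignore — keep synthetic fixture data out of secrets detection
**/__fixtures__/**
```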
What Actually Matters: Production Findings
After filtering noise, three findings in packages/ are worth taking seriously.
Prototype Pollution — Anthropic Provider
packages/anthropic/src/anthropic-messages-language-model.ts:1836
Spreading a JSON.parse() result directly with the object spread operator copies any attacker-supplied __proto__ key onto the new object. If the content of that JSON can be influenced by external input — even indirectly, through an API response that a man-in-the-middle or supply chain compromise has tampered with — and the resulting object later flows through Object.assign() or a deep merge, an attacker can tamper with the prototype chain and affect the behavior of objects across the runtime.
The remediation is already in the codebase: packages/provider-utils/src/secure-json-parse.ts exists specifically to handle this. Using it here closes the gap.
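The defense that secure-json-parse-style modules implement can be sketched with a JSON.parse reviver that drops dangerous keys during parsing. This is a minimal illustration of the technique, not the actual implementation in provider-utils:

```javascript
// Minimal sketch: strip prototype-polluting keys while parsing.
// (Real libraries such as secure-json-parse also handle the
// "constructor.prototype" nesting and offer remove/throw modes.)
function safeJsonParse(text) {
  return JSON.parse(text, (key, value) => {
    if (key === "__proto__" || key === "constructor" || key === "prototype") {
      return undefined; // returning undefined deletes the property
    }
    return value;
  });
}

const hostile = '{"model":"claude","__proto__":{"polluted":true}}';
const parsed = safeJsonParse(hostile);

console.log(parsed.model); // "claude"
// The dangerous key never lands on the parsed object:
console.log(Object.prototype.hasOwnProperty.call(parsed, "__proto__")); // false
```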
Command Injection — Codemod Tool
packages/codemod/src/lib/transform.ts:113
The codemod package is a developer tool — it runs migrations over user codebases. Any use of exec() with input that derives from user-controlled values (file paths, arguments, config) is a command injection vector. A crafted filename like ; rm -rf ~/ executes under the user's privileges. Switching to execFile() with a separate argument array eliminates the shell-interpolation risk entirely.
Cryptographically Weak ID Generation
packages/provider-utils/src/generate-id.ts:28
Math.random() generates predictable output. An attacker who observes a sequence of generated IDs can reconstruct the PRNG state and predict future values. If those IDs are used for session tokens, request deduplication, or any authorization-adjacent purpose, predictability becomes exploitability. crypto.randomUUID() or crypto.getRandomValues() are direct replacements with cryptographic guarantees.
The Most Interesting Finding: AI-Generated Code with Hallucinated Methods
Beyond the traditional security findings, CodeSlick's AI Code Detection system flagged something more unusual: 16 instances of AI-generated code containing hallucinated method calls across production packages. These are patterns where an LLM produced code that calls methods that do not exist in the relevant API.
The distribution is striking:
| Package | Confidence | Hallucinated call(s) |
|---|---|---|
| packages/elevenlabs | HIGH | .append() ×5 |
| packages/openai (transcription) | HIGH | .append() ×4 |
| packages/groq (transcription) | HIGH | .append() ×4 |
| packages/revai | HIGH | .append() ×2 |
| packages/provider-utils/convert-to-form-data | HIGH | .append() ×3 |
| packages/mcp/oauth-types | HIGH | .strip() ×3 |
| packages/codemod (v5 codemods) | HIGH | .size() ×4 |
| packages/codemod (v4/v6 codemods) | MEDIUM | .size(), .remove() |
The .append() hallucination pattern appears in eight different provider packages — elevenlabs, openai, groq, revai, deepgram, and others — all in transcription model implementations. This is not a coincidence. It is a generation fingerprint.
LLMs consistently hallucinate .append() because it is an extremely common method in other contexts (Python lists, DOM operations, string builders). When generating code that loops over items and builds up a data structure, models interpolate this method from training distribution — and in some JavaScript contexts, it does not exist on the target object. The transcription providers appear to share a common scaffold that was generated in one pass and adapted per-provider.
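A minimal illustration of the failure class (hypothetical — the actual flagged call sites build provider-specific structures):

```javascript
// In Python, list.append() is the canonical way to grow a list.
// In JavaScript, arrays use .push(); .append() does not exist on
// Array.prototype, so a hallucinated call throws at runtime —
// but only if that code path is ever reached.
const chunks = [];

let error;
try {
  chunks.append("segment"); // hallucinated method: TypeError
} catch (e) {
  error = e;
}

console.log(error instanceof TypeError); // true

chunks.push("segment"); // the correct JavaScript method
console.log(chunks);    // [ 'segment' ]
```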
Why this matters beyond the bug itself
Whether these calls throw at runtime depends on the execution path — if the hallucinated method is never reached, the bug is latent. But the pattern reveals something important: AI-generated scaffolding, copied and adapted across multiple packages, carries its errors with it. A single generation mistake becomes eight bugs. This is the replication risk of AI-assisted development that traditional code review does not catch — because all eight implementations look consistent with each other.
What This Tells Us About Security Debt at Scale
The vercel/ai results are a good proxy for what any large, actively developed JavaScript codebase looks like after 18-24 months of real development. A few patterns stand out:
Production code is better than example code — and that is expected
The team correctly prioritizes production packages. The higher critical count in examples reflects a reasonable tradeoff: demo code moves faster, with less scrutiny. The risk is that developers copy example code into production. Keeping examples clean matters more than it might seem.
The irony finding: the fix exists but is not used
The prototype pollution pattern in the Anthropic provider sits alongside secure-json-parse.ts — the module written to prevent exactly this class of bug. That gap is not unusual. A safety module only provides assurance at the call sites that actually use it; everywhere else, its existence gives false comfort, and the code paths that bypass it need the same review as any other.
230 unhandled promise rejections is an operational problem, not just a security one
Unhandled rejections in streaming code mean that failures are silently swallowed. Security implications aside, this makes incident response harder: when something goes wrong in a streaming session, the error may never surface. Error handling discipline and security discipline are the same discipline.
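The failure mode is simple to reproduce. Below is a hypothetical streaming helper, illustrating why a floating promise swallows its error:

```javascript
// Stand-in for a streaming operation that fails mid-flight.
async function streamChunks() {
  throw new Error("upstream closed mid-stream");
}

// Fire-and-forget: with no await and no .catch(), the rejection
// never reaches application logs — it surfaces only as a
// process-level "unhandledRejection" event (or, in modern Node,
// crashes the process):
//
//   streamChunks();

// Minimal discipline: every floating promise gets a handler.
streamChunks().catch((err) => {
  console.error("stream failed:", err.message);
});
```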
Noise suppression is where adoption fails
31% of medium findings from a single fixture file. Developers who see 3,949 medium findings and no way to filter the fixture noise will close the terminal and not come back. Scanner adoption depends on making the signal/noise ratio manageable — not just finding things, but helping teams focus on what matters.
Run It on Your Own Code
The findings in vercel/ai are not remarkable. They are representative. Any codebase with 2,900 files and 18 months of active development will have a comparable profile. The question is not whether the debt exists — it is whether you can see it.
The three things worth doing after reading this:
1. Separate your zones. Run your scanner against production code and test/example code separately. The findings mean different things in each context.
2. Configure noise suppression first. Before triaging findings, add ignore rules for fixture and generated directories. Reducing noise by 30% before you start makes everything downstream more useful.
3. Look at the AI detection results. If your codebase uses AI-assisted development, the hallucination patterns are worth reviewing specifically. They cluster by generation session — find one, find the rest.
See What Your Codebase Looks Like
Scan your repository in under a minute. No installation required for the WebTool — paste your code and get results immediately.
Scan data collected March 5, 2026, against the vercel/ai main branch (shallow clone). Static analysis only. All findings are the output of automated tooling and have not been individually verified through runtime testing. The intent of this article is educational — to illustrate what security scanning surfaces in a real-world codebase, not to characterize the security posture of the vercel/ai project.