We Scanned a Popular AI SDK — Here's What Every Large Codebase Looks Like
Security debt is not a sign of bad engineering. It is a sign of a codebase that has been used, extended, and shipped by real teams under real constraints. Every large, active project accumulates it. The question is: what does it look like, and what actually matters?
A note on this article
We chose vercel/ai because it is well-maintained, widely respected, and actively developed — not despite those qualities, but because of them. This is not a criticism of the team. It is a demonstration that security debt is structural, not personal. The findings we describe are representative of what appears in any codebase of this scale and age. We hope this is useful.
Why We Picked This Repository
The vercel/ai SDK is the integration layer between AI models and JavaScript applications. It powers streaming, tool calling, and multi-model support for a significant fraction of AI-enabled web applications. With around 1.5 million weekly npm downloads, it is one of the most widely deployed JavaScript packages in the AI ecosystem.
That reach is exactly what makes it interesting for this kind of analysis. Security issues in infrastructure code do not stay contained — they propagate to every application that depends on it. And the team behind it is genuinely skilled; their code quality is high. If security debt accumulates here, it accumulates everywhere.
The Scan
We cloned the repository and ran codeslick scan across the full monorepo — 2,900 files, five languages (TypeScript, JavaScript, TSX, Svelte, Vue). Static analysis only; no runtime execution, no installed dependencies.
```
$ codeslick scan --path ./vercel-ai --all --json --quick
Scanning 2,900 files across 5 languages...

Files scanned:        2,900
Files with findings:  1,725

Critical:      17
High:         458
Medium:     3,949
Low:        6,026
Total:     10,460

Scan completed in 44.2s
```

Raw numbers rarely tell the right story. Before drawing any conclusions, we need to ask three questions: Where are the findings? What do they actually mean? And how much of this is noise?
Reading the Signal — Where the Findings Actually Are
The vercel/ai repository is a monorepo with two distinct zones: packages/ (the SDK code that ships to npm) and examples/ (demo applications and reference implementations). These have very different risk profiles.
| Zone | Critical | High | Why it matters |
|---|---|---|---|
| packages/ — ships to npm | 3 | 239 | This is the code that runs inside ~1.5M weekly dependents |
| examples/ — reference code | 14 | 219 | Not published to npm, but widely copied by developers |
The first thing worth noticing: production code has fewer criticals than example code (3 vs 14). That ratio is actually a positive signal. It means the team maintains a higher bar for production packages than for demo applications — which is exactly the right priority. The production criticals still need attention, but the ratio shows discipline.
The Noise Problem: 31% of Mediums from One File
Before the meaningful findings, the noise. The raw count shows 3,949 medium-severity issues. Of those, 1,212 — about 31% — come from a single file:
```javascript
// This is synthetic LangGraph test data.
// The 32-character hex strings in this file
// match the regex pattern for Heroku API keys.
// They are not real credentials.
//
// CodeSlick flagged all 1,212 occurrences.
// Every scanner would.
```

This is how secrets detection works at scale: pattern matching against entropy and format, without runtime context. The fixture data is synthetic, but it looks like keys. A scanner cannot know the difference without being explicitly told.
This is not a criticism of the scanner. It is an honest description of the tradeoff every secrets detection tool makes. The practical fix is a .codeslickignore entry scoping __fixtures__/ out of secrets detection. One line eliminates 31% of the noise. We mention it because suppression configuration is where scanner adoption usually stalls — and it should not.
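The practical shape of that entry, assuming .codeslickignore follows gitignore-style glob syntax (the exact format depends on the scanner's configuration documentation):

```
# .codeslickignore — keep synthetic fixture data out of secrets detection
**/__fixtures__/**
```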
What Actually Matters: Production Findings
After filtering noise, three findings in packages/ are worth taking seriously.
Prototype Pollution — Anthropic Provider
packages/anthropic/src/anthropic-messages-language-model.ts:1836
Spreading a JSON.parse() result directly with the object spread operator copies any attacker-supplied __proto__ key onto the new object. If the content of that JSON can be influenced by external input — even indirectly, through an API response that a man-in-the-middle or supply chain compromise has tampered with — and the resulting object later flows through Object.assign() or a deep merge, an attacker can tamper with the prototype chain and affect the behavior of objects across the runtime.
The remediation is already in the codebase: packages/provider-utils/src/secure-json-parse.ts exists specifically to handle this. Using it here closes the gap.
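The defense that secure-json-parse-style modules implement can be sketched with a JSON.parse reviver that drops dangerous keys during parsing. This is a minimal illustration of the technique, not the actual implementation in provider-utils:

```javascript
// Minimal sketch: strip prototype-polluting keys while parsing.
// (Real libraries such as secure-json-parse also handle the
// "constructor.prototype" nesting and offer remove/throw modes.)
function safeJsonParse(text) {
  return JSON.parse(text, (key, value) => {
    if (key === "__proto__" || key === "constructor" || key === "prototype") {
      return undefined; // returning undefined deletes the property
    }
    return value;
  });
}

const hostile = '{"model":"claude","__proto__":{"polluted":true}}';
const parsed = safeJsonParse(hostile);

console.log(parsed.model); // "claude"
// The dangerous key never lands on the parsed object:
console.log(Object.prototype.hasOwnProperty.call(parsed, "__proto__")); // false
```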
Command Injection — Codemod Tool
packages/codemod/src/lib/transform.ts:113
The codemod package is a developer tool — it runs migrations over user codebases. Any use of exec() with input that derives from user-controlled values (file paths, arguments, config) is a command injection vector. A crafted filename like ; rm -rf ~/ executes under the user's privileges. Switching to execFile() with a separate argument array eliminates the shell-interpolation risk entirely.
Cryptographically Weak ID Generation
packages/provider-utils/src/generate-id.ts:28
Math.random() generates predictable output. An attacker who observes a sequence of generated IDs can reconstruct the PRNG state and predict future values. If those IDs are used for session tokens, request deduplication, or any authorization-adjacent purpose, predictability becomes exploitability. crypto.randomUUID() or crypto.getRandomValues() are direct replacements with cryptographic guarantees.
The Most Interesting Finding: AI-Generated Code with Hallucinated Methods
Beyond the traditional security findings, CodeSlick's AI Code Detection system flagged something more unusual: 16 instances of AI-generated code containing hallucinated method calls across production packages. These are patterns where an LLM produced code that calls methods that do not exist in the relevant API.
The distribution is striking:
| Package | Confidence | Hallucinated call(s) |
|---|---|---|
| packages/elevenlabs | HIGH | .append() ×5 |
| packages/openai (transcription) | HIGH | .append() ×4 |
| packages/groq (transcription) | HIGH | .append() ×4 |
| packages/revai | HIGH | .append() ×2 |
| packages/provider-utils/convert-to-form-data | HIGH | .append() ×3 |
| packages/mcp/oauth-types | HIGH | .strip() ×3 |
| packages/codemod (v5 codemods) | HIGH | .size() ×4 |
| packages/codemod (v4/v6 codemods) | MEDIUM | .size(), .remove() |
The .append() hallucination pattern appears in eight different provider packages — elevenlabs, openai, groq, revai, deepgram, and others — all in transcription model implementations. This is not a coincidence. It is a generation fingerprint.
LLMs consistently hallucinate .append() because it is an extremely common method in other contexts (Python lists, DOM operations, string builders). When generating code that loops over items and builds up a data structure, models interpolate this method from training distribution — and in some JavaScript contexts, it does not exist on the target object. The transcription providers appear to share a common scaffold that was generated in one pass and adapted per-provider.
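A minimal illustration of the failure class (hypothetical — the actual flagged call sites build provider-specific structures):

```javascript
// In Python, list.append() is the canonical way to grow a list.
// In JavaScript, arrays use .push(); .append() does not exist on
// Array.prototype, so a hallucinated call throws at runtime —
// but only if that code path is ever reached.
const chunks = [];

let error;
try {
  chunks.append("segment"); // hallucinated method: TypeError
} catch (e) {
  error = e;
}

console.log(error instanceof TypeError); // true

chunks.push("segment"); // the correct JavaScript method
console.log(chunks);    // [ 'segment' ]
```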
Why this matters beyond the bug itself
Whether these calls throw at runtime depends on the execution path — if the hallucinated method is never reached, the bug is latent. But the pattern reveals something important: AI-generated scaffolding, copied and adapted across multiple packages, carries its errors with it. A single generation mistake becomes eight bugs. This is the replication risk of AI-assisted development that traditional code review does not catch — because all eight implementations look consistent with each other.
What This Tells Us About Security Debt at Scale
The vercel/ai results are a good proxy for what any large, actively developed JavaScript codebase looks like after 18-24 months of real development. A few patterns stand out:
Production code is better than example code — and that is expected
The team correctly prioritizes production packages. The higher critical count in examples reflects a reasonable tradeoff: demo code moves faster, with less scrutiny. The risk is that developers copy example code into production. Keeping examples clean matters more than it might seem.
The irony finding: the fix exists but is not used
The prototype pollution pattern in the Anthropic provider sits alongside secure-json-parse.ts — the module written to prevent exactly this class of bug. That gap is not unusual. A safety module only provides assurance at the call sites that actually use it; everywhere else, its existence gives false comfort, and the code paths that bypass it need the same review as any other.
230 unhandled promise rejections is an operational problem, not just a security one
Unhandled rejections in streaming code mean that failures are silently swallowed. Security implications aside, this makes incident response harder: when something goes wrong in a streaming session, the error may never surface. Error handling discipline and security discipline are the same discipline.
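The failure mode is simple to reproduce. Below is a hypothetical streaming helper, illustrating why a floating promise swallows its error:

```javascript
// Stand-in for a streaming operation that fails mid-flight.
async function streamChunks() {
  throw new Error("upstream closed mid-stream");
}

// Fire-and-forget: with no await and no .catch(), the rejection
// never reaches application logs — it surfaces only as a
// process-level "unhandledRejection" event (or, in modern Node,
// crashes the process):
//
//   streamChunks();

// Minimal discipline: every floating promise gets a handler.
streamChunks().catch((err) => {
  console.error("stream failed:", err.message);
});
```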
Noise suppression is where adoption fails
31% of medium findings from a single fixture file. Developers who see 3,949 medium findings and no way to filter the fixture noise will close the terminal and not come back. Scanner adoption depends on making the signal/noise ratio manageable — not just finding things, but helping teams focus on what matters.
Run It on Your Own Code
The findings in vercel/ai are not remarkable. They are representative. Any codebase with 2,900 files and 18 months of active development will have a comparable profile. The question is not whether the debt exists — it is whether you can see it.
The three things worth doing after reading this:
1. Separate your zones. Run your scanner against production code and test/example code separately. The findings mean different things in each context.
2. Configure noise suppression first. Before triaging findings, add ignore rules for fixture and generated directories. Reducing noise by 30% before you start makes everything downstream more useful.
3. Look at the AI detection results. If your codebase uses AI-assisted development, the hallucination patterns are worth reviewing specifically. They cluster by generation session — find one, find the rest.
See What Your Codebase Looks Like
Scan your repository in under a minute. No installation required for the WebTool — paste your code and get results immediately.
Scan data collected March 5, 2026, against the vercel/ai main branch (shallow clone). Static analysis only. All findings are the output of automated tooling and have not been individually verified through runtime testing. The intent of this article is educational — to illustrate what security scanning surfaces in a real-world codebase, not to characterize the security posture of the vercel/ai project.