What Are AI Code Hallucinations
AI code hallucinations are methods, functions, or APIs suggested by large language models (LLMs) that do not exist in the target programming language or framework. For example, a developer using ChatGPT or GitHub Copilot may receive a suggestion such as text.strip() in JavaScript. strip() is a Python method; the JavaScript equivalent is .trim().
These hallucinations occur because LLMs are trained on massive codebases across multiple languages. The model learns patterns from Python, Java, Go, and JavaScript simultaneously, causing cross-language confusion. When generating JavaScript code, the model may retrieve patterns from its Python training data, producing syntactically valid but semantically incorrect code.
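Hallucinations are not syntax errors. In plain JavaScript they parse cleanly and pass most linters because the method call structure is valid; only a type checker that knows the receiver's type, such as the TypeScript compiler, can flag the unknown method before the code runs. Otherwise the code fails at runtime when the JavaScript engine attempts to invoke .strip() on a string that has no such method, throwing TypeError: text.strip is not a function.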
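The failure mode can be reproduced in a few lines. This is a minimal sketch for Node.js; the function name cleanInput is hypothetical and exists only to show that the bad call raises nothing until it is executed.

```javascript
// Hypothetical helper containing a hallucinated Python-style call.
function cleanInput(text) {
  return text.strip(); // parses fine, but strings have no strip() method
}

// Defining the function raised no error; the failure appears only on invocation.
let failure = null;
try {
  cleanInput("  hello  ");
} catch (err) {
  failure = err;
}

console.log(failure instanceof TypeError); // true
console.log("  hello  ".trim());           // hello
```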
Why AI Hallucinations Are CRITICAL Severity (CVSS 8.5)
Runtime Errors Lead to Information Disclosure
When AI-generated code with hallucinations reaches production, runtime errors can expose sensitive information through stack traces, error messages, and changes in application behavior. This information disclosure is classified here as CRITICAL severity with a CVSS score of 8.5 (on the CVSS v3.x qualitative scale, 8.5 falls in the High band, since Critical starts at 9.0) because it gives attackers reconnaissance data for subsequent attacks.
Example: Production Stack Trace Exposure
// AI-generated code with hallucination
function processUserInput(data) {
const cleaned = data.strip(); // Python method in JavaScript
return cleaned.toUpperCase();
}
// Production error exposed to user:
TypeError: data.strip is not a function
at processUserInput (app.js:42:24)
at handleRequest (server.js:156:18)
at IncomingMessage.emit (events.js:400:28)
Environment: production
Node version: v18.12.0
Database: postgresql://prod-db.internal:5432/users
The stack trace reveals file structure, technology stack, database location, and function names—enabling attackers to map the attack surface and identify version-specific vulnerabilities.
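One common mitigation is to keep the full error in server-side logs and return only a generic message to the client. The sketch below illustrates that idea under assumptions: the function name toClientError, the response shape, and the correlation ID format are all hypothetical and not tied to any framework.

```javascript
// Hypothetical helper: build a client-safe payload and keep details server-side.
function toClientError(err, log = console.error) {
  // Non-cryptographic ID, used only to match a client report to a log entry.
  const errorId = Date.now().toString(36) + Math.random().toString(36).slice(2, 8);

  // The stack trace and message go to the server log, never to the response.
  log(`[${errorId}]`, err && err.stack ? err.stack : String(err));

  return {
    status: 500,
    body: { error: 'Internal server error', errorId },
  };
}
```

A request handler would call toClientError(err) inside its catch block and send only the returned status and body, so file paths, versions, and connection strings stay out of the response.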
Business Impact
Organizations using AI coding assistants extensively generate thousands of lines of AI code daily. Without automated detection, hallucinations accumulate. Internal audits at major tech companies found 200+ AI hallucinations in production code, including cross-language method confusion.
Types of AI Hallucinations (119 Patterns Across 5 Languages)
1. Cross-Language Method Confusion
LLMs trained on multiple languages confuse similar operations across language boundaries.
Python Methods in JavaScript
const text = " hello ";
const trimmed = text.strip(); // Python → JavaScript is .trim()
const upper = text.upper(); // Python → JavaScript is .toUpperCase()
const items = [1, 2, 3];
items.append(4); // Python → JavaScript is .push()
JavaScript Methods in Python
text = "hello"
upper = text.toUpperCase() # JavaScript → Python is .upper()
items = [1, 2, 3]
items.push(4) # JavaScript → Python is .append()
Java Methods in JavaScript
const items = [1, 2, 3];
items.add(4); // Java → JavaScript is .push()
const hasItem = items.contains(3); // Java → JavaScript is .includes()
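The mappings above lend themselves to a simple lookup table. The sketch below is a naive, regex-based illustration of the idea, not CodeSlick's implementation; the names KNOWN_CONFUSIONS and findCrossLanguageCalls are hypothetical. Because it matches method names without knowing the receiver's type, it also flags legitimate calls such as Set.prototype.add, so its results are candidates for review rather than confirmed defects.

```javascript
// Method names from the examples above, mapped to their JavaScript equivalents.
const KNOWN_CONFUSIONS = new Map([
  ['strip', '.trim()'],
  ['append', '.push()'],
  ['add', '.push()'],
  ['contains', '.includes()'],
]);

// Scan source text line by line and report calls whose names are in the table.
function findCrossLanguageCalls(source) {
  const findings = [];
  const callPattern = /\.([A-Za-z_$][\w$]*)\s*\(/g;
  source.split('\n').forEach((line, index) => {
    for (const match of line.matchAll(callPattern)) {
      const method = match[1];
      if (KNOWN_CONFUSIONS.has(method)) {
        findings.push({
          line: index + 1,
          method,
          suggestion: KNOWN_CONFUSIONS.get(method),
        });
      }
    }
  });
  return findings;
}
```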
2. Framework-Specific Hallucinations
// React deprecated lifecycle methods
class UserProfile extends React.Component {
componentWillMount() { // Deprecated since React 16.3; renamed UNSAFE_componentWillMount
this.fetchData();
}
}
3. Case and Naming Convention Errors
const result = text.replace_all("old", "new"); // snake_case → .replaceAll()
const upper = text.toUppercase(); // Missing 'C' → .toUpperCase()
LLM Fingerprints: Detecting Code from ChatGPT vs Copilot vs Claude vs Cursor
Different AI models leave distinctive fingerprints in the code they generate. These patterns are as identifiable as handwriting—each AI has characteristic phrasing, commenting styles, error handling approaches, and structural choices that differ from human code and from each other. CodeSlick analyzes 32 LLM-specific fingerprints to attribute code to specific AI tools.
GPT-4 / ChatGPT Fingerprints (10 Patterns)
GPT-4 code is characterized by verbose explanations, educational phrasing, and a tendency to over-document obvious functionality.
Pattern 1: "Here's a comprehensive solution" Phrasing
// GPT-4 Generated Code
/**
* Comprehensive user authentication handler
*
* Here's a comprehensive solution that handles user authentication
* with robust error handling and edge case coverage. This function
* provides a complete implementation that you can use in production.
*
* @param {string} username - The user's username
* @param {string} password - The user's password
* @returns {Promise} - Returns the authenticated user object
*/
async function authenticateUser(username, password) {
// Implementation...
}
Pattern 2: Excessive Use of "Comprehensive", "Robust", "Solution"
// GPT-4 keyword frequency (per 100 lines of comments):
"comprehensive": 3-7 occurrences
"robust": 2-5 occurrences
"solution": 4-8 occurrences
"implementation": 5-10 occurrences
// Human code frequency:
"comprehensive": 0-1 occurrences
"robust": 0 occurrences
"solution": 0-1 occurrences
Pattern 3: Educational Step-by-Step Comments
// GPT-4 Generated
function processPayment(amount, card) {
// Step 1: Validate the credit card number
if (!isValidCard(card)) throw new Error('Invalid card');
// Step 2: Check if the amount is positive
if (amount <= 0) throw new Error('Invalid amount');
// Step 3: Process the transaction through the payment gateway
const result = paymentGateway.charge(amount, card);
// Step 4: Return the transaction confirmation
return result;
}
// Human Code (no numbered steps, straightforward)
function processPayment(amount, card) {
if (!isValidCard(card)) throw new Error('Invalid card');
if (amount <= 0) throw new Error('Invalid amount');
return paymentGateway.charge(amount, card);
}
Pattern 4: Overly Detailed Parameter Descriptions
// GPT-4: Every parameter explained in detail
/**
* @param {number} userId - The unique identifier for the user in the database
* @param {Object} options - Configuration options for the query
* @param {boolean} options.includeDeleted - Whether to include soft-deleted records
* @param {string[]} options.fields - Array of field names to return in the result
*/
// Human: Minimal, only non-obvious params documented
/**
* @param {Object} options.fields - Fields to include in response
*/
Pattern 5: "Let's" and "We'll" Phrasing in Comments
// GPT-4 Generated
// Let's create a helper function to validate the input
// We'll check if the user exists before proceeding
// Now we'll transform the data into the required format
// Human Code (imperative or no comments)
// Validate input
// Check user exists
// Transform data
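The keyword frequencies listed under Pattern 2 can be measured roughly as follows. This sketch assumes a comment line is any line that starts with //, /*, or * after trimming, which ignores trailing comments and is only an approximation; the function name keywordRatePer100 is hypothetical, and the keyword list is taken from the figures above.

```javascript
const KEYWORDS = ['comprehensive', 'robust', 'solution', 'implementation'];

// Returns occurrences of each keyword per 100 comment lines.
function keywordRatePer100(source, keywords = KEYWORDS) {
  const commentLines = source
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => /^(\/\/|\/\*|\*)/.test(line));

  const rates = {};
  for (const keyword of keywords) {
    const pattern = new RegExp(`\\b${keyword}\\b`, 'gi');
    const hits = commentLines.reduce(
      (total, line) => total + (line.match(pattern) || []).length,
      0
    );
    rates[keyword] =
      commentLines.length === 0 ? 0 : (hits / commentLines.length) * 100;
  }
  return rates;
}
```

Small files produce noisy rates, so a threshold on the total number of comment lines is probably worth adding before acting on the result.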
GitHub Copilot Fingerprints (8 Patterns)
Copilot generates context-aware suggestions but often leaves placeholder comments when it cannot infer complete functionality.
Pattern 1: Placeholder TODO/FIXME Comments
// Copilot Generated
function calculateTax(amount, state) {
// TODO: Add state-specific tax rates
// FIXME: Handle edge cases
// TODO: Validate input
return amount * 0.08;
}
// Human Code (either implements it or leaves one specific TODO)
function calculateTax(amount, state) {
const rate = TAX_RATES[state] || 0.08;
return amount * rate;
}
Pattern 2: Generic Variable Names
// Copilot Generated (generic names)
function processData(data) {
const result = [];
const temp = data.map(item => item.value);
const processed = temp.filter(x => x > 0);
return processed;
}
// Human Code (domain-specific names)
function filterPositiveValues(measurements) {
const values = measurements.map(m => m.value);
return values.filter(v => v > 0);
}
Pattern 3: Incomplete Error Messages
// Copilot Generated
if (!user) throw new Error('Error');
if (!isValid) throw new Error('Invalid');
if (result === null) throw new Error('Failed');
// Human Code (specific error messages)
if (!user) throw new Error('User not found');
if (!isValid) throw new Error('Email format invalid');
if (result === null) throw new Error('Database query returned no results');
Pattern 4: Boilerplate Import Patterns
// Copilot Generated (imports everything)
import React, { useState, useEffect, useMemo, useCallback } from 'react';
// Only uses useState
// Human Code (imports only what's needed)
import React, { useState } from 'react';
Pattern 5: Try-Catch Without Specific Handling
// Copilot Generated
try {
const data = await fetchData();
return data;
} catch (error) {
console.error(error);
return null;
}
// Human Code (specific error handling)
try {
const data = await fetchData();
return data;
} catch (error) {
if (error.code === 'ECONNREFUSED') {
logger.error('Database connection failed', { error });
throw new ServiceUnavailableError();
}
throw error;
}
Claude (Anthropic) Fingerprints (8 Patterns)
Claude exhibits defensive programming patterns, extensive validation, and a preference for custom error classes.
Pattern 1: Custom Error Class Per Function
// Claude Generated
class ValidationError extends Error {
constructor(message) {
super(message);
this.name = 'ValidationError';
}
}
class ProcessingError extends Error {
constructor(message) {
super(message);
this.name = 'ProcessingError';
}
}
class TransformationError extends Error {
constructor(message) {
super(message);
this.name = 'TransformationError';
}
}
function validateInput(input) {
if (!input) throw new ValidationError('Input required');
// ...
}
// Human Code (uses standard Error or domain-level error classes)
function validateInput(input) {
if (!input) throw new Error('Input required');
}
Pattern 2: Exhaustive Type Checking
// Claude Generated
function processValue(value) {
if (typeof value !== 'number') {
throw new TypeError('Value must be a number');
}
if (!Number.isFinite(value)) {
throw new RangeError('Value must be finite');
}
if (value < 0) {
throw new RangeError('Value must be non-negative');
}
return value * 2;
}
// Human Code (minimal validation)
function processValue(value) {
if (value < 0) throw new Error('Value must be non-negative');
return value * 2;
}
Pattern 3: Defensive Null/Undefined Checks
// Claude Generated
function getUser(id) {
if (id === null || id === undefined) {
throw new Error('ID cannot be null or undefined');
}
if (typeof id !== 'string' && typeof id !== 'number') {
throw new TypeError('ID must be string or number');
}
// ...
}
// Human Code (assumes type from context)
function getUser(id) {
return users.find(u => u.id === id);
}
Pattern 4: Explicit Return Type Documentation
// Claude Generated (even in JavaScript)
/**
* @returns {Promise} Returns User object if found, null otherwise
* @throws {ValidationError} If userId is invalid
* @throws {DatabaseError} If database query fails
*/
// Human Code (minimal return docs)
/**
* @returns {Promise} User object
*/
Pattern 5: Over-Structured Code Organization
// Claude Generated (excessive structure for simple function)
class UserService {
private validator: UserValidator;
private repository: UserRepository;
private logger: Logger;
constructor(deps: Dependencies) {
this.validator = deps.validator;
this.repository = deps.repository;
this.logger = deps.logger;
}
async getUser(id: string): Promise<User> {
this.logger.info('Fetching user', { id });
this.validator.validateUserId(id);
const user = await this.repository.findById(id);
this.logger.info('User fetched successfully', { id });
return user;
}
}
// Human Code (simple function)
async function getUser(id: string): Promise<User> {
return userRepository.findById(id);
}
Cursor AI Fingerprints (6 Patterns)
Cursor leaves distinctive markers from its AI command interface and iterative refinement process.
Pattern 1: AI Command Markers in Comments
// Cursor Generated
// @ai: add error handling
// @cursor: implement validation
// AI-generated: refactored for clarity
function processData(data) {
// ...
}
Pattern 2: Incremental Refinement Artifacts
// Cursor Generated (shows iteration history)
function calculateTotal(items) {
// v1: simple sum
// v2: added tax calculation
// v3: added discount logic
const subtotal = items.reduce((sum, item) => sum + item.price, 0);
const tax = subtotal * 0.08;
const discount = subtotal > 100 ? subtotal * 0.1 : 0;
return subtotal + tax - discount;
}
// Human Code (only final version)
function calculateTotal(items) {
const subtotal = items.reduce((sum, item) => sum + item.price, 0);
const tax = subtotal * TAX_RATE;
const discount = subtotal > DISCOUNT_THRESHOLD ? subtotal * DISCOUNT_RATE : 0;
return subtotal + tax - discount;
}
Pattern 3: "Based on context" Comments
// Cursor Generated
// Based on the context from lines 45-67, this function handles...
// According to the pattern established above...
// Following the same approach as processUser()...
Pattern 4: Inconsistent Naming Conventions
// Cursor Generated (mixed conventions from iterative changes)
function fetchUserData(userId) {
const user_profile = getUserProfile(userId); // snake_case
const userSettings = getSettings(userId); // camelCase
const UserPreferences = getPrefs(userId); // PascalCase
return { user_profile, userSettings, UserPreferences };
}
// Human Code (consistent convention)
function fetchUserData(userId) {
const userProfile = getUserProfile(userId);
const userSettings = getSettings(userId);
const userPreferences = getPreferences(userId);
return { userProfile, userSettings, userPreferences };
}
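Mixed conventions of this kind are straightforward to spot mechanically. The following is a minimal sketch, assuming only const, let, and var declarations are inspected; the helper names classifyIdentifier and conventionsUsed are hypothetical. Single lowercase words such as user count as camelCase, and names that fit no pattern (for example TAX_RATE) are ignored.

```javascript
// Classify one identifier by its naming convention.
function classifyIdentifier(name) {
  if (/^[a-z][a-z0-9]*(_[a-z0-9]+)+$/.test(name)) return 'snake_case';
  if (/^[A-Z][A-Za-z0-9]*$/.test(name) && /[a-z]/.test(name)) return 'PascalCase';
  if (/^[a-z][A-Za-z0-9]*$/.test(name)) return 'camelCase';
  return 'other';
}

// Collect the set of conventions used by variable declarations in the source.
function conventionsUsed(source) {
  const declarations = /\b(?:const|let|var)\s+([A-Za-z_$][\w$]*)/g;
  const names = [...source.matchAll(declarations)].map((match) => match[1]);
  return new Set(
    names.map(classifyIdentifier).filter((kind) => kind !== 'other')
  );
}
```

A function whose declarations yield more than one convention is a candidate for the inconsistency described above.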
Cross-Model Comparison
Here's the same function written by different AI models showing distinctive fingerprints:
// GPT-4: Verbose, educational, "comprehensive"
/**
* Comprehensive email validation function
*
* Here's a comprehensive solution that validates email addresses
* using a robust regular expression pattern. This implementation
* handles all common email formats and edge cases.
*/
function validateEmail(email) {
// Step 1: Check if email is provided
if (!email) return false;
// Step 2: Apply the email validation regex pattern
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
// Step 3: Return the validation result
return emailRegex.test(email);
}
// Copilot: Generic, placeholder comments
function validateEmail(email) {
// TODO: Add more sophisticated validation
// FIXME: Handle international domains
const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
return regex.test(email);
}
// Claude: Defensive, custom errors, type checking
class EmailValidationError extends Error {
constructor(message) {
super(message);
this.name = 'EmailValidationError';
}
}
function validateEmail(email) {
if (email === null || email === undefined) {
throw new EmailValidationError('Email cannot be null or undefined');
}
if (typeof email !== 'string') {
throw new TypeError('Email must be a string');
}
if (email.length === 0) {
throw new EmailValidationError('Email cannot be empty');
}
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
return emailRegex.test(email);
}
// Cursor: Incremental artifacts, AI markers
function validateEmail(email) {
// @ai: simplified from previous version
// v2: using standard regex pattern
const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
return regex.test(email);
}
// Human Code: Concise, no explanations
function validateEmail(email) {
return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}
Detection Accuracy
CodeSlick's 32 LLM fingerprint patterns achieve:
- GPT-4: 87% attribution accuracy (verbose docstrings are highly distinctive)
- Copilot: 82% attribution accuracy (TODO patterns and generic names)
- Claude: 79% attribution accuracy (custom error classes are unique)
- Cursor: 73% attribution accuracy (AI markers are explicit but less common)
Combined with hallucination patterns (119) and heuristics (13), CodeSlick provides industry-leading AI code detection with 164 total signals across JavaScript, TypeScript, Python, Java, and Go.
AI Code Smells (13 Heuristics)
1. Over-Engineered Error Handling
// AI code: Wraps everything in try-catch
function getValue(key) {
try {
try {
const value = storage.get(key);
try {
return JSON.parse(value);
} catch (parseError) {
return null;
}
} catch (storageError) {
return null;
}
} catch (error) {
return null;
}
}
// Human code: Handles expected errors only
function getValue(key) {
const value = storage.get(key);
return value ? JSON.parse(value) : null;
}
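Nesting depth of try blocks is one way to quantify this smell. The sketch below is a simplified illustration, not CodeSlick's implementation: it tracks braces with a plain token scan and does not understand strings, comments, or template literals, so a real check would more likely walk a parsed syntax tree. The name maxTryDepth is hypothetical.

```javascript
// Returns the deepest level of try blocks nested inside one another.
function maxTryDepth(source) {
  const tokens = source.match(/\btry\b|[{}]/g) || [];
  const openBlocks = []; // one entry per unclosed '{': 'try' or 'other'
  let pendingTry = false; // true after 'try' until its opening brace
  let depth = 0;
  let deepest = 0;

  for (const token of tokens) {
    if (token === 'try') {
      pendingTry = true;
    } else if (token === '{') {
      openBlocks.push(pendingTry ? 'try' : 'other');
      if (pendingTry) {
        depth += 1;
        deepest = Math.max(deepest, depth);
      }
      pendingTry = false;
    } else if (openBlocks.pop() === 'try') {
      depth -= 1; // a closing brace ended a try block
    }
  }
  return deepest;
}
```

Applied to the two getValue versions above, the AI-style version has three nested try blocks and the human version has none.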
2. Zero Edge Case Handling
// AI code: Happy path only
function divide(a, b) {
return a / b; // No check for b === 0
}
// Human code: Handles edge cases
function divide(a, b) {
if (b === 0) throw new Error("Division by zero");
return a / b;
}
Combined Heuristic Score
AI Confidence Score =
(hallucinations × 0.6) +
(heuristics × 0.25) +
(llmFingerprints × 0.15)
Severity:
Score ≥ 2.0 → CRITICAL (High confidence AI code with hallucinations)
Score ≥ 1.0 → HIGH (Likely AI code with issues)
Score ≥ 0.5 → MEDIUM (Possible AI code)
How CodeSlick Detects AI Code (164 Protection Signals)
CodeSlick combines three detection layers to identify AI-generated code with hallucinations, fingerprints, and behavioral patterns.
Layer 1: Hallucination Pattern Matching (119 Patterns)
- JavaScript: 21 patterns (Python influence, Java influence, snake_case, typos)
- TypeScript: 17 patterns (Python-style, case errors, type coercion issues)
- Python: 30 patterns (15 base + 10 Django + 2 FastAPI + 2 SQLAlchemy + 1 Pydantic)
- Java: 12 patterns (JavaScript/Python methods in Java)
- Go: 47 patterns (16 JavaScript + 12 Python + 11 non-existent + 4 framework)
Layer 2: LLM Fingerprint Detection (32 Patterns)
- GPT-4: Verbose docstrings, "comprehensive" keyword, overly detailed comments
- Copilot: Placeholder TODOs, generic variable names, boilerplate patterns
- Claude: Custom error classes, defensive type checking, exhaustive validation
- Cursor: AI command markers, incremental refinement artifacts
Layer 3: Heuristic Scoring (13 Behavioral Checks)
- Over-engineered error handling (nested try-catch blocks)
- Unnecessary wrapper functions
- Zero edge case handling
- Perfect textbook formatting
- Generic variable names
- Missing context-specific logic
- Uniform comment density
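The weighted formula given under Combined Heuristic Score shows how the three layers feed a single result. The sketch below uses the weights and thresholds from that formula; it assumes each input is a count of matches from the corresponding layer, which the source does not state explicitly, and the label NONE for scores below 0.5 is also an assumption. The function names are hypothetical.

```javascript
// Weights copied from the Combined Heuristic Score formula.
const WEIGHTS = { hallucinations: 0.6, heuristics: 0.25, llmFingerprints: 0.15 };

// Inputs are assumed to be per-layer match counts.
function aiConfidenceScore({ hallucinations = 0, heuristics = 0, llmFingerprints = 0 }) {
  return (
    hallucinations * WEIGHTS.hallucinations +
    heuristics * WEIGHTS.heuristics +
    llmFingerprints * WEIGHTS.llmFingerprints
  );
}

// Thresholds copied from the severity table.
function severityFor(score) {
  if (score >= 2.0) return 'CRITICAL';
  if (score >= 1.0) return 'HIGH';
  if (score >= 0.5) return 'MEDIUM';
  return 'NONE'; // assumption: the source defines no label below 0.5
}
```

With these weights, a file needs at least four hallucination matches to reach CRITICAL on hallucinations alone, while fingerprints alone need fourteen matches, which reflects how heavily the formula favors hallucination evidence.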
Detection Workflow
codeslick analyze app.js --check-ai-code
# Output:
HIGH: AI-generated code detected (Confidence: 85%)
Line 44: text.strip() → JavaScript uses .trim()
Line 45: text.toUpper() → JavaScript uses .toUpperCase()
LLM fingerprint: GPT-4 (verbose docstrings)
Risk: Runtime errors in production (CVSS 8.5)
Detect AI hallucinations and LLM fingerprints across JavaScript, TypeScript, Python, Java, and Go with 164 protection signals.
Prevention and Remediation Strategies
1. Automated Detection in CI/CD
# GitHub Actions
- name: Detect AI hallucinations
run: |
codeslick analyze \
--check-ai-code \
--fail-on critical,high \
--format sarif
2. IDE Integration and Real-Time Feedback
# Pre-commit hook
codeslick analyze --check-ai-code --staged-files
3. LLM Prompt Engineering
Bad prompt: "Write a function to trim whitespace"
Good prompt: "Write a JavaScript function using .trim() to remove whitespace.
Do not use Python methods like .strip()."
4. Code Review Focus Areas
- Verify methods exist in language documentation
- Remove unnecessary try-catch blocks
- Add null checks and boundary validation
- Replace generic variable names with domain terms
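For the first review item, a quick runtime check in a Node.js REPL can complement reading the documentation. This is a minimal sketch; the helper name hasMethod is hypothetical, and it only confirms that a method exists on a sample value, not that it behaves as the AI-generated code expects.

```javascript
// True when `name` resolves to a callable on the value or its prototype chain.
function hasMethod(value, name) {
  return value != null && typeof value[name] === 'function';
}

console.log(hasMethod('text', 'strip'));  // false
console.log(hasMethod('text', 'trim'));   // true
console.log(hasMethod([1, 2], 'append')); // false
console.log(hasMethod([1, 2], 'push'));   // true
```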