
AI Code Hallucinations: Industry-First 164-Signal Detection System

119 patterns + 32 LLM fingerprints + 13 heuristics for detecting AI-generated errors

What Are AI Code Hallucinations?

AI code hallucinations are methods, functions, or APIs suggested by large language models (LLMs) that do not exist in the target programming language or framework. When a developer asks ChatGPT or GitHub Copilot for JavaScript and receives a suggestion like text.strip(), the model has produced a Python method that does not exist in JavaScript (the correct method is .trim()).

These hallucinations occur because LLMs are trained on massive codebases across multiple languages. The model learns patterns from Python, Java, Go, and JavaScript simultaneously, causing cross-language confusion. When generating JavaScript code, the model may retrieve patterns from its Python training data, producing syntactically valid but semantically incorrect code.

Hallucinations are not syntax errors—they pass linting and type checking because the method call structure is correct. The code fails at runtime when the JavaScript engine attempts to invoke .strip() on a string object that has no such method, throwing TypeError: text.strip is not a function.
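A minimal sketch of the failure mode: the call parses and lints cleanly, but the JavaScript engine throws a TypeError the moment it executes.

```javascript
// The hallucinated call is syntactically valid, so it survives static checks.
const text = "  hello  ";
try {
  text.strip(); // Python method; does not exist on JavaScript strings
} catch (err) {
  console.log(err instanceof TypeError); // true
}
console.log(text.trim()); // "hello" — the correct JavaScript method
```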

Why AI Hallucinations Are CRITICAL Severity (CVSS 8.5)

Runtime Errors Lead to Information Disclosure

When AI-generated code with hallucinations reaches production, runtime errors expose sensitive information through stack traces, error messages, and application behavior changes. This information disclosure is classified as CRITICAL severity (CVSS 8.5) because it provides attackers with reconnaissance data for subsequent attacks.

Example: Production Stack Trace Exposure

// AI-generated code with hallucination
function processUserInput(data) {
  const cleaned = data.strip();  // Python method in JavaScript
  return cleaned.toUpperCase();
}

// Production error exposed to user:
TypeError: data.strip is not a function
  at processUserInput (app.js:42:24)
  at handleRequest (server.js:156:18)
  at IncomingMessage.emit (events.js:400:28)

Environment: production
Node version: v18.12.0
Database: postgresql://prod-db.internal:5432/users

The stack trace reveals file structure, technology stack, database location, and function names—enabling attackers to map the attack surface and identify version-specific vulnerabilities.

Business Impact

Organizations using AI coding assistants extensively generate thousands of lines of AI code daily. Without automated detection, hallucinations accumulate. Internal audits at major tech companies have reportedly found 200+ AI hallucinations in production code, including cross-language method confusion.

Types of AI Hallucinations (119 Patterns Across 5 Languages)

1. Cross-Language Method Confusion

LLMs trained on multiple languages confuse similar operations across language boundaries.

Python Methods in JavaScript

const text = "  hello  ";
const trimmed = text.strip();        // Python → JavaScript is .trim()
const upper = text.upper();          // Python → JavaScript is .toUpperCase()
const items = [1, 2, 3];
items.append(4);                     // Python → JavaScript is .push()
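For reference, the corrected JavaScript versions of the three calls above:

```javascript
// The same three operations, written with the methods JavaScript actually has.
const text = "  hello  ";
const trimmed = text.trim();         // not .strip()
const upper = text.toUpperCase();    // not .upper()
const items = [1, 2, 3];
items.push(4);                       // not .append()
console.log(trimmed, upper, items);  // "hello" "  HELLO  " [ 1, 2, 3, 4 ]
```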

JavaScript Methods in Python

text = "hello"
upper = text.toUpperCase()           # JavaScript → Python is .upper()
items = [1, 2, 3]
items.push(4)                        # JavaScript → Python is .append()

Java Methods in JavaScript

const items = [1, 2, 3];
items.add(4);                        // Java → JavaScript is .push()
const hasItem = items.contains(3);   // Java → JavaScript is .includes()
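And the corrected JavaScript equivalents for the Java-influenced calls:

```javascript
// .push() and .includes() are the real JavaScript array methods.
const items = [1, 2, 3];
items.push(4);                       // JavaScript equivalent of Java's .add()
const hasItem = items.includes(3);   // JavaScript equivalent of Java's .contains()
console.log(items, hasItem);         // [ 1, 2, 3, 4 ] true
```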

2. Framework-Specific Hallucinations

// React deprecated lifecycle methods
class UserProfile extends React.Component {
  componentWillMount() {              // Deprecated since React 16.3 (use UNSAFE_componentWillMount)
    this.fetchData();
  }
}

3. Case and Naming Convention Errors

const result = text.replace_all("old", "new");  // snake_case → .replaceAll()
const upper = text.toUppercase();    // Missing 'C' → .toUpperCase()
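The corrected spellings: JavaScript string methods are camelCase, and .replaceAll() requires ES2021 (Node 15+).

```javascript
// Correct casing for the two hallucinated names above.
const text = "old old";
const result = text.replaceAll("old", "new"); // not .replace_all()
const upper = text.toUpperCase();             // note the capital 'C'
console.log(result, upper); // "new new" "OLD OLD"
```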

LLM Fingerprints (32 Patterns)

AI-generated code exhibits unique stylistic patterns that distinguish it from human-written code. CodeSlick detects 32 LLM fingerprints specific to GPT-4, GitHub Copilot, Claude, and Cursor.

GPT-4 Fingerprints

/**
 * Comprehensive user authentication handler
 *
 * This function provides a comprehensive solution for user authentication,
 * handling all edge cases and providing robust error handling.
 */

Human docstrings are concise. GPT-4 overuses "comprehensive," "robust," and "solution."

GitHub Copilot Fingerprints

function calculateDiscount(price, userType) {
  // TODO: Add validation
  // FIXME: Handle edge cases
  return price * 0.9;
}

Copilot generates placeholder comments for functionality it cannot infer from context.

Claude Fingerprints

class ValidationError extends Error {}
class ProcessingError extends Error {}
class TransformationError extends Error {}

// One error class per function

Claude creates custom error classes defensively. Human code uses standard Error or domain errors.

AI Code Smells (13 Heuristics)

1. Over-Engineered Error Handling

// AI code: Wraps everything in try-catch
function getValue(key) {
  try {
    try {
      const value = storage.get(key);
      try {
        return JSON.parse(value);
      } catch (parseError) {
        return null;
      }
    } catch (storageError) {
      return null;
    }
  } catch (error) {
    return null;
  }
}

// Human code: Handles expected errors only
function getValue(key) {
  const value = storage.get(key);
  return value ? JSON.parse(value) : null;
}
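Note that the human version above still throws on malformed JSON. If that input is realistic, one targeted try-catch around the single risky call is the middle ground. A sketch, with a hypothetical Map-backed `storage` standing in for a real store:

```javascript
// Guard only the one expected failure (malformed JSON), nothing else.
// `storage` is an illustrative stand-in, not a real API.
const storage = new Map([
  ["good", '{"x": 1}'],
  ["bad", "{not json"],
]);

function getValue(key) {
  const value = storage.get(key);
  if (value == null) return null;
  try {
    return JSON.parse(value);
  } catch {
    return null; // malformed JSON is the only error we expect here
  }
}
```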

2. Zero Edge Case Handling

// AI code: Happy path only
function divide(a, b) {
  return a / b;  // No check for b === 0
}

// Human code: Handles edge cases
function divide(a, b) {
  if (b === 0) throw new Error("Division by zero");
  return a / b;
}

Combined Heuristic Score

AI Confidence Score =
  (hallucinations × 0.6) +
  (heuristics × 0.25) +
  (llmFingerprints × 0.15)

Severity:
  Score ≥ 2.0 → CRITICAL (High confidence AI code with hallucinations)
  Score ≥ 1.0 → HIGH (Likely AI code with issues)
  Score ≥ 0.5 → MEDIUM (Possible AI code)
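As a sketch, the scoring above maps directly to code. The weights and thresholds come from the formula; the function names are ours, not CodeSlick's API:

```javascript
// Hypothetical implementation of the documented scoring formula.
function aiConfidenceScore(hallucinations, heuristics, llmFingerprints) {
  return hallucinations * 0.6 + heuristics * 0.25 + llmFingerprints * 0.15;
}

function severity(score) {
  if (score >= 2.0) return "CRITICAL"; // high-confidence AI code with hallucinations
  if (score >= 1.0) return "HIGH";     // likely AI code with issues
  if (score >= 0.5) return "MEDIUM";   // possible AI code
  return "NONE";                       // below the documented thresholds
}

console.log(severity(aiConfidenceScore(3, 2, 2))); // "CRITICAL" (score 2.6)
```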

How CodeSlick Detects AI Code (164 Protection Signals)

CodeSlick combines three detection layers to identify AI-generated code with hallucinations, fingerprints, and behavioral patterns.

Layer 1: Hallucination Pattern Matching (119 Patterns)

  • JavaScript: 21 patterns (Python influence, Java influence, snake_case, typos)
  • TypeScript: 17 patterns (Python-style, case errors, type coercion issues)
  • Python: 30 patterns (15 base + 10 Django + 2 FastAPI + 2 SQLAlchemy + 1 Pydantic)
  • Java: 12 patterns (JavaScript/Python methods in Java)
  • Go: 47 patterns (16 JavaScript + 12 Python + 11 non-existent + 4 framework)

Layer 2: LLM Fingerprint Detection (32 Patterns)

  • GPT-4: Verbose docstrings, "comprehensive" keyword, overly detailed comments
  • Copilot: Placeholder TODOs, generic variable names, boilerplate patterns
  • Claude: Custom error classes, defensive type checking, exhaustive validation
  • Cursor: AI command markers, incremental refinement artifacts

Layer 3: Heuristic Scoring (13 Behavioral Checks)

  • Over-engineered error handling (nested try-catch blocks)
  • Unnecessary wrapper functions
  • Zero edge case handling
  • Perfect textbook formatting
  • Generic variable names
  • Missing context-specific logic
  • Uniform comment density
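The first of these checks can be approximated with a simple token scan. This sketch is our own illustration, not CodeSlick's implementation; it tracks brace nesting, ignores string literals and comments, and is therefore a rough signal rather than a parser:

```javascript
// Rough sketch: estimate maximum try-block nesting depth by tracking braces.
// A depth of 2 or more would trigger the over-engineered-error-handling check.
function maxTryDepth(source) {
  let depth = 0;
  let max = 0;
  const stack = []; // true = this brace was opened by a try block
  for (const token of source.match(/try\s*\{|\{|\}/g) || []) {
    if (token.startsWith("try")) {
      depth += 1;
      max = Math.max(max, depth);
      stack.push(true);
    } else if (token === "{") {
      stack.push(false);
    } else if (stack.pop()) {
      depth -= 1;
    }
  }
  return max;
}

console.log(maxTryDepth("try { try { x(); } catch (e) {} } catch (e) {}")); // 2
```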

Detection Workflow

codeslick analyze app.js --check-ai-code

# Output:
HIGH: AI-generated code detected (Confidence: 85%)
  Line 44: text.strip() → JavaScript uses .trim()
  Line 45: text.toUpper() → JavaScript uses .toUpperCase()

  LLM fingerprint: GPT-4 (verbose docstrings)
  Risk: Runtime errors in production (CVSS 8.5)

Detect AI hallucinations and LLM fingerprints across JavaScript, TypeScript, Python, Java, and Go with 164 protection signals.

Prevention and Remediation Strategies

1. Automated Detection in CI/CD

# GitHub Actions
- name: Detect AI hallucinations
  run: |
    codeslick analyze \
      --check-ai-code \
      --fail-on critical,high \
      --format sarif

2. IDE Integration and Real-Time Feedback

# Pre-commit hook
codeslick analyze --check-ai-code --staged-files

3. LLM Prompt Engineering

Bad prompt: "Write a function to trim whitespace"

Good prompt: "Write a JavaScript function using .trim() to remove whitespace.
Do not use Python methods like .strip()."

4. Code Review Focus Areas

  • Verify methods exist in language documentation
  • Remove unnecessary try-catch blocks
  • Add null checks and boundary validation
  • Replace generic variable names with domain terms
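The first checklist item can be verified in seconds from a REPL: a genuine method appears on the relevant prototype, a hallucinated one does not.

```javascript
// Quick sanity check for an AI-suggested method name.
console.log("trim" in String.prototype);   // true  — real JavaScript
console.log("strip" in String.prototype);  // false — Python, hallucinated
console.log("push" in Array.prototype);    // true  — real JavaScript
console.log("append" in Array.prototype);  // false — Python, hallucinated
```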
