AI Code Hallucinations: Industry-First 164-Signal Detection System

119 patterns + 32 LLM fingerprints + 13 heuristics for detecting AI-generated errors

What Are AI Code Hallucinations

AI code hallucinations are methods, functions, or APIs suggested by large language models (LLMs) that do not exist in the target programming language or framework. When ChatGPT or GitHub Copilot suggests text.strip() in JavaScript, it has produced a Python method that does not exist in JavaScript (the correct method is .trim()).

These hallucinations occur because LLMs are trained on massive codebases across multiple languages. The model learns patterns from Python, Java, Go, and JavaScript simultaneously, causing cross-language confusion. When generating JavaScript code, the model may retrieve patterns from its Python training data, producing syntactically valid but semantically incorrect code.

Hallucinations are not syntax errors—they pass linting and type checking because the method call structure is correct. The code fails at runtime when the JavaScript engine attempts to invoke .strip() on a string object that has no such method, throwing TypeError: text.strip is not a function.
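Because the failure only surfaces at runtime, the quickest sanity check is to probe the prototype in a Node.js REPL or scratch file before trusting a suggestion:

```javascript
// Probe String.prototype for the suspected method before trusting a suggestion.
// The hallucinated Python method is not present on JavaScript strings:
console.log(typeof "".strip);   // "undefined"

// The real JavaScript equivalent exists as a function:
console.log(typeof "".trim);    // "function"

// Calling the hallucinated method throws at runtime, not at lint time.
try {
  "  hello  ".strip();
} catch (err) {
  console.log(err instanceof TypeError); // true
}
```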

Why AI Hallucinations Are CRITICAL Severity (CVSS 8.5)

Runtime Errors Lead to Information Disclosure

When AI-generated code with hallucinations reaches production, runtime errors expose sensitive information through stack traces, error messages, and application behavior changes. This information disclosure is classified as CRITICAL severity (CVSS 8.5) because it provides attackers with reconnaissance data for subsequent attacks.

Example: Production Stack Trace Exposure

// AI-generated code with hallucination
function processUserInput(data) {
  const cleaned = data.strip();  // Python method in JavaScript
  return cleaned.toUpperCase();
}

// Production error exposed to user:
TypeError: data.strip is not a function
  at processUserInput (app.js:42:24)
  at handleRequest (server.js:156:18)
  at IncomingMessage.emit (events.js:400:28)

Environment: production
Node version: v18.12.0
Database: postgresql://prod-db.internal:5432/users

The stack trace reveals file structure, technology stack, database location, and function names—enabling attackers to map the attack surface and identify version-specific vulnerabilities.

Business Impact

Organizations that rely heavily on AI coding assistants generate thousands of lines of AI-written code daily. Without automated detection, hallucinations accumulate. Internal audits at major tech companies have found 200+ AI hallucinations in production code, including cross-language method confusion.

Types of AI Hallucinations (119 Patterns Across 5 Languages)

1. Cross-Language Method Confusion

LLMs trained on multiple languages confuse similar operations across language boundaries.

Python Methods in JavaScript

const text = "  hello  ";
const trimmed = text.strip();        // Python → JavaScript is .trim()
const upper = text.toUpper();        // Truncated name → JavaScript is .toUpperCase()
const items = [1, 2, 3];
items.append(4);                     // Python → JavaScript is .push()

JavaScript Methods in Python

text = "hello"
upper = text.toUpperCase()           # JavaScript → Python is .upper()
items = [1, 2, 3]
items.push(4)                        # JavaScript → Python is .append()

Java Methods in JavaScript

const items = [1, 2, 3];
items.add(4);                        // Java → JavaScript is .push()
const hasItem = items.contains(3);   // Java → JavaScript is .includes()

2. Framework-Specific Hallucinations

// React deprecated lifecycle methods
class UserProfile extends React.Component {
  componentWillMount() {              // Removed in React 17
    this.fetchData();
  }
}

3. Case and Naming Convention Errors

const result = text.replace_all("old", "new");  // snake_case → .replaceAll()
const upper = text.toUppercase();    // Missing 'C' → .toUpperCase()

LLM Fingerprints: Detecting Code from ChatGPT vs Copilot vs Claude vs Cursor

Different AI models leave distinctive fingerprints in the code they generate. These patterns are as identifiable as handwriting—each AI has characteristic phrasing, commenting styles, error handling approaches, and structural choices that differ from human code and from each other. CodeSlick analyzes 32 LLM-specific fingerprints to attribute code to specific AI tools.

GPT-4 / ChatGPT Fingerprints (10 Patterns)

GPT-4 code is characterized by verbose explanations, educational phrasing, and a tendency to over-document obvious functionality.

Pattern 1: "Here's a comprehensive solution" Phrasing

// GPT-4 Generated Code
/**
 * Comprehensive user authentication handler
 *
 * Here's a comprehensive solution that handles user authentication
 * with robust error handling and edge case coverage. This function
 * provides a complete implementation that you can use in production.
 *
 * @param {string} username - The user's username
 * @param {string} password - The user's password
 * @returns {Promise} - Returns the authenticated user object
 */
async function authenticateUser(username, password) {
  // Implementation...
}

Pattern 2: Excessive Use of "Comprehensive", "Robust", "Solution"

// GPT-4 keyword frequency (per 100 lines of comments):
"comprehensive": 3-7 occurrences
"robust": 2-5 occurrences
"solution": 4-8 occurrences
"implementation": 5-10 occurrences

// Human code frequency:
"comprehensive": 0-1 occurrences
"robust": 0 occurrences
"solution": 0-1 occurrences
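The frequency comparison above can be approximated with a simple scan over comment lines. The marker list and per-100-line normalization below are illustrative, not CodeSlick's actual implementation:

```javascript
// Count GPT-4 marker keywords in comment lines, normalized per 100
// comment lines. Marker list is an illustrative subset.
const MARKERS = ["comprehensive", "robust", "solution", "implementation"];

function keywordDensity(source) {
  // Keep lines that start a // comment or sit inside a /** ... */ block.
  const commentLines = source
    .split("\n")
    .filter(line => /^\s*(\/\/|\/\*|\*)/.test(line));
  const counts = Object.fromEntries(MARKERS.map(k => [k, 0]));
  for (const line of commentLines) {
    const lower = line.toLowerCase();
    for (const k of MARKERS) {
      counts[k] += (lower.match(new RegExp(k, "g")) || []).length;
    }
  }
  // Normalize to occurrences per 100 comment lines.
  const per100 = n =>
    commentLines.length ? (n * 100) / commentLines.length : 0;
  return Object.fromEntries(MARKERS.map(k => [k, per100(counts[k])]));
}
```

Densities well above the human baselines in the table raise the GPT-4 fingerprint signal.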

Pattern 3: Educational Step-by-Step Comments

// GPT-4 Generated
function processPayment(amount, card) {
  // Step 1: Validate the credit card number
  if (!isValidCard(card)) throw new Error('Invalid card');

  // Step 2: Check if the amount is positive
  if (amount <= 0) throw new Error('Invalid amount');

  // Step 3: Process the transaction through the payment gateway
  const result = paymentGateway.charge(amount, card);

  // Step 4: Return the transaction confirmation
  return result;
}

// Human Code (no numbered steps, straightforward)
function processPayment(amount, card) {
  if (!isValidCard(card)) throw new Error('Invalid card');
  if (amount <= 0) throw new Error('Invalid amount');
  return paymentGateway.charge(amount, card);
}

Pattern 4: Overly Detailed Parameter Descriptions

// GPT-4: Every parameter explained in detail
/**
 * @param {number} userId - The unique identifier for the user in the database
 * @param {Object} options - Configuration options for the query
 * @param {boolean} options.includeDeleted - Whether to include soft-deleted records
 * @param {string[]} options.fields - Array of field names to return in the result
 */

// Human: Minimal, only non-obvious params documented
/**
 * @param {string[]} options.fields - Fields to include in response
 */

Pattern 5: "Let's" and "We'll" Phrasing in Comments

// GPT-4 Generated
// Let's create a helper function to validate the input
// We'll check if the user exists before proceeding
// Now we'll transform the data into the required format

// Human Code (imperative or no comments)
// Validate input
// Check user exists
// Transform data

GitHub Copilot Fingerprints (8 Patterns)

Copilot generates context-aware suggestions but often leaves placeholder comments when it cannot infer complete functionality.

Pattern 1: Placeholder TODO/FIXME Comments

// Copilot Generated
function calculateTax(amount, state) {
  // TODO: Add state-specific tax rates
  // FIXME: Handle edge cases
  // TODO: Validate input
  return amount * 0.08;
}

// Human Code (either implements it or leaves one specific TODO)
function calculateTax(amount, state) {
  const rate = TAX_RATES[state] || 0.08;
  return amount * rate;
}

Pattern 2: Generic Variable Names

// Copilot Generated (generic names)
function processData(data) {
  const result = [];
  const temp = data.map(item => item.value);
  const processed = temp.filter(x => x > 0);
  return processed;
}

// Human Code (domain-specific names)
function filterPositiveValues(measurements) {
  const values = measurements.map(m => m.value);
  return values.filter(v => v > 0);
}

Pattern 3: Incomplete Error Messages

// Copilot Generated
if (!user) throw new Error('Error');
if (!isValid) throw new Error('Invalid');
if (result === null) throw new Error('Failed');

// Human Code (specific error messages)
if (!user) throw new Error('User not found');
if (!isValid) throw new Error('Email format invalid');
if (result === null) throw new Error('Database query returned no results');

Pattern 4: Boilerplate Import Patterns

// Copilot Generated (imports everything)
import React, { useState, useEffect, useMemo, useCallback } from 'react';
// Only uses useState

// Human Code (imports only what's needed)
import React, { useState } from 'react';

Pattern 5: Try-Catch Without Specific Handling

// Copilot Generated
try {
  const data = await fetchData();
  return data;
} catch (error) {
  console.error(error);
  return null;
}

// Human Code (specific error handling)
try {
  const data = await fetchData();
  return data;
} catch (error) {
  if (error.code === 'ECONNREFUSED') {
    logger.error('Database connection failed', { error });
    throw new ServiceUnavailableError();
  }
  throw error;
}

Claude (Anthropic) Fingerprints (8 Patterns)

Claude exhibits defensive programming patterns, extensive validation, and a preference for custom error classes.

Pattern 1: Custom Error Class Per Function

// Claude Generated
class ValidationError extends Error {
  constructor(message) {
    super(message);
    this.name = 'ValidationError';
  }
}

class ProcessingError extends Error {
  constructor(message) {
    super(message);
    this.name = 'ProcessingError';
  }
}

class TransformationError extends Error {
  constructor(message) {
    super(message);
    this.name = 'TransformationError';
  }
}

function validateInput(input) {
  if (!input) throw new ValidationError('Input required');
  // ...
}

// Human Code (uses standard Error or domain-level error classes)
function validateInput(input) {
  if (!input) throw new Error('Input required');
}

Pattern 2: Exhaustive Type Checking

// Claude Generated
function processValue(value) {
  if (typeof value !== 'number') {
    throw new TypeError('Value must be a number');
  }
  if (!Number.isFinite(value)) {
    throw new RangeError('Value must be finite');
  }
  if (value < 0) {
    throw new RangeError('Value must be non-negative');
  }
  return value * 2;
}

// Human Code (minimal validation)
function processValue(value) {
  if (value < 0) throw new Error('Value must be non-negative');
  return value * 2;
}

Pattern 3: Defensive Null/Undefined Checks

// Claude Generated
function getUser(id) {
  if (id === null || id === undefined) {
    throw new Error('ID cannot be null or undefined');
  }
  if (typeof id !== 'string' && typeof id !== 'number') {
    throw new TypeError('ID must be string or number');
  }
  // ...
}

// Human Code (assumes type from context)
function getUser(id) {
  return users.find(u => u.id === id);
}

Pattern 4: Explicit Return Type Documentation

// Claude Generated (even in JavaScript)
/**
 * @returns {Promise<User|null>} Returns User object if found, null otherwise
 * @throws {ValidationError} If userId is invalid
 * @throws {DatabaseError} If database query fails
 */

// Human Code (minimal return docs)
/**
 * @returns {Promise} User object
 */

Pattern 5: Over-Structured Code Organization

// Claude Generated (excessive structure for simple function)
class UserService {
  private validator: UserValidator;
  private repository: UserRepository;
  private logger: Logger;

  constructor(deps: Dependencies) {
    this.validator = deps.validator;
    this.repository = deps.repository;
    this.logger = deps.logger;
  }

  async getUser(id: string): Promise<User> {
    this.logger.info('Fetching user', { id });
    this.validator.validateUserId(id);
    const user = await this.repository.findById(id);
    this.logger.info('User fetched successfully', { id });
    return user;
  }
}

// Human Code (simple function)
async function getUser(id: string): Promise<User> {
  return userRepository.findById(id);
}

Cursor AI Fingerprints (6 Patterns)

Cursor leaves distinctive markers from its AI command interface and iterative refinement process.

Pattern 1: AI Command Markers in Comments

// Cursor Generated
// @ai: add error handling
// @cursor: implement validation
// AI-generated: refactored for clarity

function processData(data) {
  // ...
}

Pattern 2: Incremental Refinement Artifacts

// Cursor Generated (shows iteration history)
function calculateTotal(items) {
  // v1: simple sum
  // v2: added tax calculation
  // v3: added discount logic
  const subtotal = items.reduce((sum, item) => sum + item.price, 0);
  const tax = subtotal * 0.08;
  const discount = subtotal > 100 ? subtotal * 0.1 : 0;
  return subtotal + tax - discount;
}

// Human Code (only final version)
function calculateTotal(items) {
  const subtotal = items.reduce((sum, item) => sum + item.price, 0);
  const tax = subtotal * TAX_RATE;
  const discount = subtotal > DISCOUNT_THRESHOLD ? subtotal * DISCOUNT_RATE : 0;
  return subtotal + tax - discount;
}

Pattern 3: "Based on context" Comments

// Cursor Generated
// Based on the context from lines 45-67, this function handles...
// According to the pattern established above...
// Following the same approach as processUser()...

Pattern 4: Inconsistent Naming Conventions

// Cursor Generated (mixed conventions from iterative changes)
function fetchUserData(userId) {
  const user_profile = getUserProfile(userId);    // snake_case
  const userSettings = getSettings(userId);        // camelCase
  const UserPreferences = getPrefs(userId);        // PascalCase
  return { user_profile, userSettings, UserPreferences };
}

// Human Code (consistent convention)
function fetchUserData(userId) {
  const userProfile = getUserProfile(userId);
  const userSettings = getSettings(userId);
  const userPreferences = getPreferences(userId);
  return { userProfile, userSettings, userPreferences };
}

Cross-Model Comparison

Here's the same function written by different AI models showing distinctive fingerprints:

// GPT-4: Verbose, educational, "comprehensive"
/**
 * Comprehensive email validation function
 *
 * Here's a comprehensive solution that validates email addresses
 * using a robust regular expression pattern. This implementation
 * handles all common email formats and edge cases.
 */
function validateEmail(email) {
  // Step 1: Check if email is provided
  if (!email) return false;

  // Step 2: Apply the email validation regex pattern
  const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

  // Step 3: Return the validation result
  return emailRegex.test(email);
}

// Copilot: Generic, placeholder comments
function validateEmail(email) {
  // TODO: Add more sophisticated validation
  // FIXME: Handle international domains
  const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return regex.test(email);
}

// Claude: Defensive, custom errors, type checking
class EmailValidationError extends Error {
  constructor(message) {
    super(message);
    this.name = 'EmailValidationError';
  }
}

function validateEmail(email) {
  if (email === null || email === undefined) {
    throw new EmailValidationError('Email cannot be null or undefined');
  }
  if (typeof email !== 'string') {
    throw new TypeError('Email must be a string');
  }
  if (email.length === 0) {
    throw new EmailValidationError('Email cannot be empty');
  }

  const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return emailRegex.test(email);
}

// Cursor: Incremental artifacts, AI markers
function validateEmail(email) {
  // @ai: simplified from previous version
  // v2: using standard regex pattern
  const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return regex.test(email);
}

// Human Code: Concise, no explanations
function validateEmail(email) {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}

Detection Accuracy

CodeSlick's 32 LLM fingerprint patterns achieve:

  • GPT-4: 87% attribution accuracy (verbose docstrings are highly distinctive)
  • Copilot: 82% attribution accuracy (TODO patterns and generic names)
  • Claude: 79% attribution accuracy (custom error classes are unique)
  • Cursor: 73% attribution accuracy (AI markers are explicit but less common)

Combined with hallucination patterns (119) and heuristics (13), CodeSlick provides industry-leading AI code detection with 164 total signals across JavaScript, TypeScript, Python, Java, and Go.

AI Code Smells (13 Heuristics)

1. Over-Engineered Error Handling

// AI code: Wraps everything in try-catch
function getValue(key) {
  try {
    try {
      const value = storage.get(key);
      try {
        return JSON.parse(value);
      } catch (parseError) {
        return null;
      }
    } catch (storageError) {
      return null;
    }
  } catch (error) {
    return null;
  }
}

// Human code: Handles expected errors only
function getValue(key) {
  const value = storage.get(key);
  return value ? JSON.parse(value) : null;
}

2. Zero Edge Case Handling

// AI code: Happy path only
function divide(a, b) {
  return a / b;  // No check for b === 0
}

// Human code: Handles edge cases
function divide(a, b) {
  if (b === 0) throw new Error("Division by zero");
  return a / b;
}

Combined Heuristic Score

AI Confidence Score =
  (hallucinations × 0.6) +
  (heuristics × 0.25) +
  (llmFingerprints × 0.15)

Severity:
  Score ≥ 2.0 → CRITICAL (High confidence AI code with hallucinations)
  Score ≥ 1.0 → HIGH (Likely AI code with issues)
  Score ≥ 0.5 → MEDIUM (Possible AI code)
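In code, the weighting above is a plain linear combination; the signal counts passed in are assumed to come from the three detection layers:

```javascript
// Weighted AI-confidence score from the three signal counts.
// Weights mirror the formula above; thresholds mirror the severity table.
function aiConfidence({ hallucinations, heuristics, llmFingerprints }) {
  const score =
    hallucinations * 0.6 +
    heuristics * 0.25 +
    llmFingerprints * 0.15;

  let severity = "LOW";
  if (score >= 2.0) severity = "CRITICAL";
  else if (score >= 1.0) severity = "HIGH";
  else if (score >= 0.5) severity = "MEDIUM";

  return { score, severity };
}

// Example: two hallucinations plus one fingerprint match
// scores 2 * 0.6 + 1 * 0.15 = 1.35, which lands in the HIGH band.
```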

How CodeSlick Detects AI Code (164 Protection Signals)

CodeSlick combines three detection layers to identify AI-generated code: hallucination patterns, LLM fingerprints, and behavioral heuristics.

Layer 1: Hallucination Pattern Matching (119 Patterns)

  • JavaScript: 21 patterns (Python influence, Java influence, snake_case, typos)
  • TypeScript: 17 patterns (Python-style, case errors, type coercion issues)
  • Python: 30 patterns (15 base + 10 Django + 2 FastAPI + 2 SQLAlchemy + 1 Pydantic)
  • Java: 12 patterns (JavaScript/Python methods in Java)
  • Go: 39 patterns (16 JavaScript + 12 Python + 11 non-existent)
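Layer 1 can be sketched as a lookup table of known cross-language confusions per target language. The four rules below are an illustrative subset, not CodeSlick's actual rule set:

```javascript
// Illustrative subset of cross-language hallucination rules for JavaScript.
// Each entry: a regex for the bogus call plus the correct replacement.
const JS_RULES = [
  { re: /\.strip\(\)/,     fix: ".trim()",      origin: "Python" },
  { re: /\.append\(/,      fix: ".push(",       origin: "Python" },
  { re: /\.contains\(/,    fix: ".includes(",   origin: "Java" },
  { re: /\.replace_all\(/, fix: ".replaceAll(", origin: "snake_case" },
];

// Scan source line by line and report matches with suggested fixes.
function findHallucinations(source, rules = JS_RULES) {
  const findings = [];
  source.split("\n").forEach((line, i) => {
    for (const rule of rules) {
      if (rule.re.test(line)) {
        findings.push({ line: i + 1, fix: rule.fix, origin: rule.origin });
      }
    }
  });
  return findings;
}
```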

Layer 2: LLM Fingerprint Detection (32 Patterns)

  • GPT-4: Verbose docstrings, "comprehensive" keyword, overly detailed comments
  • Copilot: Placeholder TODOs, generic variable names, boilerplate patterns
  • Claude: Custom error classes, defensive type checking, exhaustive validation
  • Cursor: AI command markers, incremental refinement artifacts
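Attribution can be sketched as scoring each model's known markers and picking the best match. The marker regexes below are condensed illustrations of the fingerprints described earlier; the real fingerprint set is larger and more nuanced:

```javascript
// Illustrative marker regexes per model, condensed from the patterns above.
const FINGERPRINTS = {
  "GPT-4":   [/comprehensive/i, /robust/i, /^\s*\/\/\s*Step \d+:/m],
  "Copilot": [/\/\/\s*TODO:/, /\/\/\s*FIXME:/],
  "Claude":  [/class \w+Error extends Error/, /typeof \w+ !== '/],
  "Cursor":  [/\/\/\s*@ai:/, /\/\/\s*@cursor:/, /^\s*\/\/\s*v\d+:/m],
};

// Attribute source code to the model whose markers match most often.
function attributeModel(source) {
  let best = { model: null, hits: 0 };
  for (const [model, markers] of Object.entries(FINGERPRINTS)) {
    const hits = markers.filter(re => re.test(source)).length;
    if (hits > best.hits) best = { model, hits };
  }
  return best;
}
```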

Layer 3: Heuristic Scoring (13 Behavioral Checks)

  • Over-engineered error handling (nested try-catch blocks)
  • Unnecessary wrapper functions
  • Zero edge case handling
  • Perfect textbook formatting
  • Generic variable names
  • Missing context-specific logic
  • Uniform comment density
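One of these checks, nested try-catch depth, can be approximated without a full parser by tracking brace depth at each try keyword. This token-level scan is illustrative only (it ignores try inside strings or comments); a production implementation would use an AST:

```javascript
// Rough heuristic: measure the deepest nesting of try blocks in a source
// string. Tokenizes only `try` keywords and braces; AST-based analysis
// would be more robust.
function maxTryNesting(source) {
  let depth = 0;     // current nesting of open try blocks
  let max = 0;
  const stack = [];  // brace depth at which each open try began
  let braces = 0;
  const tokens = source.match(/\btry\b|[{}]/g) || [];
  for (const tok of tokens) {
    if (tok === "try") {
      depth += 1;
      max = Math.max(max, depth);
      stack.push(braces);
    } else if (tok === "{") {
      braces += 1;
    } else {
      braces -= 1;
      // A try block closes once braces return to the level where it opened.
      while (stack.length && braces <= stack[stack.length - 1]) {
        stack.pop();
        depth -= 1;
      }
    }
  }
  return max;
}
```

A depth of 2 or more on the earlier getValue example is exactly the over-engineered error handling this heuristic flags.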

Detection Workflow

codeslick analyze app.js --check-ai-code

# Output:
HIGH: AI-generated code detected (Confidence: 85%)
  Line 44: text.strip() → JavaScript uses .trim()
  Line 45: text.toUpper() → JavaScript uses .toUpperCase()

  LLM fingerprint: GPT-4 (verbose docstrings)
  Risk: Runtime errors in production (CVSS 8.5)

Detect AI hallucinations and LLM fingerprints across JavaScript, TypeScript, Python, Java, and Go with 164 protection signals.

Prevention and Remediation Strategies

1. Automated Detection in CI/CD

# GitHub Actions
- name: Detect AI hallucinations
  run: |
    codeslick analyze \
      --check-ai-code \
      --fail-on critical,high \
      --format sarif

2. IDE Integration and Real-Time Feedback

# Pre-commit hook
codeslick analyze --check-ai-code --staged-files

3. LLM Prompt Engineering

Bad prompt: "Write a function to trim whitespace"

Good prompt: "Write a JavaScript function using .trim() to remove whitespace.
Do not use Python methods like .strip()."

4. Code Review Focus Areas

  • Verify methods exist in language documentation
  • Remove unnecessary try-catch blocks
  • Add null checks and boundary validation
  • Replace generic variable names with domain terms

Frequently Asked Questions

Related Guides