How AI-Generated Code Fails Silently in Backend Systems

The cache was configured correctly. The annotation was on the right method, the key expression was valid, the cache manager was wired in. The code compiled. The tests passed. The application started without warnings.

The database was still getting hit on every request.

Claude Code had generated a @Cacheable method inside a service class, and I was calling that method from another method in the same class. Spring's caching is proxy-based — when you call a method internally, you're bypassing the proxy entirely. The annotation becomes decoration. No error. No warning. Just a cache that never fires.

That's the failure mode I want to write about. Not crashes. Not compilation errors. The code that looks right, passes tests, and then quietly does the wrong thing in production.

Why silent failures are worse

A NullPointerException with a stack trace is annoying, but it's locatable. You know where to start. A silent failure gives you symptoms instead: slow performance, unexpected database load, wrong data that doesn't crash anything. The source is somewhere in code you already verified.

Silent failures also carry a trust cost. When AI-generated code crashes, you review the crash. When it silently misbehaves, you often spend time questioning your infrastructure, your queries, your deployment — before you circle back to the code itself. The review cost is distributed and harder to account for.

I've cataloged four categories of these failures in my own Spring Boot codebase. They're not AI-specific in the sense that a human engineer could write any of them. What makes them AI-specific is how reliably they appear and how little the generated code signals that anything is wrong.

Category 1: Annotation semantics — proxy bypass

Spring's AOP-backed annotations (@Cacheable, @Async, @Retryable, @Transactional) all depend on a proxy when Spring runs in its default proxy mode. Spring wraps your bean in that proxy when it creates the bean, and every other bean that injects it receives the proxy rather than the raw object. When you call an annotated method from outside the class, the proxy intercepts the call. When you call it from inside the class, you're calling this.method() directly, and the proxy is not involved.

AI generates annotations correctly in isolation. What it doesn't model is the call graph. If the annotated method is only ever invoked internally, the annotation does nothing.

Wrong:

@Service
public class ProductService {

    @Cacheable("products")
    public Product findById(Long id) {
        return repository.findById(id).orElseThrow();
    }

    public ProductDto getProductDetail(Long id) {
        // Calls findById internally — proxy is bypassed, cache never fires
        Product product = findById(id);
        return mapper.toDto(product);
    }
}

Correct:

@Service
public class ProductService {

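    private final ProductService self; // reference to this bean's own proxy

    // @Lazy defers resolving the self-reference; without it, constructor
    // self-injection fails at startup as a circular dependency
    public ProductService(@Lazy ProductService self) {
        this.self = self;
    }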

    @Cacheable("products")
    public Product findById(Long id) {
        return repository.findById(id).orElseThrow();
    }

    public ProductDto getProductDetail(Long id) {
        Product product = self.findById(id); // proxy is involved
        return mapper.toDto(product);
    }
}

The self-injection pattern is not obvious, and moving the cached method into a separate bean is often the cleaner fix. AI rarely generates either one, most likely because both are rare in training data. The simpler pattern (annotate, call) is everywhere, and it's the one that gets generated.
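
The mechanism is easy to reproduce without Spring. The sketch below uses a plain JDK dynamic proxy as a stand-in for Spring's caching proxy. ProductLookup, RealLookup, and cachingProxy are hypothetical names made up for this illustration, and Spring's real proxies are more elaborate. It shows the same effect: calls that arrive through the proxy are cached, while the internal findById call never reaches it.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class ProxyBypassDemo {

    // Hypothetical service interface, standing in for a Spring bean's public API.
    interface ProductLookup {
        String findById(long id);
        String getDetail(long id);
    }

    static class RealLookup implements ProductLookup {
        int loads = 0; // counts simulated database hits

        @Override
        public String findById(long id) {
            loads++;
            return "product-" + id;
        }

        @Override
        public String getDetail(long id) {
            // Internal call: this resolves to this.findById, so no proxy sees it.
            return "detail:" + findById(id);
        }
    }

    // Wraps the target in a proxy that caches findById results by argument.
    static ProductLookup cachingProxy(ProductLookup target) {
        Map<Object, Object> cache = new HashMap<>();
        InvocationHandler handler = (proxy, method, args) -> {
            if (!method.getName().equals("findById")) {
                return method.invoke(target, args); // everything else passes through
            }
            Object key = args[0];
            if (!cache.containsKey(key)) {
                cache.put(key, method.invoke(target, args));
            }
            return cache.get(key);
        };
        return (ProductLookup) Proxy.newProxyInstance(
                ProductLookup.class.getClassLoader(),
                new Class<?>[] {ProductLookup.class},
                handler);
    }

    public static void main(String[] args) {
        RealLookup real = new RealLookup();
        ProductLookup proxied = cachingProxy(real);

        proxied.findById(1);
        proxied.findById(1);
        System.out.println(real.loads); // 1: the second external call was served from cache

        proxied.getDetail(1);
        proxied.getDetail(1);
        System.out.println(real.loads); // 3: each internal findById call bypassed the cache
    }
}
```

The cache works for callers outside the object and does nothing for the call inside getDetail, with no error either way. That is the same shape as the @Cacheable failure above.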

Category 2: Transaction boundaries — what AI can't infer

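AI doesn't know your data model's lazy-loading configuration. It generates service methods that look transactionally sound but leave lazy-loaded associations to be accessed after the transaction that loaded the entity has ended.

This one can surface as a LazyInitializationException, which is at least a loud failure. The silent version comes from a Spring Boot default: spring.jpa.open-in-view is enabled unless you turn it off, so the Hibernate session stays open for the whole web request. The lazy access in the controller, or during JSON serialization, then succeeds and you think it works. In reality each access runs an extra query outside the transaction, and on a list endpoint that turns into N+1 queries on every request.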

Wrong:

@RestController
public class OrderController {

    @GetMapping("/orders/{id}")
    public OrderResponse getOrder(@PathVariable Long id) {
        Order order = orderService.findById(id);
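        // orderService.findById is @Transactional; its transaction has already ended here
        // If items is LAZY and spring.jpa.open-in-view is enabled (the Spring Boot default),
        // this runs an extra query outside the transaction; with it disabled, it throws
        // LazyInitializationException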
        return new OrderResponse(order, order.getItems());
    }
}

Correct:

@Service
public class OrderService {

    @Transactional(readOnly = true)
    public OrderResponse getOrderWithItems(Long id) {
        Order order = repository.findByIdWithItems(id); // fetch join in the query
        return new OrderResponse(order, order.getItems()); // accessed inside transaction
    }
}

AI tends to generate repository calls inside @Transactional service methods, then build the response DTO in the controller layer — after the transaction closes. The pattern looks layered and clean. The fetch behavior is wrong.
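
To make the cost concrete, here is a framework-free sketch that counts simulated queries. Nothing in it is Hibernate: Order, findAllLazy, findAllWithItems, and the queries counter are hypothetical stand-ins, with a Supplier playing the role of a lazy association. It compares touching a lazy collection on every order against loading everything in one joined query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class LazyLoadCountDemo {

    static int queries = 0; // simulated round trips to the database

    static class Order {
        final long id;
        private final Supplier<List<String>> itemLoader;
        private List<String> items; // stays null until first access, like a LAZY collection

        Order(long id, Supplier<List<String>> itemLoader) {
            this.id = id;
            this.itemLoader = itemLoader;
        }

        List<String> getItems() {
            if (items == null) {
                items = itemLoader.get(); // first access triggers the load
            }
            return items;
        }
    }

    // Simulates a plain "select orders" query with lazy item collections.
    static List<Order> findAllLazy(int count) {
        queries++; // one query for the orders themselves
        List<Order> orders = new ArrayList<>();
        for (long id = 1; id <= count; id++) {
            final long orderId = id;
            orders.add(new Order(orderId, () -> {
                queries++; // one more query per order once its items are touched
                return List.of("item-of-" + orderId);
            }));
        }
        return orders;
    }

    // Simulates a fetch join: orders and items arrive in a single query.
    static List<Order> findAllWithItems(int count) {
        queries++;
        List<Order> orders = new ArrayList<>();
        for (long id = 1; id <= count; id++) {
            List<String> loaded = List.of("item-of-" + id);
            orders.add(new Order(id, () -> loaded));
        }
        return orders;
    }

    // Stands in for building a response or serializing it: reads every collection.
    static int touchAllItems(List<Order> orders) {
        int total = 0;
        for (Order order : orders) {
            total += order.getItems().size();
        }
        return total;
    }

    public static void main(String[] args) {
        queries = 0;
        touchAllItems(findAllLazy(10));
        System.out.println(queries); // 11: one for the list, one per order

        queries = 0;
        touchAllItems(findAllWithItems(10));
        System.out.println(queries); // 1
    }
}
```

Both paths return the same data, so a test that only inspects the response cannot tell them apart. The difference shows up in query counts, which is why it tends to be found in production metrics rather than in review.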

Category 3: Serialization silences — Jackson and Hibernate

Two patterns that come up often.

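The first: LocalDateTime serialization. Spring Boot's auto-configured ObjectMapper writes ISO-8601 strings, but a hand-built one does not inherit that setup. Register JavaTimeModule on a new ObjectMapper and leave WRITE_DATES_AS_TIMESTAMPS at its default (enabled), and Jackson serializes LocalDateTime as an array: [2026, 4, 7, 14, 30, 15]. Forgetting the module entirely is the loud case, since recent Jackson versions throw at serialization time. In the silent case the application starts, the endpoint responds, and unit tests with mocked data pass. API clients start receiving a completely different format than documented.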

The second, and more costly: Hibernate enum mapping by ordinal.

Wrong:

@Entity
public class Subscription {

    @Enumerated // defaults to ORDINAL
    private SubscriptionStatus status;
}

public enum SubscriptionStatus {
    ACTIVE,   // stored as 0
    PAUSED,   // stored as 1
    CANCELLED // stored as 2
}

Correct:

@Entity
public class Subscription {

    @Enumerated(EnumType.STRING) // stored as "ACTIVE", "PAUSED", "CANCELLED"
    private SubscriptionStatus status;
}

With ordinal mapping, reordering the enum, or adding a value anywhere but the end, silently changes what existing rows mean. Insert a new value before PAUSED and every row stored as 1 reads back as the new value, while every row stored as 2 (written as CANCELLED) now reads back as PAUSED. No exception. The migration runs clean. The bug is in the data.

AI generates @Enumerated without EnumType.STRING because the default is valid syntax and valid syntax is what gets generated. It compiles. It works on a fresh schema. It breaks production data if you ever modify the enum.
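
The drift can be shown with plain Java enums and no database. In this sketch, StatusV1 is the enum as first deployed and StatusV2 is a later version with a hypothetical TRIAL value inserted in the middle. Both names and the TRIAL value are assumptions made for illustration. The stored ordinal and the stored name are read back against the new enum.

```java
public class OrdinalDriftDemo {

    // The enum as it looked when the rows were written.
    enum StatusV1 { ACTIVE, PAUSED, CANCELLED }

    // A later version: TRIAL (hypothetical) was inserted before PAUSED.
    enum StatusV2 { ACTIVE, TRIAL, PAUSED, CANCELLED }

    // What ORDINAL mapping effectively does on read: index into the current enum.
    static StatusV2 readOrdinal(int stored) {
        return StatusV2.values()[stored];
    }

    // What STRING mapping effectively does on read: look the constant up by name.
    static StatusV2 readName(String stored) {
        return StatusV2.valueOf(stored);
    }

    public static void main(String[] args) {
        int storedOrdinal = StatusV1.PAUSED.ordinal(); // 1, written before the change
        String storedName = StatusV1.PAUSED.name();    // "PAUSED"

        System.out.println(readOrdinal(storedOrdinal)); // TRIAL: the row changed meaning
        System.out.println(readName(storedName));       // PAUSED: the row is still correct
    }
}
```

Name-based storage is not free of risk either: renaming a constant breaks reads with an exception. That failure is loud, though, which is the property this whole category is about.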

Category 4: Happy-path test bias

This was described accurately somewhere as "asserting the same mistake twice." When AI generates both the implementation and the test, it can reproduce the same flawed assumption in both.

Wrong — implementation:

public List<Product> getPage(int page, int size) {
    int offset = page * size; // right only if page is 0-based; nothing here pins that down
    return repository.findWithOffset(offset, size);
}

Wrong — AI-generated test that passes but proves nothing:

@Test
void returnsSecondPage() {
    List<Product> results = service.getPage(1, 10);
    assertThat(results).hasSize(10);
    // Never verifies what offset was actually used
    // If offset were calculated incorrectly, this test passes as long as 10 results exist
}

The test passes. The assertion is real. But it doesn't verify the calculation — it only verifies the result count. This pattern shows up most often in pagination logic, currency arithmetic, and date range calculations — anywhere the logic has a small invariant that doesn't show up in the happy path.
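
A stronger test pins down which rows come back, not just how many. The sketch below is framework-free and makes several assumptions for illustration: RecordingRepository is a hypothetical in-memory stand-in holding ids 1 to 100, pages are 0-based, and getPageOffByOne is a deliberately broken variant added only to show what the count-only assertion misses.

```java
import java.util.ArrayList;
import java.util.List;

public class PaginationCheckDemo {

    // Hypothetical in-memory repository: rows have ids 1..100 and the last offset is recorded.
    static class RecordingRepository {
        static final int TOTAL_ROWS = 100;
        int lastOffset = -1;

        List<Integer> findWithOffset(int offset, int limit) {
            lastOffset = offset;
            List<Integer> ids = new ArrayList<>();
            int end = Math.min(offset + limit, TOTAL_ROWS);
            for (int i = offset; i < end; i++) {
                ids.add(i + 1);
            }
            return ids;
        }
    }

    // Implementation under test, assuming 0-based page numbers.
    static List<Integer> getPage(RecordingRepository repo, int page, int size) {
        return repo.findWithOffset(page * size, size);
    }

    // Deliberately wrong variant: skips one page too many.
    static List<Integer> getPageOffByOne(RecordingRepository repo, int page, int size) {
        return repo.findWithOffset((page + 1) * size, size);
    }

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        RecordingRepository repo = new RecordingRepository();

        // The count-only assertion is satisfied by the broken variant too.
        List<Integer> wrong = getPageOffByOne(repo, 1, 10);
        check(wrong.size() == 10, "count check");

        // Assertions that pin the calculation: offset used, first row, last row.
        List<Integer> second = getPage(repo, 1, 10);
        check(repo.lastOffset == 10, "second page should start at offset 10");
        check(second.get(0) == 11, "first id on the second page should be 11");
        check(second.get(9) == 20, "last id on the second page should be 20");

        System.out.println("all checks passed");
    }
}
```

The boundary assertions fail for the off-by-one variant (its first id is 21), while the size assertion cannot distinguish the two. Checking the first and last element of a page is usually enough to catch this class of bug.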

Why AI generates these

LLMs generate statistically likely code. The annotations are all valid. The patterns are all common. What's missing is the runtime invariant knowledge — the proxy model, the session lifecycle, the serialization defaults, the schema migration consequence.

Those invariants live in the system, not in the syntax. A senior engineer carries them as internalized rules. AI generates syntactically correct code that violates them because the violation is invisible at generation time.

What I actually do about it

When reviewing AI-generated code in a Spring Boot codebase, I check for these specifically:

Annotated method with internal call? Look for @Cacheable, @Async, @Retryable, @Transactional on methods called from the same class.
Lazy associations accessed outside @Transactional scope? Trace from where the entity is loaded to where its collections are accessed.
@Enumerated without EnumType.STRING? Flag every enum mapping. The default is wrong for any enum you might reorder or extend.
LocalDateTime in any API response? Verify the ObjectMapper in use has JavaTimeModule registered and WRITE_DATES_AS_TIMESTAMPS disabled.
Test asserts on output existence rather than logic? If the test only checks count or presence without verifying the calculation, it's testing survival, not correctness.

The review is fast once you know what to look for. The failures are consistent enough that this checklist runs in a few minutes per PR.
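
Parts of the checklist can be automated. As one example, here is a rough line-based heuristic for the @Enumerated item. It is a sketch, not a real static analyzer: EnumMappingCheck and flagOrdinalMappings are hypothetical names, and the check only looks at one line at a time, so it will miss annotations split across lines and may flag mentions inside comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class EnumMappingCheck {

    // Matches @Enumerated(EnumType.STRING), @Enumerated(STRING), and the value = ... form.
    private static final Pattern STRING_MAPPING = Pattern.compile(
            "@Enumerated\\s*\\(\\s*(?:value\\s*=\\s*)?(?:EnumType\\.)?STRING\\s*\\)");

    // Returns 1-based line numbers where @Enumerated appears without STRING mapping.
    static List<Integer> flagOrdinalMappings(List<String> sourceLines) {
        List<Integer> flagged = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            String line = sourceLines.get(i);
            if (line.contains("@Enumerated") && !STRING_MAPPING.matcher(line).find()) {
                flagged.add(i + 1);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "    @Enumerated",
                "    private SubscriptionStatus status;",
                "    @Enumerated(EnumType.STRING)",
                "    private SubscriptionStatus other;");
        System.out.println(flagOrdinalMappings(source)); // [1]
    }
}
```

A check like this is cheap to run in CI over changed files. It does not replace the review, but it removes one item from the list a human has to remember.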

The actual position

This isn't a post against AI tooling in backend work. I wrote this because the failure mode documentation tends to be either vague ("AI makes mistakes") or written for JavaScript developers. The Spring Boot specifics matter — the proxy model, the session lifecycle, the serialization defaults — and those aren't obvious until you've seen the failure once.

The stack you're working in shapes which of these you hit most often. If you're thinking about why the stack matters for this kind of analysis, I'll cover that in a separate piece.

If you work in a Java codebase with AI tooling, I write about this kind of thing on LinkedIn and X.