How I Actually Use Claude Code as a Backend Engineer (Real Workflow, Real Limits)

Most of the Claude Code content I've read was written by someone with TypeScript open in their other tab. Not a criticism. Just true. And it means the backend conversation has been mostly absent.

Frontend-first AI content describes a different class of problems. A hallucinated JSX prop is annoying. A hallucinated Spring annotation that compiles fine but violates the bean lifecycle is a different thing — it doesn't blow up immediately. It surfaces later, somewhere inconvenient, in a way that takes longer to trace than if you'd written the bug yourself.

I've been using Claude Code in a real Spring Boot codebase for several months while building Resulign, an AI-powered resume analysis platform. Here's what that actually looks like: where it earns its place, where it fails without warning, what I've learned about prompting it for Java-specific work, and what I eventually stopped letting it near.

Not a tutorial. Not a review. A workflow account from someone who ships Java services and has made enough mistakes with AI-generated code to have opinions about it.

What I Actually Use It For

I don't use Claude Code to write features. I use it for the work around features — scaffolding, boilerplate, the parts where I know exactly what needs to exist and don't want to spend time on the mechanics.

Boilerplate anchored to existing patterns

This is where it earns the most time back. If I have a UserService with a clear structure — constructor injection, repository dependency, consistent exception handling — I can tell Claude Code to create a SubscriptionService following that same pattern and get something close to what I'd write myself, faster.

Always reference an existing file. "Create a service following the pattern in UserService.java" produces something useful. "Write me a Spring Boot service" produces something generic that ignores the conventions the project has been building up. The difference in output quality between those two prompts is not subtle.

Here's roughly what that prompt looks like:

Given UserService.java as a reference, create a SubscriptionService that:
- Uses the same constructor injection pattern
- Follows the same exception handling approach
- Interacts with a SubscriptionRepository
- Covers standard CRUD operations

Do not introduce new dependencies. Do not use Lombok.

The negative constraints matter as much as the positive ones. "Do not use Lombok" sounds trivial, but without it you get Lombok scattered into a codebase that doesn't use it.
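To make the anchoring idea concrete, here's a minimal plain-Java sketch of the shape that prompt asks for. Everything in it is hypothetical: the Subscription record, the repository interface, the SubscriptionNotFoundException, and the in-memory repository that lets it run without a database. Spring annotations are left off so it compiles without Spring on the classpath. In a real codebase the service would carry @Service and the repository would be a Spring Data interface.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class SubscriptionServiceSketch {

    // Hypothetical entity, reduced to the fields the example needs.
    record Subscription(long id, String plan) {}

    // Hypothetical repository contract (a Spring Data interface in a real project).
    interface SubscriptionRepository {
        Optional<Subscription> findById(long id);
        List<Subscription> findAll();
        Subscription save(Subscription subscription);
        void deleteById(long id);
    }

    // Hypothetical project exception, standing in for a "consistent exception handling" convention.
    static class SubscriptionNotFoundException extends RuntimeException {
        SubscriptionNotFoundException(long id) {
            super("Subscription not found: " + id);
        }
    }

    static class SubscriptionService {
        private final SubscriptionRepository repository;

        // Constructor injection: the dependency is final and supplied from outside.
        SubscriptionService(SubscriptionRepository repository) {
            this.repository = repository;
        }

        Subscription get(long id) {
            return repository.findById(id)
                    .orElseThrow(() -> new SubscriptionNotFoundException(id));
        }

        List<Subscription> list() {
            return repository.findAll();
        }

        Subscription create(Subscription subscription) {
            return repository.save(subscription);
        }

        Subscription changePlan(long id, String plan) {
            Subscription existing = get(id); // same not-found behavior as get()
            return repository.save(new Subscription(existing.id(), plan));
        }

        void delete(long id) {
            get(id); // fail the same way get() does when the row is missing
            repository.deleteById(id);
        }
    }

    // In-memory stand-in so the sketch runs without a database.
    static class InMemorySubscriptionRepository implements SubscriptionRepository {
        private final Map<Long, Subscription> store = new HashMap<>();

        public Optional<Subscription> findById(long id) { return Optional.ofNullable(store.get(id)); }
        public List<Subscription> findAll() { return List.copyOf(store.values()); }
        public Subscription save(Subscription s) { store.put(s.id(), s); return s; }
        public void deleteById(long id) { store.remove(id); }
    }

    public static void main(String[] args) {
        SubscriptionService service = new SubscriptionService(new InMemorySubscriptionRepository());
        service.create(new Subscription(1L, "pro"));
        System.out.println(service.get(1L).plan()); // pro
    }
}
```

Every choice in it (final field, constructor injection, one exception type for not-found) is something a reference file pins down and a from-scratch prompt leaves to chance.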

Test case scaffolding

I write my own core test cases. Claude Code is good at finding the edge cases I'd skip when moving fast.

Given a method signature and its dependencies, I'll ask what edge cases a happy-path test won't catch. It reliably surfaces things I'd find in code review rather than during initial writing — null inputs on parameters I assumed would always be set, empty collections where the code assumes at least one element, boundary values on numeric fields.
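Here's a small sketch of what that list turns into, applied to a hypothetical topScore method that assumes a non-empty list of scores in a 0 to 100 range. The method, the range, and the exception choices are illustrative assumptions, not code from the project.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class EdgeCaseSketch {

    // Hypothetical valid range for a score.
    static final int MIN_SCORE = 0;
    static final int MAX_SCORE = 100;

    // A happy-path version would just return Collections.max(scores), which throws
    // NullPointerException for a null list and NoSuchElementException for an empty one.
    static int topScore(List<Integer> scores) {
        Objects.requireNonNull(scores, "scores must not be null");
        if (scores.isEmpty()) {
            throw new IllegalArgumentException("scores must not be empty");
        }
        for (Integer score : scores) {
            // Null elements and out-of-range values are rejected explicitly.
            if (score == null || score < MIN_SCORE || score > MAX_SCORE) {
                throw new IllegalArgumentException("invalid score: " + score);
            }
        }
        return Collections.max(scores);
    }

    public static void main(String[] args) {
        // Boundary values are valid and should pass through.
        System.out.println(topScore(List.of(MIN_SCORE)));     // 0
        System.out.println(topScore(List.of(MAX_SCORE, 42))); // 100

        // The inputs a happy-path test never exercises.
        List<List<Integer>> badInputs = Arrays.asList(
                null,
                List.of(),
                Arrays.asList(50, null),
                List.of(MAX_SCORE + 1),
                List.of(MIN_SCORE - 1));
        for (List<Integer> input : badInputs) {
            try {
                topScore(input);
                System.out.println("accepted " + input);
            } catch (NullPointerException | IllegalArgumentException e) {
                System.out.println("rejected " + input + ": " + e.getMessage());
            }
        }
    }
}
```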

I write the assertions. The list of what to test gets more complete faster, which is the part that usually gets cut when you're trying to ship.

Understanding unfamiliar code

When I pick up a library I haven't used before or inherit someone else's implementation, pasting a chunk and asking what's happening is faster than reading documentation linearly. Especially for library internals where the docs describe what something does but not why it's structured that way.

Lower-risk territory — an explanation being slightly off costs less than generated code being slightly off.

Repetitive query writing

JPQL at scale is tedious. If I have three existing repository methods as examples, Claude Code can reliably write a fourth following the same structure, as long as I provide the entity class and not just a description of it.

Where It Fails in Backend Systems

This part matters more than the previous one.

Spring annotation semantics

Claude Code gets Spring annotations wrong often enough that I treat everything annotation-related as draft until proven otherwise. Things I've seen more than once:

@Transactional on a private method. Spring uses a proxy to intercept transactional calls, and proxies can't intercept private methods. The annotation is there, it compiles, everything looks fine. The transaction silently doesn't apply.

Here's what the generated code looks like:

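import java.util.List;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class SubscriptionService {

    private final SubscriptionRepository repository;

    public SubscriptionService(SubscriptionRepository repository) {
        this.repository = repository;
    }

    // Compiles fine, but Spring's transactional proxy never sees calls to a private method.
    @Transactional
    private void processExpiredSubscriptions(List<Subscription> subscriptions) {
        subscriptions.forEach(s -> {
            s.setStatus(SubscriptionStatus.EXPIRED);
            repository.save(s);
        });
    }
}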

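That code looks correct. It isn't. Each save still commits in its own repository-level transaction, so if one throws partway through the list, the earlier saves stay committed and nothing rolls back. You find out when you're in production with inconsistent data and trying to figure out how it got there.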

@Async added without flagging that the method has to be called from a different bean. Self-invocation, calling the method from elsewhere in the same class, bypasses the proxy. The method runs synchronously and nothing tells you it isn't working.
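The proxy mechanism behind both the @Transactional and @Async failures is easy to see in plain Java. This sketch uses a JDK dynamic proxy (java.lang.reflect.Proxy) as a stand-in for Spring's proxy. Spring Boot defaults to CGLIB subclass proxies instead, but the rule is the same: advice only runs for calls that arrive through the proxy reference, and a private method can't be intercepted by either kind. ReportService and its methods are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class SelfInvocationSketch {

    interface ReportService {
        void refreshAll();
        void refreshOne(String id);
    }

    static class ReportServiceImpl implements ReportService {
        final List<String> refreshed = new ArrayList<>();

        @Override
        public void refreshAll() {
            // These calls go through `this`, the raw target, never through the proxy.
            refreshOne("a");
            refreshOne("b");
        }

        @Override
        public void refreshOne(String id) {
            refreshed.add(id);
        }
    }

    // Records each call the proxy intercepts, standing in for the advice Spring would
    // apply (open a transaction, hand the call to an executor, and so on).
    static final List<String> intercepted = new ArrayList<>();

    static ReportService proxy(ReportService target) {
        InvocationHandler handler = (p, method, args) -> {
            intercepted.add(method.getName());
            return method.invoke(target, args);
        };
        return (ReportService) Proxy.newProxyInstance(
                ReportService.class.getClassLoader(),
                new Class<?>[] {ReportService.class},
                handler);
    }

    public static void main(String[] args) {
        ReportServiceImpl target = new ReportServiceImpl();
        ReportService service = proxy(target);
        service.refreshAll();
        System.out.println("intercepted: " + intercepted);      // intercepted: [refreshAll]
        System.out.println("refreshed: " + target.refreshed);   // refreshed: [a, b]
    }
}
```

Both refreshOne calls ran, but neither passed through the proxy, which is exactly what happens to a self-invoked @Async or @Transactional method.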

The async problem goes further than the proxy issue, though. If a void @Async method throws, the exception never reaches the caller. Spring routes it to an AsyncUncaughtExceptionHandler, and the default one only logs it. Claude Code never mentions this. The code runs, the exception ends up as a single log line, and whatever downstream side effect you expected from a clean execution doesn't happen. These show up as silent data gaps in production, not as errors. That kind of failure is genuinely unpleasant to track down.
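The same shape shows up in plain java.util.concurrent: a failure on another thread only becomes visible if something inspects the outcome. This is an analogy rather than Spring's implementation, although for an @Async method that returns a CompletableFuture, Spring does capture the exception in the returned future much like this. syncSearchIndex is a hypothetical failing side effect.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class AsyncExceptionSketch {

    // Hypothetical side effect that fails, standing in for an @Async method body.
    static void syncSearchIndex() {
        throw new IllegalStateException("index unavailable");
    }

    // Exceptions thrown inside runAsync arrive wrapped in CompletionException; unwrap for reporting.
    static Throwable unwrap(Throwable error) {
        return (error instanceof CompletionException && error.getCause() != null) ? error.getCause() : error;
    }

    // Attaches a handler and waits, returning the failure message, or null on success.
    static String failureMessage(CompletableFuture<Void> task) {
        return task.handle((ignored, error) -> error == null ? null : unwrap(error).getMessage()).join();
    }

    public static void main(String[] args) {
        // Fire-and-forget: nobody inspects this future, so the exception is stored
        // inside it and never surfaces. No stack trace, no log line.
        CompletableFuture.runAsync(AsyncExceptionSketch::syncSearchIndex);
        System.out.println("caller carried on as if it worked");

        // Observed: the same failure is visible once something asks for the outcome.
        String message = failureMessage(CompletableFuture.runAsync(AsyncExceptionSketch::syncSearchIndex));
        System.out.println("observed failure: " + message); // observed failure: index unavailable
    }
}
```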

@Cacheable without @EnableCaching on the configuration. Small thing. The code compiles, runs, and caches nothing. You notice when the performance improvement you expected isn't showing up.

None of this is catastrophic if you're reviewing carefully. But reviewing carefully requires knowing what to look for. If you're using Claude Code to learn Spring Boot, you'll pick up its mistakes alongside everything else.

Hallucinated Spring Data method names

Spring Data's derived query methods are elegant until Claude Code generates one that doesn't match the entity's actual field structure. If your field is createdAt, it might generate findByCreatedDate, which won't resolve. Or it gets the name right but declares a return type that doesn't fit the query, such as a single entity for a query that can match several rows.

A name that doesn't resolve fails at startup, which makes it easier to catch than a runtime failure. A mismatched return type only fails when the query actually returns more than one row. Either way, it's friction you introduced.

The fix: paste the full entity class. Don't describe the entity — give it the entity.
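Spring Data parses a derived method name into property references and resolves each one against the entity, and an unknown property fails with a PropertyReferenceException when the repository is created. The sketch below is a deliberately simplified stand-in for that parsing, not Spring Data's code: it handles only findBy followed by a single property, with no And/Or or keywords, and it checks declared fields by reflection. The Subscription entity is hypothetical.

```java
import java.lang.reflect.Field;
import java.time.Instant;
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public class DerivedQueryCheckSketch {

    // Hypothetical entity: the timestamp field is createdAt, not createdDate.
    static class Subscription {
        Long id;
        String status;
        Instant createdAt;
    }

    // Simplified parsing: "findByCreatedDate" -> "createdDate".
    static String propertyFor(String methodName) {
        String prefix = "findBy";
        if (!methodName.startsWith(prefix) || methodName.length() == prefix.length()) {
            throw new IllegalArgumentException("unsupported method name: " + methodName);
        }
        String raw = methodName.substring(prefix.length());
        return Character.toLowerCase(raw.charAt(0)) + raw.substring(1);
    }

    // True if the derived property matches a field declared on the entity.
    static boolean resolves(String methodName, Class<?> entity) {
        Set<String> fields = Arrays.stream(entity.getDeclaredFields())
                .map(Field::getName)
                .collect(Collectors.toSet());
        return fields.contains(propertyFor(methodName));
    }

    public static void main(String[] args) {
        System.out.println(resolves("findByCreatedAt", Subscription.class));   // true
        System.out.println(resolves("findByCreatedDate", Subscription.class)); // false
    }
}
```

With the entity class in the prompt, the field name is right there; with only a description, the model has to guess it.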

Context-blind refactoring

If I ask for a refactor spanning multiple files without providing full context on all of them, I get changes that are locally correct and miss integration points.

This is the most expensive failure mode I've run into. The code looks right, often compiles, and breaks in a way that only shows up at runtime or in integration tests. The debugging is harder than it should be because the code looks like it was written deliberately.

I don't ask for cross-file refactors without designing them myself first. I use Claude Code to execute refactors I've planned, not ones I've described.

Dependency and version hallucination

It occasionally suggests importing a class that doesn't exist in the library version I'm actually using, or references a config class from a starter that isn't in pom.xml. These fail fast, at compile or startup. But it's a consistent reminder to verify generated imports against what's on the classpath.

One pattern I've noticed: it tends to reference the most recent version of anything it knows about. On an older codebase, that means periodic suggestions referencing APIs that moved or changed in a version you're not on.
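One quick way to check a suspicious class name against the actual classpath is to try loading it. The helper below is hypothetical, not part of any tool. The example names are the JPA Entity annotation in both namespaces, since Spring Boot 3 moved from javax.persistence to jakarta.persistence and an import from the wrong one is a typical sign of version-blind output.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClasspathCheckSketch {

    // True if the class can be loaded; initialize=false avoids running static initializers.
    static boolean onClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Checks several names at once, preserving input order in the result.
    static Map<String, Boolean> check(List<String> classNames) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, onClasspath(name));
        }
        return result;
    }

    public static void main(String[] args) {
        // Output depends on which JPA API jar is on the classpath.
        check(List.of("jakarta.persistence.Entity", "javax.persistence.Entity"))
                .forEach((name, present) -> System.out.println(name + " -> " + present));
    }
}
```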

How I Prompt It for Spring Boot

A few things that consistently improve output quality:

Anchor to existing code. Reference a specific file. This matters more than anything else. "Following OrderService.java, create a PaymentService" beats any description I've tried to write from scratch.

Specify the Spring Boot version when asking about anything configuration or annotation-related. Spring Boot 3.x and 2.x have real differences — autoconfiguration, security wiring, property binding behavior. The output shifts when you're explicit about version context.

Split implementation from edge case analysis. "Write this method" and "what edge cases should this method handle" work better as separate prompts. Combined, the output tries to do both and does neither well.

Paste the entity class for repository questions. Field names, types, and relationships need to be in context. A description isn't the same thing.

Use negative constraints. "Don't introduce new dependencies." "Don't change the exception handling pattern." "Don't add Lombok." I add these even when they feel obvious. They narrow the output in ways that save real editing time.

What I Stopped Letting It Do

Architectural decisions

It'll give you an answer that sounds reasonable and hasn't accounted for the decisions you made six months ago, the constraints your system is operating under, or why you ruled out the obvious approach last time. I use it to evaluate options I've already thought through — tradeoffs between two specific approaches in a specific context — not to identify the approach. Delegating architecture to a tool with no memory of your system is a reliable way to accumulate debt that's hard to explain when you're cleaning it up.

Security-sensitive code

Authentication filters, authorization logic, input validation, anything in the Spring Security filter chain. The failure modes are expensive and not always obvious in review. A subtle misconfiguration in a security filter doesn't always get caught before it ships. I write this by hand, against documented Spring Security patterns, and review it more carefully than anything else in the codebase.

Database migrations

Flyway and Liquibase scripts need to account for the actual production data state — table sizes, existing index coverage, lock behavior at scale. Claude Code doesn't have any of that context. It can generate a migration that runs fine locally against a few hundred rows and locks a table for several minutes in production.

I write migrations manually, validate against a real schema snapshot, and test anything significant against production-volume data before it goes near a real environment. It's slower. It's the kind of thing you only get wrong once.

Async and scheduled boundaries

@Scheduled, @Async, CompletableFuture chains, anything that crosses thread context. The generated output gets this wrong often enough to matter, and the failures are intermittent and timing-dependent: unpleasant to debug and hard to reproduce. I write this by hand with a clear picture of what's happening across threads.
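Part of what makes these boundaries hard is thread-bound state. Spring keeps transaction resources in ThreadLocals (through TransactionSynchronizationManager), and SecurityContextHolder uses a ThreadLocal strategy by default, so that context doesn't follow work onto another thread. A minimal plain-Java sketch, with a hypothetical CURRENT_USER standing in for that context:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadContextSketch {

    // Hypothetical thread-bound context, standing in for transaction or security state.
    static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    static String currentUser() {
        return CURRENT_USER.get();
    }

    // Reads the context on a pool thread, the way an async or scheduled task would.
    static String readOn(ExecutorService pool) {
        return CompletableFuture.supplyAsync(ThreadContextSketch::currentUser, pool).join();
    }

    // Captures the value on the calling thread and hands it across explicitly.
    static String passTo(ExecutorService pool) {
        String captured = currentUser();
        return CompletableFuture.supplyAsync(() -> captured, pool).join();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CURRENT_USER.set("alice");
        try {
            System.out.println(currentUser()); // alice
            System.out.println(readOn(pool));  // null: the pool thread never saw the value
            System.out.println(passTo(pool));  // alice
        } finally {
            CURRENT_USER.remove();
            pool.shutdown();
        }
    }
}
```

Capturing the value on the calling thread and passing it in explicitly is the safe pattern; assuming the context comes along is the bug.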

The Net Impact

Boilerplate and scaffolding are faster. Reading unfamiliar code is faster. Test coverage is more complete. Those gains are real and I wouldn't give them up.

The review tax is also real. Generated code I didn't think through doesn't have the same mental model attached to it, and I read it more carefully than code I wrote. That adds up across a week.

The failure cost isn't zero. Annotation mistakes that compile and misbehave at runtime, cross-file refactors that break integration points — these happen. Not constantly, but the debugging time when they do wasn't in the estimate.

Whether it's worth it depends on how you use it. Engineers I've seen get the most out of Claude Code treat it like a fast collaborator they still need to review — not a system they can delegate to. Same mental model I'd apply to any junior engineer on a team: useful from day one, but you're still accountable for what ships.

Expect to hand off the hard parts and you'll find they're still yours.

I write about backend systems, building products as a technical founder, and using AI tooling honestly. Follow on LinkedIn or X if that's useful.