
AI coding tools have completely upended the old build-versus-buy calculus. A working prototype that used to require a sprint now takes an afternoon. Features and internal tools that weren't worth a development cycle suddenly are. Teams can move from idea to running code without vendor evaluations, procurement cycles, or months-long integration projects. For engineering leaders who've spent careers waiting on slow processes, the prospects are tantalizing.
As a result, the case for DIY has never been stronger. You own the roadmap. You own the data. You're not adapting your workflows to fit a vendor's assumptions about what your business should look like. If customization to your organization's requirements would unlock substantial strategic value, building it yourself might be the right call.
We’re still early in the hype cycle for AI coding tools, however. Which means the “trough of disillusionment” can’t be far away. What will that look and feel like for early adopters who decide to DIY?
The intoxicating speed at which functional apps can be created is well understood. And their shortcomings, like inconsistent architecture, are coming into view. But we haven't lived with these generated apps long enough for their cost of ownership to be fully understood.
We do have some early signals, however. A peer-reviewed analysis of 806 open-source repositories found roughly a 41 percent increase in code complexity and a 30 percent increase in static analysis warnings after AI coding tool adoption. Meanwhile, GitClear's analysis of 211 million lines of code from 2020 to 2024 showed that copy-pasted code surpassed refactored code for the first time in 2024, with duplicated code blocks growing eightfold, while refactoring dropped from 25 percent of all code changes to under 10 percent. This suggests that teams are generating more code and maintaining less, a gap that could compound with every sprint.
A warning flag in the first thirty days might look like this. Someone opens a function to fix a small bug. As they do so, they discover three hundred lines handling four unrelated concerns, likely the result of multiple AI sessions that nobody quite stitched together. Fixing it means understanding the whole function. But in this case, the required context doesn't exist: no commit message, no comment explaining the logic, no ticket tracing the requirement. What should take an hour to untangle takes a day.
Meanwhile, the same validation function exists in three files, each slightly different. Error handling is inconsistent: in one route exceptions are thrown, in another the function returns null, in a third it logs to console and continues silently. Each pattern came from a different session. Each session optimized for the immediate prompt without knowledge of what the rest of the codebase does.
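This kind of drift is easy to reproduce. Here's a minimal sketch of the pattern, with invented file names (in comments) and function names: one validation routine, copied into three places, each with a different failure contract.

```python
import logging
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# routes/signup.py — this copy raises on bad input
def validate_email_signup(email: str) -> str:
    if not EMAIL_RE.match(email):
        raise ValueError(f"invalid email: {email}")
    return email

# routes/billing.py — this copy returns None on bad input
def validate_email_billing(email: str):
    if not EMAIL_RE.match(email):
        return None
    return email

# routes/export.py — this copy logs a warning and continues with the bad value
def validate_email_export(email: str) -> str:
    if not EMAIL_RE.match(email):
        logging.warning("invalid email, continuing anyway: %s", email)
    return email
```

Each copy "works" in isolation, which is exactly why each session shipped it. The cost appears later, when a caller written against one contract gets routed through another.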
By day sixty, a routine feature request hits a wall. The checkout logic is spread across seven files. Making the change without breaking something adjacent requires reconstructing an execution path that nobody fully understands. The feature that should take two days takes two weeks.
The ninety-day conversation is the one nobody wants to schedule. The team is spending 20 to 30 percent of sprint capacity on bugs that trace back to the original implementation. Feature velocity is a fraction of what it was on day one. There are functions in the codebase that everyone agrees work, but that nobody can explain and that everyone is afraid to touch.
Hits to velocity are no fun. But a security breach is a real crisis.
Research consistently finds that nearly half of all AI-generated code contains basic security flaws: injection, authentication failures, and information exposure. We're not talking complex or esoteric security issues, but basic stuff like the OWASP Top 10. One analysis of seven early-stage AI-generated codebases turned up 970 security issues, 801 of them rated high severity. Of those, unsafe input handling, insecure file operations, and exposed credentials accounted for more than 70 percent of the findings.
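To make the category concrete, here is a minimal, self-contained illustration (not drawn from the cited analysis) of the single most common pattern: user input interpolated into a SQL string, next to the parameterized query that closes the hole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

def find_user_unsafe(name: str):
    # The pattern flagged in AI-generated code: input spliced into SQL.
    # A value like "' OR '1'='1" rewrites the WHERE clause and dumps every row.
    return conn.execute(
        f"SELECT name, role FROM users WHERE name = '{name}'"
    ).fetchall()

def find_user_safe(name: str):
    # Parameterized query: the driver treats the value as data, not SQL.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()
```

Both functions pass the happy-path test a prompt would generate; only one survives hostile input. That gap is invisible to "it runs" validation.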
The Linux kernel maintainers formalized this problem before most organizations acknowledged it existed. Their guidance for AI-assisted contributions? Assume full human responsibility for every line submitted, because the reviewer cannot assume the AI understood the system it was modifying.
This risk is not esoteric. In the United States, the average cost of a data breach hit a record $10.22 million in 2025, up 9 percent from the prior year. This is not to say that every additional vulnerability is cause for panic. For each, there are always questions of exploitability, impact within the context of the application, and more. But at this scale, the probability math on the increased exposure is concerning to say the least.
AI may dramatically reduce the cost of creating software. But what about owning it? Someone still has to deploy it, maintain it, and patch it when dependencies deprecate. Someone also must account for its behavior under load, under attack, and under requirements that didn't exist when the first prompt was written. In enterprise environments, building version one of a tool is often 10 to 20 percent of the total lifecycle effort. The remaining 80 to 90 percent is everything that comes after. AI compresses the first part. But we don’t have enough experience yet to fully understand the rest. Will AI and its application to code evolve to address these aspects of long-term ownership? Probably. Until that’s proven, all we have to go on is a healthy dose of skepticism from hard-earned experience.
There's another cost that rarely shows up in the build-versus-buy spreadsheet: token efficiency. Every AI-powered capability you build internally consumes compute you pay to a foundation model provider with limited incentive to optimize. The economics can compound across bloated prompts, redundant calls, inefficient context management, and more, all quietly accumulating on your infrastructure bill.
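One of the simplest leaks, paying for the same completion twice, can be sketched with a hypothetical stand-in client (`call_model` here is an invented placeholder, not a real API; in practice it would be a billed endpoint):

```python
import functools
import hashlib

# Hypothetical stand-in for a paid model API; we count invocations
# to stand in for spend.
def call_model(prompt: str) -> str:
    call_model.invocations += 1
    digest = hashlib.sha256(prompt.encode()).hexdigest()[:8]
    return f"response-{digest}"

call_model.invocations = 0

@functools.lru_cache(maxsize=1024)
def call_model_cached(prompt: str) -> str:
    # Identical prompts hit the in-process cache instead of the billed endpoint.
    return call_model(prompt)
```

A cache like this is table stakes for a vendor whose margin depends on it, and exactly the kind of optimization an internal team under feature pressure defers indefinitely.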
In contrast, a vendor selling you an AI-powered product is in a structurally different position. Bessemer Venture Partners' research shows AI-native companies operating at 50 to 60 percent gross margins against the 80 to 90 percent that traditional SaaS companies achieved. That margin pressure is a feature for the customer. It means every vendor in this market has a direct financial incentive to eliminate token waste.
Inefficiency that's opaque to your internal team is an existential problem for a vendor serving that same workload across thousands of customers. In fact, GitHub just announced a full pricing overhaul for Copilot, abandoning flat-rate subscriptions because the inference costs of agentic workflows had become unsustainable to absorb. That's what unmanaged inference costs look like at scale, and vendors learn that lesson fast.
Tech debt is another concern, albeit perhaps the most acknowledged one. Both the 2024 and 2025 DORA reports found that increased AI adoption correlates with decreased software delivery stability. Teams are generating more code and deploying less reliably. That's not a paradox. AI tools optimize for code that runs, not for code that's correct. The gap between those two things is where the debt accumulates, incrementally, across thousands of small decisions where the cost of writing was near zero and the cost of understanding was deferred.
The answer isn't to slow down. The velocity is real, the competitive pressure to use it is intense, and the tools are only getting better.
But oversight has to adapt to keep pace. This requires visibility into what's actually accumulating in the codebase as AI-generated code compounds. The teams that get this right won't slow down. They'll maintain pace over the long haul, because they aren't borrowing from the future to do it.
Flux gives engineering leaders ground-truth visibility into what's actually accumulating in their codebase. See it in action.
Ted Julian is the CEO and Founder of Flux, as well as a well-known industry trailblazer, product leader, and investor with over two decades of experience. A market-maker, Ted launched his four previous startups to leadership in categories he defined, resulting in game-changing products that greatly improved technical users' day-to-day processes.