The Iron Triangle has long been a guiding principle in software development: "fast, good, or cheap—but you only get to pick two." This constraint felt like an immutable law of project management. Then AI generated code burst onto the scene, promising to shatter this limitation once and for all. I blogged about this last year, investigating the definition of "good" and sharing early research on the quality of AI generated code.
“Life moves pretty fast,” said punk anti-hero - and almost certainly vibe coder - Ferris Bueller. Research this year is shedding additional light on automated code generation.
The METR study dropped some uncomfortable truths this July, and Stack Overflow's latest developer survey isn't painting the rosy picture vendors want you to see. As engineering leaders managing expectations from boards drunk on AI hype, we need to examine what the data actually shows about machine-generated code quality and productivity.
Take vendor stats with a grain of salt. But academic researchers running gold-standard randomized trials are another story. The METR study tracked experienced open-source developers and found they took 19% MORE time to complete tasks when using AI tools, not less.
Plot twist: these same developers expected to be 24% faster.
Stack Overflow's 2025 survey backs this up with some eye-opening trends. Only 29% of developers trust AI tool outputs now, down from 40% just a year ago. And 66% report spending more time fixing "almost-right" AI generated code than they save in the initial writing phase.
The "almost right but not quite" phenomenon I think defines the current era. The coding assistant delivers something that looks great... until you realize it's subtly wrong. Then down the rabbit hole, debugging.
Quality remained consistent in the METR study—automated coding tools didn't cause drops in coding standards. The hidden cost showed up in rework and validation time. Which means AI generated code isn't automatically making us faster OR cheaper when quality matters.
Here's where the economics get interesting, particularly if you manage engineering budgets.
The time costs of debugging and rework offset the initial speed gains. Worse, technical debt accumulates from "good enough" AI generated code that passes initial review but creates maintenance headaches down the road.
Then, senior developers - the ones you can't afford to lose - get pulled into review and correction cycles. They're not writing new features; they're babysitting machine-generated output. Supporting data? 35% of Stack Overflow visits now relate to problems with AI generated code.
The bottom line suggested by this data is that fast-but-flawed code generation creates a debt that someone has to pay later. Usually it's your most experienced team members, working overtime to clean up the mess while stakeholders wonder why velocity isn't matching the promises. Developers feel set up to fail, and engineering leaders get caught in the middle explaining why the "magic" isn't working as advertised.
We've got a genuine paradox brewing over AI coding assistants. Usage hit 84% among developers—up from 76% just last year. Microsoft's GitHub Copilot, Google's Gemini, OpenAI's various offerings, AWS CodeWhisperer... everybody's using something. This adoption drove the AI code tools market to $7.7 billion in 2025.
But trust metrics are dropping, particularly as devs gain experience across different codebases and use cases. For example, AI generated code can be well suited to boilerplate and simple functions. But it struggles badly with context understanding in large codebases—exactly where experienced developers spend most of their time. You can see this in developer behavior. Stack Overflow found 75% of developers still prefer asking another person when they're unsure about something, rather than relying on automated code suggestions.
This puts engineering leaders in a bind as they try to balance the limitations of code generation in their environment with outsized expectations from leadership.
So what's a pragmatic engineering leader supposed to do?
Start with realistic expectations about AI generated code capabilities and limitations. These tools excel at documentation, test generation, and simple refactoring tasks. They struggle with architecture decisions, complex business logic, and anything requiring deep codebase understanding.
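For illustration, here's a minimal sketch of the kind of task that sits squarely in that sweet spot: generating table-driven tests for a small, self-contained function. Everything the tool needs is visible on the page; the function and test names below are invented for this example.

```python
# Illustrative example of a task AI assistants handle well: test scaffolding
# for a small pure function with no hidden context (assumes pytest is installed).
import pytest

def normalize_email(raw: str) -> str:
    """Lowercase and trim an email address."""
    return raw.strip().lower()

# Table-driven tests an assistant can generate reliably, because the full
# behavior is visible in the function itself.
@pytest.mark.parametrize("raw, expected", [
    ("  Alice@Example.COM ", "alice@example.com"),
    ("bob@example.com", "bob@example.com"),
    ("", ""),
])
def test_normalize_email(raw, expected):
    assert normalize_email(raw) == expected
```

Ask the same tool to untangle a pricing rule that spans three services and a legacy database, and you're back in "almost right" territory.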
Tools and processes that provide visibility into AI code generation impact can be a huge help. Not just velocity metrics, but quality measurements that show the true impact of AI assistance. Right now, most engineering leaders are flying blind on whether their AI investments are actually paying off. This makes it hard to handle a CEO who reads about 2x productivity gains and wonders why your team isn't delivering comparable results. Data is your friend here.
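As a rough illustration of what I mean by quality measurement, here's a back-of-the-envelope sketch. The field names and figures are made up, but the idea is to net authoring time saved against review and rework time, rather than tracking velocity alone.

```python
# Hypothetical back-of-the-envelope metric: net hours saved per task when using
# an AI assistant, once review and rework time are counted. All field names and
# numbers below are illustrative, not real benchmarks.
from dataclasses import dataclass

@dataclass
class TaskRecord:
    authoring_hours_saved: float  # estimated vs. writing it by hand
    extra_review_hours: float     # additional senior-review time
    rework_hours: float           # fixing "almost right" output later

def net_hours_saved(tasks: list[TaskRecord]) -> float:
    return sum(
        t.authoring_hours_saved - t.extra_review_hours - t.rework_hours
        for t in tasks
    )

tasks = [
    TaskRecord(authoring_hours_saved=2.0, extra_review_hours=0.5, rework_hours=0.0),
    TaskRecord(authoring_hours_saved=1.5, extra_review_hours=1.0, rework_hours=2.5),
]
print(f"Net hours saved: {net_hours_saved(tasks):+.1f}")  # can easily come out negative
```

Even a crude ledger like this changes the conversation with the board from "why aren't we 2x faster?" to "here's where the assistant pays off and where it costs us."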
So we’re still stuck with the Iron Triangle - AI generated code hasn't magically solved the fundamental constraints of software development.
But it has changed the variables. Speed might come with hidden quality costs that surface weeks later. Cheap might mean expensive when you factor in review time and technical debt. Good still requires human expertise and oversight, regardless of what marketing materials claim.
As engineering leaders, we need data-driven decisions over hype-driven adoption. Measure actual impact, not promised benefits.
Ted Julian is the CEO and Co-Founder of Flux, as well as a well-known industry trailblazer, product leader, and investor with over two decades of experience. A market-maker, Ted launched his four previous startups to leadership in categories he defined, resulting in game-changing products that greatly improved technical users' day-to-day processes.