The adoption of AI coding tools is surging, yet many engineering leaders are still fixated on usage statistics rather than tangible outcomes. This oversight creates a significant blind spot, prompting a pivotal question that remains largely unasked: how much of the code generated by AI agents actually makes it to production?
It’s not about the volume of code generated, the number of prompts issued, or the count of active users. What truly matters is the fraction of code that survives code review, passes continuous integration (CI), gets merged, deployed, and ultimately used by customers. Unfortunately, most engineering leaders lack the answers to this critical question, and the AI providers have little incentive to help them uncover it.
Real Spending, Missing Visibility
The Stanford AI Spend Index reveals that the median expenditure on AI coding tools has reached $86 per developer per month, based on data from 140 companies employing over 113,000 developers. Notably, the top quartile of spenders exceeds $195 monthly, with some firms investing as much as $28,000 per developer each month.
Anthropic has recently surpassed $30 billion in annualized revenue, a remarkable increase from $9 billion just four months earlier. Furthermore, SemiAnalysis indicates that 4% of all public GitHub commits are now attributed to Claude Code, with projections suggesting this will rise to over 20% by year-end. The CEO of Linear has even declared traditional issue tracking obsolete.
Despite coding agents being installed in over 75% of Linear’s enterprise workspaces, there remains a glaring gap in tracking how much generated code actually reaches production.
The Underlying Incentive Problem
AI providers typically charge based on token consumption. Consequently, the more tokens engineers use, the more revenue the provider generates. They profit from token consumption, not from the successful review, merging, or deployment of the generated code. This creates a fundamental misalignment.
A developer who prompts an AI agent multiple times to generate a function that is later rewritten by a human reviewer incurs significantly higher costs than a developer who gets it right on the first try. The provider benefits from the former scenario, while the latter scenario is far more beneficial for the organization. Currently, most engineering leaders cannot distinguish between these two scenarios; they only see a single line item on the AI bill without insights into which tokens contributed to actual production code versus those that generated waste.
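Normalizing token spend by code that actually merges makes the divergence between those two scenarios concrete. A minimal sketch, with a hypothetical blended token price and made-up usage numbers (none of these figures come from any vendor's actual pricing):

```python
# Hypothetical illustration: cost per merged line under per-token pricing.
# The price and all token/line counts are assumptions for the example.

PRICE_PER_1K_TOKENS = 0.015  # assumed blended input/output price in USD

def cost_per_merged_line(tokens_used: int, lines_merged: int) -> float:
    """Token spend divided by the lines that survived review and merged."""
    cost = tokens_used / 1000 * PRICE_PER_1K_TOKENS
    return cost / lines_merged if lines_merged else float("inf")

# Developer A: five prompts, reviewer rewrote most of it, few lines survived.
a = cost_per_merged_line(tokens_used=250_000, lines_merged=20)
# Developer B: one prompt, accepted largely as-is.
b = cost_per_merged_line(tokens_used=40_000, lines_merged=60)

print(f"A: ${a:.3f}/line, B: ${b:.3f}/line")  # A: $0.188/line, B: $0.010/line
```

Both developers show up identically on a seat-count dashboard; only the per-merged-line view separates them.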
This situation is not a conspiracy; it’s a structural issue. Addressing this challenge falls squarely on the shoulders of engineering leaders, as AI providers lack the motivation to resolve it on their own.
A Historical Parallel
Reflecting on the early days of cloud computing, companies rushed onto platforms like AWS and Azure, spending freely in the name of agility. It took years for the FinOps discipline to emerge, as organizations realized a lack of measurement had left them overspending on cloud infrastructure by 30 to 40 percent.
The trajectory of AI spending mirrors this pattern, albeit at a faster growth rate and with a larger measurement gap. Cloud providers eventually had to build cost-optimization tooling in response to customer demand, and the same shift is on the horizon for the AI sector.
Engineering leaders who take the initiative to measure outcomes will likely optimize their spending more quickly, negotiate better deals, and gain clearer insights into which tools are effective and which should be eliminated. In contrast, those who do not prioritize measurement will continue to incur costs without a clear understanding of the value derived from AI.
What Measurements Matter
The current landscape is saturated with dashboards that display adoption rates and seat utilization, which engineering leaders already have in abundance. What is sorely lacking is the capability to trace AI-generated code from its inception to its deployment in production.
Commit-level attribution is essential to reveal which agent authored the code, the ratio of AI-generated versus human-edited contributions, whether the code passed review, and its ultimate fate in deployment.
By correlating AI expenditure with production results, companies can finally pose critical questions: Which teams derive true benefits from AI agents, and which simply waste tokens? Which vendors create code that seamlessly integrates into production and which ones complicate the review process? Is the increase in AI costs due to successful adoption or costly failures?
At Waydev, we have dedicated the past year to developing a platform that measures AI adoption, impact, and ROI throughout the software development lifecycle, linking AI expenses directly to production outcomes.
Understanding Value Beyond Adoption
The AI industry encourages engineering leaders to believe that more usage means more value. It does not.
Adoption is not value, and tokens consumed do not correlate with code shipped. A team that generates 10,000 lines of AI code weekly but deploys only 2,000 to production is not outperforming a team that produces 3,000 lines and deploys 2,500. Yet current adoption dashboards misrepresent this reality, showcasing the first team as superior.
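The article's own numbers make the point: measured by survival rate rather than raw volume, the "slower" team wins. A quick check:

```python
def survival_rate(generated: int, deployed: int) -> float:
    """Fraction of generated lines that reached production."""
    return deployed / generated

team_a = survival_rate(10_000, 2_000)  # high volume, low survival
team_b = survival_rate(3_000, 2_500)   # lower volume, high survival

print(f"Team A: {team_a:.0%}, Team B: {team_b:.0%}")  # Team A: 20%, Team B: 83%
```

A volume-only dashboard ranks Team A more than three times higher; the survival-rate view inverts the ranking.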
This misconception represents a growing blind spot, and as costs rise each quarter, the era of unchecked AI spending is coming to a close. Engineering leaders who establish a robust measurement framework now will lead the dialogue on AI ROI for the coming decade, while those who delay will find themselves explaining expenses they never fully understood.
Source: TNW | Insider News