Fast Code, Stuck Value
Eliyahu Goldratt, author of The Goal, warned that measurements heavily dictate human behavior: "Tell me how you measure me, and I will tell you how I behave." His argument was precise and uncomfortable: measuring local efficiencies over global impact causes people to optimize for the metric at the expense of the system. They aren't being lazy or dishonest. They are doing exactly what the measurement asks of them.
This is the problem at the heart of most AI coding adoption today.
Organizations have invested heavily in AI development tools and built dashboards to track the results. Copilot acceptance rate: up. Pull request volume: up. Code churn: down. Developer satisfaction scores: improved. Walk into a board presentation with those numbers and you look like you're winning.
But stop and ask what any of this actually represents. Nobody hired a software team because they wanted more code. Nobody pays for faster pull request cycles. What organizations want — what customers pay for — is functionality that solves their problems. New capabilities, delivered reliably, that they couldn't get yesterday. The metrics most teams use to measure AI adoption don't measure that. They measure the activity we hope will lead to it.
Goldratt called this the local optimum trap: the sub-system improves, the numbers look good, and the system as a whole remains unchanged. The measurement isn't wrong exactly — it just answers a question nobody actually asked.
What we need is a metric that measures what we actually want: valuable functionality reaching customers, faster.
Why Current Metrics Fail
The standard software delivery metrics — story points, PR velocity, sprint burndown, code coverage, deployment frequency — all share a structural flaw: they measure activity at a specific stage of the pipeline, not value flowing through the entire system.
Consider what happens when you double developer throughput with AI coding agents while leaving requirements gathering and testing unchanged. Developers produce twice as many features ready for QA. QA becomes the constraint. Features pile up waiting for test cycles. The queue grows. Value delivery doesn't double — it may not improve at all.
PR velocity shows improvement. Sprint burndown shows improvement. Code coverage is unaffected. Meanwhile, customers are waiting just as long for the things they asked for.
This is precisely why Goldratt insisted on measuring the system, not the activity. When you optimize a non-constraint, the activity looks better while the constraint gets worse. The metrics diverge from reality.
Goldratt's Dollar Days: Measuring What Matters
In The Goal and his later work on financial metrics for the Theory of Constraints, Goldratt introduced a deceptively simple measurement concept called Dollar Days.
The insight is that time and value are inseparable in a flow system. A finished product sitting in a warehouse isn't just inventory; it's inventory accumulating cost with every passing day. A sales order that's late isn't just a scheduling problem; it's revenue at risk, multiplied by the number of days it is late.
Goldratt combined these dimensions into a single number:
Throughput Dollar Days (TDD) = the dollar value of an order × the number of days it's been due but not delivered
Inventory Dollar Days (IDD) = the dollar value of inventory × the number of days it's been sitting in the system
Both metrics increase when work is delayed. Both decrease when work flows. You can't make them look good by speeding up one stage while another stage creates a backlog. They measure the whole system.
The metrics also have natural priority-weighting built in. A $500,000 order sitting overdue for five days counts for 100 times as much as a $5,000 order overdue for the same five days. High-value work that's stuck accumulates dollar days faster, making the cost of the constraint visible.
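To make the weighting concrete, here is a minimal sketch of the dollar-days arithmetic applied to the two overdue orders above. The function name `dollar_days` is an illustrative assumption, not something from Goldratt's work.

```python
def dollar_days(value_dollars, days):
    """Dollar value times days overdue (TDD) or days held (IDD)."""
    return value_dollars * days

# The two orders from the text, both five days overdue.
large_order_tdd = dollar_days(500_000, 5)
small_order_tdd = dollar_days(5_000, 5)

print(large_order_tdd)                    # 2500000
print(small_order_tdd)                    # 25000
print(large_order_tdd / small_order_tdd)  # 100.0
```

Same delay, very different scores: the stuck high-value order dominates the total, which is the point of the metric.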
Feature Dollar Days: The Software Analog
The software equivalent maps directly onto Goldratt's framework. Instead of manufacturing orders and inventory, we have features — discrete units of work that carry business value and take time to move through the delivery system.
Feature Dollar Days (F-DD) = Σ (Estimated Business Value of Feature × Days In Progress)
Days In Progress starts counting the moment the team formally commits to a feature, not when coding starts, and keeps counting until the feature is deployed and delivering value to customers. It is the full cycle time, from requirements through release.
At any given point in time, sum this across every feature your team has in flight. That's your F-DD score. You want it trending down. A rising F-DD score means value is accumulating in the system without flowing to customers. A falling F-DD score means valuable features are shipping faster than the remaining in-flight work is aging, which is what healthy flow looks like.
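The formula can be sketched in a few lines. This is one possible shape, not a reference implementation: the `Feature` class, its field names, and the sample dates are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Feature:
    name: str
    value: float        # estimated business value, in any consistent unit
    committed_on: date  # day the team formally took on the work

def days_in_progress(feature, as_of):
    """Days between commitment and the snapshot date."""
    return (as_of - feature.committed_on).days

def feature_dollar_days(in_flight, as_of):
    """Sum of value x days in progress over every in-flight feature."""
    return sum(f.value * days_in_progress(f, as_of) for f in in_flight)

# Hypothetical in-flight work; shipped features are simply left out.
in_flight = [
    Feature("feature A", 50, date(2024, 3, 1)),
    Feature("feature B", 8, date(2024, 3, 11)),
]
score = feature_dollar_days(in_flight, as_of=date(2024, 3, 15))
print(score)  # 50 * 14 + 8 * 4 = 732
```

Only in-flight features are counted, so a feature stops contributing the day it reaches customers.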
You Don't Need Precise Numbers
One of the most common objections to value-based metrics is that business value is hard to quantify. It's true that precise dollar figures are difficult to assign to features. But Goldratt's insight doesn't require precision — it requires relative weighting and consistent tracking.
Start with T-shirt sizing:
| Size | Relative Value |
|---|---|
| XS | 1 |
| S | 3 |
| M | 8 |
| L | 20 |
| XL | 50 |
Assign each feature a size when it enters the system. Track days in progress. Multiply. Sum across the portfolio.
The specific numbers don't matter as much as the trend and the distribution. When you see F-DD rising despite increased PR velocity, you know value is piling up somewhere. When you see F-DD dropping across the organization, you know the constraint has loosened. The metric points you at the right problem.
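Here is a rough sketch of tracking the trend with T-shirt sizes instead of dollar figures. The weights come from the table above; the weekly snapshots are invented for illustration.

```python
SIZE_WEIGHTS = {"XS": 1, "S": 3, "M": 8, "L": 20, "XL": 50}

def fdd(in_flight):
    """Sum relative value x days in progress over (size, days) pairs."""
    return sum(SIZE_WEIGHTS[size] * days for size, days in in_flight)

# Hypothetical weekly snapshots of one team's in-flight work.
weekly_snapshots = [
    [("XL", 5), ("M", 10)],   # week 1
    [("XL", 12), ("M", 17)],  # week 2: nothing shipped, everything aged 7 days
    [("M", 24), ("S", 2)],    # week 3: the XL shipped and a new S was started
]
scores = [fdd(week) for week in weekly_snapshots]
print(scores)  # [330, 736, 198]

# Compare each week with the one before it.
trend = [
    "rising" if current > previous else "falling or flat"
    for previous, current in zip(scores, scores[1:])
]
print(trend)  # ['rising', 'falling or flat']
```

Nothing about the absolute values is meaningful here. What matters is that week 2 more than doubled while nothing shipped, and week 3 dropped once the XL feature left the system.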
What F-DD Reveals That PR Velocity Hides
The power of F-DD is that it exposes where work is piling up. By tracking F-DD broken down by stage, you can see exactly where the constraint lives:
F-DD rising in the "Requirements/Refinement" stage means features are being requested faster than the team can elaborate them. This is the most common bottleneck after AI coding tools are adopted — the system can produce code faster than it can produce clear requirements. Sprint compression techniques like Mob Elaboration directly address this.
F-DD rising in the "Testing/Review" stage means code is completing faster than validation can occur. This is the second most common bottleneck. AI-assisted test generation and automated acceptance testing are the appropriate investments here — not more developers.
F-DD rising in the "Ready to Deploy" stage means deployment pipelines or release processes are slowing the final handoff to customers. Continuous delivery practices address this.
F-DD evenly distributed and declining means flow is healthy across the system.
No amount of PR velocity analysis tells you this. PR velocity is scoped to a single stage. F-DD spans the entire value stream.
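A stage breakdown might look like the sketch below. It makes one simplifying assumption worth calling out: each feature's entire accumulated F-DD is attributed to the stage it currently sits in. Tracking days spent per stage would be more precise but needs more bookkeeping. The sample data is hypothetical.

```python
from collections import defaultdict

def fdd_by_stage(features):
    """Group F-DD by current stage; features are (stage, value, days) tuples."""
    totals = defaultdict(int)
    for stage, value, days in features:
        totals[stage] += value * days
    return dict(totals)

# Hypothetical in-flight features: (current stage, relative value, days in progress).
features = [
    ("Requirements/Refinement", 50, 10),
    ("Requirements/Refinement", 20, 6),
    ("Testing/Review", 8, 12),
    ("Ready to Deploy", 3, 4),
]
breakdown = fdd_by_stage(features)
print(breakdown)
# {'Requirements/Refinement': 620, 'Testing/Review': 96, 'Ready to Deploy': 12}

# The stage holding the most F-DD is the first place to look for the constraint.
constraint_candidate = max(breakdown, key=breakdown.get)
print(constraint_candidate)  # Requirements/Refinement
```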
A Worked Example
Imagine a team with eight features in progress at the start of a quarter. Sizes and days in system:
| Feature | Value Size | Days | F-DD |
|---|---|---|---|
| Customer onboarding | XL (50) | 14 | 700 |
| Reporting dashboard | L (20) | 22 | 440 |
| Mobile notifications | M (8) | 31 | 248 |
| API rate limiting | S (3) | 12 | 36 |
| Export to CSV | XS (1) | 9 | 9 |
| SSO integration | XL (50) | 7 | 350 |
| Audit logging | M (8) | 18 | 144 |
| Search improvements | L (20) | 11 | 220 |
Total F-DD: 2,147
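At a glance, three features dominate: customer onboarding (700), reporting dashboard (440), and SSO integration (350). Together they account for 1,490 of the 2,147 total, roughly 69%. Customer onboarding and the reporting dashboard combine high value with two to three weeks in the system. SSO integration is only a week old, but its XL weighting already puts it third. This is where to look first.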
If PR velocity is up but F-DD on those three features is still climbing, they're stuck somewhere outside of coding. That's where to invest — not in more AI tooling.
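The table is easy to reproduce and rank in code. This sketch uses the same sizes and day counts as the example; the variable names are arbitrary.

```python
SIZE_WEIGHTS = {"XS": 1, "S": 3, "M": 8, "L": 20, "XL": 50}

# (feature, size, days in system), copied from the table above.
features = [
    ("Customer onboarding", "XL", 14),
    ("Reporting dashboard", "L", 22),
    ("Mobile notifications", "M", 31),
    ("API rate limiting", "S", 12),
    ("Export to CSV", "XS", 9),
    ("SSO integration", "XL", 7),
    ("Audit logging", "M", 18),
    ("Search improvements", "L", 11),
]

# Score each feature, then sort so the biggest contributors come first.
ranked = sorted(
    ((name, SIZE_WEIGHTS[size] * days) for name, size, days in features),
    key=lambda row: row[1],
    reverse=True,
)
total = sum(score for _, score in ranked)
top_three = ranked[:3]
top_three_share = sum(score for _, score in top_three) / total

print(total)                     # 2147
print(top_three)
print(f"{top_three_share:.0%}")  # 69%
```

Sorting by F-DD rather than by age or by size alone is what surfaces SSO integration: it is the youngest feature on the list, yet it already outranks work that has been open three times as long.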
The Metric Validates Sprint Compression
One of the most striking predictions of F-DD is that sprint compression, when done correctly, should produce a dramatic decline in the metric.
Consider the math. A feature in a two-week sprint cycle might spend 14 or more days in progress before it is even completed within a sprint, let alone deployed. If that same feature can be completed in a two-day sprint, as teams using AI-DLC and Mob Elaboration have demonstrated, the F-DD it accumulates before completion drops by roughly a factor of seven (14 days versus 2), approaching an order of magnitude.
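As a back-of-the-envelope check, here is that comparison for a single XL feature (relative value 50, from the sizing table). It assumes the feature ships the moment its sprint ends, which is optimistic for both cases.

```python
def fdd_at_completion(value, days_to_complete):
    """F-DD a single feature has accumulated on the day it completes."""
    return value * days_to_complete

XL_VALUE = 50  # relative weight of an XL feature

two_week_sprint = fdd_at_completion(XL_VALUE, 14)
two_day_sprint = fdd_at_completion(XL_VALUE, 2)

print(two_week_sprint)                   # 700
print(two_day_sprint)                    # 100
print(two_week_sprint / two_day_sprint)  # 7.0
```

The ratio depends only on the cycle times, not on the feature's value, so every feature completed in the shorter cycle sees the same reduction.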
This is why Goldratt's Drum-Buffer-Rope framework matters for sprint length. When requirements and testing are the constraint (the drum), the pace of the system is limited by how fast those stages can process work. AI reduces the buffer needed at the coding stage. Reducing sprint length reduces the rope — the maximum time a feature can sit in work-in-progress before it must complete or be reassigned.
Teams that have compressed to two-day sprints report better focus, faster feedback, and — critically — features that ship more continuously rather than batching up at the end of a sprint. F-DD measures exactly this: value flowing continuously rather than accumulating.
Implementing F-DD in Practice
You don't need new tools. Most teams can implement this with a spreadsheet and discipline.
Start simply:
- List every feature currently in progress (use your existing backlog tool — Jira, Linear, GitHub Issues)
- Assign value sizes collaboratively — this is a forcing function for the team to discuss relative priority
- Track the date the team committed to each feature (the start of Days In Progress, which may be earlier than the day coding began)
- Calculate and sum daily (or weekly — weekly is enough)
- Trend the number and track the stage breakdown
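If a spreadsheet feels too manual, the steps above can be sketched as a small script over a CSV export. The column names (`feature`, `size`, `committed`) and the sample rows are assumptions; real exports from Jira, Linear, or GitHub Issues will need their own mapping. The sketch also assumes shipped features have already been removed from the export.

```python
import csv
import io
from datetime import date

SIZE_WEIGHTS = {"XS": 1, "S": 3, "M": 8, "L": 20, "XL": 50}

# Hypothetical export: one row per feature still in progress.
EXPORT = (
    "feature,size,committed\n"
    "Customer onboarding,XL,2024-03-01\n"
    "Export to CSV,XS,2024-03-06\n"
)

def fdd_snapshot(csv_text, as_of):
    """Total F-DD on a given date for the features listed in the export."""
    total = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        days = (as_of - date.fromisoformat(row["committed"])).days
        if days < 0:
            continue  # the team had not committed to this feature yet
        total += SIZE_WEIGHTS[row["size"]] * days
    return total

# Weekly snapshots are enough to see the trend.
weekly = {
    as_of.isoformat(): fdd_snapshot(EXPORT, as_of)
    for as_of in (date(2024, 3, 8), date(2024, 3, 15))
}
print(weekly)  # {'2024-03-08': 352, '2024-03-15': 709}
```

Appending each week's total to a running log gives you the trend line to bring to the retrospective.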
The conversations it enables are more valuable than the number itself. When a high-value feature has been in progress for three weeks and the team can't explain why, that conversation wasn't happening before. F-DD makes the cost of slow flow visible to everyone — not just the product owner who's been following up on it individually.
Tie it to retrospectives. At the end of each sprint or cycle, review: did F-DD go up or down? If up, where did work accumulate? What would have to change to address that stage? Over time, this creates a continuous improvement loop aimed at the constraint rather than at activity metrics.
The Measurement Forces the Right Question
Goldratt described MRP adopters in the 1980s this way: the companies that achieved order-of-magnitude gains didn't just install the software — they changed what decisions they made and how fast they made them. The companies that didn't change their policies got marginal improvements at best.
The same dynamic is playing out with AI coding agents today. The tools are here. The question is whether the process changes to take advantage of them.
F-DD is a measurement discipline that forces the question. You cannot make F-DD look good by speeding up coding alone. You can only improve F-DD by improving the flow of the entire system. That means addressing requirements, testing, and deployment bottlenecks — exactly the changes that separate order-of-magnitude velocity gains from marginal overhead reduction.
If your dashboards show healthy AI adoption metrics but F-DD is climbing, you have your answer. The investment is real. The policy constraint is real. And now you have a number that makes both visible at the same time.
Conclusion
Goldratt gave us Throughput Dollar Days because he understood that you cannot manage a flow system without a metric that spans the flow. Activity metrics at individual stages are useful for diagnosis, but they're the wrong scorecard.
Feature Dollar Days applies the same principle to software delivery. It combines the value of what you're building with the time it takes to reach customers — and it naturally rises when bottlenecks form anywhere in the system, regardless of how fast individual stages are moving.
For architects and technical leaders who have adopted AI coding tools and are waiting to see the velocity gains materialize: this is the metric that will tell you where the gains are being absorbed. It will point you at requirements elaboration, at testing cycles, at deployment pipelines — the stages that haven't yet been transformed by AI.
The tools are fast. The system isn't yet. F-DD is how you find out where to look next.