Grading Trades — and why nobody can pick winners

Our trade grades are calibrated and unbiased — and they still can't tell you who will win a trade, because trades are an efficient market and ~92% of the outcome is unforeseeable at the time.

What the grade is

Every player in a deal gets an expected value — what he was projected to produce at the moment of the trade, in WAR and in surplus dollars (a Bayesian blend of pedigree and recent form over his years of team control). We then track what he actually produced for his new club. The "who won" verdict is the realized side — it is descriptive, not a skill claim.
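A minimal sketch of what a "Bayesian blend of pedigree and recent form" can look like. It assumes a normal-normal model in which a pedigree-based prior and a recent-form estimate are combined by precision weighting. The function name and all input values are hypothetical, and the grades' actual model is not reproduced here. The posterior SD is what makes a band tight for players with long track records and wide for prospects.

```python
import math


def blend(prior_mean: float, prior_sd: float,
          form_mean: float, form_sd: float) -> tuple[float, float]:
    """Precision-weighted (normal-normal) blend of a pedigree prior and recent form.

    Returns the posterior mean and standard deviation, in WAR per season.
    Hypothetical helper: illustrates the idea, not the grades' actual model.
    """
    w_prior = 1.0 / prior_sd ** 2   # precision of the pedigree-based prior
    w_form = 1.0 / form_sd ** 2     # precision of the recent-form evidence
    mean = (w_prior * prior_mean + w_form * form_mean) / (w_prior + w_form)
    sd = math.sqrt(1.0 / (w_prior + w_form))  # more evidence -> narrower band
    return mean, sd
```

Usage: `blend(2.0, 1.0, 4.0, 1.0)` gives a mean of 3.0 WAR per season with an SD of about 0.71. Tightening `form_sd` to 0.5, as a long big-league track record might, moves the mean to 3.6 and narrows the SD to about 0.45.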

The tier-adjusted dollar value of a WAR

Surplus = production value − salary. The question is how to price production value in dollars. The simple answer is "multiply WAR by the league-average FA price" (~$8M/WAR). The better answer — and what we use — is tier-adjusted: WAR delivered by a star is empirically worth more than WAR delivered by a role player.

The reason is that the FA market is segmented. We calibrated on 200+ FA signings (Spotrac, 2020–26), dividing each contract's AAV by the player's projected θ (WAR per season) at signing:

Projected θ (WAR/yr)	n	Median AAV	Implied $/WAR	Multiplier on the $8M base
< 1.5 — replacement-level / depth	91	$6M	~$5M	0.65×
1.5 – 2.5 — role player	19	$16M	~$7M	0.95×
2.5 – 3.5 — above-average everyday	36	$28M	~$10M	1.25×
3.5 – 5.0 — star	9	$40M	~$11M	1.40×
5.0+ — superstar	2	$60M	~$13M	1.60×

Stars get a real scarcity premium per WAR — there are ~5 of them on the FA market in any given offseason, and the bidding pushes their per-WAR rate well above the league baseline. Replacement-level FA vets, by contrast, anchor at the low end because the supply is deep.

Concretely: a 4-WAR star's expected production prices at 4 × $8M × 1.40 = $45M/year, while a 1-WAR depth piece prices at $5M/year. Cost-controlled stars (Skenes, Caminero, Chourio) now carry larger surplus because their cheap WAR is valued at the elite tier; mega-contracts (Soto, Judge) grade less brutally negative than under the flat-$8M model because the elite WAR they deliver is genuinely worth more.
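The tier lookup and the surplus arithmetic can be sketched as below. The ~$8M base and the multipliers come from the table above. Several choices are assumptions of this sketch: each tier's lower bound is treated as inclusive, the tier is chosen from projected θ, and one multiplier is applied to all of a player's WAR. The function names are hypothetical.

```python
import bisect

BASE_DOLLARS_PER_WAR = 8.0  # $M per WAR: league-average FA price (approx.)

# Tier lower bounds on projected θ (WAR/yr) and the matching multipliers
# from the calibration table. Inclusive lower bounds are an assumption.
TIER_LOWER_BOUNDS = [1.5, 2.5, 3.5, 5.0]
TIER_MULTIPLIERS = [0.65, 0.95, 1.25, 1.40, 1.60]


def tier_multiplier(projected_war: float) -> float:
    """Multiplier on the base $/WAR for a player projected at `projected_war` per season."""
    return TIER_MULTIPLIERS[bisect.bisect_right(TIER_LOWER_BOUNDS, projected_war)]


def production_value(war: float, projected_war: float) -> float:
    """Dollar value ($M) of `war` delivered by a player in the given projected tier."""
    return war * BASE_DOLLARS_PER_WAR * tier_multiplier(projected_war)


def surplus(war: float, projected_war: float, salary: float) -> float:
    """Surplus ($M) = production value - salary."""
    return production_value(war, projected_war) - salary


if __name__ == "__main__":
    print(round(production_value(4.0, 4.0), 1))  # 4 × $8M × 1.40 → 44.8
    print(round(production_value(1.0, 1.0), 1))  # 1 × $8M × 0.65 → 5.2
```

Under the flat model the same 4 WAR would price at 4 × $8M = $32M, so a cost-controlled star on a small salary gains roughly $13M of surplus per season from the tier adjustment alone.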

Salaries are recorded in nominal dollars and then era-neutralized using that season's league-average $/WAR. That lets you compare a 1985 trade to a 2025 trade directly: both end up scaled into today's-dollar equivalent value.
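One way to read the era-neutralization step is sketched below. It assumes a nominal salary is converted into WAR-equivalents at that season's league-average $/WAR and then repriced at today's rate. The function name is hypothetical, and the $/WAR figure in the example is a placeholder, not a league figure the grades use.

```python
def to_todays_dollars(nominal_salary: float,
                      season_dollars_per_war: float,
                      todays_dollars_per_war: float = 8.0) -> float:
    """Rescale a nominal salary ($M) into today's-dollar equivalents.

    Assumes era-neutralization is a ratio of league-average $/WAR rates:
    the salary buys nominal / season_rate WAR-equivalents, repriced at today's rate.
    """
    war_equivalents = nominal_salary / season_dollars_per_war
    return war_equivalents * todays_dollars_per_war


# Placeholder example: $2M in a season where a WAR cost $0.5M (hypothetical rate)
print(to_todays_dollars(2.0, 0.5))  # → 32.0
```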

How accurate are the expected values?

Measured on 7,387 acquired players in settled trades (1985–2018):

Unbiased. Mean expected 1.44 WAR vs. mean realized 1.52 — the projections are centered on reality, neither systematically high nor low.
75% land within ±2 WAR of what actually happened (median absolute error 1.2 WAR).
The ± bands are honest. 95% of realized outcomes fall inside the stated ±2-SD band — when the model says "wide," it means it.
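The three accuracy checks above reduce to a few summary statistics over paired projections and outcomes. Here is a minimal standard-library sketch. It assumes each acquired player has an expected WAR, a realized WAR, and a projection SD. The function name and input layout are hypothetical.

```python
from statistics import mean, median


def accuracy_summary(expected: list[float], realized: list[float],
                     sd: list[float]) -> dict[str, float]:
    """Bias, ±2 WAR hit rate, median absolute error, and ±2-SD band coverage."""
    abs_errors = [abs(r - e) for e, r in zip(expected, realized)]
    n = len(abs_errors)
    return {
        "mean_expected": mean(expected),
        "mean_realized": mean(realized),  # close to mean_expected => unbiased
        "share_within_2_war": sum(a <= 2.0 for a in abs_errors) / n,
        "median_abs_error": median(abs_errors),
        # Share of outcomes inside the stated band (about 0.95 if bands are honest)
        "coverage_2sd": sum(a <= 2.0 * s for a, s in zip(abs_errors, sd)) / n,
    }
```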

Trade-grade calibration (left) and the spread of realized outcomes (right)

The left panel is the good news: bin acquired players by what we projected, and the average realized value lands right on the $y=x$ line. The right panel is the catch — the very same data, one dot per player.
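The left panel's construction can be sketched as follows. Players are bucketed by projected WAR, and each bucket's average projection is compared with its average realized WAR. Calibration means the two averages agree bin by bin, placing the points on y = x, even when individual outcomes scatter widely. The 1-WAR bin width and the function name are arbitrary choices for illustration.

```python
import math
from collections import defaultdict
from statistics import mean


def calibration_bins(expected: list[float], realized: list[float],
                     width: float = 1.0) -> list[tuple[float, float, int]]:
    """Return (mean projected, mean realized, count) for each projection bin."""
    bins: dict[int, list[tuple[float, float]]] = defaultdict(list)
    for e, r in zip(expected, realized):
        bins[math.floor(e / width)].append((e, r))  # bin index by projected value
    rows = []
    for key in sorted(bins):
        pairs = bins[key]
        rows.append((mean(e for e, _ in pairs), mean(r for _, r in pairs), len(pairs)))
    return rows
```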

Why you still can't pick winners

Those unbiased projections explain almost none of the variation in individual outcomes: the correlation between expected and realized WAR is just 0.29 (R² = 0.08). The bottom line for a whole deal is no better. The side our model projected to win actually ends up ahead only 53% of the time, a coin flip with a thumb barely on the scale.
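The deal-level figure is a hit rate: for each trade, check whether the side with the higher expected value also came out ahead in realized value. The sketch below assumes each deal is summarized by the two sides' expected and realized surplus. The `Deal` type and function name are hypothetical. Skipping deals with a tie on either measure is a choice of this example, not necessarily how the 53% was computed.

```python
from typing import NamedTuple


class Deal(NamedTuple):
    expected_a: float  # side A's expected surplus at the time of the trade
    expected_b: float
    realized_a: float  # side A's realized surplus, with hindsight
    realized_b: float


def favorite_hit_rate(deals: list[Deal]) -> float:
    """Share of decided deals in which the projected favorite actually ended up ahead."""
    hits = decided = 0
    for d in deals:
        projected = d.expected_a - d.expected_b
        actual = d.realized_a - d.realized_b
        if projected == 0 or actual == 0:
            continue  # no favorite, or a dead heat: excluded (assumption)
        decided += 1
        hits += (projected > 0) == (actual > 0)
    return hits / decided if decided else float("nan")
```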

Low predictive power is not a defect to engineer away. Roughly 92% of the variation in outcomes (1 − R²) is unforeseeable at the time of the trade: injuries, development curves, regression to the mean, role and park changes, a swing-plane tweak in a new org. A good model prices the bet correctly and quantifies the variance honestly. It does not pretend to a foresight that does not exist.

Trades are an efficient market

This should be uncomfortable for anyone selling trade "grades" as verdicts. A completed trade is a two-sided agreement: two front offices looked at the same players and each decided its side was worth doing. When two informed parties clear at a price, that price is — almost by construction — fair; the realized winner is then chosen mostly by chance.

And front offices hold vastly more information than the public ever will — full medicals, proprietary scouting and biomechanics, internal projections, makeup reports. If anyone could systematically beat the trade market, it would be them. They can't, and we can show it with their own track records:

Our front-office metrics — WAB and TAB — are somewhat sticky year to year, but they mostly capture the roster a GM inherited and the amateur draft, not his trade desk.
When we isolated trade decision-making as a candidate repeatable skill (does a GM's tendency to "win" his deals persist?), it did not hold up. The sell-side decision metric we built had year-to-year reliability near zero and essentially zero power to predict future trade outcomes. There is no detectable "sell alpha."
The one front-office signal that both repeats and pays off is Acquisition Leverage — where on the win curve you spend, i.e. timing relative to contention. That's a roster-construction choice, not an ability to out-trade the GM across the table.

This is the efficient-market hypothesis applied to a baseball trade desk: in a market whose participants are this resourced and this motivated, prices are fair and edges get arbitraged away. What's left over — the part that decides who "won" — is variance. The market always wins.

So what is a trade grade good for?

The expected value is a calibrated, unbiased read on the bet each side made — good for "was this reasonable at the time?" (usually yes, even for deals that aged terribly).
The ± band is the honest measure of how little anyone could know — enormous for prospects, tight for veterans.
The realized side is what happened, with full hindsight — a record, not a referendum on the executive.

In short: we can give you the odds and price the chips correctly. We can't — and neither can anyone, including the people holding the medicals — tell you which way the wheel lands. That's not our limitation; it's the market's.

Fit on full team-seasons (≥100 games), 1985–2024/25. All figures are franchise-level outcomes credited to the decision-maker in the relevant year — see the GM profiles for per-executive numbers.

Rosternomics — quantifying front-office skill, attribution-clean. Follow @rosternomics on X. Decision-makers on record back to 1933; resource-controlled metrics (WAR & payroll) cover 1985–2026.
Data: FanGraphs (WAR), Baseball-Reference & the Chadwick Bureau register (rosters, IDs), the Lahman Database, Baseball America prospect rankings (via The Baseball Cube), and public payroll records. Transaction and trade data was obtained free of charge from and is copyrighted by Retrosheet. Executive birth/death data via Wikidata (CC0); bios are original. Emoji graphics by OpenMoji (CC BY-SA 4.0).
All figures are franchise-level outcomes credited to the decision-maker in the acquisition year — not solely attributable to one executive.