GM WAR — wins above a replacement GM

The headline front-office metric: how many wins a general manager adds per season above an average front office on the same budget, with the franchise he inherited and the luck of a short sample stripped out.

What it answers

Every other front-office number on this site measures what happened on a GM's watch. GM WAR asks the harder question: how much of it was actually him? It is the value above budget, in wins, that an executive adds per season once the franchise he inherited and the noise of a short sample are stripped out. The baseline is a replacement GM: an average front office getting the expected return on the same payroll. Units are wins per season.

Step 1 — start from talent, not wins

The build is a ladder, and the first rung is choosing the right outcome. Wins above budget (WAB) is more than half luck (one-run games, bullpen sequencing, health) that a GM does not control and that does not repeat. TAB (talent above budget: roster WAR versus what payroll predicts) strips that conversion luck and measures what the executive actually builds. The payoff is reliability, meaning how much a number in one season tells you about the next:

metric	year-to-year stickiness	what it contains
WAB (wins above budget)	0.37	talent + conversion luck
TAB (talent above budget)	0.47	talent only

The gap between them — turning talent into wins — has a GM-to-GM reliability of about 0.05: essentially no repeatable skill. So GM WAR builds on TAB and treats WAB as the scoreboard receipt, not a skill signal.
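As a rough illustration of the two quantities in this step, the sketch below computes TAB as the residual from a straight-line fit of roster WAR on payroll, then measures stickiness as the correlation between a team's value in one season and the next. The linear fit, the pairing by team rather than by executive, and the field names (`team`, `season`, `payroll`, `war`) are all assumptions made for the example; the text above only says "what payroll predicts" and does not spell out the fit.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def talent_above_budget(rows):
    """Return {(team, season): roster WAR minus payroll-predicted WAR}.

    rows: list of dicts with hypothetical keys 'team', 'season',
    'payroll', 'war'. The prediction is an ordinary least-squares line
    through all rows, which is an assumption of this sketch.
    """
    pay = [r["payroll"] for r in rows]
    war = [r["war"] for r in rows]
    n = len(rows)
    mp, mw = sum(pay) / n, sum(war) / n
    slope = (sum((p - mp) * (w - mw) for p, w in zip(pay, war))
             / sum((p - mp) ** 2 for p in pay))
    intercept = mw - slope * mp
    return {
        (r["team"], r["season"]): r["war"] - (intercept + slope * r["payroll"])
        for r in rows
    }


def stickiness(metric):
    """Correlate each team's value in season s with its value in s + 1."""
    pairs = [
        (value, metric[(team, season + 1)])
        for (team, season), value in metric.items()
        if (team, season + 1) in metric
    ]
    return pearson([a for a, _ in pairs], [b for _, b in pairs])
```

Running `stickiness` on a WAB series built the same way (actual wins in place of roster WAR) would give the comparison in the table. A real fit would also need to put payroll on a common scale across seasons, which this sketch skips.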

Step 2 — separate the GM from the franchise

A long-tenured GM at a great organization looks good partly because of the organization — its farm system, analytics group, market. To pull the two apart we fit a crossed random-effects model over every team-season since 1985, estimating a franchise effect and an executive effect jointly:

\[ \text{TAB}_{t} \;=\; \mu \;+\; \underbrace{\text{org}_{\text{franchise}}}_{\text{the machine}} \;+\; \underbrace{\text{gm}_{\text{executive}}}_{\text{the person}} \;+\; \varepsilon_t \]

The 46 executives who ran more than one franchise are what make this identifiable: when a GM's edge travels with him to a new org, the model can tell his skill apart from the building he left behind. The variance splits cleanly:

source	share of single-season TAB variance
the franchise (org)	~5%
the executive (GM skill)	~21%
year-to-year luck / noise	~74%

Two lessons hide in that table. First, the org is small — most of the persistence is the person, not the logo. Second, three-quarters of any single season is noise, which is why the last step matters.
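To make the joint estimation concrete, here is a minimal sketch of one way to solve for crossed franchise and executive effects: alternating penalized means, which is block coordinate descent on a ridge-penalized least-squares objective. It assumes the variance shares are already known and passes them in as penalties (`lam_org` = noise variance / franchise variance, `lam_gm` = noise variance / executive variance). With the shares in the table those would be roughly 74/5 and 74/21. The function name and data layout are hypothetical, and the actual model presumably estimates the variance components itself rather than taking them as given.

```python
from collections import defaultdict


def fit_crossed_effects(rows, lam_org, lam_gm, n_iter=200):
    """Estimate mu, franchise effects and executive effects.

    rows: list of (franchise, executive, tab) tuples, one per team-season.
    lam_org, lam_gm: ridge penalties (noise variance over effect variance),
    treated as known inputs in this sketch.
    """
    org = defaultdict(float)
    gm = defaultdict(float)
    mu = 0.0
    for _ in range(n_iter):
        # Grand mean of whatever the current effects leave unexplained.
        mu = sum(y - org[f] - gm[g] for f, g, y in rows) / len(rows)

        # Franchise effects: shrunken mean of the residual after mu and GM.
        total, count = defaultdict(float), defaultdict(int)
        for f, g, y in rows:
            total[f] += y - mu - gm[g]
            count[f] += 1
        for f in total:
            org[f] = total[f] / (count[f] + lam_org)

        # Executive effects: shrunken mean of the residual after mu and org.
        total, count = defaultdict(float), defaultdict(int)
        for f, g, y in rows:
            total[g] += y - mu - org[f]
            count[g] += 1
        for g in total:
            gm[g] = total[g] / (count[g] + lam_gm)
    return mu, dict(org), dict(gm)
```

In a toy data set where one executive posts strong TAB at two different franchises while his counterparts at each do not, the fit assigns the edge to the executive and leaves both franchise effects near zero, which is the switcher logic described above.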

Step 3 — shrink for sample size

Because one season is mostly noise, a GM with three lucky years should not outrank a proven twenty-year veteran. GM WAR is an empirical-Bayes estimate: each executive's org-adjusted TAB is shrunk toward zero in proportion to how little we have seen, and carries a ±2 SD band. A five-season GM is pulled about 40% of the way to zero and keeps a wide band; a twenty-season GM, at reliability 0.86, barely moves. Reliability of the pooled career estimate climbs with tenure (Spearman-Brown):

tenure	reliability of the GM WAR estimate
1 season	0.23
5 seasons	0.61
10 seasons	0.75
20 seasons	0.86
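The table can be approximately reproduced from its first row with the Spearman-Brown formula. The sketch below assumes a single-season reliability of 0.23 and that the shrinkage weight on a raw career average equals the reliability at that tenure, which is what a simple one-level empirical-Bayes model implies. The function names are illustrative, and the published estimates may come from a fuller model.

```python
def spearman_brown(r1, n):
    """Reliability of an average of n seasons, given single-season r1."""
    return n * r1 / (1 + (n - 1) * r1)


def shrink(raw_mean, r1, n):
    """Shrink a raw per-season average toward zero by its reliability."""
    return spearman_brown(r1, n) * raw_mean


if __name__ == "__main__":
    for seasons in (1, 5, 10, 20):
        print(seasons, round(spearman_brown(0.23, seasons), 2))
```

With 0.23 as the input, the formula returns about 0.60, 0.75 and 0.86 for 5, 10 and 20 seasons, matching the table to within rounding. Under the same assumption, a raw five-season average of +5 would shrink to about +3.0.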

Every executive is credited over his own tenure window: a President of Baseball Operations (POBO) and the GM beneath him both get the seasons they shared, exactly like the franchise-level columns, so no one is starved and no senior is diluted by his lieutenant.

How to read it

Units: wins per season. +5 ≈ five wins of talent-above-budget a year attributable to the executive, net of his org. It is the org-cleaned, shrunk version of TAB per season.
The ± is a 95% band (about ±2 SD). Bold on the leaderboard means the band excludes zero, so the executive is distinguishable from an average GM. Most executives are not: skill is real but a slow signal, legible over a tenure, not a single season.
"–" = no estimate: the executive was never the head of baseball operations for a completed season (he served entirely under a POBO, so the credit goes to the boss, or he was hired too recently to have one).

Why call it "WAR"

Player WAR is wins above a replacement player. GM WAR is wins above a replacement GM — one who simply spends to expectation and gets the league-average return on his budget. It is not roster WAR and not wins-above-a-replacement-player: the baseline is the budget, and the number is org-adjusted and shrunk. Same idea, different replacement level.

Robustness — the attacks we ran

The org-vs-executive split is identified two ways: the abundant within-franchise GM turnover (every team has run 5 to 12 executives since 1985) and the 46 executives who switched franchises. We tested how much it leans on the switchers and on the non-random timing of hires and fires:

Remove the switchers entirely. Split each switcher into separate per-franchise executives, so the org is identified only by within-franchise turnover — and the org effects barely move: correlation 0.91 with the full model. The separation does not rest on the 46 switchers.
Selection / endogenous timing. Executives are hired after good years and fired after bad ones, so their first and last seasons aren't random. Drop every GM's first and last season and refit: estimates are 0.95-correlated with the full model, move under 1 win on average, and the top-10 is 9/10 unchanged. Transition-year selection bias is negligible.
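The transition-year check can be sketched as two helpers: one that drops each executive's first and last season before refitting, and one that compares the refit with the full model using the three summaries quoted above (correlation, average absolute movement, top-k overlap). This is an illustrative sketch with hypothetical names and data layout. It trims per executive across his whole career rather than per franchise stint, which is an assumption.

```python
from math import sqrt


def drop_transition_seasons(rows):
    """Remove each executive's first and last season.

    rows: list of (executive, season, value) tuples.
    """
    first, last = {}, {}
    for g, season, _ in rows:
        first[g] = min(season, first.get(g, season))
        last[g] = max(season, last.get(g, season))
    return [r for r in rows if r[1] not in (first[r[0]], last[r[0]])]


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def compare_fits(full, trimmed, k=10):
    """Compare two {executive: estimate} dicts on their shared executives.

    Returns (correlation, mean absolute shift, top-k overlap count).
    """
    common = sorted(set(full) & set(trimmed))
    r = pearson([full[g] for g in common], [trimmed[g] for g in common])
    mean_shift = sum(abs(full[g] - trimmed[g]) for g in common) / len(common)

    def top(estimates):
        return set(sorted(estimates, key=estimates.get, reverse=True)[:k])

    return r, mean_shift, len(top(full) & top(trimmed))
```

Executives with one or two seasons drop out of the trimmed fit entirely, so the comparison only covers those with estimates in both fits.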

Why org-adjustment lowers stickiness — and why that's correct. The more you strip out the organization, the less sticky the number (no adjustment 0.66 → org-isolated 0.61). That is expected, not a flaw: some of what makes an executive look consistently good genuinely is the durable quality of his organization, and removing it leaves a smaller, noisier personal signal. The stickiness cost is the price of measuring the person rather than the logo.

Honest limits

It is a slow signal — only about a dozen of the ~115 multi-year executives have a band that clears zero. Read the band, not just the point estimate.
Resource-conditioned. TAB controls for on-field payroll but not off-field resources — pitching labs, international scouting, analytics headcount. An ownership that funds infrastructure lifts its GM's number, and good GMs are hired by richer organizations. Read GM WAR as the executive package given the resources he was handed, not a context-free skill.
It inherits TAB's blind spots: an injury lowers a player's WAR, so it dings the GM even when it is not his fault. TAB removes conversion luck, not availability luck.
Two executives who always worked together (a POBO and his GM) cannot be fully told apart — they get similar numbers, which is the honest answer.

Fit on full team-seasons (≥100 games), 1985–2024/25. All figures are franchise-level outcomes credited to the decision-maker in the relevant year — see the GM profiles for per-executive numbers.

Rosternomics: quantifying front-office skill, attribution-clean. Decision-makers on record back to 1933; resource-controlled metrics (WAR & payroll) cover 1985–2026.
Data: FanGraphs (WAR), Baseball-Reference & the Chadwick Bureau register (rosters, IDs), the Lahman Database, Baseball America prospect rankings (via The Baseball Cube), and public payroll records. Transaction and trade data was obtained free of charge from and is copyrighted by Retrosheet. Executive birth/death data via Wikidata (CC0); bios are original. Emoji graphics by OpenMoji (CC BY-SA 4.0).
All figures are franchise-level outcomes credited to the decision-maker in the acquisition year — not solely attributable to one executive.