Rosternomics

GM WAR — wins above a replacement GM

The headline front-office metric: how many wins a general manager adds per season above an average front office on the same budget, with the franchise he inherited and the luck of a short sample stripped out.

What it answers

Every other front-office number on this site measures what happened on a GM's watch. GM WAR asks the harder question: how much of it was actually him? It is the wins-above-budget an executive adds per season, stripped of the franchise he inherited and the noise of a short sample — wins above a replacement GM, an average front office getting the expected return on the same payroll. Units are wins per season.

Step 1 — start from talent, not wins

The build is a ladder, and the first rung is choosing the right outcome. Wins above budget (WAB) is more than half luck: one-run games, bullpen sequencing, and health are outside a GM's control and do not repeat. TAB (talent above budget: roster WAR minus what payroll predicts) strips out that conversion luck and measures what the executive actually builds. The payoff is reliability, meaning how much a number in one season tells you about the next:

| metric | year-to-year stickiness | what it contains |
| --- | --- | --- |
| WAB (wins above budget) | 0.37 | talent + conversion luck |
| TAB (talent above budget) | 0.47 | talent only |

The gap between them — turning talent into wins — has a GM-to-GM reliability of about 0.05: essentially no repeatable skill. So GM WAR builds on TAB and treats WAB as the scoreboard receipt, not a skill signal.

Step 2 — separate the GM from the franchise

A long-tenured GM at a great organization looks good partly because of the organization — its farm system, analytics group, market. To pull the two apart we fit a crossed random-effects model over every team-season since 1985, estimating a franchise effect and an executive effect jointly:

\[ \text{TAB}_{t} \;=\; \mu \;+\; \underbrace{\text{org}_{\text{franchise}}}_{\text{the machine}} \;+\; \underbrace{\text{gm}_{\text{executive}}}_{\text{the person}} \;+\; \varepsilon_t \]

The 46 executives who ran more than one franchise are what make this identifiable: when a GM's edge travels with him to a new org, the model can tell his skill apart from the building he left behind. The variance splits cleanly:

| source | share of single-season TAB variance |
| --- | --- |
| the franchise (org) | ~5% |
| the executive (GM skill) | ~21% |
| year-to-year luck / noise | ~74% |

Two lessons hide in that table. First, the org is small — most of the persistence is the person, not the logo. Second, three-quarters of any single season is noise, which is why the last step matters.
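
One way to read the table, assuming the usual random-effects setup in which the three components are independent with mean zero (the \(\sigma\) symbols are introduced here for illustration only):

\[ \text{org}_{\text{franchise}} \sim \mathcal{N}(0, \sigma_{\text{org}}^2), \qquad \text{gm}_{\text{executive}} \sim \mathcal{N}(0, \sigma_{\text{gm}}^2), \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma_{\varepsilon}^2) \]

\[ \operatorname{Var}(\text{TAB}_t) \;=\; \sigma_{\text{org}}^2 + \sigma_{\text{gm}}^2 + \sigma_{\varepsilon}^2, \qquad \frac{\sigma_{\text{org}}^2}{\operatorname{Var}(\text{TAB}_t)} \approx 0.05, \quad \frac{\sigma_{\text{gm}}^2}{\operatorname{Var}(\text{TAB}_t)} \approx 0.21, \quad \frac{\sigma_{\varepsilon}^2}{\operatorname{Var}(\text{TAB}_t)} \approx 0.74 \]

Under that reading, each row of the table is one variance component divided by the total.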

Step 3 — shrink for sample size

Because one season is mostly noise, a GM with three lucky years should not outrank a proven twenty-year veteran. GM WAR is an empirical-Bayes estimate: each executive's org-adjusted TAB is shrunk toward zero in proportion to how little we have seen, and carries a ±2 SD band. A five-season GM is pulled roughly halfway to zero with a wide band; a twenty-season GM barely moves. Reliability of the pooled career estimate climbs with tenure (Spearman-Brown):

| tenure | reliability of the GM WAR estimate |
| --- | --- |
| 1 season | 0.23 |
| 5 seasons | 0.61 |
| 10 seasons | 0.75 |
| 20 seasons | 0.86 |
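
The tenure curve and the shrinkage it drives can be sketched as follows. This is a minimal illustration, assuming the standard Spearman-Brown formula and a normal-normal shrinkage model in which the weight kept equals the reliability. The function names and the `prior_sd` parameter are hypothetical, not taken from the model's actual code.

```python
from math import sqrt


def spearman_brown(single_season_reliability, seasons):
    """Reliability of an average over `seasons` equally noisy seasons."""
    r = single_season_reliability
    return seasons * r / (1 + (seasons - 1) * r)


def shrink(raw_estimate, reliability):
    """Shrink a raw org-adjusted TAB toward zero; keeps `reliability` of it."""
    return reliability * raw_estimate


def band(shrunk_estimate, reliability, prior_sd):
    """+/- 2 SD band under a normal-normal model (prior_sd is an assumed input)."""
    posterior_sd = prior_sd * sqrt(1 - reliability)
    return (shrunk_estimate - 2 * posterior_sd, shrunk_estimate + 2 * posterior_sd)


# Reliability by tenure, starting from the rounded single-season value of 0.23.
for n in (1, 5, 10, 20):
    print(n, round(spearman_brown(0.23, n), 2))

# Same raw +4 wins per season, very different shrunk estimates.
for n in (3, 20):
    rel = spearman_brown(0.23, n)
    print(n, shrink(4.0, rel), band(shrink(4.0, rel), rel, prior_sd=2.0))
```

Fed the rounded 0.23, the formula gives about 0.60 at five seasons rather than the 0.61 in the table. An unrounded single-season value slightly above 0.23 would account for the difference.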

Every executive is credited over his own tenure window — a President of Baseball Ops and the GM beneath him both get the seasons they shared, exactly like the franchise-level columns, so no one is starved and no senior is diluted by his lieutenant.
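
The crediting rule can be pictured with a short sketch. The tuple layout, the function name, and the sample executives are invented for illustration. The only behavior taken from the text is that overlapping tenures each receive the shared seasons in full.

```python
from collections import defaultdict


def credit_seasons(tenures, tab_by_team_season):
    """Give every executive the TAB of each season inside his own tenure window.

    tenures: iterable of (executive, franchise, first_year, last_year), inclusive.
    tab_by_team_season: dict mapping (franchise, year) -> TAB for that team-season.
    Overlapping executives are each credited with the shared seasons in full.
    """
    credited = defaultdict(list)
    for executive, franchise, first_year, last_year in tenures:
        for year in range(first_year, last_year + 1):
            key = (franchise, year)
            if key in tab_by_team_season:
                credited[executive].append(tab_by_team_season[key])
    return dict(credited)


# Hypothetical front office: a president for four seasons, a GM under him for two.
tenures = [
    ("President A", "Team X", 2015, 2018),
    ("GM B", "Team X", 2017, 2018),
]
tab = {("Team X", year): value
       for year, value in zip(range(2015, 2019), [1.0, -2.0, 3.5, 0.5])}

credits = credit_seasons(tenures, tab)
print(credits)
```

Here both executives receive the 2017 and 2018 seasons, so neither record is split or down-weighted because of the overlap.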

How to read it

Why call it "WAR"

Player WAR is wins above a replacement player. GM WAR is wins above a replacement GM — one who simply spends to expectation and gets the league-average return on his budget. It is not roster WAR and not wins-above-a-replacement-player: the baseline is the budget, and the number is org-adjusted and shrunk. Same idea, different replacement level.

Robustness — the attacks we ran

The org-vs-executive split is identified two ways: the abundant within-franchise GM turnover (every team ran 5–12 executives since 1985) and the 46 executives who switched franchises. We tested how much it leans on the switchers and on the non-random timing of hires and fires:

Why org-adjustment lowers stickiness — and why that's correct. The more you strip out the organization, the less sticky the number (no adjustment 0.66 → org-isolated 0.61). That is expected, not a flaw: some of what makes an executive look consistently good genuinely is the durable quality of his organization, and removing it leaves a smaller, noisier personal signal. The stickiness cost is the price of measuring the person rather than the logo.

Honest limits

Fit on full team-seasons (≥100 games), 1985–2024/25. All figures are franchise-level outcomes credited to the decision-maker in the relevant year — see the GM profiles for per-executive numbers.