The headline front-office metric: how many wins a general manager adds per season above an average front office on the same budget, with the franchise he inherited and the luck of a short sample stripped out.
Every other front-office number on this site measures what happened on a GM's watch. GM WAR asks the harder question: how much of it was actually him? It is the talent above budget, in wins, that an executive adds per season, stripped of the franchise he inherited and the noise of a short sample. The baseline is a replacement GM: an average front office getting the expected return on the same payroll. Units are wins per season.
The build is a ladder, and the first rung is choosing the right outcome. Wins above budget (WAB), which are built on actual wins, are more than half luck: one-run games, bullpen sequencing, health, none of which a GM controls and none of which repeats. TAB (talent above budget: roster WAR versus what payroll predicts) strips out that conversion luck and measures what the executive actually builds. The payoff is reliability, meaning how much a number in one season tells you about the next:
| metric | year-to-year stickiness | what it contains |
|---|---|---|
| WAB (wins above budget) | 0.37 | talent + conversion luck |
| TAB (talent above budget) | 0.47 | talent only |
The gap between them, the conversion of talent into wins, has a GM-level reliability of about 0.05: essentially no repeatable skill. So GM WAR builds on TAB and treats WAB as the scoreboard receipt, not a skill signal.
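A minimal sketch of the first rung, in Python with NumPy. The page does not say how "what payroll predicts" is modeled, so this assumes a single league-wide least-squares line of roster WAR on payroll. `talent_above_budget` and `stickiness` are hypothetical helper names, and stickiness is taken as the plain Pearson correlation between paired seasons (for example, each team's season t against its season t+1).

```python
import numpy as np


def talent_above_budget(roster_war, payroll):
    """TAB sketch: roster WAR minus the WAR that payroll predicts.

    Assumption: "what payroll predicts" is a single league-wide
    least-squares line of roster WAR on payroll.
    """
    roster_war = np.asarray(roster_war, dtype=float)
    payroll = np.asarray(payroll, dtype=float)
    slope, intercept = np.polyfit(payroll, roster_war, deg=1)
    expected_war = intercept + slope * payroll
    return roster_war - expected_war


def stickiness(this_season, next_season):
    """Year-to-year stickiness as the Pearson correlation of paired seasons."""
    return float(np.corrcoef(this_season, next_season)[0, 1])
```

Under this assumption a front office whose roster lands exactly on the payroll line scores zero, which is the replacement-GM baseline.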
A long-tenured GM at a great organization looks good partly because of the organization — its farm system, analytics group, market. To pull the two apart we fit a crossed random-effects model over every team-season since 1985, estimating a franchise effect and an executive effect jointly:
\[ \text{TAB}_{t} \;=\; \mu \;+\; \underbrace{\text{org}_{\text{franchise}}}_{\text{the machine}} \;+\; \underbrace{\text{gm}_{\text{executive}}}_{\text{the person}} \;+\; \varepsilon_t \]

The 46 executives who ran more than one franchise are what make this identifiable: when a GM's edge travels with him to a new org, the model can tell his skill apart from the building he left behind. The variance splits cleanly:
| source | share of single-season TAB variance |
|---|---|
| the franchise (org) | ~5% |
| the executive (GM skill) | ~21% |
| year-to-year luck / noise | ~74% |
Two lessons hide in that table. First, the org share is small: of the roughly 26% of variance that persists, about four-fifths belongs to the person, not the logo. Second, three-quarters of any single season is noise, which is why the last step, shrinkage, matters.
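The crossed model can be sketched with Henderson's mixed-model equations, which amount to a ridge regression on one-hot franchise and executive columns in which each block's penalty is the noise variance divided by that block's variance. This is a simplification under stated assumptions: the variance components are taken as known inputs rather than estimated (a real fit would estimate them too, for example by REML), and `crossed_blup` is a hypothetical name, not the site's code.

```python
import numpy as np


def crossed_blup(tab, org_ids, gm_ids, var_org, var_gm, var_noise):
    """BLUPs for TAB = mu + org[franchise] + gm[executive] + noise.

    Sketch only: the three variance components are treated as known inputs
    rather than estimated (a full fit would estimate them, e.g. by REML).
    """
    tab = np.asarray(tab, dtype=float)
    orgs = sorted(set(org_ids))
    gms = sorted(set(gm_ids))
    n_org, n_gm = len(orgs), len(gms)

    # Columns: intercept, one-hot franchise, one-hot executive.
    X = np.zeros((len(tab), 1 + n_org + n_gm))
    X[:, 0] = 1.0
    org_col = {o: 1 + i for i, o in enumerate(orgs)}
    gm_col = {g: 1 + n_org + i for i, g in enumerate(gms)}
    for row, (o, g) in enumerate(zip(org_ids, gm_ids)):
        X[row, org_col[o]] = 1.0
        X[row, gm_col[g]] = 1.0

    # Henderson's equations: add noise/effect variance ratios to the diagonal
    # of X'X for each random block; the intercept stays unpenalized.
    penalty = np.zeros(X.shape[1])
    penalty[1:1 + n_org] = var_noise / var_org
    penalty[1 + n_org:] = var_noise / var_gm

    coef = np.linalg.solve(X.T @ X + np.diag(penalty), X.T @ tab)
    org_effects = dict(zip(orgs, coef[1:1 + n_org]))
    gm_effects = dict(zip(gms, coef[1 + n_org:]))
    return coef[0], org_effects, gm_effects
```

With the table's shares (0.05, 0.21, 0.74) as inputs, the executive penalty (0.74 / 0.21, about 3.5) is much lighter than the franchise penalty (0.74 / 0.05 = 14.8), so the fit lets executive effects move further from zero than franchise effects.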
Because one season is mostly noise, a GM with three lucky years should not outrank a proven twenty-year veteran. GM WAR is an empirical-Bayes estimate: each executive's org-adjusted TAB is shrunk toward zero in proportion to how little we have seen, and carries a ±2 SD band. A five-season GM is pulled roughly halfway to zero with a wide band; a twenty-season GM barely moves. Reliability of the pooled career estimate climbs with tenure (Spearman-Brown):
| tenure | reliability of the GM WAR estimate |
|---|---|
| 1 season | 0.23 |
| 5 seasons | 0.61 |
| 10 seasons | 0.75 |
| 20 seasons | 0.86 |
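A sketch of the shrinkage arithmetic, assuming the textbook normal-normal empirical-Bayes model: a GM's true per-season TAB is drawn from a zero-centred prior with variance `var_gm`, and each season adds independent noise with variance `var_noise`. Under that model the shrinkage weight on a GM's raw average equals the Spearman-Brown reliability of an n-season average. The function names are hypothetical, and the ±2 SD band needs the variances in wins², which this page does not report, so any values passed for the band are placeholders.

```python
import math


def spearman_brown(r1, k):
    """Reliability of a k-season average, given single-season reliability r1."""
    return k * r1 / (1 + (k - 1) * r1)


def shrink(raw_mean_tab, n_seasons, var_gm, var_noise):
    """Normal-normal empirical-Bayes shrinkage of a GM's mean TAB toward zero.

    Prior: true GM effect ~ Normal(0, var_gm). Each season adds independent
    noise with variance var_noise. Returns the posterior mean and a +/-2 SD band.
    """
    # Weight on the data; algebraically equal to Spearman-Brown reliability
    # with r1 = var_gm / (var_gm + var_noise) and k = n_seasons.
    weight = n_seasons * var_gm / (n_seasons * var_gm + var_noise)
    estimate = weight * raw_mean_tab
    # Posterior precision = data precision (n / var_noise) + prior precision.
    post_sd = math.sqrt(1.0 / (n_seasons / var_noise + 1.0 / var_gm))
    return estimate, (estimate - 2.0 * post_sd, estimate + 2.0 * post_sd)
```

With a single-season reliability of 0.23, `spearman_brown` gives about 0.60, 0.75 and 0.86 for 5, 10 and 20 seasons, close to the table's values. Read as a shrinkage weight, 0.75 means a ten-season GM keeps three-quarters of his raw average and gives the rest up to the prior.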
Every executive is credited with every season in his own tenure window. A President of Baseball Operations and the GM beneath him both receive the seasons they shared, exactly as in the franchise-level columns, so no one's sample is starved and no senior executive is diluted by his lieutenant.
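The crediting rule can be sketched as a lookup from tenure windows to team-seasons. Everything here is hypothetical and for illustration only (the function name, the input shapes, the team label): each executive's list collects the full TAB of every season his window covers, so overlapping executives are never split. Seasons missing from `team_tab`, such as ones dropped by the 100-game filter, are skipped.

```python
from collections import defaultdict


def credit_by_tenure(team_tab, tenures):
    """Credit each executive with the full TAB of every season in his window.

    team_tab: {(team, season): tab}
    tenures:  [(executive, team, first_season, last_season)], inclusive.
    Overlapping executives (a president and his GM) each receive the whole
    season; nothing is split between them.
    """
    credited = defaultdict(list)
    for executive, team, first, last in tenures:
        for season in range(first, last + 1):
            key = (team, season)
            if key in team_tab:  # skip seasons absent from the data
                credited[executive].append(team_tab[key])
    return dict(credited)
```

For example, with a president covering 2000 to 2002 and a GM hired in 2001, both lists include 2001 and 2002 in full.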
Player WAR is wins above a replacement player. GM WAR is wins above a replacement GM — one who simply spends to expectation and gets the league-average return on his budget. It is not roster WAR and not wins-above-a-replacement-player: the baseline is the budget, and the number is org-adjusted and shrunk. Same idea, different replacement level.
The org-versus-executive split is identified from two sources: abundant within-franchise GM turnover (every team has had 5–12 executives since 1985) and the 46 executives who switched franchises. We tested how much the split leans on the switchers and on the non-random timing of hirings and firings.
Why org-adjustment lowers stickiness, and why that's correct. The more you strip out the organization, the less sticky the number (0.66 with no adjustment, 0.61 with the org isolated). That is expected, not a flaw: some of what makes an executive look consistently good is genuinely the durable quality of his organization, and removing it leaves a smaller, noisier personal signal. The stickiness cost is the price of measuring the person rather than the logo.
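Under the variance-components model above, a simplified version of this argument can be written down, assuming stickiness is the share of single-season variance that persists into the next season and that the same franchise and executive carry over. With $\sigma^2_{\text{org}}$, $\sigma^2_{\text{gm}}$ and $\sigma^2_{\varepsilon}$ as the three components, both persistent components count before adjustment, and only the executive's counts after the org effect is removed:

\[ \frac{\sigma^2_{\text{org}} + \sigma^2_{\text{gm}}}{\sigma^2_{\text{org}} + \sigma^2_{\text{gm}} + \sigma^2_{\varepsilon}} \;\ge\; \frac{\sigma^2_{\text{gm}}}{\sigma^2_{\text{gm}} + \sigma^2_{\varepsilon}} \iff (\sigma^2_{\text{org}} + \sigma^2_{\text{gm}})(\sigma^2_{\text{gm}} + \sigma^2_{\varepsilon}) \;\ge\; \sigma^2_{\text{gm}}(\sigma^2_{\text{org}} + \sigma^2_{\text{gm}} + \sigma^2_{\varepsilon}) \iff \sigma^2_{\text{org}}\,\sigma^2_{\varepsilon} \;\ge\; 0 \]

The last condition always holds, so isolating the organization can only lower this idealized stickiness, with equality only when the org share is zero. The sketch explains the direction of the change from 0.66 to 0.61, not its size.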
Fit on full team-seasons (≥100 games), 1985–2024/25. All figures are franchise-level outcomes credited to the decision-maker in the relevant year — see the GM profiles for per-executive numbers.