Team Strength Estimate (TSE)
Measuring Tactical Ability
PWP values lie on a compressed scale (compared with a pitcher's winning percentage in real baseball) because they represent a player's net contribution to overall team performance. Top players have a PWP in excess of 530 (for example, clear first-round picks in the 1995 card set like Bonds, Griffey and Bagwell), and a truly great player who had a Hall of Fame type year has a PWP around 550 (Greg Maddux is the canonical example in the 1995 card set). The worst everyday players have PWP values around 480-490, and reserves are below that level. Note that PWP is highly dependent on the league environment - you cannot measure PWP in isolation. PWP is designed to reflect the fact that a card only has value relative to what else is available. For example, an 8-team draft with 8 Barry Bonds cards available in LF would give each Barry a 500 LF PWP, since he is average by definition. PWP values like 550 may sound unimpressive, but remember what they mean - add just this one guy to an otherwise average team and you instantly get a 550 team!
To get a team strength value from the collection of PWPs that make up a team roster, subtract 500 from each player's PWP value, add up these 'centralized' PWPs for all regulars and pitchers (bench strength isn't directly used at the moment), and then add 500 to the result. This is what we call the TSE. As a simple example, suppose an otherwise 500 team (a roster with an average lineup and bullpen) had a 4-man rotation whose individual PWP values were 510, 505, 490, and 470. Then the TSE would be 10+5-10-30+500=475. Actual TSE computations take platoons and the expected usage of pitchers into account (e.g., aces get used more than mediocre starters), but the principle is the same as in this example.
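The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the unweighted sum described above; the function name `team_strength_estimate` is made up for the sketch, and it ignores the platoon and pitcher-usage weighting that actual TSE computations apply.

```python
def team_strength_estimate(pwps, baseline=500):
    """Unweighted sketch: sum each PWP's offset from the baseline, then add the baseline back."""
    return baseline + sum(p - baseline for p in pwps)


# An otherwise average (500) roster contributes nothing to the sum,
# so only the four starters from the example need to be listed.
rotation = [510, 505, 490, 470]
print(team_strength_estimate(rotation))  # 475
```

Adding any number of exactly average (500) players to the list leaves the result unchanged, which is why the rest of the roster can be left out of the example.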
To determine whether TSE is an accurate measure, SRA applied it to draft data and associated record data from the 1995 TBA season (graciously supplied by John Kreuz, the TBA director). To ensure that the observed team performances are uncorrelated, we selected one team from each TBA draft (using multiple teams per draft would introduce a dependency in the observed W/L records, since two teams from the same draft would have played each other). In order to limit the possible effects of draft position, we fixed a draft position for all TBA drafts and measured the TSE for teams in that position across the available drafts. This assumes that the TBA players who ended up with the nth pick were, as a group, tactically average, so that their observed performance as a group should be tracked by their aggregate TSE. Note that for technical reasons, only games played in the first round of tournament play were used in this analysis.
The summary data for teams with the first, middle, and last picks in 36 TBA drafts from 1995 are given below. Draft sizes ranged from 8 to 14 teams. Note that total games played will not be the same for each row because players often forgo games late in the day when they no longer have a chance to win their division. In an even-numbered draft of n teams, the middle pick was defined as n/2.
  W    L   Observed W%  aveTSE  pick
267  274       494        498   first
316  287       524        525   middle
275  291       486        489   last

Care should be taken in interpreting this data. Each row is not an independent case, since there are correlations between the rows in W/L record. At first glance, it would seem that TSE is slightly optimistic, but detailed statistical analysis does not bear this out. Overall, the results indicate that TSE is a good indicator of a manager's strategic abilities with a small error (about 5 points) and no systematic bias (just as likely to over-predict versus under-predict for a given manager).
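As a quick check on the summary table, the observed winning percentages can be recomputed from the W and L columns and compared with the average TSE. The sketch below uses only the figures printed in the table; the helper name `observed_wpct` is invented for illustration, and the raw gaps it prints are not a substitute for the detailed statistical analysis mentioned above.

```python
# (wins, losses, average TSE) by draft position, copied from the summary table
rows = {
    "first": (267, 274, 498),
    "middle": (316, 287, 525),
    "last": (275, 291, 489),
}


def observed_wpct(wins, losses):
    """Winning percentage on the same 0-1000 scale as TSE."""
    return 1000 * wins / (wins + losses)


# Gap between observed performance and the TSE prediction for each row
gaps = {pick: observed_wpct(w, l) - tse for pick, (w, l, tse) in rows.items()}
for pick, gap in gaps.items():
    print(f"{pick:>6}: observed W% - aveTSE = {gap:+.1f}")
```

All three gaps come out slightly negative, which is the "slightly optimistic" first impression the text refers to.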
Given that TSE is an accurate measure, the difference between a given manager's observed winning percentage and his aggregate TSE over several tournaments gives an indication of his tactical skill. In order to get an adequate sample of games played from the available TBA data, this information was collected for those individuals who played more than 100 TBA games during 1995. There were 18 such TBA players. To preserve privacy, player names are replaced with arbitrary symbols. N is the number of games played. The column labeled TAC is the winning percentage (W%) minus the TSE, and the column labeled STR is TSE-500.
Sorted by W%
Plyr    N  TSE   W%    TAC    STR
P1    104  582  673   91.2   81.8
P2    110  540  645    106   39.9
P3    106  499  632    133  -0.89
P4    166  515  627    111   15.3
P5    115  503  600   97.1   2.87
P6    117  510  598   88.4   9.85
P7    148  545  588   42.5   45.3
P8    138  492  587   94.6  -7.62
P9    118  514  585   70.9   13.8
P10   168  580  577   -2.9   80.3
P11   135  530  533   3.46   29.9
P12   160  509  513   3.72   8.78
P13   117  498  496   -1.8  -2.46
P14   259  509  475    -34   8.77
P15   114  489  465    -24  -11.4
P16   152  472  447    -25  -27.7
P17   112  493  384   -109  -7.22
P18   116  474  362   -112  -25.6

Sorted by STR
Plyr    N  TSE   W%    TAC    STR
P1    104  582  673   91.2   81.8
P10   168  580  577   -2.9   80.3
P7    148  545  588   42.5   45.3
P2    110  540  645    106   39.9
P11   135  530  533   3.46   29.9
P4    166  515  627    111   15.3
P9    118  514  585   70.9   13.8
P6    117  510  598   88.4   9.85
P12   160  509  513   3.72   8.78
P14   259  509  475    -34   8.77
P5    115  503  600   97.1   2.87
P3    106  499  632    133  -0.89
P13   117  498  496   -1.8  -2.46
P17   112  493  384   -109  -7.22
P8    138  492  587   94.6  -7.62
P15   114  489  465    -24  -11.4
P18   116  474  362   -112  -25.6
P16   152  472  447    -25  -27.7

Sorted by TAC
Plyr    N  TSE   W%    TAC    STR
P3    106  499  632    133  -0.89
P4    166  515  627    111   15.3
P2    110  540  645    106   39.9
P5    115  503  600   97.1   2.87
P8    138  492  587   94.6  -7.62
P1    104  582  673   91.2   81.8
P6    117  510  598   88.4   9.85
P9    118  514  585   70.9   13.8
P7    148  545  588   42.5   45.3
P12   160  509  513   3.72   8.78
P11   135  530  533   3.46   29.9
P13   117  498  496   -1.8  -2.46
P10   168  580  577   -2.9   80.3
P15   114  489  465    -24  -11.4
P16   152  472  447    -25  -27.7
P14   259  509  475    -34   8.77
P17   112  493  384   -109  -7.22
P18   116  474  362   -112  -25.6

It is important to point out that this set of players is not typical. With an aggregate winning percentage of 540, these players are clearly better than average. This stands to reason - strong players are more likely to play many tournaments than weak players.
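The TAC and STR columns can be reproduced from the TSE and W% columns. The sketch below assumes the sign convention implied by the table values (TAC = W% - TSE, STR = TSE - 500); the function name `tactical_and_strategic` is hypothetical. Because it starts from the whole-point TSE and W% printed in the table, its results match the published TAC and STR only to within rounding.

```python
def tactical_and_strategic(tse, wpct, baseline=500):
    """Return (TAC, STR), with TAC = W% - TSE and STR = TSE - baseline."""
    return wpct - tse, tse - baseline


# (TSE, W%) for three players, as printed (rounded) in the table
sample = {"P1": (582, 673), "P3": (499, 632), "P17": (493, 384)}
for player, (tse, wpct) in sample.items():
    tac, strat = tactical_and_strategic(tse, wpct)
    print(f"{player}: TAC={tac:+d} STR={strat:+d}")
```

For P1 this gives TAC +91 and STR +82, against 91.2 and 81.8 in the table.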
1. Tactical ability has a significantly larger impact on performance than strategic ability. This is clear from the larger dynamic range of the TAC factor (+133 to -112) as compared to the STR factor (+82 to -28). Note that for both factors, there are more players above 0 (i.e., above average) than below 0. This is mostly a consequence of the atypical player sample mentioned above.
When the same analysis is applied to a larger player sample consisting of all managers who played in at least 3 tournaments, the TAC factor ranged from +130 to -134, and STR ranged from +82 to -70. More importantly, the distributions of TAC and STR values were essentially symmetric about 0, indicating a group more representative of the 'average' TBA player.
The bottom line is that tactical ability is roughly twice as important as strategic ability. Put simply, how good you are at playing the game means a lot more than having a good draft.
2. Strategic ability is important, in that a really poor roster can put you in a hole too deep to climb out of.
3. It is possible that factors not measured by TSE (such as bench strength and overall team flexibility) also have important effects. Clearly, the distinction drawn in this study between tactical and strategic is somewhat arbitrary, and it is possible to have two teams with identical TSE that differ greatly in the 'achievable' TAC. A simple example would be a team whose bench is populated by reserves with card structures similar to those of the regulars they back up, so that the team as a whole cannot respond very well to one-sided starters. Another example would be a bullpen that can get both LH and RH hitters out, but that has few (or no) LH pitchers. TSE considers a bullpen's ability to get both LH and RH hitters out, but doesn't explicitly measure the 'handedness' of the pitchers in the pen.
4. Until TSE is extended to analyze 'role' players, it would be premature to interpret TSE as a comprehensive measure of team quality. It is, however, certainly a decent estimate of team quality, as the draft position studies indicate. The primary obstacle to extending TSE to consider role players is the lack of objective criteria for assessing their frequency of use. It is easy to get a handle on how often platoon players, starting pitchers, and front-line relief pitchers get used, but it is more difficult to measure the true impact of pinch hitters, defensive subs, and bullpen L/R 'balance'.
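The "roughly twice as important" figure in point 1 can be checked against the quoted ranges. The sketch below simply compares the spread (maximum minus minimum) of TAC with that of STR for both player samples; using the spread as the yardstick is an assumption of this sketch, following the dynamic range argument in point 1, and the helper name `spread` is illustrative.

```python
def spread(high, low):
    """Dynamic range of a factor: highest value minus lowest value."""
    return high - low


# Managers with more than 100 games in 1995
ratio_100_games = spread(133, -112) / spread(82, -28)
# Managers who played in at least 3 tournaments
ratio_3_tournaments = spread(130, -134) / spread(82, -70)

print(round(ratio_100_games, 2))      # 2.23
print(round(ratio_3_tournaments, 2))  # 1.74
```

Both ratios fall near 2, consistent with the rough factor of two stated in point 1.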