Team Strength and Managing Skill

Just how much impact does managerial ability have on winning? Are there managers who can win with whatever they have? Or is the underlying strength of the team really the deciding factor? This article summarizes the results of a multi-year effort by Strat Research Associates (SRA) to answer these questions.

Table of Contents

Background

The Team Strength Estimate (TSE)

Measuring Tactical Ability

Conclusions

Background

Before we proceed, it is useful to define a few terms. We define the process of assembling a roster as 'strategic' ability, whereas the selection of pitchers and the starting lineup, together with in-game decisions, amounts to 'tactical' ability. Finally, note that in this study the strategic component assumes a pure draft league in which no players are kept from year to year.

The Team Strength Estimate (TSE)

The TSE is obtained with a method based on the same principles as the LINEUP spreadsheet [see the Fall 95 Baseball Quarterly issue of STRAT FAN for a description of LINEUP]. Since LINEUP only assesses one 9-man team against another 9-man team using the runs created method, SRA has developed a program (called DRAFT) that estimates the winning percentage contribution of a given player at a given position to a hypothetical team that is average in all other respects. That is, an 'average' player has a winning percentage value of 500 in DRAFT.

DRAFT requires as input the ballpark dimensions (these should be the average dimensions given the home and road parks) and the number of teams in the league, and computes a winning percentage for every position played by every player in its database. DRAFT automatically computes what the average player at each position (including LH and RH starters and relievers) 'looks like' given the league size, so a player's position winning percentage (or PWP) depends on league size. To compute the average player at each position (referred to as the league environment), DRAFT also requires that the eligible players be mapped into positions in a rough rank order of their 'value', a process that is currently done iteratively in a manual, off-line mode.

Once the league environment is known, computing PWP is straightforward. For example, to assess Barry Bonds' LF PWP, replace the average left fielder with Barry's offensive and defensive numbers, compute the number of runs this team should score and allow, and then compare that with the runs scored and allowed by the average team via the Pythagorean method (W/L = [runs for / runs against]^2) to get a winning percentage.
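The Pythagorean step can be sketched in a few lines of Python. This is only an illustration of the formula quoted above, not the DRAFT program itself: the function name, the 700-run league environment, and the player's run differences are all hypothetical values chosen for the example.

```python
def pythagorean_wpct(runs_for: float, runs_against: float) -> float:
    """Winning percentage on the 0-1000 scale from W/L = (runs for / runs against)^2."""
    win_loss_ratio = (runs_for / runs_against) ** 2
    # Convert the W/L ratio to W / (W + L), then scale to the 1000-point scale.
    return 1000.0 * win_loss_ratio / (1.0 + win_loss_ratio)


# Hypothetical league environment: the average team scores and allows 700 runs.
AVG_RUNS = 700.0

# Hypothetical player: swapping him in for the average player at his position
# adds 25 runs scored and removes 5 runs allowed (made-up numbers).
pwp = pythagorean_wpct(AVG_RUNS + 25.0, AVG_RUNS - 5.0)

print(round(pythagorean_wpct(AVG_RUNS, AVG_RUNS)))  # the average player: 500
print(round(pwp))
```

An average player leaves runs scored and allowed equal, which is why he sits at exactly 500 on this scale.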

PWP values are on a compressed scale (in comparison to a pitcher's winning percentage in real baseball) in that they represent a player's net contribution to overall team performance. Top players have PWPs in excess of 530 (for example, clear first round picks in the 1995 card set like Bonds, Griffey and Bagwell), and a truly great player who had a hall of fame type year has a PWP around 550 (Greg Maddux is the canonical example in the 95 card set). The worst everyday players have PWP values around 480-490, and reserves are below that level. Note that PWP is highly dependent on the league environment - you cannot measure PWP in isolation. PWP is designed to reflect the fact that a card only has value relative to what else is available. For example, an 8-team draft with 8 Barry Bonds available in LF would give each Barry a 500 LF PWP, since he is average by definition. PWP values like 550 may sound unimpressive, but remember what they mean - add just this one guy to an otherwise average team and you instantly get a 550 team!

To get a team strength value from the collection of PWPs that make up a team roster, subtract 500 from each player's PWP value, add up these 'centralized' PWPs for all regulars and pitchers (bench strength isn't directly used at the moment), and then add 500 to the result. This is what we call the TSE. As a simple example, suppose an otherwise 500 team (a roster with an average lineup and bullpen) had a 4-man rotation whose individual PWP values were 510, 505, 490, and 470. Then the TSE would be 10+5-10-30+500=475. Actual TSE computations take platoons and expected usage of pitchers into account (e.g., aces get used more than mediocre starters), but the principle is the same as in this example.
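The rotation example above can be written as a minimal sketch. It uses the simple unweighted sum only; the platoon and pitcher-usage adjustments mentioned in the text are not modeled, and the function name is an assumption made for illustration.

```python
def team_strength_estimate(pwps, baseline=500):
    """Unweighted TSE: baseline plus the sum of each player's (PWP - baseline)."""
    return baseline + sum(pwp - baseline for pwp in pwps)


# The 4-man rotation from the example; the rest of the roster is average (500)
# and therefore contributes nothing to the sum.
rotation = [510, 505, 490, 470]
print(team_strength_estimate(rotation))  # 475
```

Average players can be left out of the list entirely, since each one adds zero to the total.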

To determine whether TSE is an accurate measure, SRA applied it to draft data and associated record data from the 1995 TBA season (graciously supplied by John Kreuz, the TBA director). To ensure that the observed team performances are uncorrelated, we selected one team from each TBA draft (using multiple teams per draft would inject a dependency into the observed W/L records, since two teams from the same draft would have played each other). To limit the possible effects of draft position, we fixed a draft position for all TBA drafts and measured the TSE for teams in that position across the available drafts. This assumes that the set of TBA players who ended up with the nth pick were, as a group, average tactically, so that their observed performance as a group should be tracked by their aggregate TSE. Note that for technical reasons, only games played in the first round of tournament play were used in this analysis.

The summary data for teams from the first, middle, and last picks in 36 TBA drafts from 1995 are given below. Draft sizes ranged from 8 to 14 teams. Note that total games played are not the same for each row because players often forgo games late in the day when they no longer have a chance to win their division. In an even-numbered draft of n teams, the middle pick was defined as pick n/2.

  W    L   Observed W%   aveTSE   pick
267  274       494         498    first
316  287       524         525    middle
275  291       486         489    last

Care should be taken in interpreting this data. The rows are not independent cases, since their W/L records are correlated. At first glance, TSE appears slightly optimistic (it exceeds the observed W% in all three rows), but detailed statistical analysis does not bear this out. Overall, the results indicate that TSE is a good indicator of a manager's strategic ability, with a small error (about 5 points) and no systematic bias (it is just as likely to over-predict as to under-predict for a given manager).
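As a rough check on the table, the sketch below recomputes the observed W% from the W/L columns and compares each gap to TSE with a simple binomial standard error. This is not SRA's statistical analysis: treating every game as an independent trial with the same win probability is an assumption made here purely for illustration, and the function names are hypothetical.

```python
import math

# (pick, wins, losses, aveTSE) copied from the summary table.
rows = [
    ("first", 267, 274, 498),
    ("middle", 316, 287, 525),
    ("last", 275, 291, 489),
]


def observed_wpct(wins, losses):
    """Observed winning percentage on the 0-1000 scale."""
    return 1000.0 * wins / (wins + losses)


def standard_error(wins, losses):
    """Binomial standard error of the winning percentage, in points.

    Assumes independent games with a constant win probability, which is
    a simplification introduced for this example.
    """
    games = wins + losses
    p = wins / games
    return 1000.0 * math.sqrt(p * (1.0 - p) / games)


for pick, wins, losses, tse in rows:
    obs = observed_wpct(wins, losses)
    se = standard_error(wins, losses)
    print(f"{pick:<6} observed={obs:5.1f}  TSE={tse}  diff={obs - tse:+5.1f}  s.e.={se:4.1f}")
```

Under that assumption, each gap between observed W% and TSE is well inside one standard error for roughly 540 to 600 games, so the apparent optimism of TSE in these three rows is not distinguishable from sampling noise.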

Measuring Tactical Ability

Given that TSE is an accurate measure, the difference between a given manager's observed winning percentage and his aggregate TSE over several tournaments gives an indication of his tactical skill. To get an adequate sample of games from the available TBA data, this information was collected for those individuals who played more than 100 TBA games during 1995. There were 18 such TBA players. To preserve privacy, player names are replaced with arbitrary symbols. The column labeled N is the number of games played, the column labeled TAC is the winning percentage minus the TSE (W% - TSE), and the column labeled STR is TSE - 500.

Sorted by W%                        Sorted by STR                       Sorted by TAC
Plyr    N  TSE   W%    TAC    STR   Plyr    N  TSE   W%    TAC    STR   Plyr    N  TSE   W%    TAC    STR
P1    104  582  673   91.2   81.8   P1    104  582  673   91.2   81.8   P3    106  499  632    133  -0.89
P2    110  540  645    106   39.9   P10   168  580  577   -2.9   80.3   P4    166  515  627    111   15.3
P3    106  499  632    133  -0.89   P7    148  545  588   42.5   45.3   P2    110  540  645    106   39.9
P4    166  515  627    111   15.3   P2    110  540  645    106   39.9   P5    115  503  600   97.1   2.87
P5    115  503  600   97.1   2.87   P11   135  530  533   3.46   29.9   P8    138  492  587   94.6  -7.62
P6    117  510  598   88.4   9.85   P4    166  515  627    111   15.3   P1    104  582  673   91.2   81.8
P7    148  545  588   42.5   45.3   P9    118  514  585   70.9   13.8   P6    117  510  598   88.4   9.85
P8    138  492  587   94.6  -7.62   P6    117  510  598   88.4   9.85   P9    118  514  585   70.9   13.8
P9    118  514  585   70.9   13.8   P12   160  509  513   3.72   8.78   P7    148  545  588   42.5   45.3
P10   168  580  577   -2.9   80.3   P14   259  509  475    -34   8.77   P12   160  509  513   3.72   8.78
P11   135  530  533   3.46   29.9   P5    115  503  600   97.1   2.87   P11   135  530  533   3.46   29.9
P12   160  509  513   3.72   8.78   P3    106  499  632    133  -0.89   P13   117  498  496   -1.8  -2.46
P13   117  498  496   -1.8  -2.46   P13   117  498  496   -1.8  -2.46   P10   168  580  577   -2.9   80.3
P14   259  509  475    -34   8.77   P17   112  493  384   -109  -7.22   P15   114  489  465    -24  -11.4
P15   114  489  465    -24  -11.4   P8    138  492  587   94.6  -7.62   P16   152  472  447    -25  -27.7
P16   152  472  447    -25  -27.7   P15   114  489  465    -24  -11.4   P14   259  509  475    -34   8.77
P17   112  493  384   -109  -7.22   P18   116  474  362   -112  -25.6   P17   112  493  384   -109  -7.22
P18   116  474  362   -112  -25.6   P16   152  472  447    -25  -27.7   P18   116  474  362   -112  -25.6

It is important to point out that this set of players is not typical. Clearly these players are better than average, with an aggregate winning percentage of 540. This stands to reason - strong players are more likely to play many tournaments than weak players.
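The TAC and STR columns and the aggregate winning percentage can be recomputed from the N, TSE, and W% columns, as in the sketch below. Two caveats: the tables print TSE rounded to whole points, so recomputed TAC and STR values can differ from the listed ones by up to about a point, and weighting each player's W% by games played is an assumption about how the aggregate was formed.

```python
# (player, games N, TSE, W%) for the 18 players listed in the tables above.
players = [
    ("P1", 104, 582, 673), ("P2", 110, 540, 645), ("P3", 106, 499, 632),
    ("P4", 166, 515, 627), ("P5", 115, 503, 600), ("P6", 117, 510, 598),
    ("P7", 148, 545, 588), ("P8", 138, 492, 587), ("P9", 118, 514, 585),
    ("P10", 168, 580, 577), ("P11", 135, 530, 533), ("P12", 160, 509, 513),
    ("P13", 117, 498, 496), ("P14", 259, 509, 475), ("P15", 114, 489, 465),
    ("P16", 152, 472, 447), ("P17", 112, 493, 384), ("P18", 116, 474, 362),
]


def tactical(tse, wpct):
    """TAC: observed winning percentage minus TSE."""
    return wpct - tse


def strategic(tse):
    """STR: TSE minus the 500 baseline."""
    return tse - 500


# Games-weighted aggregate winning percentage (weighting is an assumption).
total_games = sum(n for _, n, _, _ in players)
aggregate_wpct = sum(n * wpct for _, n, _, wpct in players) / total_games

tac_above_zero = sum(1 for _, _, tse, wpct in players if tactical(tse, wpct) > 0)
str_above_zero = sum(1 for _, _, tse, _ in players if strategic(tse) > 0)

print(round(aggregate_wpct))           # 540
print(tac_above_zero, str_above_zero)  # players above 0 on each factor
```

The games-weighted aggregate reproduces the 540 figure quoted above.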

Conclusions

From this study several conclusions are evident.

1. Tactical ability has a significantly larger impact on performance than strategic ability. This is clear from the larger dynamic range of the TAC factor (+133 to -112) as compared to the STR factor (+82 to -28). Note that for both factors, there are more players above 0 (i.e., above average) than below 0. This is mostly a consequence of the atypical player sample mentioned above.

When the same analysis is applied to a larger player sample consisting of all managers who played in at least 3 tournaments, the TAC factor ranges from +130 to -134, and STR ranges from +82 to -70. More importantly, the distributions of TAC and STR values are essentially symmetric about 0, indicating that this group is more representative of the 'average' TBA player.

The bottom line is that tactical ability is roughly twice as important as strategic ability. Put simply, how good you are at playing the game means a lot more than having a good draft.

2. Strategic ability is important, in that a really poor roster can put you in a hole too deep to climb out of.

3. It is possible that factors not measured by TSE (such as bench strength and overall team flexibility) also have important effects. Clearly, the distinction drawn in this study between tactical and strategic is somewhat arbitrary, and it is possible to have two teams with identical TSE that differ greatly in the 'achievable' TAC. A simple example would be a team with a bench that is populated by reserves with similar card structures to the regulars they are backing up, so that the team as a whole cannot respond very well to one-sided starters. Another such example would be a bullpen that can get both LH and RH hitters out, but that has few (or no) LH pitchers. TSE considers a bullpen's ability to get both LH and RH hitters out, but doesn't explicitly measure the 'handedness' of the pitchers in the pen.

4. Until TSE is extended to analyze 'role' players, it would be premature to interpret TSE as a comprehensive measure of team quality. It is, however, certainly a decent estimate of team quality, as the draft position studies indicate. The primary obstacle to extending TSE to consider role players is the lack of objective criteria for assessing their frequency of use. It is easy to get a handle on how often platoon players, starting pitchers, and front-line relief pitchers get used, but it is more difficult to measure the true impact of pinch hitters, defensive subs, and bullpen L/R 'balance'.
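The dynamic ranges quoted in conclusion 1 can be compared directly. The sketch below uses the ratio of ranges as a crude measure of relative spread; that choice is made here for illustration and is not necessarily how the 'roughly twice' figure was derived.

```python
def dynamic_range(high, low):
    """Spread between the best and worst value of a factor, in points."""
    return high - low


# (high, low) endpoints for each factor, as quoted in conclusion 1.
samples = {
    "more than 100 games (18 players)": {"TAC": (133, -112), "STR": (82, -28)},
    "at least 3 tournaments": {"TAC": (130, -134), "STR": (82, -70)},
}

ratios = {}
for name, factors in samples.items():
    tac_range = dynamic_range(*factors["TAC"])
    str_range = dynamic_range(*factors["STR"])
    ratios[name] = tac_range / str_range
    print(f"{name}: TAC range {tac_range}, STR range {str_range}, "
          f"ratio {ratios[name]:.2f}")
```

The ratio is a little above 2 for the 18-player sample and a little below 2 for the larger sample, consistent with the rough factor of two stated in conclusion 1.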


95DEC29