Setting the record straight on Election Probability Models [Archive]

TruthIsAll

07-01-2008, 12:47 PM

Election Forecasting Methodology

TruthIsAll

There has been much misinformation regarding electoral and popular vote win probability calculations in the media, in academia (via election forecasting models) and on the Internet. This analysis will show that the probability calculations are incorrect even if one assumes a fraud-free election. Election projection models based on current polling data which assign McCain more than a 10% win probability (assuming that the election is held today) are mathematically incorrect.

The 2008 Election Model is updated frequently with the latest state and national polls. Assuming the election were held on July 1, Obama had a 99.92% probability of winning the electoral vote in a fraud-free election. He had a corresponding 99.92% probability of winning the popular vote. The probabilities were calculated using two independent methods: 1) a 5000-trial Monte Carlo simulation to determine the probability of winning the electoral vote, and 2) the Normal distribution function for the probability of winning the popular vote.

Both calculations use the latest state polls. They assume that Obama will win 60% of the undecided vote. Each state poll is assigned a 4.0% MoE (sample of 600). The aggregate MoE used for the popular vote probability calculation is 2.0% (total sample of 30,000 across 50 states).
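As a quick check on the state-level figure, the standard 95% margin of error for a simple random sample can be computed directly. This is only a sketch: the function name margin_of_error is made up for illustration, and it assumes simple random sampling at a 50/50 split. The 2.0% aggregate figure is taken here as the post's stated input; the sampling formula alone would give a smaller value for 30,000 respondents, so it is not derived below.

```python
import math

def margin_of_error(sample_size, share=0.5, z=1.96):
    """95% margin of error for a simple random sample.

    share=0.5 is the worst case (largest variance); z=1.96 is the
    two-sided 95% critical value of the normal distribution.
    """
    return z * math.sqrt(share * (1.0 - share) / sample_size)

# A 600-respondent state poll, as assumed in the post
state_moe = margin_of_error(600)
print(f"600-respondent poll: +/- {state_moe:.1%}")
```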

His projected 53.2% popular vote share and 2% MoE are input to
NORMDIST(0.532, 0.50, 0.02/1.96, TRUE), which returns the 99.92% probability.

Obama won 4996 (99.92%) of 5000 Monte Carlo simulated election trials.

Sensitivity Analysis

Undecided voter allocation assumption
Obama           50%     55%     60%     65%     70%

State model: Projected weighted average vote share (%)
Obama           52.2    52.7    53.2    53.7    54.2
McCain          47.8    47.3    46.8    46.3    45.8

Probability Obama wins popular vote (normal distribution, %)
MoE 2.0%        98.4    99.6    99.92   100.0   100.0
MoE 3.0%        92.4    96.1    98.22   99.3    99.7

Monte Carlo Simulation (5000 election trials)
Probability Obama wins electoral vote (trial wins/5000)
Wins            4892    4972    4996    4999    5000
Prob (%)        97.8    99.4    99.92   100.0   100.0

Obama Electoral Vote
Average         316     330     343     358     372
Median          317     331     344     358     372

There are two basic methods used to forecast presidential elections:
1) Projections based on state and national polls
2) Time-series regression models.

Statistical polling (state and national) is an indicator of current voter preference. In the Election Model, state poll shares are adjusted for undecided voters and the associated win probabilities are then input to a 5000 election trial Monte Carlo simulation. The goal is to calculate the expected electoral vote shares and the probability of an electoral vote victory. The probability is simply the number of winning election trials divided by 5000. The projection is not a long-term forecast; it assumes the election is held on the day of the projection.

Intuitively, the probability of winning the True (fraud-free) popular vote should correlate with the Monte Carlo simulation probability of winning the electoral vote. In fact, if the two probabilities are within a percentage point of each other, we can have confidence that they are mathematically correct. The win probabilities published by academics are inconsistent with their own forecast vote shares (see below), and the forecasters do not check them against the probability of winning the electoral vote.

The Election Model Monte Carlo simulation uses poll-based vote projections to determine the probability of winning the state. The probability is calculated for all 50 states and 5000 simulated election trials are executed to determine the average electoral vote split and the number of winning trials for each candidate. The probability of winning the electoral vote is just the number of winning trials divided by 5000.
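The simulation steps described above can be sketched roughly as follows. This is not the Election Model itself: the five "states", their electoral votes and their poll numbers are invented for illustration, the function names are hypothetical, and the sketch assumes that state outcomes are independent and that each state's win probability comes from a normal distribution with a 4% MoE around the undecided-adjusted share.

```python
import random
from statistics import NormalDist, mean

# HYPOTHETICAL inputs, for illustration only:
# name -> (electoral votes, Obama poll %, McCain poll %)
STATES = {
    "State A": (55, 52.0, 38.0),
    "State B": (34, 45.0, 46.0),
    "State C": (27, 44.0, 45.0),
    "State D": (21, 46.0, 42.0),
    "State E": (20, 41.0, 50.0),
}

def state_win_probability(obama, mccain, undecided_to_obama=0.60, moe=0.04):
    """Allocate undecided voters, then convert the share to a win probability."""
    undecided = 100.0 - obama - mccain
    share = (obama + undecided_to_obama * undecided) / 100.0
    # Same form as NORMDIST(share, 0.50, MoE/1.96, TRUE)
    return NormalDist(mu=0.50, sigma=moe / 1.96).cdf(share)

def simulate(states, trials=5000, seed=2008):
    """Return (share of trials won, average electoral votes) for Obama."""
    rng = random.Random(seed)
    probs = {name: state_win_probability(o, m) for name, (ev, o, m) in states.items()}
    total_ev = sum(ev for ev, _, _ in states.values())
    needed = total_ev // 2 + 1  # majority of the electoral votes in play
    ev_totals = []
    for _ in range(trials):
        # Assumption: each state is decided by an independent random draw
        ev = sum(states[name][0] for name, p in probs.items() if rng.random() < p)
        ev_totals.append(ev)
    wins = sum(ev >= needed for ev in ev_totals)
    return wins / trials, mean(ev_totals)

win_prob, avg_ev = simulate(STATES)
print(f"Win probability: {win_prob:.2%}, average EV: {avg_ev:.1f}")
```

With all 50 states and the actual electoral vote counts as input, needed would be 270 of 538. Treating states as independent keeps the sketch simple; a model that let polling errors move together across states would spread the electoral vote distribution more widely.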

The probability of winning the popular vote is based on the projected aggregate state 2-party vote share and margin of error. These are input to the Excel normal distribution function NORMDIST: Prob (popular vote win) = NORMDIST (vote share, 0.50, MoE/1.96, True)
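For readers without Excel, the same calculation can be written in Python with the standard library, since NORMDIST(x, mean, sd, TRUE) is the cumulative normal distribution evaluated at x. The function name below is made up for this sketch. The loop applies it to the vote shares from the sensitivity table; because those shares are rounded to one decimal, the results should land close to, but not exactly on, the tabulated probabilities.

```python
from statistics import NormalDist

def popular_vote_win_probability(vote_share, moe, threshold=0.50):
    """Equivalent of Excel's NORMDIST(vote_share, 0.50, moe/1.96, TRUE)."""
    sigma = moe / 1.96  # convert the 95% margin of error to a standard deviation
    return NormalDist(mu=threshold, sigma=sigma).cdf(vote_share)

# The base case from the post: 53.2% projected share, 2% MoE
base_case = popular_vote_win_probability(0.532, 0.02)
print(f"Base case: {base_case:.2%}")

# Vote shares from the sensitivity table (undecided allocation 50% to 70%)
shares = [0.522, 0.527, 0.532, 0.537, 0.542]
for moe in (0.02, 0.03):
    row = [popular_vote_win_probability(s, moe) for s in shares]
    print(f"MoE {moe:.1%}: " + "  ".join(f"{p:.2%}" for p in row))
```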

Academics and political scientists create multiple regression models to forecast election vote shares months in advance. The models utilize time-series data as relevant input variables. These are typically economic growth, inflation, job growth, interest rates, foreign policy, historical election vote shares, etc. Regression modeling is an interesting theoretical exercise which does not account for the daily events which affect voter psychology.

Polling and regression models are analogous to the current market value of a stock and its intrinsic (theoretical) value. The intrinsic value is based on forecast annual cash flows and rarely is equal to market value. The latest poll is to the current stock price as the regression model vote share is to intrinsic value.

Inherent problems exist in election models. The implicit forecast assumption is that the official recorded vote will accurately reflect the True Vote; the election will be fraud-free. Election forecasts and media pundits never account for the possibility of fraud. Final state and national polls, when adjusted for undecided voters and estimated turnout, are superior to regression models based on historical time series executed months in advance.

Election fraud has permeated all elections since 2000. Election forecasting models which predicted a Bush win in 2000 and 2004 were only superficially "correct". Bush won the recorded vote. But Gore and Kerry won the True vote. Except for the Election Calculator model, which accounts for uncounted votes, virtually all election forecast models, including the Election Model, assume that the election will be fraud-free.

FRAUD is never used as a factor variable in academic election regression forecasting models. That's understandable, but it's not even mentioned as a factor which could conceivably skew the forecast. It's also never mentioned by the media/pollsters in their daily "horserace" tracking polls. But that, too, is understandable. If Democrats haven't raised the issue after two stolen elections, why should they expect the GOP media to do it for them?

Statistical analyses provided by internet spreadsheet bloggers which concluded that BushCo stole the elections from Gore and Kerry were dismissed as "just another conspiracy theory" by the media right after the election. In fact, these "conspiracy freaks" have even been banned from various so-called liberal discussion forums, such as Daily Kos and myDD, after posting their analyses. Only recently has Kos allowed discussion of the topic. Polling sites never mention fraud in their trend analysis.

Is there anyone out there who still believes that Bush won fairly?
The following 2004 election forecasting models were executed 2-9 months before the election.
http://www.apsanet.org/content_13000.cfm

The average Bush projection of 53.9% was 6.2 points above his share of the aggregate unadjusted state exit poll (47.7%). None of the models forecast the electoral vote or mentioned the possibility of election fraud. Except for Beck/Tien, the popular vote win probabilities were incompatible with the forecast vote shares. Assuming a 3.0% margin of error, a 53% vote share implies a 97.5% popular vote win probability. A 54% vote share implies a probability of about 99.6%.

Author        Date    Pick   2-pty  Win Prob  Notes
Beck/Tien     27-Aug  Kerry  50.1   50
Abramowitz    31-Jul  Bush   53.7   -
Campbell      06-Sep  Bush   53.8   97
Wlezien       27-Jul  Bush   52.9   75
Holbrook      30-Aug  Bush   54.5   92
Lockerbie     21-May  Bush   57.6   92
Norpoth       29-Jan  Bush   54.7   95

Recorded      02-Nov  Bush   51.2

Exit Polls
State                 Kerry  52.3   99.9      Unadjusted WPE method
Nat EP1               Kerry  51.9   99.9      12:22am, 39 Gore/41 Bush Voted 2k weights
Nat EP2               Kerry  52.9   100.0     12:22am, adj. 37.6/37.4 wts, 122.3m recorded

Election Model (11/01)
State                 Kerry  51.8   99.8      EV Simulation: 4995 wins/5000 trials
National              Kerry  51.8   99.8      Final 5 national polls average projection

Election Calculator
True Vote             Kerry  53.7   100.0     Voted 2k shares, 39.5/37.1, 125.7m votes cast; 2000 voters: mortality (5%), 95% turnout in 2004
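
The consistency check applied to the 2004 academic forecasts can be sketched the same way: take each forecast two-party share from the table, assume the 3.0% margin of error used in the text, and compare the implied win probability with the one the forecaster published. The function name is hypothetical, and Abramowitz is left out because no win probability is listed for that model.

```python
from statistics import NormalDist

# From the forecast table:
# (author, forecast winner's two-party share %, published win probability %)
FORECASTS = [
    ("Beck/Tien", 50.1, 50),
    ("Campbell", 53.8, 97),
    ("Wlezien", 52.9, 75),
    ("Holbrook", 54.5, 92),
    ("Lockerbie", 57.6, 92),
    ("Norpoth", 54.7, 95),
]

def implied_win_probability(share_pct, moe_pct=3.0):
    """Win probability implied by a vote share, assuming a normal distribution."""
    return NormalDist(mu=50.0, sigma=moe_pct / 1.96).cdf(share_pct)

print(f"{'Author':<10} {'Share':>6} {'Published':>10} {'Implied':>8}")
for author, share, published in FORECASTS:
    implied = 100.0 * implied_win_probability(share)
    print(f"{author:<10} {share:6.1f} {published:10d} {implied:8.1f}")
```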