Selected DU threads from 2005 [Archive]

TruthIsAll

04-06-2009, 09:22 AM

http://www.democraticunderground.com/discuss/duboard.php?az=view_all&address=132x676438

TruthIsAll (1000+ posts) Thu Aug-26-04 10:56 PM
Original message
The TIA Election Model: A Primer for Non-Geeks

Edited on Thu Aug-26-04 11:10 PM by TruthIsAll
Election Model Methodology: A Primer

For all those non-mathematicians and non-statisticians who may be interested, this is a simple guide to the methodology used in my election simulation model:

You can find the latest poll analysis here:
http://www.geocities.com/electionmodel/

There are generally three methods used to track elections and predict a winner. Each makes certain assumptions.

The first method analyzes economic factors: growth, jobs, inflation, etc. Forecasters claim to have had success using this approach (after all, this is what they do for a living). They create an econometric-like model which employs multiple regression/factor analysis to derive a formula based on these variables in order to predict the popular vote.

For the life of me, I can’t figure out how some of these forecasters can predict a 58% popular vote for Bush considering the economic pain inflicted since Jan. 2001. But that’s not an issue I wish to discuss here.

A second method tracks national polling trends and projects the potential movement of undecided or third party voters. There are about 15 major national pollsters. This method also seeks to predict the popular vote. Predicting a majority vote does not mean the winner will gain 270 electoral votes, but this extremely rare event can only occur in extremely tight elections where the popular vote spread is less than 0.5%. In fact, in a 51-49 popular vote split, there is virtually ZERO probability that the popular vote winner would lose in the Electoral College. It didn’t happen in 2000, because Gore won Florida while he also received 543,000 more votes than Bush (a 0.5% spread).

The third method tracks individual state polls in order to predict electoral votes. The focus is on the hard-fought twenty or so battleground states.

In my election model, I employ methods two and three. I believe that polls are pretty good indicators, provided that they are fresh and non-biased. And they have been pretty accurate as a whole. Why not just use the information that the real voters are telling us?

Like many others, I use the polling results as a starting point, and then make assumptions for the allocation of undecided/other voters to Kerry and Bush. The undecided voters usually split 2-1 for the challenger (Kerry).

To illustrate this important point: If a poll is tied 45-45, then Kerry is really “winning” by 52-48%, since he can expect to receive 7% or more of the undecided 10% (based on historical polling results).
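The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration: the function name and the `challenger_share` parameter are made up here, and the 0.7 value stands in for the "7% or more of the undecided 10%" assumption above.

```python
def allocate_undecided(kerry, bush, challenger_share=0.7):
    """Split the undecided/other share of a poll between the two candidates.

    kerry, bush: poll percentages (e.g. 45 and 45).
    challenger_share: assumed fraction of undecideds going to the challenger.
    """
    undecided = 100 - kerry - bush
    kerry_final = kerry + undecided * challenger_share
    bush_final = bush + undecided * (1 - challenger_share)
    return kerry_final, bush_final


# A 45-45 tie with 10% undecided, 70% of them breaking for the challenger.
print(allocate_undecided(45, 45, 0.7))
```

With a strict 2-1 split (`challenger_share=2/3`) the same tie comes out a little closer, at roughly 51.7 to 48.3.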

An advantage of national polling is its simplicity. To determine who is “ahead” in the horse race, we only need a single number: the poll point “spread” between the candidates. If the spread exceeds the polling margin of error (the MoE is typically +/-3% for nationwide polls of 1000 sample size) it is a statistical fact that the current leader has at least a 95% chance of winning the election. But that is just the probability for ONE poll only.

If, however, THREE polls (3000 sample size) are considered, the MoE tightens to +/-1.80%. Let’s assume a 52-48% average spread. The probability that the leader will receive somewhere between 50.2-53.8% (a 3.60% range) of the vote is also 95%. If we add 2.5% for the possibility that he will exceed 53.8%, he now has a 97.5% + cumulative probability of exceeding 50.2% of the vote.

For fifteen polls, the MoE is a very tight 0.80%. This means that for the same 52-48% spread, the probability is 95% that the leader will receive between 51.2% and 52.8% of the vote. In this case, the probability of EXCEEDING 50% is 99.99+%.

The bottom line: Assuming Kerry has an average 52%-48% lead in 15 polls the day before the election, then if he loses, it will be due to something OTHER than mere CHANCE.

The 95% confidence interval around the mean comes right out of Statistics 101. The MoE is 1.96 times the standard deviation, which is a statistical measure of the spread of polling results around the mean. The standard deviation is used in the normal distribution (the bell-shaped curve) to determine a confidence interval around the sample MEAN poll result.
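For anyone who wants to check the MoE figures, here is a minimal Python sketch. It assumes simple random sampling, a worst-case 50/50 split for the standard deviation, and that pooling polls is the same as adding their sample sizes; the function names are just for illustration.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a sample of size n (z = 1.96)."""
    return z * sqrt(p * (1 - p) / n)


def prob_exceeds_half(share, n, z=1.96):
    """Probability the true share is above 50%, given a polled share.

    Uses the normal approximation with standard deviation MoE / z.
    """
    sigma = margin_of_error(n, z=z) / z
    return 1 - NormalDist(mu=share, sigma=sigma).cdf(0.5)


# One poll, three polls, fifteen polls of 1000 respondents each.
for n in (1000, 3000, 15000):
    moe = margin_of_error(n)
    prob = prob_exceeds_half(0.52, n)
    print(f"n={n:6d}  MoE={moe:.2%}  P(leader > 50%)={prob:.4%}")
```

The MoE column comes out at about 3.1%, 1.8% and 0.8%, matching the numbers above.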

But ultimately, the winner must get at least 270 electoral votes. How do we calculate the probability of an election win using the latest state polling results? We extend the method used for the national polls: we now calculate the probabilities of winning each state. To get at the probability of an electoral vote win, we combine these individual probabilities by running a computer election simulation. State polls typically sample 500-600, so the MoE is wider (+/-4%) for each state.

Each candidate has a state win probability based on the latest state poll. In a 50-50 poll split, both have 50% probability of winning the state. If the split is 60-40, the probability of the leader winning the state is 99.999%. It gets interesting when the polls are close, say 51-49%. In this case, the leader has a 69% chance of winning the state. For a 52-48 split, it’s about 83%. For a 53-47 split, it’s 97%.
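Here is one way to turn a state poll into a win probability, sketched in Python. It assumes the +/-4% state MoE mentioned above and a normal distribution around the polled two-party share; the function name is hypothetical and the spreadsheet in the model may use slightly different inputs.

```python
from statistics import NormalDist


def state_win_probability(share, moe=0.04, z=1.96):
    """Probability that the leader's true share is above 50%.

    share: polled two-party share of the leader (0.51 for a 51-49 split).
    moe:   95% margin of error of the state poll (assumed +/-4%).
    """
    sigma = moe / z  # standard deviation implied by the MoE
    return 1 - NormalDist(mu=share, sigma=sigma).cdf(0.5)


for share in (0.50, 0.51, 0.52, 0.53, 0.60):
    print(f"{share:.0%} split -> {state_win_probability(share):.1%}")
```

Under these assumptions a 51-49 split gives about 69% and 52-48 about 84%. The 53-47 case comes out near 93%, a bit below the figure quoted above, so the exact value depends on the MoE you plug in.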

Now, let’s get back to the probability of Kerry winning at least 270 votes. It’s called Monte Carlo simulation: we let the computer “run” 1000 election trials to determine Kerry’s probability of winning 270 Electoral Votes.

In each election trial, we generate a RANDOM (RND) number between 0 and 1 for EACH state and COMPARE it to the probability of Kerry winning the state.

For example, if the RND generated for FL is .55 and Kerry has a .60 probability of winning FL, he wins the state. If, on the other hand, the RND is .75, then FL goes to Bush. In the same fashion, the computer generates random numbers for all the states and assigns electoral votes to the winner. It does this 1000 times, so we are in effect running 1000 simulated election trials. Say Kerry wins 980 trials. Then he has a 98% chance of winning. In my election model, we calculate Kerry’s average and maximum electoral votes for these trials.
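The trial loop described above can be sketched in Python as follows. The state names, win probabilities and electoral vote counts in `toy_map` are made up purely for illustration (they just add up to 538); they are not the model's actual inputs.

```python
import random


def simulate(states, trials=1000, ev_needed=270, seed=None):
    """Run Monte Carlo election trials.

    states: dict mapping a state name to (kerry_win_probability, electoral_votes).
    Returns (win probability, average EV, maximum EV) for Kerry.
    """
    rng = random.Random(seed)
    wins = 0
    total_ev = 0
    max_ev = 0
    for _ in range(trials):
        # Kerry takes a state when its random number falls below his win probability.
        ev = sum(votes for p_win, votes in states.values() if rng.random() < p_win)
        total_ev += ev
        max_ev = max(max_ev, ev)
        if ev >= ev_needed:
            wins += 1
    return wins / trials, total_ev / trials, max_ev


# Hypothetical map: two safe blocs plus six close states, 538 EV in total.
toy_map = {
    "Safe Kerry bloc": (1.00, 200),
    "Safe Bush bloc": (0.00, 200),
    "Close state A": (0.60, 27),
    "Close state B": (0.55, 21),
    "Close state C": (0.50, 20),
    "Close state D": (0.45, 20),
    "Close state E": (0.70, 20),
    "Close state F": (0.40, 30),
}

win_prob, avg_ev, max_ev = simulate(toy_map, trials=1000, seed=2004)
print(f"Win probability {win_prob:.1%}, average EV {avg_ev:.0f}, max EV {max_ev}")
```

Fixing the seed only makes the run repeatable; a different seed will move the result by a point or two with 1000 trials.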

In today’s run, Kerry won 948 trials, so he has a 94.8% probability of winning the election. He received an average of 323 electoral votes in the 1000 trials.

One advantage of this method is the limited poll “whiplash” in EV's as the "close" states change hands daily in the polls. The changes in EV in the simulation model are minimized. The simulation does not produce the wild ups and downs in EV as do most other models, which just assign the state's EV to the latest poll leader (even if he leads by just 1%).

Hopefully, this clarifies the methodology. Using national and state models has the advantage of providing a mathematical confirmation between the two approaches.

The main point is this: we need to analyze as many polls as possible and recognize the fact that this REDUCES the overall margin of error and we can have more confidence in the results.

A final word, perhaps one that cannot be over-emphasized: The analysis produces a PROBABILITY of a Kerry win, assuming the election were held TOMORROW. It does not PREDICT a Kerry win. There IS a difference. The model cannot determine the probability that the ELECTION WILL BE STOLEN IN CYBERSPACE.

I assume a fair election – but I don’t expect one. Not with the BFEE involved.

TIA.

______________________________________________________________________________

EXIT POLLS: THE LATEST MYSTERY POLLSTER BLOG - AND MY COMMENTS.
http://www.democraticunderground.com/discuss/duboard.php?az=view_all&address=203x178839

The crux of the spin is that we should use a 99.5% confidence interval, and not the standard 95% level. Of course, using 99.5% GUARANTEES THAT EVERY POLL RESULT WILL FALL WITHIN THE MARGIN OF ERROR.

THIS IS TOO, TOO OBVIOUS A CLASSIC DIVERSION TO OBSCURE THESE FACTS: THAT THE BUSH VOTE TALLIES EXCEEDED THE MOE IN AT LEAST SIXTEEN STATES, IF ONE USES THE CALCULATED EXIT POLL MOE IN EACH STATE; OR THAT THEY EXCEEDED THE MOE IN TWENTY-THREE STATES, IF ONE USES THE HISTORICALLY PROVEN 2.0% MOE FOR EXIT POLLS.

_____________________________________________________________________________

My response to the latest Mystery Pollster (Jan. 8) blog

http://www.democraticunderground.com/discuss/duboard.php?az=view_all&address=203x276319

Just browsing his site. It seems that the "smoking gun" has left some tracks. I offer my 2c here, for all you DUers who have been reading my stuff these past 6 months.

I'm sure many of you have heard of Sandy Koufax, the great Dodger pitcher. For four years running (1963-1966) he was by far the best pitcher in baseball. Sandy came over the top, the same way, curve or fastball. His delivery was straight and true. Perfection. A true scientist on the mound. No tricks. Just a hard fast ball and a wicked curve. No one could touch him.

Sandy was to the science of pitching what Ted Williams was to the science of hitting. Both were true masters who continually refined their craft. Like Williams, Sandy had two supreme gifts: physical and mental. And he was committed to the truth in everything he ever did in life, before and after he quit baseball at the age of 30 in 1966.

I know. I saw them both play.

Having said that, congratulations to the Mystery Pollster. He has just won the Sandy Koufax Award.

______________________________________________________________________

IT LOOKS LIKE THE MYSTERY POLLSTER DOES NOT WALK ON WATER AFTER ALL

http://www.democraticunderground.com/discuss/duboard.php?az=view_all&address=203x303315

The Math guys are teaching him a thing or two, as they methodically destroy Mitofsky's bogus theory using cold, hard elementary algebra.

Fallacious arguments and asinine excuses are easy to make up.
But they are also very easy to expose, once the forensic auditors analyze them using readily available data.

However you slice it, one plus one equals two.
Always.

Mathematics: The no-spin zone

I wonder what Sandy Koufax thinks about all this?

As one astute observer replied:
"... even if we take the reports' main conclusions as given - that there is a problem with the weather, or distance from the polling site, or interviewer education, or interviewer age, or the interviewing rate et al - it seems that we are inevitably destined for reductio ad absurdum - and the next time the exitpolls are 'wrong', the purported solution will appear as a perfect national blend of gender/age/race/education - and when that doesnt work, we'll hold out for a solution where we try to map those same elements by state, and then by precinct. and when that doesnt work, we'll look at the demographic spread of the recruiters in an attempt to stamp out bias at that level. and so on. and if that doesnt work, we'll find some other seemingly random, contrived statistic that fits the purported narrative such as 'when people pay attention to elections, the WPE increases by order of magnitude - and we have a single data point to prove it'. do others get the same sense? it all just seems kinda futile. which brings me to my next point...

b) (trying not to sound flippant) if we consider exitpolls generally, in the ukraine and elsewhere, they tend to be used as indicators of fraud or otherwise (and i appreciate that the nov2 exitpolls were specifically designed for purposes other than to identify fraud) - but arent they also subject to the same considerations - distrust of the media, age/edu/gender/source of interviewers, weather, distance from polls etc"?

________________________________________________________________________

OK, MYSTERY POLLSTER: REFUTE THE MATH, IF YOU CAN

http://www.democraticunderground.com/discuss/duboard.php?az=view_all&address=203x137129

You have been trying your best to pooh-pooh Freeman, Berkeley, Paulos and others.

Now it's time to put up or shut up.
Show us your probability model, if you have one.
Time to fish or cut bait.

THESE PROBABILITIES HAVE NOT BEEN REFUTED; IN FACT, THEY HAVE BEEN CONFIRMED BY OTHER MATHEMATICIANS HERE AT DU.

These are the probabilities that Bush's tallies could have deviated as they did from the exit polls based on the MOE and calculated using the binomial distribution:

1) If you assume the calculated MOE, based on sample size (as for standard, pre-election polls), then the probability that the exit poll discrepancy exceeds the MOE in a given state is .025. The probability that the discrepancy will exceed the MoE in at least 16 states of 51 states is 1 out of 13.5 TRILLION.

2) If you assume a 2% MOE (more likely for exit polls), then the PROBABILITY of Bush's vote tallies exceeding the MOE in at least 23 states is ZERO. YOU CAN'T COMPUTE THE ODDS, UNLESS YOU HAVE FIGURED OUT A WAY TO DIVIDE BY ZERO.

If N = number of states exceeding the MOE, then the probability is:
Prob = 1-BINOMDIST(N-1, 51, .025, TRUE)

For (1) and (2) above:
P(1)= 1-BINOMDIST(15, 51, .025, TRUE) ; the odds are 1/P(1)=1/13.5 trillion
P(2)= 1-BINOMDIST(22, 51, .025, TRUE) ; the odds are 1/P(2) = 1/0 =???

It's as simple as that. Try it yourself in Excel.
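If you don't have Excel handy, the same tail probability can be sketched in Python. This version adds up the upper tail term by term instead of subtracting the cumulative value from 1, which is an implementation choice made here to avoid round-off when the answer is extremely small; the function name is hypothetical.

```python
from math import comb


def upper_tail(k, n=51, p=0.025):
    """Probability of at least k 'successes' in n independent trials.

    Here a success is a state whose discrepancy exceeds the MoE,
    with n = 51 states and p = .025 per state as in the formula above.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p16 = upper_tail(16)
p23 = upper_tail(23)
print(f"At least 16 states: P = {p16:.3e}, odds about 1 in {1 / p16:.3e}")
print(f"At least 23 states: P = {p23:.3e}")
```

Summed this way, the 16-state case comes out at roughly 1 in 14 trillion, in line with the figure above. The 23-state value is far too small for the 1-BINOMDIST subtraction to resolve in double precision, which is why Excel shows 0 there; the direct sum gives a tiny positive number.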

__________________________________________________________________________

Ignore the RW spin. Calc the SAMPLE-SIZE for any MoE and confidence level

http://www.democraticunderground.com/discuss/duboard.php?az=view_all&address=203x368488#368491

They're baaaack.

The "State Exit Polls (73,000 respondents) and
National Exit Poll (13047) were lousy samples" crowd.

So, let's reverse the spin.
Are they already backing off rBr?

Want a particular MoE? Calculate your own sample-size. No rocket science here.

And here's what Edison-Mitofsky say about MoE:
Surprise! It's 1% (rounded) for 8,000+.
We calculate it as 0.87% for 13047.
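Here is a rough Python sketch of the sample-size calculation for any MoE and confidence level. It assumes a simple random sample and a worst-case 50/50 split, and it ignores any cluster or design effect in the exit poll; the function names are made up for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_score(confidence=0.95):
    """Two-sided z value for a confidence level (about 1.96 for 95%)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def moe_for_sample(n, confidence=0.95, p=0.5):
    """Margin of error for a sample of size n."""
    return z_score(confidence) * sqrt(p * (1 - p) / n)


def sample_size_for_moe(moe, confidence=0.95, p=0.5):
    """Smallest sample size that achieves the requested margin of error."""
    z = z_score(confidence)
    return ceil(z * z * p * (1 - p) / moe**2)


print(f"MoE for 13047 respondents: {moe_for_sample(13047):.2%}")
print(f"MoE for 8000 respondents:  {moe_for_sample(8000):.2%}")
print(f"Sample needed for 1% MoE at 95%:   {sample_size_for_moe(0.01)}")
print(f"Sample needed for 1% MoE at 99.5%: {sample_size_for_moe(0.01, 0.995)}")
```

Under these assumptions the 13047-respondent MoE comes out just under 0.9%, and a 1% MoE at 95% confidence needs roughly 9,600 respondents.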

________________________________________________________________________

http://www.democraticunderground.com/discuss/duboard.php?az=view_all&address=203x365550

EXIT POLLS: WHAT YOU SHOULD KNOW

Edited on Tue May-03-05 01:10 AM by TruthIsAll
Let's dispel the following myths from those naysayers who since the election continue to claim that exit polls are:

1) unsophisticated and unreliable,
2) not based on true random samples,
3) biased by reluctant responders and
4) do not take historical trends into account.

There should be no Mystery regarding Exit Polls. And believe it or not, the Mystery Pollster and I are in complete agreement:

Late in the day exit polls, like the 12:22am National Exit Poll of 13047 respondents and the various state exit polls (both had Kerry winning by 51-48%) are scientific, accurate and reliable.

The Mystery Pollster

EXIT POLLS: WHAT YOU SHOULD KNOW

snip

SOPHISTICATED AND RELIABLE
I have always been a fan of exit polls. Despite the occasional controversies, exit polls remain among the most sophisticated and reliable political surveys available. They will offer an unparalleled look at today's voters in a way that would be impossible without quality survey data. Having said that, they are still just random sample surveys, possessing the usual limitations plus some that are unique to exit polling (I also remain dubious about weighting telephone surveys to match them, but that is another story for another day).

A RANDOM SAMPLING OF 1495 PRECINCTS
A quick summary of how exit polls work: The exit pollster begins by drawing a random sampling of precincts within a state, selected so that the odds of any precinct being selected are proportionate to the number that typically vote in that precinct. The National Election Pool Exit Poll, which is conducting the exit polling for the six major networks today, will send exit pollsters to 1,495 precincts across the country.

A RANDOM SELECTION OF 100 VOTERS PER PRECINCT
One or sometimes two interviewers will report to each sampled precinct. They will stand outside and attempt to randomly select roughly 100 voters during the day as they exit from voting. The interviewer will accomplish this task by counting voters as they leave the polling place and selecting every voter at a specific interval (every 10th or 20th voter, for example). The interval is chosen so that approximately 100 interviews will be spread evenly over the course of the day.

RELUCTANT RESPONDERS: STATISTICAL CORRECTIONS FOR ANY BIAS
When a voter refuses to participate, the interviewer records their gender, race and approximate age. This data allow the exit pollsters to do statistical corrections for any bias in gender, race and age that might result from refusals.

snip

WEIGHTINGS BASED ON HISTORICAL DATA AND VOTER TURNOUT
One of the unique aspects of the exit poll design is the way it gradually incorporates real turnout and vote data as it becomes available once the polls close. The exit poll designers have developed weighting schemes and algorithms to allow all sorts of comparisons to historical data that supports the networks as they decide whether to "call" a state for a particular candidate. When all of the votes have been counted, the exit poll is weighted by the vote to match the actual result.

snip

___________________________________________________________________________

New USCV criticism Revised -- TIA what about the math part of this?

http://www.democraticunderground.com/discuss/duboard.php?az=view_all&address=203x363383