Gallup Position on "Weighting" an Entire Sample to Reflect a Target Distribution of Party Identification

I'm posting here the latest Gallup statement on the issue of weighting a poll sample to some assumed target value of party identification. I presented a paper on this issue at AAPOR last May in Miami Beach, Fla., and we at Gallup have been continually reviewing, researching, and discussing our policies on this issue for years.

I welcome any comments or feedback -- including, of course, points of disagreement and/or rationale for other procedures. This has, at times in the past, been a hot-button issue, and it's obviously an important one.

The decision to weight a national sample to a population parameter of party identification (PID) assumes that there is a reliable measure of PID to which the sample can be weighted. There are, however, no such measures at this point in time. There are no official government-provided data on party identification (as there are for education, race, age, gender, and region, all provided by the U.S. Census Bureau). There are no consistent state-level registration figures on party registration. Each state has different rules for registering to vote, and some states have no party-based registration at all. Exit poll data, collected during each presidential election year, are themselves based on surveys, subject to a variety of sources of error, and therefore not realistically usable as a party ID weighting target. There is also evidence to suggest that there is in fact no stable population value of PID at all, and that PID in the population varies like other political attitudes over relatively short periods.

An alternative is to weight samples to an estimate of party identification based on a rolling average computed from the current and other recent surveys. This is a form of bootstrapped weighting, using multiple sample-generated measures as the basis for a population estimate to which a single sample is in turn weighted.

This procedure involves weighting on the basis of respondents' answers to a single question about party identification (across multiple surveys), and the resulting weighting affects the response patterns to all other questions included in the survey.
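As a rough illustration of the rolling-average approach described above, the sketch below averages the party identification shares from the current and two recent samples, then derives one weight per PID category for the current sample. The data, function names, and the simple ratio weighting (target share divided by observed share) are assumptions made for illustration; this is not a description of any polling organization's actual procedure.

```python
from collections import Counter

def pid_shares(sample):
    """Return the share of each party-ID category in one sample."""
    counts = Counter(sample)
    total = len(sample)
    return {party: counts[party] / total for party in counts}

def rolling_target(samples):
    """Average the PID shares across the current and other recent samples."""
    parties = sorted({p for s in samples for p in s})
    shares = [pid_shares(s) for s in samples]
    return {p: sum(sh.get(p, 0.0) for sh in shares) / len(shares) for p in parties}

def pid_weights(current, target):
    """Weight for each category: target share divided by observed share."""
    observed = pid_shares(current)
    return {p: target[p] / observed[p] for p in observed}

# Hypothetical samples of 100 respondents each (R, I, D).
recent = [
    ["R"] * 35 + ["I"] * 30 + ["D"] * 35,
    ["R"] * 33 + ["I"] * 32 + ["D"] * 35,
]
current = ["R"] * 40 + ["I"] * 30 + ["D"] * 30

target = rolling_target(recent + [current])
weights = pid_weights(current, target)

for party in ("R", "I", "D"):
    print(party, round(target[party], 3), round(weights[party], 3))
```

Because each respondent carries a single weight, every answer that respondent gave, not only the party identification answer, is scaled by the same factor. That is how a weight derived from one question changes the response patterns for all the others.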

Gallup does not believe that one question about party identification -- usually situated at the back of the survey instrument -- should be the basis for such major transformation of the entire sample. This is mainly because the answer to the party identification question may vary for reasons other than sampling error -- including real-world change.

Weighting a sample to a population parameter assumes that it is necessary to correct for sampling error. By sampling error, we mean a situation in which the sampling procedure selects the wrong people, that is, people who are not representative of the underlying population. Weighting either to a static parameter or to a smoothed average attempts to correct for these assumed sampling irregularities.

However, Gallup is not convinced that variation in party identification from poll to poll predominantly results from sampling error. In other words, Gallup is not convinced that party identification varies from sample to sample because the wrong percentages of "real" Republicans, independents, or Democrats are selected into the sample.

Instead, it seems at least equally likely that survey-to-survey variation in self-reported party identification is caused by two other factors: 1) measurement error and 2) real change in the population.

By measurement error, we mean primarily that question context may influence respondents' answers to the party identification question.

In most of Gallup's polls, the party identification question is asked in the last section of the questionnaire, with all substantive policy questions asked beforehand. These preceding substantive questions may influence how respondents answer the party identification question. For example, a survey asking about domestic policies may result in more people without a stable party identification identifying as Democrats, while a survey focused on terrorism could cause the same group to identify as Republicans.

It does not take a great deal of variation to affect the distribution of party identification. If only 5% of respondents change their identification from Democrat to independent, and another 5% change from independent to Republican (meaning that 90% do not change), the Republican-Democratic margin within a sample would "swing" by 10 percentage points.
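The arithmetic can be checked with a few lines. The starting distribution below (33% Republican, 34% independent, 33% Democrat) is a hypothetical chosen only to make the numbers easy to follow; the 5-point shifts are the ones described above.

```python
def margin(shares):
    """Republican share minus Democratic share, in percentage points."""
    return shares["R"] - shares["D"]

# Hypothetical starting distribution, in percentage points.
before = {"R": 33, "I": 34, "D": 33}

shift = 5  # points moving D -> I, and another 5 moving I -> R
after = {
    "R": before["R"] + shift,          # gains 5 from independents
    "I": before["I"] + shift - shift,  # gains 5 from D, loses 5 to R
    "D": before["D"] - shift,          # loses 5 to independents
}

swing = margin(after) - margin(before)
print(before, after, swing)
```

The independent share is unchanged in the aggregate, yet the Republican-Democratic margin moves from 0 to 10 points.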

The hypothesis that variation in self-reported party identification is the result of measurement error assumes that party identification for some Americans is fluid and as much an attitude as it is a self-report of a stable characteristic such as age, gender, or education. Rather than being an immutable demographic characteristic that respondents will usually self-report the same across surveys (as would be the case with gender, marital status, or number of children), PID can be viewed, for at least a subgroup of Americans, as more of an attitude subject to change as a result of short-term environmental stimuli.

If variation in party identification is the result of measurement error, then weighting the entire sample based on PID increases the error or bias in other measures within the sample. For example, in most Gallup surveys, presidential job approval is asked at the beginning of the survey. Party identification, as noted, is asked at the end. Even if a given sample is perfectly representative of the underlying population, the people in that sample might give different responses to the PID question depending on the context of a particular questionnaire. But the presidential job approval question would not have been subject to those questionnaire influences, because it was asked first. Thus, weighting the entire sample based on responses to the PID question at the back of the survey would distort the (accurate) responses to the job approval question. The responses to the job approval question, in short, would be altered on the misguided assumption that variation in response to the PID question resulted from a broad sample issue, rather than from a measurement issue affecting only questions at the back of the survey.
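A small worked example may make the distortion concrete. Every number below is hypothetical. The sketch assumes a representative sample of 100 whose job approval answers are accurate (54% approve), but whose reported PID has drifted with questionnaire context to 40% Republican, 30% independent, 30% Democrat, against an assumed weighting target of 35/30/35.

```python
# Each group: (reported PID, number of respondents, number approving).
# All figures are hypothetical and chosen for easy arithmetic.
groups = [("R", 40, 36), ("I", 30, 15), ("D", 30, 3)]

# Assumed PID weighting target (shares of the sample).
target = {"R": 0.35, "I": 0.30, "D": 0.35}

n = sum(size for _, size, _ in groups)
unweighted = sum(approve for _, _, approve in groups) / n

weighted_approve = 0.0
weighted_total = 0.0
for party, size, approve in groups:
    w = target[party] / (size / n)  # target share / observed share
    weighted_approve += w * approve
    weighted_total += w * size

weighted = weighted_approve / weighted_total

print(f"unweighted approval: {unweighted:.1%}")
print(f"PID-weighted approval: {weighted:.1%}")
```

Under these assumptions, the PID weighting moves the approval estimate from 54% to 50%, even though the approval answers themselves were never affected by the context that shifted the PID responses.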

There is also the possibility that variation in party identification reflects neither sampling nor measurement error, but rather "real" change in the population. In other words, if all the adults in the country were interviewed and asked exactly the same questions from survey to survey, it would not be unusual to find that the percentages of people who identified with each party were different from survey to survey, even within a short period. This would be considered "real" population change because everyone was interviewed (avoiding sampling error) and all the questions were identical (avoiding measurement error). Thus, weighting the entire sample based on responses to the PID question would decrease, rather than increase, the representativeness of the sample compared to the underlying population.

There are considerable data suggesting that variations in PID from survey to survey result from either real population change or measurement error (as opposed to sampling error).

Panel studies (conducted with the same people over time) show that a significant number of respondents change their party identification over short periods of time in a way that can, under some circumstances, alter the overall party distribution of the sample. Panel studies show that much of this change is into and out of the "independent" category from either the Republican or the Democratic category. Because the same individuals are involved at each stage of the panel study, differences cannot be attributed to sampling error. That people change party identification within a matter of days suggests that there are real short-term shifts in the PID distribution in the population that result from either random "changing of one's mind" (again, among the same people) or measurement error caused by differential survey context.

Party identification often shifts in surveys in the direction that real-world events would predict. After a party's convention, for example, PID shifts in the direction of that party. This supports the assumption that short-term PID shifts can be real.

The Associated Press/Ipsos poll has provided additional evidence that PID is labile and subject to fluctuation based on short-term elements in the environment. This poll asks a random half of the sample its version of the PID question at the beginning of the survey, and the other half at the end of the survey. The results show there is often a significant difference between party identification measured at the beginning of the survey and party identification measured at the end of the same survey. If party identification were an immutable demographic characteristic such as race, survey context should make no difference in the percentages who claim affiliation with each party. Sampling error may explain some of the difference between the PID measures for the two split-half samples, but AP/Ipsos analysis suggests that survey context is most probably a major factor in causing the differences.

The distribution of self-reported party identification has higher survey-to-survey variance than other demographic variables such as marital status, employment status, and income. This further underscores the "special" status of PID as a variable subject to types of survey-to-survey variation not found for other demographic variables.

All of this does not rule out that for many people, party identification has the same characteristics as a long-term demographic variable. Many people hold consistent party identification over time. But it is our assumption that the party identification measure also has the characteristics of short-term attitudes for a limited proportion of the population.

And this is enough to create the appearance of significant change. Even though party identification may be the equivalent of an attitude for only a relatively small proportion of the public, that proportion is large enough to cause the overall measure of party identification to fluctuate more than one would expect from a more stable characteristic, such as age or gender.

Weighting a sample to known demographic parameters is widely believed to be a beneficial procedure to correct for sampling errors in surveys. But all weighting decisions must balance the benefits of weighting against possible costs. Weighting has the serious effect of giving some respondents' opinions more weight and other respondents' opinions less weight than would be the case without weighting. This process changes the fundamental composition of the sample.

We are willing to do weighting if we know -- based on comparisons to official statistics -- that the sample is not perfectly representative of the larger population to which we want to generalize. But we are very cautious about doing this in other situations.

In summary, it is Gallup's assumption that party ID can vary from survey to survey for reasons beyond sampling error. These reasons include measurement error (for example, the effect of the questionnaire environment) and real population change.

There is no reliable measure of the distribution of party identification within the population, and to the degree that there is short-term population change in PID, there is no such thing as a stable population parameter of party identification. The distribution of party identification may vary in the total population from day to day, week to week.

Attempting to weight an entire sample based on a smoothed estimate of PID involves the impossible challenge of trying to isolate some proportion of the change in PID that results from sampling error and not "real" population change or simple measurement error. While weighting to a smoothed estimate could, in theory, help eliminate sampling error for a particular sample, it is not possible to know to what degree this is being done. Weighting to a smoothed estimate can also create more bias in a sample by changing that sample's overall composition in a way that a) incorrectly alters what is an estimate of a real change in the population or b) incorrectly alters an entire sample based on measurement error involved in one variable at the end of the survey questionnaire -- error that did not affect the measurement of variables included nearer the beginning of the questionnaire.

This is not to say that it is inappropriate to smooth the reporting of individual variables. Analysts may want to report a rolling average or other smoothed procedure in order to provide a longer-term perspective on the trends of a specific measure of interest. In other words, even if one assumes that survey-to-survey variation reflects real-world population change, one may want to look at data trends from a broader perspective. This effort to produce a smoothed estimate can be done for any given variable, including party identification.

This procedure, however, is best conducted on a variable-by-variable basis. The rationale for smoothing one variable (for example, party identification) and then weighting all other variables in a dataset to that smoothed average is less defensible for the reasons enumerated above.
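Smoothing a single reported measure, without reweighting anything else in the dataset, can be as simple as a trailing average over recent surveys. The sketch below is one possible way to do this; the series of Republican identification percentages and the three-survey window are hypothetical.

```python
def trailing_average(values, window):
    """Average each value with up to (window - 1) preceding values."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical percent identifying as Republican in five consecutive surveys.
republican_pct = [34, 37, 31, 36, 33]

smoothed = trailing_average(republican_pct, window=3)
print([round(x, 1) for x in smoothed])
```

Only the reported series for this one variable is smoothed here. The individual respondents' weights, and therefore the estimates for every other question, are left untouched.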

Gallup is constantly studying its procedures on these important issues, and welcomes additional research and data that may suggest other ways of proceeding.

Author(s)

Dr. Frank Newport is a Gallup Senior Scientist and the author of Polling Matters (Warner Books, 2004) and The Evangelical Voter.