PRINCETON, NJ -- Survey response rates have become a hot topic. A number of articles written after the November midterm elections focused on Senate races in which pre-election polls did not pick the correct winner, and blamed these miscalls on declining response rates. Syndicated columnist Arianna Huffington, for example, has made it her business to argue that lower response rates invalidate the entire survey or polling process.
A lot of this discussion is misinformed and inaccurate. In fact, the pre-election polls this year were quite accurate in general; the issue of declining response rates has been studied and restudied for years by survey scientists; and to date there is little evidence that lower response rates per se are having an appreciably negative effect on the reliability or viability of survey results.
I think it's important to understand the exact implications of declining response rates for the overall quality of poll results.
Let's step back for a moment and look broadly at the entire survey research process. The sampling process used in polls is one that at its most basic involves proceeding from large groups to small groups -- in a series of steps. The largest group is the population whose characteristics the process is designed to estimate. The smallest group is the final sample from which interviews are collected. The first and primary step in the survey process is drawing the initial random sample from the population. This first, basic sample is only a fraction of the size of the population and is drawn such that it is statistically representative of the overall population.
Several steps occur between the drawing of the first sample and the determination of the final sample from which completed interviews are collected. The final sample is almost always a smaller subset of the first sample. The ratio of the final sample's size to the initial sample's size is -- crudely -- the response rate. The focus of all the attention has been on this ratio, with the implication that the failure to complete interviews with everyone in the initially drawn sample discredits the final results.
But this difference between the number of elements in the initial sample and the number of completed interviews in the final sample isn't fatal to the survey process as long as the principles of randomness and equal probability of selection are preserved at each stage of the process.
The basic sample is drawn with care and precision, designed in every way to be wholly representative of the population under consideration. Thus, it would be nice and simple if pollsters just completed interviews with each person selected in the basic sample. But this is almost never possible.
For one thing, basic telephone samples often contain nonworking numbers and businesses, and in today's environment increasingly include phone numbers used for faxes and computers rather than for general household use. These numbers have to be thrown out. Second, it is often not possible to complete an interview with every household that falls into the basic sample, because not every household can be contacted (the persons who live there, for example, may be on an extended vacation during the interviewing period, or they may be always on the phone). Third, some of the individuals contacted in a sample simply refuse to be interviewed.
Let's look at this process in detail. We have assumed that a small group (the initial sample) is selected randomly from the large population to which the results are to be generalized. So far, so good. That is step one in the "large to small" process. For purposes of illustration, assume for the moment that this initial sample consists of 1,000 households, selected using accepted principles of random probability sampling. The next step is to attempt to contact these 1,000 households. A percentage of them will fall out of the sample because they are businesses or nonworking numbers. That may reduce the sample down to, let's say, 800 numbers.
Then, of those 800, another 200 households may be unreachable in the time frame allocated by the researcher. Their phone numbers may always be busy, or household members at these numbers may use caller ID or other screening devices and refuse to answer if they don't recognize the calling party. Now the sample is down to 600 phone numbers. Assume that a live person answers at each of these 600 households. But perhaps 200 of those live persons will refuse to participate in the interview. This brings the sample down to 400 people with whom we complete the final interview.
So, we have moved from the target population (which in the case of the general population of America, consists of about 100 million households) to a sample of 1,000. From that sample of 1,000, we have moved to a sample of 800 because we had to throw out businesses and nonworking numbers. We reduced that sample of 800 further, to 600, as a result of being unable to reach 200 households. Finally, we ended up with 400 people with whom we actually completed interviews.
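The arithmetic of this hypothetical funnel can be laid out in a few lines of code. The counts are the illustrative numbers above, not figures from any actual poll, and the "eligible-number" rate shown alongside the crude rate is an additional calculation offered for comparison:

```python
# Walking through the hypothetical sample "funnel" described above.
# All counts are illustrative, not real survey data.
initial_sample = 1000          # randomly drawn phone numbers
nonworking_or_business = 200   # ineligible numbers, thrown out
unreachable = 200              # never contacted during the field period
refusals = 200                 # contacted, but declined the interview

eligible = initial_sample - nonworking_or_business   # 800 numbers
contacted = eligible - unreachable                   # 600 households
completed = contacted - refusals                     # 400 interviews

# The crude rate compares completions to the initial draw; a stricter
# variant counts only eligible (working, residential) numbers.
crude_rate = completed / initial_sample
eligible_rate = completed / eligible
print(f"completed: {completed}")
print(f"crude response rate: {crude_rate:.0%}")        # 40%
print(f"eligible-number rate: {eligible_rate:.0%}")    # 50%
```

Which denominator a pollster uses changes the reported response rate substantially, which is one reason quoted rates vary from study to study.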
Theoretically, each of these reduction steps is OK. Our goal is to end up with a sample that is representative of the population, and this can be accomplished by maintaining the randomness of the process at each step. Statisticians tell us that a random sample of a random sample is still random.
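The statisticians' point can be checked with a quick simulation. The sketch below builds a toy population in which exactly 30% hold some opinion (an arbitrary figure chosen for illustration), draws a random sample, then a random subsample of that sample:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# A toy population of 100,000 in which exactly 30% hold some opinion.
population = [1] * 30_000 + [0] * 70_000

first_sample = random.sample(population, 1000)   # step one: random draw
final_sample = random.sample(first_sample, 400)  # random subset of the sample

print(f"population: {sum(population) / len(population):.3f}")     # 0.300
print(f"first sample (n=1,000): {sum(first_sample) / 1000:.3f}")
print(f"final sample (n=400): {sum(final_sample) / 400:.3f}")
```

Both sample proportions will typically land within a few percentage points of the true 30% -- ordinary sampling error of the kind a margin-of-error calculation already accounts for.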
Error can creep into the process at several points. If the initial sample is not truly representative of the population, then the whole process falls apart. And if the smaller, final sample is not representative of the initial sample, the process also becomes less valid.
But, to repeat, a low response rate does not necessarily mean that a survey's results are unrepresentative. The potential problems with survey accuracy are not intrinsic to a low response rate. The fact that a sample is reduced in size in going from an initial sample to the final sample is of little consequence as long as the opinions of those who answer the questions at the end are not systematically different from those who are unavailable or decline to participate. The potential problems occur if there is nonresponse bias, or a failure of the process at each step to include samples that are representative of the ones from which they are drawn.
Nonresponse bias can occur, for example, if a certain type of person is more likely than others to be infrequently at home or to refuse to be interviewed (say, those with the highest levels of education, or Republicans, or redheads, or women under age 30, or individuals with strongly skewed views on the topic under consideration). Again, the problem would not be that the final sample was smaller than the initial sample (a lower response rate), but rather that the people who ended up in the final sample were somehow different from those in the initial sample (nonresponse bias).
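A toy simulation makes the distinction concrete. In the invented scenario below, a low-but-even response rate leaves the estimate intact, while response behavior that is linked to opinion (the 30%/50% refusal split is hypothetical) skews it, even though the two cases have roughly the same overall response rate of about 40%:

```python
import random

random.seed(7)  # fixed seed for reproducibility

# Toy population: each person holds view "1" with 50% probability.
population = [random.random() < 0.5 for _ in range(100_000)]

# Case A: everyone responds with the same 40% probability (low but even).
even_nonresponse = [v for v in population if random.random() < 0.4]

# Case B: view-"1" holders respond at 30%, everyone else at 50%
# (a deliberately opinion-linked response pattern, invented for illustration).
biased_nonresponse = [v for v in population
                      if random.random() < (0.3 if v else 0.5)]

print(f"true share: {sum(population) / len(population):.3f}")
print(f"even dropout: {sum(even_nonresponse) / len(even_nonresponse):.3f}")
print(f"biased dropout: {sum(biased_nonresponse) / len(biased_nonresponse):.3f}")
```

Even dropout leaves the estimate near the true 50%; opinion-linked dropout drags it toward 37.5% (0.5 × 0.3 ÷ (0.5 × 0.3 + 0.5 × 0.5)). It is the link between responding and the opinion being measured, not the low rate itself, that does the damage.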
In theory, it seems logical that one way to reduce the potential for nonresponse bias is to take every step to ensure that all households in a basic sample wind up in the resulting pool of completed interviews. Some pollsters attempt to "work the sample to death" -- in other words, to call and call and call every number in the sample until an interview is completed with someone at that number. Doing this reduces the possibility of bias arising from the lower probability of selection for those who are infrequently at home. To put it another way, with a 20 call-back design (meaning that every number falling into the sample is called back at least 20 times in an effort to complete an interview), the young single person who is seldom at home gets a chance of falling into the completed pool that is much closer to that of the older person who is almost always at home and is reached on the first call.
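The logic of the call-back design can be sketched with a simple probability calculation. The per-call chances of finding someone at home (10% and 80%) are assumptions chosen for illustration, and each call is treated as an independent trial:

```python
# A sketch of why repeated call-backs equalize the chance of being contacted.
# The per-call probabilities below are assumptions for illustration only.

def contact_probability(p_at_home: float, attempts: int) -> float:
    """Chance of reaching a household at least once in `attempts` calls,
    treating each call as an independent trial."""
    return 1 - (1 - p_at_home) ** attempts

# A rarely-home person (10% chance per call) vs. an often-home person (80%).
for attempts in (1, 5, 20):
    rare = contact_probability(0.10, attempts)
    often = contact_probability(0.80, attempts)
    print(f"{attempts:2d} attempts: rarely home {rare:.0%}, often home {often:.0%}")
```

With a single call the rarely-home person is reached only about 10% of the time; after 20 call-backs that climbs to roughly 88%, much closer to the near-certainty of reaching the often-home person.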
Additionally, special efforts can be made to "convert refusals" by assigning special interviewers who do nothing but work with people who are reluctant to be interviewed and persuade them to complete the interview. This helps reduce any bias that might result from a certain type of person being more likely than another to refuse to participate.
Government contracts often include design specifications that call for these types of ideal processes: making high numbers of attempts to complete interviews with every number in the sample, attempting to convert refusals, and in general making every effort to get a completed interview out of every working residential number falling into the sample.
This takes an enormous amount of time and money, however. Some government surveys, for example, may be in the field for months in order to allow time for all of these procedures to take place. That is all well and good for some types of research, but if we are interested in gauging Americans' reactions to fast-breaking news events, keeping surveys in the field for three months just won't work.
Most polls are also weighted at the end of the process so that any obvious discrepancies between the sample's distribution on key demographic characteristics and the known distribution of these characteristics in the population can be adjusted for.
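As a rough sketch of how such weighting works, the snippet below uses hypothetical gender figures: each group's weight is its known population share divided by its share of the completed sample:

```python
# A minimal post-stratification sketch. The shares below are hypothetical,
# chosen only to illustrate the arithmetic of demographic weighting.
population_share = {"men": 0.48, "women": 0.52}   # known distribution
sample_share = {"men": 0.40, "women": 0.60}       # what the poll obtained

# Each group's weight = population share / sample share.
weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Under-represented men get a weight above 1; over-represented women, below 1.
for group, w in weights.items():
    print(f"{group}: weight {w:.3f}")

# After weighting, the sample's weighted shares sum back to 1 and match
# the population distribution on this characteristic exactly.
weighted_total = sum(sample_share[g] * weights[g] for g in weights)
```

Weighting of this kind can correct obvious demographic imbalances, but only on characteristics whose true distribution is known; it cannot, by itself, fix nonresponse bias on the opinion being measured.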
But a lot of these efforts may not matter. Perhaps the most important finding of all in relation to the response-rate issue is this: There is not a lot of evidence of nonresponse bias in the process, no matter how hard survey scientists look. Most current research shows that lower response rates do not have nearly as much of an effect on survey results as might have been thought. The compromises brought into the process by the need to move quickly, and thus to accept lower response rates, don't seem to seriously harm the quality or the representativeness of the data. Thus, what seems like a problem in theory is hard to prove in practice. Most studies show that the results obtained from final samples in which response rates are low are quite close to those found when efforts are made to increase response rates significantly. Additionally, despite the media focus after the November 2002 midterm election on the miscalls of certain senatorial and gubernatorial races, most pre-election polls with relatively low response rates were remarkably accurate in their estimates of the final, actual votes.
Pollsters continue to monitor the situation very carefully, but it is the belief of most professional survey researchers today that the negative impact of lower survey response rates sounds more dramatic than it really is.