Methodology

Sampling
Response Rate
Weighting
References

This door-to-door survey was fielded from May 16 to July 19, 2005, resulting in 1,001 completed interviews. Interviewing was conducted by International Communications Research (ICR) of Media, Pennsylvania. All interviews were carried out by 104 interviewers working throughout the contiguous United States. Eligible respondents were household members, male or female, age 18 or older. Respondents were selected using the most-recent-birthday method. There was no substitution of respondents within a household, nor any substitution across households.

Sampling

The study used a classic cluster sample design (on survey samples, see Kish 1965). The objective of this design is to provide an approximately self-weighting, or epsem, sample of households across the continental United States. The sample was designed to represent the adult population residing in occupied residential housing units; by definition it excluded residents of institutions, group quarters, and military bases. As a consequence of this sampling design, the margin of error for the survey is estimated to be ±3.08%.
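As a rough cross-check (a sketch only, not the study's design-based calculation), the conventional 95% margin of error for 1,001 interviews under a simple-random-sampling assumption can be computed as follows; the ±3.08% reported above is the survey's own estimate.

```python
# Rough 95% margin-of-error check for n = 1,001 completed interviews, using
# the conservative p = 0.5 simple-random-sampling approximation. This is only
# a sanity check; the +/-3.08% reported above is the study's own estimate.
import math

n = 1001
p = 0.5                                  # most conservative proportion
moe = 1.96 * math.sqrt(p * (1 - p) / n)  # half-width of the 95% confidence interval
print(f"+/- {100 * moe:.2f} percentage points")   # about +/- 3.10
```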

To ensure that the sample selection process produced a representative selection of primary sampling units (PSUs), the sample frame, consisting of all residential units, was stratified by the four Census regions and by metropolitan versus non-metropolitan status. Metropolitan statistical areas (MSAs) are defined by the Office of Management and Budget (OMB) on a county basis, except in New England; to maintain consistency, the alternative, county-based New England County Metropolitan Area (NECMA) definitions were used for that region. This scheme results in a four-by-two stratification matrix, which is shown in Table 1.

Table 1. Number of Primary Sampling Units per Stratum

STRATUM                        Households (000)    PSUs
Northeast - Metro                        19,901      17
Northeast - Non-Metro                     2,265       2
North Central - Metro                    19,269      18
North Central - Non-Metro                 6,519       6
South - Metro                            30,118      28
South - Non-Metro                         9,037       8
West - Metro                             19,710      18
West - Non-Metro                          2,914       3

Within each primary stratum, all counties, and by extension every census tract, block group, and household, were ordered in a strict hierarchical fashion to minimize sampling variance. Within each metropolitan stratum, MSAs and their constituent counties were arrayed by size (i.e., number of households). Within each MSA, the central-city county or counties were listed first, followed by all non-central-city counties. In the four non-metropolitan strata, states and individual counties within each state were arrayed in serpentine order, north to south and east to west. Within each county, census tracts and block groups were arrayed in numerical sequence, which naturally groups together households within cities, towns, and other minor civil divisions (MCDs).

The design assumed approximately five completed interviews per cluster and two clusters per PSU, for a total of 100 PSUs and 201 clusters. Allocation of these PSUs across the eight strata was proportionate to households; this allocation is also shown in Table 1.

Selection of PSUs within the eight primary strata was accomplished as follows (a sketch of one common PPS procedure follows this list):

1. An initial determination of selection stratum size was made by dividing the total number of households in the primary stratum by half the number of PSUs allocated to that stratum.
2. Any county with a household total within 75% of the overall selection stratum total was designated as a self-representing PSU.
3. An adjusted figure was then computed by removing the household totals of any self-representing PSUs. A revised selection stratum size was then computed and "paper boundaries" constructed for the remainder of the primary stratum. These selection stratum boundaries were allowed to cut across census boundaries, so that the selection strata were all of equal size within each of the eight primary strata.
4. Within each selection stratum, two PSUs (a county or portion of a county) were selected at random using a probability proportional to size (PPS) procedure.
5. Prior to PSU selection, small counties (those with fewer than 5,000 households) were aggregated with their smallest neighboring counties until the 5,000-household minimum was reached.
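The report does not specify which PPS algorithm was applied in step 4; the sketch below shows one common choice, systematic PPS selection over the ordered county list, with hypothetical inputs. Designating very large counties as self-representing (step 2) is what keeps any single unit from being hit more than once.

```python
# A sketch of systematic PPS selection (one common PPS procedure; the exact
# algorithm used in the study is not documented here). Inputs are hypothetical.
import random

def pps_systematic(units, sizes, n):
    """Select n units with probability proportional to size.

    units : ordered list of PSU identifiers (counties / county groups)
    sizes : household counts for each unit, in the same order
    """
    interval = sum(sizes) / n                 # sampling interval
    start = random.uniform(0, interval)       # random start within the interval
    points = [start + k * interval for k in range(n)]
    selected, cum, idx = [], 0.0, 0
    for p in points:
        # advance until the cumulative size reaches the selection point
        while cum + sizes[idx] < p:
            cum += sizes[idx]
            idx += 1
        selected.append(units[idx])
    return selected

# Hypothetical selection stratum with four candidate PSUs:
print(pps_systematic(["County A", "County B", "County C", "County D"],
                     [12_000, 8_000, 30_000, 5_000], n=2))
```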

Within each sample PSU, two block groups (BGs) were selected at random, without replacement, using a PPS procedure. All residential housing units within a sample BG were then identified using the U.S. Postal Service Delivery Sequence File (DSF), and one address was selected at random. The next fourteen residential addresses were then identified, along with any intervening commercial, vacant, or seasonal units. The result was a designated walking list that was supplied to each interviewer, along with a map showing the exact segment location, streets, addresses, etc. The street/address listing typically captures about 98% of all occupied housing units.
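A minimal sketch of constructing such a walking list, assuming the block group's DSF addresses are already in delivery-sequence order (the wrap-around for short lists is an assumption of this sketch, not a documented field rule):

```python
# Sketch: random start plus the next 14 addresses from a DSF listing that is
# already in delivery-sequence order. The wrap-around behavior is an
# illustrative assumption, not a documented procedure.
import random

def walking_list(dsf_addresses, n=15):
    start = random.randrange(len(dsf_addresses))
    return [dsf_addresses[(start + i) % len(dsf_addresses)] for i in range(n)]
```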

The street/address listings corresponding to the randomly selected starting point (housing unit) within each sample block group were secured and provided to the interviewer working each segment. As part of the fieldwork, the interviewer co-listed households not on the supplied pre-list, contacting both households on the supplied listing sheet and those identified and co-listed.

Interviewers were given street/address listings with 15 addresses, and were instructed to work the first ten pieces to a maximum of six callbacks. In order to properly manage the release of sample and strive to work all released sample to its maximum attempts, interviewers were asked to check in once they had attained five interviews or worked the first ten pieces to final dispositions or six active attempts, whichever came first. Throughout the field period the field director made daily decisions regarding whether each interviewer should continue working their first ten pieces or be provided more sample to work. Again, the overall goal was to attain a maximum number of attempts with as little sample as possible within a limited field period and an overall goal of approximately 1,000 completed interviews.

Response Rate

The overall response rate for this study was calculated to be 40.03% using AAPOR's (American Association for Public Opinion Research) Response Rate #3 formula [1]; Table 2 reports the dispositions of all attempted interviews, and a computational sketch of these rates follows the table. The Cooperation Rate is 50.84%. However, this sample was designed to be representative only of the English- and Spanish-speaking population of the United States. If we therefore omit the 87 potential respondents who spoke neither English nor Spanish, the response rate (according to the #3 formula) climbs to 41.47%. Although it is worrisome that refusals are considerably more numerous than failures to contact anyone, most refusals (620) were at the household level rather than refusals by a known selected respondent (348). Generally, the response rate for this survey can be considered above average, although it is below the rate for the highest-quality academic surveys (e.g., the General Social Survey, conducted by NORC).

Table 2. Outcomes of All Attempted Interviews

OUTCOMES                                                N
TOTAL PIECES WORKED                                 2,974

INTERVIEW (Category 1)
  Full interview                                    1,001

ELIGIBLE, NON-INTERVIEW (Category 2)
  Refusals                                            968
  Language problem                                     87

UNKNOWN ELIGIBILITY, NON-INTERVIEW (Category 3)
  No contact                                          536

NOT ELIGIBLE (Category 4)
  No such address                                      15
  Not a housing unit                                  158
  Vacant                                              153
  No access                                            96
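The following sketch shows how the reported rates can be reproduced from the Table 2 dispositions. The eligibility rate e applied to the unknown-eligibility cases is assumed here to be the eligible share of all cases with known eligibility, which matches the published figures to within rounding; the exact AAPOR case bookkeeping used in the study may differ in detail.

```python
# Reproducing the reported rates from the Table 2 dispositions (a sketch).
I   = 1001                    # completed interviews
R   = 968                     # refusals (eligible, non-interview)
LP  = 87                      # language problem (eligible, non-interview)
UNK = 536                     # no contact, eligibility unknown
NE  = 15 + 158 + 153 + 96     # not eligible: bad address, non-housing, vacant, no access

# e: estimated eligibility rate among unknown-eligibility cases, taken here as
# the eligible share of all cases whose eligibility is known (an assumption).
e = (I + R + LP) / (I + R + LP + NE)

rr3   = I / (I + R + LP + e * UNK)   # AAPOR Response Rate 3       -> ~40.03%
coop  = I / (I + R)                  # Cooperation Rate            -> ~50.84%
rr3_b = I / (I + R + e * UNK)        # RR3 omitting language cases -> ~41.47% (e held fixed)
print(f"{rr3:.2%}  {coop:.2%}  {rr3_b:.2%}")
```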

The respondents were provided an incentive of $50 to participate in the survey. Extant research shows that giving incentives to respondents has no appreciable impact on their responses (e.g., Singer, Van Hoewyk, and Maher 1998).

Weighting

Although the sample was designed to be approximately epsem, variations in primary stratum size, the presence of self-representing PSUs, and variations between expected and actual sample sizes within segments all required some minor weighting adjustments to achieve equal representation across the sample. The computation of these weights was a simple function of segment size and sample take, followed by a roll-up to the PSU level and then to the full stratum.

Two sets of weights were calculated, with three weights in each set (a Design Weight, a Post-Stratification Weight, and a Composite Weight). The only difference between the two sets is that the first does not adjust for the number of persons in the household, whereas the second does. The full set of weights is shown in Table 3:

Table 3. Full Set of Weights for the Survey

 

Weight                        Not Adjusting for      Adjusting for
                              Household Size         Household Size
Design Weight                 DWEIGHTA               dweight (used in ESS)
Post-Stratification Weight    PSWEIGHTA              PSWEIGHTB
Composite Weight              NATWT                  NATWTB

The Design Weights (DWEIGHTA and dweight) were calculated as follows. Within each primary stratum, selection strata were of equal size. Assuming the two PSUs and their associated two clusters each had equal sample takes, a simple expansion factor to the stratum total provided the Design Weight. When cluster sample sizes varied within a PSU, a weighting adjustment was required to compensate for this variation: the actual sample size was normalized to the expected take (e.g., if a cluster resulted in four interviews, it received an initial weight of 1.25). The weighted sums were then used to compute the normal expansion factor to the total stratum size. For the household-size-adjusted set of weights, an additional step was taken: the stratum adjustment factor was multiplied by the number of adults in the household. These weights initially summed to the number of households in the contiguous U.S., but were then factored down to sum to the total number of respondents, 1,001.
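A sketch of that logic follows, using a hypothetical respondent-level data layout (the production weighting was carried out by ICR; for the household-size-adjusted set, the number of adults in the household would be multiplied in before the final rescaling):

```python
# Sketch of the Design Weight logic described above. The data layout
# (columns 'stratum' and 'cluster') is hypothetical.
import pandas as pd

EXPECTED_TAKE = 5   # planned completed interviews per cluster

def design_weights(df, stratum_households):
    """df: one row per respondent; stratum_households: dict of households per stratum."""
    # 1. Normalize each cluster's actual take to the expected take
    #    (e.g., a cluster with 4 completes gets an initial weight of 5/4 = 1.25).
    cluster_n = df.groupby("cluster")["cluster"].transform("size")
    w = EXPECTED_TAKE / cluster_n
    # 2. Expand each stratum's weighted sum to the stratum household total.
    stratum_wsum = w.groupby(df["stratum"]).transform("sum")
    w = w * df["stratum"].map(stratum_households) / stratum_wsum
    # 3. Factor the weights down so they sum to the number of respondents.
    return w * len(df) / w.sum()
```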

The Design Weights were then applied in a post-stratification weighting procedure that adjusted for any disproportionality of the completed interviews relative to national U.S. Census Bureau Current Population Survey (CPS) estimates across race/ethnicity, age, education, and gender.

In the past, post-stratification adjustments were accomplished by multiplying the count in each sub-group by a number called a weight, defined as the ratio of the sub-group's proportion of the total population to its proportion of the sample. In order to apply the adjustment to all measurements derived from a dataset, the weight value was attached to each individual case. When a sample needed to be adjusted on multiple dimensions, the entire sample was divided into mutually exclusive cells corresponding to the possible combinations of sub-groups from each dimension, and a weight (a "cell weight") was computed for each cell.

Using cell ratios to weight a sample works well when the number of sub-groups to be adjusted is small, but when many dimensions are involved, the number of cells required quickly grows too large to compute weights using that method. For example, using cell ratios to adjust a sample on eight age groups, six income groups and four geographic regions would entail computing 192 (8 x 6 x 4) different cell weights. Furthermore, some of those sample cells may be empty, which would necessitate combining sub-groups in ways that might negatively affect the analysis.

An alternative to cell weighting is a method called sample balancing. Iterative proportional fitting, or IPF, is a widely accepted sample balancing technique originally developed by W. Edwards Deming and Frederick F. Stephan to adjust samples taken for economic and social surveys on selected demographic characteristics against data obtained from the U.S. Census. The theory behind IPF is explained in Deming’s book Statistical Adjustment of Data (1964). Details on the Deming-Stephan method are spelled out in Chapter VII: “Adjusting to Marginal Totals.”

IPF uses least-squares curve fitting algorithms to obtain a unique weight for each case that minimizes the root mean square error (RMSE) across multiple dimensions simultaneously. Then it applies these weights to the data and repeats the procedure using the newly obtained marginal counts to obtain yet another set of weights. This process is repeated for a specified number of iterations or until the difference in the RMSE between successive steps becomes less than a specific minimum value.
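QBAL's internal algorithm is not reproduced here, but the core idea of iterative proportional fitting (raking an input weight toward marginal targets) can be sketched as follows; the column names, targets, and convergence rule are illustrative assumptions, not QBAL's implementation.

```python
# A minimal raking / IPF sketch (illustrative only). It assumes every sample
# category appears in the targets and vice versa.
import pandas as pd

def rake(df, targets, weight_col, max_iter=50, tol=1e-6):
    """Adjust weights so weighted margins match target totals.

    targets: dict mapping a column name (e.g., 'age_group') to a dict of
    {category: target total}; each variable's targets should sum to the same
    grand total.
    """
    w = df[weight_col].astype(float).copy()
    for _ in range(max_iter):
        max_shift = 0.0
        for var, goal in targets.items():
            margin = w.groupby(df[var]).sum()                  # current weighted margin
            factors = {cat: goal[cat] / margin[cat] for cat in goal}
            adj = df[var].map(factors)
            max_shift = max(max_shift, float((adj - 1.0).abs().max()))
            w = w * adj
        if max_shift < tol:                                    # margins effectively matched
            break
    return w
```

In the terms of this study, running such a procedure on the Design Weights with 2004 CPS targets for age, race/ethnicity, gender, and education would correspond to the step that produced the composite weights described below.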

This study employed an IPF procedure using the statistical software QBAL [2]. Not only is QBAL "industry standard" software for sample-balancing post-stratification, but it also allows a pre-existing weight to be applied to the input data for the sample-balancing process. Accordingly, the pre-weight described above (the Design Weight) was entered into the program, and post-stratification targets were entered for age, race/ethnicity, gender, and education based on 2004 CPS estimates. This process produced the composite weighting variables (NATWT and NATWTB).

Since the Post-Stratification Weights (PSWEIGHTA and PSWEIGHTB) are by definition the ratio of the Composite Weights to the Design Weights, they were calculated by simply dividing the Composite Weight by the Design Weight and then factoring the result so that the weights sum to 1,001.
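Under that definition, the derivation reduces to a division followed by a rescale (a sketch using the Table 3 names; the inputs are assumed to be per-respondent weight vectors):

```python
# Sketch of the Post-Stratification Weight derivation (Table 3 names).
def post_strat_weight(composite, design, n=1001):
    """composite, design: per-respondent Composite and Design Weight vectors
    (e.g., pandas Series holding NATWT and DWEIGHTA)."""
    ps = composite / design
    return ps * n / ps.sum()      # rescale so the weights sum to 1,001
```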

Given that this multiplicity of weights can be confusing for many users, and since several of the weights are unlikely to be used at all, the dataset being released includes the following three weights: NATWT, NATWTB, and dweight. (If users would like to receive a version of the dataset that includes the other three weights, it is available upon request.) We strongly recommend that anybody who is conducting cross-national analysis of the U.S. in comparison to ESS data use dweight, since it corresponds to ESS weighting procedures (i.e., household size adjustment, but no post-stratification), and in general we would recommend using this weight, even for those working on the U.S. data only. The default weight on the dataset is therefore set to dweight, but users should feel free to change the weight variable as suits their needs and preferences.

 

References

American Association for Public Opinion Research. 2004. Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys. [http://www.aapor.org/pdfs/standarddefs_3.1.pdf, accessed 9/22/2005].

Deming, W. Edwards. 1964 [1943]. Statistical Adjustment of Data. New York: Dover Publications.

Kish, Leslie. 1965. Survey Sampling. New York: John Wiley & Sons.

Singer, Eleanor, John Van Hoewyk, and Mary P. Maher. 1998. “Does the Payment of Incentives Create Expectation Effects?” Public Opinion Quarterly 62 (#2, Summer): 152-164.

[1] See American Association for Public Opinion Research 2004.

[2] See http://www.jwdp.com/files/qbguide.pdf [accessed 9/16/2005].