Lee noted that participants in Detroit were very different from those in the combined St. Clair and Macomb sample. They differed in age, race, education level, employment level, and experiences of homelessness. Respondents also reported very different substance use. In Detroit, heroin was the drug of choice; in St. Clair and Macomb, other types of drugs were used in addition to heroin.
In the second example of respondent-driven sampling, the Health and Life Study of Koreans, the target population was foreign-born Korean American adults in Los Angeles County and in Michigan.
Korean Americans make up about 0. Unlike injection drug users, Korean Americans are rare, but not a highly stigmatized group. Frequently when immigrants come to the United States, they develop ethnic enclaves. These social networks are quite important to them. As a result, Lee and her colleagues thought that respondent-driven sampling might work for studying Korean Americans. This survey was conducted on the web.
Participants could visit the website to learn about the study and agree to participate in the survey. Each potential participant was provided with a unique number used to monitor participation. Incentives were provided using bank checks as a way to make sure no one took the survey more than once. The target sample size was As of January , the study ongoing at the time of the workshop had about completes.
What is unique is that some benchmarks about foreign-born Koreans are available from the American Community Survey (ACS), so sample estimates from respondent-driven sampling can be compared to ACS estimates. The study started with formative research, an important part of respondent-driven sampling.
Because the target population is likely to be unfamiliar to researchers, formative research is a way for researchers to understand the community and how to approach it. Lee and her colleagues conducted three rounds of focus groups, with just over 30 participants in total, with two groups in Korean and one in English. The discussion focused on the purpose of the study, respondent-driven sampling, and the use of coupons. Participants discussed these issues and provided input on the incentive levels to use in different components of the study.
From these discussions, Lee said, it became clear that researchers had to be very clear in describing the study purpose. Focus group participants understood the concept of respondent-driven sampling, and from the focus groups it seemed realistic for each participant to recruit two other people. Participants said that there should be no incentive for recruiting, contrary to guidance in the respondent-driven sampling literature. Researchers decided to use two coupons that expired within 2 weeks.
Lee said that they started with 12 seeds in Los Angeles in June Seeds were recruited through referral and selected to have balance on gender, age, and dominant language. Researchers conducted in-person meetings with each seed, describing the importance of the project and inviting them. After 2 weeks there were few new participants. Lee reported that as of January , they had completed interviews from seeds in Los Angeles.
In Michigan, they had completed interviews from 85 seeds. Lee observed that when seeds do not recruit other participants, the chain stops at the seed, which means some chains are very short. In this situation, the memoryless assumption is unlikely to hold. Lee provided comparisons between estimates computed five different ways from this study and estimates from the ACS. The first estimator was the unweighted mean. The fourth estimate was the unweighted estimate post-stratified by known population totals for age, gender, and education.
She observed that the unweighted estimates were quite far from the benchmark ACS. The sample recruited for the web survey was younger, more highly educated, and more likely to have limited English proficiency than the ACS benchmark. More surprising to researchers was that the web survey respondents had more issues with activities of daily living than did the ACS benchmark. The post-stratified RDS-II estimator was improved for age, gender, and education (the post-stratifying variables), but for the remaining variables some were improved and some not.
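The post-stratification step mentioned above can be illustrated with a minimal sketch. The function name, the two age strata, and the data below are hypothetical and are not taken from the study; the sketch only shows the general idea of reweighting stratum means by known population shares.

```python
from collections import defaultdict


def poststratified_mean(sample, population_shares):
    """Post-stratified mean over (stratum, value) pairs.

    population_shares maps each stratum to its known share of the target
    population (the shares are assumed to sum to 1).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stratum, value in sample:
        sums[stratum] += value
        counts[stratum] += 1

    empty = [s for s in population_shares if counts[s] == 0]
    if empty:
        raise ValueError(f"no respondents in strata: {empty}")

    # Weight each stratum's sample mean by its population share.
    return sum(
        share * sums[s] / counts[s] for s, share in population_shares.items()
    )


# Hypothetical data: the sample over-represents the "18-39" stratum.
sample = [("18-39", 1), ("18-39", 1), ("18-39", 0), ("40+", 0)]
shares = {"18-39": 0.5, "40+": 0.5}

unweighted = sum(value for _, value in sample) / len(sample)
adjusted = poststratified_mean(sample, shares)
print(unweighted, adjusted)
```

As in the comparison Lee described, such an adjustment aligns the sample with the benchmark on the post-stratifying variables themselves; it does not guarantee improvement on other variables.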
In summary, Lee said that the main conclusion is that noncooperation (not recruiting other people) is a problem for generating long chains, which brings the memorylessness property into question. It requires improvisation to make respondent-driven sampling work. In addition, sample size (and hence chain length) is a random variable in respondent-driven sampling.
As a result, inference is quite limited. The benefit of using respondent-driven sampling is to recruit people who are typically hard to recruit. However, noncooperation must be addressed to meet theoretical assumptions of respondent-driven sampling, and this has yet to be addressed in the literature.
Patrick Sullivan quoted from a July 20 issue of the Lancet about the difficulty in addressing HIV in much of the world where men who have sex with men (MSM) are in danger if their sex lives are exposed. He observed that this speaks to the evolution of trying to sample MSM for health issues.
Young men of color are a critical part of the expanding HIV epidemic. Sullivan said researchers may want to reach MSM for HIV prevention research, but it is also a population with significant health disparities with respect to mental health, cancer, substance use, and smoking. He described three ways to reach these men: (1) venue-based sampling, (2) online sampling, and (3) virtual venues like sex-seeking apps. Until about 20 years ago, same-sex behavior was criminalized, and bars and sex venues were the places to find gay men who often were not open in other locations. Sampling in these venues created a bias toward men who might have a higher level of sexual activity.
With decreased stigma and improvements in human rights and laws, gay men are more integrated into U. The process first involves formative work to enumerate venues. Venues are places where during a 4-hour period, at least 8 members of the target population are available.
In a recent study in Atlanta, where they were recruiting MSM, they identified such venues using a threshold of 30 men per time period. These enumerations need to be validated by stopping suspected members of the target population, asking about demographic characteristics, age, and other information. This universe of venues is a first-stage sampling frame.
Each venue is asked about time periods when it would be likely to meet the threshold, and each venue is assigned certain time periods when the threshold would be met (see "Surveillance of HIV risk and prevention behaviors of men who have sex with men: A national application of venue-based, time-space sampling," Public Health Reports). A sampling calendar is developed with venues as primary sampling units and time periods as secondary. Within a sampled venue, the flow of men across a specific point is observed and every nth man is approached.
Known as systematic flow-based sampling, the result is a sample that combines cluster sampling with flow-based sampling at the final stage. This process has been used every 3 years since , with a cycle completing in It decreased in young white MSM. Although the venues change from cycle to cycle and the venue sampling frame is refreshed for each cycle in each city, the methods have been consistent. Sullivan noted that in the surveillance reports, CDC does not do weighting or adjustment; it minimizes biases by having a consistent process but reports essentially raw numbers.
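The venue-time design described above (venues as primary units, time periods as secondary units, then every nth man in the flow) can be sketched as follows. This is only an illustration under simplifying assumptions: venues and periods are drawn with equal probability, and the venue names, periods, and function names are hypothetical rather than taken from the surveillance protocol.

```python
import random


def build_sampling_calendar(venue_periods, n_events, seed=None):
    """Draw (venue, period) recruitment events from a venue-time frame.

    venue_periods maps each venue to the periods in which it is expected
    to meet the attendance threshold. Equal-probability draws are an
    assumption of this sketch.
    """
    rng = random.Random(seed)
    venues = sorted(venue_periods)
    calendar = []
    for _ in range(n_events):
        venue = rng.choice(venues)                   # primary sampling unit
        period = rng.choice(venue_periods[venue])    # secondary sampling unit
        calendar.append((venue, period))
    return calendar


def systematic_flow_sample(flow, n, start):
    """Select every nth person crossing the counting point.

    start is the zero-based position of the first person approached.
    """
    if n < 1 or not 0 <= start < n:
        raise ValueError("need n >= 1 and 0 <= start < n")
    return flow[start::n]


# Hypothetical frame of venues and eligible time periods.
frame = {
    "bar_a": ["Fri 20:00-24:00", "Sat 20:00-24:00"],
    "gym_b": ["Mon 17:00-21:00"],
    "cafe_c": ["Sun 10:00-14:00", "Wed 16:00-20:00"],
}
calendar = build_sampling_calendar(frame, n_events=4, seed=1)

# People crossing the line during one event, numbered in order of arrival.
flow = list(range(1, 11))
approached = systematic_flow_sample(flow, n=3, start=2)
print(calendar)
print(approached)
```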
In academic publications, they account for clustering by venue and then adjust for differences in other demographic characteristics over time. Sullivan described work to evaluate the validity of the sampling frame of venues using a geospatial sex-seeking app.
Delaney used it to prepare maps of Atlanta showing the density of black and white MSM listed on the app. At each selected point on a grid, Delaney determined the radius of a circle, centered on himself, that would contain 50 sex-seeking men. These circles were used to estimate the relative density of MSM, separately for black and white men.
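The text does not say how the radii were converted into densities. One plausible reading, shown here purely as an assumption, is a nearest-neighbor style estimate: the count of men (50) divided by the area of the circle that contains them. The grid point names and radii below are hypothetical.

```python
import math


def density_from_radius(radius_km, k=50):
    """Estimated density (men per square km) at one grid point.

    Assumes density is approximated by k divided by the area of the
    circle of the given radius that contains k men.
    """
    if radius_km <= 0:
        raise ValueError("radius must be positive")
    return k / (math.pi * radius_km ** 2)


def relative_density(radii_km, k=50):
    """Scale the densities at all grid points so the densest point equals 1."""
    densities = {
        point: density_from_radius(r, k) for point, r in radii_km.items()
    }
    peak = max(densities.values())
    return {point: d / peak for point, d in densities.items()}


# Hypothetical radii (km) needed to reach 50 men at three grid points.
radii = {"midtown": 1.0, "suburb": 2.0, "exurb": 4.0}
print(relative_density(radii))
```

A smaller radius implies a higher density, so a point that needs twice the radius to reach 50 men has one quarter of the estimated density.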
This analysis identified a major area of high activity that was missing venues on the sampling frame: the Atlanta University Center, the site of four historically black colleges and universities.
Age-specific race and ethnicity disparities in HIV infection and awareness among men who have sex with men: 20 U. The Journal of Infectious Diseases.
Using a geolocation social networking application to calculate the population density of sex-seeking gay men for research and prevention services. Journal of Medical Internet Research, 16(11).
Bias in online recruitment and retention of racial and ethnic minority men who have sex with men. Journal of Medical Internet Research, 13(2).
Sullivan started his discussion of online sampling approaches by illustrating an evaluation of bias. The motivation was a review of data from online studies that found black and Hispanic MSM were underrepresented in studies relative to their prevalence in the population of New York City. One of their hypotheses was that visual features of an ad on the site might influence the probability that someone responds to it.
They developed two ads with white models, two with black models, and two with Hispanic models to test the hypothesis that the ads could help target under-recruited groups by matching ads with demographic groups. They also evaluated incomplete surveys, demonstrating that completion rates are highest for white men and lowest for black men, resulting in potential bias. Sullivan noted further work is needed to find incentives, including nonmonetary incentives, to increase participation rates and reduce bias.
The American Men's Internet Survey (AMIS) recruits through four different online approaches: (1) general social networking (such as Facebook or Twitter), (2) general gay interest (such as politics, advocacy, or style), (3) gay social networking, and (4) sex-seeking apps. In the past few years, the sample has included some men who took the survey in a previous year and expressed interest in participating again. It is an open question whether mixing or changing recruitment sources might change bias over time and affect time trend analysis. According to Sullivan, recruiting approach matters.
For example, MSM recruited through a sex-seeking app were more likely to have had an HIV test, to have had an STI test, to be living with HIV, and to report condom-less anal intercourse, and were somewhat more likely to use marijuana. As a result, in preparing survey results, AMIS uses standardization to a general population to look at time trends and compare estimates over time. Comparisons between estimates made with different approaches help to understand where they are the same and where different.
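A minimal sketch of direct standardization, the general technique behind comparing estimates over time against a fixed reference population, is shown below. The age groups, rates, and reference shares are hypothetical, and the actual standardization variables used by AMIS are not stated in the text.

```python
def directly_standardized_rate(stratum_rates, reference_weights):
    """Weighted average of stratum-specific rates using reference weights.

    reference_weights gives the size (or share) of each stratum in the
    chosen standard population; they need not sum to 1.
    """
    total = sum(reference_weights.values())
    weighted = sum(
        stratum_rates[s] * w for s, w in reference_weights.items()
    )
    return weighted / total


# Hypothetical testing rates by age group in two survey years.
rates_year1 = {"15-24": 0.40, "25-39": 0.60, "40+": 0.50}
rates_year2 = {"15-24": 0.45, "25-39": 0.60, "40+": 0.55}

# Hypothetical standard population, held fixed across years.
reference = {"15-24": 20, "25-39": 30, "40+": 50}

std_year1 = directly_standardized_rate(rates_year1, reference)
std_year2 = directly_standardized_rate(rates_year2, reference)
print(std_year1, std_year2)
```

Because the same reference weights are applied in every year, a change in the standardized rate reflects changes in the stratum-specific rates rather than shifts in the mix of who was recruited.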
He said online sampling can be viewed as a complement to other methods, and at a high level it seems to give trends consistent with other sampling methods. Sullivan provided one more comparison. One question they wanted to address was how men recruited through Facebook differed from those recruited through venue-based sampling in terms of HIV prevalence, STI prevalence, retention in the study, and risk behaviors. The results indicate that for most of the outcomes, the men recruited through Facebook had the same outcomes as those recruited through venue-based sampling.
Overall, he said, the two methods are complementary and, in this case, Facebook and venues were two different access points to largely similar populations. Sullivan stressed that MSM are the major risk group in the U.S. HIV epidemic and also have other health disparities. Venue-based sampling has the advantage of being a systematic approach. Expanding the types of venues beyond bars would eliminate some of the biases of going only to venues for meeting sex partners.
Online sampling can also be used to access MSM. However, black and Hispanic men are under-recruited, and have greater retention loss than white men. These biases need to be addressed.
Krista Gile began by summarizing the sampling approaches presented. She reminded the audience that the goal of identifying and finding members of small populations is to make statements about the whole population. To do this, researchers need to think about statistical issues, including (1) the size of the target population, (2) the population proportions of characteristics of interest, and (3) the associations between variables or multivariate results.
Many of these methods do not address quantifying uncertainty well. If uncertainty can be quantified, confidence intervals can be prepared and hypotheses tested. The ability to quantify uncertainty is therefore important when deciding which methods to use. Gile compared the sampling frames of the four methods discussed (probability sampling, respondent-driven sampling, venue-based sampling, and online sampling). A key point for probability sampling is to start with a sampling frame, such as a list of people.
All probability samples start from such a frame. As discussed by Lee, Gile said, respondent-driven sampling starts by selecting seeds, and the seeds lead to web-like network samples. The ultimate sample depends on who the seeds are, how they were selected (often by a convenience mechanism), and their network within the target population.
The comparability of men who have sex with men recruited from venue-time-sampling and Facebook: A cohort study. Journal of Medical Internet Research, 3(3).
Who ends up in the sample depends on where the process started. She expressed hope that improvements in implementation and inference and the property of memorylessness will ultimately overcome this negative aspect of respondent-driven sampling. In venue-based sampling, the population is divided into those who might be found at venues of interest and those who are not. There might be individuals who show up in multiple venues or in no venues at all. The basic sampling unit is a venue-time unit. In these settings, it is important to think about who is excluded and who may be overrepresented.
Gile noted that online sampling, through different websites or ads, might reach different parts of the target population. The ad might be displayed to individuals in different ways. The question remains who sees that ad and, critically, who is going to click on it. Gile compared the four methods on different points: elements of formative research and rapport, setting up the sampling frame and what is known about sampling rates and decisions about participation, methods for statistical inference (point estimates, confidence intervals), dependence between sampled individuals, and populations not suitable for each of the methods.
In these situations, the literature is quite extensive. For respondent-driven sampling, formative research is also important to learn about the target population, help to select diverse seeds, and get buy-in from the community.
PROBABILITY SAMPLING METHODS FOR SMALL POPULATIONS
In the end, the researcher wants a small number of seeds, and from there hopefully the study will spread. Formative research is also needed in online sampling. Everyone who answers survey questions, particularly within a sensitive group, is giving researchers time and information about her or his truth. The more trust, the better the quality of information provided and the more likely researchers are to get the answers they want. Gile noted that as a statistician, she does not usually think about these issues, but would rather see data from a place where people are thinking about how to authentically connect with the target population, which questions are relevant and will be answered well, and who is engaged in participating in the survey.
These factors influence the quality and completeness of the data. In particular, she said, respondent-driven and venue-based sampling require large amounts of trust. Researchers need to find seeds for respondent-driven sampling. For venue-based sampling, formative research is needed to find venue-times and develop relationships. All sampling methods are helped by knowing and having close connections with the target population. Gile commented that a survey can observe only the people within the sampling frame. A key question is who is in and who is out.
In a probability sample, the sampling frame hopefully covers everyone, although coverage needs to be assessed. In respondent-driven sampling, it is assumed that people are connected by a network, and that their self-reported number of ties reflects their rates of inclusion in the study. In venue-based sampling, the assumption is that people from the target population frequent the places sampled. In online sampling, people must visit the particular websites. If the differential sampling rates of people in the target population are known, estimates can be adjusted for the fact that different people within the frame are more or less likely to be sampled.
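The assumption that a respondent's self-reported number of ties reflects the rate of inclusion is what underlies inverse-degree weighting, as in the RDS-II estimator mentioned earlier in the discussion. The sketch below is a minimal illustration with hypothetical data; it treats inclusion probability as proportional to reported degree and ignores the variance estimation and diagnostics that a real analysis would need.

```python
def rds_ii_estimate(values, degrees):
    """Inverse-degree weighted mean of a respondent characteristic.

    values:  outcome for each respondent (for example, 0/1 indicators)
    degrees: self-reported number of ties for each respondent
    Assumes inclusion probability is proportional to reported degree.
    """
    if len(values) != len(degrees):
        raise ValueError("values and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")
    weights = [1.0 / d for d in degrees]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical sample: well-connected respondents all have the trait.
values = [1, 1, 0, 0]
degrees = [10, 10, 2, 2]

naive = sum(values) / len(values)
weighted = rds_ii_estimate(values, degrees)
print(naive, weighted)
```

Here the highly connected respondents are down-weighted because, under the assumption, they were more likely to be recruited, so the weighted estimate is well below the naive sample proportion.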
If those rates are not known, creative work is needed. It is important to think about what is driving differential sampling rates. In a probability sample, the design controls the different rates of inclusion for the different people in the sampling frame.
This allows for more straightforward and clean-cut inference, which is why probability sampling is the gold standard for survey research. In venue-based sampling, sampling probabilities might also depend on the extent of venue use. In online sampling, the website of interest and clicking on an ad are the two distinct features.
In venue-based and online sampling, there is discussion but no consensus on how to determine sampling weights for members of the population. In many probability samples with in-person interviewers, a potential respondent is approached by the interviewer. That person may refuse to participate, but the interviewer is close enough to help him or her decide whether to be in the study.
Similarly, in venue-based sampling, the potential respondent is approached by an interviewer. With respondent-driven sampling, coupons are passed out by other participants in the study. Researchers do not know how many people were approached and declined to participate. Similarly, with online sampling, decisions happen in privacy, and the researcher has no idea what goes into that process. The methods for statistical inference rely on sampling probabilities.
The probability sample, when it can be done, enables many powerful things with statistics. Respondent-driven sampling has many methods for inference, but requires many assumptions. With venue-based sampling, the sample is drawn at the first level on venue-times. However, inferences are desired for the population of people. Gile observed that it is unclear how the sample of venue-times can be extrapolated to inference for a population of people. With online data, perhaps post-stratification would be possible, but post-stratification requires valid reference data.
Low dependence between sampled individuals means that when a new individual is sampled, she or he will provide a great deal of additional information.
With high dependence between sampled individuals, the information may be similar to the information already collected. With venue-based sampling, there might be similarities among the people who frequent that venue during that time. Similarly, in respondent-driven sampling, the people recruited may be similar.
As a result, each additional person surveyed is going to provide less additional information than a process that involves independent samples. If there is no suitable sampling frame, probability sampling cannot be used. Gile noted that Elliott provided clever examples of how to do probability samples, but if a frame cannot be defined, it is not possible.
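The loss of information from dependence among sampled individuals is often summarized with a design effect. The sketch below uses the common approximation for equal-sized clusters, 1 + (m - 1) * rho, which is a standard survey-sampling formula and not one presented at the workshop; the cluster size and intraclass correlation are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters.

    cluster_size: respondents per cluster (for example, per venue-time)
    icc: intraclass correlation, the similarity of people within a cluster
    """
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n, cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return n / design_effect(cluster_size, icc)


# Hypothetical: 300 respondents, 10 per venue-time unit, modest similarity.
deff = design_effect(cluster_size=10, icc=0.1)
n_eff = effective_sample_size(n=300, cluster_size=10, icc=0.1)
print(round(deff, 2), round(n_eff, 1))
```

With no within-cluster similarity (an intraclass correlation of 0) the design effect is 1 and every respondent counts fully; as similarity grows, the effective sample size shrinks.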
Respondent-driven sampling needs people who are well connected by a network. Venue-based sampling needs people who congregate in a physical place. Online sampling needs a population with online activity who are likely to click on an ad. Gile said probability sampling allows for straightforward and valid inference in a wide variety of settings. If probability sampling is feasible, it is preferred for this reason.
Respondent-driven sampling is good at reaching unknown parts of the population and allows for approximately valid inference. Studies have shown that some respondent-driven sampling can get to people who might not have been reached by other methods. Venue-based sampling presents a valid sampling frame based on times and locations and avoids many biases that might occur in more subjective sampling methods.
Finally, online sampling offers great ease of implementation and tremendous cost-benefit over the other approaches. Gile posed a few questions for discussion. How can sampling weights for venue-based samples be estimated? How can the missing in online surveys be monitored? How can multiple methods for surveying a population be used? However, researchers and public health officials do not just want to find another person; they need to be fair and careful to not disadvantage certain populations.
This may mean that some of the methods used and developed seem less powerful on the surface than some of those used by other actors, but it is important to be responsible to constituents.
Gordon Willis (NCI) asked Gile about the choice of sample design for small population studies based on study objectives, and in particular whether the study involves estimation of population frequencies as opposed to identifying associations in the data.
For example, if he wanted to know the unemployment rate in a particular population, he would lean toward a population-based probability sample approach.
On the other hand, if he was assessing a stop-smoking intervention known to be effective in one population, it might be preferable to use a more limited and less expensive nonprobability approach to assess whether the intervention is effective in another small population. Gile said the method used should be dependent on what one wants to learn.
Power analysis and sample size calculations are intended to be used for this. Calculations can determine whether a larger or smaller sample can be used. If only a rough answer is needed, a more basic approach may be fine. If a very precise answer is needed, researchers need to do something more precise.
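As a simple illustration of how precision requirements drive sample size, the sketch below computes the number of respondents needed to estimate a proportion within a given margin of error. It assumes simple random sampling and the normal approximation, with an optional design-effect multiplier; these assumptions and the function name are illustrative and may not match any particular method discussed at the workshop.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, p=0.5, confidence=0.95, deff=1.0):
    """Respondents needed to estimate a proportion within +/- margin.

    p:    anticipated proportion (0.5 is the most conservative choice)
    deff: design-effect multiplier for non-independent samples
    """
    if not 0 < margin < 1:
        raise ValueError("margin must be between 0 and 1")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_srs = z ** 2 * p * (1 - p) / margin ** 2
    return math.ceil(n_srs * deff)


rough = sample_size_for_proportion(margin=0.10)
precise = sample_size_for_proportion(margin=0.05)
clustered = sample_size_for_proportion(margin=0.05, deff=2.0)
print(rough, precise, clustered)
```

Halving the margin of error roughly quadruples the required sample, which is one way to see why a rough answer can be obtained with a much more basic design than a precise one.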
Elliott added that another way to reframe the question would be to ask whether a regression coefficient is less biased than a population prevalence estimate when using a nonprobability sample. He said studies that have addressed this question suggest there is probably less bias on average in estimating the effectiveness of an intervention using a regression relationship. He cautioned that using a nonprobability sample to test effectiveness still has a chance of bias. Sullivan pointed out that for some populations there are no sampling frames, so probability sampling is not possible.
Respondent-driven sampling arose because of hidden populations. In some situations, nonprobability methods may be the only choice. He also observed that in some situations it is more important to monitor changes over time in health behaviors. In these situations, it may be better to compromise some of the accuracy of point estimates to make sure methods are sufficiently replicable to identify changes over time.
Graham Kalton pointed out two large projects with issues of generalizability: the UK Biobank cohort study that has enrolled around , people aged 40 to 69 in selected areas in the United Kingdom with a very low response rate, and the planned All of Us study in the United States that will enroll about 1 million volunteers. In his view, the argument that such studies can be used for measuring associations needs to be treated with due caution and evaluated. He questioned whether a seed or recruiter would use a random method. However, it was an assumption that respondent-driven sampling relied on when it first started.
She acknowledged it is tricky to check these assumptions because researchers frequently do not know much about the target population. Gile agreed that in many cases the assumption about recruitment being random is violated in respondent-driven sampling. One of her former students is working on an estimator that adjusts for differential recruitment effectiveness. Robert Croyle (NCI) asked about a hypothetical grant process where an application proposes something that is really a census.