Modeling and analyzing respondent-driven sampling as a counting process Academic Article uri icon


  • Summary Respondent-driven sampling (RDS) is an approach to sampling design and analysis which utilizes the networks of social relationships that connect members of the target population, using chain-referral. RDS sampling will typically oversample participants with many acquaintances. Naive estimators, such as the sample average, will thus be biased towards the state of the most highly connected individuals. Current methodology cannot estimate population size from RDS, and promotes inverse probability weighted estimators for population parameters such as HIV prevalence. We propose to use the timing of recruitment, typically collected and discarded, in order to estimate the population size via a counting process model. Once population size and degree frequencies are made available, prevalence can be debiased in a post-stratified framework. We adapt methods developed for inference in epidemiology and software reliability to estimate the population size, degree counts and frequencies. A fundamental advantage of our approach is that it makes the assumptions of the sampling design explicit. This enables verification of the assumptions, maximum likelihood estimation, extension with covariates, and model selection. We develop large-sample theory, proving consistency and asymptotic normality. We further compare our estimators to other estimators in the RDS literature, through simulation and real-world data. In both cases, we find our estimators to outperform current methods. The likelihood problem in the model we present is separable, and thus efficiently solvable. We implement these estimators in an accompanying R package, chords, available on CRAN.

publication date

  • March 1, 2017