INTEGRATING PROBABILITY AND NONPROBABILITY SAMPLES FOR SURVEY INFERENCE

Survey data collection costs have risen to a point where many survey researchers and polling companies are abandoning large, expensive probability-based samples in favor of less expensive nonprobability samples. The empirical literature suggests this strategy may be suboptimal for multiple reasons, among them that probability samples tend to outperform nonprobability samples on accuracy when assessed against population benchmarks. However, nonprobability samples are often preferred due to convenience and costs. Instead of forgoing probability sampling entirely

evaluate supplementing inferences based on small probability samples with prior distributions derived from nonprobability data. We demonstrate that informative priors based on nonprobability data can lead to reductions in variances and mean squared errors for linear model coefficients. The method is also illustrated with actual probability and nonprobability survey data. A discussion of these findings, their implications for survey practice, and possible research extensions are provided in conclusion. KEYWORDS: Bayesian inference; Data integration; Online panels; Quota sampling; Web surveys.

INTRODUCTION
For more than a decade, the survey research industry has witnessed an increasing competition between two distinct sampling paradigms: probability and nonprobability sampling. Probability sampling is characterized by the process of drawing samples from a population using random selection, with every population element having a known (or knowable) nonzero inclusion probability. In contrast, nonprobability sampling involves some form of arbitrary selection of elements into the sample for which inclusion probabilities are unknowable (and possibly zero for some population elements). Both paradigms have strengths and weaknesses. The primary appeal of probability sampling is its theoretical basis in design-based inference, which permits unbiased estimation of the population mean along with measurable sampling error. However, in practice, unbiased estimation is not assured as response rates in probability surveys can be quite low. Another challenge of probability sampling is the need for large sample sizes for robust estimation, which can be problematic for survey organizations working with small-to medium-sized budgets.
A key advantage of nonprobability sampling, relative to probability sampling, is costs. Nonprobability samples can be drawn and fielded in a number of relatively inexpensive ways. The most common way is through volunteer web panels where survey firms entice large numbers of volunteers to take part in periodic surveys in exchange for money or gifts (Callegaro, Baker, Bethlehem, Göritz, Krosnick et al. 2014). However, nonprobability sampling has limitations. For instance, the lack of an underlying mathematical theory akin to probability sampling is problematic with respect to achieving accuracy and measuring uncertainty (sampling error) for estimates derived from nonprobability samples (Baker, Brick, Bates, Battaglia, Couper et al. 2013). Most benchmarking studies show that nonprobability samples tend to be less accurate than probability samples for descriptive population estimates (Malhotra and Krosnick 2007;Chang and Krosnick 2009;Yeager, Krosnick, Chang, Javitz, Levendusky et al. 2011;Blom, Ackermann-Piek, Helmschrott, Cornesse, and Sakshaug 2017;Dutwin and Buskirk 2017;MacInnis, Krosnick, Ho, and Cho 2018;Pennay, Neiger, Lavrakas, and Borg 2018). Multivariate estimates (e.g., regression coefficients), on the other hand, tend to be less susceptible to discrepancies between probability and nonprobability samples (Ansolabehere and Rivers 2013;Pasek 2016).
A variety of methods have been proposed to improve the accuracy of nonprobability sample survey estimates, such as quota sampling, sample matching, and weighting techniques. These methods attempt to adjust the composition of the nonprobability sample to that of a reference probability sample or population figures based on a set of adjustment variables (Lee 2006;Rivers 2007;Lee and Valliant 2009;Rivers and Bailey 2009;Valliant and Dever 2011;Ansolabehere and Rivers 2013). These methods generally assume that the adjustment variables-which often comprise demographic characteristics-explain the selection mechanism that led to inclusion in the nonprobability sample. This can be a strong assumption if the target variable of interest subject to selection bias is not strongly related to the adjustment variables. Moreover, these methods do not solve the problem of quantifying uncertainty in nonprobability estimates, which is why measures of variability are infrequently reported alongside nonprobability survey estimates.
Because neither sampling paradigm is a panacea, efforts have been undertaken to combine both probability and nonprobability samples to produce a single inference that compensates for the limitations of each standalone paradigm. Elliott and Haviland (2007) evaluate a composite estimator to supplement a standard probability sample with a nonprobability sample. They show that the estimator, based on a linear combination of both sample types and a bias function, can produce estimates with a smaller mean squared error (MSE) relative to a probability-only sample. DiSogra, Cobb, Chan, and Dennis (2012), with further enhancements by Fahimi, Barlas, Gross, and Thomas (2014), propose a method they coin "blended calibration," where a probability sample weighted to known population totals is combined with an unweighted nonprobability sample, and the combined sample is calibrated to differentiator variables in the probability-only sample. The authors present evidence that this method produces estimates with smaller bias and MSE relative to more traditional sample adjustment methods. Elliott (2009) (see also Elliott and Valliant 2017) proposes a pseudo-design-based estimation procedure that uses a probability sample to estimate pseudo-inclusion probabilities for elements of a nonprobability sample. Both samples are then combined to derive estimates that are shown to have improved accuracy and smaller MSE compared with estimates derived from a probability-only sample.
A limitation of these studies is the necessity of a large probability sample to produce robust calibration weights or pseudo-inclusion probabilities. Given the high costs of probability samples, it is unlikely that survey firms can always afford to field a large probability sample in parallel with a nonprobability sample. A more practical scenario-one we consider in the present investigationis to field a small probability sample survey and integrate it with a parallel nonprobability sample survey to improve the efficiency (i.e., reduce the variance) of the small sample estimates. Such a strategy is commonly used in small area estimation where relatively inexpensive auxiliary information collected from external data sources is used to improve the efficiency of estimates derived from small samples for specified domains (Rao 2003).
In this article, we consider a Bayesian approach for integrating a relatively small, presumably unbiased, probability sample with a parallel, relatively less expensive, but possibly biased, nonprobability sample. The Bayesian framework offers several advantages in this context. First, the Bayesian framework offers a natural apparatus for integrating data from different sources with varying levels of quality. Second, the Bayesian framework provides an intuitive structure that incentivizes the higher-quality data (in our case, the probability sample survey data) by giving decreasing weight to the lower-quality "prior information" (in our case, the nonprobability sample data) as the number of high-quality (i.e., probability sample) observations increases. Further, the Bayesian framework is capable of quantifying the uncertainty in estimation-a feature which is currently lacking for estimates based on nonprobability samples. A related idea of integrating probability and nonprobability samples is also explored in Sakshaug, Wi sniowski, Perez Ruiz, and Blom (2019) who describe a simulation-based approach rather than the simple and direct analytical derivations presented in this article.
The aims of this research are threefold. First, we evaluate the extent to which informative nonprobability-based prior information can reduce the variability of regression estimates derived from small-and medium-sized probability samples. Further, we evaluate the trade-off between variance and bias, particularly when the nonprobability samples are subject to large biases. Lastly, we evaluate the MSE of estimates for different sample sizes and bias parameters to determine whether the method is likely to be practically useful from an error perspective. The evaluation is performed using simulation studies and a real-world application involving a probability sample web survey and eight nonprobability sample web surveys conducted in parallel by different survey vendors using the same questionnaire.
The remainder of the article is organized as follows: In section 2, we describe the methodology and notation. In section 3, we describe the setup and present results of the simulation assessment. In section 4, the data sources are described and results of the application are presented. Final conclusions and limitations of the method are discussed in section 5.

METHODOLOGY AND MODELING APPROACH
We consider the situation in which the researcher's primary interest is to produce estimates of effect sizes for covariates specified in a model for predicting a certain outcome variable. For that purpose, we employ a linear regression model in which we estimate the coefficients. Let us denote the n Â 1 vector of the response variable by y ¼ ðy 1 ; . . . ; y n Þ T and an n Â p design matrix X ¼ ½x 1 ; . . . ; x p containing the fixed and nonstochastic covariates that are of substantive interest. We assume that y $ NðXb; s À1 I n Þ; (1) where b ¼ ðb 1 ; . . . ; b p Þ 0 is a column vector of length p, s is a precision (inverse variance, s ¼ 1=r 2 ) parameter, and I n is the n Â n identity matrix. This specifies the likelihood pðyjb; s; XÞ for the outcome variable.
For a fully Bayesian model specification, we require a prior distribution for model parameters b and s in (1) to produce a marginal posterior distribution for the coefficient of interest, which might be one (or many) of the elements belonging to vector b. The posterior, pðb; sjy; XÞ, is obtained by applying Bayes theorem (Bayes 1763): pðb; sjy; XÞ / pðyjb; s; XÞ Â pðb; sÞ; (2) where pðb; sÞ ¼ pðbjsÞpðsÞ is the prior distribution for the model parameters. This specification is scale-dependent because the prior for b coefficients depends on the scale of the outcome variable as measured by s. We employ a conjugate prior specification, that is, a probability distribution that leads to the posterior distribution belonging to the same distributional family as the prior (Carlin and Louis 2008). The conjugate prior for pðsÞ is a gamma distribution with shape parameter a and rate b: pðsÞ / s aÀ1 e Àbs : (3) For a ¼ b ¼ 0, this distribution reduces to Jeffrey's noninformative invariant prior (Jeffreys 1946;Zellner 1971), which can be interpreted as an improper uniform distribution for log r. This noninformative prior for precision is used throughout this article. The conjugate prior for the regression coefficients vector b is a multivariate normal distribution: where subscript p denotes the length of the vector b, k 0 is a scalar, and V is a p Â p matrix of hyperparameters. This specification is an extension of the version presented by Ntzoufras (2011) as it allows a separate specification of the matrix and scalar scaling factors for the prior variance of the regression coefficients. Then, the joint posterior distribution is of the normal-gamma form (Ntzoufras 2011, p. 11). Our interest is in the marginal distribution of b, which is a multivariate noncentred t distribution with expectation and variance RSS ¼ ðy À XbÞ T ðy À XbÞ; where I p is the p Â p identity matrix. Equation (7) is a maximum likelihood estimator (MLE) for the linear regression coefficients. Matrix W in (6) can be viewed as a "weight" assigned to a data-driven MLE and I p À W a weight for the prior. If the scaling factor, k 0 V, is very large (implying large prior variance for b), the weight assigned to the prior mean l b will be very small. Alternatively, if the prior for b is relatively tight, more weight is assigned to the prior mean. The RSS in (10) is the residual sum of squares in the classical regression analysis, whereas SS is the sum of RSS and a measure of the distance between the MLE for b and the prior mean (Ntzoufras 2011, p. 13). The marginal posterior for the precision is a gamma distribution with shape parameter n=2 þ a and rate SS=2 þ b. With a noninformative Jeffrey's prior, i.e., a ¼ b ¼ 0 in (3), the expected value of the posterior is equivalent to the MLE for precision.

Specification of the Prior Distributions
We consider five specifications of the prior in (4) for the regression coefficients. The first one is a reference specification, hereafter referred to as "noninformative," in which we "let the (probability) data speak for themselves". That is, we assume This specification leads to a posterior (2) with the expectation of b in (5) being with W ¼ X T X þ ð1=nÞI p À1 X T X. This prior allows a probability sample of size n to dominate the inference. With n becoming larger, the posterior converges to the MLE. Interestingly, the expected value of the posterior (equation 12) is a ridge estimator (Hoerl and Kennard 1970;Amemiya 1985). By using (8) and (9-10), the posterior variance of b is fRSS þb T ½ðX T XÞ À1 þ nI p À1b g; (13) which, for sufficiently large n, yields values equivalent to the MLE variance. The second prior specification, denoted as "conjugate," is informative in the sense that it utilizes a nonprobability sample to inform the posterior. Here, we assume 1ðp Hotelling < 0:05Þ þ 1 n NP 1ðp Hotelling ! 0:05Þ; whereb NP is an MLE of the coefficients in a regression model based on a nonprobability sample of size n NP , and 1ðwÞ is an indicator function taking the value one if w is true and zero otherwise. Additionally, p Hotelling is a p-value from a Hotelling T 2 test for equality of two vectors (Anderson 1984;Khuri 2003, p. 48). The Hotelling T 2 test is a multivariate generalization of the Student t test. In our application of the test, the null hypothesis is H 0 :b NP ¼b, and we assess whether the two vectors of coefficients are equal. The implicit assumption is that the results based on the probability sample are unbiased with respect to the target population. Hence, if the difference between all coefficients based on the nonprobability sample,b NP , is statistically significant at the 5 percent level (p Hotelling < 0:05 andb NP is "biased" with respect to the probability sample-basedb), then the prior variance for b is relatively large: the scaling factor k 0 is 1= log n NP , which is larger than the 1=n NP proposed for the cases when the difference between probability and nonprobability coefficients is not statistically significant (p Hotelling ! 0:05).
The rationale for using one of the two scaling factors is that we prefer to avoid any potential biases that might be present in the nonprobability samples. Again, the bias considered here is assessed in relation to the probability sample. If the nonprobability MLE coefficients show significant differences in comparison with the probability ones, then the prior variance will not tend to extremely small values with an increase in nonprobability sample size. On the other hand, when the MLE coefficients from the nonprobability sample are unbiased compared with the probability sample, then as the nonprobability sample size increases, the smaller the prior variance becomes, which, in turn, allows further shrinkage of the posterior variance of b, comparing with (13). Our approach to assessing bias in coefficients is dictated by typical applications, in which practitioners rarely have information about the gold standard against which they could assess bias. However, if such information is available, then the procedure can easily be modified to allow setting k 0 depending on the true bias of the nonprobability coefficients and correcting for it in the prior mean l b .
The third specification of the prior is a Zellner's g-prior (referred to as "Zellner;" see Zellner 1986;Liang, Paulo, Molina, Clyde, and Berger 2008) where the Fisher's information matrix (inverse covariance matrix of covariates) is used to rescale the prior variance. This permits accounting for the scale and correlation structure of the covariates in the posterior, which might be relevant if strong multicollinearity is present and various scales are used for different variables. The prior is specified with hyperparameters where X NP is a matrix of covariates from the nonprobability sample, and k 0 is a scaling hyperparameter g as in the original specification. Zellner's g-prior is a convenient and widely used prior in the linear regression setting, mainly due to its simplicity, computational efficiency, and understandable interpretation (Liang et al. 2008, p. 411). It requires selecting only one hyperparameter, k 0 , as a measure of prior dispersion of the model parameters. Various recommendations for this choice have been proposed, including k 0 being set to the sample size, squared number of model dimensions, or values obtained from using empirical Bayes methods (Liang et al. 2008, p. 413).
To create the prior based on nonprobability data, we propose using the scaling factors V and k 0 that depend on the potential bias in the MLE of the coefficients based on nonprobability sample data, which is assessed against the MLE based on the probability sample data. The two scaling factors are a function of the nonprobability design matrix and sample size, respectively.
We suggest using the nonprobability-based MLE for the prior mean and the Fisher's information matrix of the nonprobability covariates X NP for V. In the case of bias in the MLE estimates, we recommend using k 0 ¼ n 2 NP which implies smaller variance and a more informative prior compared with, for example, probability sample size n as recommended by Kass and Wasserman (1995). When bias is not present in the nonprobability MLE, then the even more informative prior depends only on the Fisher's information matrix from the nonprobability sample and k 0 ¼ 1.
The next two informative prior specifications are variations of the previously described conjugate and Zellner's priors. The key part of them is the distance between the MLEs based on the probability and nonprobability data,b and b NP , respectively, and the standard error of the MLEs based on nonprobability data,r b;NP (cf. Sakshaug et al. 2019). These distance and standard error are used to rescale the variance of the prior distribution for the regression coefficients. Thus, the fourth prior specification (referred to as "conjugate-distance") is specified as whereas the fifth proposed specification (referred to as "Zellner-distance") is an extension of the Zellner prior from (15): Here, diagðvÞ denotes a diagonal matrix with elements of v on its diagonal, maxða; bÞ is an element-wise maximum of two vectors a and b, andr b;NP is a vector of ML standard errors of the regression coefficients estimated using nonprobability data. Since nonprobability samples are usually larger than the probability ones, an implicit assumption here is thatr b;NP are much smaller than the respective standard errors based on probability data. This specification implies that the prior variance does not depend on the binary outcome of the Hotelling T 2 test and allows the prior variance for each coefficient to be rescaled by its own factor that depends on the distance between the coefficients or the standard error of the coefficient based on nonprobability data-whichever is larger for a given coefficient. The squared distance introduced to the prior variance permits shrinking the prior distribution in cases where the difference is relatively small and allows for larger variability if discrepancies between probability and nonprobability data arise. Further, the probability-based MLE is used only in relation to the nonprobability-based MLE to construct the prior for coefficient variability, rather than central tendency. Hence, the shrinkage in the posterior estimator depends on the relative distance between probability and nonprobability samples (measured by a difference between MLEs) or nonprobability MLE standard error, rather than a probability sample alone.
The role of the nonprobability-based MLE standard errors is to serve as "safety vents" should the distance between the MLEs be extremely small. In such cases, it prevents the prior from being overly tight, which may cause numerical matrix inversion problems. If one decides that there is no risk of extremely small differences arising betweenb andb NP , then the prior can easily by transformed to rely on the distance between the two coefficients only.

MODEL ASSESSMENT USING SIMULATED DATA
To assess the proposed methods, we generate both probability and nonprobability data from a simple linear regression model with assumed known coefficients. We also introduce bias in one of the coefficients in the simulated nonprobability samples. We then estimate the parameters using these generated probability samples and, as described in the previous section, priors constructed based on nonprobability samples as specified in (11) and (14-17). Lastly, we assess how bias introduced in the nonprobability data affects the posteriors for the coefficients.
The simulated data were created by assuming various sets of coefficients and using data generating models: We also assume that x 1 and x 2 are correlated with correlation q ¼ 0:1. Here, we present two scenarios. In the first one (denoted as scenario A), we assume This scenario reflects a situation in which the effect of the covariate of interest, say x 2 , on the outcome is relatively small compared with the other covariate. In scenario B, we assume r 1 ¼ 4; r 2 ¼ 0:5; r y ¼ 2; b 0 ¼ 1; b 1 ¼ 0:5 and b 2 ¼ 2, which reflects a relatively strong effect of x 2 .
Next, by using both scenarios, we generate one hundred sets of probability data with sample sizes n 2 f50; 100; 150, 200, 300, 500, 750, 1,000}. For each of the probability sets, we generate fifty nonprobability samples of sizes n NP 2 f10 3 ; 10 4 ; 5 Á 10 4 g by using the same models but with bias in coefficient b 2 induced by multiplying it by a factor f 2 f0:5; 1, 1:5; 2; 3g. This bias in the coefficient leads to bias in the generated response; factor f ¼ 1 implies the nonprobability sample is unbiased. These assumptions about bias are rather extreme, implying that the nonprobability samples are severely biased, as compared with the probability data (e.g., in scenario B, the expectation of the response variable under no bias is E(y) ¼ 11, but when f ¼ 3, this almost triples to E(y) ¼ 31.
We evaluate the performance of each of the priors by calculating the meansquared error for the posterior characteristics of the parameters b. For a given posterior distribution, we define the MSE as MSEðpðb; sjy; XÞÞ ¼ E½ðpðb; sjy; XÞ À b Ã Þ 2 ; where b Ã is the vector of true parameters used to generate data. The MSE can be decomposed into variance and squared bias: MSEðpðb; sjy; XÞÞ ¼ Bias 2 ðpðb; sjy; XÞÞ þ Varðpðb; sjy; XÞÞ: We calculate the bias as the difference between the mean of the posterior means for all iterations, pðb; sjy; XÞ and the true coefficients: with variance being the average posterior variance of the coefficient across all iterations.
In figure 1, we present the bias as measured by the difference between the posterior mean and the true coefficient, posterior variance, and mean-squared error (i.e., the sum of squared bias and posterior variance) of the coefficients b 0 , b 1 , and b 2 in scenario A. All methods except for Zellner lead to reductions in variance even with a large level of bias induced in b 2 (i.e., f ¼ 3). Further, substantial reductions in MSE compared with noninformative results are observed when using conjugate (for b 0 and b 2 ) and conjugate-distance (b 0 , b 1 , and b 2 ) priors, which stems from the low posterior bias produced by these methods. For the Zellner method, however, gains in efficiency are minimal, and for Zellner-distance, they are offset by a larger bias observed even for large probability sample sizes, except for the cases with relatively low levels of induced bias (i.e., f 2 ð0:5; 1:5Þ).
In figure 2, we present the analogous results for the scenario B simulation. Here, we observe that gains in efficiency (reduction in MSE) are obtained when using the conjugate-distance method, even for large (in terms of absolute value) values of induced bias (e.g., f ¼ 3). Both conjugate and Zellner-distance methods lead to large bias in b 0 and b 2 which, for Zellner-distance, cannot be offset by relatively small gains in efficiency if induced bias is present (f 6 ¼ 1). The Zellner method does not lead to large bias, but gains in efficiency are observed only when no induced bias is present (f ¼ 1). This is also indicated by the increasing proportions of the Hotelling test rejecting the null hypothesis stating the probability and nonprobability-based coefficients are the same (section 2.1) in scenario B compared with scenario A (see figure A.1 of the appendix). In figure A.2 of the appendix, we also present the coefficients of variability for the two simulations.
Overall, the conjugate and conjugate-distance methods seem to yield gains in efficiency and produce the posterior estimates least sensitive to bias induced in nonprobability samples, especially for coefficients directly affected by this bias (b 0 and b 2 ). The Zellner and Zellner-distance methods are much more susceptible to potential bias in the nonprobability samples. Nevertheless, one should keep in mind that when the bias implied by any informative prior distribution is very large, the posterior will be overwhelmed by it, especially when nonprobability samples are much larger than probability samples. Also, if the posterior distribution for a coefficient based on the probability data alone is tight (or when using MLEs, the standard errors are small), then even small amounts of bias in the prior based on nonprobability sample may lead to reduced rather than improved efficiency. Thus, the proposed methods are recommended when precision of the probability-based estimates is relatively low and it is desirable to increase it. Typically, such situations occur when small probability samples are available.

Probability and Nonprobability Data
Next, we evaluate the method and four informative prior specifications using actual survey data. The probability survey data come from the German Internet Panel (GIP). The GIP is an ongoing, probability-based longitudinal survey designed to represent the population of Germany 16-75 years of age. The survey is funded by the German Research Foundation (DFG) as part of the Collaborative Research Center 884 "Political Economy of Reforms" based at the University of Mannheim. Sample members are selected through a multistage stratified area probability sample. At the first stage, geographic districts are selected from a database of 52,947 districts in Germany. A random sample of listed households are then drawn, and all age-eligible members of the sampled households are invited to join the panel (Blom, Gathmann, and Krieger 2015). The GIP is designed to cover both the online and offline populations and provides internet service and/or internet-enabled devices to facilitate participation for the offline population ( module we utilize was administered in March 2015. During this month, completed interviews were obtained from 3,426 out of 4,989 (or 68.7 percent of) panelists. After removing responses with partial missing information and adjusting the age range of the GIP to match the age range of the nonprobability samples (discussed below), we utilize a sample size of 2,681 cases. As part of a separate methodological project studying the accuracy of nonprobability surveys (Blom et al. 2017), the GIP team sponsored several nonprobability web survey data collections that were conducted in parallel with the GIP in March 2015. The nonprobability web surveys were carried out by different commercial vendors in response to a call for bids. The key stipulation of the call was that each vendor should recruit a sample of approximately one thousand respondents that is representative of the general population 18-70 years of age living in Germany. The call for bids did not explicitly specify how representativeness should be achieved. A total of seventeen bids were received, of which seven were selected based on technical and budgetary criteria. An eighth commercial vendor voluntarily agreed to perform the data collection without remuneration after learning about the study aims. Additional information about the eight nonprobability surveys can be found in table 1, including the quota sampling variables used and costs. For confidentiality reasons, we do not disclose the names of the survey vendors and identify them simply as survey one, survey two, etc., ordered from least expensive (pro bono) to most expensive (10,676.44 EUR).
In this application, the outcome variable is body mass index (BMI), a commonly studied variable and disease risk factor. A histogram of BMI, as derived from the GIP, is provided in appendix B. The variable has been constructed by using respondents' self-reported height and weight. Body mass index follows an approximate normal distribution with a slight right skew. In practice, height and weight (and hence, BMI) may not be properly recorded due to measurement error (e.g., respondents adding to their height or reducing their weight). We neglect this limitation in the application. Mean BMI for the GIP and eight nonprobability surveys are shown in figure 3. We observe that all of the nonprobability surveys produce a larger mean BMI than the probability-based GIP survey. The most severe bias is apparent in nonprobability surveys six and one, respectively. 1 Following from the simulation assessment in Section 3, our interest lies in estimating coefficients from a linear regression model using BMI as the outcome variable. The model is fitted using the following covariates: age (continuous), sex (1 ¼ female, 0 ¼ male), self-reported health status (1 ¼ fair/poor/very poor, 0 ¼ very good/good), marital status (1 ¼ single, 0 ¼ other), education (mutually exclusive binary variables, that is, 1 ¼ Hauptschulabschluss [basic secondary level] and 0 otherwise, 1 ¼ Mittlere reife [extensive secondary level] and 0 otherwise, 1 ¼ Fachhochschule [vocational school] and 0 otherwise, with university degree being the baseline category), and regularly employed full or part time (1 ¼ no, 0 ¼ yes). Each respondent characteristic was measured in both the GIP and nonprobability surveys using the same questionnaire items. In addition, we include a survey weight variable produced by the GIP team that includes a raking adjustment to population benchmarks as a covariate in the model. Incorporating the survey weight variable as a covariate in the analysis of survey data and Bayesian inference has been considered in previous work (Rubin 1985;Pfeffermann, 1993;Skinner 1994). We do not have access to further sample design information (clustering, stratification) for the GIP. Incorporating such information in the modeling approaches is a topic we leave for future work. Both continuous covariates (age and weights) are standardized to put coefficients on a comparable scale. Maximum likelihood estimators of the regression coefficients of BMI on the covariates, fitted using eight nonprobability surveys, are plotted in figure 4, with the MLE of the coefficients fitted using 2,681 available cases in the GIP data (in gray). The fitted coefficients are similar between the GIP and the eight nonprobability surveys. The largest differences occur for education where the lowest education classification (hauptschulabschluss) has a larger positive effect on BMI in the nonprobability surveys than in the GIP survey. Females are negatively associated with BMI, and this effect is slightly weaker in the nonprobability surveys compared with the GIP. Other variables that tend to have a significant effect on BMI include age (þ) and fair-poor general health status (þ). As a whole, the coefficients are roughly comparable between surveys, and there is no strong indication of bias in the nonprobability surveys.

Computation and Evaluation
We compare the performance of the five specifications of the prior (noninformative, conjugate, Zellner, conjugate-distance, and Zellner-distance; see section 2 for details) by estimating posteriors under different probability sample size scenarios and using different nonprobability surveys to construct the prior distributions. We consider the probability sample sizes n 2 f50; 100; 150; 200;  300, 500, 750, 1,000}. We use these various sample sizes to simulate a situation in which a survey sponsor can only afford a small to medium size probability sample alongside a potentially larger nonprobability sample. These probability sample sizes are drawn at random from the full GIP sample and are constructed cumulatively so that the same cases used in the smaller sample size scenarios are included in the larger sample size scenarios. The size of the nonprobability sample used to construct the informative prior distributions is not manipulated, and the full sample size (roughly one thousand respondents; see table 1) is used. Next, we evaluate the five prior specifications by calculating the bias, variance, and MSE (sum of the variance and squared bias) for the estimated regression coefficients as in (18-19) described in section 2. To calculate bias in (20), we use a vector of MLEs calculated with the entire GIP sample, which we assume to be unbiased. This strong assumption can be relaxed if more reliable sources of data on a given characteristic (in our case, BMI) are known.
The entire evaluation procedure (including drawing of the probability samples) is repeated one hundred times to produce one hundred sets of variance, bias, and MSE values for each sample size scenario. In the following results section, we report the average of these values across the one hundred iterations for each sample size. Bias

Results
The results for bias, variance, and MSE of each regression coefficient are presented in figure 5. For brevity, we present here the results only for the case where one of the more biased nonprobability surveys, survey six, is used to construct the prior distributions for the four informative models. The results using each of the other nonprobability surveys can be found in appendix figures C.1 (bias), C.2 (variance), and C.3 (MSE). The second column in figure 5 shows that conjugate and Zellner prior specifications yield coefficient estimates with considerably smaller variances relative to the estimates derived under the noninformative (reference) prior, particularly for the smaller probability sample sizes. We observe moderate reductions in variance for the conjugate-distance prior and significantly smaller reductions under the Zellner-distance prior. The largest reductions occur for the smallest sample sizes (fifty and one hundred cases). These reductions continue at a decreasing rate until a probability sample size of about five hundred cases is reached; at this point, the variances converge under all five priors. Further, the variances produced under the conjugate and Zellner priors for the smallest probability sample sizes are roughly equivalent to the variances produced under the noninformative prior for the much larger probability sample sizes (starting with five hundred or more cases). The same variance reduction patterns are observed when each of the other seven nonprobability surveys are used to construct the priors (see appendix figure C.2).
Next, we turn to bias. Figure 5 (first column) shows that a few of the coefficients are biased when the informative priors are used. This is particularly true for the variables sex and education, which were previously identified as being biased in the nonprobability surveys (see MLEs in figure 4). A relatively smaller bias is observed for marital status (single) and health status (genhlth). As expected, the biases are most prominent for the smallest probability sample sizes, where the influence of the informative priors is at their peak. All informative priors have very similar impact on bias, although one can see a slight indication that the conjugate and Zellner priors yield a slightly larger and more persistent bias than the other two priors, but this depends on the coefficient. As with the variance results, the bias induced by the informative priors tends to converge to the noninformative reference model as the probability sample size increases and the influence of the priors weaken.
Lastly, we examine the joint impact of variance and bias in the form of mean squared errors for the regression coefficients. Figure 5 (third column) shows the MSE results, which share a similar pattern to the variance results described previously. That is, the MSEs produced by the model with the informative priors are considerably smaller than the MSEs under the noninformative reference model for the smallest probability sample sizes. The reductions in MSE are smallest for the Zellner-distance prior. This pattern holds even for the biased covariates identified earlier, except for the Zellner-distance prior. Thus, it is apparent that reductions in variance outweigh the increase in bias under the informative prior models. As with the variance results, the MSE gap between the values yielded by using informative and noninformative priors closes as the probability sample size increases and reaches an equilibrium at about five hundred cases. For nearly all coefficients, the MSEs under the informative priors for the smallest probability sample sizes are approximately equivalent to the MSEs produced under the noninformative (reference) prior for the largest probability sample size (n ¼ 1,000).

DISCUSSION
In this article, we evaluate a method of integrating relatively small probability samples with nonprobability samples to improve the efficiency (i.e., reduce variability) and reduce the mean squared error for estimated regression coefficients. We exploit a Bayesian framework and consider four prior specifications constructed using nonprobability sample survey data that inform posterior estimations based on small probability sample survey data. This approach may be considered a reversal of the more traditional approach of adjusting a nonprobability sample toward a supplementary probability sample (or other highquality benchmark source), as we deliberately "borrow strength" from the nonprobability survey data and use these data to influence the probability sample survey estimates.
Using simulations and a real-data application, we show that the nonprobability-based informative priors can yield reductions in variance and, subsequently, MSE for linear regression coefficients estimated from small probability samples. The reductions in variance/MSE yielded such values that were consistent with the variance/MSE values of much larger probability-only samples, suggesting that smaller probability sample sizes coupled with relatively inexpensive nonprobability samples could produce levels of error that are comparable with large and more expensive probability samples.
However, using potentially biased nonprobability survey data to inform estimations has the potential to introduce bias in the final estimations and offset gains in efficiency. This issue was not encountered in the case study where only modest biases were present in the nonprobability surveys and these biases did not affect the efficiency gains. Further, the simulation study showed that large biases can potentially overwhelm the method and produce larger MSEs, depending on the particular prior specification. In particular, the Zellner-distance typically led to this behavior. The conjugate and conjugate-distance methods were reasonably robust to large amounts of bias induced in the simulated nonprobability samples. Nevertheless, one should be cautious in implementing the proposed methods when fitting regression models with covariates that are particularly susceptible to very large biases in nonprobability surveys. Some additional limitations of the method should be noted. First, the proposed method relies on the assumption that the probability sample is unbiased, particularly when compared with the nonprobability sample. This assumption may not hold in some situations in which one is modeling rare or difficult to measure phenomena (e.g., homelessness, HIV). We acknowledge that even high-quality probability samples can be subject to measurement issues due to nonsampling error and might sometimes provide biased estimates relative to a nonprobability sample designed specifically for measuring these phenomena (e.g., respondent-driven sampling). Second, the method presented here is limited to continuous outcome variables. Extending the method to handle other variable types (e.g., ordinal, nominal, count) is possible and will be considered in future work. Further research extensions include accounting for complex sample design features, model misspecification, and alternative prior specifications. The four informative priors considered here are only a small subset of possible specifications. Alternative specifications that, for instance, can correct for bias in the nonprobability samples could be considered, as well. Further, a strength of the study application was a demonstration of the method using several nonprobability surveys collected by different commercial vendors. This demonstration noted some variability in the results that warrants further investigation. Future work ought to investigate the properties of nonprobability surveys that make them potentially good (or bad) candidates to construct useful prior distributions.
Based on the results of our evaluation, we provide the following advice to practitioners. For practitioners who utilize the probability sampling approach but cannot afford to continue fielding large sample sizes, we view this method as a means to reduce the probability sample size without inducing more variability in the estimates. Indeed, adopting this method and allocating some resources to field a parallel nonprobability data collection, which need not be so large itself (e.g., 1,000 cases), is likely to produce similar measures of variability (and MSE) compared with a large probability-only sample at a reduced cost. The risk of inducing bias by introducing nonprobability data collection is likely to be small as the proposed prior specifications, particularly the conjugate methods, were reasonably robust to varying levels of bias.
For practitioners who defend the nonprobability sampling approach, the primary concern would be the cost of fielding an additional (albeit, small) probability sample. In our view, the benefits of collecting a small parallel probability sample outweigh the added costs for three reasons. First, the empirical literature continues to show that probability samples produce more accurate estimates than nonprobability samples, thus introducing some probability data collection is likely to be perceived as a more scientific and credible approach. Second, the Bayesian framework is a natural system for incorporating multiple data sources with varying levels of quality, is principally structured to incentivize sparse, yet high quality observations in the posterior estimations, and is a commonly accepted method in official statistics for estimating small domains.
Third, collecting a small probability sample need not be so costly if the data are collected via an existing probability-based omnibus survey or relatively inexpensive probability-based mail survey. While it would be important to achieve a high response and minimize the risk of nonresponse bias, this might be more feasible with a small sample than with a large one.
In conclusion, it is interesting to know that probability and nonprobability samples can be integrated in a way that exploits their advantages to compensate for their weaknesses and improve estimation of model parameters. It is further possible that the method can lead to cost savings for a fixed variance (or MSE) if the nonprobability sample units are significantly cheaper to interview than the probability sample units. Another attractive quality of the method is its computational efficiency. It is easily implemented using freely available software such as R and any statistical software that allows Bayesian inference and specification of the prior distributions for the linear regression model. The R code used to obtain results is available in the online supplementary material.