Income Source Confusion Using the SILC

Abstract: We use a unique panel of household survey data, the Austrian version of the European Union Statistics on Income and Living Conditions (SILC) for 2008–2011, which has been linked to individual administrative records on both state unemployment benefits and earnings. We assess the extent and structure of misreporting across similar benefits and between benefits and earnings. We document that many respondents fail to report participation in one or more of the unemployment programs. Moreover, they inflate earnings for periods when they are unemployed but receiving unemployment compensation. To demonstrate the impact of income source confusion on estimators, we estimate standard Mincer wage equations. Since unemployment is associated with lower education, reporting unemployment benefits as earnings biases the estimated returns to education downward. Failure to report unemployment benefits also leads to substantial sample bias when selecting on these benefits, as one might in estimating the returns to job training.


Introduction
Surveys collect income by source, asking specific questions about labor market earnings, entitlement payments, and social safety net programs. Marquis and Moore (1990), Bollinger and David (1997), Bollinger and David (2000), Lynn et al. (2012), Meyer and Mittag (2019), Celhay, Meyer, and Mittag (2021), and Meyer, Mittag, and Goerge (2022) have demonstrated or suggested that source confusion among survey respondents may affect survey responses. Angel, Heuberger, and Lamei (2018) use linked survey and administrative data for Austria to show that while there is substantial error in the reported level and receipt of many sources, the overall estimates of poverty are far less biased. While they do not explore the detailed reasons, one possible explanation is that some incomes are reported in the wrong place. Cross-reporting may imply that individuals get the specific source wrong while reporting total income close to correctly.
The allocation of income to the correct source is important for many research questions (see Bollinger and David 1997; Lynn et al. 2012). The literature using linked survey and administrative data has studied the error properties of earnings and state benefits separately. Hokayem, Bollinger, and Ziliak (2015) and Bollinger et al. (2019) examine nonresponse and measurement error in US earnings, and Meyer, Mok, and Sullivan (2015) and Meyer and Mittag (2019) in state transfers. Bollinger and David (2000, 2005) document the persistence of response errors over time in food stamps. Mathiowetz and Duncan (1988) examine unemployment spells, but not the benefit amount. Kapteyn and Ypma (2007) and Jenkins and Rios-Avila (2021a) analyze measurement error in earnings with Swedish and UK data, respectively, and Lynn et al. (2012) study measurement error in UK benefit receipt. It is rare to have linked survey and administrative data and even rarer to have such data for multiple income sources.
We use data from the Austrian version of the European Union Statistics on Income and Living Conditions (SILC) for 2008-2011, linked to administrative records on both Austrian state benefits and earnings. While Angel, Heuberger, and Lamei (2018) and Angel et al. (2019) use the same linked data, they focus on the measurement of poverty and on the relationship between household and survey characteristics and response error, respectively. Fuchs et al. (2020) show that the take-up of social assistance is overestimated with the SILC, compared to the administrative data, due to errors in the reported incomes.
There are various reasons why individuals misreport income in household surveys. Errors may be due to genuine reporting mistakes or deliberate misreporting (Tourangeau, Rips, and Rasinski 2000). Reporting mistakes could be due to program name confusion or confusion of conceptually related benefits. Individuals with many income sources may know only the total. They may also consider some benefits as earnings if those benefits are associated with their employment. Survey design may also make it easier to report different incomes as one. Deliberate misreporting may occur if items are sensitive, for example, benefits for low-income families: individuals may then conceal receipt or report them as earnings.
Our paper has two goals. First, to assess the extent and structure of misreporting across similar benefits and between benefits and earnings. Bollinger and David (2000), Celhay, Meyer, and Mittag (2021), and others have considered program confusion, but to the best of our knowledge ours is the first paper to examine the reporting of benefits as earnings. The three sources of income with the highest underreporting in the SILC are Austria's three main unemployment programs: Unemployment Insurance, Unemployment Assistance, and Assistance for Covering Living Costs. The programs are not mutually exclusive. As we document below, many respondents fail to report participation in one or more of the programs. Moreover, it appears that they inflate earnings for periods when they are receiving unemployment compensation, particularly those who fail to report receipt of the benefit entirely.
Second, we demonstrate the impact of the correlated misreporting on estimates typical in the literature. Survey data are often used to estimate wage equations because of the rich covariates available (which are often unavailable in administrative data). We document that reports of unemployment benefits as earnings lead to a downward bias in estimates of the returns to education. We then examine the role of job training for future earnings and employment. We document that the returns to job training can be biased, in some cases quite substantially, including sign changes. The misreporting of various unemployment benefits biases the sample, as well as leading to typical forms of measurement error bias.

Unemployment Benefits in Austria
The three main unemployment benefits in Austria, recorded in the SILC, are Unemployment Insurance (UI, "Arbeitslosengeld"), Unemployment Assistance (UA, "Notstandshilfe"), and Assistance for Covering Living Costs (ACLC, "Beihilfe zur Deckung des Lebensunterhaltes"). All three benefits are administered by the same labor market agency and allow for limited labor market participation. While UI is insurance based and is the biggest unemployment program, UA and ACLC are smaller programs targeting low-income individuals.
UI is an insurance-based benefit and is provided for a limited time, generally 20 to 52 weeks. Qualified individuals receive a proportion of previous earnings (usually 55 percent) for a duration that depends on age and employment history. UI is the main unemployment program in Austria and has the highest participation of the three.
UA is targeted at low-income groups using an income test on the unemployed person's and, if present, their partner's earnings. It extends benefits when low-income individuals exhaust their UI benefit, so at a given point in time one can receive either UI or UA, but not both. Rather than receiving UA automatically after UI, individuals need to apply for it, but the transition is supported by the labor market agency. UA can be received for up to 12 months, and follow-up applications are possible. The benefit amount is paid as a proportion of UI (92 percent or 95 percent of UI, depending on circumstances) and may be capped after 6 months of receipt.

While on UI and UA, the labor market agency may require participants to take part in job training programs. ACLC provides income support during those programs for low-income individuals only and can be received simultaneously with UI or UA. The benefit lasts for the duration of the training, which must be at least one week long. The amount equals the difference between a guaranteed minimum and the current entitlement to UI/UA.
There are several reasons why respondents may cross-report one benefit as another in the SILC. Similar program names may cause confusion, but the program names in German are very different from each other. However, all three benefits are paid during periods of unemployment, administered by the same labor market agency, and linked in their amounts. They are conceptually similar, which may lead to program confusion (see, e.g., Balarajan and Collins 2013). For example, UA may get reported as UI if respondents think of UA as a less generous version of UI. But respondents may also report UI as UA if they cannot correctly recall the timing of UI receipt (a problem known as "backward telescoping"). Benefit cross-reporting may occur if respondents do not know the individual benefit amounts but know the total they receive, or if respondents find it quicker to report everything as one benefit.
Respondents may also report benefits as earnings in the SILC. There may be stigma (Currie 2006): UA and ACLC are targeted at low-income individuals, who may misreport them as earnings. Individuals may think of benefits as earnings, since UI and UA are linked to previous employment, while ACLC pays for job training. Or individuals may not know the individual income amounts, only the total they receive, and report just one source.

Data: The Austrian SILC
The SILC is a household income survey conducted in all EU countries and beyond (Atkinson and Marlier 2010). The Austrian SILC used in our analysis covers survey years 2008-2011 (income reference periods for the calendar years 2007-2010). The sample is based on all private households registered in the central population register ("Zentrales Melderegister"). The data include information on individual and household characteristics, including specific income sources. SILC serves as the main source for official statistics on income, poverty, and inequality produced by Statistics Austria and Eurostat.
The key feature of the data, which allows us to provide novel evidence on income cross-reporting, is that the data include individual-level survey reports and administrative records on both earnings and a range of state benefits.Next, we describe in more detail the main data characteristics.

Linked Survey Reports and Administrative Records
Starting in 2012, SILC survey participants are no longer asked to provide information on earnings and certain types of state benefit; this information is acquired directly from various administrative sources (Heuberger, Glaser, and Kafka 2013). To provide a continuous data series, Statistics Austria retrospectively matched the individuals in the survey to their administrative data for the survey years 2008-2011 (Heuberger, Glaser, and Kafka 2013).
Survey respondents were matched using a pseudonymized personal identifier used nationally for each administrative data source and in the central population register used for the SILC sampling frame (Heuberger, Glaser, and Kafka 2013). The SILC collects data about all household members at an address. Someone who is living at an address but not registered there, and so is not part of the SILC sampling frame, may nevertheless end up in the sample. Their identifier then needs to be obtained but may not be found, due to errors in the survey or administrative data (e.g., an incorrect birth date) or because it does not exist (e.g., for younger individuals or individuals without Austrian citizenship). Identifiers were found for 95.6 percent of the sample in 2008, 97.7 percent in 2009, 96.8 percent in 2010, and 99.4 percent in 2011 (Statistics Austria 2014).1 The earnings data derive from wage tax data supplied electronically by employers to the tax authority. The wage tax data cover all taxable earnings of employees and pensioners, potentially excluding any undeclared under-the-table earnings that may be captured in the survey. The benefit data are provided by the authority administering benefit payments (see Angel, Heuberger, and Lamei 2018).
Recent work by Kapteyn and Ypma (2007) and Jenkins and Rios-Avila (2021a) suggests that mismatch can play a role. The Austrian system samples from an original frame that already has the identifier, hence the match is generally not ex post. The high rates of deterministic match are evidence that mismatch likely does not play a role here. Kapteyn and Ypma (2007), Paulus (2015), Bollinger et al. (2018), Bollinger et al. (2019), and Jenkins and Rios-Avila (2021a) all suggest that administrative earnings may fail to reflect under-the-table earnings. We show that some of what is often believed to be under-the-table earnings may actually be benefits cross-reported as earnings. We note that administrative records of state benefits are generally believed to be quite accurate (Bollinger and David 1997; Meyer and Mittag 2019). We proceed assuming the benefit administrative data represent the truth; we remain agnostic about the earnings records.
The survey information on earnings and benefits is based on individual-level interviews with adult respondents aged 16+.2 We use the SILC "additional data" files with detailed information on the level of gross earnings and state benefits and the number of months in receipt for each income. These data give us the necessary level of detail to distinguish between the different types of unemployment benefit. Angel, Heuberger, and Lamei (2018) and Angel et al. (2019) use the SILC harmonized income variables, which aggregate all unemployment benefits into a single variable.
The reference period of the survey and administrative data is the calendar year before the date of interview. We compare the gross annual amounts based on the survey versus the administrative data. Both the survey and administrative data measure gross annual earnings. The administrative records measure annual benefit amounts, while the survey measures monthly amounts. We multiply the monthly benefit amount by the number of months in receipt to derive gross annual benefit amounts in the survey.3
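The annualization step can be sketched as follows. The column names and amounts here are illustrative assumptions, not the actual SILC variables:

```python
import pandas as pd

# Hypothetical SILC-style records; column names are illustrative,
# not the actual SILC variable names.
silc = pd.DataFrame({
    "ui_monthly_svy": [900.0, 0.0, 750.0],   # reported monthly UI amount (EUR)
    "ui_months_svy": [6, 0, 12],             # reported months in receipt
    "ui_annual_adm": [5400.0, 0.0, 9300.0],  # administrative annual amount (EUR)
})

# Annualize the survey report to match the administrative reference period:
# monthly amount times months in receipt.
silc["ui_annual_svy"] = silc["ui_monthly_svy"] * silc["ui_months_svy"]

# Total error in the annual amount (survey minus administrative); it mixes
# errors in the monthly amount and in the reported duration (see footnote 3).
silc["ui_error"] = silc["ui_annual_svy"] - silc["ui_annual_adm"]
print(silc[["ui_annual_svy", "ui_error"]])
```

Because the survey annual amount is a product of two reported quantities, an error in either the monthly amount or the duration propagates into the annual figure.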

Sample
The SILC follows individuals in a four-year rotating panel. Entry is staggered so that each year 25 percent of the sample is renewed. Individuals initially surveyed in 2008 appear in all four years of our sample. Individuals initially interviewed in either 2005 or 2011 appear only once. Table 1 shows the number of observations in our sample by wave.
We include individuals aged 16+, but limit earnings regressions to ages 19-64. We exclude observations with missing/imputed values when earnings are an outcome or a control variable. Where a categorical control variable (e.g., occupation) has missing/imputed values, we use a dummy variable to indicate this.
Table 1 provides summary statistics for the full sample (column 1), men (column 2), and women (column 3), including observations with missing/imputed values. The proportion of cases with missing/imputed survey UI/ACLC values is 0.1 percent, while the missing/imputed cases in survey earnings are too few to disclose.
There are 44,970 observations in total, with slightly more women (23,696) than men (21,274). Mean age is 47.4 years. The majority of people (59.7 percent) have middle-level education, though men tend to be more educated than women. Men are also more likely than women to be in full-time, full-year employment and to have earnings (59.4 percent of men versus 49.8 percent of women), suggesting a prevalence of part-time and/or part-year employment among women. The sample has 86 percent born in Austria, 5.9 percent in the rest of the EU-27, and 8.1 percent in Turkey, the Western Balkans, or other countries. In the sample, 21 percent reported job training, with 1.4 percent having training by the labor market agency. Of respondents, 5.1 percent received UI, 1.7 percent UA, and 0.7 percent ACLC. More men (6 percent) report UI than women (4.2 percent), but the proportions by sex are similar for UA (ACLC receipt by sex is too small to disclose). Most UI/UA/ACLC recipients report receipt in one year only, while around a quarter of UI and UA and 11.4 percent of ACLC recipients had spells spanning multiple years.

2. SILC questionnaires are available at https://ec.europa.eu/eurostat/web/income-and-living-conditions/quality/questionnaires. SILC quality reports contain further information on the data, including sampling design and response errors: https://ec.europa.eu/eurostat/web/income-and-living-conditions/quality/eu-and-national-quality-reports.

3. The error in the survey annual benefit amounts can stem from misreporting the benefit duration and/or the average monthly amount. As we are interested in the total error in benefits, as well as in earnings, we do not attempt to disentangle the two error components. Future research will address this.

Results
To document the extent and relationship of misreporting in the three unemployment programs and labor market earnings, we employ a number of approaches. First, we examine cross tabulations and compare average benefit and earnings amounts to administrative amounts. Second, we estimate linear probability models of benefit receipt in the survey conditional on receipt in the administrative data, with demographic control variables included. We include administrative earnings and the difference between survey and administrative earnings, as well as measures of reporting accuracy (false positive, false negative, and true positive), as main variables of interest. Finally, we examine how the levels of misreporting, especially for earnings, are correlated with the accuracy of reporting other benefits.

Benefits and Earnings Underreporting
Table 2 documents the extent of earnings and benefit misreporting in the survey for each income source separately and for the total of the three benefits (UI+UA+ACLC). The first row reports the rates of false negatives: the share of recipients according to the administrative data who do not report the benefit in the survey. The rate is high for all three benefits: 42.6 percent for UI, 52.1 percent for UA, and 77.4 percent for ACLC. The false negative rate for respondents receiving any of the unemployment benefits is lower, at 38.4 percent. The overall receipt of any unemployment benefit is thus better captured in the survey than each individual benefit. The false negative rate for earnings is 8.9 percent. One explanation is that some individuals simply minimize effort on the survey by denying income sources.
We report two definitions of false positive. The first, typically found in the literature, is the share of all administrative nonrecipients of the income source who report receipt in the survey. False positive rates tend to be low, especially for social safety net programs, as the denominator is large. Indeed, for each unemployment benefit as well as the total, the rate is less than 1 percent, though the number of false positives is not insubstantial: 105 for UA to 317 for UI. The issue of underreporting benefit receipt seems to be more serious, consistent with findings of underreporting of UK and US state benefits in survey data (Lynn et al. 2012; Meyer, Mok, and Sullivan 2015; Meyer and Mittag 2019). For earnings, 6.2 percent of those with no administrative earnings report earnings. One explanation for this is under-the-table earnings. However, another explanation may be the reporting of social safety net income as earnings.
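A minimal sketch of how the false negative rate and the first false positive rate are computed from receipt indicators. The data and variable names here are made up for illustration:

```python
import pandas as pd

# Illustrative receipt indicators (1 = receipt); names are hypothetical,
# not the actual SILC variables.
df = pd.DataFrame({
    "ui_adm": [1, 1, 0, 0, 1, 0],  # UI receipt in the administrative data
    "ui_svy": [1, 0, 1, 0, 0, 0],  # UI receipt reported in the survey
})

# False negative rate: administrative recipients who do not report
# receipt in the survey, as a share of all administrative recipients.
fn_rate = ((df.ui_adm == 1) & (df.ui_svy == 0)).sum() / (df.ui_adm == 1).sum()

# False positive rate (first definition): administrative nonrecipients
# who nevertheless report receipt, as a share of all nonrecipients.
fp_rate = ((df.ui_adm == 0) & (df.ui_svy == 1)).sum() / (df.ui_adm == 0).sum()

print(round(fn_rate, 3), round(fp_rate, 3))
```

The large denominator in the false positive rate (all nonrecipients) is why that rate tends to be small even when the count of false positives is not.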
The second definition of false positive measures the extent of false positives for individuals conditional on administrative receipt of one of the other unemployment programs. These false positives are suggestive of program confusion of some type. The rate is relatively high for UI: 22 percent of UA/ACLC recipients falsely report UI receipt in the survey.
In the remaining rows, we document misreporting of amounts, first for those with receipt in both the survey and administrative measures, and then for all who either receive or report receipt. The average differences for all four income sources are typically not large, but this masks substantial variation. For true positives, across all four sources, a large proportion of differences are within 10 percent. This is particularly true for earnings and UA, and for the combined total of the three benefits. Only for UI are underreports of 10 percent to 50 percent more common. For all those who receive or report receipt, large false negatives for UI/UA/ACLC dominate the distributions. For earnings, differences within 10 percent dominate, although false negatives and false positives place more weight in the tails.

Source: Own calculations with the SILC. Note: Rates in percent and number of observations shown for false negative/positive. False negative in percent is the share of benefit recipients according to the administrative data who do not report the benefit in the survey. Two definitions of false positive are considered: (a) in percent, the share of benefit/earnings nonrecipients according to the administrative data who report the benefit/earnings in the survey; (b) in percent, the number of benefit nonrecipients according to the administrative data who report receipt in the survey, as a share of the number of recipients of the other two benefits in the administrative data (by definition, there is no estimate for the total and earnings). Mean amounts and errors are based on the sample of true positives only, that is, those who receive benefits/earnings in both the administrative and survey data, or on the total sample of true positives/false positives/false negatives. The error in amount equals the difference between the survey and administrative amount. Groups with different sizes of error in ACLC are too small to disclose. Observations with missing/imputed benefit values are excluded. Sample is restricted to those aged 16+.

Benefits and Earnings Cross-Reporting
Cross-reporting at the extensive margin

Accuracy of reporting at the extensive margin, that is, correctly reporting a source of income, is important for many applications, such as estimating benefit take-up rates and understanding overall reporting behavior. Table 3 provides cross tabulations focusing on combinations of survey income receipt, given administrative receipt. Each row provides the proportion of those in one administrative category reporting each benefit and income receipt combination in the survey. For example, the first row shows individuals who, based on administrative data, received only UI income. While 38.3 percent correctly reported this, fully 11.2 percent reported only earnings. We draw attention to the fact that very few individuals receiving "UI only" in the administrative record reported receiving other benefits in the survey, while 11 percent of those who received "UA only" reported receiving only UI.
We highlight three important regularities in this table. First, the diagonal, which represents agreement between the survey response and the administrative record of receipt, typically has entries at or below 50 percent. The only exceptions are the "Earnings only" and "none" rows. Second, the column "Earnings only" represents individuals who report only labor market earnings in the survey. The high percentages in this column demonstrate that many respondents simply report myriad benefit and earnings combinations as earnings only. Finally, we note that those who are only on UI, UA, or some combination of UI, UA, and ACLC (the first, second, sixth, and eighth rows) have relatively high percentages in the "none" column, reflecting complete denial of benefit receipt.
Next, we estimate linear probability models for the probability of reporting benefit receipt in the survey, conditional on administrative receipt, for UI and UA. We report a focused subset of coefficients here; full specifications and robustness checks, including logit estimates, are in Supplementary Material tables S1-S4. Due to disclosure restrictions, we are unable to provide estimates for ACLC. These models investigate false negative reports and allow us to control for myriad demographic characteristics, earnings, and receipt and reporting of the other programs. They are not meant to identify causal effects, but rather to highlight associations similar to those in table 3 while controlling for other factors.
Table 4 presents results for UI. In column (1), we condition on the logarithm of the administrative amount of UI; the administrative amount of earnings as well as the discrepancy between the survey and administrative amounts of earnings (in thousands); and dummies for false positive, false negative, and true positive in earnings, UA, and ACLC, with true negative omitted. This first specification focuses on administrative characteristics of the UI program and the reporting of other incomes. In column (2), we control for a rich set of demographic and survey characteristics (see the note to table 4).
The statistically significant coefficients of interest in both specifications are qualitatively and quantitatively similar. The results are robust to differences across characteristics, suggesting strong underlying associations. We find that higher administrative earnings and larger earnings overreports in the survey are negatively associated with correctly reporting receipt of UI, with coefficients of −0.008 and −0.013, respectively. One explanation is a negative correlation among errors if administrative earnings fail to capture parts of true earnings; another is that respondents report UI as earnings. The combination of coefficients on true positive for earnings, and the levels, also suggests that respondents may be reporting UI twice in the survey: once as earnings and then again as UI. In the survey, people are first asked about their earnings and then about benefit entitlements. This may lead to some of the cross-reporting. Table 4 also presents results consistent with cross-reporting of UI and UA. Reporting UI in the survey declines with a false positive in UA by 0.152, compared to a true negative.

Source: Own calculations with the SILC. Note: This table shows estimates from a linear probability model. The dependent variable equals 1 if the benefit amount is positive in both the survey and administrative data, and 0 if the survey amount is 0 while the administrative amount is positive. "True +" implies positive income amounts in both the survey and administrative data; "false +" means a positive amount in the survey and zero in the administrative data; "false −" means zero in the survey and a positive amount in the administrative data; and "true −" means zero amounts in both the survey and administrative data. Column (2) adds controls for: age group (in 5-year age bands), number of children in the household (0, 1, 2, 3+), number of adults in the household (1, 2, 3+), the highest achieved education level (low, middle, high), if in a couple, health status (6 categories), region (Vienna, borough with more than 100,000 residents, borough with 10,000-100,000 residents, borough with fewer than 10,000 residents), occupation (12 categories), industry (25 categories), being a civil servant, country of birth (7 categories), wave (interviewed for the 1st, 2nd, 3rd, 4th time), interaction between wave and year (2008 to 2011), interview type (in person or by phone), and same interviewer as last year. Sample is restricted to those aged 16+. Observations with missing/imputed administrative/survey UI or earnings are excluded. Cells with too few observations cannot be disclosed.
The amount of UI matters: the coefficient on the log of the administrative UI amount is 0.136, so a 1 percent increase in the UI amount is associated with a roughly 0.1 percentage point higher probability of reporting UI receipt. Participating in job training paid by the labor market agency is also positively associated with reporting UI in the survey (a coefficient of 0.112).
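The arithmetic behind this interpretation: in a linear probability model with a log-amount regressor, a 1 percent increase in the amount moves the probability by roughly the coefficient times 0.01.

```python
# Marginal interpretation of a log-amount regressor in a linear
# probability model: dP = b * d(log amount) ≈ b * (% change / 100).
b = 0.136            # coefficient on log UI amount, as reported in the text
pct_change = 1.0     # a 1 percent increase in the UI amount
delta_p = b * pct_change / 100.0
print(round(delta_p, 5))  # 0.00136, i.e., roughly a 0.1 percentage point change
```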
Holding constant the UI amount, the longer one receives UI, the less likely one is to report its receipt. One explanation could be stigma associated with a long spell of UI because the respondent was unable to find employment. Alternatively, this could be due to recall error if the spell crossed years. We find that the later in the interview year respondents are interviewed, and thus the larger the time gap between the income reference period and data collection, the less likely they are to accurately report UI receipt in the survey (see Supplementary Material table S1).
Table 5 presents estimates for the probability of reporting UA in the survey, conditional on administrative receipt. These results suggest that UA may also be misreported as earnings or as other benefits. As with UI, higher administrative earnings and earnings overstatement in the survey are associated with a lower probability of reporting UA. Furthermore, the probability of reporting UA in the survey is reduced with a false positive in earnings, by 0.247. These results suggest respondents may be putting UA into earnings. Results are also consistent with respondents reporting UA as UI. Reporting UA declines with a false positive in UI, relative to a true negative. The coefficient (−0.298) is bigger than the corresponding coefficient for reporting UI given a false positive in UA (−0.152 in table 4). Similarly to table 4, the common coefficients across both specifications are generally qualitatively and quantitatively similar.
The amount of UA matters, consistent with the results for UI. However, unlike for UI, controlling for demographic characteristics does markedly reduce this association. A 1 percent increase in the UA amount increases the probability of reporting UA in the survey by 0.035 percent. Job training by the labor market agency also makes accurate reporting of UA receipt more likely (by 0.181). A longer benefit duration is positively associated with reporting UA receipt.

Supplementary Material tables S5 and S6 repeat the above analyses by gender. Women seem more prone to confusion between UI and UA than men. Misreporting of both UI and UA is positively associated with the level of administrative earnings and the difference between survey and administrative earnings. Women appear less likely to report a benefit receipt in both earnings and the benefit. Reporting UI receipt in the survey increases with a false positive in earnings by 0.31 for men, suggesting reporting of UI receipt twice, while we find a small negative coefficient for women (Supplementary Material table S5). Women may be less prone to inflate their total income than men, but more likely to report benefits in the wrong category.

Cross-reporting at the intensive margin
While false negatives are large and important, errors at the intensive margin, in the reported amount of an income source, are also important to understanding overall reporting behavior. Table 6 reports mean survey and administrative amounts (columns 2 and 3) separately by benefit type, earnings, and total income (all unemployment benefits + earnings), conditional on benefit receipt and report. It also presents the mean error in level and in percent of the mean administrative income (column 5), the standard deviation of the errors (column 6), and the correlation between the survey and administrative amounts (column 7). While on average there is a gap between the survey and administrative amounts, the errors by income source tend to partly offset each other and reduce the overall error in income. The errors in the UI/UA true positive groups, as measured by both the average Δ and the standard deviation of Δ, are the smallest, suggesting that respondents who correctly report receipt of at least one of their benefits tend to make fewer mistakes in levels.

Source: Own calculations with the SILC. Note: Δ = mean survey amount − mean administrative amount. SD of Δ = standard deviation of (survey amount − administrative amount). ρ = correlation between survey and administrative amounts. Group with ACLC true positives is too small to disclose. Sample includes those aged 16+. Observations with missing/imputed earnings or benefits (UI, UA, or ACLC) are excluded.

Income Source Confusion Using the SILC
For those who fail to report receipt of UI or UA, the error in earnings (and income) is larger both in mean value and as measured by the standard deviation of the difference. While the error for income in these cases is still large, it is smaller on average than either the underreporting of the benefit or the overreporting of the earnings.
Table 7 examines earnings errors (survey minus administrative earnings) by combined benefits status. We take the sample of respondents overreporting earnings and receiving benefits in period t while not receiving benefits in period s. We split the benefit status in period t into false negatives and true positives, the latter with an error in the reported benefit amount of less than 10 percent versus 10 percent or more. The average error in earnings is smaller in the periods when no benefits are received, particularly for respondents who fail to report any of the benefits they receive or who make a large error in the amount they report. The difference in earnings overreports with benefit receipt suggests that part of the overreport is benefits.
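The 10 percent split used in table 7 can be expressed directly. A minimal sketch of the threshold logic only (array names are ours, and the amounts are illustrative):

```python
import numpy as np

# Classify true positives by the size of the benefit-amount error,
# following the table 7 split: absolute error relative to the
# administrative amount, below versus at/above 10 percent.
admin_ben = np.array([1000.0, 1000.0, 800.0])
survey_ben = np.array([950.0, 600.0, 800.0])

rel_err = np.abs(admin_ben - survey_ben) / admin_ben
small_error = rel_err < 0.10  # error < 10 percent of the admin amount
```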
Table 8 examines the relationship between the level of misreporting in earnings and benefits. We combine the errors across all three benefits and measure them as the administrative amount minus the survey amount. The dependent variable is the error in earnings reports, measured as the survey amount minus the administrative amount. A coefficient of 1 would indicate a Euro-for-Euro transfer from one source to the other. Column 1 shows that men misreport benefits as earnings (and vice versa) nearly one for one. We split the benefit reporting errors into four categories: false negatives, false positives, and true positives with benefits underreporting or overreporting. Noting that false negatives are a large group (38.4 percent), the coefficient of 1.072 for men indicates that these individuals report their benefits nearly one to one as earnings. False-positive men underreport earnings by roughly 60 cents for each Euro of benefits overreported. True positives who underreport their benefit appear to report about two-thirds of the missing amount as earnings; those who overreport their benefit underreport earnings by the same amount.

Source: Own calculations with the SILC. Note (table 7): This table shows the mean of the difference between survey and administrative earnings. A positive mean implies earnings are on average overreported in the survey relative to the administrative data. The sample includes those aged 16+ who overreport their earnings in the survey relative to the administrative data and who receive any of UI, UA, or ACLC according to the administrative data in period t, while not receiving any of the benefits in period s according to both the survey and administrative data. Observations are further split into groups depending on the survey value of the benefits in period t: a true + for the combined receipt of benefits with an absolute error in the sum of benefits (admin minus survey sum of benefits) of less than/at least 10 percent of the administrative amount; and a false - for the combined receipt of benefits. Observations with missing/imputed administrative/survey earnings are excluded.

C.R. Bollinger and I.V. Tasseva
Table 8. OLS model of the error in earnings (survey minus admin earnings) on the reverse error in the sum of benefits (admin minus survey benefits).
Source: Own calculations with the SILC. Note: Results are shown separately for men and women. The outcome variable is the error in earnings (survey minus admin earnings). The key independent variable is the reverse error in the sum of all unemployment benefits, UI, UA, and ACLC (admin minus survey sum of benefits), overall or by category: a true - for the combined receipt of all benefits, that is, an error of 0; true +; false -; and false +. The controls include: age group, number of children in the household, number of adults in the household, if in a couple, region, education, occupation, industry, if a civil servant, country of birth, self-reported health, duration of earnings receipt in months according to the administrative data, proxy interview, month of interview, survey wave, interaction between wave and year, interview type, and if the same interviewer as last year. Sample is restricted to those aged 16+. Observations with missing/imputed administrative/survey earnings/benefits are excluded. Standard errors clustered by individual and shown in parentheses. p-values shown next to standard errors and based on a two-tailed significance test.
For women the story is more muted, with smaller coefficients. A similar pattern exists, though: false negatives replace benefits with earnings, false positives underreport earnings, and those who err only in the level replace at fractional amounts. There is clear evidence that people have difficulty placing income in the proper source. However, we cannot identify whether this is simply confusion due to timing, stigma, or other possible explanations.
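The table 8 specification regresses the earnings error on the reverse benefit error interacted with reporting-category dummies. A stripped-down sketch on simulated data (no controls; the category rates are hypothetical values chosen to echo the pattern described above, not the paper's estimates):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated reverse benefit error (admin - survey benefits) and
# reporting category: 0 = false negative, 1 = false positive,
# 2 = true positive underreport, 3 = true positive overreport.
benefit_err = rng.normal(0, 500, n)
category = rng.integers(0, 4, n)

# Data-generating process: benefit errors transfer into earnings at
# category-specific rates (hypothetical, for illustration only).
true_beta = np.array([1.0, 0.6, 0.67, 1.0])
earn_err = true_beta[category] * benefit_err + rng.normal(0, 50, n)

# OLS of the earnings error on the benefit error interacted with
# category indicators recovers the category-specific transfer rates.
X = np.column_stack([(category == k) * benefit_err for k in range(4)])
beta_hat, *_ = np.linalg.lstsq(X, earn_err, rcond=None)
```

A coefficient near 1 for a category would indicate Euro-for-Euro transfer between sources for that group, which is the interpretation the paper gives to the male false-negative coefficient.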

Implications of Income Source Confusion
We focus on misreporting of earnings as a dependent variable in two applications. We note that since misreporting of earnings is correlated with variables related to unemployment, coefficients on these variables can be biased. We highlight this using estimates of the returns to education. Since unemployment is associated with lower education, reporting unemployment compensation in earnings inflates earnings disproportionately for that group.
Our second focus is on the construction of a sample. When mismeasurement of a key variable in sample construction is correlated with the dependent variable, bias can occur. We highlight this in the estimation of the returns to job training. Sample selection hinges on being unemployed in one time period. Along with subsequent misallocation of earnings for those with persistent unemployment, this leads to substantial bias in the estimates of the returns to job training.

Bias in the returns to education
Table 9 documents the distribution of education by benefit status: "recipient" if receiving any of UI, UA, or ACLC and "nonrecipient" if not receiving unemployment benefits. The key finding is that recipients tend to be less educated.
There is long-standing interest among policymakers and researchers in understanding the association between earnings and education. Given the differences in education by benefit status, we test whether the misreporting of unemployment compensation as earnings biases the returns to education. Focusing on full-year, full-time workers, including those who were unemployed but would have worked the full year otherwise, we construct two samples: one using the survey measures of earnings, UI, UA, and ACLC to determine who is a full-year, full-time worker and one using the administrative measures. We estimate standard Mincer wage equations for men and women with survey versus administrative earnings. As in other regressions, we control for a rich set of covariates (see table 10 notes). In table 10, for men, the returns to both middle and high education, relative to low, are higher in the administrative than in the survey data. Jointly testing the differences in the coefficients between the two datasets rejects the null hypothesis of equivalence (p-value of .041). For women, we also find differences in the returns to education by dataset, although the differences are smaller and not statistically significant (p-value of .624). The lower rates of cross-reporting benefits as earnings among women likely drive the weaker results.
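The estimating equation is a standard Mincer specification. In sketch form (the symbols are ours, not the paper's notation):

```latex
\log y_i = \beta_0 + \beta_1 \,\mathrm{MidEd}_i + \beta_2 \,\mathrm{HighEd}_i
         + X_i'\gamma + \varepsilon_i
```

where $y_i$ is survey or administrative earnings, $\mathrm{MidEd}_i$ and $\mathrm{HighEd}_i$ indicate middle and high education relative to low, and $X_i$ collects the controls listed in the table 10 notes. The bias arises because, in the survey equation, $\varepsilon_i$ absorbs benefits cross-reported as earnings, and that component is negatively correlated with education.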
Differences in the results in table 10 may stem from two issues. First, the samples differ. The survey data contain individuals who were not working but reported benefits as labor market earnings, and the administrative data include individuals who failed to report their earnings in the survey. The second bias stems from earnings misreporting: either cross-reporting of benefits as earnings or simple measurement error in earnings alone. Both sample and misreporting errors can include error due to the exclusion of parts of true earnings from the administrative earnings. Though we do not know its extent, we show that some of the error is due to cross-reporting. To isolate misreporting of earnings, we repeat the analysis in table 11, restricting the sample to the same people in both datasets, ensuring that differences in coefficients are entirely due to intensive-margin misreporting of earnings. For women, most of the difference in the returns to education by dataset goes away, suggesting that differences in the sample dominate the results. In comparison, for men the differences mostly remain, so it is differences in the earnings reports that mainly cause the bias in the returns to education.
We decompose the errors in earnings using an "intermediate" earnings measure that uses the administrative earnings of those with unemployment benefits and the survey earnings reports of everyone else. Differences in coefficients between the survey and intermediate earnings capture the errors from benefit recipients (a combination of cross-reporting and simple misreporting), while differences in coefficients between the intermediate and administrative earnings capture errors from earnings misreporting only. For men, errors from benefit recipients account for a large share of the total bias in the coefficients for middle and high education, suggesting that cross-reporting matters. For women, errors by benefit recipients lead to a downward bias in the returns to education, whereas errors by everybody else go in the opposite direction, though the differences are not statistically significant.
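Constructing the intermediate measure is a one-line selection. A minimal numpy sketch (array names and amounts are ours, for illustration):

```python
import numpy as np

# Illustrative arrays: administrative earnings, survey earnings, and an
# indicator for administrative receipt of any unemployment benefit
# (UI, UA, or ACLC).
admin_earn = np.array([1800.0, 2500.0, 2100.0, 3000.0])
survey_earn = np.array([2200.0, 2450.0, 2600.0, 3000.0])
any_benefit = np.array([True, False, True, False])

# Intermediate measure: administrative earnings for benefit recipients,
# survey reports for everyone else.
intermediate_earn = np.where(any_benefit, admin_earn, survey_earn)
```

Running the same regression on all three earnings measures then attributes the coefficient gap between survey and intermediate to benefit recipients' errors, and the gap between intermediate and administrative to everyone else's.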

Bias in the returns to job training
Policymakers often tie unemployment programs to job training. We examine the implications of errors in survey incomes for estimates of the efficacy of job training in raising earnings and employment. Our sample consists of individuals who received unemployment benefits or reported being unemployed for at least one month in year t. Two samples are constructed: one using only the survey measures of UI, UA, and ACLC to establish unemployment and one using the administrative measures. The outcome variables are earnings and employment in t + 1, based on survey versus administrative data. For the few individuals with multiple job training spells, we include only the last one.
We estimate a standard Mincer wage regression with indicators for three types of job training: self-funded, employer-funded, and funded by the labor market agency. We control for a rich set of covariates (see table 12 notes). Small sample sizes prevent separate estimation by sex.
Table 12 presents the results for earnings. The coefficient on labor market agency training is of particular interest, as it is tied to government policy tackling unemployment. In the survey data, training provided by the labor market agency yields a robust 14.3 percent earnings gain. In the administrative data, this rises to 22.2 percent. The loss of nearly a third of the potential sample is clearly an issue. Additionally, individuals who remain or become unemployed again in year t + 1 may be reporting benefits as earnings. We also note substantial bias in the coefficients on self-funded training (-13.6 percent to -31.4 percent) and employer-funded training (-7.7 percent to 15.2 percent). A joint test for differences in the coefficients between the two datasets rejects the null hypothesis of equivalence (p-value of .043).
Table 13 estimates models of employment in period t + 1. In the survey data, the coefficient on labor market agency training is 2.4 percent and not statistically significant. In the administrative data, the coefficient rises to 9 percent and is statistically significant. Note (table 13): Cells with too few observations cannot be disclosed and are not shown: UA with missing/imputed values; ACLC false -, true +, and with missing/imputed values; and job training with missing or n/a values. Standard errors clustered by individual and shown in parentheses. p-values shown next to standard errors and based on a two-tailed significance test.

Table 2.
Misreporting of unemployment benefits and earnings.

Table 3.
Combinations of income receipt, given administrative receipt.
Source: Own calculations with the SILC. Note: Each row provides the proportion of those in one administrative category reporting benefit and income receipt in the survey. UI = unemployment insurance benefit; UA = unemployment assistance; ACLC = assistance for covering living costs; earn. = earnings. Missing values (.) indicate cells with too few cases (cannot be disclosed). Numbers on the diagonal are highlighted in light gray; otherwise, numbers with a value of 10 percent or more are highlighted in dark gray. Observations with missing/imputed administrative/survey values are excluded. Sample is restricted to those aged 16+.

Table 4.
Probability of reporting the unemployment insurance (UI) benefit in the survey, conditional on receiving it.

Table 5.
Probability of reporting the unemployment assistance (UA) in the survey, conditional on receiving it.
Unlike with UI, UA recipients can send follow-up applications to extend benefit duration, and hence can be on UA for long spells.

Table 5. Continued. Source and note: See table 4. Observations with missing/imputed administrative/survey UA or earnings are excluded. Cells with too few observations cannot be disclosed and are not shown: ACLC false -, true +, and with missing/imputed values; and job training by other institutions and with n/a values.
Table 6 provides information on how the level of reporting (the exact Euro amount) is associated with the reporting of different benefits. It compares the mean survey and administrative amounts.

Table 6.
Mean survey and administrative amounts by benefits true positive, false negative, and false positive.

Table 7.
The mean of survey-admin earnings, conditional on overreporting earnings and receiving benefits in period t while not receiving benefits in period s.

Table 10.
Returns to education: log-earnings regression. Source: Own calculations with the SILC. Note: OLS estimation of log-earnings regressions. Sample is restricted to those aged 19-64 and is based on survey versus administrative information for the full-year, full-time employed or those unemployed who would have worked the full year otherwise. Education refers to the highest level of education achieved (see table 9 for details). Columns (1) and (3) are based on survey earnings and columns (2) and (4) on administrative earnings. Observations with imputed administrative/survey earnings, UI, UA, or ACLC are excluded. The controls include: age group (5-year bands), region, if a civil servant, country of birth, proxy interview, month of interview, survey wave, interaction between wave and year, interview type, and if the same interviewer as last year. Standard errors clustered by individual and shown in parentheses. p-values shown next to standard errors and based on a two-tailed significance test.

Table 11.
Returns to education: log-earnings regression (sample with both positive administrative and survey earnings). Source and note: See table 10. Sample is further restricted to those with both positive administrative and survey earnings. Columns (1) and (4) are based on survey earnings; columns (3) and (6) on administrative earnings; and columns (2) and (5) on an "intermediate" earnings measure with administrative earnings for those with administrative receipt of any unemployment benefit (UI, UA, or ACLC) and survey earnings reports for everybody else.