Predicting poverty trends by survey-to-survey imputation: the challenge of comparability

Poverty in low-income countries is usually measured using large and infrequent household consumption surveys. The challenge is to find methods to measure poverty rates more frequently. This study validates a survey-to-survey imputation method, based on a statistical model utilizing consumption surveys and light surveys to measure changes in poverty rates over time. A decade of poverty predictions and regular poverty estimates in Malawi provides a unique case study. The analysis suggests that this modelling approach works within the same context given that households' demographic composition is included in the model. Predicting poverty using different surveys is challenging because of different aspects of comparability. A new way to account for seasonal coverage strengthens the model when imputing for surveys covering different seasons. It is important for national statistics offices and supporting agencies to prioritize maintaining consistency in the way data are collected in surveys to provide comparable trends over time.


Introduction
To monitor and assess how economic development and national policy interact and affect poverty, regular and frequent poverty measurement is essential. The standard approach to estimating the poverty rate is based on comprehensive survey data on households' detailed consumption. Such surveys are costly and time consuming, and are often undertaken only every fourth or fifth year. There is a need for cheaper and simpler methods that can be used to assess poverty levels more frequently. Different methods have been developed depending on the specific objective and the availability of data; see Dang (2020) for a systematic overview.
The National Statistical Office (NSO) in Malawi adopted a survey-to-survey imputation approach to fill the gap in national poverty statistics as follows: Data from the Integrated Household Survey 2004 (IHS2) was used to develop a regression model for household consumption per capita with a set of key household-specific variables, called 'predictors'.
The predictors were collected in annual Welfare Monitoring Surveys (WMS) and used to estimate the consumption levels, which subsequently were used to impute an overall poverty rate. The poverty rate trend developed using this method suggested a gradual improvement. In 2004, the official national poverty level (calculated directly from the IHS2 survey) was 52%. According to the NSO's imputation method, the poverty rate gradually decreased to 39% in 2009 (NSO of Malawi, 2010). The trend is consistent with an increase in real GDP per capita and an increase in production of maize, the main staple food. On the other hand, official poverty estimates for 2010 based on a new Integrated Household Survey (IHS3) showed that poverty had hardly improved since 2005. This puzzle raised the question: did the decreasing poverty trend predicted by the model reflect real changes in Malawi, or was the model wrong?
The answer to this question is important for policy makers and statisticians in Malawi. But the answer is also of practical interest for the international community, as such approaches are increasingly applied and are potentially useful for annual reporting on the sustainable development goals (SDGs). In a recent review of poverty imputation techniques, Dang et al. (2019) call for '... validation studies to provide richer evidence on contexts where these methods may or may not work, or how well these methods work'.
The purpose of the present study is to assess and improve a method to provide comparable poverty estimates over time. The case study from Malawi provides, to the best of our knowledge, a unique application of the usefulness of the imputation approach.
An approach to estimate a poverty measure by using a regression model was proposed in an influential article by Elbers et al. (2003). Their approach, also referred to as 'poverty mapping', combines census data with a regression model estimated using data from an expenditure survey. The poverty mapping approach has been modified and used for survey-to-survey imputation in several contexts. Whereas the poverty mapping approach produces fine-grained poverty maps, survey-to-survey imputation produces poverty estimates at national and sub-national levels. Both methods rely on a well-defined model with stable parameters. The broad conclusion from validation of the survey-to-survey imputation method using comparable consumption data from Vietnam and Inner Mongolia is that the approach appears to work (Christiaensen et al., 2012). The same conclusion was reached in a validation study using seven household budget surveys from Uganda (Mathiassen, 2013) and in a study by Dang and Lanjouw (2018) using two comparable consumption surveys from India. On the other hand, Newhouse et al. (2014) found that a similar approach fails when imputing poverty from household budget surveys into labour force surveys using data from Sri Lanka. They argue that for such a set-up to produce reliable poverty estimates, a welfare tracking survey system to collect predictors in a consistent way should be established. It is exactly this type of system that was attempted in Malawi.
Because of the uncertainty around the model-based predictions for 2005-2009, which arose after the comprehensive IHS3 2010 survey, the Malawi NSO stopped calculating poverty rates using the WMS conducted after 2009. However, new IHS and WMS surveys have become available since 2009, and by using these data, it is possible to calculate poverty trends using official poverty rates and model-based predictions for the period 2004-2014. Data sets from nine surveys are used in this analysis to validate poverty trends for 2004-2014. Whereas household budget surveys like the IHS cover a full 12-month period, lighter surveys (lighter with respect to the size of the questionnaire) such as the WMS normally cover only a part of a year. The seasonality element, however, was not considered in the previous estimations of the poverty rate trends in Malawi. The months that were covered by the WMS generally represent the harvest period of the year, which is likely to result in lower poverty rates. A key contribution of this paper is the development of a methodological approach that accounts for seasonality in the modelling framework.
Further, the paper examines whether some explanatory variables are more important to include in the models than others. This is of practical interest for selecting variables and is important if some are more affected by questionnaire design than others. A survey experiment was undertaken using the 2013 Integrated Household Panel Survey (IHPS) in Malawi. The experiment was designed to measure the effect of asking the same question in different questionnaire contexts (Kilic and Sohnesen, 2019). This experiment was a response to the poverty puzzle in Malawi, and the results suggested that the downward trend predicted by the WMS surveys was, at least partly, due to differences in the survey questionnaires.
Although the questions asked to capture the predictors are formulated in exactly the same way, the WMS questionnaire is much shorter than the IHS, and the questions are not followed up with additional probing. Thus, both respondent fatigue and the degree of elaboration on questions can affect the answers; see Lavrakas (2008) for a review of this theme. To reveal possible inconsistencies in the predictors over time, this paper investigates trends in all underlying poverty predictors, searching for unlikely and systematic patterns that can be attributed to the setting in which the predictors were collected. The setting could be related to the questionnaire design as well as to the more general planning and implementation of the surveys. If the design of the survey, the training of enumerators, and the organization of data collection differ, these factors are also likely to have a bearing on the results. Although hard to observe for someone not involved in all surveys, such differences are more likely if the number of households interviewed, the purpose or type of the survey, and the degree of external support to the surveys differ. These aspects will be used as a backdrop to understand deviations in trends.
Section 2 describes the methodology and data used. The analyses are presented in Section 3, Section 4 discusses the results, and Section 5 provides our conclusions.

The modelling framework
We use the methodology by Mathiassen (2009) that was used to predict poverty rates in Malawi. For the current work, we also use new survey data from Malawi to develop a new model. The surveys used include the household budget survey, referred to as IHS, and the light welfare monitoring survey, referred to as WMS. The method is designed to predict a population measure of poverty defined as the probability that consumption per capita is below a specific threshold (the poverty line). The empirical measure of this probability is the fraction of persons in the population with consumption below the poverty line. To predict this measure of poverty we first postulate a regression model for household per capita consumption with selected covariates as explanatory variables (or predictors). Household consumption per capita, observed in the IHS, forms the foundation for developing the regression model. In contrast, in the light surveys, only the predictors are observed. Accordingly, we can use the regression model (including the distribution of the stochastic error term) to predict the probability that consumption per capita for a given household, given the predictors for this household, is less than the poverty threshold by using information only from the WMS. Subsequently, we aggregate this probability across households by means of the WMS (using population weights) to obtain the estimate of the corresponding population poverty measure. We will refer to poverty estimated in this way as 'predicted poverty rates'.
Formally, the stages of the methodology can be described as follows. Let $Y_i$ denote consumption per capita in household $i$ and let $X_i$ be the corresponding vector of explanatory variables (including a constant term). The regression model estimated in the first stage is given by

$$\ln Y_i = X_i\beta + \eta_i, \tag{1}$$

where $\beta$ is a vector of unknown parameters and $\eta_i$ is an error term that captures unobserved explanatory variables and is assumed to be independent of $X_i$. Let $F$ be the c.d.f. of $\eta_i$, and let $c$ be the threshold defining the poverty line. The probability that the individuals in the household are poor is given by

$$P_i = P(Y_i < c \mid X_i) = F(\ln c - X_i\beta). \tag{2}$$

The empirical counterpart of Equation (2) is the fraction of individuals in households with characteristics $X_i$ that are below the poverty line. By means of the model in Equation (2) we can predict the probability that a household with characteristics $X_i$ is poor in the years where only a WMS is conducted. Note that both a parametric and a non-parametric approach can be used to estimate $F$ from the IHS data. From Equation (2), it follows that the unconditional probability of being poor is given by

$$P = E\,F(\ln c - X_i\beta), \tag{3}$$

where $E$ denotes the expectation operator with respect to the distribution of $X_i$. The empirical counterpart of Equation (3) is the fraction of individuals in the population that are poor, which is our population measure of poverty. From data collected in an IHS, one can obtain the fraction of individuals that are poor in the years when such surveys are conducted; we will refer to poverty rates calculated in this way as 'actual poverty rates'. Provided the distribution $F$ and the parameter vector $\beta$ are unchanged over time, one can use the data from the IHS to estimate $\beta$ and $F$ in a first stage and use Equation (3) in a second stage to predict overall poverty in the population.
In the first stage analysis, it has been found that $F$ is approximately a normal distribution with zero mean and variance $\sigma^2$. In this case, the model above depends on the parameters $\beta$ and $\sigma$ that are estimated in the first stage. But in general, the method outlined above is by no means dependent on $F$ being a normal c.d.f. Since a WMS is representative, the right-hand side of Equation (3) can be evaluated by using the approximation

$$\hat{P} = \frac{1}{n}\sum_{i \in S} h_i\,\Phi\!\left(\frac{\ln c - X_i\hat{\beta}}{\hat{\sigma}}\right), \tag{4}$$

where $S$ denotes the sample of the WMS, $n$ is the sample size (number of individuals), $h_i$ the number of members in household $i$, $\sigma^2 = \mathrm{Var}\,\eta_i$, and $\Phi$ is the standard normal c.d.f. The standard error of the predictor is given by Equation (5), in which $N$ is the number of individuals in a population $\Omega$ consisting of $N_H$ households, $p_i = 1$ if the household is poor and zero otherwise, and $n_H$ is the number of households in the sample. See Mathiassen (2009) for more details.
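As a rough illustration of the second stage, the following Python sketch aggregates household-level poverty probabilities into a predicted poverty rate. It assumes that the first-stage model is specified for log consumption per capita with normally distributed errors, so that a household's probability of being poor is the standard normal c.d.f. evaluated at (ln c - X_i beta) / sigma. The function name, the tuple layout, and the optional survey weights are hypothetical and are not taken from the paper.

```python
from math import log
from statistics import NormalDist


def predicted_poverty_rate(households, beta, sigma, poverty_line):
    """Second-stage poverty predictor (illustrative sketch).

    households: list of (x, members, weight) tuples, where x is the
    predictor vector (first element 1.0 for the constant term),
    members is the household size and weight is a survey weight.
    beta, sigma: first-stage estimates, assumed to be given.
    """
    phi = NormalDist().cdf  # standard normal c.d.f.
    poor = 0.0
    persons = 0.0
    for x, members, weight in households:
        fitted = sum(b * v for b, v in zip(beta, x))  # X_i * beta
        prob_poor = phi((log(poverty_line) - fitted) / sigma)
        poor += weight * members * prob_poor
        persons += weight * members
    return poor / persons
```

With equal weights, this reduces to the member-weighted average of the household probabilities.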
2.1.1 Accounting for cluster effects. Households within clusters are often more similar than households in different clusters. This implies that the error terms in Equation (1) might be correlated within clusters. Although OLS will still produce consistent parameter estimates, the covariance matrix of these estimates will, in general, be biased. As a result, the variance of the predictor will be biased. To account for such cluster effects, we estimate the parameter vector $\beta$ in Equation (1), and its standard errors, using the method of generalized least squares (GLS). One can still apply the approach in Equation (5) after a minor modification. The modification replaces the covariance matrix of the OLS estimates of the regression coefficients, namely $\sigma^2 (X'X)^{-1}$, by the corresponding covariance matrix of the GLS estimates, $V_C$. Consequently, instead of $\mathrm{Var}(X_i\hat{\beta}/\sigma \mid X_i) = X_i (X'X)^{-1} X_i'$ (Equation (12) in Mathiassen, 2009), the following relation will hold under GLS estimation:

$$\mathrm{Var}\!\left(\frac{X_i\hat{\beta}}{\sigma} \,\Big|\, X_i\right) = \frac{X_i V_C X_i'}{\sigma^2}. \tag{6}$$

The matrix $V_C$ is readily obtained as an output from GLS estimation packages. Accordingly, it is easy to simulate the bias and confidence intervals also in the GLS case. 3
2.1.2 Accounting for seasonality with latent seasonal explanatory variables. A key contribution of this paper is to extend the regression model to account for possible seasonal variations in consumption and explanatory variables. The corresponding regression model is given by

$$\ln Y_{is} = X_i\beta + Z_{is}\delta + \eta_{is}, \tag{7}$$

where $s$ indicates season, and $Z_{is}$ is a vector of explanatory variables that vary across seasons. Unfortunately, we do not observe $Z_{is}$ in all the seasons in the WMS. Let $\tilde{Z}_{is}$ denote the value of the seasonal explanatory variable of household $i$, season $s$, in the IHS. Let $\tau_s(k)$ be defined by

$$\tau_s(k) = \frac{\sum_i \tilde{Z}_{is}(k)}{\sum_i \tilde{Z}_{i1}(k)},$$

where $\tilde{Z}_{is}(k)$ denotes component $k$ of $\tilde{Z}_{is}$. Suppose that $Z_{is}$ is only observed for $s = 1$ and that $\tau_s(k)$ is approximately constant over the years between two IHS. Then, it follows that

$$\tau_s(k) \approx \frac{\sum_i Z_{is}(k)}{\sum_i Z_{i1}(k)},$$

which implies that the average of $Z_{i1}(k)\tau_s(k)$ across households becomes approximately equal to the average of $Z_{is}(k)$ across households. One can therefore use $Z^*_{is}(k) := Z_{i1}(k)\tau_s(k)$ as a proxy for the unobserved component $Z_{is}(k)$. The error term $\eta_{is}$ in the first-stage regression model (Equation (7)) may be correlated across seasons. Let $\bar{\sigma}^2 = \mathrm{Var}\,\bar{\eta}_i$, where $\bar{\eta}_i$ denotes the mean of $\eta_{is}$ across seasons. If the error terms were uncorrelated across the four seasons, then $\mathrm{Var}\,\bar{\eta}_i = \sigma^2/4$ and hence $\bar{\sigma} = 0.5\sigma$. However, this is not necessarily the case. By means of the first stage regression analysis one can use the regression model in Equation (7) to estimate $\bar{\sigma}$ by using data from the IHS survey. The poverty predictor in Equation (4) now becomes

$$\hat{P} = \frac{1}{n}\sum_{i \in S} h_i\,\Phi\!\left(\frac{\ln c - X_i\hat{\beta} - \bar{Z}^*_i\hat{\delta}}{\hat{\bar{\sigma}}}\right), \tag{8}$$

where $\bar{Z}^*_i$ denotes the mean of the proxies $Z^*_{is}$ across seasons. If $\ln c - X_i\beta - \bar{Z}_i\delta < 0$, which implies that the poverty level is less than 0.5, then the presence of positive inter-seasonal correlation of the error term (so that $\bar{\sigma} > 0.5\sigma$) implies that the poverty measure in Equation (8), evaluated with $\bar{\sigma} = 0.5\sigma$, will underpredict the poverty level. On the other hand, if $\ln c - X_i\beta - \bar{Z}_i\delta > 0$, the predictor will overpredict the poverty level.
3 The predictor will be biased as it depends on $(\hat{\beta}, \hat{\delta}, \hat{\sigma})$ in a nonlinear way. The biases for predictions in the current analysis have been calculated according to formula (8) above and Equations (11) and (13) in Mathiassen (2009). These biases are very small (never above 0.05 percentage points from the predictor value) and can therefore be neglected.
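The seasonal proxy can be sketched in a few lines of Python. The sketch assumes that the seasonal factor for a variable is the ratio of its mean in season s to its mean in the season observed in the light survey, both computed from a full-year IHS, and that the annual proxy for a household is the mean of the scaled values across seasons. The function names and the data layout are hypothetical.

```python
from statistics import mean


def seasonal_ratios(ihs_values, observed_season):
    """tau_s: mean of a seasonal variable in season s relative to the
    season observed in the light survey, computed from full-year IHS data.

    ihs_values: dict mapping season -> list of household values.
    """
    base = mean(ihs_values[observed_season])
    return {s: mean(v) / base for s, v in ihs_values.items()}


def annual_proxy(z_observed, ratios):
    """Mean over seasons of the proxies Z*_is = Z_i(observed) * tau_s."""
    return mean(z_observed * tau for tau in ratios.values())
```

For a yes/no consumption item, `z_observed` would be the household's answer in the observed season, so the proxy shifts the item's contribution towards its full-year average.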

About the surveys
The data used in this analysis are from two IHS, one IHPS and six WMS. See Table 1 for an overview of these surveys. All surveys used representative sampling techniques and can be grossed up to the entire population of households in Malawi or to regional areas and urban/rural areas. 4 The IHS2 (2004) and IHS3 (2010) are large surveys including 11,280 and 12,271 households, respectively. The IHPS had the smallest size with 4,000 households. The questionnaires used for the IHS2, IHS3, and IHPS contain only minor changes and can therefore be considered nearly identical. These surveys obtained detailed information about household consumption and expenditures and can be used to calculate total consumption for households, and consequently also poverty calculated as the fraction of persons in the population with consumption below the poverty line. The WMS surveys were conducted annually from 2005 to 2009, and again in 2011 and 2014. No survey was conducted in 2012. The WMS are lighter surveys that cannot be used to calculate total consumption but aim to track welfare in areas such as education, health, employment, and asset ownership. The questionnaires for the WMSs remained largely unchanged from 2005 to 2009. There were some changes in 2011 and 2014 compared with the previous WMS questionnaires with respect to the placement of modules, but the questions related to the variables used in the imputation remained unchanged throughout. In addition, a large module on peace and governance was added to the WMS2014, and data collection was, for the first time, done electronically using a CAPI 5 technique. The WMS sample sizes varied a great deal, from 5,234 (2005) to 29,389 (2007) households. There were some obvious quality flaws with the WMS2011 data, and this dataset was dropped from the subsequent analysis. 6
4 The framework proposed by Dang et al. (2017) provides an adjustment for differences in sample design between surveys. The validation using budget surveys and labour force surveys in Jordan estimates poverty levels close to those calculated directly from the budget survey.
5 Computer-assisted personal interviewing.
6 About 20% of the households did not report any information regarding food consumption.
Most households in Malawi are rural smallholders with a consumption pattern following a seasonal variation that roughly corresponds to the quarters of the year. The main planting season starts with the rainfall in the fourth quarter. Wild plants and green maize become available in the first quarter, while the main harvest starts in the second quarter. Thus, food crops are cheap at the end of the second quarter and during the third quarter. By the fourth quarter, stores are running out of food supplies, resulting in high prices. This is also the start of the lean or hunger season, which lasts into the first quarter. Only IHS2 and IHS3 covered the whole year (see the fourth row in Table 1). The WMS was initially designed only to cover the months from July to September, i.e. the third quarter. However, for 2008 and 2009 the fieldwork also spanned into the fourth quarter, and the 2014 survey covered the fourth quarter of 2013 and the first quarter of 2014. In Equation (7), the seasons (1-4) are defined as the corresponding quarter.
In addition to the NSO, the World Bank (WB) and Statistics Norway (SSB) were involved in implementing the different surveys (see row 4 in Table 1). The WB supported the IHS surveys with respect to questionnaire design, sampling, fieldwork, and preparation of data. SSB supported all the WMSs, but in various ways. The SSB support of the WMS surveys in 2005, 2006, and 2007 included questionnaire design, sampling, fieldwork, and preparation of data, and was provided by a long-term resident advisor. For the subsequent WMSs, the technical support from Statistics Norway was limited to an advisory role.
A two-stage sample design was used for all surveys. For each domain (district), enumeration areas (EAs) were selected by systematic random probability-proportional-to-size (PPS) sampling. Within each EA, households were listed, and a fixed cluster of households was selected by systematic random sampling. The sampling frame for IHS2 was the 1998 Population Census, and the WMS2005 and WMS2006 were sub-samples of the IHS2 sample (see the fifth row in Table 1). WMS2007 was attached to an agricultural census (NACAL), and while the sample frame was the 1998 Census, the second-stage sampling units were farming households. To expand the WMS2007 coverage to the whole population, extra samples were drawn by systematic random PPS to include urban EAs and, within all EAs, to cover landless and urban non-farming households for the WMS. The samples in WMS2008 and WMS2009 are sub-samples from the NACAL. The IHS3 sample was based on the new Population Census in 2008. The IHPS2013 was a sub-sample of IHS3, where 3,246 households from the IHS3 survey were revisited in 2013. Individuals rather than households were followed. If one individual moved into another household, then that household was also sampled in the IHPS. This led to an almost 20% increase in the panel sample from 2010 to 2013. WMS2014 was drawn from the Census in 2008.
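The two sampling stages can be illustrated with a short Python sketch. It is a generic textbook version of systematic PPS selection of enumeration areas followed by systematic selection of listed households, not the NSO's actual sampling program; the function names and the use of household counts as the size measure are assumptions made for illustration.

```python
import random


def systematic_pps(sizes, n_select, rng=random):
    """First stage: pick n_select units with probability proportional to size.

    Returns the indices of the selected units. A unit larger than the
    sampling interval can be selected more than once.
    """
    step = sum(sizes) / n_select          # sampling interval
    start = rng.random() * step           # random start in [0, step)
    selected = []
    idx = 0
    cumulative = 0.0                      # total size of units before idx
    for k in range(n_select):
        target = start + k * step
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= target:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


def systematic_sample(n_listed, n_select, rng=random):
    """Second stage: equal-probability systematic sample of listed households."""
    step = n_listed / n_select
    start = rng.random() * step
    return [min(int(start + k * step), n_listed - 1) for k in range(n_select)]
```

Passing a seeded `random.Random` instance as `rng` makes a draw reproducible, which is useful when a sub-sample has to be documented.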
Table 1 notes: Q denotes quarter of the year, that is, Q1 = 1st quarter (January-March). NACAL refers to the National Census of Agriculture and Livestock. Source: Authors' compilation of information.
2.2.1 Comparability issues. Food quantities are central to the calculation of total consumption and are often reported in non-standard units such as bunch or pail. The factors for converting non-standard units into kilograms for maize changed between IHS2 and IHS3. 7 As maize is the main food in Malawi, and food is the main component of the value of total consumption, this change may affect comparability of total consumption and thus poverty between the IHSs. IHPS (2013) used the new conversion factors for all foods. In addition, a set of new price indices to adjust nominal consumption for cost of living differences was also estimated. These two changes resulted in making the consumption and poverty status of the panel households non-comparable to the poverty estimated in IHS2 and IHS3. For this reason, and because the IHPS did not cover a full year, we do not use this survey to establish poverty models. Neither can we compare the predicted poverty numbers for IHPS to the actual poverty calculated using the IHPS 2013 survey.
The determination of the poverty line in Malawi follows the Cost of Basic Needs method (Ravallion, 1998) and was calculated using the 2004 IHS2 survey. The poverty line calculation was updated using the 2010 IHS3 and 2013 IHPS to account for changes in prices. 8 The poverty level in Malawi between 2004 and 2010 showed very little change. This was rather surprising since other economic indicators showed improvements during this period. This led to a discussion by Beck et al. (2016) about the methodology used for setting the poverty threshold. Beck et al. (2016) recalculated the poverty threshold, and also revised the set of conversion factors to convert food consumption into kilograms. In contrast to the official poverty estimates which were relatively stable, their study estimates that poverty declined by more than eight percentage points during that period.
While the methodology used for calculating total consumption in the household is important for the evaluation of whether the survey-to-survey imputation approach works, it is important to note that the absolute level of the poverty line is exogenous and will not affect the estimation. To estimate poverty over time we apply the official poverty line (c in Equation 8).

The analytical approach
For the proposed poverty measure to be efficient and reliable, the explanatory variables must have a close relationship to the household's total consumption. At the same time, the explanatory variables should be practical to collect using a light survey instrument. For example, consumption variables require comprehensive questions to collect the needed information. But whether the household consumed specific items, or not, is the type of question that can be asked in a light survey instrument. The explanatory variables included in the Malawi models can be divided into the following groups: core demographic variables, characteristics of head of household, education, housing characteristics, asset ownership, food consumption (yes/no for specific food items), non-food consumption (yes/no for specific non-food items), and two indicators regarding head of household's welfare. 9 In addition, controls for districts and seasons are included. The variables were selected among a large set of relevant candidates in IHS2 by using a stepwise approach. Information on the predictors was collected in WMS surveys by replicating the same phrasing of the questions as in the IHS survey. To account for potential differences such as consumption habits and differences in relative prices, separate models were applied for urban areas and for each of the three rural regions (North, Central, and South).
7 '... reasons for this revision were that the previous factors were not considered to be accurate enough and that a significantly larger proportion of households in the IHS3 (compared with the IHS2), reported the consumption of maize flour in pails'.
8 See NSO of Malawi and the World Bank (2018) for details on how the poverty line in Malawi was calculated.
9 From the section named "subjective assessment of well-being" in the IHS questionnaires.
The imputation approach relies on a well-defined model with a set of stable parameters, including stability in the seasonal adjustment component, plus poverty predictors that are comparable also when collected in another survey or setting. But the relationship between households' total consumption and the explanatory variables may change over time, certain key explanatory variables may not be observed, or poverty predictors may not be comparably reproduced in another survey. Each of these aspects is investigated in the following analysis.
It is important to note that, due to the changes in conversion factors for maize described above, we are careful when drawing conclusions from comparisons of out-of-sample predictions and actual poverty levels based on the two IHS surveys. Hence, our validation approach is based on an out-of-sample prediction analysis within each of the IHS surveys, as well as analyses of trends over the entire period.

Results and discussion
The estimated regression models in Equation (7), based on the IHS2 and IHS3 data, have R² values ranging between 57% and 84%. See Tables A1-A8 in the Supporting Material for estimation results.

Does the seasonal adjustment work?
To test the seasonal adjustment, we compare results generated within the same survey. The first two columns in Table 2 show actual poverty rates in the full samples and in season three only, respectively. The third column shows the predicted poverty rates without applying the seasonal adjustment and the fourth column shows predicted poverty rates while applying the seasonal adjustment, using the full samples to estimate the models and assuming that consumption variables are affected by seasonality. In both cases, only households visited in the third season are used for prediction. Standard errors are in parentheses. Not applying seasonal adjustments implies that we predict for the third season only, while using the seasonal adjustments means that we are predicting annual poverty rates. The results in Table 2 show that applying seasonal adjustments makes a difference, sometimes a large one, and brings the predictions closer to the annual actual poverty rates than the unadjusted predictions. Thus, we can conclude that applying the seasonal correction provides a better prediction for the annual poverty rate.

Are some explanatory variables more important than others?
By systematically excluding groups of explanatory variables from the models and comparing the results, we assess the specific predictors' impact on the overall prediction. In this analysis, each of the IHS samples for the four geographic strata was randomly divided into two subsamples with the same number of households. Model parameters were then estimated from one subsample and were used to predict poverty rates for the other subsample of the same survey, and vice versa. In this way, we can compare predicted poverty rates to actual poverty rates in the exact same context. The results shown in Table 3 are based on the average of the two subsamples. Column 1 shows the difference between the actual poverty rates and the model including all variables, illustrating that the models work well within the same context. Columns (2)-(6) show the differences between the actual poverty rates and the predicted poverty rates when excluding one or more groups of explanatory variables. Excluding demographic variables (Column 6) has the largest impact on predicted poverty rates, especially in rural areas, by systematically predicting lower poverty rates compared with the actual level.
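The split-sample design described above can be sketched in Python as follows. For brevity, the sketch uses a single predictor, household-level (unweighted) rates, and a log-linear model with normal errors; these simplifications, as well as all function names, are assumptions made for illustration and do not reproduce the models estimated in the paper.

```python
import random
from math import log, sqrt
from statistics import NormalDist, mean


def fit(sample):
    """OLS of log consumption on one predictor: (intercept, slope, sigma)."""
    xs = [x for x, _ in sample]
    ys = [log(y) for _, y in sample]
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sqrt(rss / (len(sample) - 2))


def predicted_rate(sample, params, poverty_line):
    """Mean predicted probability of being poor, using predictors only."""
    a, b, s = params
    phi = NormalDist().cdf
    return mean(phi((log(poverty_line) - a - b * x) / s) for x, _ in sample)


def actual_rate(sample, poverty_line):
    """Share of households with observed consumption below the line."""
    return mean(1.0 if y < poverty_line else 0.0 for _, y in sample)


def split_sample_gap(data, poverty_line, seed=0):
    """Average (actual - predicted) over the two train/test directions."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    half = len(rows) // 2
    first, second = rows[:half], rows[half:]
    gaps = [actual_rate(test, poverty_line)
            - predicted_rate(test, fit(train), poverty_line)
            for train, test in ((first, second), (second, first))]
    return mean(gaps)
```

Dropping a group of predictors and rerunning `split_sample_gap` would give the kind of comparison across model variants that the columns of Table 3 report.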
No other variable group has such a systematic impact on the overall predicted poverty rate (see Columns 2-5). One explanation regarding the importance of household size is that there is little heterogeneity among households, especially in rural areas. If that is the case, the quality of housing, consumption patterns, assets, and education would not be enough to differentiate between poor and non-poor households. Rather, it could be the share of breadwinners versus dependents that is the most important variable for explaining the variation in poverty rates across households. It is worth noticing that in the model for Rural North, where excluding demographic variables has the largest impact, fewer explanatory variables were identified to significantly contribute to the model (compare 13 explanatory variables in the North model to between 17 and 20 variables in the other models).

Poverty trends
Figure 1(a-d) shows the predicted poverty rate trends for the four geographic areas, using models estimated from the IHS2 and IHS3 data. The figures for IHS2, using the model estimated by the IHS2 data, are not predictions, but actual poverty rates as calculated directly from the survey, and similarly for IHS3. As all seasons were covered in IHS2 and IHS3, it is not necessary to apply a seasonal adjustment of the predictions based on data from these surveys. Adjusting for seasonality produces less of a decline in the poverty rates compared with the official trends in poverty that were published using the WMSs up to 2009 (shown in the Supporting Material, Table A9). 10 This is not surprising, as the seasons covered in the WMS2005-WMS2009 data are expected to reflect the relatively good seasons of the year. The tables with the predictions and standard errors are found in Tables A10 and A11 in the Supporting Material.
10 Some variables originally included have been removed from the model: Two expenditure variables (expenditure for sugar and cooking oil) were taken out as the CPI used was questioned, cell phone was taken out as it is not considered a stable poverty predictor and whether household paid for public transport was removed as the instructions on how to ask the question had changed in the subsequent surveys. These changes had little impact on the predictions.
In Rural North, poverty predictions based on the WMS survey data show a poverty trend that is gradually declining with the lowest levels attained during 2009 and 2014. The poverty predictions for 2010 (IHS3) and 2013 (IHPS) are markedly higher than the 2009 and 2014 levels. This same basic pattern is seen for Rural Central and to a slightly lesser degree for Rural South. For urban areas, the poverty rate predictions based on the WMS surveys suggest a sharp decline in poverty from 2006 to 2007 which then remains fairly stable for the WMSs in 2008, 2009, and 2014. Again, the poverty rate estimates based on IHS3 and IHPS are higher than the corresponding predictions based on the WMS surveys.
In summary, comparing poverty rates using only the two IHSs and the IHPS shows stagnant poverty levels for all areas, whereas a decreasing trend in poverty rates over the period is predicted from the WMS surveys. Further, for all regions, the two models estimated from the IHS2 and IHS3 data predict the same changes and trends in poverty rates, with mostly small differences in the predicted levels. This is illustrated by the t-values of the difference in the prediction between the two models (see Table A12 in the Supporting Material), where only 3 out of the 36 cases differ at a significant level.
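The comparison of the two models' predictions can be illustrated with a minimal sketch of a t-value for the difference between two predicted poverty rates. The formula below assumes, purely for illustration, that the two predictions are independent; the source does not state how the t-values in Table A12 were computed, and the function names and numbers are hypothetical.

```python
import math


def t_value_of_difference(p_a, se_a, p_b, se_b):
    """t-value for the difference between two predicted poverty rates.

    Assumption (not from the paper): the two predictions are treated as
    independent, so the variance of the difference is the sum of variances.
    """
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (p_a - p_b) / se_diff


def differs_significantly(p_a, se_a, p_b, se_b, critical=1.96):
    """True if |t| exceeds the critical value (1.96 is roughly 5%, two-sided)."""
    return abs(t_value_of_difference(p_a, se_a, p_b, se_b)) > critical


# Hypothetical predictions and standard errors, not taken from Table A12.
t = t_value_of_difference(0.52, 0.03, 0.48, 0.04)
print(round(t, 2))
```

Because both models are applied to the same survey data, the independence assumption is likely too strong in practice; a covariance term would be needed for an exact test.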

Trends in poverty rate predictors
Trends in all types of poverty rate predictors used in one or more of the models are explored to determine whether specific poverty predictors signal inconsistent and unlikely changes over time. Based on the results of these poverty predictors trend analyses, models are estimated excluding the most inconsistent variables. The resulting trends from the reduced models are discussed against the trends that include the full model.

Demographic variables.
Explanatory variables based on the composition of the household, such as the total number of members and dependency ratio, are variables that we do not expect to fluctuate. Table 4 shows the 99% confidence interval for household size. The intervals overlap for the first three consecutive surveys; however, from 2007 onwards household size differs significantly between adjacent surveys, and there is no trend in any direction over time. This variation is not systematically related to the type of survey. The IHS2 and IHS3 numbers are, however, not significantly different from each other, and the estimated average household sizes for these 2 years are lower than in most of the other surveys. The household size estimate for the IHPS is, however, markedly higher than in the two IHSs. Calculation of demographic variables requires a clear definition of who is to be counted as a household member. 11
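The interval comparison described above can be sketched as follows. This is a minimal illustration assuming a normal approximation and already design-adjusted standard errors; the means and standard errors are hypothetical and are not the values in Table 4.

```python
def confidence_interval(mean, se, z=2.576):
    """Normal-approximation interval; z = 2.576 gives roughly 99% coverage."""
    return (mean - z * se, mean + z * se)


def intervals_overlap(a, b):
    """True if two (lower, upper) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical average household sizes for two adjacent surveys.
ci_first = confidence_interval(4.5, 0.05)
ci_second = confidence_interval(4.9, 0.05)
print(intervals_overlap(ci_first, ci_second))
```

Non-overlapping intervals are a conservative indication of a significant difference; overlapping intervals do not by themselves rule one out.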

Asset variables.
Of the assets included in the predictions, only those likely to be owned by wealthier households (TV and refrigerator) are steadily increasing. Ownership of less expensive assets is, in general, lower or about the same when comparing the beginning and the end of the period (Online Appendix Figure A1). There seems to be a decrease in the ownership of radios, but not a smooth one: 2007 has the highest value and 2014 the lowest. This decrease may be associated with a steady increase in cell phone 12 and TV ownership over the period, as these provide other means for receiving information and music. Ownership of bicycles varies greatly, between 38% and 59% in the period. The overall trend in ownership of beds is stable, while clothes iron ownership has been decreasing since 2007.
11 There are some differences with respect to the instructions given for how to classify people who were away from the household part of the time. Examining questionnaires, enumerator manuals, and syntaxes does not provide a clear conclusion as to how this was handled in the various surveys. It seems, however, that the household head was usually counted as a member irrespective of whether he/she was away for more than 9 months. The extent to which the other members are classified as members even if they were away from the household most of the time is not clear, and the different supervision training and teams may have interpreted differently how to account for members partly away from the household.

Housing standard variables.
There is a slight increase in the percentage of the population using electricity for lighting over the period (Online Appendix Figure A2). The quality of floors and roofs seems to improve steadily, since the percentages with poor quality of these housing conditions decrease over the period. There is only a small decrease in the percentage whose main source of cooking fuel is firewood. The data from the WMS2009 seem to be at odds with the other survey results with respect to quality of floor, roof, and electricity, which is not credible. Persons per room in households also varies greatly and not systematically with time or type of survey. There were some differences in the instructions regarding how this question was asked in the different surveys, which may affect these outcomes.

Education variable.
Educational qualifications in seven categories 13 for all household members above 5 years are reported in all surveys. The average maximum household qualification is shown in Online Appendix Figure A3; zero denotes that no household member has achieved an education certificate, and six denotes that at least one person in the household holds a post-graduate degree. There is a trend towards higher education over the period.
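The maximum household qualification can be sketched as below, using the seven category labels given in the note to this paragraph. The function name, the data layout (a list of age and qualification pairs), and the treatment of households with no eligible member as zero are assumptions made for illustration.

```python
# Codes 0-6 follow the seven categories listed in the note on qualifications.
QUALIFICATION_CODES = {
    "None": 0,
    "Primary School Leaving Certificate": 1,
    "Junior Certificate Examination": 2,
    "Malawi School Certificate Examination": 3,
    "Non-University Diploma": 4,
    "University Diploma Degree": 5,
    "Post-graduate Degree": 6,
}


def max_household_qualification(members):
    """Highest qualification code among members above 5 years of age.

    `members` is a list of (age, qualification_label) pairs (hypothetical
    layout). Returns 0 if no member is above 5 years (an assumption).
    """
    codes = [QUALIFICATION_CODES[label] for age, label in members if age > 5]
    return max(codes, default=0)


household = [(34, "Junior Certificate Examination"), (30, "None"), (3, "None")]
print(max_household_qualification(household))
```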

Consumption variables.
There are large differences in the percentage of households purchasing toothpaste in the past month (Online Appendix Figure A4). There seems to be a systematic difference between the two survey types, with a much higher percentage reporting purchase of toothpaste in the WMS surveys. Part of this can be explained by seasonality, but far from all. The purchase of toothpaste in the IHS surveys varies by 10 percentage points across the four seasons.
12 The rate of ownership of cell phones increased from less than 5% in 2004 to almost 55% in 2014.
13 (0) None, (1) Primary School Leaving Certificate, (2) Junior Certificate Examination, (3) Malawi School Certificate Examination, (4) Non-University Diploma, (5) University Diploma Degree, and (6) Post-graduate Degree.
The percentage of households that consumed various food items in the last 7 days before the interview is likely affected by seasonality, and only season 3, covered in all but one survey, is shown (Online Appendix Figure A5). WMS2014 did not include this season and does not appear in the figure. In general, there seems to be a tendency towards increased food consumption. However, there are ups and downs, particularly in WMS2007 and WMS2009, which report relatively high consumption rates.
There are also large variations in the purchase of men's clothing and shoes (for both genders) in the last three months (Online Appendix Figure A6). The WMS2009 level is much higher than the others and WMS2007 is also high. There is no overall trend throughout the period and the two poverty predictors closely track each other.
There are some obvious differences between the IHSs and the WMSs, especially with respect to how the consumption of food and non-food variables was captured. For example, in the IHS2, IHS3, and IHPS, households were asked in detail about their consumption of more than a hundred food items. First, they were asked whether they had consumed the specific item. If yes, the question was followed up with questions on amount, price, and source (purchase/gift/own production). In the WMS surveys, on the other hand, the households were only asked whether they consumed 11 specified food items. The question on whether they consumed the food item or not is phrased in the same way as in the IHS surveys; however, there are no follow-up questions regarding quantities or prices. The situation is similar with non-food items. 14

Household head's welfare variables.
There is no clear trend over time regarding whether the household head sleeps with a blanket and sheet in the cold season (Online Appendix Figure A7). This prevalence is, however, much higher in 2007 compared with the other years. Similarly, there is no systematic pattern, with respect to type of survey, in the number of clothes the household head owns. This variable varies greatly, and the values are particularly high in 2007 and 2014. These questions are not as well established as the other predictors included in the models. Thus, one hypothesis is that the enumerators may not have been sufficiently trained to ensure a consistent approach for asking these questions.
In summary, exploration of the trends in poverty predictors reveals some systematic patterns across the surveys. There are specifically three sets of variables which show systematic patterns and, therefore, merit exclusion in further model building. Changing instructions on how to count the number of rooms are likely to have affected the variable 'persons per room'; the two welfare variables regarding the household head vary a lot across the WMSs; and the binary non-food consumption variables tend to be systematically lower in the IHS surveys than in the WMS surveys. Kilic and Sohnesen (2019) also found that food and non-food consumption in particular, as well as the same welfare variables, were affected by the questionnaire context. Based on the current analyses, as well as the earlier results by Kilic and Sohnesen, the conclusion to exclude these sets of variables from a predictive model is strengthened.
Other patterns observed from the predictor trend analysis are related to specific surveys. Two surveys, WMS2007 and WMS2009, stand out as 'good years', i.e. with predictor values signalling improvements compared with the adjacent surveys. Some of these high values are particularly unlikely; specifically, the improvement in housing material and electricity access which is depicted only in the WMS2009 and not afterwards. According to UN (2008), non-sampling errors may increase with sample size. From 2007 and onwards, the WMS samples were large. WMS2007 is the largest survey with almost 30,000 households and WMS2009 is the second largest at almost 21,000 households. WMS2007 was conducted as a part of an agricultural Census (the NACAL). The NACAL included several visits and the WMS was the sixth out of 10 modules, which may have caused respondent fatigue. From 2008 to 2014, the sample sizes remained large (see Table 1).
14 Note: clothes and shoes are made up of an aggregation of questions. For example, the households were asked whether they bought any boys' shoes, any girls' shoes, any men's shoes, or any women's shoes. The poverty predictor will be one if they bought any of these shoes, and zero if not.
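The aggregation of several yes/no questions into a single binary poverty predictor, as described in the note on clothes and shoes, can be sketched as follows. The item keys and function name are hypothetical, and treating a missing answer as "not bought" is an assumption.

```python
# Hypothetical questionnaire keys for the four shoe questions.
SHOE_ITEMS = ["boys_shoes", "girls_shoes", "mens_shoes", "womens_shoes"]


def bought_any(responses, items):
    """Return 1 if the household bought at least one of `items`, else 0.

    `responses` maps item keys to True/False; missing keys count as
    False (an assumption, not a rule stated in the paper).
    """
    return int(any(responses.get(item, False) for item in items))


household_a = {"boys_shoes": False, "womens_shoes": True}
household_b = {"boys_shoes": False, "girls_shoes": False}
print(bought_any(household_a, SHOE_ITEMS), bought_any(household_b, SHOE_ITEMS))
```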
In addition, there have been challenges in consistently capturing demographic variables. As shown, the demographic variables have a key role, and excluding them systematically lowers the predicted poverty level in rural areas. In the next analysis, we first exclude demographic variables from the trend analysis. In the second trend analysis, we exclude rooms, the two welfare variables related to the head of household, and all consumption variables. This leaves the demographic variables, assets, housing, education, and geographic controls. We refer to the models using only these explanatory variables as the 'reduced models'.
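The two exclusion exercises can be sketched as a simple selection over groups of predictors. The variable names below are hypothetical placeholders, not the actual model variables; only the grouping and the exclusion rules follow the text.

```python
# Hypothetical predictor names, grouped as in the text.
PREDICTOR_GROUPS = {
    "demographic": ["household_size", "dependency_ratio"],
    "assets": ["owns_tv", "owns_bicycle"],
    "housing": ["electric_lighting", "poor_floor", "persons_per_room"],
    "education": ["max_qualification"],
    "consumption": ["bought_toothpaste", "bought_shoes"],
    "head_welfare": ["head_has_blanket", "head_number_of_clothes"],
    "geographic": ["district"],
}


def select_predictors(groups, drop_groups=(), drop_variables=()):
    """Keep every variable not in a dropped group or dropped individually."""
    kept = []
    for group, variables in groups.items():
        if group in drop_groups:
            continue
        kept.extend(v for v in variables if v not in drop_variables)
    return kept


# First exercise: drop the demographic variables only.
no_demographics = select_predictors(PREDICTOR_GROUPS, drop_groups={"demographic"})

# Second exercise ('reduced models'): drop rooms, head welfare, consumption.
reduced = select_predictors(
    PREDICTOR_GROUPS,
    drop_groups={"consumption", "head_welfare"},
    drop_variables={"persons_per_room"},
)
print(reduced)
```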

Poverty rate trends with reduced set of poverty predictors
When removing the demographic variables from the models, an even larger reduction in poverty is depicted for the WMS and IHPS surveys than when removing these variables from the analysis predicting poverty for an IHS survey (see Table 5). A reason for the stronger impact could be that these surveys generally have a larger household size, which is associated with higher poverty rates. Figure 2(a-d) presents the poverty rate trends using the reduced models. 15 With the reduced variable model, poverty declines in Rural North up to 2009, but not as much as when all variables are included in the model, as seen in Fig. 1(a). The reduced variable models predict higher poverty rates using the WMSs from 2007 and onwards, compared with the full model. Poverty rates predicted using IHS2, IHS3, and IHPS are nearly unchanged.
There is less change in poverty rates in Rural Central over the survey period with the reduced variable model. Poverty rates predicted using IHS2 and IHS3 reduced models are higher for WMS2007-WMS2014 compared with the respective full models.
For Rural South, the reduced variable models reveal fewer changes in the poverty rates over the same period. Compared with the full models, the poverty rates predicted by the reduced variable models are higher using the WMS surveys from 2007 and onwards.
For urban areas, the reduced variable model predicts less of a change in poverty compared with the full model, by producing a higher poverty rate estimate using most of the WMS surveys. The predictions onto IHS2 and IHS3 are the same using the reduced and the full models.
Online Appendix Table A13 shows the t-values for the differences between the poverty rates predicted by the full and the reduced IHS2 model and the full and the reduced IHS3 model, for each survey year and for each geographical area. There are no statistically significant differences for IHS2, WMS2005, WMS2006, WMS2008, IHS3, and IHPS. The confidence intervals are, however, in many cases large. Several of the predictions for the rural regions in WMS2007, WMS2009, and WMS2014 differ significantly depending on which model, reduced or full, is used. In summary, we find that omitting the variables from the model hardly affects the poverty rates predicted for the IHSs and IHPS, whereas poverty levels increase when using the reduced models for most WMSs. It is, however, interesting to note that the predictions for WMS2005 and WMS2006 are less affected by omitting the variables from the model, in contrast to the other WMS surveys.

Discussion
Malawi is, to the authors' knowledge, a unique case where the statistical office integrated survey-to-survey imputation of poverty rate trends into its statistical reporting. This is a valuable case study also for the larger international community, as the challenges in establishing poverty trends are likely not unique to Malawi.
There are many non-sampling errors that can affect comparability of poverty rates over time. In addition to whether the poverty rate is calculated directly from a survey or predicted by a model, differences in methodology, including survey instruments, fieldwork, and data analysis, can all affect the resultant poverty figures. In Malawi, as in other countries, changes in survey implementation take place and even surveys of the same type are seldom strictly comparable. It is not possible to quantify the effect that each of these different factors has on the final poverty figures as there is no blueprint of the 'true' trend, but this analysis of Malawi provides us with several insights.
The results suggest that the survey-to-survey imputation approach works when conducted within the same survey. This is, however, not sufficient to assess whether the model parameters and the seasonality adjustment are stable over time. Stability is a critical assumption because, as time passes, one may expect the relationship between the predictors and consumption per capita to change. Other studies referred to in the introduction, Christiaensen et al. (2012), Mathiassen (2013), Newhouse et al. (2014), and Dang and Lanjouw (2018), support the view that the survey-to-survey imputation approach works when predicting over time onto an 'identical' survey type. In Malawi, methodological changes occurred between IHS2 and IHS3 that may have affected the comparability of the consumption aggregate, and thus the validation does not rely on the comparison between these two surveys. Another indication of parameter stability is provided by comparing the trends predicted by the two IHS models. The trends are the same within each of the four geographic areas over the 10-year period, whether they are based on a 2004 or a 2010 model. It is important to keep in mind that the stability assumption is likely to depend on the context. It is more likely to hold in cases where the change in the poverty rate is small (as in Malawi) than where poverty rates are changing substantially, and it is more likely to be violated as more time passes between the surveys. However, an appropriate test of the stability assumption would need to compare each of the estimated model parameters from the IHS2 and IHS3 models using comparable welfare aggregates, which is not feasible for this study.
The new addition to the methodology, to correct for the fact that the WMSs and the IHPS only cover a shorter period and not a whole year as in the IHS surveys, also works well when tested in the same context. When applied to the WMS surveys, there is still a clear downward trend in poverty over time, although less pronounced than the trend not corrected for seasonality that was published by the NSO. There is, however, a clear break in the trend with higher predicted poverty rates for IHS3 and IHPS compared with the trend predicted by the WMS surveys.
Part of the reason for this break in trend may be attributed to differences in the way the IHS and IHPS versus the WMS questionnaires were designed. Some explanatory variables, namely the dichotomous consumption and welfare predictors, seem to be influenced by the survey tool. There is a relatively large impact on the model-based predictions for the WMS surveys from 2007 and onwards when excluding these variables from the models. At the same time, there is hardly any impact on the model-based predictions for the IHS and IHPS surveys when excluding the same variables. This gives support to the hypothesis that the respondents report 'overly' positively on such variables in the light survey tool compared with the IHS and IHPS, everything else equal, confirming the finding in Kilic and Sohnesen (2019). It is interesting to note, however, that this hypothesis does not seem to hold for the first two WMSs.
Excluding the variables that are most likely to be systematically affected by the survey tool results in a much flatter development in poverty. In Rural Central and Rural South, the reduced models confirm the unchanged poverty level as given by the IHS and IHPS surveys. It may, however, be argued that a reduced model without some variables will be less able to pick up fluctuations in poverty.
An analysis of the predictors also reveals confounding trends that are not likely attributable to questionnaire design. In WMS2009, there is a relatively high proportion of households that report using electricity as the main source of light. The figure is out of the trend and significantly different from the other surveys. Access to electricity in the house is not a variable that is expected to fluctuate from year to year. In addition, two other poverty predictors (quality of roof and quality of floor in the dwelling) are also not expected to change quickly. These survey results indicate household improvements in 2009 that are out of the trend observed in the other years. The systematically high levels of several poverty predictors in a single survey, which are not expected to fluctuate, suggest that implementation of these variables in the surveys differed. An even more worrying problem is revealed by the inconsistent measure of household size.
The number of household members does not vary significantly between IHS2, WMS2005, WMS2006, and IHS3, but varies between all other adjacent surveys. The roster module is comprehensive and an explanation for this variation could be that the training of the enumerators was not done consistently in the various surveys. Different NSO surveys were sponsored by different funding organizations, including the government of Malawi, the World Bank, and Statistics Norway. The composition and staffing of the field work supervision teams changed from one type of survey to the next. Hence, the application of the rules for when to include a household member may have varied.
Another reason for the inconsistency may be related to differences in the sampling frame approach. WMS2007 was a part of a national agriculture census (a large survey). Although based on the 1998 Population Census sampling frame, as were the previous surveys, the selection of farming, non-farming, and urban households was done in separate steps, which was different from previous surveys. This is also the point in time when the household size estimates started to fluctuate. There are, however, unsystematic variations in household size estimates when comparing WMS2007, WMS2008, and WMS2009, although these surveys rely on the same sampling (WMS2008 and WMS2009 were subsamples of WMS2007). Differences in the sampling framework should be adjusted for by the sampling weights, but the household size variables suggest that the weighting scheme did not work.
Household composition, such as number of household members in different age groups, and household size are critical variables to include in the model. Without such variables, the prediction for the same setting is systematically lower for the rural areas. Excluding demographic variables in the estimation of the poverty trend analysis leads to substantially lower estimated poverty rates in all geographic areas and for all years.
In addition to the challenges that different questionnaire designs pose, the analysis suggests that there are other differences affecting comparability of data within the WMS survey program. Whereas the trends in predictors and poverty for WMS2005 and WMS2006 are consistent with the IHS surveys, the results for the remaining WMSs are less credible. Three main changes took place. The external support for the practical implementation of the surveys was gradually withdrawn while at the same time the sample size grew much larger over time, and therefore also the field operation. This suggests that there is indeed a tradeoff between reduced sampling errors and increased non-sampling errors related to sample size. In addition, the WMS2007 was based on an agricultural sampling frame with additional non-farming and urban households. The same frame was used for 2008 and 2009. These changes took place in parallel with the observed unexpected variations in the data provided from the surveys.

Conclusion
This paper set out to investigate whether survey-to-survey imputation can be used to provide estimates of changes in poverty rates over time, using surveys of different types and with different seasonality coverage. The seasonal correction developed seems to work well. An important takeaway, however, is that there are challenges in collecting comparable underlying predictors using other survey instruments. More attention should be given to designing a consistent system for collecting and analysing household survey data. It is fundamental to focus on quality in all phases of a survey, and to stick to the same methodologies and training for the different surveys. This is a real challenge when the NSO is dependent on financial support from stakeholders that want to have their own influence on questionnaires and implementation. Capacity development within the NSO should be central in the support given to all surveys to ensure that the knowledge and ownership of the survey approach and method are anchored within the national statistical institution, which is the one consistent stakeholder involved in all surveys.