Validation of Maternal Report of Receipt of Iron–Folic Acid Supplementation during Antenatal Care in Rural Southern Nepal

ABSTRACT Background: Coverage of iron–folic acid (IFA) supplementation is a key indicator for tracking programmatic progress within and across countries. However, the validity of maternal report of this information during household surveys has yet to be determined. Objectives: This study aimed to examine the validity of maternal recall of receipt of IFA supplementation during antenatal care (ANC) and factors associated with accuracy of maternal recall. Methods: A longitudinal cohort design was employed. The direct observation of the IFA received during all ANC visits at the 5 study health posts served as the “gold standard” against which the maternal report of IFA received, collected during the postpartum interview, was compared. Individual-level validity was assessed by calculating indicator sensitivity, specificity, and AUC. The inflation factor (IF) measured population-level bias. A multivariable log-binomial model was used to assess factors associated with accurate recall. Results: The majority (95.8%) of women were observed receiving IFA during pregnancy. Women overreported the number of IFA tablets received compared with what was observed during ANC visits (mean difference: 45 tablets). Individual-level validity of maternal report of any IFA receipt was moderate (AUC = 0.60; 95% CI: 0.50, 0.71), and population bias was low (IF = 1.01). However, the individual-level validity was poor across the 7 IFA tablet count categories; the AUC for categories ranged from misleading to moderate. Driven by the trend of maternal overreport, the IF indicated that maternal report drastically underestimated the coverage of lower tablet categories and overestimated the coverage of higher tablet counts. Accuracy of maternal report was not associated with months since last ANC observation nor any maternal characteristics. Conclusions: Maternal report of the amount of IFA supplementation received during pregnancy produced extremely biased population coverage and performed poorly to moderately for individual-level validity. It is imperative to improve this indicator because it is used in global frameworks and national program planning.


Introduction
An estimated 38% of pregnant women globally are anemic; half of these cases are attributed to iron deficiency (1). WHO defines anemia during pregnancy as a hemoglobin (Hb) concentration <110 g/L, although in the second trimester this cutoff decreases to a Hb concentration of 105 g/L (2). WHO recommends daily supplementation of 30-60 mg of elemental iron and 400 μg of folic acid during pregnancy to prevent maternal anemia, puerperal sepsis, low birth weight (LBW), and preterm birth (3). Other individual studies have demonstrated an association between iron-folic acid (IFA) supplementation and reductions in postpartum hemorrhage; maternal, perinatal, neonatal, and <5-y mortality; childhood anemia; and improved cognitive development (4)(5)(6)(7)(8)(9)(10)(11).
Coverage of IFA supplementation during pregnancy is commonly collected through large population surveys such as the Demographic and Health Survey (DHS). Coverage is defined as the proportion of a population in need of an intervention that receives the intervention (12). The DHS collects data on antenatal care (ANC), including IFA supplementation, through maternal report of services received during a woman's most recent pregnancy. DHS version 7 (DHS7) asks about the most recent live birth in the past 5 y. DHS8, published in June 2020, asks about iron supplementation for the most recent live birth or stillbirth in the past 3 y (13). The DHS asks, "During this pregnancy, were you given or did you buy any iron tablets?" and if the woman reports yes, it then asks, "During the whole pregnancy, for how many days did you take the tablets?" (14). Using the DHS7 parameters, IFA coverage is defined as the proportion of women who had a live birth in the past 5 y who consumed any IFA tablets during that pregnancy. The policies for the number of days of IFA supplementation during pregnancy vary across countries; examples include >90, >100, and >180 d of IFA consumption during pregnancy (15). The most recent policy in Nepal, National Anemia Strategy 2002, stipulates 180 tablets antenatally and 45 tablets postpartum for a total regimen of 225 tablets. The 2016 coverage estimates in Nepal show that although 90.2% of pregnant women reported receiving or buying any IFA during their last pregnancy, only 42% of women reported consuming ≥180 tablets during the pregnancy. Forty-six percent of pregnant women in Nepal are anemic, indicating that the need for IFA supplementation is great in this population (14).
There is a growing body of evidence examining maternal report of services received during antenatal, labor, and postpartum care and care-seeking for childhood illness that reports a range of indicator validity (16)(17)(18)(19)(20)(21). There are limited data on the validity of maternal report of nutrition coverage indicators, including IFA supplementation. The IFA coverage indicator is a core process indicator of the Global Nutrition Monitoring Framework, an indicator for the Countdown to 2030, and is used by countries to inform programming and policies (15,22). Therefore, it is essential that its measurement is valid.
This study's primary objective was to examine the validity of maternal report of IFA supplementation receipt during ANC and factors associated with accuracy of maternal report. A secondary objective was to examine maternal characteristics associated with receiving or buying IFA from sources other than the government health post.

Study site
This study was conducted from December 2018 to November 2020 in part of the Nepal Nutrition Intervention Project-Sarlahi (NNIPS) study area. In conjunction with our local study team with >30 y of experience conducting research in Sarlahi, 2 municipalities were chosen because of their demographic composition and to limit the bureaucratic permissions required, as Nepal's health system is decentralized to the municipality level. The Sarlahi district is located in the southern Terai region, where it borders the state of Bihar, India. Subsistence farming is the primary economic activity in the district, although income from migratory labor to the Gulf states has become increasingly common. Approximately 60% of the female population in Sarlahi cannot read or write, and 69% are married between the ages of 15 and 19 y (23).

Study population, design, and data collection
Pregnant women were recruited, consented, and enrolled at 5 public health posts in the district. The health posts were chosen based on ANC case load and geographic location, in terms of accessibility for both the clients and our study staff who had to travel to the posts daily. Women who were attending their first ANC visit who were married, aged ≥15 y, and lived in the study area were eligible. Women are assumed to be married if pregnant and attending ANC, and it would have been culturally inappropriate to ask to confirm marital status, so all women enrolled are assumed to be married. Women who had already attended an ANC visit (including an ultrasound appointment), were aged <15 y, were planning to attend a nonstudy health post or to leave the NNIPS study area while pregnant, or were through 6 mo postpartum were ineligible. Participants were consented during the enrollment visit and again at the postpartum interview.
As noted previously, the DHS asks about IFA consumption during pregnancy. However, establishing a gold standard for consumption would be extremely difficult because this would require observing pregnant women every day. Therefore, the primary objective of the study was to validate maternal report of IFA received during ANC. To accomplish this, we employed a longitudinal cohort design in which the direct observation of providers giving IFA to pregnant women during ANC at the health posts served as the "gold standard" (24). The study observers were trained prior to study implementation. The observer training included videos of mock as well as real ANC visits, in addition to field training with a gold standard observer. Each observer was trained until they reached a certain level of inter- and intraobserver agreement with the videos and the gold standard observer in the field. The direct observation by the study observer was conducted using a checklist of 28 items, including 1 IFA-related item, "How many tablets was the woman given?" During the direct observation of the ANC visit, which in the 5 health posts occurred in a single room, the trained study observer would record 000 for zero tablets or a number between 001 and 180.
During the enrollment visit, a demographic questionnaire was administered. At each subsequent ANC visit, the woman was given a brief follow-up questionnaire concerning care-seeking between direct observations. These follow-up questionnaires were used to determine if the woman bought or received IFA between visits (e.g., at a local pharmacy) that we did not observe; for the "gold standard," it was vital for us to observe all the services received during pregnancy. The direct observation was then compared with maternal report, which was collected ∼6 mo postpartum at the woman's home or maiti (parental home). The 6-mo postpartum questionnaire included questions asked in the same language as the 2016 Nepal DHS and about services received at the study health posts specifically. The questionnaire also collected information on socioeconomic status (SES) and pregnancy outcome.

Analysis
A target sample size of 300 women was established for the overall study based on a conservative 50% coverage for IFA receipt and the assumption that coverage of counseling topics would be lower (14). This allowed for a 95% CI with a width of 0.13 for an AUC equal to 0.50, which is equivalent to a random guess. To account for loss to follow-up, women who went elsewhere for ANC where they could not be directly observed, and women who did not have a live birth, the study aimed to enroll 450 women.
In this analysis, we aimed to validate maternal report of 1) any IFA receipt and 2) the number of tablets reported received (Box 1). If the woman was observed receiving any IFA tablets (tablet count of ≥1) during ANC observation, this was considered receipt of "any" IFA. We collected the exact number of tablets received and then categorized it into 7 groups: 0, 1 to <30, 30 to <60, 60 to <90, 90 to <120, 120 to <180, and ≥180 tablets.
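The two outcome definitions above can be illustrated with a short Python sketch. This is not the study's code (the analyses were run in Stata); the function and constant names are hypothetical, and the sketch assumes a woman's total tablet count is a non-negative integer summed over her ANC visits.

```python
import bisect

# Lower bounds of categories 2 through 7; category 1 is exactly zero tablets.
CATEGORY_LOWER_BOUNDS = [1, 30, 60, 90, 120, 180]
CATEGORY_LABELS = [
    "0", "1 to <30", "30 to <60", "60 to <90",
    "90 to <120", "120 to <180", ">=180",
]


def received_any_ifa(tablet_count: int) -> bool:
    """'Any' IFA receipt means a total tablet count of at least 1."""
    return tablet_count >= 1


def tablet_category(tablet_count: int) -> str:
    """Map a total tablet count to one of the 7 categories."""
    if tablet_count < 0:
        raise ValueError("tablet count cannot be negative")
    # bisect_right counts how many lower bounds are <= tablet_count,
    # which is exactly the index of the matching category label.
    index = bisect.bisect_right(CATEGORY_LOWER_BOUNDS, tablet_count)
    return CATEGORY_LABELS[index]
```

Because each category is closed on the left and open on the right, a count of exactly 30 falls in "30 to <60" rather than "1 to <30".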

Box 1
Two measures of IFA receipt
Direct observation/gold standard ("gold standard"): The number of IFA tablets received during pregnancy, established by direct observation of each ANC visit at the study health post.
Direct observation/gold standard, complete follow-up: A subset of participants who never reported receiving or buying IFA elsewhere in between direct observations. This was determined using information collected via the follow-up questionnaire at the start of the second and all subsequent direct observations of ANC visits.
Reported received at study health posts ("reported received"): The number of IFA tablets the woman received at the study health posts during her entire pregnancy, as reported by the woman at the postpartum interview. This is the question for the true validation analysis.
The "gold standard" was compared to "reported received" for validation analysis. Using the follow-up questionnaire data, we were able to identify a subcohort of women who never reported receiving or buying IFA between direct observations. This subset represents a more ideal gold standard, where we are more confident that we observed all IFA received during a woman's pregnancy (Box 1). The same validation outcomes were measured in this smaller group of participants as a sensitivity analysis.
The number of tablets observed received compared with reported received was examined by scatterplot. "Don't know" responses in the postpartum follow-up interview were recorded. We constructed 2 × 2 tables to calculate the sensitivity (Se) and specificity (Sp). IFA categories based on a small number of true positive or negative observations that produce estimates with a high degree of uncertainty (95% CIs >15 percentage points) are presented but flagged for readers to interpret with caution. The AUC and the inflation factor (IF) were then calculated to assess validity. Sensitivity, specificity, and AUC measure individual-level validity, and the IF measures population-level validity. The AUC represents the area under a plot of the indicator's sensitivity against (1 − specificity) and is defined as "the probability that the test will correctly classify one positive case and one negative case" (24). Although this measure is commonly used for cutoffs for diagnostic tests, the AUC in this case represents a summary measure of the individual-level validity. An AUC = 0.5 would be comparable to a random guess, and an AUC = 1 would indicate perfect validity. The IF is the ratio of the study coverage (Pr), given the indicator's Se and Sp, to the true coverage (P), based on the gold standard. The study coverage is calculated by the following equation: Pr = P × (Se + Sp − 1) + (1 − Sp) (25). The IF quantifies the degree to which the survey indicator over- or underestimates the true population coverage. An IF between 0.75 and 1.25 indicates low population bias, with 1.00 indicating that the estimate of coverage from the survey is equal to the true coverage (24).
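As a minimal sketch of how these measures follow from a single 2 × 2 table, the Python below computes Se, Sp, the study coverage Pr, and the IF using the equation given above. It is illustrative only and not the study's Stata code: the class and function names are hypothetical, the counts in the usage note are invented, and the AUC line assumes the common single-cutoff simplification in which the area under a one-point ROC curve equals (Se + Sp) / 2.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cell counts comparing maternal report with direct observation."""
    tp: int  # observed receiving, reported receiving
    fn: int  # observed receiving, reported not receiving
    fp: int  # observed not receiving, reported receiving
    tn: int  # observed not receiving, reported not receiving


def validity_measures(table: TwoByTwo) -> dict:
    """Return Se, Sp, AUC, true coverage P, study coverage Pr, and IF."""
    observed_yes = table.tp + table.fn
    observed_no = table.fp + table.tn
    if observed_yes == 0 or observed_no == 0:
        raise ValueError("need at least one observed positive and one negative")
    n = observed_yes + observed_no
    se = table.tp / observed_yes
    sp = table.tn / observed_no
    p = observed_yes / n                    # true coverage (gold standard)
    pr = p * (se + sp - 1) + (1 - sp)       # study coverage from the text's equation
    return {
        "se": se,
        "sp": sp,
        "auc": (se + sp) / 2,               # assumption: single-cutoff ROC area
        "p": p,
        "pr": pr,
        "inflation_factor": pr / p,
    }
```

With invented counts `TwoByTwo(tp=80, fn=20, fp=30, tn=70)`, the sketch gives Se = 0.80, Sp = 0.70, P = 0.50, Pr = 0.55, and IF = 1.10, which would fall inside the 0.75 to 1.25 band for low population bias.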
Factors associated with the accuracy of maternal report were examined through log-binomial bivariate and multivariable regressions (or a Poisson regression if the log-binomial did not converge). The response variable "accuracy" is a dichotomous variable, indicating "accurate" and "inaccurate" responses. A response is accurate if the woman's reported number of IFA tablets received at the health posts falls within the same category (as outlined previously) as the count recorded during direct observation. An inaccurate response is a maternal report of a count outside of the observed count category. Maternal age (age <20 y compared with age ≥20 y), education (none compared with any), parity (nulliparous compared with multiparous), and household wealth were included in the model. The household wealth variable was constructed by summing 11 binary household characteristics (e.g., fuel and drinking water sources) and ownership variables (e.g., number of cattle or motorcycles owned), dividing each woman's total by the number of nonmissing variables, and separating this proportion into quartiles. Time from the last ANC observation to maternal report was dichotomized to more or less than 12 mo after examining the locally weighted scatterplot smoother (LOWESS) curve. The observed number of tablets the woman received during ANC was also included in the model to examine whether women report smaller or larger numbers of tablets more accurately. The number of observed tablets was classified as 0-60, 60-120, and >120 after the review of the LOWESS curve. Log-binomial bivariate and multivariable regressions were used to examine associations between maternal characteristics and receipt or purchase of IFA elsewhere during ANC. A P value <0.05 was considered significant.
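The accuracy outcome and the household wealth variable described above could be constructed roughly as in the following Python sketch. It is an illustration under stated assumptions rather than the study's Stata code: all names are hypothetical, missing wealth items are represented as `None`, and the quartile cut points use Python's default `statistics.quantiles` method, which may handle ties and boundaries differently from the procedure the authors used.

```python
import bisect
from statistics import quantiles
from typing import Optional, Sequence

# Lower bounds of the tablet categories defined earlier in the Analysis section.
CATEGORY_LOWER_BOUNDS = [1, 30, 60, 90, 120, 180]


def is_accurate(observed_count: int, reported_count: int) -> bool:
    """Accurate if the reported count falls in the observed count's category."""
    observed_cat = bisect.bisect_right(CATEGORY_LOWER_BOUNDS, observed_count)
    reported_cat = bisect.bisect_right(CATEGORY_LOWER_BOUNDS, reported_count)
    return observed_cat == reported_cat


def wealth_proportion(items: Sequence[Optional[int]]) -> float:
    """Sum of binary items divided by the number of nonmissing items."""
    answered = [value for value in items if value is not None]
    if not answered:
        raise ValueError("all wealth items are missing")
    return sum(answered) / len(answered)


def wealth_quartiles(proportions: Sequence[float]) -> list:
    """Assign each woman to quartile 1 (lowest) through 4 (highest)."""
    q1, q2, q3 = quantiles(proportions, n=4)
    return [1 + (p > q1) + (p > q2) + (p > q3) for p in proportions]
```

Dividing by the number of nonmissing items, rather than by 11, keeps women with incomplete questionnaires on the same 0 to 1 scale as everyone else.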
All analyses were conducted using Stata version 14.2 (StataCorp).

Ethical approval
The Institutional Review Board of the Johns Hopkins Bloomberg School of Public Health and the Nepal Health Research Council approved the parent study.

Results
A total of 441 women were enrolled and 434 women completed the postpartum interview (Figure 1). The 7 women (1.5% of sample) lost to follow-up had moved out of the study area or had life changes, such as divorce, that did not allow the study team to contact them. There were no differences between the women lost to follow-up and those who remained in the study. There were 278 women (64%) who did not report ever receiving or buying IFA between ANC observations, which includes the 46 women who attended only 1 ANC visit. The  Table 1).
The average age of the women enrolled was 22.5 y, ranging from 16 to 41 y (Table 1). The number of ANC visits observed per woman ranged from 1 to 14, with the average number of visits observed equal to 4.5 visits. The average SES composite score was 6.3 of the possible 11 points, indicating low ownership. Overall, 59.6% of the women reported 0 y of education. The majority of participants had a live birth; there were 28 miscarriages/abortions (6%) and 4 stillbirths (<1%).
The scatterplots in Figure 2 illustrate the differences in the "gold standard" and the "reported received." In the entire cohort (Figure 2A (Supplemental Figure 1). Reported numbers tended to be heaped, whereas observed number of tablets were more evenly spread out from 0 to >200.
The validation analyses were conducted only among women with a live birth because this is how the most recent DHS in Nepal was conducted. The analytical cohort included 402 women with live births, for 248 of whom all IFA receipt was observed. The validation results comparing the "gold standard" to "reported received" are presented in Table 2. There were 0 "Don't know" responses for the report of IFA during the postpartum interview. Validation of any IFA receipt had moderate individual accuracy (AUC: 0.60; 95% CI: 0.50, 0.71) and low population bias (IF: 1.01). Specificity was low, meaning that women who were not observed receiving IFA often reported receiving IFA at the postpartum interview. Although there was low population bias in this population, Figure 3 shows that at a lower coverage, the survey question will overestimate the true coverage.
The validity of maternal report of the number of IFA tablets was poor across all the categories of true tablet counts, or the counts measured by direct observation. For the majority of categories, the AUC was between 0.47 and 0.58, indicating a performance that was at worst misleading and at best barely better than a random guess. The exception was report of 120 to <180 tablets, which had an AUC = 0.64 (95% CI: 0.58, 0.70). At the lower values of the true tablet counts, the inflation factor showed a greater underestimation of the coverage and high population-level bias, driven by the maternal overreporting trend outlined in Figure 3. As the true tablet counts increased to ≥120, the inflation factor indicated an overestimation of the coverage and, again, high population-level bias.
The "gold standard" compared with "reported received" validation analyses were also run in the entire cohort, including women who had miscarriages/abortions (n = 28) and stillbirths (n = 4). The inclusion of these women improved the validity of maternal report of receipt of any IFA (Supplemental Table 2). The AUC in this population was equal to 0.75 (95% CI: 0.67, 0.83), which is considered high individual-level validity. The specificity increased as well to 52.8% (95% CI: 35.5, 69.9%). The tablet-count validation results were slightly improved in the entire cohort, although this did not qualitatively change the interpretation. The women with live births had a significantly higher average number of visits and observed number of tablets received compared with the women with adverse pregnancy outcomes (μ1 = 4.65 visits compared with μ2 = 1.97 visits, P < 0.01; and μ1 = 73.1 tablets compared with μ2 = 16.9 tablets, P < 0.01). In fact, 50% of women with adverse pregnancy outcomes had only 1 ANC visit, and 59.4% of these women received 0 tablets, which may explain some of the differences in specificity and AUC between the 2 analyses. Restricting the analysis to the women who never reported receiving or buying IFA between observations in the entire cohort for the sensitivity analysis did not change the validity results (Table 3). Sensitivity improved slightly for the restricted group, although it did not change the AUC or the trend for population-level bias across tablet count categories.
There were 93 women (23.1%) who accurately reported the number of tablets within the 7 defined categories. Only 9.2% of women reported an exact match of tablets observed and reported received (data not shown). There were no maternal characteristics associated with accurate report of number of IFA tablets received at the 5 study posts (Table 4). The number of months since the last ANC observation also did not have a significant association with accuracy; the unadjusted risk showed that a lag of >12 mo is associated with a slight decrease in the accuracy RR, but after adjustment for other variables in the model, the RR increased to 1.29 (95% CI: 0.73, 2.28). The strongest association with an accurate response was with an observed count of IFA tablet receipt >120 compared with a count between 0 and 60 tablets (adjusted RR: 3.68; 95% CI: 2.21, 6.13). Those who were observed receiving 60 to <120 tablets were almost half as likely to report this information accurately compared with those receiving between 0 and 60 tablets, although this association was only significant in the bivariate analyses.

Discussion
This study estimated the validity of maternal report of IFA supplementation receipt during pregnancy, including the report of the number of tablets received. To our knowledge, this is the first study to assess the validity of maternal report of the number of IFA tablets received. We found that report of any IFA supplementation during ANC had moderate individual-level validity and low population bias in a population with high coverage. However, at the population level, maternal report of the number of tablets received underestimated the true coverage at the lower range of tablet counts and greatly overestimated the true coverage at the high range of tablet counts. No maternal characteristics included in the analysis were associated with accuracy of maternal report.
We observed 95.8% of women receiving IFA during ANC. This is greater than the 2016 Nepal DHS estimate of 86.7% for Province 2, where the study site is located (14). This difference could be attributed to the fact that the DHS includes women who never attended ANC, whereas in our study all women had ≥1 ANC visit. We observed 2.8% of women receiving ≥180 IFA tablets. In comparison, the 2016 DHS reports 42% of women consuming IFA tablets for ≥180 days (14). In Province 2, the proportion was lower at 28% but is still nearly 4 times greater than what was observed during our study. However, the DHS estimate is likely an overestimation because, as we have shown in this analysis, maternal report of tablet counts >120 tended to greatly overestimate the coverage. A potential reason for the overreporting could be social desirability bias, the tendency of individuals to provide what they think the interviewer and society would consider a favorable response (26). In this case, women may want the interviewer to believe that they received more tablets than they did in order to appear to have received more complete care. This may also explain why women who received a greater number of tablets were >3 times more likely to report accurately; the women who received a greater number of tablets did not feel compelled to overreport. In addition, another study demonstrated that interventions with high coverage tend to be overreported, driven by the logic that women assume they should have received the interventions (27).
One previous study examined the validity of maternal report of any IFA receipt using data collected from 9 service provision assessments (SPAs) and a small sample from a previous study at the same site as our study (21). The SPA validation analyses produced a sensitivity of 88.7%, specificity of 79.3%, an AUC >0.70 (their cutoff for high individual-level validity), and low population bias. However, these results are from exit interviews immediately following an ANC visit, so they are not necessarily comparable to those presented in this article. The validation results from the previous study in our study area are much more comparable: sensitivity = 86.1%, specificity = 34.3%, AUC = 0.60, and IF = 1.43. This study was population based and included women who did not attend ANC but who were provided supplements as part of the study. Their report period ranged from 1 to 2 y, which is slightly longer than ours (range: 1-22 mo; mean report time: 9.1 mo from date of pregnancy outcome). In addition, their coverage of IFA receipt was much lower than ours (53.7% compared with 95.8%, respectively), which resulted in higher population bias (IF = 1.43 compared with IF = 1.01, respectively), despite the lower specificity in our study.
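As an arithmetic check on how lower coverage inflates population bias, the figures quoted above for the previous study in our study area can be substituted into the study coverage equation from the Analysis section. The working below is illustrative only; it assumes that 53.7% is that study's true (gold standard) coverage P and that the AUC for a single yes/no indicator equals the mean of Se and Sp.

```latex
\begin{align*}
P_r &= P \times (Se + Sp - 1) + (1 - Sp) \\
    &= 0.537 \times (0.861 + 0.343 - 1) + (1 - 0.343) \\
    &= 0.537 \times 0.204 + 0.657 \approx 0.767 \\[4pt]
IF  &= \frac{P_r}{P} = \frac{0.767}{0.537} \approx 1.43 \\[4pt]
AUC &\approx \frac{Se + Sp}{2} = \frac{0.861 + 0.343}{2} \approx 0.60
\end{align*}
```

Under these assumptions the result reproduces the reported IF of 1.43 and AUC of 0.60. The (1 − Sp) term is fixed by the indicator's specificity, so it makes up a larger share of the reported coverage when true coverage is low.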
To our knowledge, this is the first study to examine the validity of maternal report of the number of tablets received during pregnancy. Maternal report of birth weight has been examined in validity studies, which is another instance of validating a numerical response rather than a "yes or no" response. A study conducted in Taiwan examined maternal report of birth weight and found that although women were able to accurately report whether their infant was LBW, the accuracy for reporting the specific weight category was low (15.9%) and women tended to overreport their infant's birth weight (28). This is slightly lower than the 21.8% of women accurately reporting categories of IFA receipt in our study. A study at our same site in Nepal had low sensitivity for maternal report of birth weight to classify LBW and of length of gestation to classify preterm birth (19). This is similar to the low sensitivity for classifying categories of IFA tablets received in our study population.
No associations were found between maternal characteristics in a multivariate model and accuracy of report of number of IFA tablets received in this study, which is similar to findings from other validation studies that report no patterns of association between maternal SES, age, parity, or education and accuracy (29,30). In contrast, other studies examining maternal report of LBW have reported parity and maternal education to be associated with accuracy (19,31). There was no observed association with accuracy and length of report period, which is consistent with other studies' findings (19,28).
A woman having any years of education was significantly associated with receiving IFA elsewhere between observations. Women with higher education have been shown to be more likely to access health services and receive higher quality care (32,33). It is possible that women with higher education were aware of the benefits of IFA during pregnancy and thus sought its receipt outside of the health facilities, where stocks can be limited. A woman being pregnant for the first time was also more likely to receive IFA between visits in our study. Higher parity is associated with decreased ANC attendance (34) and lower quality ANC (32). Therefore, women with higher parity may also be less likely to seek additional services outside of the health facility.
Given the frequent use of the consumption indicator in global nutrition tracking and national-level policy and programs, it is important to consider how the indicator is defined and measured. There are multiple factors to consider when defining the coverage indicator for the number of tablets consumed and/or received during pregnancy. First, the policies for the amount of IFA consumed during pregnancy differ across countries (15), so there is no standard "adequate" amount from a policy perspective. Second, the prevalence of iron-deficient anemia varies by country (1), representing a difference in need from a biological perspective. Finally, the mean number of ANC visits varies by country (35), meaning from a programmatic perspective there is no standard number of opportunities to provide IFA during ANC. The results of the study argue against using household surveys to measure the coverage of the amount of IFA received during pregnancy because the amount reported received is greatly overestimated. However, there is not a clear alternative. Routine health information systems, including electronic health records, are generally rather weak in low- and middle-income countries and often focus on aggregated rather than individual data (36). Their use for measurement of IFA receipt is further complicated by the fact that many women received IFA from other sources, such as pharmacies, which would not be captured by these systems. Furthermore, if functional and available, these systems would capture receipt, not consumption, of IFA. Further consideration is needed to decide how to best define and measure the amount of IFA received or consumed and under which circumstances reporting this indicator would be most useful and accurate. In addition, further research in private facility settings, semiurban or urban areas with higher education levels, and more varied IFA coverage would be beneficial to best inform the indicator development and its measurement.
A strength of this study is the use of direct observation by trained study observers as the gold standard, which is the preferred standard for validation studies. Another strength is that although our 6-mo study report period is shorter than the DHS's 3- or 5-y report period, it is much longer than those of other validation studies that have used exit interviews to measure maternal report accuracy. A limitation of this study is that we observed IFA receipt only at our 5 government health posts. This presents 2 issues: our sample included only women who sought ANC, and we were unable to observe all sources of IFA receipt for all the women in the study. As a result, our findings may not be generalizable to women who do not attend ANC at all or who attend private facilities. We attempted to address the inability to observe all IFA receipt through the use of follow-up questionnaires to identify women who did not receive or buy IFA between visits. However, this approach does rely on reports by the women that they did not go elsewhere to obtain IFA. The potential of an observer effect is a third limitation, whereby providers alter their care because of the presence of our study staff at the facility. To mitigate this possibility, during the consent process, the providers were informed that the study staff were not medically trained (and therefore would not know if the care they observed was correct or not) and would not report any findings to the provider's superiors. Furthermore, the study observers were stationed in the facility every business day for more than 1 y, so we hope that if present initially, the observer effect lessened over time. Another limitation of this study was the reliance on accurate reporting of observed IFA receipt and number of tablets given during ANC visits by the study observers in the health facilities. We believe this limitation was reduced by the rigorous training of observers.
In conclusion, the use of maternal report of any IFA receipt during pregnancy had moderate individual-level validity and low population bias, meaning that the use of this indicator in surveys to measure any IFA received during pregnancy accurately estimates the population coverage. However, maternal report of the number of IFA tablets received produced extremely biased population coverage and performed comparably to or worse than a random guess for individual-level validity. Additional research is needed to assess maternal report of IFA receipt in other settings with more variable IFA coverage levels and to further elucidate reasons for inaccurate reporting of IFA tablet counts to improve the indicator for future use.