Abstract
Much aggregate social-science analysis relies upon the standard national income and product accounts as a source of economic data. These are recognized to be defective in many poor countries, and are missing at the regional level for large parts of the world. Using updated luminosity (or nighttime lights) data, the present study examines whether such data contain useful information for estimating national and regional incomes and output. The bootstrap method is used for estimating the statistical precision of the estimates of the contribution of the lights proxy. We conclude that there may be substantial cross-sectional information in lights data for countries with low-quality statistical systems. However, lights data provide very little additional information for countries with high-quality data wherever standard data are available. The largest statistical concerns arise from uncertainties about the precision of standard national accounts data.
1. Introduction
Measures of national output and income are the major social indicators used to evaluate the relative performance of countries over space and time. Richard Froyen describes economic policy in the era before economic accounts were developed as follows (Froyen, 1996):
One reads with dismay of Presidents Hoover and then Roosevelt designing policies to combat the Great Depression of the 1930's on the basis of such sketchy data as stock price indices, freight car loadings, and incomplete indices of industrial production. The fact was that comprehensive measures of national income and output did not exist at the time. The Depression, and with it the growing role of government in the economy, emphasized the need for such measures and led to the development of a comprehensive set of national income accounts.
Development of a full set of national economic accounts has been a major accomplishment of national statistical systems since the 1930s, but there is much further work needed to integrate output, income and wealth accounts (Jorgenson et al., 2006).
While economic statistics in wealthy countries have improved greatly in recent decades, statistical data are often of low quality in other countries. This is especially true for the poorest countries in sub-Saharan Africa, many of which have no reliable censuses of population and only rudimentary economic statistics. A few countries (Iraq, Afghanistan and Somalia being examples) have virtually no functioning statistical systems. Somalia is an instructive case. While international agencies routinely produce population and output estimates for it, the basis for these is very tenuous. The last population census in Somalia took place in 1975. There is no functional statistical office, and indeed there have been few functional central governments over the past three decades. A search for statistics on a website purporting to be the government's site returns the message 'UNDER RECONSTRUCTION PLEASE CHECK BACK SOON'. Similar issues arise for many other African countries; the last functional population census for the Democratic Republic of Congo took place in 1981. With the exception of data on international trade, economic data in the poorest countries are particularly thin.
Because of the shortcomings of standard data sources, the authors undertook a project to examine measures that could supplement or substitute for regional statistical accounts, i.e., those derived from standard sources on income, output and other demographic information. One place to look for alternative data is luminosity, or nighttime lights. There is a large literature on satellite-based nighttime lights measures and on their use as a proxy for population, output and poverty (see the references in the next section). The intuitive notion is that lights data might serve as a useful proxy for economic output because they are 'objectively' measured, are highly correlated with output, and are available for the entire world except the high latitudes.
Figure 1 shows a scatter plot of lights density and output density for all grid cells with positive output for 2006 (both are natural logarithms, N = 12,393). It is clear that lights and output have a strong positive correlation at high output densities, but the relationship is less apparent at low output densities. To extract the information from the luminosity data, we construct a synthetic measure of output (blending lights data and standard national economic accounts measures) and calculate optimal weights that minimize the expected error of that synthetic measure. The optimal weight on the lights-based proxy measures how much useful information lights contain as a proxy for measuring national or regional output. The present study contains further results on the accuracy of such measures for different countries and concepts.
Figure 1. Gross cell product (GCP) and lights data, all cells. The figure shows the scatter plot of log calibrated lights for 2006 and log gross cell product for all cells at 1° × 1° resolution. Output density is gross cell product (PPP, billions of 2005 international $) per km². Luminosity density per km² is the radiance-calibrated lights for 2006. The plot includes all grid cells (N = 12,393) with positive output and lights. The solid line is the kernel estimator using an Epanechnikov kernel and 100 grid points per kernel. (Source: Chen and Nordhaus, 2011).
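The smoothed line in Figure 1 is a kernel regression of log output density on log lights density. As an illustration only, the sketch below implements a Nadaraya–Watson estimator with an Epanechnikov kernel evaluated at 100 evenly spaced grid points, assuming NumPy. The bandwidth is a hypothetical free parameter (the caption does not report one), and this is not the code that produced the figure.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u**2) for |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def kernel_smooth(x, y, bandwidth, n_grid=100):
    """Nadaraya-Watson estimate of E[y | x] on n_grid evenly spaced points.

    x, y: 1-D arrays (e.g. log lights density and log output density).
    bandwidth: kernel half-width in units of x (a hypothetical choice).
    Returns (grid, fitted); fitted is NaN where no observation lies within
    one bandwidth of the grid point.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.linspace(x.min(), x.max(), n_grid)
    # weights[g, i] = K((grid[g] - x[i]) / bandwidth)
    weights = epanechnikov((grid[:, None] - x[None, :]) / bandwidth)
    totals = weights.sum(axis=1)
    fitted = np.full(n_grid, np.nan)
    has_data = totals > 0
    # Kernel-weighted average of y around each grid point
    fitted[has_data] = (weights[has_data] @ y) / totals[has_data]
    return grid, fitted


if __name__ == "__main__":
    # Simulated (not actual) log lights and log output densities
    rng = np.random.default_rng(0)
    log_lights = rng.normal(size=1000)
    log_output = 0.8 * log_lights + rng.normal(scale=0.5, size=1000)
    grid, fitted = kernel_smooth(log_lights, log_output, bandwidth=0.5)
    print(grid[:3], fitted[:3])
```

A wider bandwidth gives a smoother curve at the cost of more bias where the relationship bends, which matters here because the lights-output relationship is much flatter at low densities.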
1.1. Use of luminosity as a socioeconomic indicator
Nighttime lights data have been gathered from satellites for more than two decades and have been carefully filtered into a series of high resolution data with observations beginning in 1992.
Scholars have made extensive use of luminosity in socioeconomic studies. For example, a search on Google Scholar found almost 3000 studies since 2000 that have used nighttime lights to study economic phenomena. Previous studies, primarily in the fields of geoscience and economics, have used nighttime image data as a proxy for the socioeconomic development of particular geographic areas (Doll et al., 2000; Sutton and Costanza, 2002; Ebener et al., 2005; Elvidge et al., 1997, 2007; Sutton et al., 2007; Henderson et al., 2011, 2012). Elvidge et al. (2007, p. 51) conclude ‘Nighttime lights provide a useful proxy for development and have great potential for recording humanity’s presence on the earth’s surface and for measuring important variables such as annual growth for development’.
In the past decade, researchers have undertaken a series of tests to support this conclusion. For instance, Doll et al. (2000) extrapolated gross domestic product (GDP) to 1° latitude by 1° longitude grid cells by applying the coefficient of a log–log relationship estimated at the country level, and concluded that the resulting lit-area-derived PPP–GDP grid map modeled global economic activity ‘very well’. Ebener et al. (2005) and Sutton et al. (2007) have used the percent frequency of lighting to predict GDP per capita at the national and sub-national level.
Few studies have undertaken a formal analysis comparing nighttime lights data with traditional measures of economic development. The two major studies to date are Henderson et al. (2012) and Chen and Nordhaus (2011). Each of these studies prepared output proxies based on lights data and compared the lights proxies with standard output measures.
Henderson et al. (2012) examined the relative performance of lights and traditional measures for real growth rates for a panel of 188 countries. Their results were similar to those of Chen and Nordhaus for high-income countries and regions with high-quality data. However, they found lights data to be of moderate value in low-income countries for measuring growth. They concluded, for example, ‘For countries with poor national income accounts, the optimal estimate of growth is a composite with roughly equal weights on conventionally measured growth and growth predicted from lights.’ This study looked only at the time-series properties of luminosity as a proxy and did not examine the cross-sectional (density) properties.
Chen and Nordhaus (2011) developed a method for estimating whether nighttime lights measures could be used to improve estimates of economic output at the national and regional levels. This study used cross-sectional (density) as well as time-series output estimates. Additionally, it made comparisons not only for countries but also at a more disaggregated level (1° by 1° grid cells from the GEcon database). The regional refinement helped to remove country effects and increased the sample by a factor of approximately 100. The study examined two output concepts, the growth rate of output from 1992 to 2008 and annual output density measured as constant-price output per unit area, and it examined three different lights measures: raw, stable and intercalibrated lights. The tests were particularly aimed at countries and subnational regions with low-quality data systems.
Chen and Nordhaus (2011) found that lights data are likely to add value as a proxy for output for countries with the poorest statistical systems, those that receive a D or an E grade, but have very limited value added for A, B and C countries. (The rating system is described below.) This finding held at the national level and at subnational levels where data are available. The reason for the low value added of lights in high-grade countries is that the lights data have high measurement errors while the estimated measurement errors in the standard economic data are relatively small.
A major surprise in the original study was that lights data do not allow reliable estimates in regions with the lowest output densities. The reason is that the level of stable lights in these regions is too low to be distinguished from background lights (noise), so lights are set at zero in the stable lights data set. The bottom line was that lights data are presently too unreliable to be of significant use in supplementing standard data except in exceptional circumstances.
The major difference between Henderson et al. (2012) and Chen and Nordhaus (2011) was that the former found lights data useful for time-series (growth) estimates for low-income countries, with an optimal weight of approximately one-half. The difference between the two studies arose primarily from the statistical identification procedures they used to estimate the optimal weights. Chen and Nordhaus relied for identification on estimates of the measurement error of output or output growth in conventional output measures. Henderson et al. used a signal-to-noise ratio to derive their optimal weighting of the two measures.
The shortcoming of the Henderson et al. estimates is that there are no direct measures of the signal-to-noise ratio, and that ratio is inferred from the lights data. More precisely, that study makes two assumptions about the error structure: that the errors of measurement in the lights equation and the covariance of lights and output in the lights equation are the same for all countries (these being, respectively, the equivalents of $\sigma^{2}_{\eta} + \sigma^{2}_{\nu}$ and $\beta$ in Equations (2.2) and (2.3) below). We know from direct estimates across satellites that the lights measurement-error variances, $\sigma^{2}_{\eta}$, differ across countries (see Supporting Information in Chen and Nordhaus, 2011). Furthermore, our estimates of $\beta$, described below and shown in Tables 4a through 4d, differ markedly across country grades. Finally, unlike the measurement error of standard national-accounts data, there are no prior estimates of the signal-to-noise ratio. We therefore have reason to question the identification strategy in Henderson et al. (2012) and hypothesize that their positive results for low-data-quality countries arise from an assumption about the error structure that is inconsistent with the data.
By contrast, the Chen-Nordhaus identification strategy is based on an estimate of the precision of the estimate of national accounts measures. This concept has been addressed in various studies, particularly in a time-series framework (see the discussion below for details and references). The present article describes the Chen-Nordhaus measure in more detail below and continues to use that strategy because it has a solid basis in empirical studies of national accounts data.
The present study has two primary purposes. First, we extend the analysis of Chen and Nordhaus (2011) using lights data through 2010, an updated version of the GEcon data set and updated national accounts data.
Secondly, we present a new methodology for estimating the statistical precision of the estimates of the value of the lights proxy. This second part is important because nowhere in the vast existing literature on the use of lights as a proxy is there an analysis of the precision of the estimates.
With respect to the second purpose, it is well established that the weights that are used for proxy estimates should be treated in a statistical manner (National Research Council, 2006). As we show below, the statistical model for deriving the optimal weights on conventional GDP measures and lights is underidentified and requires estimates of three parameters: the measurement error of conventional GDP measures, the measurement error of luminosity and the coefficient in the regression equation of output and lights. The challenge in the present study is that the estimates are a mixture of prior estimates of errors as well as statistically based estimates, so we need to combine both approaches. In addition, there is very little evidence on the reliability of the national accounts data outside the high-income countries.
In the remainder of the article, we first discuss the parameters we use to calculate the luminosity weight, next present the analytic model underlying the estimation, and then present the results from bootstrap and sensitivity analysis.
2. The analytic model
2.1. Nighttime lights data
The primary nighttime image data were gathered by US Department of Defense satellites starting in the mid-1960s to determine the extent of worldwide cloud cover. The data were later declassified and made publicly available as the Defense Meteorological Satellite Program Operational Linescan System (DMSP-OLS), and have been used to measure economic development of certain geographic areas as described above. All data are available for the period 1992–2010.
The raw data can be acquired in two spatial resolution modes. The full resolution data, also referred to as ‘fine’ data, have nominal spatial resolution of 0.5 km. The ‘smoothed’ data are an average of 5 × 5 blocks of fine data and have a nominal spatial resolution of 2.7 km. The data that we obtained from the National Oceanic and Atmospheric Administration–National Geophysical Data Center are constructed using the smoothed spatial resolution mode, at a resolution of 30 arc-seconds, covering 180° W to 180° E longitude and 75° N to 65° S latitude. There are different versions of the data; three of particular importance are the ‘raw’, the ‘stable lights’ and the ‘calibrated’ versions. After considerable testing, we have relied on the stable lights version.
2.2. The GEcon data
For standard economic output data, at the country level, we used GDP in purchasing power parity (PPP) values at constant 2005 international US dollars from the World Bank from 1992 to 2010. For disaggregated output data, we used the GEcon data set, available at gecon.yale.edu. These data are available at 1° x 1° latitude and longitude resolution for all terrestrial grid cells for 1990, 1995, 2000 and 2005 using PPP values at constant 2005 international US dollars.
Although the GEcon data have been described in detail elsewhere, we provide a succinct description for those unfamiliar with this data set. The data come from the GEcon project at Yale University (GEcon denotes Geographically based Economic data). This dataset provides a set of output accounts at a grid scale of 1° latitude by 1° longitude for all terrestrial grid cells of the world, a total of 27,442 observations. The earliest version contains cell data for the year 1990 and was first published in 2005. Multiple revisions have been conducted over the years, and its current version includes more observations, improved methods, and a total of four periods—1990, 1995, 2000 and 2005. In addition to economic output accounts for cells, or gross cell product (GCP), the dataset also provides information on cell population, land area size, precipitation, temperature and geophysical measures such as soil type, vegetation categories and distance to navigable rivers and ice-free oceans.
The construction of country data differs for each country because of differences in the national economic accounts of each country. Detailed economic accounts by region are estimated for all countries with more than 50 grid cells (these being about 90% of all grid cells). We collected economic data (generally on gross product) at the smallest available political administrative unit for each country. These were divided into sub-grid-cells (the minimum segments formed by intersecting grid cells and political boundaries) using spatial overlays between grid cells and administrative boundary maps. We then estimated the sub-grid-cell populations using grid cell population estimates from the Gridded Population of the World (CIESIN, 2013). Next, the sub-grid-cell gross output is obtained by multiplying the sub-grid-cell population by sub-regional gross output per capita or income per capita. The GCP is the sum of outputs from all sub-grid-cells that are located in one grid cell. Finally, the sum of the GCPs of a country is rescaled to conform to the official total country GDP.
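The multiply, sum and rescale steps can be sketched as follows. This is a minimal illustration, not the GEcon production code: the record layout `(cell_id, country, population, output_per_capita)` and the function name `gross_cell_product` are hypothetical, and keying cells by `(country, cell_id)` is our assumption so that a cell split by a border contributes separately to each country's rescaling.

```python
from collections import defaultdict


def gross_cell_product(subcells, national_gdp):
    """Build rescaled gross cell product (GCP) from sub-grid-cell records.

    subcells: iterable of (cell_id, country, population, output_per_capita),
        one record per intersection of a grid cell with an administrative unit.
    national_gdp: dict mapping country -> official national GDP.
    Returns a dict mapping (country, cell_id) -> GCP, rescaled so that each
    country's cells sum exactly to its official GDP.
    """
    raw = defaultdict(float)
    for cell_id, country, population, output_per_capita in subcells:
        # Sub-grid-cell output = population x sub-regional output per capita
        raw[(country, cell_id)] += population * output_per_capita

    # Unscaled country totals implied by the regional data
    country_totals = defaultdict(float)
    for (country, _), value in raw.items():
        country_totals[country] += value

    # Rescale so that the GCPs of each country sum to its official GDP
    return {
        (country, cell_id): value * national_gdp[country] / country_totals[country]
        for (country, cell_id), value in raw.items()
    }


if __name__ == "__main__":
    # Hypothetical records: two sub-grid-cells in cell c1, one each in c2, c3
    records = [
        ("c1", "A", 100, 2.0),
        ("c1", "A", 50, 4.0),
        ("c2", "A", 100, 1.0),
        ("c3", "B", 10, 10.0),
    ]
    print(gross_cell_product(records, {"A": 1000.0, "B": 50.0}))
```

The final rescaling preserves the relative distribution of output implied by the regional data while forcing consistency with the national accounts total.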
Note that, while the data have units of ‘2005 international US dollars’, the actual construction of the indexes differs by country. While the USA uses ideal Fisher indexes for output, many countries (such as China) continue to use Laspeyres indexes to measure growth. For intermediate years, we interpolated gross cell output using national output and population numbers. More specifically, for the 1992 to 2010 time-series cell data, we interpolated annual cell data within each 5-year period using country growth rates for that period and then rescaled the summed cell values to match annual national GDP. In effect, this implies that the relative outputs of different grid cells change smoothly between benchmark years.
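One way to read the interpolation step is sketched below, assuming NumPy: each cell's output is interpolated log-linearly (constant growth) between benchmark years, and the interpolated cells are then rescaled to match annual national GDP. This is our hypothetical reading, not the documented GEcon procedure. Note that `np.interp` holds values constant outside the benchmark range, so years after the last benchmark (e.g. 2006 to 2010) keep that benchmark's cell shares.

```python
import numpy as np


def interpolate_cells(benchmark_years, cell_values, years, national_gdp):
    """Annual cell output from benchmark-year cell output (hypothetical sketch).

    benchmark_years: increasing 1-D array, e.g. [1990, 1995, 2000, 2005]
    cell_values: array (n_benchmarks, n_cells) of positive cell output
    years: 1-D array of target years
    national_gdp: 1-D array of annual national GDP, one entry per target year
    Returns an array (len(years), n_cells) whose rows sum to national_gdp.
    """
    benchmark_years = np.asarray(benchmark_years, dtype=float)
    logs = np.log(np.asarray(cell_values, dtype=float))
    years = np.asarray(years, dtype=float)
    interp = np.empty((len(years), logs.shape[1]))
    for j in range(logs.shape[1]):
        # Log-linear interpolation = constant growth rate between benchmarks
        interp[:, j] = np.interp(years, benchmark_years, logs[:, j])
    levels = np.exp(interp)
    # Rescale each year so the cells sum to that year's national GDP
    scale = np.asarray(national_gdp, dtype=float) / levels.sum(axis=1)
    return levels * scale[:, None]


if __name__ == "__main__":
    out = interpolate_cells(
        [1990, 1995], [[100.0, 100.0], [400.0, 100.0]], [1990, 1992.5], [200.0, 600.0]
    )
    print(out)
```

Because every cell moves along a smooth path between benchmarks and only a common factor is applied each year, relative cell outputs change smoothly, as the text describes.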
The grid cell measures in GEcon have advantages over information collected according to political administrative boundaries (Nordhaus et al., 2006). For most countries, areas defined at the 1° × 1° resolution tend to be much smaller than areas defined by nation states or provinces. The land area of a 1° × 1° cell is roughly 12,300 km² near the equator, but it shrinks as the cell approaches the poles. Thus, the GEcon data provide estimates at a much smaller scale than standard national accounts. Furthermore, by using spatial coordinates, demographic and economic variables can be merged with geophysical variables (such as the climate, soil, vegetation and distance measures described above), allowing for integration of social and environmental research.
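The shrinking of cells toward the poles follows from the area of a latitude–longitude cell on a sphere, $R^{2}\,\Delta\lambda\,(\sin\phi_{2} - \sin\phi_{1})$. The sketch below assumes a spherical Earth with mean radius 6371 km, which gives about 12,360 km² for an equatorial 1° cell; the function name is illustrative only.

```python
import math

# Mean Earth radius in km; treating the Earth as a sphere is a simplifying
# assumption (an ellipsoidal model gives slightly smaller equatorial cells).
EARTH_RADIUS_KM = 6371.0088


def cell_area_km2(lat_south_deg, size_deg=1.0):
    """Area of a size_deg x size_deg cell whose southern edge is at lat_south_deg."""
    phi1 = math.radians(lat_south_deg)
    phi2 = math.radians(lat_south_deg + size_deg)
    dlon = math.radians(size_deg)
    # Spherical zone formula: R^2 * dlon * (sin(phi2) - sin(phi1))
    return EARTH_RADIUS_KM**2 * dlon * (math.sin(phi2) - math.sin(phi1))


if __name__ == "__main__":
    for lat in (0, 30, 60, 80):
        print(lat, round(cell_area_km2(lat)))
```

A cell at 60° latitude covers roughly half the area of an equatorial cell, which is why output and lights are compared as densities per km² rather than per cell.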
Conceptually, the GCP is measured in a similar fashion to GDP and gross regional product (Nordhaus, 2006; Nordhaus et al., 2006). Since there are limited socio-economic data at the grid cell level, a spatial rescaling approach is used to estimate the GCP values. A description of the spatial rescaling procedure can be found in the data documentation on the Yale GEcon website, gecon.yale.edu.
Since its creation, the GEcon data set has brought a geospatial component to many social-science investigations, including studies of the geographic penalty in African poverty and of the economic damages caused by greenhouse warming (Nordhaus, 2006; Tang et al., 2009). It has also been used in other environmental studies (Tang and Woods, 2008; Buhaug et al., 2009; Seo, 2011) and in studies of inequalities between ethnic groups and ethno-nationalist conflict (Cederman et al., 2011). It has been used to measure economic distance (Ramcharan, 2009), and disaggregated models of the impacts of global warming have been built using the data set.
In exploring the relationship between cell economic output and lights, we note an important shortcoming of the lights data: the large number of cells with zero values for stable lights. Researchers have found that, at low levels of lights, the background noise tends to dominate the human signal. In preparing the stable lights set, therefore, a large number of cells are set at zero. For example, we calculate that 5489 grid cells have positive population and zero stable lights for the year 2000. Of these, 1613 cells are in Russia, 1286 cells are in Africa and 784 cells are in Canada. We found only one cell with positive stable lights and zero estimated population. The zero-lights cells are estimated to contain 0.4% of world output and population, so we are not missing much of the total. However, they are a concern for the statistical analysis because estimates for low-lights regions are likely to be biased.
2.3. Derivation of the optimal weights
This section explains the basic approach and derives the optimal weighting of conventional output and lights measures. This section draws upon the Supplementary Information from Chen and Nordhaus (2011). For this purpose, we define the different variables as follows:
- Y = output from national accounts (GDP in constant 2005 international US $)
- Y* = true output (GDP in constant 2005 international US $)
- X = synthetic measure of output (GDP in constant 2005 international US $)
- M = measured lights value (index value)
- Z = lights-based measure of output (GDP in constant 2005 international US $)
- i = grid cell (here 1° latitude by 1° longitude)
- j = country
- k = country grade (A, B, C, D, E)
- t = year
- y = log(Y), and similarly for other upper-case variables
- ε = measurement error in GDP
- η = measurement error in lights
- ν = error in output–lights relationship
- α, β, μ = structural parameters
For notational purposes, we define $x_i(t)$ as the value of variable x in grid cell i averaged over a year. We omit the time variables when they are inessential to the exposition. Note that in our full treatment, the measurement errors for GDP and lights, as well as the error in the output–lights relationship, will differ by country ‘grade’. Additionally, in the complete model below, we allow the structural parameters to differ by country grade. Begin by assuming that there is an unknown true level of output for each country and grid cell, which is measured with error:

$$ y = \mu y^{*} + \varepsilon \qquad (2.1) $$
For the present study, we assume that there is no bias in measured output, so μ = 1. This assumption is not completely innocuous, as there may be systematic growth mismeasurement due, say, to incomplete source data or infrequent observations. The important issues raised by μ ≠ 1 have not been solved. Assuming that μ = 1 yields:

$$ y = y^{*} + \varepsilon $$
Luminosity is subject to measurement error (due to satellite, calibration and other sources):

$$ m = m^{*} + \eta \qquad (2.2) $$

where $m^{*}$ is the log of true luminosity.
There is assumed to be a structural relationship between luminosity and true output. An important statistical assumption is that output is the exogenous variable and that luminosity depends upon output and other omitted variables. The assumed relationship is as follows:

$$ m^{*} = \alpha + \beta y^{*} + \nu \qquad (2.3) $$
The error in Equation (2.3) arises from several sources. One important error is that lights data are sampled at night, whereas economic activity is generally concentrated in the daytime. More important, the luminosity of lights at night differs greatly across sectors. Often, lights are associated with electricity use. The use of electricity per dollar of output in different sectors provides a rough idea of how luminosity might vary. In the 2002 US input–output tables, the electricity used per unit output of real estate was 200 times greater than that of software. (See the input–output tables at www.bea.gov for the underlying data.) Similar differences are seen across other sectors.
Another example is automotive travel. The lights data will show significant economic activity on highways, while the national accounts place the activity at point of purchase of the automobile, gasoline, and insurance. In this respect, the lights data better reflect the actual location of economic activity compared to the standard economic accounts. These examples suggest that industrial composition across countries and regions is likely to make the output-lights relationship in Equation (2.3) relatively noisy.
Our procedure is first to estimate Equation (2.3) using measured output and lights. This provides a biased estimate of the coefficient, $\hat{\beta}$, because output is measured with error. We then make an errors-in-variables correction, using our prior estimates of the measurement error of GDP, to obtain a corrected estimate of the structural coefficient, which we denote $\tilde{\beta}$. The corrected coefficient is calculated as:

$$ \tilde{\beta} = \hat{\beta}\,\frac{\hat{\sigma}^{2}_{y^{*}} + \sigma^{2}_{\varepsilon}}{\hat{\sigma}^{2}_{y^{*}}} \qquad (2.4) $$

Here, $\hat{\beta}$ is the estimated coefficient in Equation (2.3); $\sigma^{2}_{\varepsilon}$ is the a priori variance of the measurement error in output; and $\hat{\sigma}^{2}_{y^{*}}$ is the estimated variance of true output. The consistent estimate of the intercept, $\tilde{\alpha}$, follows immediately.
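The need for the errors-in-variables correction can be checked by simulation. The sketch below, assuming NumPy and made-up parameter values, regresses lights on measured output, then rescales the OLS slope by $(\hat{\sigma}^{2}_{y^{*}} + \sigma^{2}_{\varepsilon})/\hat{\sigma}^{2}_{y^{*}}$, the standard attenuation correction. Estimating the variance of true output as the variance of measured output minus $\sigma^{2}_{\varepsilon}$, and the intercept as $\bar{m} - \tilde{\beta}\bar{y}$, are our assumptions about how the remaining pieces are obtained.

```python
import numpy as np


def corrected_coefficients(y, m, sigma2_eps):
    """OLS slope of m on y, plus errors-in-variables corrected slope/intercept.

    y: measured log output; m: measured log lights;
    sigma2_eps: a priori variance of the measurement error in log output.
    Returns (beta_hat, beta_tilde, alpha_tilde).
    """
    y = np.asarray(y, dtype=float)
    m = np.asarray(m, dtype=float)
    var_y = y.var()
    beta_hat = np.cov(y, m, bias=True)[0, 1] / var_y  # attenuated OLS slope
    var_true = var_y - sigma2_eps  # assumed estimate of Var(y*)
    if var_true <= 0:
        raise ValueError("sigma2_eps must be smaller than the variance of y")
    beta_tilde = beta_hat * (var_true + sigma2_eps) / var_true
    alpha_tilde = m.mean() - beta_tilde * y.mean()
    return beta_hat, beta_tilde, alpha_tilde


if __name__ == "__main__":
    # Simulated data with true alpha = 1.0, beta = 0.8, sigma2_eps = 0.25
    rng = np.random.default_rng(0)
    n = 200_000
    y_true = rng.normal(0.0, 1.0, n)
    y = y_true + rng.normal(0.0, 0.5, n)
    m = 1.0 + 0.8 * y_true + rng.normal(0.0, 0.3, n)
    print(corrected_coefficients(y, m, 0.25))
```

With these values the naive slope is pulled toward 0.8 × 1/1.25 = 0.64, while the corrected slope recovers approximately 0.8.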
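We then estimate the lights-based output proxy by inverting Equation (2.3):

$$ z = \frac{m - \tilde{\alpha}}{\tilde{\beta}} \qquad (2.5) $$

where $z$ is the log of our lights-based output proxy, $Z$, and $\tilde{\alpha}$ and $\tilde{\beta}$ are the corrected coefficients from Equation (2.4).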
Next, we construct a combined measure of output by taking weighted averages of conventional measures of output and our lights-based output proxy:

$$ x = \theta z + (1 - \theta)\, y \qquad (2.6) $$

where X is the new synthetic measure of output, $x = \ln(X)$, and θ is the weighting fraction on lights.
The key variable of this study is θ, the share (or weight) of the lights-based output proxy. The central question we address is whether we can significantly improve conventional measures of output using lights. Mathematical analysis (provided in Chen and Nordhaus, 2011) shows that the optimal weight on lights, θ*, is given by:

$$ \theta^{*} = \frac{\sigma^{2}_{\varepsilon}}{\sigma^{2}_{\varepsilon} + (\sigma^{2}_{\eta} + \sigma^{2}_{\nu})/\beta^{2}} \qquad (2.7) $$
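The form of the optimal weight can be verified numerically. Assuming the GDP error (variance $\sigma^{2}_{\varepsilon}$) and the total lights error (variance $\sigma^{2}_{L}$) are independent, the lights proxy has error variance $\sigma^{2}_{L}/\beta^{2}$, and the synthetic measure $x = \theta z + (1-\theta)y$ has error variance $\theta^{2}\sigma^{2}_{L}/\beta^{2} + (1-\theta)^{2}\sigma^{2}_{\varepsilon}$. Setting its derivative to zero gives $\theta^{*} = \sigma^{2}_{\varepsilon}/(\sigma^{2}_{\varepsilon} + \sigma^{2}_{L}/\beta^{2})$. The sketch below, with hypothetical parameter values, compares this closed form with a brute-force grid search.

```python
import numpy as np


def synthetic_error_variance(theta, sigma2_eps, sigma2_lights, beta):
    """Error variance of x = theta*z + (1 - theta)*y around true log output.

    Assumes the GDP error (variance sigma2_eps) and the total lights error
    (variance sigma2_lights) are independent, so the lights proxy z has
    error variance sigma2_lights / beta**2.
    """
    return theta**2 * sigma2_lights / beta**2 + (1.0 - theta) ** 2 * sigma2_eps


def optimal_theta(sigma2_eps, sigma2_lights, beta):
    """Weight on lights that minimizes synthetic_error_variance."""
    return sigma2_eps / (sigma2_eps + sigma2_lights / beta**2)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration
    s2_eps, s2_lights, b = 0.04, 0.5, 1.0
    theta_star = optimal_theta(s2_eps, s2_lights, b)
    # Brute-force check over a fine grid of weights
    thetas = np.linspace(0.0, 1.0, 10001)
    best = thetas[np.argmin(synthetic_error_variance(thetas, s2_eps, s2_lights, b))]
    print(theta_star, best)
```

The formula makes the paper's qualitative finding transparent: when standard GDP is measured precisely (small $\sigma^{2}_{\varepsilon}$) and lights are noisy, the weight on lights is close to zero.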
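However, because the parameters in (2.7) are unknown, we need to find an appropriate estimator of θ*. Assuming that $\sigma^{2}_{\varepsilon}$ is known from external evidence (see below) and that β and the combined lights error variance $\sigma^{2}_{\eta} + \sigma^{2}_{\nu}$ can be consistently estimated as $\tilde{\beta}$ and $\hat{\sigma}^{2}_{\eta+\nu}$, respectively, it can be shown that

$$ \hat{\theta}^{*} = \frac{\sigma^{2}_{\varepsilon}}{\sigma^{2}_{\varepsilon} + \hat{\sigma}^{2}_{\eta+\nu}/\tilde{\beta}^{2}} $$

is a uniformly consistent estimator of θ*. Details of the proof are provided in Chen and Nordhaus (2011).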
2.4. Estimating the reliability of the optimal weight
It is tempting to construct new measures of output based on lights when the estimated optimal weight on lights (θ) is large. This is not advisable unless we have a clear idea of the reliability of the estimates of the optimal weights. Because the procedure used to estimate the optimal weight is complex and the estimator is only consistent, standard techniques for constructing confidence intervals cannot be used.
There are several approaches to estimating statistical precision in large systems. One of the earliest is the ‘delta method’, which approximates the expected values of functions of random variables using a polynomial approximation, usually a Taylor expansion (see Oehlert, 1992). The delta method is difficult to apply to the present estimates because of the multi-stage nature of the estimation. Moreover, the usual approach in the delta method is to assume normal distributions, which is inappropriate in the present context given the frequency of regions with low lights. Some tests indicate that it seriously underestimates standard errors (see Efron, 1982).
Our preferred approach is to use a bootstrap procedure to estimate the precision of the weighting parameter, $\hat{\theta}^{*}$. The bootstrap procedure does not rely on assumptions of normality for the errors (see Efron and Tibshirani, 1986, as well as the survey in Davison et al., 2003). In the present situation, the estimates involve multiple steps. An additional difficulty is that there may be truncation errors at low levels of light (because these are erroneously recorded as zero lights).
In the following analysis, we first draw a bootstrap resample of the data with replacement, where the size of the resample equals the size of the original data set. We then calculate the distribution of the statistic of interest over multiple replications. We treat the observations as random draws in our calculations, although a fixed-regressor interpretation might be more natural in this context. We do not perform formal statistical tests because we are primarily concerned with overall reliability, which is best conveyed by dispersions and boxplots, shown below.
The precision of the estimate of θ in Equation (2.7) depends upon three parameters. Two of them (the combined lights error variance, $\sigma^{2}_{\eta} + \sigma^{2}_{\nu}$, and the coefficient β) come from the regression analyses. The other ($\sigma^{2}_{\varepsilon}$) comes from a priori estimates of the measurement errors in standard output. We discuss each of these in turn.
The parameters for the error variance of lights and the coefficient in the lights–output equation can be estimated by standard bootstrap techniques. Applying bootstrap procedures to the regression equation generates a set of replications of the combined lights error variance, $\hat{\sigma}^{2}_{\eta+\nu,m}$, and of the coefficient, $\tilde{\beta}_{m}$, where the subscript m indicates the m-th bootstrap replication. These will provide consistent estimates of the errors of these parameters.
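A pairs (case) bootstrap of the estimated optimal weight, matching the random-observations interpretation described above, might look like the sketch below, assuming NumPy. Each replication resamples (y, m) pairs with replacement, re-estimates the corrected slope, and recomputes $\hat{\theta}^{*}$ with $\sigma^{2}_{\varepsilon}$ held fixed at its prior value. The way the lights error variance is recovered (residual variance of lights on measured output minus $\tilde{\beta}^{2}\sigma^{2}_{\varepsilon}$, since that residual also contains $-\beta\varepsilon$) is our assumption, not necessarily the authors' procedure, and the function names are hypothetical.

```python
import numpy as np


def theta_hat(y, m, sigma2_eps):
    """Estimated optimal weight on lights (hypothetical sketch).

    Assumes y = y* + eps and m = alpha + beta*y* + (lights error), with
    independent errors and a known prior variance sigma2_eps for eps.
    """
    var_y = y.var()
    var_true = var_y - sigma2_eps
    if var_true <= 0:
        raise ValueError("sigma2_eps must be smaller than the variance of y")
    beta_hat = np.cov(y, m, bias=True)[0, 1] / var_y
    beta_tilde = beta_hat * var_y / var_true  # errors-in-variables correction
    alpha_tilde = m.mean() - beta_tilde * y.mean()
    resid = m - alpha_tilde - beta_tilde * y
    # Residual variance = lights error variance + beta^2 * sigma2_eps
    sigma2_lights = max(resid.var() - beta_tilde**2 * sigma2_eps, 0.0)
    return sigma2_eps / (sigma2_eps + sigma2_lights / beta_tilde**2)


def bootstrap_theta(y, m, sigma2_eps, n_reps=1000, seed=0):
    """Pairs bootstrap: resample (y, m) observations with replacement."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    m = np.asarray(m, dtype=float)
    n = len(y)
    draws = np.empty(n_reps)
    for r in range(n_reps):
        idx = rng.integers(0, n, size=n)  # resample of the same size as the data
        draws[r] = theta_hat(y[idx], m[idx], sigma2_eps)
    return draws


if __name__ == "__main__":
    # Simulated data with sigma2_eps = 0.04, lights error variance 0.16, beta = 1
    rng = np.random.default_rng(1)
    n = 5000
    y_true = rng.normal(size=n)
    y = y_true + rng.normal(scale=0.2, size=n)
    m = 0.5 + y_true + rng.normal(scale=0.4, size=n)
    draws = bootstrap_theta(y, m, 0.04, n_reps=500)
    print(theta_hat(y, m, 0.04), np.percentile(draws, [2.5, 50, 97.5]))
```

The spread of `draws` (summarized by percentiles or a boxplot) reflects sampling uncertainty in $\tilde{\beta}$ and the lights error variance only; uncertainty about $\sigma^{2}_{\varepsilon}$ itself has to be handled separately, for example by repeating the exercise over a range of prior values.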
In general, estimates of GDP and other national accounting measures do not have associated statistical errors. Unlike other data series (such as the unemployment rate), GDP is not a statistically based estimate but is built up from multiple sources of data and several ad hoc procedures. This issue is addressed at length in the next section.
3. Errors of measurement of output
3.1. Overview
The thorniest issue in estimating the reliability of the optimal weights is determining the errors of standard national accounting measures. We begin with a discussion of ‘measurement error’ in this area (see Fixler (2009) for a useful discussion). It is general practice for statistical offices deriving national economic accounts not to provide estimates of the measurement error. Instead, accountants generally discuss ‘reliability’, which is the inverse of measurement error (Fixler and Grimm, 2008). Fixler and Grimm note that ‘total measurement error … in the [national income and product accounts] is never observed’. The major focus in estimating reliability is on the size of revisions, which are one component of total measurement error and will generally be smaller than the total.
We define measurement error for standard national economic output estimates as the error of estimate of output or output growth relative to an ideal measure of national output. For our purpose, we define the ‘ideal’ measure of output as the one corresponding to the definition of national output in the System of National Accounts (1993).
Fixler (2009) described measurement error as arising from the following sources: sampling error, response error, non-response error, coverage error, processing error, improperly designed source data and non-statistically related errors. In our framework, the last of these (non-statistically related errors) is likely to be the most important. Non-statistical errors include imputations, conceptual differences, index construction, sectoral definitions, and the scope of the exclusions (such as home production, subsistence farming, illegal activity and smuggling).
We can distinguish two different kinds of errors. The first are time-series errors. Measures of output growth generally keep the conceptual basis of the measures as well as the data sources constant over time (at least for short periods). Time-series or growth-rate errors will arise primarily from errors in the source data and errors in aggregation. Moreover, since there are two or three alternative methods of constructing national output (e.g. income and expenditures), we can examine the statistical discrepancy to make an initial estimate of the size of the measurement error.
A second kind of measurement error consists of cross-sectional level or density errors. These apply to comparisons of the level of output among countries or regions, such as a comparison of per capita output in the USA and Mali. Cross-sectional errors encompass a broader set of concerns than time-series errors. They include most of the sources mentioned above. In addition, they reflect differences in source data, concepts and price measurement by country, as well as errors in measuring the effective exchange rates among different currencies. We would therefore expect the cross-sectional errors to be larger than the time-series errors.
The present study examines both country output data and grid-cell output data. We will therefore need to consider errors in both countries and grid-cells, as well as time-series and cross-sectional (density) estimates. The issues involved in estimating the measurement errors are extensively discussed in a background article (Nordhaus and Chen, 2012). The present article summarizes the results, and the reader is referred to that discussion for the full analysis and references.
3.2. Errors in national-level data
3.2.1 Time-series errors: general
Estimation of errors in standard national accounts is a vast and largely uncharted enterprise. The US Bureau of Economic Analysis (BEA), which produces the accounts, has devoted considerable attention to reliability issues (see Fixler and Grimm, 2008 and the many references therein). Our review concludes that the lower bound for the measurement error of the growth rate of real output is 0.3 percentage points per year for the USA. We apply this number to other countries with high-quality statistical systems.
3.2.2 Time-series errors from index-number differences
One of the methodological differences among countries involves the index-number techniques used in determining growth rates. Most high-quality systems currently use superlative techniques (such as Fisher’s Ideal index), while other countries (such as China) continue to use Laspeyres indexes. BEA’s calculations indicate that for the USA, the average error due to using Laspeyres rather than Fisher indexes is around 0.3 percentage points per year. Larger biases would be expected in countries with particularly rapid structural change.
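To illustrate the mechanism (this is not BEA's calculation), the sketch below computes Laspeyres, Paasche and Fisher quantity indexes for a hypothetical two-good economy in which one good grows rapidly while its relative price falls. The prices, quantities and function names are invented for illustration. Because the Laspeyres index values current quantities at base-period prices, it gives the fast-growing, cheapening good more weight than the Fisher index does, so it reports higher growth.

```python
from math import sqrt


def laspeyres_quantity(p0, q0, q1):
    """Quantity index with base-period prices p0 as fixed weights."""
    return sum(p * q for p, q in zip(p0, q1)) / sum(p * q for p, q in zip(p0, q0))


def paasche_quantity(p1, q0, q1):
    """Quantity index with current-period prices p1 as weights."""
    return sum(p * q for p, q in zip(p1, q1)) / sum(p * q for p, q in zip(p1, q0))


def fisher_quantity(p0, p1, q0, q1):
    """Fisher ideal index: geometric mean of the Laspeyres and Paasche indexes."""
    return sqrt(laspeyres_quantity(p0, q0, q1) * paasche_quantity(p1, q0, q1))


# Hypothetical two-good economy: good 2 grows quickly while its price halves.
p0, p1 = [1.0, 1.0], [1.0, 0.5]
q0, q1 = [100.0, 10.0], [102.0, 30.0]

lasp = laspeyres_quantity(p0, q0, q1)
fish = fisher_quantity(p0, p1, q0, q1)
print(f"Laspeyres growth: {100 * (lasp - 1):.2f}%")
print(f"Fisher growth:    {100 * (fish - 1):.2f}%")
print(f"Gap:              {100 * (lasp - fish):.2f} percentage points")
```

The gap in this toy case is exaggerated by the extreme price change; it is meant only to show the direction of the bias when structural change is rapid.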
3.2.3 Cross-sectional errors from revisions and methodological differences
A second set of estimates concerns the level of GDP (or GDP density per square km when used in conjunction with lights). For the USA, we calculate a measurement error of 0.35% of GDP, and this is used as a lower bound for high-quality statistical systems.
3.2.4 Cross-section differences from exchange-rate calculations
A particularly difficult issue in country comparisons is the conversion from national currencies into a common currency. Common practice today is to use PPP, or purchasing-power parity, exchange rates rather than market exchange rates. While there is (in our view) no question about the conceptual appropriateness of PPP measures, the practice of calculating them has proven extremely difficult and, in some cases, such as the appropriate multilateral weights, unresolved. There is a vast literature on the subject. For a recent review, see the article by Deaton and Heston (2010).
A system of country grades from A to D was introduced by Summers and Heston (1991). These grades are associated with judgmental margins of error (defined as the root mean squared error). Table 1 shows the margins of error as defined in the original Summers-Heston study (1991). This grading system has been adopted in the current Penn World Table estimates of national output. Very few countries receive grade A, and a substantial number receive grade C or D. Grade A countries would be represented by countries such as the USA. Note that the margin of error is much greater than the average statistical discrepancy. We adopt the margins of error in Table 1 for our estimates of the cross-sectional errors for countries. We note that reliability estimates have also been provided by the International Monetary Fund (the Data Quality Assessment Framework), but these estimates include a variety of characteristics not closely related to measurement errors, so we have relied upon the Summers-Heston estimates with some of our own judgmental adjustments.
Cross-sectional errors estimated by Summers and Heston (1991)
| Country grade | Estimated margin of error in PPP cross section: error in level (%) |
|---|---|
| A | 9 |
| B | 15 |
| C | 21 |
| D | 30 |
Estimate of the measurement error of national output. In all cases, these are interpreted as the standard deviation of the measurement error. Cross-section errors refer to the error in measuring the level of output.
3.3. Grid-cell estimates
In our estimates below, we use grid-cell as well as country output estimates. The grid-cell output data have larger errors than the national data, but they are an important source because their resolution is much higher than that of the country data. We have about 20,000 non-zero grid cells, compared with somewhat fewer than 200 countries.
The trade-off is that grid-cell errors are more difficult to estimate, because work on estimating them is still in its infancy. We have used grid-cell output data based on the GEcon database. We treat the national level and growth error estimates as lower bounds for our grid-cell estimates. The main approach available for estimating potential errors is similar to that used above: to examine changes in the estimates of PPP output levels for individual grid cells across revisions of the GEcon data sets. The revisions have added considerable accuracy by using improved maps, better population estimates, and improved imputations. In addition, the GEcon estimates have added output estimates for grade E countries for which data are not generally available, such as Somalia and Afghanistan.
We have examined the revisions in the GEcon estimates across versions, from the first published version (GEcon 1.3, from 2005) to GEcon 3.4 in 2011. Our calculations indicate that, when aggregated by country group, the data revisions were approximately twice the estimated errors of the national accounts data. We take the magnitude of the revisions to be a first approximation to the remaining measurement errors in the cell data. These estimates, like the error estimates for the national data, contain large elements of judgment. Table 2 summarizes our estimates of the output-measurement errors for different countries and concepts.
Estimates of errors of national and gridded GDP data used in estimates of combined measures of output
| Country grade | Country output: 1-year growth rate (%) | Country output: output level (%) | Grid-cell output: 1-year growth rate (%) | Grid-cell output: output level (%) |
|---|---|---|---|---|
| A | 0.6 | 10 | 1.2 | 20 |
| B | 0.8 | 15 | 1.6 | 30 |
| C | 3.0 | 20 | 4.0 | 40 |
| D | 5.0 | 30 | 5.0 | 60 |
| E | 6.0 | 50 | 8.0 | 100 |
The units are percentage points of the natural logarithm.
Source: Authors. For details, see Nordhaus and Chen (2012).
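The revision-based approach described in Section 3.3 can be sketched as a simple calculation: take the log difference between two vintages of cell output and summarize it by country grade. The sketch below assumes the root-mean-square log revision as the summary statistic; the article does not specify its exact statistic, and the cell values and grades are hypothetical, not GEcon data.

```python
from collections import defaultdict
from math import log, sqrt


def rms_log_revision(old, new, grades):
    """Return {grade: RMS of ln(new) - ln(old)} over cells with positive output."""
    sq_sums = defaultdict(float)
    counts = defaultdict(int)
    for y_old, y_new, g in zip(old, new, grades):
        if y_old > 0 and y_new > 0:  # logs require strictly positive values
            d = log(y_new) - log(y_old)
            sq_sums[g] += d * d
            counts[g] += 1
    return {g: sqrt(sq_sums[g] / counts[g]) for g in counts}


# Hypothetical output (billions of PPP dollars) for six cells in two vintages.
old_vintage = [10.0, 12.0, 8.0, 3.0, 2.5, 4.0]
new_vintage = [10.5, 11.5, 8.2, 4.0, 2.0, 3.1]
cell_grades = ["A", "A", "A", "D", "D", "D"]

revisions = rms_log_revision(old_vintage, new_vintage, cell_grades)
for grade, rms in sorted(revisions.items()):
    print(f"grade {grade}: RMS log revision = {rms:.3f}")
```

Cells with zero or missing output are skipped because their logarithm is undefined, which mirrors the exclusion of zero values noted for Table 3.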
4. Methods and results
Chen and Nordhaus (2011) provided preliminary estimates of the optimal weighting fractions of conventional GDP measures and lights-based measures. We now provide estimates of the weighting fraction based on revised data along with the estimated error for the optimal weights for each of the different approaches (time series and cross section, country and grid-cell, for each country grade). In the present study, we define the time series as the annual average growth rate for country or cell from 1992–1993 to 2009–2010. Table 3 lists the numbers of country and cell observations and representative countries for each country grade used in this study. A complete list of countries and grades is available in an online appendix.
Distribution of countries and cells without missing values by grade
| Grade level | Number of countries | Number of cells | Representative country |
|---|---|---|---|
| A | 16 | 2810 | Australia, Canada, USA |
| B | 13 | 853 | Argentina, Germany, Spain |
| C | 101 | 6170 | Bangladesh, Egypt, Mexico, Russia |
| D | 29 | 761 | Algeria, Cambodia, D.R. Congo, Libya |
| E | 8 | 240 | Iraq, Myanmar, North Korea, West Bank and Gaza |
| Total | 167 | 10,834 | |
Cells are at 1° × 1° resolution. The sample of cells used in the growth-rate analysis includes all available observations after merging the GEcon dataset (4.0) with the DMSP-OLS Nighttime Stable Lights Time Series (Version 4) and taking logarithms of both variables. Note that these data exclude observations with zero values for lights (see text).
Source: Authors.
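The time-series definition used in this section (the annual average growth rate from 1992–1993 to 2009–2010) can be made concrete with a short calculation. The sketch below assumes that each endpoint is the average of its two years and that growth is the log difference divided by the 17 years separating the midpoints of the two windows (1992.5 to 2009.5); the article does not spell out its exact averaging convention, so treat this as one plausible reading. The output values are hypothetical.

```python
from math import log


def average_annual_log_growth(start_years, end_years, span_years=17.0):
    """Average annual growth (log units) between two multi-year averages.

    span_years is the distance between the midpoints of the two averaging
    windows; 1992.5 to 2009.5 gives 17 years.
    """
    start = sum(start_years) / len(start_years)
    end = sum(end_years) / len(end_years)
    return (log(end) - log(start)) / span_years


# Hypothetical cell output for 1992 and 1993, then for 2009 and 2010.
g = average_annual_log_growth([100.0, 104.0], [190.0, 198.0])
print(f"average annual growth: {100 * g:.2f}% per year (log approximation)")
```

The same function applies to the lights series, since both variables enter the growth regressions in logarithms.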
Our formal analysis is in two steps. The first step is a standard bootstrap analysis of the standard error of the estimated parameters. This first step can provide estimates for all parameters except the errors of measurement of standard national-account output measures. For the standard output measures, we do not have a statistical method to generate errors; for the second step we use sensitivity analysis.
4.1. Bootstrap analysis: background
We are concerned with the precision of the point estimates of the optimal weight on lights (θ) provided in Chen and Nordhaus (2011). The value of θ is determined by three parameters (β, the mean squared residual of the regression in Equation (2.3), and the measurement error of standard output), as shown in Equation (2.7). Therefore, the reliability of the optimal weight depends on the reliability of these parameters.
Because the procedure contains multiple steps and assumptions, we cannot estimate the precision using standard techniques. As a first step, we therefore use bootstrap techniques to determine the precision of the parameters that are statistically derived (Sections 4.2 and 4.3), and as a second step we use sensitivity analysis for the measurement error of standard output (Section 4.4).
In the exposition that follows, we simplify the notation by substituting
respectively. These short-hand expressions are used to keep the text intelligible, but readers are reminded of the formal definitions provided in the first part of the article.
4.2. Bootstrap analysis: parameters
In the first step, we apply standard bootstrap techniques (Freeman, 1981; Efron and Tibshirani, 1986) to estimate the uncertainty of β and of the mean squared residual of the regression, and ultimately of θ, caused by sampling error in the regression model. We first generate a set of estimates of β and the mean squared residual by resampling the data with replacement for Equation (2.3), and then combine the bootstrap estimates with the baseline estimates of the measurement error of standard output from Table 2 to calculate the corrected regression coefficient β. Putting together the error-corrected parameters (β and the mean squared residual) and the baseline output measurement error, we finally calculate the optimal θ with Equation (2.7). For all calculations, we set the number of bootstrap replications at N = 1000.
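The resampling step can be sketched as a pairs bootstrap: draw observations with replacement, re-run the regression, and record the slope and mean squared residual each time. This is a minimal illustration under stated assumptions: it uses simulated data, a plain bivariate OLS regression standing in for Equation (2.3), and divides the sum of squared residuals by n; it omits the measurement-error correction and the computation of θ. The function names are hypothetical.

```python
import random
import statistics


def ols(x, y):
    """Bivariate OLS of y on x; returns (slope, mean squared residual)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return slope, statistics.fmean([r * r for r in residuals])


def bootstrap_ols(x, y, n_reps=1000, seed=0):
    """Pairs bootstrap: resample (x_i, y_i) rows with replacement n_reps times."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_reps):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(ols([x[i] for i in idx], [y[i] for i in idx]))
    return draws


# Simulated data: x plays the role of log output, y of log lights.
gen = random.Random(42)
x = [gen.gauss(0.0, 1.0) for _ in range(200)]
y = [0.9 * xi + gen.gauss(0.0, 0.5) for xi in x]

draws = bootstrap_ols(x, y, n_reps=1000)
slopes = [b for b, _ in draws]
q1, _, q3 = statistics.quantiles(slopes, n=4)
print(f"beta: mean={statistics.fmean(slopes):.3f}, "
      f"SD={statistics.stdev(slopes):.3f}, IQR={q3 - q1:.3f}")
```

The mean, SD and IQR printed at the end are the same three summary statistics reported in Tables 4a–4d, here computed for simulated data only.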
We present the results for the bootstrapped estimates in both tables and box plot figures by country grade. The statistics of the 1000 replications of the regression coefficient (β) and the root mean squared error of the regression for country data are shown in Tables 4a and 4b, while those for cell data are shown in Tables 4c and 4d.
Coefficient estimates from bootstrap analysis as described in text: Results for country cross-sectional analysis
| Grade | β: Mean | β: SD | β: IQR | Mean squared residual: Mean | Mean squared residual: SD | Mean squared residual: IQR |
|---|---|---|---|---|---|---|
| Grade A | 0.779 | 0.021 | 0.029 | 0.511 | 0.051 | 0.077 |
| Grade B | 0.733 | 0.014 | 0.020 | 0.530 | 0.024 | 0.033 |
| Grade C | 0.980 | 0.010 | 0.013 | 0.719 | 0.010 | 0.014 |
| Grade D | 0.953 | 0.013 | 0.017 | 0.796 | 0.017 | 0.023 |
| Grade E | 1.311 | 0.042 | 0.058 | 0.642 | 0.029 | 0.040 |
| All grades | 0.953 | 0.006 | 0.008 | 0.733 | 0.008 | 0.010 |
Coefficient estimates from bootstrap analysis as described in text: Results for country time series analysis
| Grade | β: Mean | β: SD | β: IQR | Mean squared residual: Mean | Mean squared residual: SD | Mean squared residual: IQR |
|---|---|---|---|---|---|---|
| Grade A | 0.424 | 0.427 | 0.395 | 0.236 | 0.036 | 0.050 |
| Grade B | –0.486 | 0.797 | 1.319 | 0.328 | 0.113 | 0.194 |
| Grade C | 0.014 | 0.320 | 0.456 | 0.545 | 0.075 | 0.101 |
| Grade D | 1.253 | 0.273 | 0.373 | 0.411 | 0.057 | 0.075 |
| Grade E | 0.001 | 0.637 | 0.669 | 0.366 | 0.106 | 0.151 |
| All grades | 0.322 | 0.267 | 0.398 | 0.526 | 0.059 | 0.081 |
Coefficient estimates from bootstrap analysis as described in text: Results for cell cross-sectional analysis
| Grade | β: Mean | β: SD | β: IQR | Mean squared residual: Mean | Mean squared residual: SD | Mean squared residual: IQR |
|---|---|---|---|---|---|---|
| Grade A | 0.764 | 0.002 | 0.003 | 1.286 | 0.006 | 0.008 |
| Grade B | 0.908 | 0.003 | 0.004 | 1.184 | 0.008 | 0.010 |
| Grade C | 0.887 | 0.002 | 0.002 | 1.583 | 0.003 | 0.004 |
| Grade D | 0.940 | 0.005 | 0.007 | 1.740 | 0.009 | 0.012 |
| Grade E | 0.718 | 0.013 | 0.017 | 1.928 | 0.012 | 0.017 |
| All grades | 0.870 | 0.001 | 0.002 | 1.591 | 0.003 | 0.004 |
Coefficient estimates from bootstrap analysis as described in text: Results for cell time series analysis
| Grade | β: Mean | β: SD | β: IQR | Mean squared residual: Mean | Mean squared residual: SD | Mean squared residual: IQR |
|---|---|---|---|---|---|---|
| Grade A | 0.667 | 0.090 | 0.123 | 0.778 | 0.026 | 0.037 |
| Grade B | 0.449 | 0.143 | 0.189 | 0.930 | 0.053 | 0.067 |
| Grade C | 0.626 | 0.026 | 0.034 | 1.137 | 0.021 | 0.027 |
| Grade D | 0.705 | 0.148 | 0.200 | 1.312 | 0.057 | 0.077 |
| Grade E | 0.571 | 0.323 | 0.438 | 1.436 | 0.094 | 0.126 |
| All grades | 0.703 | 0.023 | 0.032 | 1.084 | 0.015 | 0.022 |
Note: The statistics in Tables 4a–4d are the mean, standard deviation (SD), and interquartile range (IQR) of the bootstrapped regression results for β and the root mean squared error, by grade. These use 1000 replications for each analysis.
The results show that the β parameter is reliably estimated in the cross sections for both countries and cells. By contrast, the time-series estimates of β are much less reliable, particularly for countries (see Tables 4a and 4b for the country data analysis). Tables 4c and 4d show similar results for cells. Comparing all specifications (cells versus countries, and cross sections versus time series), we find that the standard deviation and the interquartile range (IQR) of the β estimate are largest for the country time-series data (Table 4b).
The results are more easily visualized in box plot figures, which are a convenient method for displaying the dispersion of an estimate. Figures 2 and 3 present the distribution of the β parameters in box plots for country and cell data, respectively. The upper and lower edges of the box indicate the values at the 75th and 25th percentiles, and the difference is the IQR. The upper hash mark indicates the 3rd quartile plus 1.5 IQR, and the lower hash mark indicates the 1st quartile minus 1.5 IQR.
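The box-plot quantities described above can be computed directly from a set of bootstrap draws. This sketch follows the text's definition, placing the hash marks at Q3 + 1.5 IQR and Q1 − 1.5 IQR; many plotting packages instead draw whiskers to the most extreme observation inside those limits, and quartile conventions differ across software (here the default 'exclusive' method of Python's `statistics.quantiles`). The sample values are hypothetical.

```python
import statistics


def box_plot_stats(values):
    """Median, quartiles, IQR and the 1.5-IQR hash marks described in the text."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "upper_mark": q3 + 1.5 * iqr,
        "lower_mark": q1 - 1.5 * iqr,
    }


# Usage on a small set of hypothetical bootstrap draws.
stats = box_plot_stats([0.61, 0.64, 0.66, 0.67, 0.68, 0.70, 0.73, 0.75, 0.92])
for name, value in stats.items():
    print(f"{name:>10}: {value:.3f}")
```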
Box plot for bootstrapped regression coefficient β for each grade for country data.
The figure shows the estimated elasticity of lights with respect to true output. The middle line in the box is the median value, and the upper and lower edges of the box indicate the values at the 75th and 25th percentiles. The upper hash mark indicates the 3rd quartile plus 1.5 IQR (interquartile range). The lower hash mark indicates the 1st quartile minus 1.5 IQR. The different boxes are different country grades as indicated.
Box plot for bootstrapped regression coefficient β for each grade for grid-cell data.
See Figure 2 for the explanation for the box plot and legend.
These figures clearly show that the box heights (the IQR) are much larger for time series (the bottom panels in Figures 2 and 3) than for cross sections (the top panels in Figures 2 and 3). The same point holds for the hash marks, and we observe similar patterns in both the country and cell figures. The difference between cross sections and time series is not surprising, given the vast differences in the cross-sectional levels of output across regions compared with the relatively limited differences in growth rates among regions.
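The intuition can be stated with the textbook standard error of a bivariate OLS slope, a general result rather than an equation from this article. For a regression of y_i on x_i over n observations with residual standard deviation σ_u:

```latex
\operatorname{se}(\hat{\beta})
  = \frac{\hat{\sigma}_u}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
  = \frac{\hat{\sigma}_u}{\sqrt{n}\, s_x},
\qquad
s_x^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 .
```

Holding the residual error and the number of observations fixed, the standard error falls in proportion to the spread s_x of the regressor. Output levels differ enormously across countries and cells, while average growth rates differ comparatively little, so s_x is far smaller in the time-series specification and the slope is correspondingly less precisely estimated.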
Next, we examine the bootstrapped results for the root mean squared error (RMSE), shown in columns 5–7 of Tables 4a–4d and in the box plots of Figures 4 and 5. Again, the results are based on 1000 replications of the regressions for countries and cells for each grade, and for all countries and all cells. The results here are similar to those for the bootstrapped β: the RMSE is estimated more reliably for cross sections than for time series, particularly for the cell data. The only exception is the grade A countries.
Box plot for bootstrapped root mean squared error (RMSE) for each grade for country data.
See Figure 2 for the explanation for the box plot and legend.
Box plot for bootstrapped root mean squared error (RMSE) for each grade for cell data.
See Figure 2 for the explanation for the box plot and legend.
Taking all the results together, we see that the estimates of the parameters are most reliable for cross section estimates for cells, and are least reliable for time series estimates for countries.
4.3. Bootstrap analysis: optimal weights
The final goal of this study is to estimate the precision of the optimal weighting coefficient on the lights-based proxy measure, or θ. Recall that θ = 0 when all the weight is on standard national-accounts measures, and θ = 1 when the entire weight is on the lights-based proxy measures. To calculate θ, we use Equation (2.4) and the baseline measurement error of standard output. For each bootstrap replication, we take the error-adjusted coefficient β and calculate θ from the error-adjusted β, the mean squared error, and the baseline output measurement error. We do this for each country grade and for each specification (time series and cross sections, and country and grid-cell data). Again, N = 1000 for each of the different versions.
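For intuition about what θ measures, consider a deliberately simplified setting that is an assumption of this sketch, not the article's model: the standard measure and the lights-based proxy are both unbiased for true log output, with independent errors of variance σ²_s and σ²_p. The variance-minimizing weight on the proxy is then the textbook inverse-variance weight θ = σ²_s / (σ²_s + σ²_p), which is 0 when the standard measure is error-free and approaches 1 as the proxy becomes error-free. Equation (2.7) also reflects the estimated β and regression residual, so its values will differ; the function names and the proxy error below are hypothetical.

```python
def inverse_variance_theta(var_standard, var_proxy):
    """Weight on the proxy that minimizes the variance of the combination.

    Assumes both measures are unbiased for true log output and that their
    errors are independent: a simplification, not Equation (2.7).
    """
    return var_standard / (var_standard + var_proxy)


def combined_error_variance(theta, var_standard, var_proxy):
    """Error variance of theta * proxy + (1 - theta) * standard."""
    return (1.0 - theta) ** 2 * var_standard + theta ** 2 * var_proxy


# Standard-output error SD of 0.20 log units (the grade C country level error
# in Table 2); the proxy error SD of 0.50 is purely hypothetical.
var_s, var_p = 0.20 ** 2, 0.50 ** 2
theta = inverse_variance_theta(var_s, var_p)
print(f"theta = {theta:.3f}")
for t in (0.0, theta, 1.0):
    v = combined_error_variance(t, var_s, var_p)
    print(f"theta={t:.3f}: combined error variance = {v:.4f}")
```

The loop shows that the interior weight yields a smaller error variance than relying entirely on either measure, which is the sense in which θ is "optimal".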
Tables 5a and 5b present the θ estimates without the bootstrap procedure (the ‘baseline’ value) and the statistics for the bootstrapped θ (mean, standard deviation, and interquartile range) for the country and cell analyses. The baseline values of θ (column 2 in Tables 5a and 5b) are very close to the θ estimates published in our earlier work (Chen and Nordhaus, 2011). We updated the GEcon data in summer 2011, and the present analysis is based on the latest version of the GEcon data. The results are consistent with our previous findings that the luminosity signal adds considerable information for grade D and E countries but very little for grade A, B and C countries. This conclusion holds for both cross sections and time series, and for both the country and cell analyses. We do not discuss the country time-series estimates for grade E countries because the sample is too small.
Parameter and precision estimates for optimal weight from bootstrap analysis as described in text: Results for country data analysis
| Specification | Grade | Baseline θ | Sample distribution of θ: Mean | Sample distribution of θ: SD | Sample distribution of θ: IQR |
|---|---|---|---|---|---|
| Cross sectional | A | 0.022 | 0.023 | 0.004 | 0.007 |
| | B | 0.039 | 0.042 | 0.005 | 0.006 |
| | C | 0.070 | 0.071 | 0.003 | 0.004 |
| | D | 0.118 | 0.121 | 0.006 | 0.008 |
| | E | 0.613 | 0.631 | 0.032 | 0.042 |
| | all | 0.076 | 0.077 | 0.002 | 0.003 |
| Time series | A | 0.003 | 0.005 | 0.009 | 0.005 |
| | B | 0.004 | 0.012 | 0.017 | 0.012 |
| | C | 0.000 | 0.011 | 0.021 | 0.009 |
| | D | 0.419 | 0.490 | 0.179 | 0.252 |
| | E | 0.024 | 0.485 | 0.357 | 0.728 |
| | all | 0.006 | 0.021 | 0.028 | 0.028 |
Parameter and precision estimates for optimal weight from bootstrap analysis as described in text: Results for grid-cell data analysis
| Specification | Grade | Baseline θ | Sample distribution of θ: Mean | Sample distribution of θ: SD | Sample distribution of θ: IQR |
|---|---|---|---|---|---|
| Cross sectional | A | 0.014 | 0.014 | 0.000 | 0.000 |
| | B | 0.052 | 0.052 | 0.001 | 0.001 |
| | C | 0.052 | 0.052 | 0.000 | 0.000 |
| | D | 0.113 | 0.113 | 0.002 | 0.002 |
| | E | 0.241 | 0.242 | 0.008 | 0.011 |
| | all | 0.042 | 0.042 | 0.000 | 0.000 |
| Time series | A | 0.002 | 0.002 | 0.001 | 0.001 |
| | B | 0.001 | 0.002 | 0.001 | 0.001 |
| | C | 0.011 | 0.011 | 0.001 | 0.001 |
| | D | 0.043 | 0.046 | 0.020 | 0.025 |
| | E | 0.081 | 0.516 | 0.247 | 0.368 |
| | all | 0.010 | 0.010 | 0.001 | 0.001 |
Next, we focus on the bootstrapped θ results. Comparing the baseline θ (column 2 in Tables 5a and 5b) with the mean of the bootstrapped θ (column 3 in Tables 5a and 5b), we find that the baseline θ is generally slightly lower than the bootstrapped mean, especially for the country time-series data. For instance, for countries with grade A statistics (Table 5a), the baseline θ for time series is 0.003, while the mean of the θ estimates is 0.005. For grade D countries, the values are 0.419 and 0.490. This underestimation of θ is probably caused by the non-linearity of θ in Equation (2.7): the mean of a non-linear function of noisy parameter estimates generally differs from the function evaluated at the point estimates. For parameters that are relatively well determined, the non-linearity is unimportant, and the bootstrapped mean will be close to the baseline estimate. Table 5b shows that for cell cross sections, the baseline θ and the mean of the θ estimates are identical, or differ by at most 0.001, for all country grades.
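The non-linearity argument can be checked numerically. The sketch below uses a hypothetical functional form, θ(β) = σ²_s β² / (σ²_s β² + σ²_v), chosen only because θ rises non-linearly with β; it is not claimed to be Equation (2.7), and the parameter values are invented. When β is estimated imprecisely, the average of θ over the draws exceeds θ evaluated at the average β, which is the same direction as the gap between the bootstrapped mean and the baseline in Table 5a.

```python
import random
import statistics


def theta_of_beta(beta, var_standard=0.01, var_resid=0.5):
    """Hypothetical weight that, for small beta, rises roughly with beta**2."""
    signal = var_standard * beta ** 2
    return signal / (signal + var_resid)


rng = random.Random(7)
# Imprecise beta estimates: mean 0.4, standard deviation 0.4 (hypothetical).
beta_draws = [rng.gauss(0.4, 0.4) for _ in range(1000)]

theta_at_mean = theta_of_beta(statistics.fmean(beta_draws))
mean_of_theta = statistics.fmean([theta_of_beta(b) for b in beta_draws])
print(f"theta at mean beta:  {theta_at_mean:.4f}")
print(f"mean of theta draws: {mean_of_theta:.4f}")
```

With precisely estimated β (draws tightly clustered), the two numbers coincide, consistent with the near-identical baseline and bootstrapped values for the cell cross sections.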
Tables 5a and 5b show that the distribution of the θ estimator varies by country grade and model specification. By country grade, we find that the standard deviation and the interquartile range (IQR) of the θ estimator are much larger for grade E countries than for other countries. Columns 4 and 5 in Table 5a show that the standard deviation and IQR of θ in the cross section for grade E countries are 0.032 and 0.042, while the highest values for other country grades are 0.006 and 0.008. This result indicates that the θ estimate is least reliable for the countries with the poorest statistical systems. This is a discouraging result, as these are the countries that could benefit most from an independent data source.
In addition to differences across country grades, we also find that the standard deviation and IQR of θ are larger for time-series than for cross-sectional data, especially for grades D and E. The results of the cell analysis (Table 5b) show that for grade D the standard deviation and IQR are 0.020 and 0.025 for time-series data, but only 0.002 and 0.002 for cross sections. Similarly, for grade E, the standard deviation and IQR are 0.247 and 0.368 for time series, but only 0.008 and 0.011 for cross sections. This result is consistent with the β and RMSE estimates shown in Tables 4a–4d and Figures 2–5: the θ estimate is more reliable for cross-sectional than for time-series data. We discuss a graphical presentation of the results in the next section.
4.4. Sensitivity analysis for output measurement errors
The last section used a bootstrap approach for estimating the precision of the optimal weights. In the second step, we test the sensitivity of the estimated optimal weight on the lights-based proxy (θ) to the prior estimate of the measurement error of standard national-accounts output. We have no reliable way to estimate the precision of this measurement error. To address this question, we take values of the measurement errors of output that are one-half and two times the base values estimated above. These seem plausible bounds on the measurement errors, given the procedures used to derive them. However, we cannot place a statistical interpretation on the upper and lower values, and we therefore interpret these calculations as sensitivity analyses.
For these calculations, we perform two more sets of bootstrap analysis. In each, we use the same bootstrap sample for β and the mean squared residual as in the earlier calculations, so that the only difference in the estimated θ comes from the assumed measurement error of standard GDP. Setting this measurement error equal to its baseline value, one-half of it, or double it, we use Equation (2.4) to calculate the adjusted β, and then Equation (2.7) to calculate the new value of θ. Using the doubled baseline measurement error in Equation (2.4) caused a problem in many replications, because the implied measurement-error variance exceeded the variance of measured output. This is theoretically impossible in our specification, because the variance of true output, which equals the difference between the two, cannot be negative (i.e., the measurement-error variance must always be less than the variance of measured output). To deal with this problem, we set the measurement-error variance at 95% of the variance of measured output in the cases where it would otherwise be larger. In the first step of the analysis, we generate one set of β and mean squared residual estimates (N = 1000) by bootstrapping the regression and derive one set of θ (N = 1000) based on the baseline measurement error. In the second step, we generate two additional sets of θ based on the alternative values of the measurement error, with N = 1000 for each set.
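The capping rule can be illustrated with a classical errors-in-variables correction. The sketch assumes, as an illustration rather than a reproduction of Equation (2.4), that the OLS slope is attenuated by the factor (V − σ²_ε)/V, where V is the variance of measured output and σ²_ε the measurement-error variance, so that the corrected slope is β̂·V/(V − σ²_ε). It scales the error standard deviation by one-half, one and two, as in the sensitivity analysis, and applies the 95% cap whenever the error variance is not below V. The function name and all numbers are hypothetical.

```python
def corrected_beta(beta_ols, var_measured, var_error, cap_share=0.95):
    """Attenuation-corrected slope under classical errors in variables.

    If the error variance is not below the variance of measured output,
    the implied variance of true output would be zero or negative, so the
    error variance is reset to cap_share * var_measured.
    """
    if var_error >= var_measured:
        var_error = cap_share * var_measured
    var_true = var_measured - var_error
    return beta_ols * var_measured / var_true


# Hypothetical growth-rate example: measured variance (0.03)**2, base error SD 0.02.
beta_hat, var_y, base_sd = 0.30, 0.03 ** 2, 0.02
for scale in (0.5, 1.0, 2.0):
    var_e = (scale * base_sd) ** 2  # scale the error SD, then square it
    print(f"error SD x{scale}: corrected beta = {corrected_beta(beta_hat, var_y, var_e):.3f}")
```

In this example the doubled error triggers the cap and produces a much larger corrected slope than the baseline, which illustrates why results near the boundary are very sensitive to the assumed measurement error.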
We present three sets of bootstrapped θ estimates from both step one and two in box plots in Figures 6 and 7. Figure 6 presents the results of three sets of bootstrapped θ for each country grade, with the top panel for cross sections and the bottom one for time series. Figure 7 presents the comparable results for the grid cells. These box plots show that the value of estimated θ (the optimal weight on lights-based proxy) is in some cases quite sensitive to the prior estimate of the measurement errors in conventional GDP data.
Box plots for bootstrapped θ estimator for cross sectional (XS) data for countries.
Box plots for bootstrapped θ estimator for time series (TS) data for countries.
Note to Figures 6a and 6b: There are two graphs (Figure 6a for cross-sectional and Figure 6b for time-series analysis). Each graph has six panels (one for each grade, A through E, and one for all observations). Each panel shows three sets of θ. The left-hand box plot shows the distribution of θ using an estimated error of the national accounts estimate of output equal to one-half the base value. The middle box plot shows the estimator using the base value (the results from the first step of the analysis). The right-hand box plot shows the estimator using two times the base value. Each box plot is based on 1000 bootstrapped observations of θ. See Figure 2 for the explanation of the box plot.
Box plots for bootstrapped θ estimator for cross sectional (XS) data for grid cells.
Box plots for bootstrapped θ estimator for time series (TS) data for grid cells.
For explanation, see legend to Figures 6a and 6b.
We can explain the results using the grade C countries in the top panel of Figure 6 (the country cross-sectional analysis) as an example. The three sets of bootstrapped θ are moderately sensitive to the assumed measurement error of standard output. The middle box shows the distribution of θ when we apply the baseline value of that error (Table 5) to Equations (2.4) and (2.7). For the cross-sectional country analysis of grade C countries, our baseline value for the measurement error of the output level in standard output data is 20%. Using this value and the bootstrapped regression results from the last section, we calculate a median θ of around 0.1 with little dispersion, as shown by the middle box for C countries. This indicates that the median optimal weight on the lights-based proxy is 10%, and the median optimal weight on the conventional measure is around 90%.
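To make the weighting interpretation concrete, the sketch below assumes that the improved estimate is a simple convex combination of the two measures, with θ on the lights-based proxy and 1 − θ on the conventional measure. The function name and the input values are hypothetical, and this is an illustration of the weighting idea, not a reproduction of Equations (2.4) and (2.7).

```python
def combined_estimate(theta: float, lights_proxy: float, conventional: float) -> float:
    """Weighted estimate of output: theta on the lights-based proxy and
    (1 - theta) on the conventional national accounts measure."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta * lights_proxy + (1.0 - theta) * conventional


# Hypothetical log output values for one country, using a median theta of about 0.1.
estimate = combined_estimate(0.1, lights_proxy=9.0, conventional=8.0)
print(round(estimate, 2))
```

With θ = 0.1, the combined value moves only one-tenth of the way from the conventional figure toward the lights-based figure, here from 8.0 to 8.1.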
Similarly, the left-hand box plot shows the distribution of θ calculated with one-half the baseline measurement error. The median of the bootstrapped θ for this specification is close to zero, with very little dispersion. Finally, the right-hand box plot shows the distribution of θ calculated with two times the baseline measurement error. The corresponding median value of θ rises to 0.30.
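Why θ rises with the assumed error of standard output can be seen in a stylized case, which is an assumption here and not the paper's estimator: two unbiased estimates of log output with independent errors, combined with inverse-variance weights. The weight on the lights proxy is then θ = σ_y² / (σ_y² + σ_z²), where σ_y is the measurement error of standard output and σ_z is the error of the lights-based proxy (both symbols are introduced only for this sketch). The 20% baseline comes from the text; the proxy error of 0.6 is hypothetical, chosen so that θ = 0.1 at the baseline.

```python
def inverse_variance_theta(sigma_output: float, sigma_proxy: float) -> float:
    """Weight on the lights proxy when two independent, unbiased estimates are
    combined so as to minimize the variance of the combined estimate."""
    var_y = sigma_output ** 2
    var_z = sigma_proxy ** 2
    return var_y / (var_y + var_z)


BASELINE_SIGMA_OUTPUT = 0.20  # 20% baseline error for grade C cross sections
SIGMA_PROXY = 0.60            # hypothetical proxy error, tuned so theta = 0.1 at baseline

thetas = {}
for multiplier in (0.5, 1.0, 2.0):  # one-half, base, and two times the base error
    thetas[multiplier] = inverse_variance_theta(multiplier * BASELINE_SIGMA_OUTPUT, SIGMA_PROXY)
    print(f"{multiplier} x baseline: theta = {thetas[multiplier]:.3f}")
```

Under these assumptions the loop gives θ of roughly 0.03, 0.10 and 0.31. The match at the baseline is by construction; the point of the sketch is only the direction and rough size of the movement when the assumed output error is halved or doubled.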
The box plots in Figures 6 and 7 also confirm our conclusion about the sampling errors of the θ estimator based on Table 5. Comparing time-series and cross-sectional results, we find that the box size (the interquartile range) for cross-sectional output is much smaller than for time-series data, which indicates that the θ estimator is more reliable for cross sections than for time series. The boxes for the grid-cell cross-sectional analysis are the smallest among all specifications, suggesting that the estimate of θ is most reliable for this specification. By contrast, the box plots for θ based on time-series country data are the largest, indicating the lowest reliability.
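The comparison above reads reliability off the box size, that is, the interquartile range of the bootstrapped θ. A minimal sketch of that summary, using only the Python standard library, is a generic nonparametric bootstrap (resampling observations with replacement) followed by the quartiles a box plot draws. The statistic `theta_from_residuals`, the assumed 20% output error, and the simulated samples are all hypothetical placeholders; only the 1000 replications follow the figure notes.

```python
import random
import statistics
from typing import Callable, Sequence


def bootstrap(data: Sequence[float],
              statistic: Callable[[Sequence[float]], float],
              n_boot: int = 1000,
              seed: int = 0) -> list[float]:
    """Recompute `statistic` on n_boot resamples drawn with replacement from `data`."""
    rng = random.Random(seed)
    return [statistic(rng.choices(data, k=len(data))) for _ in range(n_boot)]


def box_summary(draws: Sequence[float]) -> dict[str, float]:
    """Lower quartile, median, upper quartile and IQR (the box size)."""
    q1, median, q3 = statistics.quantiles(draws, n=4)
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


SIGMA_OUTPUT = 0.20  # assumed error of standard output (hypothetical)


def theta_from_residuals(residuals: Sequence[float]) -> float:
    """Hypothetical statistic: treat the sample variance of proxy residuals as the
    proxy error variance and return an inverse-variance weight on the proxy."""
    n = len(residuals)
    mean = sum(residuals) / n
    var_proxy = sum((r - mean) ** 2 for r in residuals) / (n - 1)
    return SIGMA_OUTPUT ** 2 / (SIGMA_OUTPUT ** 2 + var_proxy)


gen = random.Random(1)
small = [gen.gauss(0.0, 0.6) for _ in range(40)]   # arbitrary simulated samples
large = [gen.gauss(0.0, 0.6) for _ in range(400)]

summaries = {}
for label, sample in (("n=40", small), ("n=400", large)):
    summaries[label] = box_summary(bootstrap(sample, theta_from_residuals))
    print(label, {k: round(v, 3) for k, v in summaries[label].items()})
```

With these made-up inputs the larger sample produces the narrower box, which is the sense in which a smaller interquartile range signals a more precisely estimated θ.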
Finally, the only case in which θ is not sensitive to the assumed measurement error of standard output is the time-series grid-cell analysis for the highest-grade countries, particularly grade A. The box plot for this analysis in Figure 7 (the bottom panel) shows that changing the baseline measurement error leads to very little change in θ. The precision of the output estimates is sufficiently high that even doubling the measurement error in conventional output leaves the weight on the lights-based proxy essentially unchanged for these countries.
5. Conclusions
The purpose of the present article is to determine the value of nighttime lights for measuring output. More specifically, we present new estimates of the optimal weights on conventional GDP and nighttime lights data for estimating ‘true output’ in countries and grid cells over the period 1992–2010, and we examine the precision of those estimates.
Our major findings and recommendations are as follows. First, for grade A and B countries, there is no reason to use luminosity data as a supplement to standard data in any context where standard data are available. We found virtually no value added in these countries for either country or cell aggregates for either time series or cross sections. The lights data are not useful for A and B countries because the standard data are sufficiently reliable. These results are reasonably robust to statistical and sensitivity tests.
Second, we find that there is at present no advantage to using lights data for time-series corrections for countries or grid cells in any country where standard data are available. Again, for A and B countries, there is no value added in the time-series lights data. For D and E and most C countries, the uncertainties in the estimates of the weights are currently too large to allow their use in constructing reliable time-series estimates based on lights. We conclude that without further refinement of the lights data (for example, a careful intercalibration of the data over years and satellites, or improved quality of lights data), lights data are not a reliable proxy for time-series measures of output growth. The one possible exception is that lights data may be useful for grade C countries, but this would require further refinement.
This finding is contrary to the results of Henderson et al. (2012), who estimate the optimal weight for time-series (growth) estimates. Their findings agree with the present study for countries with high data quality, but they calculate that lights data have substantial value added for time-series estimates for 120 low- and middle-income countries. While some of our estimates indicate that there may be value added from the lights data for these countries, our bootstrap results indicate that these estimates have low reliability. Henderson et al. did not provide estimates of reliability with their findings, so we do not know whether the difference in the findings is because of sampling error, differences in the sample of countries, or differences in methodology. In any case, we find that for time-series analysis, the contribution of lights data is either unreliable or very small for middle- and low-income countries.
Third, for D and E countries and for cross-sectional estimates of output, the estimates suggest that there may be substantial information in the lights data. Our results indicate that there is substantial promise in using lights data for estimating output per person (the density estimates) for countries with low-quality data. Moreover, the cross-sectional errors in estimating the optimal weights for D and E countries come primarily from uncertainty about the error in the standard output data and not from the measurement errors for lights or in the lights-output coefficient. Therefore, if the measurement errors in the cross section could be more precisely determined, there would be substantial information in the lights data that could be used to supplement current estimates of the level of output for both countries and grid cells.
Fourth, we emphasize that those areas with the lowest incomes have lights data equal to zero (because of data filtering), and efforts should be made to improve the quality and resolution of the lights data for low-lights regions. We noted above that almost one-third of grid cells with positive population and output were recorded with zero lights. While these grid cells contain only a small fraction of output and population, they cover a large part of the land area of the globe.
Fifth, given these results, we recommend that future work be concentrated on integrating luminosity data into the cross-sectional estimates of national and regional output for D and E countries. The main open issues in integrating lights with economic output in these cases involve estimating the reliability of national accounts data and not the reliability of the nighttime lights data.
Sixth, we emphasize that there are several open questions about the use of luminosity as a proxy. The ones we particularly flag for future research are the following: the system for estimating the optimal weights is underidentified, and different approaches give different answers for countries with low data quality; there are large areas of the world for which the lights data have insufficient resolution and are set at zero in the stable lights data set; and the lights data have not been intercalibrated across years and satellites. Solutions to each of these open issues may sharpen the answer as to where lights will be a useful proxy or adjunct for standard economic data.
Seventh, we caution that the results of this and most other studies to date rely on a data set for lights that has relatively low resolution. As we noted above, a substantial fraction of cells with positive population are recorded as having zero value for stable lights. There is also substantial variation in measured light values for the same cells and years across satellites. With the development of better optics, computers, storage, processing and communications, it seems likely that the quality of the lights data will improve substantially in the coming years. This suggests that the negative results in the present study may be overcome as the quality of the lights data improves.
Finally, the major concerns about the use of lights as a proxy involve uncertainties about the precision of standard national accounts data. Our procedures provide statistically based estimates of the uncertainties arising from errors in lights and in the lights-output relationship. However, because we do not have statistical techniques to estimate the reliability of standard national and regional output data, particularly on a cross-sectional basis, we cannot judge the degree of imprecision coming from this source. Our sensitivity analysis suggests that this source of uncertainty is likely to dominate the overall imprecision in the optimal weighting fraction between lights and conventional output. This conclusion reminds us of the admonition of Josiah Stamp (1929), ‘The Government are very keen on amassing statistics - they collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But what you must never forget is that every one of the figures comes in the first instance from the village watchman, who just puts down what he damn pleases’.
Supplementary material
Supplementary data for this article are available at Journal of Economic Geography online.
Acknowledgements
We are grateful to many researchers who have helped develop the present work, including the editor and anonymous referees. We particularly would like to thank participants in the National Bureau of Economic Research–Conference on Research in Income and Wealth Workshop and the Yale Environmental Economics Workshop for comments. We are grateful for comments on the work by Ben Jones, Dale Jorgenson and Steve Landefeld as well as for comments on the statistical approach from Zhipeng Liao. We have benefited from extensive advice from Chris Elvidge and his team and thank them particularly for their work in developing data on nighttime lights and making them available to the scientific community.
Funding
Research on the G-Econ database was funded by the Glaser Foundation, the U.S. Department of Energy (grant number 26488350-49105-A), and the U.S. National Science Foundation (grant no 4756-YU-NSF-0507).