Abstract

Concerns about bioterrorism and influenza have focused attention on identifying novel data sources to enhance public health surveillance. The authors evaluated free Pittsburgh Post-Gazette Internet death notices for Allegheny County, Pennsylvania, as a potentially timely source of mortality data. Data abstracted from Internet death notices for 1998–2001 were compared with mortality records from the Pennsylvania Department of Health. Approximately 74% (44,294/60,281) of state records had death notices, and 91% (44,294/48,651) of death notices corresponded to a state record. There was a 2-day median lag from the date of death to online death notice publication. The date of death, gender, age, and name data were nearly 90% accurate and 60–100% complete. Increasing education and age were independently associated with increased Pittsburgh Post-Gazette reporting. Being non-White, female, or a nursing home resident was independently associated with decreased reporting. The Pittsburgh Post-Gazette Internet death notices provided accurate, timely mortality data for nearly three fourths of all Allegheny County deaths.

Mortality surveillance has been an important epidemiologic tool since William Farr introduced the concept during the 1847–1848 influenza epidemic in London (1). Findings are now used to characterize epidemics, quantify the severity of influenza seasons, initiate investigations, better describe causes of mortality, efficiently allocate public health resources, and evaluate program efficacy (2). However, in the United States, existing mortality surveillance lacks timely, local epidemic detection capabilities, highlighting the need for new data sources to track the public health impact of diseases that cause excess mortality, such as influenza.

In the United States, death certificate collection and analysis currently provide the only means of monitoring trends in mortality, and they require months for states to compile manually. The Office of Management and Budget is implementing the E-Vital program to expand the use of electronic death registration (EDR) systems by states. However, the program is funded on behalf of the Social Security Administration, not the Department of Health and Human Services. The data collected via the EDR systems are not currently incorporated into mortality-monitoring programs conducted by the Centers for Disease Control and Prevention (CDC). EDR implementations are designed primarily to prevent fraudulent Social Security Administration benefit payments to deceased beneficiaries, not for public health surveillance.

Moreover, the extension of EDR systems to national coverage is uncertain, given E-Vital's highly political nature. Implementation of EDRs has had varying degrees of success in the states where they have been employed; the E-Vital program was not mentioned in the Office of Management and Budget's 2006 annual report to Congress; and all of the electronic government (E-Government) initiatives (including E-Vital) were subjects of Congressional scrutiny in 2006 (3). Finally, according to Office of Management and Budget performance measures for the E-Vital program, as of January 2007, only 23 percent of deaths in the 16 states implementing EDRs were reported via the electronic systems. The performance measures had not yet determined death notification processing time (4). Thus, existing EDRs are not currently capable of providing sufficient mortality data to conduct epidemic detection functions at either a state or a federal level. Additionally, future development of a timely, accurate, national electronic mortality data network via the E-Vital program is uncertain.

The CDC's 122 Cities Mortality Reporting System and influenza-related pediatric mortality tracking use sentinel sites to enhance timeliness of data collection and analysis. However, even these data are delayed by approximately 2–3 weeks from the times of death and evaluate only weekly aggregated, national-level data (5–7). Additionally, the CDC uses death certificate filing dates in lieu of date of death. Thus, as a result of vacation days taken by employees of reporting institutions, holidays artificially affect the number of deaths reported for that time period. For example, deaths are reported for only 3–5 days prior to Christmas or Thanksgiving, whereas after the holiday week, the CDC will receive data for 9–11 days (8). Finally, the data cannot be used to assess local trends and do not include residency information.

In contrast, since 1998 the Pittsburgh Post-Gazette has been publishing free, daily, online death notices for the greater Pittsburgh area, which includes the counties of Allegheny, Beaver, Butler, Washington, and Westmoreland (www.Post-Gazette.com/obituaries). Allegheny County, encompassing the city of Pittsburgh, has a population of over 1,250,000, making the county an ideal location for epidemic surveillance/detection in western Pennsylvania (9, 10). We conducted an attribute assessment of these death notices to assess their potential utility for timely, local mortality surveillance.

MATERIALS AND METHODS

Data preparation

The following is the standard data format included in Pittsburgh Post-Gazette online death notices that are presented by county of burial: Doe, John Q., age, of …, formerly of …, date of death, funeral home name, funeral home location (town). Using Python 1.6 text recognition software (Python Software Foundation, Hampton, New Hampshire), we abstracted first, middle, and last names; name suffixes (i.e., Jr., Sr., II, and III); age; date of publication; date of death; and residency from death notices with dates of death in the Pittsburgh Post-Gazette from 1998 to 2001 for those buried in Allegheny County. Names were standardized by deleting spaces, apostrophes, and titles such as Dr. and Rabbi. We assigned unique identification numbers to each individual receiving a death notice and used predefined parameters allowing for up to a two-letter misspelling, hyphenated last names, and a ±2-day difference in dates of death to identify multiple notices for an individual. We merged multiple notices per individual using the SAS PROC UPDATE procedure (SAS Institute, Inc., Cary, North Carolina), which overwrote data from the initial death notice with subsequent, duplicate data.
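As an illustration only, the duplicate-notice rule described above might be sketched in Python as follows. The sketch assumes that "a two-letter misspelling" means an edit (Levenshtein) distance of at most 2 applied separately to the first and last names, and that a hyphenated last name may match on either of its parts; the original parameters are not specified in that detail, and the function names (`edit_distance`, `standardize`, `name_variants`, `same_person`) are hypothetical.

```python
from datetime import date

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def standardize(name: str) -> str:
    """Uppercase and keep only letters and hyphens (drops spaces, apostrophes, periods)."""
    return "".join(ch for ch in name.upper() if ch.isalpha() or ch == "-")

def name_variants(last_name: str) -> list:
    """Last name with hyphens removed, plus each part of a hyphenated name."""
    parts = [p for p in standardize(last_name).split("-") if p]
    return ["".join(parts)] + (parts if len(parts) > 1 else [])

def same_person(a, b, max_edits=2, max_days=2) -> bool:
    """a and b are (first_name, last_name, date_of_death) tuples.

    Assumed rule: dates of death within +/- max_days, first names within
    max_edits, and some variant of the last names within max_edits.
    """
    first_a, last_a, dod_a = a
    first_b, last_b, dod_b = b
    if abs((dod_a - dod_b).days) > max_days:
        return False
    if edit_distance(standardize(first_a), standardize(first_b)) > max_edits:
        return False
    return any(edit_distance(x, y) <= max_edits
               for x in name_variants(last_a)
               for y in name_variants(last_b))

# Two notices for what is probably the same person (made-up names).
print(same_person(("John", "O'Brien", date(1999, 3, 1)),
                  ("Jon", "OBrian", date(1999, 3, 2))))
```

A fixed tolerance of two edits is permissive for very short names, so a production rule would probably scale the tolerance with name length or add further fields such as age.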

Gender was determined by a comparison of first names with Census 2000 name-gender frequencies. If a first name in a death notice was found in only one of the gender-based census lists, the gender of that list was assigned to the death notice. We designated death notices containing a name suffix, such as Jr., Sr., or II, as male. For unisex names, we used the percent frequency of occurrence by gender provided by the census. If the percent frequency was 0.015 percent or more in only one of the name-gender lists, then that list's gender was assigned. Names with no match in either gender list were marked as unknown, as were names where the frequency in both lists was either ≥0.015 percent or <0.015 percent. The 0.015 percent cutoff level was selected upon comparative review of census data.
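The gender rules above can be summarized in a short Python sketch. It is a hedged reading of the text, not the code used in the study: the frequency dictionaries below hold made-up placeholder values rather than actual census figures, the function name `assign_gender` is hypothetical, and checking the name suffix before the census lists is an assumption about rule order.

```python
CUTOFF = 0.015                          # percent frequency cutoff from the text
MALE_SUFFIXES = {"JR", "SR", "II", "III"}

def assign_gender(first_name, suffix, male_freq, female_freq):
    """Return "M", "F", or "U" (unknown).

    male_freq and female_freq map an uppercase first name to its percent
    frequency of occurrence in the corresponding name-gender list.
    """
    if suffix and suffix.upper().replace(".", "").strip() in MALE_SUFFIXES:
        return "M"
    name = first_name.upper()
    m = male_freq.get(name)
    f = female_freq.get(name)
    if m is None and f is None:
        return "U"                      # no match in either list
    if f is None:
        return "M"                      # found only in the male list
    if m is None:
        return "F"                      # found only in the female list
    m_high, f_high = m >= CUTOFF, f >= CUTOFF
    if m_high and not f_high:
        return "M"
    if f_high and not m_high:
        return "F"
    return "U"                          # both at/above or both below the cutoff

# Placeholder frequencies for illustration only (not census values).
male = {"JOHN": 3.0, "JORDAN": 0.07, "KELLY": 0.01, "ALEX": 0.1, "PAT": 0.005}
female = {"MARY": 2.5, "JORDAN": 0.01, "KELLY": 0.1, "ALEX": 0.02, "PAT": 0.008}

for name in ("John", "Mary", "Jordan", "Kelly", "Alex", "Pat"):
    print(name, assign_gender(name, None, male, female))
```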

Attribute assessment

On the basis of predefined parameters focusing on the date of death and first and last names, we matched updated Pittsburgh Post-Gazette death notices with Pennsylvania Department of Health mortality records for Allegheny County residents. In cases where the date of death was missing from the Pittsburgh Post-Gazette death notice, we substituted the (date of publication – mean lag time from death to date of publication (2 days)) as calculated in the timeliness analysis. Using state mortality records as the “gold standard,” we conducted an attribute assessment consisting of the following five criteria to characterize the features of the death notice data.

Sensitivity was defined as the (number of state mortality records with a corresponding death notice)/(total number of state mortality records). Positive predictive value (PPV) was calculated as the (number of death notices with a corresponding state mortality record)/(total death notices). Timeliness was calculated as the number of days between the death notice date of death and the date of publication. Data quality was assessed by use of SAS PROC COMPARE by tabulating the completeness and accuracy of death notice data fields, in comparison with corresponding state mortality record fields. Residential comparisons were based on minor civil division codes (MCDs) used by the state to record the municipality of residence. Representativeness was evaluated using SAS PROC LOGISTIC (α = 0.01) to conduct univariate and multivariate logistic regression analyses to assess factors contributing to inclusion in Pittsburgh Post-Gazette death notices (i.e., death notice reporting). We regressed state mortality data with a binomial outcome variable representing whether a matching Pittsburgh Post-Gazette death notice existed.

To compensate for anomalies in newspaper publication procedures, we excluded 5,519 state mortality records from the first 6 weeks of 1998 and April–June of 2000 from the regression. In 1998, the Pittsburgh Post-Gazette began publishing online, and initial reporting suffered from data entry gaps. In 2000, the Pittsburgh Post-Gazette briefly incorporated obituaries published in other newspapers. The practice was later discontinued. All variables were treated as categorical except date of death, age, and years of education attained. Race was grouped into White, Black, and other. Socioeconomic status was evaluated by use of Census 2000 median 1999 household income for MCDs.

RESULTS

Sensitivity and positive predictive value

For 1998–2001, an average of 73.5 percent (44,294/60,281) of state mortality records had a matching death notice (sensitivity), with annual percent sensitivities of 61.7, 75.4, 78.8, and 77.8 (table 1). For 1998–2001, 91.0 percent (44,294/48,651) of death notices had a matching state record (PPV), with annual percent PPVs of 91.1, 91.3, 91.2, and 90.6 (table 1). A review of unmatched death notices revealed that 31.4 percent (1,367/4,357) could be matched to state records from other surrounding counties. Additionally, misspellings in name fields greater than two letters and differences in dates of death greater than 2 days prevented the identification of some corresponding records. There was no significant age or gender difference between those with corresponding state records and those without.
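The aggregate figures in this paragraph follow directly from the definitions of sensitivity and PPV given in Materials and Methods. A minimal Python check, using only counts reported in the text (the variable and function names are illustrative):

```python
def percent(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)

matched = 44_294          # state records with a corresponding death notice
state_records = 60_281    # all state mortality records, 1998-2001
death_notices = 48_651    # all death notices, 1998-2001

sensitivity = percent(matched, state_records)      # matched / state records
ppv = percent(matched, death_notices)              # matched / death notices
unmatched_notices = death_notices - matched        # notices with no state record
other_county = percent(1_367, unmatched_notices)   # matched to surrounding counties

print(sensitivity, ppv, unmatched_notices, other_county)
# prints: 73.5 91.0 4357 31.4
```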

TABLE 1.

Sensitivity and positive predictive value for aggregated Pittsburgh Post-Gazette death notices and Pennsylvania Department of Health mortality records, 1998–2001*

Timeliness

Date of death was included in 99.1 percent (48,233/48,651) of death notices. Thirty-six percent of death notices were published within 1 day of death, 81.7 percent within 2 days, 92.1 percent within 3 days, 95.6 percent within 4 days, 97.0 percent within 5 days, 98.1 percent within 7 days, and 99.0 percent within 33 days. In all 4 years, the median publication lag times were 2 days (mean: 2.1 days). The maximum lag time was 244 days. A newspaper data entry error resulted in one death notice with a date of death occurring after the date of publication.
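For readers who wish to reproduce this kind of summary on their own data, the lag calculation might be sketched as below. The notices are hypothetical, the function names are illustrative, and excluding negative lags (a date of death after the date of publication) from the summary is an assumption rather than a documented step of the study. The last function mirrors the substitution described in Materials and Methods for notices with a missing date of death.

```python
from datetime import date, timedelta
from statistics import median

LAG_DAYS = 2  # lag from death to publication reported in the text

def lag_days(date_of_death: date, date_of_publication: date) -> int:
    """Days from death to online publication of the notice."""
    return (date_of_publication - date_of_death).days

def percent_within(lags, k: int) -> float:
    """Percentage of non-negative lags that are k days or fewer."""
    valid = [d for d in lags if d >= 0]
    return 100.0 * sum(1 for d in valid if d <= k) / len(valid)

def impute_date_of_death(date_of_publication: date, lag: int = LAG_DAYS) -> date:
    """Substitute (date of publication - lag) for a missing date of death."""
    return date_of_publication - timedelta(days=lag)

# Hypothetical (date of death, date of publication) pairs.
notices = [
    (date(1999, 1, 4), date(1999, 1, 5)),
    (date(1999, 1, 4), date(1999, 1, 6)),
    (date(1999, 1, 5), date(1999, 1, 7)),
    (date(1999, 1, 5), date(1999, 1, 8)),
    (date(1999, 1, 1), date(1999, 1, 8)),
    (date(1999, 1, 9), date(1999, 1, 8)),  # death after publication: entry error
]
lags = [lag_days(dod, pub) for dod, pub in notices]
valid_lags = [d for d in lags if d >= 0]

print(median(valid_lags), percent_within(lags, 2), percent_within(lags, 7))
```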

Data quality

First and last names were nearly all complete, with blank first name fields in only 13 death notices. The date of death, gender, and residence were 99.1 percent, 95.6 percent, and 74.3 percent complete, respectively (table 2). Nonmissing fields for the last and first names, gender, and residence equaled corresponding state fields 89.8 percent, 87.0 percent, 90.3 percent, and 80.2 percent of the time, respectively. The date of death was 89.7 percent (43,565/44,294) accurate for nonblank fields (table 3). The median difference between dates of death was 1 day for the 729 death notices reporting inaccurate dates of death. Approximately 90 percent of the discrepancies between the dates of death fell within 2 days. The maximum discrepancy was 334 days.

TABLE 2.

Completeness as shown by data on key variables for all Pittsburgh Post-Gazette death notices by year, Pennsylvania, 1998–2001

                   1998          1999          2000          2001          1998–2001
                   (n = 10,009)  (n = 12,613)  (n = 12,893)  (n = 13,136)  (n = 48,651)
Date of death (%)  99.2          99.5          99.3          98.6          99.1
Age (%)            55.9          57.8          60.4          59.9          58.8
Gender* (%)        95.4          94.9          94.8          97.9          95.6
Residence (%)      76.4          76.2          77.5          75.0          74.3
Last name (%)      100.0         100.0         100.0         100.0         100.0
First name (%)     100.0         100.0         100.0         100.0         100.0
* Gender was deduced as described in Materials and Methods.

TABLE 3.

Accuracy as shown by comparison of nonblank Pittsburgh Post-Gazette death notice data with the corresponding matched Pennsylvania Department of Health mortality data by year, 1998–2001

               1998                1999                2000                2001                1998–2001
               Accuracy  No. of    Accuracy  No. of    Accuracy  No. of    Accuracy  No. of    Accuracy  No. of
               (%)       records*  (%)       records*  (%)       records*  (%)       records*  (%)       records*
Date of death  90.1      9,931     89.7      12,553    89.9      12,800    89.3      12,949    89.7      48,233
Age            87.9      5,589     88.0      7,293     87.9      7,795     87.0      7,915     87.7      28,592
Gender         90.3      9,533     90.6      11,941    90.5      12,191    89.9      12,840    90.3      46,505
Residence      81.2      7,448     80.5      9,365     80.6      9,724     78.8      9,588     80.2      36,125
Last name      89.8      10,009    90.2      12,613    89.9      12,893    89.4      13,136    89.8      48,651
First name     88.7      10,009    88.6      12,611    88.4      12,888    82.8      13,130    87.0      48,638
* Represents the number of records where only Pittsburgh Post-Gazette death notice data were present in the variable field.

Age fields were 58.8 percent (28,592/48,651) complete. We used age data provided by state records to calculate the proportions of missing Pittsburgh Post-Gazette data by ages. For children aged <2 years, Pittsburgh Post-Gazette death notices contained age data a median of 90.3 percent of the time, 19.3 percent for those aged 2–29 years, 36.3 percent for those aged 30–59 years, 43.4 percent for those aged 60–89 years, and 29.3 percent for those aged ≥90 years. Nonblank fields were 87.7 percent (43,565/44,294) accurate. For the 921 death notices presenting inaccurate ages, 69.7 percent of age discrepancies were by ±1 year, with a maximum difference of 72 years.

Representativeness

Univariate analysis.

In univariate analysis, we found age and education to be associated with increased death notice reporting as compared with referents (table 4). Gender, race, manner of death, and disposition of the body were associated with decreased reporting. The 129 MCDs had mixed results. The date of death was not a significant predictor.

TABLE 4.

Representativeness as shown by significant predictors of Pittsburgh Post-Gazette death notice reporting using univariate logistic regression, Pennsylvania, 1998–2001*

Variable % of deaths with a death notice No. of death notices/no. of state records Odds ratio 95% confidence interval 
Year     
    1998 69.5 8,915/12,833 1.0 Referent 
    1999 76.1 11,629/15,283 1.326 1.326, 1.472 
    2000 78.7 8,928/11,352 1.612 1.521, 1.708 
    2001 78.7 12,028/15,294 1.599 1.516, 1.687 
Month     
    July 75.0 3,466/4,621 1.0 Referent 
    June 71.0 2,480/3,495 0.814 0.738, 0.899 
    May 71.7 2,602/3,631 0.843 0.764, 0.930 
    January 77.6 3,493/4,502 1.154 1.047, 1.271 
Age (continuous variable) 75.0 41,077/54,758 1.008† 1.006, 1.009 
Education (continuous variable) 75.3 40,129/53,291 1.093† 1.085, 1.100 
Race     
    White 76.3 36,967/48,425 1.0 Referent 
    Black 65.0 4,038/6,187 0.582 0.550, 0.616 
    Other 48.7 73/150 0.294 0.213, 0.405 
Gender     
    Male 75.5 19,229/25,455 1.0 Referent 
    Female 74.6 21,849/29,307 0.949 0.912, 0.986 
Patient status at death     
    Home 76.3 8,814/11,548 1.0 Referent 
    Nursing home 73.5 11,116/15,128 0.859 0.813, 0.909 
    Other 71.5 917/1,283 0.777 0.683, 0.884 
Disposition     
    In state, burial 78.2 33,781/43,199 1.0 Referent 
    In state, cremation 65.0 6,554/10,078 0.519 0.495, 0.543 
    In state, donation 50.3 89/177 0.282 0.210, 0.379 
    Out of state, burial 46.3 371/806 0.241 0.209, 0.277 
    Out of state, cremation 25.7 9/35 0.097 0.045, 0.206 
    Out of state, other 28.6 2/7 0.112 0.022, 0.575 
    Out of state, unknown 48.3 28/58 0.260 0.155, 0.436 
    Unknown 60.0 243/405 0.418 0.342, 0.511 
Certifier of death     
    Certifying physician 76.9 13,374/17,395 1.0 Referent 
    Coroner/medical examiner 68.9 3,379/4,903 0.667 0.621, 0.715 
    Pronouncing physician 74.9 24,325/32,464 0.899 0.861, 0.938 
Manner of death     
    Natural causes 75.3 38,183/50,686 1.0 Referent 
    Suicide 69.8 345/494 0.758 0.625, 0.920 
    Homicide 57.9 195/337 0.449 0.362, 0.558 
    Undetermined 57.3 43/75 0.440 0.278, 0.695 
Coroner consult     
    No 76.7 26,537/34,592 1.0 Referent 
    Yes 70.6 11,918/16,874 0.730 0.700, 0.761 
    Unknown 79.6 2,623/3,296 1.183 1.083, 1.292 
Autopsy     
    No 75.4 37,992/50,358 1.0 Referent 
    Yes 70.0 3,040/4,342 0.760 0.710, 0.813 
* To compensate for anomalies in newspaper publication procedures, 5,519 records from the first 6 weeks of 1998 and April–June of 2000 were omitted from the regression.

† Odds ratio per year.

Although non-Whites were reported less in Pittsburgh Post-Gazette death notices, Blacks comprised only 11.3 percent (6,187/54,762) and all other races 0.3 percent (150/54,762) of all deaths. Those deaths from suicide, homicide, or an undetermined manner were reported less than those from natural causes. However, when combined, these represent 1.7 percent (906/54,762) of all deaths. Finally, although cremation, donation, and out-of-state disposition of a body reduced death notice reporting, all out-of-state dispositions and in-state donations of bodies combined represent only 2.0 percent (1,083/54,762) of all deaths.
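For a single categorical predictor, the univariate odds ratio relative to the referent can be recovered from the counts in table 4 with a simple 2 × 2 calculation. The Python sketch below uses the standard Wald interval on the log odds ratio; it is offered as a cross-check under that assumption and is not the SAS PROC LOGISTIC code used in the study. The function name `odds_ratio_wald` is illustrative.

```python
import math

def odds_ratio_wald(group_yes, group_total, ref_yes, ref_total, z=1.959964):
    """Odds ratio of having a death notice (group vs. referent) with a Wald 95% CI.

    group_yes / ref_yes: deaths with a death notice.
    group_total / ref_total: all state mortality records in that category.
    """
    a, b = group_yes, group_total - group_yes   # notice, no notice (group)
    c, d = ref_yes, ref_total - ref_yes         # notice, no notice (referent)
    odds_ratio = (a / b) / (c / d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Table 4 counts: Black 4,038/6,187 versus White (referent) 36,967/48,425.
or_black, lower, upper = odds_ratio_wald(4_038, 6_187, 36_967, 48_425)
print(f"{or_black:.2f} ({lower:.2f}, {upper:.2f})")
```

With these counts the result is approximately 0.58 (0.55, 0.62), in line with the tabulated odds ratio for Black decedents.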

Of 129 MCDs in Allegheny County, 64 had death notice reporting rates that were significantly different from that of the city of Pittsburgh, the referent. Residents of these 64 MCDs represented 44.3 percent of all deaths in the county. The median odds ratio of death notice reporting was 0.38 (mean odds = 0.8) for these MCDs. Twenty-three were associated with higher death notice reporting than Pittsburgh (odds ratio (OR) range: 1.4–3.2) and 41 with lower reporting (OR range: 0.0–0.8).

Multivariate analysis.

Multivariate odds ratios for age, education, race, and gender remained consistent with univariate results (table 5). In the multivariate analysis, accidental manner of death (referent is death from natural causes) was associated with increased reporting, replacing suicide, homicide, and undetermined manners of death associated with decreased reporting in the univariate analysis. Disposition of the body (table 5) had fewer significant levels of response but remained associated with decreased death notice reporting.

TABLE 5.

Representativeness as shown by significant predictors of Pittsburgh Post-Gazette death notice reporting using multivariate logistic regression, Pennsylvania, 1998–2001*

Variable % of deaths with a death notice No. of death notices/no. of state records Odds ratio 95% confidence interval 
Age (continuous variable) 75.3 40,128/53,289 1.003† 1.002, 1.005 
Education (continuous variable) 75.3 40,128/53,289 1.096† 1.087, 1.104 
Race     
    White 76.5 36,148/47,260 1.0 Referent 
    Black 66.4 3,910/5,885 0.445 0.413, 0.479 
    Other 48.6 70/144 0.268 0.187, 0.384 
Gender     
    Male 76.0 18,856/24,823 1.0 Referent 
    Female 74.7 21,272/28,466 0.870 0.830, 0.912 
Patient status at death     
    Residence 76.6 8,646/11,286 1.0 Referent 
    Nursing home 73.9 10,791/14,613 0.750 0.697, 0.808 
Disposition of body     
    In state, burial 78.4 32,969/42,074 1.0 Referent 
    In state, cremation 65.9 6,426/9,758 0.393 0.368, 0.420 
    In state, donation 51.2 89/174 0.170 0.123, 0.235 
    Out of state, burial 46.5 365/785 0.150 0.129, 0.175 
    Out of state, cremation 26.5 9/34 0.052 0.023, 0.117 
    Out of state, unknown 28.6 2/57 0.150 0.086, 0.261 
Manner of death     
    Natural causes 75.6 37,268/49,285 1.0 Referent 
    Accident 73.5 1,048/1,426 1.283 1.079, 1.524 
Certifier of death     
    Certifying physician 77.1 13,068/16,958 1.0 Referent 
    Coroner/medical examiner 69.8 3,320/4,758 0.703 0.616, 0.801 
* To compensate for anomalies in newspaper publication procedures, 5,519 records from the first 6 weeks of 1998 and April–June of 2000 were omitted from the regression. Additionally, 1,473 records were excluded primarily because of missing education data.

† Odds ratio per year.

In multivariate analysis, five MCDs were associated with increased reporting (OR range: 1.20–1.75) and 46 with decreased reporting (OR range: 0.00–0.90). The median odds ratio declined to 0.12 (mean odds = 0.36) for the 51 MCDs versus 0.38 for the 64 MCDs in the univariate analysis. Additionally, the proportion of all deaths residing in one of these 51 MCDs declined from 44.3 percent in the univariate to 33.2 percent in the multivariate analysis.

Of 54,762 deaths, 1,473 (2.7 percent) were excluded from the multivariate analysis primarily because of missing data on education (1,471/1,473). Although 2.4 percent of records for Whites were deleted because of blank data on education, 4.9 percent and 4.0 percent of records for Black and other races, respectively, were removed. This contributed to a decrease in odds ratios for racial categories as compared with univariate results.

DISCUSSION

The free, online death notices provided by the primary regional newspaper, the Pittsburgh Post-Gazette, documented approximately 69 percent of deaths in the county from 1998 and 75–79 percent in the latter 3 years. Reporting was lower in 1998 because the Pittsburgh Post-Gazette began online posting of death notices that year, and the initial 2–4 months of publication suffered from gaps in data entry not repeated subsequently.

In all 4 years of the study, 91 percent of death notices for Allegheny County residents had a corresponding state mortality record. Data differences exceeding the predefined matching parameters of the study prevented some additional matching (e.g., >2-letter difference in the spelling of a name and >2-day difference in dates of death). Moreover, an examination of state records revealed that some individuals identified as Allegheny County residents appeared to have out-of-state addresses. As the result of potential imperfections of the gold standard, the sensitivity and PPV of the Pittsburgh Post-Gazette system may have been underestimated in this analysis.

The median and mean lag times between the date of death and the date of publication were 2 days. Nearly 82 percent of notices were reported within 2 days of death and 98 percent within 7 days. Thus, death notice reporting is timelier than other means of mortality monitoring used by the CDC, such as the 122 Cities Mortality Reporting System, which is delayed by at least 2–3 weeks (2, 5–7).

Residency fields in the Pittsburgh Post-Gazette were 80 percent accurate. Residence in state mortality records is defined as the deceased's permanent or principal domicile based on legal state/county records (e.g., driver's license, tax forms). However, the residence provided in death notices is self-defined by the deceased's family. As a result, a difference between where people physically lived and where they associated themselves (e.g., where they were raised or lived most of their lives) may have contributed to the discrepancies between the two data sets. Nevertheless, additional research into the geographic distribution of reporting rates may provide a method of tracking local spatial trends in mortality.

Data derived for name, date of death, age, and gender were all approximately 90 percent accurate. With the exception of age, completeness for these variables exceeded 95 percent. Age was published in only approximately 60 percent of cases. The more frequent lack of age data among death notices for those aged 50–90 years is offset by the increased likelihood of death notice publication for older individuals. Additionally, if reporting sensitivity by age remains constant over time, these data could be useful for monitoring age-specific deaths over time.

Although residents of over a third of the county's municipalities were less likely to publish a death notice than city of Pittsburgh residents, multivariate analysis controlled for much of the reporting discrepancy among MCDs, reducing the median reporting odds ratio from 0.38 in the univariate analysis to 0.12 in the multivariate. Thus, county residents of municipalities surrounding the city of Pittsburgh are nearly as well represented in the death notices as those of the county's urban center.

Additionally, further investigation of the MCD results using Census 2000 median household incomes for 1999 indicated that household income did not impact reporting. MCDs with both higher and lower reporting rates had median household incomes between that of Pittsburgh (referent) and the 78 MCDs exhibiting no significant reporting difference from the referent.

Increases in years of education were associated with a slightly higher likelihood of death notice publication. Females and non-Whites were less likely to have a published death notice. However, according to Census 2000, non-Whites comprised approximately only 15 percent of the county's population (7). Again, if reporting rates remain constant for these factors, this system could be useful for monitoring trends even among groups less likely to have an online death notice. Alternatively, sentinel populations appropriate to the disease of interest could be selected to compensate for these biases.

Selwyn Collins, a pioneer in mortality surveillance, observed that, for every peak in excess mortality associated with influenza and pneumonia, there is a corresponding peak in excess mortality attributed to other causes (11). Research in this area demonstrated associations of increased risk of influenza-related death and patients with a variety of chronic diseases, including diabetes mellitus, multiple sclerosis, scoliosis, and some heart diseases (1, 11–13). Accordingly, all-cause mortality distributions will indicate influenza epidemics with wider, less pronounced peaks than those using influenza-specific mortality data. Thus, a more accurate measure of the total impact of an epidemic is given by all-cause, excess mortality calculations, where the cause of death on a death certificate may mask the presence of an alternate contributing cause as with influenza or human immunodeficiency virus historically (12). Additionally, the wider distributions increase surveillance sensitivity and the timeliness of existing epidemic detection methods. Therefore, the lack of cause-of-death data in the Pittsburgh Post-Gazette death notices is not expected to adversely affect excess mortality quantification or epidemic detection. This is supported by figure 1 that indicates the similarity of mortality trends among Allegheny County all-cause deaths, pneumonia/influenza deaths, and Pittsburgh Post-Gazette death notice reporting. Future research should evaluate the utility of the Pittsburgh Post-Gazette death notices to specifically detect excess mortality and influenza epidemics.

FIGURE 1.

Frequencies of all-cause and pneumonia/influenza mortality recorded in the mortality registry of the Division of Vital Records, Pennsylvania Department of Health, compared with reported Pittsburgh Post-Gazette death notice frequencies for Allegheny County, Pennsylvania, aggregated by 2-week intervals, 1998–2001.

Concerns about bioterrorism and a virulent influenza pandemic have focused attention on identifying novel data sources to enhance public health surveillance. With rapid influenza outbreak detection and a means to measure the magnitude of an epidemic, local public health resources could be used more effectively to mitigate epidemic impacts. Beyond influenza surveillance, monitoring of online death notices may also be useful for detecting and assessing the magnitude of bioterrorism attacks and naturally occurring outbreaks associated with excess mortality.

Several syndromic surveillance systems, such as the Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE) and the Real-time Outbreak Detection System (RODS), are currently accessible to several state and county health departments throughout the nation. Additionally, CDC's BioSense syndromic surveillance, Electronic Early Aberration Reporting System (EARS), and 122 Cities Mortality Reporting System programs offer opportunities for federal implementation. Syndromic systems need additional data streams to bolster signal detection sensitivity without losing specificity. Incorporating mortality surveillance into these systems would be an effective way to put it into operation and would provide the ability to categorize the cause of a detected epidemic or excess number of deaths.

Smaller cities served by one or two newspapers that provide free death notices are the most likely candidates for replication of the findings presented here. Popular online obituary services, such as Legacy.com and ObitsArchive.com, offer additional data-mining opportunities not explored in the current research. Legacy.com serves over 1,000 newspapers in cities nationwide and could be used as a supplemental source of death notice data. A preliminary Internet search revealed that the geographically diverse cities of Austin, Texas; Tulsa, Oklahoma; St. Louis, Missouri; Charlotte, North Carolina; Des Moines, Iowa; Colorado Springs, Colorado; San Buenaventura (i.e., Ventura), California; Winston-Salem, North Carolina; and Peoria, Illinois, fit the recommended profile and have populations ranging from approximately 104,000 to 690,250. Additionally, Phoenix, Arizona (population of approximately 1,461,575), had a total of five newspapers, two of which provided free obituaries; the others relied exclusively on the online Legacy.com service to compile obituary data. If other cities across the United States are identified where death notice data can be used to enhance local-level influenza surveillance, it would be possible to track temporal and spatial trends in influenza-related mortality nationwide.

Abbreviations

  • CDC: Centers for Disease Control and Prevention
  • EDR: electronic death registration
  • MCD: minor civil division code
  • OR: odds ratio
  • PPV: positive predictive value

The authors thank Virginia Dato for her assistance with concept development, Kathleen Shutt for her assistance with SAS programming, Wendy Chapman for her expertise in the Python programming language, and the Pennsylvania Department of Health's Bureau of Health Statistics and Research for state mortality records.

Conflict of interest: none declared.

References

1. Collins SD, Lehmann JL. Excess deaths from influenza and pneumonia and from important chronic diseases during epidemic periods, 1918–1951. Washington, DC: Public Health Service, Department of Health, Education, and Welfare, 1953.
2. Stroup NE, Zack MM, Wharton M. Sources of routinely collected data for surveillance. In: Teutsch SM, Churchill RE, eds. Principles and practice of public health surveillance. New York, NY: Oxford University Press, 1999.
3. Pully J. Electronic death registration still alive: SSA program makes progress in states. Fed Comput Week 2006.
4. E-Government. E-Vital performance measures: summary view. Washington, DC: Office of Management and Budget, 2007.
5. Centers for Disease Control and Prevention. Flu activity & surveillance: reports and surveillance methods in the United States. Atlanta, GA: Centers for Disease Control and Prevention, 2006.
6. Centers for Disease Control and Prevention. 121 Cities Mortality Reporting System: history. Atlanta, GA: Centers for Disease Control and Prevention, 2006.
7. Simonsen L, Clarke MJ, Stroup DF, et al. A method for timely assessment of influenza-associated mortality in the United States. Epidemiology 1997;8:390–5.
8. Choi K, Thacker SB. An evaluation of influenza mortality surveillance, 1962–1979. II. Percentage of pneumonia and influenza deaths as an indicator of influenza activity. Am J Epidemiol 1981;113:227–35.
9. German RR, Lee LM, Horan JM, et al. Updated guidelines for evaluating public health surveillance systems. MMWR Recomm Rep 2001;50:1–35.
10. Klaucke DN, Buehler JW, Thacker SB, et al. Guidelines for evaluating surveillance systems. MMWR Morb Mortal Wkly Rep 1988;37(suppl 5):S1–18.
11. Eickhoff TC, Sherman IL, Serfling RE. Observations on excess mortality associated with epidemic influenza. JAMA 1961;176:776–82.
12. Langmuir AD. William Farr: founder of modern concepts of surveillance. Int J Epidemiol 1976;5:13–18.
13. Collins SD. Excess mortality from causes other than influenza and pneumonia during influenza epidemics. Washington, DC: Public Health Service, Department of Health, Education, and Welfare, 1932.