Abstract

To determine the completeness of reporting of human immunodeficiency virus (HIV) diagnoses to state surveillance systems, the authors used capture-recapture methods. The numbers of cases diagnosed in the areas were estimated using HIV diagnoses reported to nine surveillance programs by different sources (e.g., laboratories, health-care providers). To account for dependencies between reporting sources, the authors used log-linear models to estimate the number of cases that had been diagnosed but were not identified by any reporting source. Completeness of reporting (observed cases/expected cases) was determined for two time frames: cases diagnosed within a 1-year period (from October 1, 2002, to September 30, 2003, for most US states) reported up to 6 months after that diagnosis period and cases diagnosed within a 6-month period reported up to 12 months after that diagnosis period. A total of 11,266 HIV diagnoses were reported for the 1-year period, via 21,589 report documents. Completeness of reporting of HIV diagnoses was 76% (95% confidence interval: 66, 83) when allowing 6 months of reporting delay (range: 72–95%) and improved to 81% (95% confidence interval: 72, 88) with 12 months' follow-up. When reporting systems retain all relevant documents, capture-recapture is a feasible approach for assessing completeness of reporting of HIV diagnoses. Completeness should be measured by allowing 12 months' reporting delay.

The accuracy of case counts and prevalence and incidence rates of disease depends on the completeness of case ascertainment. Studies of the completeness of reporting of acquired immunodeficiency syndrome (AIDS) diagnoses in different geographic areas of the United States and for different years of AIDS diagnosis showed that AIDS reporting was 60–98 percent complete (1–6). In one area, the completeness of AIDS reporting was also found to vary by reporting source; reporting from physicians' offices was less complete (75 percent) than reporting from hospitals (100 percent) or public/community health clinics (96 percent) (6). Less information is available on the completeness of reporting of diagnoses of human immunodeficiency virus (HIV) infection (not AIDS) (7–9). An assessment in South Carolina with hospital chart reviews for 1986–1990 found 21 percent underreporting of HIV-infected patients (8).

These earlier studies are of limited value because they assessed completeness of reporting of HIV or AIDS diagnoses for only a few areas and for isolated time periods. However, information on completeness of reporting of HIV or AIDS diagnoses in the US surveillance system should be available on a frequent basis, for example, for each data acquisition year and for each reporting area. In addition, to aid in the interpretation of counts and rates, completeness assessments must be coupled to timeliness so that completeness for the most recent reporting period (e.g., most recent year when computing the annual incidence rate) is known when the data are disseminated. A further limitation to most of the earlier studies is that they used active case finding (i.e., they reviewed medical or other records to determine missed cases), an approach ill suited for routine assessments of completeness because it is labor intensive.

In recent years, surveillance sites have incorporated into their routine case-finding activities most data sources that collect or include information on HIV-infected people, making it impossible to assess completeness using a single data source. On the other hand, the availability of case reports from various data sources makes capture-recapture methods a feasible and efficient approach for evaluating completeness of reporting. This report describes a pilot project to assess completeness of reporting of HIV and AIDS diagnoses in the US surveillance system using capture-recapture methods.

MATERIALS AND METHODS

We calculated completeness of reporting of HIV diagnoses (people may receive a diagnosis of HIV with or without a concurrent AIDS diagnosis, here referred to as HIV diagnoses) or AIDS diagnoses by comparing the number of diagnoses reported to the surveillance program for a diagnosis year with the number of diagnoses expected to have been made during that year. We used capture-recapture methods to estimate the expected number of diagnoses (10–14). The overlap of reporting of diagnoses from different sources was determined, as was the number of diagnoses reported solely by each source. Using these numbers, the total number of diagnoses that were not reported by any source was estimated so that the total number of expected diagnoses within the population could be calculated.
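In the simplest two-source case, the logic above reduces to the classical Lincoln-Petersen estimator, sketched below. All counts are hypothetical; the study itself used three reporting sources and log-linear models to account for dependence between them.

```python
# Illustrative two-source capture-recapture (Lincoln-Petersen) sketch.
# Hypothetical counts only; not data from the study.

def lincoln_petersen(n1, n2, m):
    """Estimate the total number of cases from two reporting sources.

    n1: cases reported by source 1; n2: cases reported by source 2;
    m: cases reported by both. Assumes the sources capture cases independently.
    """
    if m == 0:
        raise ValueError("no overlap between sources; estimate undefined")
    return n1 * n2 / m

observed = 900                                      # distinct cases reported by either source
expected = lincoln_petersen(n1=700, n2=600, m=400)  # estimated true total: 1050.0
completeness = observed / expected                  # observed/expected, here ~0.86
```

The smaller the overlap m relative to n1 and n2, the larger the estimated number of cases missed by both sources, and hence the lower the estimated completeness.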

HIV diagnoses included all new diagnoses of HIV occurring within the project's time window, regardless of whether AIDS was diagnosed at the time of HIV diagnosis or later. The HIV diagnosis date was the earliest date of HIV diagnosis (with or without a concurrent diagnosis of AIDS). AIDS diagnoses included all AIDS diagnoses occurring within the project's time window, regardless of whether HIV (not AIDS) had been diagnosed and reported before.

Data collection

The surveillance programs of seven states (Florida, Illinois, Louisiana, Maryland, Michigan, New Jersey, and Washington) and two metropolitan areas (Dallas (Dallas County, Texas) and New York (Bronx, Kings, New York, Queens, and Richmond counties, New York)) participated in this project and collected information on persons with a diagnosis of HIV. Only data that were stored with name identifiers were used, which included AIDS data from all areas and HIV data from all but three areas (Illinois, Maryland, and Washington). For this project, the Florida program collected information from 40 of its 67 counties, which comprise 54 percent of Florida's population, about 35 percent of AIDS diagnoses, and 35 percent of HIV (not AIDS) diagnoses. Just as in routine surveillance, names and other patient identifiers were removed from all data before the data were transferred to the Centers for Disease Control and Prevention (CDC).

HIV diagnoses are reported to health departments from a variety of reporting sources. To maximize completeness of reporting, most states require multiple providers (e.g., physicians and laboratories) to independently report diagnoses. In the traditional (or case-based) surveillance system, when more than one report is received for an individual, for example, from a medical care provider and from a laboratory, the information from these reports is consolidated, and one record with all pertinent information is entered into the surveillance system for that person. For this study, all reports on new HIV and AIDS diagnoses received within a 1-year study period (from October 1, 2002, to September 30, 2003, for most states) and up to 6 months after the diagnosis period were retained, and information for each report was entered into a database that was separate from the main surveillance system. Examples of reports include data collection forms supplied by CDC, laboratory reports, or data obtained through linkage to other databases. Reports were received by passive surveillance (i.e., reporting source sends report) or by active surveillance (e.g., health department investigator reviews medical records at a hospital or contacts health-care provider by phone). Some reports had insufficient information to determine whether the case definition was met, and some were missing key demographic information but met the case definition after follow-up. The follow-up information was considered part of the same report, even when obtained from a different source. Each report was entered into the database, and the reporting source was indicated. Although each separate report for the study was maintained, information from all the reports for an individual was also consolidated to derive a person-based record in the main surveillance system; these person-based records were transferred to CDC on a routine monthly schedule, as part of ongoing surveillance activities.

Reporting sources included laboratories, hospitals, physicians' offices/health-care providers, vital statistics departments (death and birth certificates), local health departments, and HIV-testing and -counseling sites. Reporting sources were coded as inpatient; outpatient; emergency room; screening, diagnosis, or referral agencies; laboratory; other databases; other facility records; or other records. Other databases might include death certificates, Medicaid or billing records, or other disease registries such as cancer or hepatitis. Other facilities' records were from correctional facilities or coroners. Other records were from sources not included in the above categories.

Separate reports from each database were matched to records for HIV-infected persons reported to CDC's HIV/AIDS Reporting System by use of the state-assigned unique case identification number. The information obtained from the HIV/AIDS Reporting System included the date of diagnosis (HIV, AIDS, or both), which was ultimately used to determine eligibility within the project's time window and to classify HIV and AIDS diagnoses.

Analyses

Capture-recapture methods were used to estimate the total number of cases diagnosed overall and in each reporting area. To take into account the dependencies between reporting sources, we applied log-linear models (15) to estimate the number of cases that had been diagnosed but were not identified by any reporting source. To reduce bias in model selection, we used a model-averaging approach (16). Models with various dependencies between reporting sources were considered. The weight for each contributing model k is exp(−AIC_k/2) / Σ_k exp(−AIC_k/2), where AIC is the Akaike Information Criterion. Wald confidence interval estimates were obtained by use of the unconditional variances, which incorporate model-selection uncertainty into estimates of precision (17).
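The model-averaging step can be sketched as follows. The AIC values and per-model estimates of unreported cases below are hypothetical; fitting the log-linear models themselves would require a Poisson regression over the observed capture histories.

```python
import math

# Akaike weights for model averaging, per the formula in the text:
# w_k = exp(-AIC_k/2) / sum_j exp(-AIC_j/2).
# AIC values and per-model estimates are hypothetical.

def akaike_weights(aics):
    # Subtracting the minimum AIC first improves numerical stability;
    # the common factor cancels, so the weights are unchanged.
    a0 = min(aics)
    raw = [math.exp(-(a - a0) / 2.0) for a in aics]
    total = sum(raw)
    return [r / total for r in raw]

aics = [210.4, 208.1, 212.9]      # one AIC per candidate log-linear model
n_missed = [140.0, 190.0, 120.0]  # each model's estimate of unreported cases

weights = akaike_weights(aics)
# Model-averaged estimate of cases not captured by any source:
averaged_missed = sum(w * n for w, n in zip(weights, n_missed))
```

The model with the lowest AIC receives the largest weight, but competing models still contribute, which is what reduces the bias from picking a single "best" model.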

Because the original coding of report sources yielded too few reports in some categories (e.g., emergency room), these were combined with other sources. We applied three-source (laboratories, inpatient/outpatient care providers and emergency rooms, and other) and four-source (laboratories, inpatient, outpatient and emergency room, and other) models to the data. However, because estimates from four-source models were unstable (the standard errors of the estimated number of uncaptured cases were 90 percent or more of the estimates themselves), the reported results are based on the three-source models.

We calculated completeness overall and by evaluation area for AIDS diagnoses (all AIDS diagnoses, all programs) and HIV diagnoses (all HIV diagnoses with or without concurrent AIDS diagnosis, eight programs). We excluded diagnoses for persons who did not reside in an evaluation area at the time of diagnosis. We first calculated the completeness for the full diagnosis year with the 6-month follow-up for case reporting; CDC's annual HIV/AIDS reports are typically based on data with 6 months' follow-up after a diagnosis year to which adjustment weights are applied to account for delays in reporting (18). We also calculated the completeness for the first 6 months of the diagnosis year to allow a 12-month follow-up period in accordance with the proposed performance standard for completeness of HIV reporting. Finally, we calculated the following ratios: the observed number of diagnoses divided by the number corrected for the estimated underascertainment (obtained from the capture-recapture analyses); the observed number of diagnoses divided by the number of diagnoses adjusted for reporting delays; and the number of diagnoses adjusted for reporting delays divided by the number of diagnoses from capture-recapture analyses. Estimates adjusted for reporting delays were calculated using the methods applied for CDC's surveillance reports (18).
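The three ratios above can be computed and cross-checked from counts for a single area, as sketched below; all numbers are hypothetical, not study data.

```python
# The three ratios described in the text, from hypothetical counts for one area.
observed = 950.0        # diagnoses reported within the follow-up window
cr_total = 1000.0       # expected total from capture-recapture analysis
delay_adjusted = 1118.0  # total after reporting-delay adjustment

r_obs_cr = observed / cr_total          # completeness vs. capture-recapture estimate
r_obs_delay = observed / delay_adjusted  # completeness vs. delay-adjusted estimate
r_delay_cr = delay_adjusted / cr_total   # >1 suggests delay weights exceed the CR total

# The third ratio is fully determined by the first two.
assert abs(r_delay_cr - r_obs_cr / r_obs_delay) < 1e-12
```

In table 4, for example, site A's HIV ratios (0.95, 0.85, 1.12) satisfy this identity up to rounding.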

Generally, we do not expect any single reporting source to capture all diagnoses, because HIV infection may be diagnosed in a variety of settings. However, more and more surveillance programs rely on laboratory reporting: most cases are diagnosed on the basis of laboratory results, and the laboratory reports could be sent to the surveillance program. We therefore also assessed the completeness of reporting from laboratories by comparing the number of diagnoses reported by laboratories (at least one laboratory report) with the total number of expected diagnoses.

RESULTS

Reporting of HIV diagnoses

A total of 11,266 HIV diagnoses (range per registry: 809–4,295) were reported by the six surveillance programs with confidential, name-based HIV and AIDS reporting for the 1-year study period, and a total of 21,589 reports (range: 1,509–7,825) were received for these diagnoses. Most reports were from laboratories, but percentages varied (32–78 percent) among programs (table 1). The next most common sources of reports were outpatient and inpatient facilities. Thirty-nine percent of HIV diagnoses were reported by two or more sources (table 2).

TABLE 1.

Percentage of reports of HIV* diagnoses by source of report, nine US HIV-reporting sites, 2002–2004

[Table 1 gives, for each reporting site, the percentage of reports from each of nine sources: inpatient facility; outpatient facility; emergency room; screening, diagnosis, or referral agency; laboratory; other database; other facility records; other source; and unknown source. Sites A, C, E, F, G, and I are tabulated for HIV diagnoses and sites B, D, and H for AIDS* diagnoses. For site H, for example, the percentages were 16, 5, 0, 4, 63, 11, 2, 1, and 0, respectively.]

* HIV, human immunodeficiency virus; AIDS, acquired immunodeficiency syndrome.

Represents reports from any source, including multiple reports from the same source.

For three areas, information was available for AIDS diagnoses only.

TABLE 2.

HIV* cases reported by one, two, or three sources, nine US HIV-reporting sites, 2002–2004

                                     HIV diagnoses      AIDS* diagnoses
  Source(s)                          No.      %         No.      %
  Health-care provider               1,355    12        1,435    13
  Laboratory                         3,732    33        4,700    42
  Other                              1,612    14          737     7
  Hospital and laboratory            1,286    11        1,350    12
  Hospital and other                   264     2          241     2
  Laboratory and other               2,627    23        2,132    19
  Hospital, laboratory, and other      390     3          484     4

* HIV, human immunodeficiency virus; AIDS, acquired immunodeficiency syndrome.

HIV diagnoses (HIV with or without a concurrent AIDS diagnosis) were reported by six reporting sites; AIDS diagnoses were reported by all nine reporting sites.

Inpatient/outpatient care providers and emergency rooms.

Overall, the estimated completeness of reporting of HIV diagnoses for the 1-year period with 6 months of follow-up time was 76 percent, ranging from 72 percent to 95 percent among programs (table 3). Extending the follow-up time to 12 months for diagnoses within a 6-month period improved the overall completeness estimate (81 percent, 95 percent confidence interval: 72, 88). Excluding cases with concurrent HIV and AIDS diagnoses, we found that the completeness of reporting of cases diagnosed with HIV without AIDS was 70 percent (95 percent confidence interval: 58, 80) for the 1-year period with 6 months of follow-up time and 76 percent (95 percent confidence interval: 65, 85) for the 6-month period with 12 months of follow-up time (data not shown).

TABLE 3.

Estimated completeness of reporting of HIV* infections diagnosed within 1 year and reported up to 6 months after the end of the year, based on capture-recapture log-linear models, nine US HIV-reporting sites, 2002–2004

  Reporting area     Estimated completeness (%)    95% confidence interval
  HIV diagnoses
      Total          76                            66, 83
      Site A         95                            93, 97
      Site C         76                            69, 81
      Site E         76                            66, 85
      Site F         83                            73, 89
      Site G         72                            50, 88
      Site I         79                            55, 92
  AIDS* diagnoses
      Total          77                            66, 86
      Site A         99                            98, 100
      Site B         91                            81, 96
      Site C         80                            76, 84
      Site D         69                            61, 76
      Site E         87                            83, 91
      Site F         89                            84, 92
      Site G         70                            47, 86
      Site H         76                            65, 84
      Site I         70                            39, 90

* HIV, human immunodeficiency virus; AIDS, acquired immunodeficiency syndrome.

Reporting of AIDS diagnoses

The estimated completeness of reporting of AIDS diagnoses (a total of 11,079 AIDS diagnoses were reported) for the 1-year period was 77 percent (range: 69–99 percent among programs) (table 3). Reporting completeness was higher, 81 percent (95 percent confidence interval: 72, 87), for diagnoses within the 6-month period with 12 months of follow-up.

Adjusted numbers

The completeness of ascertainment of diagnoses for a diagnosis year improves over time as delayed reports arrive. The estimates reported in CDC's annual HIV/AIDS surveillance reports are commonly adjusted for reporting delays. Because these methods are based on historical trends in diagnoses eventually reported, adjustment for reporting delays may not account for incomplete reporting; that is, the ratio of the number adjusted for reporting delays to the number adjusted for incomplete reporting (obtained from capture-recapture analysis) is expected to be less than 1. Table 4 shows that some areas have higher estimates of total diagnoses from capture-recapture calculations than from adjustments for reporting delays, consistent with incomplete reporting. On the other hand, for some areas, and in particular for area A, which had high completeness estimates, the estimates adjusted for reporting delays were higher than the numbers corrected for underascertainment of diagnoses. This indicates that, for area A, either the reporting-delay weights overestimate the number of diagnoses or the capture-recapture estimate is an underestimate. The interprogram variability was smaller for the estimates adjusted for reporting delays than for those based on capture-recapture analyses.

TABLE 4.

Ratios of observed to estimated numbers of HIV* infections diagnosed within 12 months, with a 6-month follow-up period, nine US HIV-reporting sites, 2002–2004

  Reporting area     Observed/capture-recapture    Observed/delay-adjusted    Delay-adjusted/capture-recapture
  HIV diagnoses
      Site A         0.95                          0.85                       1.12
      Site C         0.76                          0.88                       0.86
      Site E         0.76                          0.87                       0.88
      Site F         0.83                          0.81                       1.02
      Site G         0.72                          0.78                       0.93
  AIDS* diagnoses
      Site A         0.99                          0.85                       1.17
      Site B         0.91                          0.85                       1.07
      Site C         0.80                          0.90                       0.90
      Site D         0.69                          0.76                       0.90
      Site E         0.87                          0.92                       0.95
      Site F         0.89                          0.84                       1.06
      Site G         0.70                          0.84                       0.83
      Site H         0.76                          0.83                       0.92
      Site I         0.70                          0.76                       0.92

* HIV, human immunodeficiency virus; AIDS, acquired immunodeficiency syndrome.

Reporting delay adjustment based on historical trends of cases reported.

Based on five reporting areas for which reporting delay weights were available for HIV diagnoses.

Laboratory reporting

Laboratory evidence of HIV infection, available for virtually all persons with a diagnosis of HIV, and reliance on CD4-positive T-lymphocyte counts to establish the AIDS case definition provide the opportunity for most AIDS cases to have at least one document from a laboratory reporting source. In addition, many programs are working to establish more complete laboratory reporting. We found completeness of HIV case reporting from laboratories to range from 35 percent to 90 percent among programs.

DISCUSSION

The estimated completeness of ascertainment of HIV diagnoses is dependent on the time point when the measurement is taken. A minimal performance standard for completeness of HIV reporting was published in the 1999 Guidelines for National Human Immunodeficiency Virus Case Surveillance, Including Monitoring for Human Immunodeficiency Virus Infection and Acquired Immunodeficiency Syndrome, stating that “state and local HIV/AIDS surveillance systems should use reporting methods that provide case reporting that is complete (greater than or equal to 85 percent)” (19, p. 13). This standard does not include a time point at which completeness is to be measured or a window period during which diagnoses are made.

Annually, CDC publishes an HIV/AIDS surveillance report, which provides the numbers and rates of diagnoses for the most recent diagnosis year based on data reported to CDC by the end of June after that diagnosis year. Using capture-recapture methods, we found that one of six surveillance programs met the completeness standard of greater than or equal to 85 percent at 6 months after the diagnosis year (comparable to the time lag for annual reports) for all HIV diagnoses, and one program came close to the standard at 83 percent. On the basis of point estimates, four of nine programs met the completeness standard for AIDS reporting. Allowing additional follow-up time, up to 12 months after the diagnosis year, would allow more programs to meet the standard; the standard could thus specify that completeness be greater than or equal to 85 percent at 12 months after a diagnosis year. Although increasing the follow-up time (e.g., from 6 to 12 months) increases completeness, a longer window period during which persons receive their diagnosis (e.g., 12 rather than 6 months) provides more stable estimates. Therefore, we propose a standard observation period of 1 year with a standard follow-up period of 1 year from the end of the observation period for meeting the 85 percent completeness standard. Reports based on incomplete data, such as annual surveillance reports based on data that do not meet minimal performance standards, should be interpreted with caution.

The numbers and rates published in annual HIV/AIDS surveillance reports are generally adjusted for reporting delays. The reporting delay weights account for diagnoses that will eventually be reported to the surveillance system up to 5 years after the diagnosis year, according to historical trends in late reports of diagnoses. These weights do not account for diagnoses never reported to the surveillance programs. We observed that there was less variability among programs in the numbers adjusted for reporting delays (data not shown). This may be because reporting delay weights take information from all states into account and are derived primarily from data stratification by reporting area (the largest variability in reporting delay), as well as from some other factors (e.g., race, risk factor information, or size of the population in the area where the case resides). Therefore, the variability is evenly distributed across the stratification level, whereas the capture-recapture results are based on one program's data without accounting for covariate effects on the data.

Many surveillance programs aim to increase electronic reporting, in particular, electronic reporting of laboratory results for timely case notification and complete ascertainment of diagnoses. In about 89 percent of the areas reporting data to the HIV/AIDS Reporting System, reporting of CD4-positive T-lymphocyte results (most often counts of <200/mm3 or a CD4 percentage of <14) and/or viral load to the health department is mandated (CDC, unpublished data). Availability of laboratory documents does not necessarily translate into higher completeness of reporting; the success of diagnosis reporting through laboratory notification will depend on the ability to follow up on such reports to obtain the information required for case surveillance that was not contained in the originally submitted documents. Each program must review its methods of ascertainment of diagnoses and training of health-care providers to determine how to improve the completeness of reporting.

The feasibility of using capture-recapture methods for routine assessments of completeness of reporting to HIV/AIDS surveillance programs is limited if the methodological assumptions are not met. Capture bias occurs when there is dependence between data sources. In this study, some data sources are unlikely to be independent; that is, the probability of a diagnosis being reported by one data source is likely affected by whether the diagnosis is reported by another data source. For example, positive dependence can be expected when diagnoses listed in a laboratory database are more likely also to be listed in a medical care provider's database; such positive dependence leads to underestimation of the total number of diagnoses. We applied log-linear modeling to adjust for dependence between sources. This method cannot be applied to programs that collect or code data for only two reporting sources. For completeness estimates based on two sources, the assumption that the sources are independent may not hold for some programs; results may therefore be inaccurate for those programs, and additional assessments of completeness (e.g., case-finding studies) may be needed. When several sources are available for analysis, results may also be less reliable when less than 15 percent of diagnoses are reported by any one source, as was common in this analysis and was reflected in wide confidence intervals.

The accuracy of data, such as information about persons with a diagnosis of HIV, location of residence at the time of diagnosis, date of diagnosis, and matching of multiple reports for the same person, can influence the validity of estimates. Such influences should be minimized, however, by using the case definition consistently and ensuring that each of the data sources covers the same geographic area. The assumption that the probability of ascertainment by a reporting source is the same for all diagnoses is violated when the probability of capture (catchability) varies among persons but for any one person is unaffected by previous capture. This assumption is less likely to be violated when the case definition is used consistently; that is, the likelihood of diagnosis does not vary by demographic or other factors. However, it is possible that persons with more severe HIV disease (i.e., those whose disease progressed to AIDS) are more likely to be reported. Therefore, we assessed completeness overall, by severity of disease and by evaluation program, and found differences in the estimates of completeness for HIV and AIDS diagnoses. The probability of capture may also vary within a source, for example, when more intense efforts to find new HIV diagnoses are applied to some hospitals but not others. How such variability might affect completeness estimates is unknown, and we were not able to account for this in our analyses. Finally, migration and death affect the catchability of a person or the probability that a diagnosis being reported by a reporting source is constant over time. We controlled for migration problems by using a short time frame for the study. In addition, earlier studies found low migration patterns, in which the state of residence at diagnosis differed from the state of death for about 5 percent of persons with AIDS (20, 21).

For our analyses, we excluded follow-up reports, which are reports obtained by active investigation after a first report from another source had been received, because the initial and the follow-up reports are correlated (i.e., dependent). The sources that were actively investigated may or may not have reported the HIV or AIDS diagnoses; any reports sent by the sources' own initiative were included in the analyses. Removing follow-up reports reduces the number of diagnoses captured by both sources and may result in underestimating the completeness (by overestimating the number of diagnoses not captured by any one of the two sources). On the other hand, including follow-up reports would increase the number of diagnoses captured by both sources and result in overestimating the completeness (by underestimating the number of diagnoses not captured by any one of the two sources).

A limitation of our study is that our analyses are based on a special study that required areas to enter information for each document into a separate database from the HIV/AIDS surveillance database. Errors may have been introduced if documents were omitted, not correctly linked to existing records of HIV-infected persons, or not correctly coded. In the future, software that allows entry and retention of all documents into the surveillance database will be available. This will reduce errors of coding or omission and the burden of dual record systems. Linkage of documents is conducted by the state or local surveillance programs using all available information (e.g., name, Social Security number, sex, date of birth); we did not have information to assess errors in linking documents or the impact of incorrect linkage on results (22, 23). Duplicate reporting of cases, that is, false-negative linkage of documents, would lead to the overestimation of the number of cases, while false-positive linkage of documents may lead to the underestimation of the number of cases.

In summary, our analyses show the completeness of reporting of HIV or AIDS diagnoses to be somewhat below the estimates for the national HIV/AIDS Reporting System from previous studies (24, 25). However, it is not clear how much follow-up time was allowed in previous studies; it may well have been longer than that in our study. Our results provide the best estimates of completeness available for a large group of areas reporting data to the national HIV surveillance system and can serve as a baseline for future assessments. In addition, this pilot study assessed the feasibility of capture-recapture methods for ongoing, standardized evaluations of surveillance data. With the advent of reporting systems that retain all relevant documents with information on HIV-infected persons submitted to health departments, capture-recapture analyses are a useful and less costly way to assess completeness of reporting. These methods make annual completeness assessments feasible and can help to determine reporting weaknesses in surveillance systems and to interpret the data appropriately. Retention of documents with coding of report sources and annual completeness assessments using capture-recapture methods are recommended for all areas reporting HIV data.

Participating investigators and contributors: Dr. Rebecca Grigg (Florida); Fran Eury and Anne McIntyre (Illinois); Dr. Stephanie Broyles, Danell Watkins, and Amy Zapata (Louisiana); Dr. Colin P. Flynn (Maryland); Elizabeth Hamilton (Michigan); Dr. Helene Cross, John Ryan, and Dr. Abdel R. Ibrahim (New Jersey); Wendy Kahalas (New York State); Dr. Judith Sackoff (New York City); Dr. Sharon K. Melville, L. J. Smith, and Tammy L. Sajak (Texas); and Maria Courogen, Anna Meddaugh Baskapan, Jim Kent, and Amy Bauer (Washington).

The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention.

Conflict of interest: none declared.

References

1. Buehler JW, Berkelman RL, Stehr-Green JK. The completeness of AIDS surveillance. J Acquir Immune Defic Syndr 1992;5:257–64.
2. Doyle TJ, Glynn MK, Groseclose SL. Completeness of notifiable infectious disease reporting in the United States: an analytical literature review. Am J Epidemiol 2002;155:866–74.
3. Greenberg AE, Hindin R, Nicholas AG, et al. The completeness of AIDS case reporting in New York City. JAMA 1993;269:2995–3001.
4. Jara MM, Gallagher KM, Schieman S. Estimation of completeness of AIDS case reporting in Massachusetts. Epidemiology 2000;11:209–13.
5. Rosenblum L, Buehler JW, Morgan MW, et al. The completeness of AIDS case reporting, 1988: a multisite collaborative surveillance project. Am J Public Health 1992;82:1495–9.
6. Schwarcz S, Hsu L, Prisi MK, et al. The impact of the 1993 AIDS case definition on the completeness and timeliness of AIDS surveillance. AIDS 1999;13:1109–14.
7. Klevens RM, Fleming PL, Gaines CG, et al. Completeness of HIV reporting in Louisiana, USA. (Letter). Int J Epidemiol 1998;27:1105.
8. Meyer PA, Jones JL, Garrison CZ. Completeness of reporting of diagnosed HIV-infected hospital inpatients. J Acquir Immune Defic Syndr 1994;7:1067–73.
9. Solomon L, Flynn C, Eldred L, et al. Evaluation of a statewide non-name based HIV surveillance system. J Acquir Immune Defic Syndr 1999;22:272–82.
10. Hook EB, Regal RR. Capture-recapture methods in epidemiology: methods and limitations. Epidemiol Rev 1995;17:243–64.
11. Hook EB, Regal RR. The value of capture-recapture methods for apparent exhaustive surveys. The need for adjustment for source of ascertainment intersection in attempted complete prevalence studies. Am J Epidemiol 1992;125:1060–7.
12. Capture-recapture and multiple-record systems estimation I: history and theoretical development. International Working Group for Disease Monitoring and Forecasting. Am J Epidemiol 1995;142:1047–58.
13. Papoz L, Balkau B, Lellouch J. Case counting in epidemiology: limitations of methods based on multiple data sources. Int J Epidemiol 1996;25:474–8.
14. Stephen C. Capture-recapture methods in epidemiologic studies. Infect Control Hosp Epidemiol 1996;17:262–6.
15. Cormack RM. Loglinear models for capture-recapture. Biometrics 1989;45:395–413.
16. Stanley TR, Burnham KP. Information-theoretic model selection and model averaging for closed-population capture-recapture studies. Biom J 1998;40:475–94.
17. Buckland ST, Burnham KP, Augustin NH. Model selection: an integral part of inference. Biometrics 1997;53:603–18.
18. Green T. Using surveillance data to monitor trends in the AIDS epidemic. Stat Med 1998;17:143–54.
19. Guidelines for national human immunodeficiency virus case surveillance, including monitoring for human immunodeficiency virus infection and acquired immunodeficiency syndrome. Centers for Disease Control and Prevention. MMWR Recomm Rep 1999;48(RR-13):1–27, 29–31.
20. Buehler JW, Frey RL, Chu SY, et al. The migration of persons with AIDS: data from 12 states, 1985–1992. Am J Public Health 1995;85:1552–5.
21. Harris NS, Dean HD, Fleming PL. Demographic, behavioral, and geographic characteristics among adults who have migrated from place of AIDS diagnosis to death, United States. Presented at the 2001 National HIV Prevention Conference, Atlanta, Georgia, August 12–15, 2001.
22. Lee AJ, Seber GAF, Holden JK, et al. Capture-recapture, epidemiology, and list mismatches: several lists. Biometrics 2001;57:707–13.
23. Brenner H. Application of capture-recapture methods for disease monitoring: potential effects of imperfect record linkage. Methods Inf Med 1994;33:502–6.
24. HIV/AIDS surveillance report, 2003. Vol 15. Atlanta, GA: Centers for Disease Control and Prevention, US Department of Health and Human Services, 2004:40. (http://www.cdc.gov/hiv/topics/surveillance/resources/reports/2003report/pdf/2003SurveillanceReport.pdf).
25. Klevens RM, Fleming PL, Li J, et al. The completeness, validity, and timeliness of AIDS surveillance data. Ann Epidemiol 2001;11:442–9.