Abstract

Objective

To support long COVID research in the National COVID Cohort Collaborative (N3C), the N3C Phenotype and Data Acquisition team created Data Designs to help contributing sites enhance their data. Enhancements include a long COVID specialty clinic indicator; Admission, Discharge, and Transfer transactions; patient-level social determinants of health; and in-hospital use of oxygen supplementation.

Materials and Methods

For each enhancement, we defined the scope and wrote guidance on how to prepare and populate the data in a standardized way.

Results

As of June 2024, 29 sites have added at least one data enhancement to their N3C pipeline.

Discussion

The use of common data models is critical to the success of N3C; however, these data models cannot account for all needs, so project-driven data enhancement is required. Such enhancement should be done in a standardized way that aligns with common data model specifications. Our approach offers a useful pathway for enhancing data so that they are fit for purpose.

Conclusion

In this initiative, we rapidly produced project-specific data modeling guidance and documentation in support of long COVID research while maintaining a commitment to terminology standards and harmonized data.

Introduction

The National COVID Cohort Collaborative (N3C) is a centralized repository for COVID-related research.1 As of August 2024, N3C contains data on over 22.9 million patients from 84 contributing sites.2 Over 4300 users have requested access to the N3C enclave, and their work has collectively produced 296 publications and presentations covering a range of COVID-related research topics.2–9

Answering clinical questions using electronic health record data requires that the data are “fit for purpose.”10,11 As an example, it is not possible to study outcomes of patients on ventilators without granular data about use of ventilators. After the initial implementation of N3C, and as part of the NIH RECOVER Initiative,12 we recognized gaps that needed to be addressed to ensure the N3C data were “fit for purpose” for long COVID research questions.3 To that end, the following data enhancements were prioritized: (1) an indicator of whether a patient was seen in a long COVID specialty clinic; (2) Admission, Discharge, and Transfer (ADT) transactions; (3) patient-level social determinants of health (SDOH); and (4) in-hospital use of oxygen supplementation devices. In this paper, we describe N3C’s approach to enhancing data from contributing sites.

All N3C sites contribute clinical data to the repository using 1 of 4 common data models (CDMs): OMOP (Observational Medical Outcomes Partnership), i2b2/ACT, PCORnet, or TriNetX. Data from each CDM are harmonized centrally into the OMOP CDM, which is exposed to N3C users. The original scope included data commonly available in CDMs: demographics, encounter information, diagnoses, procedures, laboratory tests, medications, and immunizations.

Although the various CDM communities offer structure and guidance for adding new data elements, it was not enough to simply ask sites to add the 4 data elements listed above. CDM specifications do not provide the level of detail necessary to ensure that these new data elements would be modeled and structured consistently. If each site added these data elements without specific guidance, the resulting data would be difficult to harmonize, and researchers would ultimately find it harder to use these data meaningfully.

We collaboratively developed and disseminated Data Designs: documentation detailing the definition, scope, and structure for each data enhancement, customized for each CDM.13 These Data Designs provide guidance to contributing sites to add data enhancements in a consistent manner (with standardized structure, coding, and meaning that conforms to the rules of each CDM), while also managing the burden and effort required of sites.

Methods

The N3C Phenotype and Data Acquisition team, composed of subject matter experts (SMEs) for the 4 CDMs, was charged with supporting sites in expanding their data to include information about long COVID specialty clinics, ADT transactions, SDOH, and oxygen supplementation. The process happened quickly: Data Designs for each enhancement were produced and disseminated in about 4 months. Major steps in the data enhancement process were as follows:

  1. Understand domains in the context of COVID research. Background research was required for each domain to determine data availability, variation across sites, and research needs.

  2. Define scope. For each data enhancement, we needed to define the appropriate scope for the data. This required balancing the research needs with minimizing burden on contributing sites.

  3. Write CDM-agnostic guidance. Each Data Design begins with a definition of the data enhancement, a description of why it is important, and scoping guidance. For some data enhancements, we provide additional guidance on sourcing data and mapping.

  4. Write CDM-specific guidance for structuring each data enhancement. To ensure data were submitted in a standardized manner, we drafted guidance detailing how contributing sites should add data to their CDMs. This included specifying target tables, field mappings, and values, in accordance with the rules of each CDM. Figure 1 shows examples used in the CDM-specific guidance.

  5. Dissemination and support. Data Designs are posted on the N3C GitHub13 and were shared with participating sites via webinar and email. The Phenotype and Data Acquisition team helped sites work through challenges via consultations and Slack.

  6. Harmonization and data quality. The N3C ETL (extract, transform, and load) process was updated to include the data enhancements. As part of N3C’s data quality assessment, sites receive feedback through regularly distributed Data Quality Scorecards.14 Data enhancements were incorporated into these scorecards.
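To make steps 4 and 6 concrete, the sketch below represents one Data Design's OMOP-specific guidance as a small record (target table, concept field, allowed concept IDs) and computes a scorecard-style conformance rate for a batch of submitted rows. It is a minimal illustration under our own assumptions: `DataDesign` and `conformance_rate` are hypothetical names, and the actual N3C ETL and Data Quality Scorecard checks are more extensive and are not reproduced here. The example uses the long COVID clinic design (OBSERVATION table, custom concept 2004207791) described in the Long COVID clinic section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataDesign:
    """Hypothetical, simplified record of one enhancement's OMOP guidance."""
    name: str
    target_table: str            # OMOP table the rows should land in
    concept_field: str           # field carrying the standardized concept ID
    allowed_concepts: frozenset  # concept IDs the guidance permits


def conformance_rate(design: DataDesign, table: str, rows: list) -> float:
    """Share of submitted rows that use the expected table and an allowed concept.

    Illustrative stand-in for a single data quality check; an empty batch
    or a batch loaded into the wrong table scores 0.0.
    """
    if not rows or table != design.target_table:
        return 0.0
    conforming = sum(
        1 for row in rows if row.get(design.concept_field) in design.allowed_concepts
    )
    return conforming / len(rows)


LONG_COVID_CLINIC = DataDesign(
    name="Long COVID clinic",
    target_table="OBSERVATION",
    concept_field="observation_concept_id",
    allowed_concepts=frozenset({2004207791}),
)

submitted = [
    {"observation_concept_id": 2004207791},
    {"observation_concept_id": 0},  # unmapped row, counted as non-conforming
]
print(conformance_rate(LONG_COVID_CLINIC, "OBSERVATION", submitted))  # 0.5
```

Keeping the guidance as data rather than hard-coding it in each check means the same function can score any enhancement once its allowed concept set is written down.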

Figure 1.

Sample representation of data enhancements in OMOP. In the documentation for participating sites, we provided detailed instructions, along with examples, on how each data enhancement should be coded and structured in each CDM. Here, we show the OMOP examples for each data enhancement. Each block represents a table in OMOP. The first row includes relevant OMOP fields, and the second row shows sample data modeled according to our instructions. The data shown here are fictitious. ADT = Admission, Discharge, and Transfer; ICU = intensive care unit; OMOP = Observational Medical Outcomes Partnership; PASC = Post-acute sequelae of COVID-19.

While we applied overarching principles to every Data Design, the approach, effort, and level of information required varied considerably across data enhancements. Below, we discuss the aspects unique to each data enhancement.

Long COVID clinic

We asked sites to identify visits to their long COVID specialty clinic (if applicable) and map that visit type to a standard code. Although this data enhancement characterizes a visit, we designed it to be represented in the OMOP OBSERVATION table, rather than the VISIT_OCCURRENCE table, so as not to override existing visit type data. At the time, there was no established OMOP concept ID for “long COVID clinic,” so we created a custom OMOP concept ID (2004207791) to represent it. OMOP conventions allow the creation of custom concepts where no standard exists. Using ID numbers greater than 2 billion, the range OMOP reserves for such local concepts, avoids collisions with standard concepts and makes collisions with other custom codes unlikely.
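A minimal sketch of the resulting record, assuming a simplified OMOP OBSERVATION row: it flags an existing visit as a long COVID clinic visit by pointing `visit_occurrence_id` at that visit, so the VISIT_OCCURRENCE row itself is left untouched. The field names are standard OMOP OBSERVATION columns, but required fields such as `observation_type_concept_id` are omitted, and the helper names are hypothetical; the authoritative field mappings are those in the published Data Design.

```python
from datetime import date

LONG_COVID_CLINIC_CONCEPT_ID = 2004207791  # N3C custom concept for long COVID clinic
CUSTOM_CONCEPT_FLOOR = 2_000_000_000       # IDs above this are reserved for local concepts


def is_custom_concept(concept_id: int) -> bool:
    """True if the ID falls in the range OMOP reserves for locally defined concepts."""
    return concept_id > CUSTOM_CONCEPT_FLOOR


def long_covid_clinic_observation(
    observation_id: int,
    person_id: int,
    visit_occurrence_id: int,
    visit_date: date,
    source_value: str,
) -> dict:
    """Build a simplified OBSERVATION row marking one visit as a long COVID clinic visit."""
    return {
        "observation_id": observation_id,
        "person_id": person_id,
        "observation_concept_id": LONG_COVID_CLINIC_CONCEPT_ID,
        "observation_date": visit_date,
        "visit_occurrence_id": visit_occurrence_id,  # links the flag to the existing visit
        "observation_source_value": source_value,    # site's local clinic name or code
    }


row = long_covid_clinic_observation(1, 42, 7, date(2023, 5, 1), "PASC CLINIC")
print(is_custom_concept(row["observation_concept_id"]))  # True
```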

ADT transaction data

ADT transactions track a patient's movement during a hospitalization. While defining the scope, we recognized that mapping every hospital department to a standard concept would be onerous for sites, and that COVID researchers were most interested in patients who received care in emergency and critical care settings. Therefore, we asked sites to map transactions to OMOP concept IDs 8870 (emergency room) or 581379 (critical care) and to map all other transactions to a “catch-all” inpatient concept ID (8717). We further limited the scope to “transfer in” transactions, which excluded the large volume of transfers out, attending changes, bed changes, and similar events.
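The scoping rules above reduce to a small lookup with a default. The sketch below assumes each site can tag ADT events with an event type and a local department category; the category labels (`EMERGENCY`, `ICU`, `MICU`), the `transfer_in` event label, and the function name are hypothetical, and which OMOP table receives the output rows is left to the Data Design.

```python
ER_CONCEPT_ID = 8870               # emergency room
CRITICAL_CARE_CONCEPT_ID = 581379  # critical care
INPATIENT_CONCEPT_ID = 8717        # catch-all inpatient

# Hypothetical local department categories; each site maintains its own list.
DEPARTMENT_TO_CONCEPT = {
    "EMERGENCY": ER_CONCEPT_ID,
    "ICU": CRITICAL_CARE_CONCEPT_ID,
    "MICU": CRITICAL_CARE_CONCEPT_ID,
}


def map_adt_events(events: list) -> list:
    """Keep only transfer-in events and attach a concept ID for the destination unit."""
    mapped = []
    for event in events:
        if event["event_type"] != "transfer_in":
            continue  # transfers out, attending changes, bed changes are out of scope
        concept_id = DEPARTMENT_TO_CONCEPT.get(
            event["department"].upper(), INPATIENT_CONCEPT_ID
        )
        mapped.append({**event, "concept_id": concept_id})
    return mapped


events = [
    {"event_type": "transfer_in", "department": "emergency"},
    {"event_type": "bed_change", "department": "ICU"},
    {"event_type": "transfer_in", "department": "MICU"},
    {"event_type": "transfer_in", "department": "Oncology"},
]
print([e["concept_id"] for e in map_adt_events(events)])  # [8870, 581379, 8717]
```

Defaulting unmatched departments to 8717, rather than dropping them, keeps every inpatient transfer in the data while asking sites to curate only the emergency and critical care mappings.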

Social determinants of health

SDOH data can be collected or derived in several ways: patient-reported measures, diagnosis codes, or linkage to area-level socioeconomic data. Our scope was limited to patient-reported measures because the latter 2 sources were already supported in N3C. To define the scope and approach for patient-reported SDOH, we conducted a brief landscape analysis, consulted with the N3C SDOH domain team,15 and closely analyzed the SDOH data available at one institution. We identified 6 domains that were of interest and appeared to be routinely collected: housing status, food insecurity, financial resource strain, transportation, social connectedness, and stress. Although practices vary, many sites ask SDOH questions in a semi-standardized format that can be readily mapped to LOINC, a standardized vocabulary for laboratory tests and clinical observations. To aid sites in the mapping process, we created a shared spreadsheet with mappings from standardized Epic SDOH questions to LOINC codes. Sites could also use the spreadsheet to document additional mappings and unmappable SDOH questions (see the Supplementary Material).
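The shared spreadsheet is essentially a lookup from question wording to a standard concept, plus a list of questions that could not be mapped. The sketch below mimics that workflow under stated assumptions: it keys on normalized question text and returns the OMOP concept IDs listed in Table 1 rather than raw LOINC codes, the local question strings are hypothetical, and exact-text matching after normalization is a simplification of the manual review sites actually performed.

```python
import re

# Standard question wording (normalized) -> OMOP concept ID, both as listed in Table 1.
STANDARD_QUESTIONS = {
    "within the past 12 months we worried whether our food would run out "
    "before we got money to buy more": 36304041,
    "within the past 12 months the food we bought just didn't last "
    "and we didn't have money to get more": 36306143,
    "how hard is it for you to pay for the very basics like food, housing, "
    "medical care, and heating": 46234789,
}


def normalize(text: str) -> str:
    """Lower-case, drop bracketed instrument tags such as [U.S. FSS], tidy spacing."""
    text = re.sub(r"\[.*?\]", " ", text.lower())
    text = re.sub(r"\s+", " ", text).strip()
    return text.rstrip("?.").strip()


def map_questions(local_questions: list) -> tuple:
    """Split local questions into (mapped: text -> concept ID, unmappable: list)."""
    mapped, unmappable = {}, []
    for question in local_questions:
        concept_id = STANDARD_QUESTIONS.get(normalize(question))
        if concept_id is None:
            unmappable.append(question)  # documented for review, as in the spreadsheet
        else:
            mapped[question] = concept_id
    return mapped, unmappable


local = [
    "How hard is it for you to pay for the very basics like food, housing, medical care, and heating?",
    "Within the past 12 months we worried whether our food would run out before we got money to buy more [U.S. FSS]",
    "Do you feel safe at home?",
]
mapped, unmappable = map_questions(local)
print(unmappable)  # ['Do you feel safe at home?']
```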

In-hospital use of oxygen supplementation devices

Initially, we planned to include detailed measures captured in oxygen supplementation flowsheets. Exploration within one site’s source data revealed approximately 130 variables related to oxygen supplementation (many of which were ambiguously named). A successful effort to map these numerous variables and values to a standard terminology would require involvement of an intensive care clinician, and we could not assume such expertise was available at each site. Given these challenges, we engaged with clinician researchers in N3C to understand what the most important data elements were and subsequently determined that the most pressing need was to understand the type of oxygen supplementation received (eg, high flow nasal cannula, ventilator). Next, we identified SNOMED as the standardized vocabulary to capture these data, and, in consultation with a clinician, created a list of 31 SNOMED codes for sites to use. We also identified a need to create custom codes for “room air” and other oxygen devices not mappable to the selected SNOMED codes.
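One way to express this device mapping, assuming each site keeps a lookup from its own flowsheet values to the concept IDs in Table 1: known devices map to their concept, recorded values with no match fall back to the custom “N3C: Other oxygen device” concept, and explicit room-air entries get the custom “N3C: Room air” concept. The local value strings and the choice to skip blank entries are assumptions for illustration; the published Data Design defines the actual code list and rules.

```python
from typing import Optional

ROOM_AIR = 2004208005             # N3C custom concept "N3C: Room air"
OTHER_OXYGEN_DEVICE = 2004208004  # N3C custom concept "N3C: Other oxygen device"

# Hypothetical local flowsheet values -> concept IDs listed in Table 1.
DEVICE_MAP = {
    "nasal cannula": 45760842,           # Basic nasal oxygen cannula
    "high flow nasal cannula": 4139525,  # High flow oxygen nasal cannula
    "hfnc": 4139525,
    "non-rebreather": 4145528,           # Nonrebreather oxygen mask
    "ventilator": 45768197,              # Ventilator
    "room air": ROOM_AIR,
    "ra": ROOM_AIR,
}


def map_device(local_value: Optional[str]) -> Optional[int]:
    """Return a concept ID for a recorded device, or None if nothing was recorded."""
    if local_value is None or not local_value.strip():
        return None  # assumption: empty flowsheet cells produce no row
    key = local_value.strip().lower()
    return DEVICE_MAP.get(key, OTHER_OXYGEN_DEVICE)  # unmapped devices fall back


print(map_device("HFNC"))  # 4139525
```

Routing unmatched values to a catch-all concept, instead of discarding them, preserves the fact that some supplementation occurred even when the exact device cannot be coded.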

Both the sites that led development of the Data Designs and the contributing sites received funding from a mix of sources, including RECOVER and the National Center for Data to Health (CD2H). Some additional sites voluntarily added data enhancements without supplemental funds.

Results

For each enhancement, we produced and published a Data Design document on GitHub.13 Each Data Design includes a description of the data element, its relevance to N3C, and its scope; guidance on how to find the data within source systems; and CDM-specific guidance for structuring the data. As of June 2024, 29 sites have added at least one data enhancement to their N3C pipeline: 9 sites added long COVID clinic data, 29 added ADT transactions, 24 added SDOH data, and 26 added oxygen supplementation data. Table 1 characterizes the scale and breadth of the data enhancements available within N3C. Notably, all these variables are mapped to common concept IDs. This is the value of our approach: the data are not only available but also harmonized and thus ready for analysis.
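Because every enhancement lands on a shared set of concept IDs, summaries like Table 1 reduce to a group-by on the harmonized rows. A minimal sketch, assuming rows are dictionaries with hypothetical `concept_id` and `person_id` keys; any disclosure or small-count rules applied before publishing counts are not modeled here.

```python
from collections import defaultdict


def characterize(rows: list) -> dict:
    """Map each concept ID to (unique patient count, row count)."""
    patients = defaultdict(set)
    row_counts = defaultdict(int)
    for row in rows:
        patients[row["concept_id"]].add(row["person_id"])
        row_counts[row["concept_id"]] += 1
    return {cid: (len(patients[cid]), row_counts[cid]) for cid in row_counts}


rows = [
    {"concept_id": 8870, "person_id": 1},
    {"concept_id": 8870, "person_id": 1},  # repeat ER visit: new row, same patient
    {"concept_id": 8870, "person_id": 2},
    {"concept_id": 8717, "person_id": 1},
]
print(characterize(rows))  # {8870: (2, 3), 8717: (1, 1)}
```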

Table 1.

Characterization of N3C data enhancements, by patient and row count.

Concept ID | Concept name | Unique patient count | Row count
Long COVID clinic visits
2004207791 | Long COVID clinic visit (Custom concept) | 20 616 | 66 152
ADT transactions
581379 | Inpatient critical care facility | 557 579 | 1 974 865
8870 | Emergency room—hospital | 3 211 433 | 18 840 284
8717 | Inpatient hospital | 2 607 456 | 42 174 447
Social determinants of health
37020730 | Has lack of transportation kept you from medical appointments, meetings, work, or from getting things needed for daily living | 1 487 100 | 5 107 339
42869557 | Housing status | 5239 | 10 085
37020172 | Are you worried about losing your housing [PRAPARE] | 173 614 | 333 074
1617701 | Has the electric, gas, oil, or water company threatened to shut off services in your home in past 12 months | 57 662 | 660 560
46234789 | How hard is it for you to pay for the very basics like food, housing, medical care, and heating | 1 052 412 | 2 700 296
36304041 | Within the past 12 months we worried whether our food would run out before we got money to buy more [U.S. FSS] | 1 520 247 | 4 272 580
36306143 | Within the past 12 months the food we bought just didn't last and we didn't have money to get more [U.S. FSS] | 1 594 609 | 3 779 441
46234787 | Do you belong to any clubs or organizations such as church groups unions, fraternal or athletic groups, or school groups [NHANES III] | 623 292 | 1 708 888
O2 devices
4145694 | Aerosol oxygen mask | 177 679 | 548 016
45762031 | Aerosol tent, adult | 1275 | 4104
45764548 | Aerosol tent, pediatric | 39 | 39
4160626 | Ambu bag | 18 030 | 31 180
45760842 | Basic nasal oxygen cannula | 1 552 544 | 92 742 933
4138614 | BiPAP oxygen nasal cannula | 63 863 | 7 379 447
4137849 | Blow by oxygen mask | 89 028 | 319 210
4243754 | Continuing positive airway pressure unit | 91 470 | 1 093 891
45768222 | Continuous positive airway pressure/Bilevel positive airway pressure mask | 52 482 | 844 593
45761494 | CPAP nasal oxygen cannula | 37 162 | 623 194
4138487 | Face tent oxygen delivery device | 98 846 | 280 029
4139525 | High flow oxygen nasal cannula | 190 445 | 24 223 271
2004208004 | N3C: Other oxygen device | 562 232 | 54 250 699
2004208005 | N3C: Room air | 2 117 944 | 276 653 221
4145528 | Nonrebreather oxygen mask | 197 341 | 1 829 502
45771595 | Non-rebreathing oxygen face mask | 90 | 138
45759373 | Oxygen administration face tent | 2355 | 183 622
4222966 | Oxygen mask | 366 970 | 1 399 713
36715214 | Oxygen mustache cannula | 3063 | 14 296
36715213 | Oxygen pendant cannula | 4626 | 8454
36715212 | Oxygen reservoir cannula | 6660 | 58 914
4138916 | Oxygen ventilator | 293 627 | 10 260 319
4145529 | Oxyhood | 2384 | 12 587
4138748 | Partial rebreather oxygen mask | 999 | 2600
45759146 | Partial-rebreathing oxygen face mask | 25 253 | 34 185
4188569 | T piece with bag | 704 | 5185
4188570 | T piece without bag | 4660 | 51 541
45759811 | Tracheostomy mask, aerosol | 4354 | 806 323
45760219 | Tracheostomy mask, oxygen | 13 057 | 1 680 369
4144319 | Transtracheal oxygen catheter | 4330 | 11 510
45768197 | Ventilator | 154 139 | 12 342 695
4322904 | Venturi mask | 33 889 | 614 863
45759930 | Venturi oxygen face mask | 5666 | 13 046

ADT = Admission, Discharge, and Transfer; BiPAP = bilevel positive airway pressure; CPAP = continuous positive airway pressure; N3C = National COVID Cohort Collaborative.

Discussion

CDMs significantly increase our ability to integrate health data across institutions,16–19 and they have been critical to the success of N3C.20 However, the focus of CDMs is necessarily on the most common use cases and data elements, such as demographics, diagnoses, procedures, medications, and labs. It is not practical to expect CDM developers to anticipate every use case and data element that may be required, yet when studies require data not available in the CDMs, the potential of either data federation or data centralization cannot be fully realized. We propose collaborative modeling of project-specific data enhancements, like the approach outlined here, as a pragmatic solution to this problem. Such an approach can benefit not only the study at hand but also future studies that use CDM data; for example, one site uses its enhanced SDOH data to support other ongoing studies. Below, we identify factors for success and potential drawbacks.

Factors for success

  • Get the right people in the room. Researchers, with knowledge of the data needs, and informaticians, with expertise in the CDMs, must be engaged. Moreover, those with CDM expertise should be available to guide sites through challenges.

  • Conform to CDM standards. Data designs should comply with CDM requirements and best practices so that the data can be readily re-used in other studies.

  • Use standardized vocabularies where possible. This ensures that the meaning of the data is clear to researchers and carries over to future projects.

  • Create clear, accessible documentation. Easy-to-understand documentation should be readily available to all contributing sites and users of the data.

  • Funding is a necessity. This effort requires time and expertise; both lead and contributing sites should be funded appropriately.

  • Build in quality checks. Data accuracy depended on sites carefully following the specifications, and not all sites did so. In some cases, after receiving data, we needed to review the guidance with sites, which then made modifications and resubmitted data that conformed to the specifications.

Drawbacks and further considerations

This approach holds considerable promise for the future of CDMs, but it has drawbacks. Because this model is driven by specific use cases, the resulting data may not generalize to other studies. This risk can be mitigated if data modelers consider it and attempt to make their guidance broadly applicable. The approach could also lead to a proliferation of conflicting guidelines; a centralized catalog of project-specific CDM modeling guidance would alleviate this. Finally, this effort is time-intensive for both data designers and data contributors, and adequate funding may not be available in typical study budgets. For this reason, project-specific enhancements should not be the only approach for expanding CDMs: there is still a place for data networks to dedicate effort toward data modeling and expansion in high-need, cross-cutting areas.

Conclusion

In this initiative, we rapidly produced project-specific data modeling guidance and documentation in support of long COVID research while maintaining a commitment to terminology standards and harmonized data. By engaging researchers, we were able to provide useful data while limiting undue burden on contributing sites. Our approach has allowed sites to enhance both N3C and their local CDMs. To support reusability, the involvement of SMEs from the CDMs was critical to ensuring that the guidance conformed to CDM best practices. The use of standardized vocabularies, detailed documentation, and compliance with that documentation further contributes to reusability.

Acknowledgments

This research was possible because of the patients whose information is included within the data and the organizations (https://ncats.nih.gov/n3c/resources/data-contribution/data-transfer-agreement-signatories) and scientists who have contributed to the on-going development of this community resource (https://doi.org/10.1093/jamia/ocaa196).

Individual acknowledgements for N3C core contributors

We gratefully acknowledge the following core contributors to N3C: Adam B. Wilcox, Alexis Graves, Alfred (Jerrod) Anzalone, Amin Manna, Amit Saha, Amy Olex, Andrea Zhou, Andrew E. Williams, Andrew Southerland, Andrew T. Girvin, Anita Walden, Anjali A. Sharathkumar, Benjamin Amor, Benjamin Bates, Brian Hendricks, Brijesh Patel, Caleb Alexander, Carolyn Bramante, Cavin Ward-Caviness, Charisse Madlock-Brown, Christine Suver, Christopher Chute, Christopher Dillon, Chunlei Wu, Clare Schmitt, Cliff Takemoto, Dan Housman, Davera Gabriel, David A. Eichmann, Diego Mazzotti, Don Brown, Eilis Boudreau, Elaine Hill, Elizabeth Zampino, Emily Carlson Marti, Evan French, Farrukh M Koraishy, Federico Mariona, Fred Prior, George Sokos, Greg Martin, Harold Lehmann, Heidi Spratt, Hemalkumar Mehta, Hongfang Liu, Hythem Sidky, J.W. Awori Hayanga, Jami Pincavitch, Jaylyn Clark, Jeremy Richard Harper, Jessica Islam, Jin Ge, Joel Gagnier, Joel H. Saltz, Joel Saltz, Johanna Loomba, John Buse, Jomol Mathew, Joni L. Rutter, Julie A. McMurry, Justin Guinney, Justin Starren, Karen Crowley, Katie Rebecca Bradwell, Ken Wilkins, Kenneth R. Gersing, Kenrick Dwain Cato, Kimberly Murray, Lavance Northington, Lee Allan Pyles, Leonie Misquitta, Lesley Cottrell, Lili Portilla, Mariam Deacy, Mark M. Bissell, Mary Emmett, Mary Morrison Saltz, Melissa A. Haendel, Meredith Adams, Meredith Temple-O'Connor, Michael G. Kurilla, Nabeel Qureshi, Nasia Safdar, Nicole Garbarini, Noha Sharafeldin, Ofer Sadan, Patricia A. Francis, Penny Wung Burgoon, Peter Robinson, Philip R.O. Payne, Rafael Fuentes, Randeep Jawa, Rebecca Erwin-Cohen, Rena Patel, Richard A. Moffitt, Richard L. Zhu, Rishi Kamaleswaran, Robert Hurley, Saiju Pyarajan, Sam G. Michael, Samuel Bozzette, Sandeep Mallipattu, Satyanarayana Vedula, Scott Chapman, Shawn T. O'Neil, Soko Setoguchi, Steve Johnson, Tellen D. Bennett, Tiffany Callahan, Umit Topaloglu, Usman Sheikh, Valery Gordon, Vignesh Subbian, Warren A. Kibbe, Wenndy Hernandez, Will Beasley, Will Cooper, William Hillegass, and Xiaohan Tanner Zhang. Details of contributions available at covid.cd2h.org/core-contributors.

Data partners with released data in N3C

The following institutions whose data are released or pending: Available—Advocate Health Care Network—UL1TR002389: The Institute for Translational Medicine (ITM) • Aurora Health Care Inc—UL1TR002373: Wisconsin Network For Health Research • Boston University Medical Campus—UL1TR001430: Boston University Clinical and Translational Science Institute • Brown University—U54GM115677: Advance Clinical Translational Research (Advance-CTR) • Carilion Clinic—UL1TR003015: iTHRIV Integrated Translational health Research Institute of Virginia • Case Western Reserve University—UL1TR002548: The Clinical & Translational Science Collaborative of Cleveland (CTSC) • Charleston Area Medical Center—U54GM104942: West Virginia Clinical and Translational Science Institute (WVCTSI) • Children’s Hospital Colorado—UL1TR002535: Colorado Clinical and Translational Sciences Institute • Columbia University Irving Medical Center—UL1TR001873: Irving Institute for Clinical and Translational Research • Dartmouth College—None (Voluntary) • Duke University—UL1TR002553: Duke Clinical and Translational Science Institute • George Washington Children’s Research Institute—UL1TR001876: Clinical and Translational Science Institute at Children’s National (CTSA-CN) • George Washington University—UL1TR001876: Clinical and Translational Science Institute at Children’s National (CTSA-CN) • Harvard Medical School—UL1TR002541: Harvard Catalyst • Indiana University School of Medicine—UL1TR002529: Indiana Clinical and Translational Science Institute • Johns Hopkins University—UL1TR003098: Johns Hopkins Institute for Clinical and Translational Research • Louisiana Public Health Institute—None (Voluntary) • Loyola Medicine—Loyola University Medical Center • Loyola University Medical Center—UL1TR002389: The Institute for Translational Medicine (ITM) • Maine Medical Center—U54GM115516: Northern New England Clinical & Translational Research (NNE-CTR) Network • Mary Hitchcock Memorial Hospital & Dartmouth Hitchcock Clinic—None (Voluntary) • Massachusetts General Brigham—UL1TR002541: Harvard Catalyst • Mayo Clinic Rochester—UL1TR002377: Mayo Clinic Center for Clinical and Translational Science (CCaTS) • Medical University of South Carolina—UL1TR001450: South Carolina Clinical & Translational Research Institute (SCTR) • MITRE Corporation—None (Voluntary) • Montefiore Medical Center—UL1TR002556: Institute for Clinical and Translational Research at Einstein and Montefiore • Nemours—U54GM104941: Delaware CTR ACCEL Program • NorthShore University HealthSystem—UL1TR002389: The Institute for Translational Medicine (ITM) • Northwestern University at Chicago—UL1TR001422: Northwestern University Clinical and Translational Science Institute (NUCATS) • OCHIN—INV-018455: Bill and Melinda Gates Foundation grant to Sage Bionetworks • Oregon Health & Science University—UL1TR002369: Oregon Clinical and Translational Research Institute • Penn State Health Milton S. Hershey Medical Center—UL1TR002014: Penn State Clinical and Translational Science Institute • Rush University Medical Center—UL1TR002389: The Institute for Translational Medicine (ITM) • Rutgers, The State University of New Jersey—UL1TR003017: New Jersey Alliance for Clinical and Translational Science • Stony Brook University—U24TR002306 • The Alliance at the University of Puerto Rico, Medical Sciences Campus—U54GM133807: Hispanic Alliance for Clinical and Translational Research (The Alliance) • The Ohio State University—UL1TR002733: Center for Clinical and Translational Science • The State University of New York at Buffalo—UL1TR001412: Clinical and Translational Science Institute • The University of Chicago—UL1TR002389: The Institute for Translational Medicine (ITM) • The University of Iowa—UL1TR002537: Institute for Clinical and Translational Science • The University of Miami Leonard M. Miller School of Medicine—UL1TR002736: University of Miami Clinical and Translational Science Institute • The University of Michigan at Ann Arbor—UL1TR002240: Michigan Institute for Clinical and Health Research • The University of Texas Health Science Center at Houston—UL1TR003167: Center for Clinical and Translational Sciences (CCTS) • The University of Texas Medical Branch at Galveston—UL1TR001439: The Institute for Translational Sciences • The University of Utah—UL1TR002538: Uhealth Center for Clinical and Translational Science • Tufts Medical Center—UL1TR002544: Tufts Clinical and Translational Science Institute • Tulane University—UL1TR003096: Center for Clinical and Translational Science • The Queens Medical Center—None (Voluntary) • University Medical Center New Orleans—U54GM104940: Louisiana Clinical and Translational Science (LA CaTS) Center • University of Alabama at Birmingham—UL1TR003096: Center for Clinical and Translational Science • University of Arkansas for Medical Sciences—UL1TR003107: UAMS Translational Research Institute • University of Cincinnati—UL1TR001425: Center for Clinical and Translational Science and Training • University of Colorado Denver, Anschutz Medical Campus—UL1TR002535: Colorado Clinical and Translational Sciences Institute • University of Illinois at Chicago—UL1TR002003: UIC Center for Clinical and Translational Science • University of Kansas Medical Center—UL1TR002366: Frontiers: University of Kansas Clinical and Translational Science Institute • University of Kentucky—UL1TR001998: UK Center for Clinical and Translational Science • University of Massachusetts Medical School Worcester—UL1TR001453: The UMass Center for Clinical and Translational Science (UMCCTS) • University Medical Center of Southern Nevada—None (voluntary) • University of Minnesota—UL1TR002494: Clinical and Translational Science Institute • University of Mississippi Medical Center—U54GM115428: Mississippi Center for Clinical and Translational Research (CCTR) •
University of Nebraska Medical Center—U54GM115458: Great Plains IDeA-Clinical & Translational Research • University of North Carolina at Chapel Hill—UL1TR002489: North Carolina Translational and Clinical Science Institute • University of Oklahoma Health Sciences Center—U54GM104938: Oklahoma Clinical and Translational Science Institute (OCTSI) • University of Pittsburgh—UL1TR001857: The Clinical and Translational Science Institute (CTSI) • University of Pennsylvania—UL1TR001878: Institute for Translational Medicine and Therapeutics • University of Rochester—UL1TR002001: UR Clinical & Translational Science Institute • University of Southern California—UL1TR001855: The Southern California Clinical and Translational Science Institute (SC CTSI) • University of Vermont—U54GM115516: Northern New England Clinical & Translational Research (NNE-CTR) Network • University of Virginia—UL1TR003015: iTHRIV Integrated Translational health Research Institute of Virginia • University of Washington—UL1TR002319: Institute of Translational Health Sciences • University of Wisconsin-Madison—UL1TR002373: UW Institute for Clinical and Translational Research • Vanderbilt University Medical Center—UL1TR002243: Vanderbilt Institute for Clinical and Translational Research • Virginia Commonwealth University—UL1TR002649: C. Kenneth and Dianne Wright Center for Clinical and Translational Research • Wake Forest University Health Sciences—UL1TR001420: Wake Forest Clinical and Translational Science Institute • Washington University in St Louis—UL1TR002345: Institute of Clinical and Translational Sciences • Weill Medical College of Cornell University—UL1TR002384: Weill Cornell Medicine Clinical and Translational Science Center • West Virginia University—U54GM104942: West Virginia Clinical and Translational Science Institute (WVCTSI). 
Submitted—Icahn School of Medicine at Mount Sinai—UL1TR001433: ConduITS Institute for Translational Sciences • The University of Texas Health Science Center at Tyler—UL1TR003167: Center for Clinical and Translational Sciences (CCTS) • University of California, Davis—UL1TR001860: UCDavis Health Clinical and Translational Science Center • University of California, Irvine—UL1TR001414: The UC Irvine Institute for Clinical and Translational Science (ICTS) • University of California, Los Angeles—UL1TR001881: UCLA Clinical Translational Science Institute • University of California, San Diego—UL1TR001442: Altman Clinical and Translational Research Institute • University of California, San Francisco—UL1TR001872: UCSF Clinical and Translational Science Institute • NYU Langone Health Clinical Science Core, Data Resource Core, and PASC Biorepository Core—OTA-21-015A: Post-Acute Sequelae of SARS-CoV-2 Infection Initiative (RECOVER). Pending—Arkansas Children’s Hospital—UL1TR003107: UAMS Translational Research Institute • Baylor College of Medicine—None (Voluntary) • Children’s Hospital of Philadelphia—UL1TR001878: Institute for Translational Medicine and Therapeutics • Cincinnati Children’s Hospital Medical Center—UL1TR001425: Center for Clinical and Translational Science and Training • Emory University—UL1TR002378: Georgia Clinical and Translational Science Alliance • HonorHealth—None (Voluntary) • Loyola University Chicago—UL1TR002389: The Institute for Translational Medicine (ITM) • Medical College of Wisconsin—UL1TR001436: Clinical and Translational Science Institute of Southeast Wisconsin • MedStar Health Research Institute—None (Voluntary) • Georgetown University—UL1TR001409: The Georgetown-Howard Universities Center for Clinical and Translational Science (GHUCCTS) • MetroHealth—None (Voluntary) • Montana State University—U54GM115371: American Indian/Alaska Native CTR • NYU Langone Medical Center—UL1TR001445: Langone Health’s Clinical and Translational Science Institute • 
Ochsner Medical Center—U54GM104940: Louisiana Clinical and Translational Science (LA CaTS) Center • Regenstrief Institute—UL1TR002529: Indiana Clinical and Translational Science Institute • Sanford Research—None (Voluntary) • Stanford University—UL1TR003142: Spectrum: The Stanford Center for Clinical and Translational Research and Education • The Rockefeller University—UL1TR001866: Center for Clinical and Translational Science • The Scripps Research Institute—UL1TR002550: Scripps Research Translational Institute • University of Florida—UL1TR001427: UF Clinical and Translational Science Institute • University of New Mexico Health Sciences Center—UL1TR001449: University of New Mexico Clinical and Translational Science Center • University of Texas Health Science Center at San Antonio—UL1TR002645: Institute for Integration of Medicine and Science • Yale New Haven Hospital—UL1TR001863: Yale Center for Clinical Investigation.

Author contributions

Manuscript drafting: Kellie M. Walters and Emily R. Pfaff. Development of data designs and documentation: Kellie M. Walters, Marshall Clark, Sofia Dard, Adam M. Lee, Kristin Kostka, Robert T. Miller, Michele Morris, Matvey B. Palchuk, and Emily R. Pfaff. Project management: Elizabeth Kelly. Data harmonization and quality assurance: Stephanie S. Hong, Sofia Dard, and Emily R. Pfaff. Manuscript revisions and final approval: Kellie M. Walters, Marshall Clark, Sofia Dard, Stephanie S. Hong, Elizabeth Kelly, Kristin Kostka, Adam M. Lee, Robert T. Miller, Michele Morris, Matvey B. Palchuk, and Emily R. Pfaff. The N3C Consortium and RECOVER review committees also reviewed the manuscript using their standard processes.

Supplementary material

Supplementary material is available at Journal of the American Medical Informatics Association online.

Funding

The analyses described in this publication were conducted with data or tools accessed through the National Center for Advancing Translational Sciences (NCATS) N3C Data Enclave (https://covid.cd2h.org) and the N3C Attribution & Publication Policy v 1.2-2020-08-25b, supported by NCATS Contract No. 75N95023D00001, Axle Informatics Subcontract NCATS-P00438-B, and OTA OT2HL161847 as part of the Researching COVID to Enhance Recovery (RECOVER) Initiative.

Conflicts of interest

The authors have no competing interests related to this work.

Data availability

N3C/RECOVER: The N3C Data Enclave is managed under the authority of the NIH; information can be found at https://ncats.nih.gov/n3c/resources. Enclave data are protected and can be accessed for COVID-related research with an approved (1) IRB protocol and (2) Data Use Request (DUR). Enclave and data access instructions can be found at https://covid.cd2h.org/for-researchers.

Disclaimer

The N3C Publication committee confirmed that this manuscript (msid: 2011.443) is in accordance with N3C data use and attribution policies; however, this content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the N3C program.

Institutional Review Board

The N3C data transfer to NCATS is performed under a Johns Hopkins University Reliance Protocol # IRB00249128 or individual site agreements with NIH. The N3C Data Enclave is managed under the authority of the NIH; information can be found at https://ncats.nih.gov/n3c/resources. The work was performed under DUR RP-E7676B.

References

1

Haendel MA, Chute CG, Bennett TD; N3C Consortium, et al. The National COVID Cohort Collaborative (N3C): rationale, design, infrastructure, and deployment. J Am Med Inform Assoc. 2021;28:427-443.

2

N3C Dashboard—Home. Accessed March 29, 2024. https://covid.cd2h.org/dashboard/

3

Pfaff ER, Girvin AT, Bennett TD; N3C Consortium, et al. Identifying who has long COVID in the USA: a machine learning approach using N3C data. Lancet Digit Health. 2022;4:e532-e541.

4

Khatib R, Glowacki N, Lauffenburger JC; N3C Consortium, et al. Association between the 10-year ASCVD risk score and COVID-19 complications among healthy adults (analysis from the National Cohort COVID Collaborative). Am J Cardiol. 2023;202:201-207.

5

Pfaff ER, Girvin AT, Crosskey M; N3C and RECOVER Consortia, et al. De-black-boxing health AI: demonstrating reproducible machine learning computable phenotypes using the N3C-RECOVER Long COVID model in the All of Us data repository. J Am Med Inform Assoc. 2023;30:1305-1312.

6

Lee E, Bates B, Kuhrt N; N3C Consortium, et al. National trends in anticoagulation therapy for COVID-19 hospitalized adults in the United States: analyses of the National COVID Cohort Collaborative. J Infect Dis. 2023;228:895-906.

7

Sun J, Zheng Q, Madhira V; National COVID Cohort Collaborative (N3C) Consortium, et al. Association between immune dysfunction and COVID-19 breakthrough infection after SARS-CoV-2 vaccination in the US. JAMA Intern Med. 2022;182:153-162.

8

Bennett TD, Moffitt RA, Hajagos JG; National COVID Cohort Collaborative (N3C) Consortium, et al. Clinical characterization and prediction of clinical severity of SARS-CoV-2 infection among US adults using data from the US National COVID Cohort Collaborative. JAMA Netw Open. 2021;4:e2116901.

9

Ge J, Pletcher MJ, Lai JC; N3C Consortium. Outcomes of SARS-CoV-2 infection in patients with chronic liver disease and cirrhosis: a national COVID cohort collaborative study. Gastroenterology. 2021;161:1487.e5-1501.e5.

10

Daniel G, Silcox C, Bryan J, et al. Characterizing RWD Quality and Relevancy for Regulatory Purposes. Duke-Margolis Center for Health Policy; 2018.

11

Gatto NM, Campbell UB, Rubinstein E, et al. The structured process to identify fit-for-purpose data: a data feasibility assessment framework. Clin Pharmacol Ther. 2022;111:122-134.

12

RECOVER. Accessed November 9, 2023. https://recovercovid.org/

13

Phenotype_Data_Acquisition: the repository for code and documentation produced by the N3C Phenotype and Data Acquisition workstream. GitHub. Accessed March 29, 2024. https://github.com/National-COVID-Cohort-Collaborative/Phenotype_Data_Acquisition

14

Dard S, Kim HW, Choudhury M, et al. N3C scorecards: a centralized and systematic approach to data quality. In: AMIA 2023 Annual Symposium, New Orleans, LA, November 14, 2023.

15

Phuong J, Hong S, Palchuk MB, et al. Advancing interoperability of patient-level social determinants of health data to support COVID-19 research. AMIA Jt Summits Transl Sci Proc. 2022;2022:396-405.

16

Forrest CB, McTigue KM, Hernandez AF, et al. PCORnet® 2020: current state, accomplishments, and future directions. J Clin Epidemiol. 2021;129:60-67.

17

Visweswaran S, Becich MJ, D'Itri VS, et al. Accrual to clinical trials (ACT): a clinical and translational science award consortium network. JAMIA Open. 2018;1:147-152.

18

Palchuk MB, London JW, Perez-Rey D, et al. A global federated real-world data and analytics platform for research. JAMIA Open. 2023;6:ooad035.

19

Hripcsak G, Duke JD, Shah NH, et al. Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Stud Health Technol Inform. 2015;216:574-578.

20

Pfaff ER, Girvin AT, Gabriel DL; N3C Consortium, et al. Synergies between centralized and federated approaches to data quality: a report from the national COVID cohort collaborative. J Am Med Inform Assoc. 2022;29:609-618.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact [email protected]
