Kellie M Walters, Marshall Clark, Sofia Dard, Stephanie S Hong, Elizabeth Kelly, Kristin Kostka, Adam M Lee, Robert T Miller, Michele Morris, Matvey B Palchuk, Emily R Pfaff, N3C and RECOVER Consortia, National COVID Cohort Collaborative data enhancements: a path for expanding common data models, Journal of the American Medical Informatics Association, Volume 32, Issue 2, February 2025, Pages 391–397, https://doi.org/10.1093/jamia/ocae299
Abstract
To support long COVID research in the National COVID Cohort Collaborative (N3C), the N3C Phenotype and Data Acquisition team created data designs to aid contributing sites in enhancing their data. Enhancements include a long COVID specialty clinic indicator; Admission, Discharge, and Transfer transactions; patient-level social determinants of health; and in-hospital use of oxygen supplementation.
For each enhancement, we defined the scope and wrote guidance on how to prepare and populate the data in a standardized way.
As of June 2024, 29 sites have added at least one data enhancement to their N3C pipeline.
The use of common data models is critical to the success of N3C; however, these data models cannot account for all needs. Project-driven data enhancement is required. This should be done in a standardized way in alignment with common data model specifications. Our approach offers a useful pathway for enhancing data to improve fit for purpose.
In this initiative, we rapidly produced project-specific data modeling guidance and documentation in support of long COVID research while maintaining a commitment to terminology standards and harmonized data.
Introduction
The National COVID Cohort Collaborative (N3C) is a centralized repository for COVID-related research.1 As of August 2024, N3C contains data on over 22.9 million patients from 84 contributing sites.2 Over 4300 users have requested access to the N3C enclave, and their work has collectively produced 296 publications and presentations covering a range of COVID-related research topics.2–9
Answering clinical questions using electronic health record data requires that the data are “fit for purpose.”10,11 As an example, it is not possible to study outcomes of patients on ventilators without granular data about use of ventilators. After the initial implementation of N3C, and as part of the NIH RECOVER Initiative,12 we recognized gaps that needed to be addressed to ensure the N3C data were “fit for purpose” for long COVID research questions.3 To that end, the following data enhancements were prioritized: (1) an indicator of whether a patient was seen in a long COVID specialty clinic; (2) Admission, Discharge, and Transfer (ADT) transactions; (3) patient-level social determinants of health (SDOH); and (4) in-hospital use of oxygen supplementation devices. In this paper, we describe N3C’s approach to enhancing data from contributing sites.
All N3C sites contribute clinical data to the repository using 1 of 4 common data models (CDMs): OMOP (Observational Medical Outcomes Partnership), i2b2/ACT, PCORnet, or TriNetX. Data from each CDM are harmonized centrally into the OMOP CDM, which is exposed to N3C users. The original scope included data commonly available in CDMs: demographics, encounter information, diagnoses, procedures, laboratory tests, medications, and immunizations.
Although the various CDM communities offer structure and guidance to add new data elements, it was not enough to simply ask sites to add the 4 data elements listed above. CDM specifications do not provide the level of detail necessary to ensure these new data elements would be modeled and structured in a consistent manner. If each site added these data elements without specific guidance, the resulting data would be difficult to harmonize, and it would ultimately be harder for researchers to use these data meaningfully.
We collaboratively developed and disseminated Data Designs: documentation detailing the definition, scope, and structure for each data enhancement, customized for each CDM.13 These Data Designs provide guidance to contributing sites to add data enhancements in a consistent manner (with standardized structure, coding, and meaning that conforms to the rules of each CDM), while also managing the burden and effort required of sites.
Methods
The N3C Phenotype and Data Acquisition team, composed of subject matter experts (SMEs) for the 4 CDMs, was charged with supporting sites in expanding their data to include information about long COVID specialty clinics, ADT transactions, SDOH, and oxygen supplementation. The process happened quickly: Data Designs for each enhancement were produced and disseminated in about 4 months. Major steps in the data enhancement process were as follows:
Understand domains in the context of COVID research. Background research was required for each domain to determine data availability, variation across sites, and research needs.
Define scope. For each data enhancement, we needed to define the appropriate scope for the data. This required balancing the research needs with minimizing burden on contributing sites.
Write CDM-agnostic guidance. Each Data Design begins with a definition of the data enhancement, a description of why it is important, and scoping guidance. For some data enhancements, we provide additional guidance on sourcing data and mapping.
Write CDM-specific guidance for structuring each data enhancement. To ensure data were submitted in a standardized manner, we drafted guidance detailing how contributing sites should add data to their CDMs. This included specifying target tables, field mappings, and values, in accordance with the rules of each CDM. Figure 1 shows examples used in the CDM-specific guidance.
Dissemination and support. Data Designs are posted on the N3C GitHub13 and were shared with participating sites via webinar and email. The Phenotype and Data Acquisition team supported sites through challenges via consultations and Slack.
Harmonization and data quality. The N3C ETL (extract, transform, and load) process was updated to include the data enhancements. As part of N3C’s data quality assessment, sites receive feedback through regularly distributed Data Quality Scorecards.14 Data enhancements were incorporated into these scorecards.

Figure 1. Sample representation of data enhancements in OMOP. In the documentation for participating sites, we provided detailed instructions, along with examples, on how each data enhancement should be coded and structured in each CDM. Here, we show the OMOP examples for each data enhancement. Each block represents a table in OMOP. The first row includes relevant OMOP fields, and the second row shows sample data modeled according to our instructions. The data shown here are fictitious. ADT = Admission, Discharge, and Transfer; ICU = intensive care unit; OMOP = Observational Medical Outcomes Partnership; PASC = Post-acute sequelae of COVID-19.
While we had overarching principles for each data design, the approach, effort, and level of information required varied considerably across data enhancements. Next, we discuss some of these unique aspects of each data enhancement.
Long COVID clinic
We asked sites to identify visits to their long COVID specialty clinic (if applicable) and map that visit type to a standard code. Although this data enhancement characterizes a visit, we designed this item such that it would be represented in the OMOP OBSERVATION table, rather than the VISIT_OCCURRENCE table, so as not to override existing visit type data. At the time, there was no established OMOP concept ID for “long COVID clinic,” so we created a custom OMOP concept ID (2004207791) to represent long COVID clinic. OMOP conventions allow the creation of custom codes where there are no existing standards. Using ID numbers greater than 2 billion makes collisions with other custom codes unlikely.
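As a rough illustration of the design described above, the following Python sketch builds an OBSERVATION-style record that flags a long COVID clinic visit without modifying the visit record itself. The concept ID and the 2 billion threshold come from the text; the function names, the choice of fields, and the sample values are assumptions made for illustration and are not taken from the Data Design.

```python
from datetime import date

# Custom concept ID reported in the text for "long COVID clinic".
LONG_COVID_CLINIC_CONCEPT_ID = 2004207791
# Threshold described in the text: custom IDs sit above 2 billion.
CUSTOM_CONCEPT_ID_FLOOR = 2_000_000_000


def is_custom_concept_id(concept_id: int) -> bool:
    """Return True when the ID falls in the range used for custom codes."""
    return concept_id > CUSTOM_CONCEPT_ID_FLOOR


def long_covid_clinic_observation(person_id: int, visit_occurrence_id: int,
                                  visit_date: date) -> dict:
    """Build one OBSERVATION-style row flagging a long COVID clinic visit.

    The field selection is an assumption for illustration. The existing
    visit row is left untouched and is only referenced by its ID.
    """
    return {
        "person_id": person_id,
        "observation_concept_id": LONG_COVID_CLINIC_CONCEPT_ID,
        "observation_date": visit_date,
        "visit_occurrence_id": visit_occurrence_id,
    }


# Fictitious sample values.
row = long_covid_clinic_observation(1, 100, date(2022, 3, 14))
```

Keeping the flag in a separate record means the original visit type stays available to any analysis that already depends on it.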
ADT transaction data
ADT transactions track a patient's movement during a hospitalization. While defining the scope, we recognized that mapping every hospital department to a standard would be onerous for sites and that COVID researchers were most interested in patients who received care in emergency and critical care settings. Therefore, we requested sites map transactions to OMOP concept IDs 8870 (emergency room) or 581379 (critical care) while all other transactions could be mapped to a “catch-all” inpatient concept ID (8717). We further limited the scope to just “transfer in” transactions, cutting down an immense volume of transfers out, attending changes, bed changes, etc.
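The scoping rule above can be sketched in a few lines of Python. The three concept IDs are those named in the text; the event and location labels ("TRANSFER_IN", "ED", "ICU", and so on) are hypothetical stand-ins for whatever a site's source system uses, and the function is an illustration rather than the guidance issued to sites.

```python
from typing import Optional

# Concept IDs named in the text.
EMERGENCY_ROOM = 8870
CRITICAL_CARE = 581379
INPATIENT_CATCH_ALL = 8717

# Hypothetical local location labels; real source systems will differ.
LOCATION_TO_CONCEPT = {"ED": EMERGENCY_ROOM, "ICU": CRITICAL_CARE}


def map_adt_transaction(event_type: str, location_type: str) -> Optional[int]:
    """Map one ADT transaction to a concept ID, or None if out of scope.

    Only "transfer in" events are in scope. Locations other than the
    emergency room or critical care fall back to the inpatient concept.
    """
    if event_type != "TRANSFER_IN":
        return None
    return LOCATION_TO_CONCEPT.get(location_type, INPATIENT_CATCH_ALL)


# Fictitious transactions: (event type, location type).
transactions = [
    ("TRANSFER_IN", "ED"),
    ("BED_CHANGE", "ICU"),
    ("TRANSFER_IN", "ICU"),
    ("TRANSFER_OUT", "ED"),
    ("TRANSFER_IN", "MED_SURG"),
]
mapped = [
    concept_id
    for concept_id in (map_adt_transaction(e, loc) for e, loc in transactions)
    if concept_id is not None
]
```

Here `mapped` keeps only the three transfer-in events, so bed changes and transfers out never reach the submitted data.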
Social determinants of health
SDOH data can be collected or derived in a number of ways: patient-reported measures, diagnosis codes, or linkage to area-level socio-economic data. Our scope was limited to patient-reported measures because the latter 2 items were already supported in N3C. To define the scope and approach for patient-reported SDOH, we conducted a brief landscape analysis, consulted with the N3C SDOH domain team,15 and closely analyzed available SDOH at an institution. We identified 6 domains that were of interest and appeared to be routinely collected: housing status, food insecurity, financial resource strain, transportation, social connectedness, and stress. Although variability exists, many sites ask SDOH questions in a semi-standardized format, which can be readily mapped to LOINC, a standardized vocabulary for labs and clinical observations. To aid sites in the mapping process, we created a shared spreadsheet with mappings from standardized Epic SDOH questions to LOINC codes. The spreadsheet also allowed sites to document additional mappings and unmappable SDOH questions (see the Supplementary Material).
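The role of the shared spreadsheet can be illustrated with a small lookup that separates mappable questions from unmappable ones. In this sketch the target values are OMOP concept IDs listed in Table 1; the local question keys and the function name are hypothetical, and the actual spreadsheet layout is not reproduced here.

```python
# Targets are concept IDs listed in Table 1; the local keys are hypothetical.
QUESTION_TO_CONCEPT = {
    "sdoh_housing_status": 42869557,     # Housing status
    "sdoh_transport_barrier": 37020730,  # Lack of transportation question
    "sdoh_financial_strain": 46234789,   # Difficulty paying for basics question
}


def map_sdoh_questions(local_questions):
    """Split local question keys into mapped and unmappable groups.

    Returns (mapped, unmappable): a dict of key -> concept ID, and a list
    of keys with no known target that a site would document separately.
    """
    mapped, unmappable = {}, []
    for question in local_questions:
        concept_id = QUESTION_TO_CONCEPT.get(question)
        if concept_id is None:
            unmappable.append(question)
        else:
            mapped[question] = concept_id
    return mapped, unmappable


mapped, unmappable = map_sdoh_questions(
    ["sdoh_housing_status", "sdoh_pet_ownership", "sdoh_transport_barrier"]
)
```

Recording the unmappable keys, rather than dropping them silently, mirrors the spreadsheet's provision for documenting questions that had no suitable code.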
In-hospital use of oxygen supplementation devices
Initially, we planned to include detailed measures captured in oxygen supplementation flowsheets. Exploration within one site’s source data revealed approximately 130 variables related to oxygen supplementation (many of which were ambiguously named). A successful effort to map these numerous variables and values to a standard terminology would require involvement of an intensive care clinician, and we could not assume such expertise was available at each site. Given these challenges, we engaged with clinician researchers in N3C to understand what the most important data elements were and subsequently determined that the most pressing need was to understand the type of oxygen supplementation received (eg, high flow nasal cannula, ventilator). Next, we identified SNOMED as the standardized vocabulary to capture these data, and, in consultation with a clinician, created a list of 31 SNOMED codes for sites to use. We also identified a need to create custom codes for “room air” and other oxygen devices not mappable to the selected SNOMED codes.
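A minimal sketch of this mapping approach follows, assuming a site has free-text device labels in its flowsheets. The concept IDs are taken from Table 1, including the two custom N3C codes; the local label strings, the normalization step, and the function name are assumptions for illustration, and only a few of the devices are shown.

```python
# Custom N3C concept IDs from Table 1.
ROOM_AIR = 2004208005
OTHER_OXYGEN_DEVICE = 2004208004

# A few device concepts from Table 1; the local label strings are hypothetical.
DEVICE_TO_CONCEPT = {
    "high flow nasal cannula": 4139525,  # High flow oxygen nasal cannula
    "nasal cannula": 45760842,           # Basic nasal oxygen cannula
    "ventilator": 45768197,              # Ventilator
    "venturi mask": 4322904,             # Venturi mask
}


def map_oxygen_device(raw_value: str) -> int:
    """Map a local device label to a concept ID.

    Labels are lowercased and whitespace-normalized before lookup.
    Anything unrecognized falls back to the "other oxygen device" code.
    """
    key = " ".join(raw_value.lower().split())
    if key == "room air":
        return ROOM_AIR
    return DEVICE_TO_CONCEPT.get(key, OTHER_OXYGEN_DEVICE)
```

For example, `map_oxygen_device("Room Air")` resolves to the custom room air code, while a label that matches nothing in the lookup resolves to the custom "other oxygen device" code instead of being discarded.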
Both the sites leading the data designs and contributing sites received funding from a mix of sources including RECOVER and the National Center for Data to Health (CD2H). Some additional sites voluntarily added data enhancements without supplemental funds.
Results
For each enhancement, we produced and published a Data Design document on GitHub.13 Each Data Design includes a description of the data element, relevance to N3C, and scope; guidance on how to find data within source systems; and CDM-specific guidance for structuring data. As of June 2024, 29 sites have added at least one data enhancement to their N3C pipeline; 9 sites added long COVID clinic data, 29 sites added ADT transactions, 24 sites added SDOH data, and 26 sites added oxygen supplementation data. Table 1 characterizes the scale and breadth of data enhancements available within N3C. Notably, all these variables are mapped to common concept IDs. This is the value of our approach: these data are not only available, they are also harmonized and thus ready for analysis.
Concept ID | Concept name | Unique patient count | Row count |
---|---|---|---|
Long COVID clinic visits | |||
2004207791 | Long COVID clinic visit (Custom concept) | 20 616 | 66 152 |
ADT transactions | |||
581379 | Inpatient critical care facility | 557 579 | 1 974 865 |
8870 | Emergency room—hospital | 3 211 433 | 18 840 284 |
8717 | Inpatient hospital | 2 607 456 | 42 174 447 |
Social determinants of health | |||
37020730 | Has lack of transportation kept you from medical appointments, meetings, work, or from getting things needed for daily living | 1 487 100 | 5 107 339 |
42869557 | Housing status | 5239 | 10 085 |
37020172 | Are you worried about losing your housing [PRAPARE] | 173 614 | 333 074 |
1617701 | Has the electric, gas, oil, or water company threatened to shut off services in your home in past 12 months | 57 662 | 660 560 |
46234789 | How hard is it for you to pay for the very basics like food, housing, medical care, and heating | 1 052 412 | 2 700 296 |
36304041 | Within the past 12 months we worried whether our food would run out before we got money to buy more [U.S. FSS] | 1 520 247 | 4 272 580 |
36306143 | Within the past 12 months the food we bought just didn't last and we didn't have money to get more [U.S. FSS] | 1 594 609 | 3 779 441 |
46234787 | Do you belong to any clubs or organizations such as church groups unions, fraternal or athletic groups, or school groups [NHANES III] | 623 292 | 1 708 888 |
O2 devices | |||
4145694 | Aerosol oxygen mask | 177 679 | 548 016 |
45762031 | Aerosol tent, adult | 1275 | 4104 |
45764548 | Aerosol tent, pediatric | 39 | 39 |
4160626 | Ambu bag | 18 030 | 31 180 |
45760842 | Basic nasal oxygen cannula | 1 552 544 | 92 742 933 |
4138614 | BiPAP oxygen nasal cannula | 63 863 | 7 379 447 |
4137849 | Blow by oxygen mask | 89 028 | 319 210 |
4243754 | Continuing positive airway pressure unit | 91 470 | 1 093 891 |
45768222 | Continuous positive airway pressure/Bilevel positive airway pressure mask | 52 482 | 844 593 |
45761494 | CPAP nasal oxygen cannula | 37 162 | 623 194 |
4138487 | Face tent oxygen delivery device | 98 846 | 280 029 |
4139525 | High flow oxygen nasal cannula | 190 445 | 24 223 271 |
2004208004 | N3C: Other oxygen device | 562 232 | 54 250 699 |
2004208005 | N3C: Room air | 2 117 944 | 276 653 221 |
4145528 | Nonrebreather oxygen mask | 197 341 | 1 829 502 |
45771595 | Non-rebreathing oxygen face mask | 90 | 138 |
45759373 | Oxygen administration face tent | 2355 | 183 622 |
4222966 | Oxygen mask | 366 970 | 1 399 713 |
36715214 | Oxygen mustache cannula | 3063 | 14 296 |
36715213 | Oxygen pendant cannula | 4626 | 8454 |
36715212 | Oxygen reservoir cannula | 6660 | 58 914 |
4138916 | Oxygen ventilator | 293 627 | 10 260 319 |
4145529 | Oxyhood | 2384 | 12 587 |
4138748 | Partial rebreather oxygen mask | 999 | 2600 |
45759146 | Partial-rebreathing oxygen face mask | 25 253 | 34 185 |
4188569 | T piece with bag | 704 | 5185 |
4188570 | T piece without bag | 4660 | 51 541 |
45759811 | Tracheostomy mask, aerosol | 4354 | 806 323 |
45760219 | Tracheostomy mask, oxygen | 13 057 | 1 680 369 |
4144319 | Transtracheal oxygen catheter | 4330 | 11 510 |
45768197 | Ventilator | 154 139 | 12 342 695 |
4322904 | Venturi mask | 33 889 | 614 863 |
45759930 | Venturi oxygen face mask | 5666 | 13 046 |
ADT = Admission, Discharge, and Transfer; BiPAP = bilevel positive airway pressure; CPAP = continuous positive airway pressure; N3C = National COVID Cohort Collaborative.
Discussion
CDMs significantly increase our ability to integrate health data across institutions,16–19 and they have been critical to the success of N3C.20 However, the focus of CDMs is necessarily on the most common use cases and data elements, such as demographics, diagnoses, procedures, medications, and labs. It is not practical to expect CDM developers to anticipate every use case and data element that may be required. Yet when studies require data not available in the CDMs, the potential of either data federation or data centralization cannot be fully realized. We propose collaborative modeling of project-specific data enhancements, like the approach outlined here, as a pragmatic solution to this problem. Such an approach can benefit not only the study at hand, but also future studies that use CDM data. For example, one site uses enhanced SDOH data to support other ongoing studies. In this discussion, we identify factors for success and potential drawbacks.
Factors for success
Get the right people in the room. Researchers, with knowledge of the data needs, and informaticians, with expertise in the CDMs, must be engaged. Moreover, those with CDM expertise should be available to guide sites through challenges.
Conform to CDM standards. Data designs should comply with CDM requirements and best practices so that the data can be readily re-used in other studies.
Use standardized vocabularies where possible. This will ensure the meaning of the data is understandable to researchers on current and future projects.
Create clear, accessible documentation. Easy-to-understand documentation should be readily available to all contributing sites and users of the data.
Funding is a necessity. This effort requires time and expertise; both lead and contributing sites should be funded appropriately.
Build in quality checks. The accuracy of the data relied on sites carefully following the specifications, which did not always happen. In some cases, after receiving data, we needed to review guidance with sites. They were then able to make modifications and submit data that conformed with the specifications.
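One simple form such a quality check could take is a conformance test that counts submitted values falling outside the set a specification allows. The sketch below uses the three ADT concept IDs named earlier in the paper as the allowed set; the function name and the sample submission are hypothetical, and this is not a description of how the N3C Data Quality Scorecards are computed.

```python
from collections import Counter

# Allowed ADT concept IDs, as named in the ADT transaction guidance.
ALLOWED_ADT_CONCEPTS = {8870, 581379, 8717}


def nonconforming_counts(submitted_concept_ids, allowed):
    """Count each submitted concept ID that is not in the allowed set."""
    return Counter(cid for cid in submitted_concept_ids if cid not in allowed)


# Fictitious submission; 12345 and 0 are arbitrary out-of-specification values.
submitted = [8870, 8717, 12345, 581379, 12345, 0]
issues = nonconforming_counts(submitted, ALLOWED_ADT_CONCEPTS)
```

An empty result would indicate that every submitted value conforms; a non-empty one gives a site a concrete list of values to revisit against the guidance.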
Drawbacks and further considerations
This approach offers a lot of promise to support the future of CDMs, but it has its drawbacks. Because this model is driven by specific use cases, there is a risk of the data not being generalizable to other studies. This can be mitigated if data modelers consider this and attempt to make their guidance broadly applicable. This approach could also lead to the proliferation of conflicting guidelines. A centralized catalog of project-specific CDM modeling guidance would alleviate this. Finally, this effort is time-intensive for both the data designers and the data contributors, and adequate funding may not be available in typical study budgets. Because of this, project-specific enhancements should not be the only approach for expanding CDMs: there is still a place for data networks to dedicate effort toward data modeling and expansion in high-need, cross-cutting areas.
Conclusion
In this initiative, we rapidly produced project-specific data modeling guidance and documentation in support of long COVID research while maintaining a commitment to terminology standards and harmonized data. By engaging researchers, we were able to provide useful data while limiting undue burden on contributing sites. Our approach has allowed sites to enhance both N3C and their local CDMs. In order to support reusability, involvement of SMEs from the CDMs was critical to ensure the guidance conformed to CDM best practices. The use of standardized vocabularies, detailed documentation, and compliance with that documentation further contributes to reusability.
Acknowledgments
This research was possible because of the patients whose information is included within the data and the organizations (https://ncats.nih.gov/n3c/resources/data-contribution/data-transfer-agreement-signatories) and scientists who have contributed to the on-going development of this community resource (https://doi.org/10.1093/jamia/ocaa196).
Individual acknowledgements for N3C core contributors
We gratefully acknowledge the following core contributors to N3C: Adam B. Wilcox, Alexis Graves, Alfred (Jerrod) Anzalone, Amin Manna, Amit Saha, Amy Olex, Andrea Zhou, Andrew E. Williams, Andrew Southerland, Andrew T. Girvin, Anita Walden, Anjali A. Sharathkumar, Benjamin Amor, Benjamin Bates, Brian Hendricks, Brijesh Patel, Caleb Alexander, Carolyn Bramante, Cavin Ward-Caviness, Charisse Madlock-Brown, Christine Suver, Christopher Chute, Christopher Dillon, Chunlei Wu, Clare Schmitt, Cliff Takemoto, Dan Housman, Davera Gabriel, David A. Eichmann, Diego Mazzotti, Don Brown, Eilis Boudreau, Elaine Hill, Elizabeth Zampino, Emily Carlson Marti, Evan French, Farrukh M Koraishy, Federico Mariona, Fred Prior, George Sokos, Greg Martin, Harold Lehmann, Heidi Spratt, Hemalkumar Mehta, Hongfang Liu, Hythem Sidky, J.W. Awori Hayanga, Jami Pincavitch, Jaylyn Clark, Jeremy Richard Harper, Jessica Islam, Jin Ge, Joel Gagnier, Joel H. Saltz, Joel Saltz, Johanna Loomba, John Buse, Jomol Mathew, Joni L. Rutter, Julie A. McMurry, Justin Guinney, Justin Starren, Karen Crowley, Katie Rebecca Bradwell, Ken Wilkins, Kenneth R. Gersing, Kenrick Dwain Cato, Kimberly Murray, Lavance Northington, Lee Allan Pyles, Leonie Misquitta, Lesley Cottrell, Lili Portilla, Mariam Deacy, Mark M. Bissell, Mary Emmett, Mary Morrison Saltz, Melissa A. Haendel, Meredith Adams, Meredith Temple-O'Connor, Michael G. Kurilla, Nabeel Qureshi, Nasia Safdar, Nicole Garbarini, Noha Sharafeldin, Ofer Sadan, Patricia A. Francis, Penny Wung Burgoon, Peter Robinson, Philip R.O. Payne, Rafael Fuentes, Randeep Jawa, Rebecca Erwin-Cohen, Rena Patel, Richard A. Moffitt, Richard L. Zhu, Rishi Kamaleswaran, Robert Hurley, Saiju Pyarajan, Sam G. Michael, Samuel Bozzette, Sandeep Mallipattu, Satyanarayana Vedula, Scott Chapman, Shawn T. O'Neil, Soko Setoguchi, Steve Johnson, Tellen D. Bennett, Tiffany Callahan, Umit Topaloglu, Usman Sheikh, Valery Gordon, Vignesh Subbian, Warren A. Kibbe, Wenndy Hernandez, Will Beasley, Will Cooper, William Hillegass, and Xiaohan Tanner Zhang. Details of contributions available at covid.cd2h.org/core-contributors.
Data partners with released data in N3C
The following institutions whose data are released or pending: Available—Advocate Health Care Network—UL1TR002389: The Institute for Translational Medicine (ITM) • Aurora Health Care Inc—UL1TR002373: Wisconsin Network For Health Research • Boston University Medical Campus—UL1TR001430: Boston University Clinical and Translational Science Institute • Brown University—U54GM115677: Advance Clinical Translational Research (Advance-CTR) • Carilion Clinic—UL1TR003015: iTHRIV Integrated Translational health Research Institute of Virginia • Case Western Reserve University—UL1TR002548: The Clinical & Translational Science Collaborative of Cleveland (CTSC) • Charleston Area Medical Center—U54GM104942: West Virginia Clinical and Translational Science Institute (WVCTSI) • Children’s Hospital Colorado—UL1TR002535: Colorado Clinical and Translational Sciences Institute • Columbia University Irving Medical Center—UL1TR001873: Irving Institute for Clinical and Translational Research • Dartmouth College—None (Voluntary) Duke University—UL1TR002553: Duke Clinical and Translational Science Institute • George Washington Children’s Research Institute—UL1TR001876: Clinical and Translational Science Institute at Children’s National (CTSA-CN) • George Washington University—UL1TR001876: Clinical and Translational Science Institute at Children’s National (CTSA-CN) • Harvard Medical School—UL1TR002541: Harvard Catalyst • Indiana University School of Medicine—UL1TR002529: Indiana Clinical and Translational Science Institute • Johns Hopkins University—UL1TR003098: Johns Hopkins Institute for Clinical and Translational Research • Louisiana Public Health Institute—None (Voluntary) • Loyola Medicine—Loyola University Medical Center • Loyola University Medical Center—UL1TR002389: The Institute for Translational Medicine (ITM) • Maine Medical Center—U54GM115516: Northern New England Clinical & Translational Research (NNE-CTR) Network • Mary Hitchcock Memorial Hospital & Dartmouth Hitchcock 
Clinic—None (Voluntary) • Massachusetts General Brigham—UL1TR002541: Harvard Catalyst • Mayo Clinic Rochester—UL1TR002377: Mayo Clinic Center for Clinical and Translational Science (CCaTS) • Medical University of South Carolina—UL1TR001450: South Carolina Clinical & Translational Research Institute (SCTR) • MITRE Corporation—None (Voluntary) • Montefiore Medical Center—UL1TR002556: Institute for Clinical and Translational Research at Einstein and Montefiore • Nemours—U54GM104941: Delaware CTR ACCEL Program • NorthShore University HealthSystem—UL1TR002389: The Institute for Translational Medicine (ITM) • Northwestern University at Chicago—UL1TR001422: Northwestern University Clinical and Translational Science Institute (NUCATS) • OCHIN—INV-018455: Bill and Melinda Gates Foundation grant to Sage Bionetworks • Oregon Health & Science University—UL1TR002369: Oregon Clinical and Translational Research Institute • Penn State Health Milton S. Hershey Medical Center—UL1TR002014: Penn State Clinical and Translational Science Institute • Rush University Medical Center—UL1TR002389: The Institute for Translational Medicine (ITM) • Rutgers, The State University of New Jersey—UL1TR003017: New Jersey Alliance for Clinical and Translational Science • Stony Brook University—U24TR002306 • The Alliance at the University of Puerto Rico, Medical Sciences Campus—U54GM133807: Hispanic Alliance for Clinical and Translational Research (The Alliance) • The Ohio State University—UL1TR002733: Center for Clinical and Translational Science • The State University of New York at Buffalo—UL1TR001412: Clinical and Translational Science Institute • The University of Chicago—UL1TR002389: The Institute for Translational Medicine (ITM) • The University of Iowa—UL1TR002537: Institute for Clinical and Translational Science • The University of Miami Leonard M. 
Miller School of Medicine—UL1TR002736: University of Miami Clinical and Translational Science Institute • The University of Michigan at Ann Arbor—UL1TR002240: Michigan Institute for Clinical and Health Research • The University of Texas Health Science Center at Houston—UL1TR003167: Center for Clinical and Translational Sciences (CCTS) • The University of Texas Medical Branch at Galveston—UL1TR001439: The Institute for Translational Sciences • The University of Utah—UL1TR002538: Uhealth Center for Clinical and Translational Science • Tufts Medical Center—UL1TR002544: Tufts Clinical and Translational Science Institute • Tulane University—UL1TR003096: Center for Clinical and Translational Science • The Queens Medical Center—None (Voluntary) • University Medical Center New Orleans—U54GM104940: Louisiana Clinical and Translational Science (LA CaTS) Center • University of Alabama at Birmingham—UL1TR003096: Center for Clinical and Translational Science • University of Arkansas for Medical Sciences—UL1TR003107: UAMS Translational Research Institute • University of Cincinnati—UL1TR001425: Center for Clinical and Translational Science and Training • University of Colorado Denver, Anschutz Medical Campus—UL1TR002535: Colorado Clinical and Translational Sciences Institute • University of Illinois at Chicago—UL1TR002003: UIC Center for Clinical and Translational Science • University of Kansas Medical Center—UL1TR002366: Frontiers: University of Kansas Clinical and Translational Science Institute • University of Kentucky—UL1TR001998: UK Center for Clinical and Translational Science • University of Massachusetts Medical School Worcester—UL1TR001453: The UMass Center for Clinical and Translational Science (UMCCTS) • University Medical Center of Southern Nevada—None (voluntary) • University of Minnesota—UL1TR002494: Clinical and Translational Science Institute • University of Mississippi Medical Center—U54GM115428: Mississippi Center for Clinical and Translational Research (CCTR) • 
University of Nebraska Medical Center—U54GM115458: Great Plains IDeA-Clinical & Translational Research • University of North Carolina at Chapel Hill—UL1TR002489: North Carolina Translational and Clinical Science Institute • University of Oklahoma Health Sciences Center—U54GM104938: Oklahoma Clinical and Translational Science Institute (OCTSI) • University of Pittsburgh—UL1TR001857: The Clinical and Translational Science Institute (CTSI) • University of Pennsylvania—UL1TR001878: Institute for Translational Medicine and Therapeutics • University of Rochester—UL1TR002001: UR Clinical & Translational Science Institute • University of Southern California—UL1TR001855: The Southern California Clinical and Translational Science Institute (SC CTSI) • University of Vermont—U54GM115516: Northern New England Clinical & Translational Research (NNE-CTR) Network • University of Virginia—UL1TR003015: iTHRIV Integrated Translational health Research Institute of Virginia • University of Washington—UL1TR002319: Institute of Translational Health Sciences • University of Wisconsin-Madison—UL1TR002373: UW Institute for Clinical and Translational Research • Vanderbilt University Medical Center—UL1TR002243: Vanderbilt Institute for Clinical and Translational Research • Virginia Commonwealth University—UL1TR002649: C. Kenneth and Dianne Wright Center for Clinical and Translational Research • Wake Forest University Health Sciences—UL1TR001420: Wake Forest Clinical and Translational Science Institute • Washington University in St Louis—UL1TR002345: Institute of Clinical and Translational Sciences • Weill Medical College of Cornell University—UL1TR002384: Weill Cornell Medicine Clinical and Translational Science Center • West Virginia University—U54GM104942: West Virginia Clinical and Translational Science Institute (WVCTSI). 
Submitted—Icahn School of Medicine at Mount Sinai—UL1TR001433: ConduITS Institute for Translational Sciences • The University of Texas Health Science Center at Tyler—UL1TR003167: Center for Clinical and Translational Sciences (CCTS) • University of California, Davis—UL1TR001860: UCDavis Health Clinical and Translational Science Center • University of California, Irvine—UL1TR001414: The UC Irvine Institute for Clinical and Translational Science (ICTS) • University of California, Los Angeles—UL1TR001881: UCLA Clinical Translational Science Institute • University of California, San Diego—UL1TR001442: Altman Clinical and Translational Research Institute • University of California, San Francisco—UL1TR001872: UCSF Clinical and Translational Science Institute • NYU Langone Health Clinical Science Core, Data Resource Core, and PASC Biorepository Core—OTA-21-015A: Post-Acute Sequelae of SARS-CoV-2 Infection Initiative (RECOVER). Pending—Arkansas Children’s Hospital—UL1TR003107: UAMS Translational Research Institute • Baylor College of Medicine—None (Voluntary) • Children’s Hospital of Philadelphia—UL1TR001878: Institute for Translational Medicine and Therapeutics • Cincinnati Children’s Hospital Medical Center—UL1TR001425: Center for Clinical and Translational Science and Training • Emory University—UL1TR002378: Georgia Clinical and Translational Science Alliance • HonorHealth—None (Voluntary) • Loyola University Chicago—UL1TR002389: The Institute for Translational Medicine (ITM) • Medical College of Wisconsin—UL1TR001436: Clinical and Translational Science Institute of Southeast Wisconsin • MedStar Health Research Institute—None (Voluntary) • Georgetown University—UL1TR001409: The Georgetown-Howard Universities Center for Clinical and Translational Science (GHUCCTS) • MetroHealth—None (Voluntary) • Montana State University—U54GM115371: American Indian/Alaska Native CTR • NYU Langone Medical Center—UL1TR001445: Langone Health’s Clinical and Translational Science Institute • 
Ochsner Medical Center—U54GM104940: Louisiana Clinical and Translational Science (LA CaTS) Center • Regenstrief Institute—UL1TR002529: Indiana Clinical and Translational Science Institute • Sanford Research—None (Voluntary) • Stanford University—UL1TR003142: Spectrum: The Stanford Center for Clinical and Translational Research and Education • The Rockefeller University—UL1TR001866: Center for Clinical and Translational Science • The Scripps Research Institute—UL1TR002550: Scripps Research Translational Institute • University of Florida—UL1TR001427: UF Clinical and Translational Science Institute • University of New Mexico Health Sciences Center—UL1TR001449: University of New Mexico Clinical and Translational Science Center • University of Texas Health Science Center at San Antonio—UL1TR002645: Institute for Integration of Medicine and Science • Yale New Haven Hospital—UL1TR001863: Yale Center for Clinical Investigation.
Author contributions
Manuscript drafting: Kellie M. Walters and Emily R. Pfaff. Development of data designs and documentation: Kellie M. Walters, Marshall Clark, Sofia Dard, Adam M. Lee, Kristin Kostka, Robert T. Miller, Michele Morris, Matvey B. Palchuk, and Emily R. Pfaff. Project management: Elizabeth Kelly. Data harmonization and quality assurance: Stephanie S. Hong, Sofia Dard, and Emily R. Pfaff. Manuscript revisions and final approval: Kellie M. Walters, Marshall Clark, Sofia Dard, Stephanie S. Hong, Elizabeth Kelly, Kristin Kostka, Adam M. Lee, Robert T. Miller, Michele Morris, Matvey B. Palchuk, and Emily R. Pfaff. The N3C Consortium and RECOVER review committees also reviewed the manuscript using their standard processes.
Supplementary material
Supplementary material is available at Journal of the American Medical Informatics Association online.
Funding
The analyses described in this publication were conducted with data or tools accessed through the National Center for Advancing Translational Sciences (NCATS) N3C Data Enclave (https://covid.cd2h.org) and N3C Attribution & Publication Policy v 1.2-2020-08-25b, supported by NCATS Contract No. 75N95023D00001, Axle Informatics Subcontract: NCATS-P00438-B, and OTA OT2HL161847 as part of the Researching COVID to Enhance Recovery (RECOVER) Initiative.
Conflicts of interest
The authors have no competing interests related to this work.
Data availability
N3C/RECOVER: The N3C Data Enclave is managed under the authority of the NIH; information can be found at https://ncats.nih.gov/n3c/resources. Enclave data are protected and can be accessed for COVID-related research with an approved (1) IRB protocol and (2) Data Use Request (DUR). Enclave and data access instructions can be found at https://covid.cd2h.org/for-researchers.
Disclaimer
The N3C Publication committee confirmed that this manuscript (MSID: 2011.443) is in accordance with N3C data use and attribution policies; however, this content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the N3C program.
Institutional Review Board
The N3C data transfer to NCATS is performed under a Johns Hopkins University Reliance Protocol # IRB00249128 or individual site agreements with NIH. The N3C Data Enclave is managed under the authority of the NIH; information can be found at https://ncats.nih.gov/n3c/resources. The work was performed under DUR RP-E7676B.
References
RECOVER. Accessed November 9, 2023. https://recovercovid.org/