Anibal García-Sempere, Alejandro Orrico-Sánchez, Cintia Muñoz-Quiles, Isabel Hurtado, Salvador Peiró, Gabriel Sanfélix-Gimeno, Javier Diez-Domingo, Data Resource Profile: The Valencia Health System Integrated Database (VID), International Journal of Epidemiology, Volume 49, Issue 3, June 2020, Pages 740–741e, https://doi.org/10.1093/ije/dyz266
Data resource basics
The Valencia Health System Integrated Database (VID) is a set of multiple, public, population-wide electronic databases for the Valencia Region, the fourth most populated Spanish region, with ∼5 million inhabitants and an annual birth cohort of 48 000 newborns, representing 10.7% of the Spanish population and around 1% of the European population. The VID provides exhaustive longitudinal information including sociodemographic and administrative data (sex, age, nationality, etc.), clinical (diagnoses, procedures, diagnostic tests, imaging, etc.), pharmaceutical (prescription, dispensation) and healthcare utilization data from hospital care, emergency departments, specialized care (including mental and obstetrics care), primary care and other public health services. It also includes a set of associated population databases and registries of significant care areas such as cancer, rare diseases, vaccines, congenital anomalies, microbiology and others, and also public health databases from the population screening programmes. All the information in the VID databases can be linked at the individual level through a single personal identification code. The databases were initiated at different moments in time (see details in the Data collected section), but all in all the VID provides comprehensive individual-level data fed by all the databases from 2008 to date.
The VID in the context of the Spanish National Health System
The Spanish National Health System (SNHS) is the result of a system consolidation process that started in 1978 and led to nearly universal coverage of all citizens, providing care based on need and free at the point of delivery, except for a cost-sharing scheme for pharmaceuticals dispensed out of hospitals.1 Care delivery is mainly undertaken through a network of publicly owned, staffed and operated inpatient and outpatient centres. In 2002 a process of devolution to the 17 regions that comprise the Spanish state was completed. Each regional NHS is geographically organized into Primary Healthcare Districts (around 5000–25 000 people served by one Primary Care Centre), which in turn are embedded into Healthcare Departments (about 150 000–250 000 people served by one public hospital). Each regional NHS develops and operates its own information systems, and the development of real-world data (RWD) research capabilities is heterogeneous across regions. The Valencia Health System (VHS) in the region of Valencia is among the best in terms of data availability and the capacity to link databases at a population level.
Data collected
Data are sourced from a variety of datasets owned by the Health Department of the Valencia Regional Government. All data included in the databases can be obtained at the individual level. The type of available data, the measurements collected and the update frequency differ for each dataset. The main characteristics of each dataset are described below and in Fig. 1.

Figure 1. The Valencia Health System Integrated Database (VID). VIS, Vaccine Information System; RedMIVA, Microbiological Surveillance System; CIS, Cancer Information System; SIER-CV, Rare Diseases Information System; CAR, Congenital Abnormalities Registry; BIMCV, Medical Image Bank; CRC, Catalogue of Corporate Resources; MBDS, Minimum Basic hospital Data Set; AED, Accident & Emergency Department record; GAIA, Pharmaceutical Module; SIA, Ambulatory Information System.
The Population Information System (Sistema de información Poblacional, SIP) is a region-wide database that provides basic information on VHS coverage (dates and causes of VHS entitlement or disentitlement, insurance modality, pharmaceutical copayment status, assigned Healthcare Department, Primary Healthcare District and primary care doctor, etc.) and also some sociodemographic data such as sex, date of birth, nationality, country of origin, previous year income strata, employment status, risk of social exclusion, geographic location, address and other administrative data. Importantly, the SIP database includes the date of death captured from the Mortality Registry. The SIP database is paramount to the VID as it is the source of the individual, exclusive and permanent identifier number associated to each individual (the SIP number) that is then used throughout the rest of the databases, allowing data linkage across the multiple databases in the network (see Fig. 1).
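The role of the SIP number as a common key can be illustrated with a minimal Python sketch. The field name `sip`, the record layouts and the function `link_by_sip` are hypothetical and do not reflect the actual VID schema; the sketch only shows how records from separate datasets can be grouped under one person-level identifier.

```python
def link_by_sip(population, **datasets):
    """Group records from several datasets under each person's SIP number."""
    # Start from the population file so that every covered person appears,
    # even those with no records in a given dataset.
    linked = {
        person["sip"]: {"person": person, **{name: [] for name in datasets}}
        for person in population
    }
    for name, records in datasets.items():
        for record in records:
            entry = linked.get(record["sip"])
            if entry is not None:  # skip records with no matching person
                entry[name].append(record)
    return linked


# Toy data with invented identifiers and codes (assumptions for illustration).
population = [
    {"sip": "A001", "sex": "F", "birth_year": 1950},
    {"sip": "A002", "sex": "M", "birth_year": 1982},
]
admissions = [{"sip": "A001", "admitted": "2016-03-02", "main_dx": "I21"}]
dispensations = [
    {"sip": "A001", "atc": "C10AA05", "date": "2016-03-10"},
    {"sip": "A002", "atc": "R03AC02", "date": "2016-04-01"},
]

linked = link_by_sip(population, mbds=admissions, gaia=dispensations)
```

In this toy example, person `A001` ends up with one admission and one dispensation, whereas `A002` has a dispensation and an empty list of admissions.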
The Ambulatory Medical Record (ABUCASIS) was implemented in 2006 as the electronic medical record (EMR) for primary and specialized outpatient activity, reaching 96% population coverage from 2009. ABUCASIS comprises two main modules: the Ambulatory Information System (Sistema de Información Ambulatoria, SIA) and the Pharmaceutical Module (Gestor Integral de la Prestación Farmacéutica, GAIA). It covers paediatric and adult primary care, mental health care, prenatal care and specialist outpatient services, and provides information about dates, visits, procedures, lab test results, diagnoses, clinical and lifestyle information. It also includes information on several health programmes (healthy children, vaccines, pregnancy, notifiable diseases, etc.), the primary care nurse clinical record and the health-related social assistance record. The SIA module uses the International Classification of Diseases 9th revision Clinical Modification (ICD9CM) for coding diagnoses. The SIA also uses the Clinical Risk Groups (CRG) system (3M™)2 to stratify the morbidity of the entire population.
The GAIA pharmaceutical module stores data on all outpatient pharmaceutical prescriptions and dispensations, including both primary care and outpatient hospital departments, using the Anatomical Therapeutic Chemical (ATC) classification system and the National Pharmaceutical Catalogue which allow the identification of the exact content of each dispensation. In-hospital medication is not included. GAIA provides detailed information on prescriptions issued by physicians, such as the duration of treatment and dosage. GAIA includes a comprehensive e-prescription paper-free system connected to all community pharmacies in the region that permits the linkage of individual prescriptions and dispensations through a specific prescription identification number. This results in a competitive advantage with regard to other pharmaceutical databases that usually only have dispensation information from pharmacy claims and enables a refined estimation of common and relevant research features such as medication adherence.
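As an illustration of what the prescription identifier makes possible, the following Python sketch identifies prescriptions that were never dispensed and computes the proportion of days covered (PDC), a commonly used adherence measure. The field names (`prescription_id`, `days_supply`), the toy records and both functions are assumptions made for the example and are not taken from GAIA.

```python
from datetime import date, timedelta


def unfilled_prescriptions(prescriptions, dispensations):
    """Return the ids of prescriptions with no linked dispensation."""
    filled = {d["prescription_id"] for d in dispensations}
    return [p["prescription_id"] for p in prescriptions
            if p["prescription_id"] not in filled]


def proportion_of_days_covered(dispensations, start, end):
    """Share of days in [start, end] covered by at least one dispensation."""
    covered = set()
    for d in dispensations:
        for offset in range(d["days_supply"]):
            day = d["date"] + timedelta(days=offset)
            if start <= day <= end:
                covered.add(day)  # overlapping supplies count only once
    total_days = (end - start).days + 1
    return len(covered) / total_days


# Toy data: three prescriptions issued, two of them dispensed.
prescriptions = [{"prescription_id": rx} for rx in ("RX1", "RX2", "RX3")]
dispensations = [
    {"prescription_id": "RX1", "date": date(2019, 1, 1), "days_supply": 30},
    {"prescription_id": "RX2", "date": date(2019, 2, 15), "days_supply": 30},
]

never_dispensed = unfilled_prescriptions(prescriptions, dispensations)
pdc = proportion_of_days_covered(dispensations, date(2019, 1, 1), date(2019, 3, 31))
```

A database containing only pharmacy claims could support the PDC calculation but not the first function, because prescriptions that were issued and never filled would leave no trace.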
The Hospital Medical Record (ORION) has been under implementation since 2008 and provides comprehensive information covering all areas of specialized care, including admission, outpatient consultations, hospitalization, emergencies, diagnostic services (labs, imaging, microbiology, pathology, etc.), pharmacy, surgical block including day surgery, critical care, prevention and safety, social work, at-home hospitalization and day hospitalization. ORION is currently in the process of being integrated for the whole region, with several databases already fully integrated and available for all hospitals, including the Minimum Basic Data Set at Hospital Discharge (MBDS) and the Accident & Emergency Department (AED) clinical record.
The MBDS is a synopsis of clinical and administrative information on all hospital admissions and major ambulatory surgery in the VHS hospitals, including public–private partnership hospitals (around 450 000 admissions per year in the region). The MBDS includes admission and discharge dates, age, sex, geographical area and zone of residence, main diagnosis at discharge, up to 30 secondary diagnoses (comorbidities or complications), clinical procedures performed during the hospital episode and the Diagnosis Related Groups (DRG) assigned at discharge. The MBDS used the ICD9CM system for coding until December 2015 and the ICD10ES (a Spanish translation of the ICD10CM) thereafter. The MBDS was extended in 2015 to include the ‘present on admission’ (POA) diagnosis marker and information on tumour morphology, as well as information on admissions from private hospitals.
The AED clinical record was launched in 2008 and collects triage data, diagnoses, tests and procedures performed in public emergency rooms. As with the MBDS, the coding system used was ICD9CM until December 2015 and ICD10ES thereafter. The proportion of emergency department visits with a coded diagnosis has risen from about 45% between 2008 and 2014 to around 75% in 2017, mainly owing to the progressive incorporation of hospital coding.
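Because the coding system changes partway through the series, any diagnosis-based case definition applied to MBDS or AED records has to be interpreted according to the date of the episode. The Python sketch below shows one way to route records; it assumes an exact cut-over on 1 January 2016 and that the relevant date is stored in a field called `event_date`, neither of which is specified in this profile.

```python
from datetime import date

# Assumption: ICD10ES applies to events dated 1 January 2016 or later.
ICD10ES_START = date(2016, 1, 1)


def coding_system(event_date):
    """Return the diagnosis coding system assumed to be in force on a date."""
    return "ICD9CM" if event_date < ICD10ES_START else "ICD10ES"


def split_by_coding_system(records):
    """Partition records so each group can be matched against its own code list."""
    groups = {"ICD9CM": [], "ICD10ES": []}
    for record in records:
        groups[coding_system(record["event_date"])].append(record)
    return groups


# Toy records with invented diagnosis codes.
records = [
    {"event_date": date(2015, 12, 31), "main_dx": "410.71"},
    {"event_date": date(2016, 1, 1), "main_dx": "I21.4"},
]
groups = split_by_coding_system(records)
```

Studies spanning the change would then need two code lists, one per system, to define the same clinical condition.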
The Corporate Resources Catalogue (Catálogo de Recursos Corporativos, CRC) provides information on the geographical and functional organization of the provision of care in the region (distribution of hospitals, primary care centres, etc.) and health care professionals (including age, gender and specialty).
The Microbiological Surveillance Network (Red de Vigilancia Microbiológica de la Comunidad Valenciana, RedMIVA) contains the results of the microbiological analyses performed in the VHS. Data are transferred from the laboratories to the RedMIVA database on a daily basis, providing real-time detection of circulating microorganisms and resistance patterns, and enabling microbiological surveillance. Importantly, RedMIVA gathers not only positive but also negative determinations. This database has been available since 2008.
The Vaccine Information System (Sistema de Información Vacunal, SIV) stores all the information on vaccination in the VHS since 2000, though data are only considered reliable after 2005. Available data include vaccine by type, manufacturer, batch number, number of doses, location and administration date, adverse reactions related to vaccines, rejected vaccinations and, if applicable, risk groups.
The Cancer Information System integrates three population-based information resources. The Childhood Cancer Registry provides information on cancer in the population under 20 years old; the Castellón Tumour Registry provides information on cancer in the province of Castellón; and the Oncologic Information System (NEOS) integrates medical information from all patients with malignant tumours in the region. The System was created in 1999 and delivers information on incidence, prevalence, tumour site and tumour type from 2004.
The Rare Diseases Information System (Sistema de Información de Enfermedades Raras de la Comunidad Valenciana, SIER-CV) was created in 2012 to provide population-wide epidemiological information on rare diseases in the region, allowing for the analysis of incidence, prevalence, patient characteristics, geographical distribution, etc. It includes the Congenital Anomalies Registry, which has provided information from 2007 on the prevalence of congenital anomalies in the region and the exposure to teratogen agents, and allows for research on the aetiology of these diseases, including genetic and environmental risk factors and their interaction.
The Medical Imaging Databank (Biobanco de Imagen Médica de la Comunidad Valenciana, BIMCV) is a digital biobank of medical images that provides access to the images and associated clinical records of all imaging studies performed in the VHS, with an average of 5.3 million studies per year from 210 different imaging techniques. Access to these datasets is a breakthrough for research and population imaging studies. The BIMCV is part of the Spanish node of the European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences (Euro-BioImaging) and incorporates tools to anonymize radiological images.
In all databases in the VID, individual data are collected daily as part of the routine clinical care provided to patients. Accordingly, datasets are updated daily and hence data may be available for research up to the same day data are extracted. Only in some cases, such as the MBDS and the AED records, are data subject to a consolidation and quality check process before data are available for research, so in these cases data from the last quarter before the data extraction may be missing or non-consolidated.
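In practice, this consolidation lag means that the end of follow-up for MBDS and AED records may need to be set earlier than the extraction date. The Python sketch below flags records that fall within the lag window. It interprets "the last quarter" as the three months preceding extraction, which is an assumption (a calendar quarter would be an equally plausible reading), and the dataset labels and function names are hypothetical.

```python
import calendar
from datetime import date

# Datasets described as subject to consolidation before release.
LAGGED_DATASETS = {"MBDS", "AED"}


def months_before(d, months):
    """Return the date `months` months before `d`, clamping to month end."""
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def is_consolidated(event_date, extraction_date, dataset, lag_months=3):
    """True if a record is old enough to be treated as consolidated."""
    if dataset not in LAGGED_DATASETS:
        return True  # other datasets are described as updated daily
    return event_date < months_before(extraction_date, lag_months)


extraction = date(2019, 5, 31)
cutoff = months_before(extraction, 3)
```

With an extraction on 31 May 2019, `cutoff` is 28 February 2019, so an admission on 15 January 2019 would be kept, while one on 1 April 2019 would be treated as possibly incomplete.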
Ethical clearance
Ethics approval by an accredited ethical research committee is required to access the data for research purposes (see Data resource access section). The Valencia Government Health Department ensures the anonymization of data by providing only de-identified datasets, unless researchers have the informed consent of patients to access their data. In the case of dynamic cohort studies, the Department maintains the pseudonymization codes to allow the successive incorporation of information into the cohort.
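The profile does not describe how these codes are generated. Purely as an illustration of why a retained code allows later extractions to be added to the same cohort, the Python sketch below derives a stable pseudonym from an identifier with a keyed hash (HMAC-SHA256). The key, the field names and the functions are all hypothetical, and this should not be read as the Department's actual procedure.

```python
import hashlib
import hmac


def pseudonymize(sip_number, secret_key):
    """Return a stable pseudonym for an identifier using HMAC-SHA256."""
    digest = hmac.new(secret_key, sip_number.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(records, secret_key):
    """Replace the `sip` field of each record with a pseudonymous `pid`."""
    return [
        {"pid": pseudonymize(r["sip"], secret_key),
         **{k: v for k, v in r.items() if k != "sip"}}
        for r in records
    ]


# Hypothetical key that only the data custodian would hold.
SECRET_KEY = b"held-by-the-data-custodian-only"

first_extract = [{"sip": "A001", "event": "admission"}]
later_extract = [
    {"sip": "A001", "event": "dispensation"},
    {"sip": "A002", "event": "visit"},
]

# Both extractions map A001 to the same pid, so the cohort can grow over time.
cohort = deidentify(first_extract, SECRET_KEY) + deidentify(later_extract, SECRET_KEY)
```

Because the same key yields the same pseudonym for a given person, records delivered at different times can be joined by researchers without the original identifier ever leaving the custodian.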
Data resource use
The VID is a unique and far-reaching research tool that enables real-world data research to be conducted in epidemiological surveillance,3,4 population risk and burden of disease,5–10 healthcare resource and drug utilization,11–15 quality and appropriateness of care,16–18 medication adherence,19–24 evaluation of safety25–27 and effectiveness28–32 of therapies in the real world, spatio-temporal analysis,33–35 economic analysis36–38 or the analysis of the impact of policy interventions (such as copayments, warnings from regulatory agencies, etc.) on healthcare utilization and outcomes.39,40 Also worth noting is the presence of several cross-national studies,11,13,14,27 participation in the Atlas of Variations in Medical Practice in the SNHS,33–35,41 and the potential of the VID to develop post-authorization studies based on RWD that are increasingly demanded by regulators, payers, providers and patients. Moreover, some research groups currently collaborate with the European Medicines Agency and the Food and Drug Administration in regulatory projects using the VID data.
Strengths and weaknesses
Strengths
The VID has several strengths and some distinguishing features compared with other information resources. First, it links population-wide healthcare data with sociodemographic and administrative data, which allows the study of the determinants of health and the consequences of illness and treatments at an individual level in the population. This allows for the inclusion in studies of populations that are usually excluded from experimental designs, such as pregnant women, the elderly, people with multiple chronic diseases or paediatric populations. Second, it allows for the construction and follow-up of large cohorts of patients over time and the development of longitudinal studies, enabling research on the adoption of technologies and the monitoring of outcomes in the long term. Third, it is a population-based data network providing insight into a population of 5 million inhabitants. This large size allows for the analysis of small population subgroups, or the identification of rare events that are not usually captured in clinical trials and other designs based on primary data. Fourth, data quality is high in some of the databases, such as the SIP, the pharmaceutical module, the MBDS (admissions data), RedMIVA and the vaccine registry. Fifth, the cost of conducting research and the time needed to access the data are considerably lower than in experimental designs such as clinical trials. Finally, the possibility of linking prescription and dispensation data at the individual level allows for an accurate analysis of drug utilization, such as medication adherence studies.
Weaknesses
Some of the databases that comprise the VID are subject to the limitations inherent to routine clinical practice electronic databases. There may be information biases due to absent registration (data completeness) or differing data recording practices (data accuracy, misclassification, heterogeneity) in the electronic databases, although this is an intrinsic problem of any repository using data from routine clinical practice. Data quality may be a strength in some databases, but also a weakness in other repositories or for certain data, such as the incompleteness of early data from AED records or the coding reliability of diagnostic information in the EMR. Also, we do not have information about people who are not in contact with the public healthcare service or who are treated in the private sector. Finally, different datasets cover different periods, and we lack data on specific causes of death and on in-hospital pharmaceutical prescription (the latter will be available in forthcoming years, as it is currently being integrated as part of the ORION information system).
Data resource access
Any researcher may request anonymized data from the VHS. The transfer of this type of data (anonymized, but with some risk of re-identification, in accordance with European regulations) by the VHS requires that the request be accompanied by: (i) a complete study protocol that explains the planned use of data, (ii) the approval of the project by an ethics committee and, if it includes pharmaceutical data, (iii) the classification of the study by the Spanish Agency of Medicines (some classifications may warrant additional authorizations). The VHS Data Commission reviews these requests and approves or rejects each specific data transfer for research purposes. Authorization to access the data under these requirements should be requested electronically from the Management Office of the VHS Data Commission (http://www.san.gva.es/web/dgfps/acceso-a-la-aplicacion).
Following authorization, researchers are required to commit to keeping the data in a secure environment, not attempting to re-identify individuals or to link the data with other databases, not using the data for purposes or projects other than those specified in the project protocol (although a new authorization may be requested for these purposes) and not transferring the data to third parties. These latter commitments limit the possibility of storing data in open data repositories or including data as supplementary material in published articles.
The Valencia Health System Integrated Database (VID) is the result of the linkage, by means of a single personal identification number, of a set of publicly owned population-wide healthcare, clinical and administrative electronic databases in the region of Valencia, Spain, which has provided comprehensive information for about 5 million inhabitants since 2008.
The VID is a powerful resource for conducting real-world research in healthcare and has some unique features when compared with other relevant data sources at a local and a European level, such as its population-wide coverage, the richness of linkable information at an individual level, the inclusion of information not usually linkable at an individual level (such as imaging, microbiological and public health data) and the ability to link prescription and dispensation data.
The VID includes sociodemographic and administrative information (sex, age, nationality, etc.) and healthcare information such as diagnoses, procedures, lab data, pharmaceutical prescriptions and dispensations, hospitalizations, mortality, healthcare utilization and public health data. It also includes a set of specific associated databases with population-wide information on significant care areas such as cancer, rare disease, vaccines and imaging data.
Access to the VID data may be requested by any researcher (providing the corresponding documentation required) from the Valencia Health System Data Commission (http://www.san.gva.es/web/dgfps/acceso-a-la-aplicacion).
Funding
The VID is funded by and is the property of the Valencia Government Health Department. Access to data for researchers has no financial cost but is covered by research ethics and authorization processes.
Conflict of interest: None declared.
References
Author notes
Anibal García-Sempere and Alejandro Orrico-Sánchez contributed equally to this work.