Data Resource Profile: Hospital Episode Statistics Admitted Patient Care (HES APC)


Data resource basics
Scope
Hospital Episode Statistics Admitted Patient Care (HES APC) data are collected on all admissions to National Health Service (NHS) hospitals in England. HES APC also covers admissions to independent sector providers (private or charitable hospitals) paid for by the NHS. 1 It is estimated that 98-99% of hospital activity in England is funded by the NHS. 2 A hospital admission includes any secondary care-based activity that requires a hospital bed, thus including both emergency and planned admissions, day cases, births and associated deliveries. HES APC does not cover accident and emergency (A&E, emergency department) attendances or outpatient bookings; these data are held in separate HES databases. All HES databases are collated and curated by NHS Digital (previously the Health and Social Care Information Centre). In the financial year 2014/15 (April to March), 18 731 987 hospital episodes from 451 different NHS hospital trusts (known as 'providers') were recorded in HES APC. 3

Purpose of data collection
The need for national data collection on hospital activity to inform management and planning of services was first recognized in the early 1980s by a Department of Health working group. 4 Following these recommendations, a national programme was progressively rolled out, starting in 1987 and achieving continuous national coverage by financial year 1989/90. 5 Since 2004/05, HES APC has also served as the basis for 'Payment by Results' (PbR), a pay-for-performance system of secondary care reimbursement in the NHS internal market. 6

Structure
HES APC data files are structured according to financial years. Each row in HES APC indicates a 'Finished Consultant Episode' (FCE). An FCE represents a continuous period of care under one consultant, and each is specified with a start and an end date. Episodes are labelled as 'finished' and entered in HES APC according to the financial year in which they end. Consequently, episodes that start in one financial year and end in another will be classified as unfinished in the starting financial year, and finished in the ending financial year. Unfinished episodes need to be removed before analysis to prevent double counting.
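The financial-year convention and the removal of unfinished episodes can be illustrated with a minimal Python sketch. The record layout and field names used here (epi_start, epi_end, finished) are hypothetical and do not correspond to actual HES APC variable names; the sketch only assumes that each row carries an end date and a finished/unfinished indicator.

```python
from datetime import date

def financial_year(d: date) -> str:
    """Return the April-to-March financial year containing d, e.g. '2014/15'."""
    start = d.year if d.month >= 4 else d.year - 1
    return f"{start}/{(start + 1) % 100:02d}"

# Hypothetical episode records; field names are illustrative only.
episodes = [
    {"id": "A", "epi_start": date(2014, 5, 1), "epi_end": date(2014, 5, 3), "finished": True},
    # Started in 2014/15 but still open at year end: unfinished in that year's file.
    {"id": "B", "epi_start": date(2015, 3, 30), "epi_end": None, "finished": False},
    # The same episode as it would appear, finished, in the following year's file.
    {"id": "B", "epi_start": date(2015, 3, 30), "epi_end": date(2015, 4, 2), "finished": True},
]

# Keep finished episodes only, so that episode B is counted once.
finished = [e for e in episodes if e["finished"]]

# Count finished episodes by the financial year in which they ended.
counts = {}
for e in finished:
    fy = financial_year(e["epi_end"])
    counts[fy] = counts.get(fy, 0) + 1
```

In this toy example, episode B contributes one finished episode to 2015/16 and nothing to 2014/15, which is the behaviour the rule above is intended to guarantee when several annual files are pooled.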
A hospital admission in HES APC is referred to as a 'spell', defined as an uninterrupted inpatient stay at one hospital. A spell may include several FCEs if the patient was seen by multiple consultants during the same stay, but does not include transfers between hospitals. If a patient is transferred to a different hospital, a new spell begins.
In order to identify and measure continuous hospital stays, which include transfers to other hospitals, continuous inpatient spells (CIPs) need to be derived. Although CIP identifiers are not provided in standard HES APC extracts, methods for linking FCEs into CIPs are available, 7 including that recommended by NHS Digital. 8
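As a rough illustration of how spells might be chained into CIPs, the following Python sketch applies a deliberately simplified rule. It is not the method recommended by NHS Digital or any published algorithm: the field names, the transfer_in flag and the one-day gap threshold are all assumptions made for the example.

```python
from datetime import date

# Hypothetical spell records (one row per spell); field names are illustrative.
spells = [
    {"hesid": "P1", "provider": "H1", "admitted": date(2014, 6, 1),
     "discharged": date(2014, 6, 4), "transfer_in": False},
    {"hesid": "P1", "provider": "H2", "admitted": date(2014, 6, 4),
     "discharged": date(2014, 6, 10), "transfer_in": True},
    {"hesid": "P1", "provider": "H1", "admitted": date(2014, 9, 1),
     "discharged": date(2014, 9, 2), "transfer_in": False},
    {"hesid": "P2", "provider": "H1", "admitted": date(2014, 6, 2),
     "discharged": date(2014, 6, 3), "transfer_in": False},
]

def assign_cips(spells, max_gap_days=1):
    """Number CIPs within each patient using an assumed rule: a spell continues
    the previous CIP if it is flagged as a transfer and starts no more than
    max_gap_days after the previous spell's discharge."""
    ordered = sorted(spells, key=lambda s: (s["hesid"], s["admitted"], s["discharged"]))
    labelled = []
    prev = None
    cip = 0
    for s in ordered:
        if prev is None or s["hesid"] != prev["hesid"]:
            cip = 1  # first spell for this patient
        else:
            gap = (s["admitted"] - prev["discharged"]).days
            if not (s["transfer_in"] and gap <= max_gap_days):
                cip += 1  # not a transfer: a new continuous stay begins
        labelled.append({**s, "cip": cip})
        prev = s
    return labelled

cip_numbers = [row["cip"] for row in assign_cips(spells)]
```

Here the transfer from provider H1 to H2 on 4 June is treated as part of the same continuous stay, whereas the September admission starts a new CIP. A real derivation would need to draw on the admission and discharge fields actually present in the extract.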

Research uses
HES APC has been frequently used for research and service evaluation, due to its universal coverage, long period of data collection and the ability to follow individuals over time. HES APC offers the opportunity to estimate population-based admission and procedure rates by condition and type of procedure, compare hospital performance and create hospital-based cohorts for short- or long-term follow-up. Since HES APC covers all births in NHS hospitals, representing 97.3% of births in England, 9 it is also possible to create nationally representative birth cohorts.

Processing cycle and frequency of data collection
Upon discharge from the care of a particular consultant, the treating clinician completes a discharge summary listing the diagnoses made and procedures carried out during that FCE (where procedures include surgery, diagnostic imaging, ventilation and infusion/transfusion therapy). Discharge summaries are forwarded to a clinical coding department in the hospital, which enters the information onto the local electronic patient information database. Clinical coders undergo nationally accredited training programmes and follow standardized rules for translating information on discharge summaries into clinical codes. 10,11 Every month, data are extracted from local hospital databases to the Secondary Uses Service (SUS), a national data warehouse housed within NHS Digital. 12 Data from the SUS are extracted both for purposes of hospital reimbursement under PbR, and separately to create a provisional monthly HES extract. NHS Digital carry out basic data checks and cleaning, add geographical fields based on patient postcodes, and attach pseudonymized patient identifiers ('HESIDs') to each episode. 13,14 At the end of each financial year, NHS Digital allow hospitals one further data submission to HES (the 'Annual Refresh'), after which a provisional annual HES extract is produced for final review by hospitals. Once the Annual Refresh has been checked, a final annual HES dataset is made available. 12

Linkage within HES APC
From 1997/98 onwards (when patients' NHS numbers became a mandated return from hospitals), HES APC episodes have been linked longitudinally to the same patient by tagging episodes with the HESID. This alphanumeric variable allows patient follow-up, yet avoids the need for supplying patient identifiers to researchers. The methods used to generate the HESID have been described elsewhere. 15 HES APC is also routinely linked to a number of external datasets.
The Clinical Practice Research Datalink, 16 a large UK primary care database, is linked to HES APC on a monthly basis. HES APC is linked to dates and causes of non-hospital deaths from the Register of Deaths in England and Wales held by the Office for National Statistics (for deaths registered since 1 January 1998), also on a monthly basis. 17 Only deaths of patients recorded in HES APC are available through this linkage (i.e. deaths of persons who have not had a hospital admission since April 1997 are not included).
NHS Digital also provides a trusted third-party bespoke linkage service, through which secondary users can request that HES APC data be linked to other external datasets. For example, both national disease registries (such as the National Joint Registry 18 and the UK Renal Registry 19 ) and well-established cohort studies including Whitehall II 20 and the Hertfordshire Cohort Study 21 have been linked to HES APC. Secondary users need to obtain the appropriate approvals to enable these linkages.

Measures
Clinical and patient data
HES APC provides detailed clinical, demographic and organizational information for each FCE (see Table 1), with 270 variables available in the core dataset. Apart from data on diagnoses and procedures, HES APC contains information on dates of admission, operations and discharge, admission method (e.g. emergency or planned), care provider and many geographical variables mapped from a patient's postcode. The local health geographies and hospital providers in England have changed several times since 1997, and thus care needs to be taken to ensure continuity when carrying out local or provider level analyses that use HES APC data covering many years.
Socioeconomic status is measured by the Index of Multiple Deprivation 2004 (IMD), a small area-based indicator constructed from several different measures of deprivation. 22 IMD is measured at Lower Super Output Area (LSOA) level, where an LSOA contains between 400 and 1200 households. 23 Individual-level measures of socioeconomic status (e.g. education level or income) are not available. Detailed information on variables available, specific cleaning rules and coding used are available in the HES APC Data Dictionary provided by NHS Digital. 24
Diagnoses are coded using the International Classification of Diseases version 10 (ICD-10). 25 Each FCE has one primary diagnosis, which accounts for the majority of the length of stay of the FCE. The other diagnoses are referred to as comorbidities. According to NHS Digital cleaning rules, each FCE must have at least one primary diagnosis, although it may be recorded as unknown (ICD-10 code R69).
Operations and other interventions are coded using a UK-specific system, the Office of Population Censuses and Surveys Classification of Interventions and Procedures (OPCS, currently version 4.7). 26 This has evolved over time as new techniques and technologies have been introduced. A history of versions in use is available from the NHS Digital coding standards website. 26 Each FCE may have up to 24 operations recorded (up from 4 before April 2002 and 12 in April 2002-March 2007), but procedure fields are left empty if patient management did not require an intervention covered by OPCS (e.g. where the primary treatment was a drug regimen or observation). A primary procedure is selected for each FCE as that which is the most resource-intensive, but a procedure may be described using more than one code to indicate surgical approach, anatomical location and side of procedure (e.g. stent placed under radiological control in femoral artery of left leg). Dates are also entered for each procedure.
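Because the number of procedure fields has changed over time, analyses spanning several periods may wish to restrict each episode to a common number of fields. The Python sketch below encodes the limits stated above; the choice of the episode end date as the reference date, the function names and the placeholder procedure codes are assumptions made for illustration.

```python
from datetime import date

def max_procedure_fields(episode_end: date) -> int:
    """Maximum number of procedure fields available for an episode,
    assuming the limit is determined by the episode end date."""
    if episode_end < date(2002, 4, 1):
        return 4    # before April 2002
    if episode_end < date(2007, 4, 1):
        return 12   # April 2002 to March 2007
    return 24       # April 2007 onwards

def truncate_procedures(codes, limit=4):
    """Drop empty fields and keep the first `limit` recorded procedure codes,
    so that episodes from different periods are compared on equal terms."""
    return [c for c in codes if c][:limit]

# Placeholder codes, not real OPCS-4 codes.
recorded = ["P01", "P02", "", "P03", "P04", "P05"]
comparable = truncate_procedures(recorded, limit=max_procedure_fields(date(2001, 7, 1)))
```

Truncating to the earliest limit discards information from later years, so whether this is appropriate will depend on the research question.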

Birth and delivery information
Each birth event in HES APC generates at least two FCEs: one delivery episode and one or more birth episodes. Each delivery and birth episode includes an additional 'maternity tail', with detailed fields including the baby's birthweight, gestational age, birth order (for multiple births), mode of delivery and maternal age (Table 1). The maternity tail is based on information entered via local maternity databases. Unlike the diagnostic and procedure fields, the maternity tail data fields use HES-specific categories rather than standardized classifications, and it is not a mandated return to NHS Digital. This leads to large variations in data completeness and quality. 27,28 It is not possible to directly link a mother and a baby in HES APC; that is, the mother's HESID is not copied to the baby's birth record. However, linkage between mother and baby is possible using probabilistic methods. 29

Hospital use in England
Both numbers and rates of hospital admissions have increased during the period of HES APC data collection (Figure 1), particularly among older adults (aged 60-74 and 75+). Between 1998/99 and 2014/15, the overall FCE rate increased by 40% from 24.5 per 100 person-years to 34.3 per 100 person-years, with the steepest increase (73.0%) in adults aged 75+.
Since HES APC covers all hospital admissions, infants and older adults (aged 65+) are over-represented in HES APC compared with the general population of England (Table 2).
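The relative increase quoted above follows directly from the two rates, as the short Python check below shows. The helper names are illustrative, and the rate function simply assumes that a rate per 100 person-years is the episode count divided by person-years at risk, multiplied by 100.

```python
def rate_per_100(episodes: float, person_years: float) -> float:
    """Episodes per 100 person-years."""
    return 100.0 * episodes / person_years

def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in per cent."""
    return 100.0 * (new - old) / old

# Overall FCE rates reported for 1998/99 and 2014/15.
increase = round(pct_change(24.5, 34.3), 1)
```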

Data resource use
Although no up-to-date bibliography of published research based on HES APC is curated by the data providers, a 2013 systematic review identified 148 articles using HES APC data published between 1989 and July 2011. 30 We carried out a subsequent search on PubMed on 8 June 2016 using the search term 'Hospital Episode Statistics' for article abstracts published since July 2011. We identified 264 relevant publications where the primary analysis involved the use of HES APC data, and a further 130 papers where HES data had been linked to cohorts created in other datasets. The annual number of publications using HES APC data has increased from 2 in 1993 30 to 88 in 2015.
Published studies using HES APC data have covered a diverse range of topics. They have explored the incidence of conditions across regions and over time. 31,32 They have also examined cross-sectional and longitudinal patterns of treatment by organization, 33 including comparing NHS and privately contracted providers 34 or regions, 35,36 both from descriptive and analytical perspectives. Regional comparisons have included evaluating the impact of clinical evidence 37 or guidelines 38 as well as health care policies. 39 They have examined the outcome of medical as well as surgical therapies (such as survival, 40 short-term postoperative mortality, 41 complications, 42 reoperation 43 and hospital readmissions 44 ), with some seeking to identify factors that are associated with these outcomes, in terms of both patient characteristics 45,46 and organizational factors such as surgical volume 47 or day of week. 48 Methodological studies include creating coding frameworks, 28 applying comorbidity scores, 49 developing risk prediction models 50 and using look-back methods to impute missing data items. 51 Many high profile routinely produced reports on the quality of secondary care are based on HES APC data. These include hospital mortality monitoring reports produced by NHS Digital 52 and commercial organizations, 53 and research reports by independent think-tanks 54 and Royal Medical Colleges. 55

Strengths and weaknesses
Coverage
The key strength of the HES APC database is its universal coverage, which provides an unselected sample of hospital episodes. The large size of HES APC makes it possible to precisely estimate admission rates and capture outcomes for rare conditions, including congenital anomalies or specific cancers.

Longitudinal linkage
Another strength is the possibility to longitudinally link patients using the HESID, allowing for the creation of HES-based cohort studies if a suitable inception date can be identified. The long period of data collection of HES (currently up to 19 years) allows long-term follow-up of admitted patients, which has enabled the development of risk prediction models for distal outcomes. 44

Standardized coding
ICD-10 coding of clinical diagnoses offers the opportunity to use HES APC for international comparisons of secondary care use. Since ICD-10 is used in hospital administrative data across the UK, Europe, Canada, Australia and New Zealand, HES APC has been used to assess the impact of differential health policy between NHS systems and internationally. 56-58 International studies using HES APC include cross-country comparisons of the incidence of neonatal abstinence syndrome 59 and non-small cell lung cancer. 60 Nonetheless, international comparisons are challenging due to differences between countries in admission thresholds, organization of care provision, and whether secondary care is free at point of use or requires health insurance or other payment.
HES APC episodes are readily linked to information on costs of care, due to the ability to match each episode to a Healthcare Resource Group, and hence a unit cost. 61 This makes HES APC an important data resource for health economics. 62-64

Coding variation
One of the key challenges in interpreting HES APC is the reliance on diagnostic and procedure codes for identifying study participants and outcomes. Despite centrally issued coding rules, clinical coders rely on the quality and detail of completed discharge summaries to enter data consistently. Consequently, diagnostic coding practices vary between hospitals, particularly for comorbidities. 65 Since the roll-out of PbR, financial incentives now exist for hospitals to improve coding depth in order to ensure accurate reimbursement. This has led to an increase in the number of diagnostic codes used and improvements in coding accuracy. 7,66 The introduction of PbR therefore poses challenges for interpreting time-series studies using HES APC data, and care must be taken to not overinterpret results identifying increasing complexity of cases admitted. 7

Sensitivity to admission thresholds
Since HES APC covers only admitted patients, it is sensitive to variation between hospitals or over time in admission thresholds. The introduction of the four-hour waiting target in A&E departments in 2004 has been suggested as a contributing factor for the increase in rates of emergency admissions in children during the 2000s. 67,68 Changes in thresholds for emergency admissions can be examined using linked HES A&E data; 69 however, variation in admission thresholds for planned procedures cannot readily be determined using HES datasets.

Missing data
Although age, sex and clinical characteristics are well completed in HES APC (see Table 2), data on ethnicity are not. Ethnicity has been a mandated return for all NHS contacts since 1991. Although ethnicity recording has improved over time, the proportion of patients with a known ethnicity recorded was still only 85% in 2011, up from 41% in 1997. 70 Further, there is a high proportion of missing data in the maternity tail fields (see Figure 2). Postcodes were not extracted from the SUS for birth episodes prior to 2013/14, which means earlier birth episodes cannot be mapped to geographical variables, including the Index of Multiple Deprivation (IMD). 71 As an example, completeness of the IMD decile variable for singleton birth episodes in 2012/13 was 7.8%, compared with 81.9% in 2013/14.
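Completeness figures of this kind are simple proportions of records with a non-missing value. The Python sketch below shows one way such a check might be run by financial year; the records are toy data invented for the example (they do not reproduce the percentages reported above), and the field names fin_year and imd_decile are hypothetical.

```python
def completeness(records, field):
    """Percentage of records with a non-missing value for `field`
    (None and empty strings are treated as missing)."""
    if not records:
        return float("nan")
    known = sum(1 for r in records if r.get(field) not in (None, ""))
    return 100.0 * known / len(records)

# Toy birth records; values are invented for illustration only.
births = [
    {"fin_year": "2012/13", "imd_decile": None},
    {"fin_year": "2012/13", "imd_decile": None},
    {"fin_year": "2012/13", "imd_decile": 3},
    {"fin_year": "2012/13", "imd_decile": None},
    {"fin_year": "2013/14", "imd_decile": 7},
    {"fin_year": "2013/14", "imd_decile": 2},
    {"fin_year": "2013/14", "imd_decile": None},
    {"fin_year": "2013/14", "imd_decile": 9},
]

# Group records by financial year, then compute completeness within each year.
by_year = {}
for r in births:
    by_year.setdefault(r["fin_year"], []).append(r)
imd_completeness = {fy: completeness(rows, "imd_decile") for fy, rows in by_year.items()}
```

Running the same check on each variable of interest before analysis helps to identify years in which a field is too sparsely recorded to be used.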

Quality of internal linkage
The HESID linkage algorithm relies heavily on the accurate recording of NHS number across all hospital episodes to avoid missed matches (FCEs that have failed to link to a patient). Consequently, there is a substantial proportion of missed matches in HES APC. A recent estimate puts the HESID missed-match rate at 4%, 72 leading to an underestimation of readmission rates by 3.8%. NHS numbers were not provided at birth until 2002, meaning that linkage within HES APC and to other HES and external datasets is not reliable for births before 2002/03. 73
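The mechanism by which missed matches lead to underestimated readmission rates can be shown with a toy Python example. The 30-day window, the tuple layout and the identifiers are assumptions made for illustration; the sketch does not reproduce the method behind the estimates cited above.

```python
from datetime import date

def count_readmissions(stays, window_days=30):
    """Count stays that begin within `window_days` of the same HESID's
    previous discharge. Each stay is (hesid, admitted, discharged)."""
    last_discharge = {}
    n = 0
    for hesid, admitted, discharged in sorted(stays, key=lambda s: (s[0], s[1])):
        prev = last_discharge.get(hesid)
        if prev is not None and 0 <= (admitted - prev).days <= window_days:
            n += 1
        last_discharge[hesid] = discharged
    return n

# Stays as they truly occurred: patient P1 is readmitted after 15 days.
true_stays = [
    ("P1", date(2014, 1, 1), date(2014, 1, 5)),
    ("P1", date(2014, 1, 20), date(2014, 1, 22)),
    ("P2", date(2014, 2, 1), date(2014, 2, 3)),
]

# The same stays after a missed match: the second stay of P1 failed to link
# and was given a different (hypothetical) identifier.
linked_stays = [
    ("P1", date(2014, 1, 1), date(2014, 1, 5)),
    ("P1x", date(2014, 1, 20), date(2014, 1, 22)),
    ("P2", date(2014, 2, 1), date(2014, 2, 3)),
]

true_count = count_readmissions(true_stays)
observed_count = count_readmissions(linked_stays)
```

In this example the one true readmission is not observed once the second stay is attached to a different identifier, so any readmission rate computed from the linked data is biased downwards.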

Scope limitations
HES APC covers high dependency unit (HDU) or intensive care unit (ICU) periods, but it does not contain 'flags' to identify such stays, nor detailed information on level of care or HDU/ICU interventions. A separate HES dataset covers adult critical care from 2008/09, 74 whereas data relating to neonatal or paediatric intensive care are collected through systems external to NHS Digital.
Data on drugs prescribed through hospital pharmacies to inpatients are not available in HES APC. There is currently no national individual-level hospital prescribing database for England.

Opt-outs
Patients who do not wish their records to leave NHS Digital can lodge a 'type 2 opt-out' with their primary care practice. 75 From 29 April 2016, any records (including in previous financial years) relating to persons who have opted out in any NHS Digital dataset (including HES APC) will therefore be removed before supply to secondary users. Overall, for the 2014/15 HES APC annual extract, 2.3% of episodes will be removed, with substantial geographical variation in opt-out rates. 75

Data resource access
Access to HES APC data is provided by NHS Digital for the NHS, government, researchers and commercial health care bodies. Those requesting an extract of the data must show that their work will support health and social care and improve health. 76

Acknowledgement
We are grateful to Professor Ruth Gilbert for her helpful input regarding this paper.