Metrics of sleep apnea severity: beyond the apnea-hypopnea index

Obstructive sleep apnea (OSA) is thought to affect almost 1 billion people worldwide. OSA has well-established cardiovascular and neurocognitive sequelae, although the optimal metric to assess its severity and/or potential response to therapy remains unclear. The apnea-hypopnea index (AHI) is well established; thus, we review its history and predictive value in various clinical contexts. Although the AHI is often criticized for its limitations, it remains the best-studied, albeit imperfect, metric of OSA severity. We further review the potential value of alternative metrics including hypoxic burden, arousal intensity, odds ratio product, and cardiopulmonary coupling. We conclude with possible future directions to capture clinically meaningful OSA endophenotypes including the use of genetics, blood biomarkers, machine/deep learning and wearable technologies. Further research in OSA should be directed towards making the OSA diagnosis more accessible and towards improving prognostic information regarding OSA consequences, in order to guide patient care and to help in the design of future clinical trials.


Introduction
Obstructive sleep apnea (OSA) is a common chronic condition that is associated with neurocognitive impairment, hypertension, and incident cardiovascular and cerebrovascular disease [1]. Using conventional polysomnographic measures and thresholds for abnormality, OSA has been estimated to affect up to 1 billion people worldwide [2]. While most of these cases are undiagnosed, it is likely that most are asymptomatic or minimally symptomatic, and it remains uncertain whether these individuals have a disorder that requires therapeutic intervention [3]. OSA has historically been defined and quantified primarily by the frequency of apneas and hypopneas during sleep (apnea-hypopnea index, AHI), although the use of this metric has been challenged on both methodologic and pathophysiologic grounds [4][5][6]. Development of this Research Statement was motivated by a growing recognition of the limitations of the AHI to predict adverse effects of OSA and to predict responsiveness to treatment. Given the myriad OSA-associated conditions across multiple biological systems, one might expect the optimal metric of OSA severity to differ depending on the outcome of interest [7,8]. In this Research Statement, we discuss the development of current definitions of apneas and hypopneas and the strengths and weaknesses of the AHI as a measure of OSA severity, and we consider the possible added value of new or emerging metrics that can be derived from the polysomnogram (PSG). This document is not intended to be used as a practice standard, nor is it intended to provide an exhaustive review of the available literature, but rather to provide historical context and recommendations for research needed to optimize OSA diagnostic and severity metrics. Important topics such as central sleep apnea and hypoventilation are outside of the scope of this document.

Defining the metric
A metric is defined as a system for measuring something or a standard of measurement. In medicine, metrics help us differentiate disease states from normal, as well as categorize the severity of illnesses. The use of metrics in health care has become increasingly important not only to define disease states, but also to measure outcomes and then to improve care continually. As stated by Blumenthal and McGinnis: "If something cannot be measured... it cannot be improved" [9]. The metrics that have been traditionally used to define sleep-disordered breathing syndromes derive from the number of breathing events that occur during sleep. In the 1970s, healthy subjects were studied as part of a control group and sleep-disordered breathing event rates were calculated. In these healthy controls, a breathing event rate of <5 apneas per hour was determined to be the threshold and hence became the standard for distinguishing "disease" from "no disease" [10]. Of note, the apneas with this early definition were not differentiated by type of apnea (obstructive, mixed, central). Lugaresi et al. suggested that more than 30 apneas during the night was the best discriminator of normal vs. abnormal [11,12]. Block et al., and later Gould et al., introduced the concept of hypopneas to identify episodes of reduced breathing that were felt to be physiologically important due to an associated drop in oxygen saturation or arousal [13][14][15]. Inconsistency in the definition of sleep-disordered breathing events, especially hypopneas, has been a widely recognized concern. Ultimately, the threshold to define disease vs. no disease may well evolve based on interventional studies, as has occurred with cholesterol, blood pressure, and other cardiovascular risk factors.

History of polysomnography
Polysomnography (PSG) evolved from electroencephalography (EEG), first described by Berger in 1929 and applied to the study of sleep by Loomis in 1937 [16,17]. In 1957, Dement and Kleitman described sleep cycles and proposed a sleep classification schema [18,19]. Breathing sensors were first described in the 1960s and Gastaut described obstructive breathing events in which intermittent episodes of upper airway obstruction were noted with continued respiratory effort in patients with the Pickwickian Syndrome [20]. Cardiac signals were later added and the term "polysomnography" was used to describe the measurement of sleep utilizing a variety of body sensors.
In the early years of PSG, thermal sensors (thermistors and thermocouples) were used to measure airflow in a semiquantitative way. These could be placed over both the nose and mouth to detect the temperature changes in breathing as a surrogate for airflow. In 1997, Norman et al. reported using a standard nasal cannula connected to a pressure transducer to detect airflow [21]. This resulted in improved sensitivity to changes in airflow and allowed the demonstration of "flow limitation" characterized by flattening of the nasal cannula-pressure transducer signal. Respiratory effort was initially measured using mercury strain gauges around the thorax and abdomen, but at present, the most commonly utilized method is respiratory inductance plethysmography (RIP), which provides a more quantitative measure of effort and is able to provide an alternative flow signal when using the sum of the chest and abdominal belts. Some montages utilized carbon dioxide sensors and esophageal pressure transducers as measures of ventilation and respiratory effort, respectively. Pulse oximeters were subsequently used to measure oxygen saturation (initially from the ear) and oxygen saturation has now become a standard part of the polysomnography montage. Combining thermal sensors and a nasal cannula-pressure transducer device is the current recommended standard for measurement of airflow. While other effort measures are available, RIP has become the standard methodology for respiratory effort measurement.

Defining respiratory events
The term obstructive sleep apnea syndrome (OSAS) was introduced in 1976 with findings of daytime hypersomnolence and polysomnographically proven obstructive apneas [22]. That article referred to the following definition: "An apnea has been defined as a cessation of airflow at the nose and mouth lasting at least 10 sec," which derived from an earlier study in 1975 [23][24][25][26]. The 10-second rule for scoring respiratory events was based on the average amount of time that would lapse if two regular breaths were skipped with the subject breathing at the usual respiratory rate. This definition was felt to differentiate pathological apneas from breathing in normal adults, as the authors noted: "Isolated, brief apneic episodes and spirographic abnormalities may appear normally at the onset of sleep and during REM sleep." [21,26,27].
As noted above, hypopneas were first described by Block et al. as events of shallow breathing causing oxygen desaturation [13], in which "flows in the nose and mouth decreased, chest movement decreased, and desaturation occurred" (see Figure 1 for timeline of hypopnea definitions). "Desaturations were thought to be clinically noteworthy when a fall of 4% or greater from the preceding baseline occurred." Block and his colleagues had used a similar oxygen definition in previous studies and observed that it "could easily be seen on review of long tracings." The first case of sleep hypopnea syndrome, with frequent hypopneas but no apneas and clinical symptoms similar to obstructive sleep apnea syndrome, was described in 1988 by Gould and colleagues [15].
Initially, OSA was defined by 30 apneas over the course of the night, which later evolved to the apnea index (the number of apneas divided by hours of sleep), where a cutoff was set at 5 apneas/hour to diagnose the disease [25]. However, as hypopneas became part of the spectrum of sleep-disordered breathing, substantial variability was noted in the definition of hypopnea. In one of the largest sleep research projects of the 1990s, the Wisconsin Sleep Cohort study (WSC), hypopneas were defined as clear decreases in the amplitude of a calibrated RIP signal accompanied by a 4% oxygen desaturation [28]. The Sleep Heart Health Study (SHHS), a large multicenter investigation, recognized the variability in hypopnea definitions that were in use. The authors used the ability of computerized PSG systems to combine data from airflow, oximetry, and EEG, calculated multiple measures of hypopnea based on a decrease in airflow or thoracoabdominal excursion of at least 30% of baseline for 10 seconds or more accompanied by variable degrees of oxygen saturation or the presence of EEG evidence of arousal, and demonstrated the dramatic impact of varying hypopnea definitions on the calculated AHI [29,30].
In an early attempt to standardize event definitions, an American Academy of Sleep Medicine (AASM) task force in 1999 published a recommendation that apnea and hypopnea be considered equivalent, and that the apnea/hypopnea event be defined as "a clear decrease (≥50%) from baseline in the amplitude of a valid measure of breathing during sleep" or "a clear amplitude reduction of a validated measure of breathing during sleep" that does not reach the 50% threshold "but is associated with either an oxygen desaturation of >3% or arousal" [31]. The 1999 AASM task force also defined a more subtle breathing abnormality, the respiratory event-related arousal (RERA), with the following: "These events must fulfill both of the following criteria: (1) Pattern of progressively more negative esophageal pressure, terminated by a sudden change in pressure to a less negative level and an arousal and (2) The event lasts 10 seconds or longer." They also recommended severity criteria based on degree of sleepiness and frequency of respiratory events, with the following severity grades based on the number of obstructive breathing events per hour: mild, 5 to 15 events per hour; moderate, 15 to 30 events per hour; and severe, greater than 30 events per hour. It should be noted that these recommendations were based on expert consensus, with an acknowledgment that the data on which to base these event definitions and severity measures were limited. Ultimately, these thresholds may well be revised as data from interventional studies help to characterize those patients most likely to benefit from therapy. Additional variables beyond simply counting events will likely be required, analogous to using high-sensitivity C-reactive protein (hsCRP) as a guide to cholesterol-lowering.
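As a rough illustration of how frequency-based severity grading works, the following Python sketch maps an event frequency to a severity label. The cutoffs of 5, 15, and 30 events per hour are the ones used to group participants in the cohort studies discussed in this document. The function name, the "none" label, and the handling of values that fall exactly on a boundary are assumptions made for the example, not part of the task force recommendation.

```python
def osa_severity_grade(events_per_hour: float) -> str:
    """Map an event frequency (events/hour) to a severity label.

    Assumed cutoffs: 5, 15 and 30 events/hour. Treating each cutoff as the
    inclusive lower bound of the next grade is an assumption of this sketch.
    """
    if events_per_hour < 0:
        raise ValueError("event frequency cannot be negative")
    if events_per_hour < 5:
        return "none"
    if events_per_hour < 15:
        return "mild"
    if events_per_hour < 30:
        return "moderate"
    return "severe"


for value in (3.0, 9.5, 22.0, 41.0):
    print(value, osa_severity_grade(value))
```

Because the grade depends only on the event count, two patients with the same frequency but very different symptoms or desaturation depth receive the same label, which is one of the limitations discussed later in this document.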
[Figure 1. Timeline of hypopnea definitions. 2007: AASM Scoring Manual (v 1.0) sets 2 rules for hypopneas: "Recommended" (30% AF reduction + 4% oxygen desaturation) and "Alternative" (3% oxygen desaturation and/or arousal) (31). 2012: AASM changes "Recommended" to 30% AF reduction + 3% oxygen desaturation and/or arousal; adds an "Option" to report hypopneas with a 4% desaturation (26).]
The definitions of respiratory events have continued to evolve. The AASM Manual for the Scoring of Sleep and Associated Events was published in 2007 [32] with redefined rules for the respiratory events. An apnea was scored when the peak thermal sensor excursion dropped by 90% of baseline, the event lasted at least 10 seconds, and at least 90% of the event's duration met the amplitude reduction criteria for apnea. The apneas were then classified by the presence or absence of inspiratory effort. The Recommended rule for hypopneas included a nasal pressure signal excursion drop by ≥30% of baseline for at least 10 seconds with a ≥4% desaturation from pre-event baseline, and 90% of the event duration must meet the amplitude reduction. The Alternative rule for hypopneas included two primary differences: a 50% reduction in nasal pressure signal and that the event is associated with either a ≥3% desaturation or an arousal. A RERA was defined as at least a 10-second event that did not meet the definition of hypopnea and was characterized by increasing respiratory effort or flattening of the nasal pressure waveform leading to an arousal from sleep [33,34]. While ideally the RERA is defined via use of an esophageal manometer, nasal pressure and inductance plethysmography are alternate measurement tools [33][34][35]. Version 2 of the AASM Scoring Manual was released in 2012 with a major shift in the Recommended hypopnea definition to a ≥30% drop in the flow signal for 10 seconds associated with either a ≥3% oxygen desaturation or an arousal, with the option to additionally report hypopneas using a definition requiring an associated ≥4% desaturation [27]. This change reflected a recognition that some patients whose events were not associated with a 4% desaturation were symptomatic and would benefit from treatment of OSA. It was acknowledged that these two alternative event definitions yielded very different measures of OSA severity, and that "thresholds for identification of the presence and severity of OSA, and for inferring health-related consequences of OSA, must be calibrated to the hypopnea definition employed" [27]. However, no such change in recommended severity grade based on event frequency has been recommended by the AASM. The 2020 definitions from the AASM Scoring Manual v2.6 for apneas, hypopneas and RERAs are generally unchanged from the 2012 definitions.
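To make the practical effect of these rule changes concrete, the sketch below applies the three hypopnea rules described above (2007 Recommended, 2007 Alternative, 2012 Recommended) to a single candidate event. It is a simplification under stated assumptions: the class and field names are hypothetical, the requirement that 90% of the event duration meet the amplitude criterion is not modeled, and events that would qualify as apneas are not excluded.

```python
from dataclasses import dataclass


@dataclass
class CandidateEvent:
    """One reduced-breathing episode; all field names are hypothetical."""
    flow_drop_pct: float     # drop in nasal pressure excursion vs. baseline (%)
    duration_s: float        # event duration in seconds
    desaturation_pct: float  # fall in SpO2 from pre-event baseline (%)
    arousal: bool            # whether an EEG arousal accompanies the event


def _flow_and_duration(e: CandidateEvent, min_drop_pct: float) -> bool:
    # Shared requirement: sufficient flow reduction lasting at least 10 seconds.
    return e.duration_s >= 10 and e.flow_drop_pct >= min_drop_pct


def hypopnea_2007_recommended(e: CandidateEvent) -> bool:
    return _flow_and_duration(e, 30) and e.desaturation_pct >= 4


def hypopnea_2007_alternative(e: CandidateEvent) -> bool:
    return _flow_and_duration(e, 50) and (e.desaturation_pct >= 3 or e.arousal)


def hypopnea_2012_recommended(e: CandidateEvent) -> bool:
    return _flow_and_duration(e, 30) and (e.desaturation_pct >= 3 or e.arousal)


event = CandidateEvent(flow_drop_pct=35, duration_s=12,
                       desaturation_pct=2, arousal=True)
results = {
    "2007 recommended": hypopnea_2007_recommended(event),
    "2007 alternative": hypopnea_2007_alternative(event),
    "2012 recommended": hypopnea_2012_recommended(event),
}
print(results)
```

In this example, a 35% flow reduction with an arousal but only a 2% desaturation is counted as a hypopnea under the 2012 Recommended rule alone, which illustrates why the same recording can yield different AHI values depending on the definition applied.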
Just as the evolution and variation in respiratory monitoring technology and inconsistency in event definition has caused confusion and complicated comparisons across studies, the terminology for frequency of respiratory events has also been inconsistent. Early studies reported the frequency of apneas per hour of sleep as the apnea index (AI), but with the addition of the hypopnea as a physiologically relevant event, the term apnea-hypopnea index (AHI) came into use. At the same time, the term respiratory disturbance index (RDI) was also being used synonymously [27]. The 2007 AASM Scoring Manual defined the RDI to include RERAs in addition to hypopneas and apneas, measured per hour of sleep, thus formally differentiating it from the AHI [27,36]. Used commonly in early PSG when assessment of airflow and respiratory effort were less common, and still often reported today, the oxygen desaturation index (ODI) is defined as the number of transient falls in oxygen saturation per hour of sleep; the percent desaturation used to identify a desaturation event (typically 3% or 4%) must be specified.
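The relationship among these indices can be summarized in a few lines of Python. This is only a sketch of the arithmetic implied by the definitions above: each index is an event count divided by hours of sleep, and the indices differ in which events are counted. The function names and the example counts are invented for illustration.

```python
def per_hour(count: int, hours: float) -> float:
    """Event count divided by hours; rejects a non-positive denominator."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return count / hours


def respiratory_indices(apneas: int, hypopneas: int, reras: int,
                        desaturations: int, total_sleep_time_h: float) -> dict:
    """Return AI, AHI, RDI and ODI, all expressed per hour of sleep."""
    return {
        "AI": per_hour(apneas, total_sleep_time_h),
        "AHI": per_hour(apneas + hypopneas, total_sleep_time_h),
        # RDI as defined in the 2007 manual: RERAs added to apneas and hypopneas.
        "RDI": per_hour(apneas + hypopneas + reras, total_sleep_time_h),
        # The desaturation threshold (3% or 4%) used to count events
        # must be reported alongside the ODI.
        "ODI": per_hour(desaturations, total_sleep_time_h),
    }


# Hypothetical night: 6 hours of sleep with made-up event counts.
indices = respiratory_indices(apneas=20, hypopneas=60, reras=40,
                              desaturations=90, total_sleep_time_h=6.0)
for name, value in indices.items():
    print(f"{name}: {value:.1f} events/hour")
```

With these invented counts the AHI is about 13 events/hour while the RDI is 20 events/hour, so treating the two terms as synonyms could change how the same night is interpreted.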

Home sleep apnea testing
Technological advancements in sleep apnea measurement in recent years have been aimed at reliably measuring OSA in the unmonitored, home setting because of the greater convenience for the patient and the reduced cost of having the patient self-apply and record sleep and cardiopulmonary signals in the home. Home sleep apnea tests (HSATs; also referred to as portable/ambulatory monitoring) used for OSA screening and/or diagnosis range from devices with 1-2 sensors (e.g., pulse oximetry or airflow) to multi-channel devices with multiple sleep and cardiopulmonary signals [37][38][39][40]. The definitions used to score sleep-disordered breathing events by these devices vary widely depending on the technology used and the signals that are recorded, making it difficult to compare data across devices [41]. Although HSATs do not record the same set of signals as a full PSG, they all attempt to provide an index of OSA severity that is comparable to the PSG-derived AHI. Implementation of the AASM hypopnea definition that includes EEG arousal is not possible with most HSAT devices due to the absence of EEG recording. However, some HSAT devices use surrogates for EEG arousal such as change in snoring, pulse rate change, or movement to identify hypopneas [42].
The majority of HSAT devices use recording time or valid signal time as the denominator instead of total sleep time (TST) in the calculation of AHI, as they do not include EEG sensors to differentiate sleep from wake. Typically, the difference between recording time and TST is about 20%, resulting in lower AHI values in the HSAT from this difference alone [43]. In order to differentiate indices that use recording time (including both sleep and wake periods) from those that use total sleep time (EEG-defined sleep time only), the AASM recommended use of the term Respiratory Event Index (REI) for indices based on recording time. However, an increasing number of HSATs are able to distinguish sleep from wake using limited EEG or alternate technology (such as a combination of arterial tonometry and other physiological signals) and provide an estimated measure of TST [40][44][45][46]. Despite these differences in technology and scoring, validation studies comparing HSAT to in-lab PSG data demonstrate reasonable agreement between OSA severity indices and adequate diagnostic sensitivity/specificity for OSA [47][48][49][50]. Because HSATs are less invasive, record data in the patient's natural environment and have the ability to record multiple nights (integrating physiological night-to-night variability of OSA), these data may theoretically provide a more reliable measure of sleep-disordered breathing severity than a single night in the laboratory.
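The effect of the denominator can be illustrated with a short calculation. The sketch below assumes a hypothetical night with the same number of scored events and a TST that is 20% shorter than the recording time; the event count, the durations, and the function name are all invented for the example.

```python
def events_per_hour(n_events: int, hours: float) -> float:
    """Generic index: number of events divided by the chosen denominator."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return n_events / hours


n_events = 96                                  # hypothetical scored events
recording_time_h = 8.0                         # sleep plus wake during recording
total_sleep_time_h = recording_time_h * 0.8    # assumed: TST is 20% shorter

ahi = events_per_hour(n_events, total_sleep_time_h)   # TST denominator
rei = events_per_hour(n_events, recording_time_h)     # recording-time denominator

print(f"AHI (per hour of sleep):     {ahi:.1f}")
print(f"REI (per hour of recording): {rei:.1f}")
print(f"REI / AHI ratio:             {rei / ahi:.2f}")
```

Under these assumptions the same events give an index of about 15 per hour of sleep but 12 per hour of recording, so a patient near a severity threshold could be classified differently depending only on the denominator.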
While most portable sleep testing devices include a sensor to measure airflow, some alternative technologies such as peripheral arterial tonometry (PAT) have been used to identify sleep-disordered breathing events without use of airflow. The indices of OSA severity obtained using typical PAT devices, referred to as pAHI and pRDI, have been reported to be equivalent to PSG-derived AHI [45,51].
Severity of residual OSA while patients use PAP therapy is measured using airflow sensors built into PAP devices. Without the ability to measure sleep, PAP devices use the recording time as the denominator for an AHI that is based on airflow change alone. Differences between PAP device-derived AHI and PSG have been described, although the clinical significance of these differences is unclear [52].

Strengths and weaknesses of the AHI
In this section, we address the ability of the AHI to predict clinically relevant correlates of OSA, including patient-reported outcomes of daytime sleepiness and quality of life, motor vehicle and industrial accidents, hypertension, diabetes mellitus, CHD, stroke, heart failure, and death. In addition to the limitations of the AHI resulting from inconsistent methodology described above, concern has been raised that the AHI fails to capture accurately the physiological abnormalities that underlie the neurocognitive, metabolic, and cardiovascular effects of OSA. Among the criticisms of the AHI is that it explains little of the variance in these symptoms or disease outcomes. In considering the limitations of the AHI as a predictor of these outcomes, it is important to recognize three potential sources of limited predictive ability:
1. Precision with which the AHI reflects the true OSA-related exposure that is the cause of adverse outcomes, due either to failure to reflect the operative pathophysiologic mechanisms accurately or to measurement error from night-to-night variability or scoring inaccuracy;
2. Individual differences in response to OSA, which may reflect multiple factors, including genetics, age, medication use, and comorbid conditions such as obesity, among others. These differences in response to OSA will limit the ability to predict outcomes even when exposure to airway obstructive events is measured precisely;
3. Competing (non-OSA) causes of outcomes of interest, which will impose an upper limit on the amount of variance of an outcome that can be predicted even when exposure to OSA is measured precisely.
Therefore, where available, the predictive ability of the AHI is compared to other established risk factors for the outcome of interest.

Daytime sleepiness
Excessive daytime sleepiness (EDS) has long been recognized as a cardinal symptom of OSA, as described by early reports in the 1970s [26,53]. Suggested mechanisms underlying EDS in OSA include sleep fragmentation [54], increased somnogenic circulating cytokines and intermittent nocturnal hypoxemia [55][56][57][58], the latter possibly leading to neural cell injury and apoptosis affecting wake-promoting regions of the brain [59,60]. However, EDS is not universally present in patients with OSA, as some alternatively report fatigue or lack of energy [61], and others no symptoms at all. Indeed, the majority of patients with OSA do not report EDS, as identified in several geographically diverse studies reflecting an EDS prevalence of approximately 40% in OSA [62,63]. EDS prevalence in OSA can vary across comorbidities, with relatively low prevalence in heart failure [64] and atrial fibrillation [65] compared to a higher prevalence in patients with asthma [66]. Epidemiologic data [67][68][69] from the SHHS and WSC [70] support monotonic relationships between increasing severity of OSA as defined by the AHI and increasing percentage of those with EDS, associations that were present even in milder degrees of OSA. These studies, however, differ in terms of sex-specific differences in sleepiness symptoms in those with OSA. While the SHHS did not support differences, the WSC showed a higher degree of EDS in women versus men; specifically, 22.6% of women and 15.5% of men with an AHI based on a 4% desaturation criterion (AHI-4%) >5/hour reported sleepiness [70]. In these epidemiologic studies, however, fewer than half of those with moderate-to-severe OSA (defined as an AHI-4% ≥15/hour) report excessive sleepiness (defined as Epworth Sleepiness Scale [ESS] score ≥11/24) [68,71].
Clinic-based studies appear to be overall consistent in significant associations with EDS, mainly ascertained by ESS, but in some cases with objective multiple sleep latency testing and maintenance of wakefulness testing, with increasing degree of OSA defined by the AHI [72][73][74][75][76][77][78]. Numerous treatment trials have demonstrated that treatment of OSA results in improvements in both self-report and objective measures of sleepiness. While severity of OSA assessed by AHI is associated with improvement in sleepiness in some studies, the baseline severity of sleepiness is a better predictor of improvement, indicating the importance of the individual response to OSA as a marker of disease severity [79].

Quality of life
Few studies have examined the relationship between OSA metrics and quality of life (QOL); however, a relationship between AHI and QOL appears present. In a study of 737 individuals in the community-based WSC, higher AHI-4% was associated with significantly lower scores on 6 of 8 SF-36 health status scores (mental health, vitality, physical functioning, social, physical role, general health perception) in a dose-response fashion [80]. For example, compared to subjects without OSA (AHI = 0) and after adjustment for confounders including BMI, general health perception was 3.6, 5.6, and 7.0 points lower in patients with mild, moderate, and severe OSA as defined by AHI (mean score in the cohort was 72.5). The decrements in general health perception associated with moderate and severe OSA are similar to those found with other common diseases such as arthritis (reduced by 7.3), hypertension (reduced by 3.5), and back problems (reduced by 4.4), though the values are less than those seen in diabetes or angina (12.8 and 13.2, respectively) [81]. These results are generally consistent with those from the SHHS cohort, in which participants with severe OSA (AHI-4% ≥30/hour) demonstrated significantly reduced quality of life across a variety of domains. However, in the SHHS, only the vitality score of the SF-36 demonstrated a consistent linear relationship with AHI [82][83][84]. CPAP therapy may improve some domains of generic QOL, particularly with respect to physical functioning. However, the impact of CPAP appears more robust for sleep-specific quality of life instruments such as the SAQLI or FOSQ [85].

Motor vehicle crashes and occupational injuries
There is a presumed link between OSA and the occurrence of motor vehicle crashes (MVC) [86], through the impact of sleep fragmentation on vigilance and reaction time. Although many other factors also contribute to the occurrence of MVC, e.g., the number of miles driven (measure of exposure), driver experience, sleep duration, circadian factors, and age, numerous studies have documented increased rates of MVC in patients with OSA. In a meta-analysis of 10 studies [87], patients with OSA (as assessed by standard indices such as the AHI) had a significant and substantially increased relative risk of MVC compared to those without OSA (RR = 2.43, 1.21-4.89, p = 0.01).
Evidence of an association between disease severity (as measured by the AHI) and rates of crashes is inconclusive. Three studies had sufficient data to generate a pooled estimate of the relationship between AHI and MVC risk in patients with OSA. In these three studies, there was a trend towards greater AHI in OSA patients who had a crash vs. those who did not by approximately 10/hour (standardized mean difference in AHI between groups = 0.27, p = 0.055). In eight studies not included in the analysis, three found that severity of OSA was associated with crash risk, but five did not. However, in a more recent community-based study (SHHS), a significant increase in MVC risk was noted with increasing AHI-4% (OR 1.15, 95% CI 1.07-1.26 for every 10 events/hour increase in AHI-4%), after adjustment for age, sex, miles driven, usual sleep duration and excessive sleepiness (based on a score ≥11/24 on the Epworth Sleepiness Scale) [88]. The MVC risk associated with a 10-unit increase in AHI was slightly greater than the MVC risk associated with habitually sleeping one hour less per night.
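To give a sense of scale for an odds ratio reported per 10 events/hour, the sketch below rescales the SHHS estimate to a larger difference in AHI-4%. It assumes that the log-odds of a crash are linear in AHI-4% across the range considered, which is an assumption of the illustration and not a finding reported above; the function name and the 30 events/hour example are likewise chosen only for demonstration.

```python
def scale_odds_ratio(or_per_10: float, delta_ahi: float) -> float:
    """Rescale an OR reported per 10 events/hour to a delta_ahi difference.

    Assumes a log-linear relationship: OR(delta) = OR_per_10 ** (delta / 10).
    """
    return or_per_10 ** (delta_ahi / 10.0)


delta = 30.0  # hypothetical difference in AHI-4% (events/hour)
point = scale_odds_ratio(1.15, delta)   # point estimate per 10 events/hour
lower = scale_odds_ratio(1.07, delta)   # lower 95% confidence bound
upper = scale_odds_ratio(1.26, delta)   # upper 95% confidence bound

print(f"Implied OR for a {delta:.0f} events/hour difference: "
      f"{point:.2f} ({lower:.2f}-{upper:.2f})")
```

Under the log-linear assumption, a 30 events/hour difference would correspond to an odds ratio of roughly 1.5, although extrapolating beyond the range of values observed in the cohort would not be justified.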
Fewer studies have examined the impact of OSA on occupational injuries, and rates of occupational injuries are lower than those of MVC and highly dependent on job type/responsibilities. Nevertheless, studies have consistently demonstrated an increased risk of occupational injuries in OSA patients, as recently reviewed [89]. In this analysis of 7 studies, patients with OSA had an increased risk of occupational injuries (OR = 2.18, 1.53-3.1). In four of the studies, OSA was defined according to PSG or polygraphy, and OSA was diagnosed based on a threshold AHI (5-10/hour). When only these studies were considered, OR = 1.78 (1.03-3.07), again consistent with a substantial risk associated with OSA as documented by AHI. In a subsequent study of 1109 workers sent for a sleep study (PSG), sleep apnea severity (log (AHI+1)) was significantly associated with occurrence of occupational injuries (OR = 1.31, 95% CI 1.02-1.73, p = 0.04) after controlling for confounders [90]. Patients with moderate-to-severe OSA had a rate double that of those without OSA (OR 1.99, 95% CI 0.96-4.44 and 2.00, 95% CI 0.96-4.49 for moderate and severe OSA groups, respectively); this increase in OR was similar to the increased odds ratio of workers in a physical/manual related industry vs. those who were not (OR = 2.28).
Patients with OSA adherent to CPAP have rates of MVC similar to that of individuals without OSA [91]; the extent to which CPAP therapy might reduce occupational injury risk is not clear. However, there is no evidence that the AHI metric per se predicts a reduction in crash risk with therapy.

Hypertension
OSA is strongly associated with both prevalent and incident hypertension, although this effect appears weaker in older populations [92][93][94][95][96][97][98]. In the WSC and the SHHS, mean systolic and diastolic blood pressure (among those not using antihypertensive medications) and prevalent hypertension increased linearly with OSA severity as measured by the AHI-4%, after adjusting for age, sex, and body habitus. The magnitude of these associations was large: in SHHS, despite a prevalence of hypertension of 43% in the referent category, AHI-4% ≥30 was associated with an adjusted OR of 1.47 compared to those with AHI-4% <1.5; in WSC, for an individual with a BMI of 30, the estimated OR for hypertension was 1.21, 1.75, and 3.07 for AHIs of 5, 15, and 30, respectively, compared to an AHI of zero. In the WSC, there was also a dose-dependent association of OSA, as measured by AHI, with incident hypertension. After adjustment for age, sex, and body habitus, compared to an AHI of zero, the OR for incident hypertension was 2.03 for AHI-4% 5-14.9 and 2.89 for AHI-4% ≥15/hour [95]. No significant association with incident hypertension was seen in the SHHS cohort, however [98]. A recent meta-analysis of observational studies found that both prevalent and incident hypertension increased with increasing severity of OSA as measured by the AHI [99]. Although comparisons of AHI to other hypertension risk factors have rarely been reported, in a study of 372 adults aged 68 (SD 1) years, multivariate regression found that severe OSA (defined as AHI-3% >30/hour) was more strongly associated with incident hypertension than was male sex or BMI ≥30 [96].
Numerous studies have found that treatment of OSA with CPAP lowers blood pressure [100][101][102]. Few studies have looked in detail at PSG predictors of blood pressure response to CPAP. One study found that higher baseline AHI was associated with a greater fall in BP with treatment [103], while another found that time at saturation <90% was predictive [104]. In general, however, there is no consistent relationship between AHI and degree of blood pressure reduction with OSA treatment [105][106][107][108]. Conversely, the severity of blood pressure elevation at baseline is strongly associated with blood pressure reduction following PAP therapy, with particularly large effects noted in patients with resistant hypertension, again emphasizing the importance of the individual response to OSA as a marker of disease [103,109].

Coronary artery disease
A strong cross-sectional association of OSA with coronary heart disease (CHD) has been reported. In several case-control studies from Sweden comparing patients with CHD to controls free of known CHD, OSA was independently associated with CHD in both men and women [110][111][112][113]. Peker et al. found that an AHI ≥10 had an adjusted odds ratio (aOR) of 3.1 (95% CI 1.2-8.3) for CHD, similar to the effect of diabetes mellitus in this sample (aOR 4.2, 95% CI 1.1-17.1), and greater than that of either hypertension or hyperlipidemia. Mooe et al. found that in men, an AHI ≥14/hour was associated with CHD with an aOR of 4.5, nearly identical to the effect of hypertension (aOR 4.2), diabetes mellitus (aOR 4.3), or a five-unit increase in BMI (aOR 4.8), and considerably stronger than a positive smoking history (aOR 1.6 for current or former smoking). In women, an AHI ≥5/hour was associated with CHD with an aOR of 4.1, greater than that of hypertension (aOR 3.4), smoking history (aOR 2.4), or BMI (not significant), although not as strong as diabetes (aOR 6.8). In a cross-sectional analysis of baseline data from the community-based SHHS, a considerably weaker association of AHI with CHD was noted, with AHI-4% >4.4/hour having an aOR of 1.2 after extensive covariate adjustment, with no increase in risk at higher levels of AHI [114].
Community-based prospective studies of the relation of OSA to incident CHD do not provide consistent results. Using the unconventional reference group of participants with AHI = 0, the Wisconsin Sleep Cohort Study found that among participants not using CPAP, the age-, sex-, BMI-, and smoking-adjusted hazard ratio (aHR) for incident CHD was 2.4 (95% CI 1.0-6.0) for those with AHI-4% ≥30/hour [115]. The aHR ranged from 1.6 to 1.8 for groups with AHI >0-<5, 5-<15, and 15-<30, and the overall trend was not statistically significant. In the SHHS, the age-, race-, BMI-, and smoking-adjusted association of OSA with incident CHD was significant only in men, with excess risk limited to those with AHI-4% ≥30/hour [116]. After further adjustment for lipids, diabetes mellitus, hypertension, and blood pressure, the association of OSA with incident CHD was significant only in those under age 70. In the Busselton Health Study cohort, using a MESAM IV device (a type of HSAT) to measure respiratory events, moderate-to-severe OSA was not associated with incident CHD (for AHI ≥15/hour compared with AHI <5/hour, aHR 1.1, 95% CI 0.24-4.6) [117].
Clinic-based prospective studies have also yielded mixed results regarding the association of OSA with incident or recurrent CHD, and many are difficult to interpret due to analysis of CPAP-adherent versus CPAP-non-adherent patients, with high risk of bias due to the healthy user effect. Several recent, high-profile randomized clinical trials in non-sleepy patients with elevated AHI have failed to demonstrate a reduction in CHD, stroke or death with PAP therapy [118][119][120].

Stroke
A recent meta-analysis of 86 studies (which included studies using either 3% or 4% desaturation criteria) comprising 7096 stroke patients found that 71% had an AHI >5/hour and 30% had an AHI >30/hour [121]. The prevalence was similar in studies performed within 1 month to >30 months following stroke. OSA is also associated with incident stroke in both community-based and clinical cohorts [117,[122][123][124][125][126][127][128][129]. In cohorts in which both outcomes have been studied, the association of OSA with incident stroke is considerably stronger than its association with CHD. The WSC found that an AHI-4% ≥20/hour was associated with increased risk of first-ever stroke over a 4-year follow-up period (OR 4.3, 95% CI 1.3-14.2); the magnitude of effect was somewhat lower and not statistically significant after adjusting for BMI (OR 3.1, 95% CI 0.74-12.8), with wide confidence intervals due to the overall low stroke event rate in this relatively young cohort [122]. The SHHS found that risk of incident stroke was higher in men with moderate to severe OSA in adjusted analyses (aHR = 2.9, 95% CI 1.1-7.4) [123]. This effect was not observed in women. One of the first clinic-based studies designed to examine OSA and stroke found that at an AHI >5, the combined risk of incident stroke and mortality was increased nearly 2-fold (HR 1.97, 95% CI 1.12-3.48) [125]. In a study of 392 patients with CHD who were screened for OSA, presence of an AHI-3% ≥5 was associated with an adjusted HR for incident stroke of 2.9 (95% CI 1.4-6.1), stronger than the risk associated with type 2 DM, hypertension, current smoking or atrial fibrillation [127]. A clear dose-response relationship between AHI and risk of stroke has generally not been observed, however. While prospective observational studies suggested a decreased risk of stroke in patients who were treated with CPAP compared to untreated patients [128,130,131], these studies have a high risk of bias due to comparison of CPAP-adherent versus non-adherent patients.
Randomized clinical trials in patients with stroke have not demonstrated a reduction in stroke risk, although they have been small and of limited power. No reduction in stroke risk with CPAP was noted in large randomized trials of patients with cardiovascular disease and OSA, although these studies were also not powered to detect a change in stroke risk per se [119,120,128,132,133].
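The wide confidence intervals noted above for cohorts with few stroke events can be illustrated with a small worked calculation. The following is a minimal sketch, assuming an unadjusted 2x2 table and the standard Wald interval on the log odds ratio; the function name and all counts are hypothetical and are not taken from any of the cited studies, whose estimates come from adjusted regression models.

```python
import math


def odds_ratio_wald_ci(exposed_cases, exposed_noncases,
                       unexposed_cases, unexposed_noncases, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    Illustrative only: the cited cohort estimates are covariate-adjusted.
    """
    a, b, c, d = (exposed_cases, exposed_noncases,
                  unexposed_cases, unexposed_noncases)
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) is driven by the smallest cells (the events).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: same odds ratio, ten-fold difference in cohort size.
few_events = odds_ratio_wald_ci(6, 194, 3, 297)
many_events = odds_ratio_wald_ci(60, 1940, 30, 2970)

for label, (or_, lo, hi) in (("few events", few_events),
                             ("many events", many_events)):
    print(f"{label}: OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

In this hypothetical comparison the point estimate is identical in both tables, but the interval for the small table spans 1 while the interval for the larger table does not, which is the pattern described for low-event cohorts.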

Mortality
Conventional OSA metrics of AHI and measures of hypoxemia are strongly associated with mortality risk in the general population. In the WSC, the Busselton Health Study, and the SHHS, after adjusting for age, sex, BMI, and prevalent medical conditions, mortality risk increased with AHI [134][135][136]. Although there was not always a clear monotonic increase with AHI, the adjusted mortality hazard was generally higher with increasing OSA severity as measured by AHI. In the WSC, for example, among participants not treated with positive airway pressure, the adjusted hazard ratios for all-cause mortality in those with mild (AHI-4% 5.0-14.9/hour), moderate (AHI-4% 15.0-29.9/hour), and severe (AHI-4% ≥30.0/hour) OSA were 1.4 (95% CI 0.7-2.6), 1.7 (95% CI 0.7-4.1), and 3.8 (95% CI 1.6-9.0), respectively, compared to those with AHI-4% <5. In the SHHS, percent time at SpO2 <90% was also associated with increased mortality, although less strongly than the AHI, while arousal index was not a predictor of mortality. Increased mortality has long been reported in untreated OSA patients in sleep clinic-based cohorts [137,138]. Where it has been evaluated, mortality risk in these cohorts increases with increased severity of OSA as measured by AHI [130,139,140].
The mortality risk associated with OSA compared to other mortality risk factors has been reported in a small number of studies. In the Busselton Health Study, the unadjusted HR for AHI ≥15 was 5.0 (95% CI 2.0-12.2), similar to that of diabetes (HR 4.0), a decade older age (HR 3.6), and current smoking (HR 3.8), and larger than that of a 10 mmHg increase in mean arterial pressure (HR 1.7) [136]. Two clinic-based studies from Spain have reported adjusted HR for OSA that are comparable to other important mortality risk factors, although these studies report the risks associated with OSA in patients who have declined treatment and must therefore be interpreted with caution. Martinez-Garcia et al. reported that compared to those with AHI-4% <15, the aHR for untreated moderate OSA (AHI-4% 15-<30/hour) was 1.4, for untreated severe OSA (AHI-4% ≥30/hour) aHR was 2.3, while smoking ≥30 pack-years was associated with aHR of 1.5, diabetes mellitus aHR of 2.3, and age aHR of 1.8 per decade [140]. Similarly, Campos-Rodriguez reported that compared to those with AHI-4% <10, those with untreated AHI-4% of 10-29 had an aHR of 1.6, and untreated AHI ≥30 had an aHR of 3.5, while diabetes mellitus was associated with an aHR of 1.4, hypertension an aHR of 2.4, and age an aHR of 1.6 per decade [139]. Both community-based and clinical cohorts suggest that the association of OSA with mortality is stronger in those under age 70 than in older adults [135,141]. It is unclear whether this reflects an increase in competing causes of mortality or a different physiological response to OSA in the elderly. No adequately powered randomized clinical trials of OSA treatment to reduce mortality have been conducted, and no significant reduction in mortality has been reported in cardiovascular secondary prevention trials of CPAP therapy [119,120].
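The severity strata used in the cohort analyses above (AHI <5, 5-<15, 15-<30, and ≥30 events/hour) can be made explicit in code. The following is a minimal sketch, assuming the AHI is computed as the number of apneas plus hypopneas divided by hours of sleep; the function names, the label for the lowest stratum, and the example counts are hypothetical, and hypopnea scoring criteria (for example 3% versus 4% desaturation) are taken as already applied upstream.

```python
def apnea_hypopnea_index(n_apneas, n_hypopneas, total_sleep_time_hours):
    """Events per hour of sleep (apneas plus hypopneas)."""
    if total_sleep_time_hours <= 0:
        raise ValueError("total sleep time must be positive")
    return (n_apneas + n_hypopneas) / total_sleep_time_hours


def severity_category(ahi):
    """Map an AHI (events/hour) onto the conventional severity strata."""
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "below threshold"   # label chosen for this sketch
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"


# Hypothetical night: 40 apneas and 80 hypopneas over 6.5 hours of sleep.
example_ahi = apnea_hypopnea_index(40, 80, 6.5)
example_category = severity_category(example_ahi)
print(f"AHI {example_ahi:.1f}/hour -> {example_category}")
```

A patient with an AHI of 29.9 and one with an AHI of 30.0 fall into different strata under this scheme, which is one reason categorical thresholds can obscure otherwise similar disease burden.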

Alternative metrics
The AHI has been the most commonly used metric of sleep apnea severity for decades, but methodologic problems described above are widely recognized. In addition, failure to quantify the mechanisms that underlie the pathophysiologic consequences of OSA adequately is likely to contribute to the limited ability of the AHI per se to predict clinical consequences of OSA or response to OSA treatment. Thus, alternative metrics of disease severity have been proposed based on advanced signal processing and other sophisticated analyses [142]. We summarize here some of these alternative metrics, recognizing many have been proposed and no ideal metric has yet emerged. Indeed, as OSA is now considered a heterogeneous disease both from the perspective of underlying mechanisms (endotypes) as well as in terms of clinical manifestations (phenotypes) [1,143,144], it is quite likely that no single metric will adequately characterize all aspects of OSA and its related risks (Figure 2).

1. Hypoxic burden. There is general agreement that hypoxia, particularly when severe, has deleterious effects on cardiometabolic function. The degree of desaturation and the frequency of desaturation are commonly quantified, but now investigators have captured the area under the oxyhemoglobin saturation curve as a metric of hypoxic burden. Azarbarzin et al. reported that this measure of hypoxic burden was easily derived from an overnight sleep study and was predictive of mortality from cardiovascular disease in two community-based cohorts [145,146]. In both the SHHS and the Study of Osteoporotic Fractures in Men [147], cardiovascular mortality increased progressively with increasing hypoxic burden, an effect that was not diminished by adjustment for AHI or more conventional polysomnographic measures of hypoxia, including minimum saturation or percent time at saturation <90%. In the SHHS, the upper quintile of hypoxic burden had an adjusted HR of 1.96 (95% CI 1.11-3.43) for cardiovascular mortality. By contrast, AHI was not significantly predictive of cardiovascular mortality in these cohorts. The findings suggest that not only the frequency but also the depth and duration of sleep-related upper airway obstructions are important disease-characterizing features. Prior authors had quantified the T90 (duration of saturation below 90%), a less OSA-specific measure of hypoxic burden, which was also predictive of important outcomes, including platelet aggregation as well as overall mortality [148]. In contrast, recent post-hoc analyses of the SAVE (Sleep Apnea Cardiovascular Endpoints) study have shown minimal predictive value of desaturation indices from the standpoint of an incident composite cardiovascular outcome [149]. Of note, pulse oximeters have evolved over the years but vary in sensitivity, time constants, and other characteristics, emphasizing the importance of methodological details in yielding robust conclusions.
2. Arousal intensity. Amatoury et al. have quantified the arousal intensity as a potentially important physiological variable [150]. The authors hypothesized that arousal from sleep can vary in intensity with some arousals being quite subtle (and not captured by traditional EEG criteria), whereas others are more robust leading to complete awakening from sleep [151]. The average arousal intensity was not related to the magnitude of the preceding respiratory stimuli but was positively associated with arousal duration, time to arousal, rate of change in epiglottic pressure, and negatively with body mass index (R² > 0.10, p ≤ 0.006). The authors concluded that the average arousal intensity is independent of the preceding respiratory stimulus. This finding is consistent with arousal intensity being a distinct pathophysiological trait. Respiratory and pharyngeal muscle responses increase with arousal intensity. Thus, patients with higher arousal intensities may be more prone to respiratory control instability. Prior work on 'subcortical arousals' had noted that they fragment sleep, with important effects on daytime function, but are not captured by traditional EEG criteria. Azarbarzin and colleagues compared arousal intensity with the change in heart rate associated with obstructive events in a sample of 20 PSGs from patients attending a sleep laboratory. They found a strong correlation between these measures for a given individual (average r: 0.95 ± 0.04), consistent with the concept that arousal intensity is a marker of autonomic activation [152]. A heart rate response to obstructive events in the upper quartile of the population was associated with an increased risk of both fatal (HR 1.68, 95% CI 1.22-2.30) and non-fatal (HR 1.60, 95% CI 1.28-2.00) cardiovascular disease events in the Sleep Heart Health Study, and this risk was particularly great in those who also had a large hypoxic burden [153]. Of note, arousal scoring can also be quite variable, being subject to inter-observer and intra-observer variability.
The autonomic response to arousal is likely an important factor underlying the pathophysiology of OSA complications [154]. However, the predictive value of this parameter for hard cardiovascular events is untested and will require further study.
3. Odds ratio product (ORP) is a more recent metric that quantifies sleep depth [155]. It is derived from quantitative analyses of the EEG by using various power spectral measures. ORP values can range from 0 to 2.5 with values of 0 to 1.0 predicting sleep and 2.0 to 2.5 predicting wakefulness. ORP can vary significantly within any particular stage of sleep, and ORP values overlap between different stages of sleep. As with any metrics of sleep quality, there is substantial night-to-night variability in ORP values [156]. ORP values have also been shown to improve with CPAP therapy in patients with OSA. ORP during sleep has also been related to excessive wake time and sleep depth in those with OSA and/or PLMs [157]. The correlation between the right and left hemisphere ORP measures (interhemispheric sleep depth coherence) may be a measure of susceptibility to adverse neurocognitive outcomes in sleep apnea. A recent study analyzing SHHS data found that interhemispheric sleep depth coherence in patients with sleep apnea predicted reported occurrence of car accidents 2 years after the sleep study. Those in the highest quartile of sleep depth coherence had a 57% lower risk of accidents compared to the lowest quartile, independent of important confounders including AHI, reported sleepiness and usual sleep duration [158]. Thus, measures of ORP may predict cognitive consequences of OSA. However, its relationship to cardiovascular outcomes is untested.
4. Cardiopulmonary coupling (CPC). Thomas et al. developed an automated technique to assess CPC during sleep using a single-lead EKG signal [159].
From a continuous, single-lead electrocardiogram, the authors extracted both the normal-to-normal sinus interbeat interval series and a corresponding electrocardiogram-derived respiration signal. Employing Fourier-based techniques, the product of the coherence and cross-power of these two simultaneous signals was used to generate a spectrographic representation of cardiopulmonary coupling dynamics during sleep. This technique shows that non-rapid eye movement sleep in adults demonstrates spontaneous abrupt transitions between high- and low-frequency cardiopulmonary coupling regimes, which have characteristic electroencephalogram, respiratory, and heart-rate variability signatures in both health and disease. Using the kappa statistic, agreement with standard sleep staging was poor (training set 62.7%, test set 43.9%) but higher with cyclic alternating pattern scoring (training set 74%, test set 77.3%). The authors concluded that a sleep spectrogram derived from information in a single-lead electrocardiogram can be used to track cardiopulmonary interactions dynamically. This technique may provide a complementary approach to the conventional characterization of graded non-rapid eye movement sleep stages.
CPC-derived measures are correlated with conventional metrics derived from PSG including AHI [160]. A number of studies have linked CPC measures to outcomes including predicting early response in depressed patients, improvements in sleep quality with CPAP therapy, upper airway surgery, and MAD [161][162][163][164]. While these data suggest potential usefulness of CPC, further studies are needed to determine whether CPC measures add value to AHI and other available time/frequency analyses (e.g. cardiovascular entropy).
5. Apnea-hypopnea event duration. Event duration has been quantified by Butler et al. [165]. The authors analyzed data from the SHHS and observed a potentially important relationship between the duration of the respiratory events and the overall mortality seen in SHHS. The authors had previously shown the event duration to be heritable, although the mechanisms driving the duration of respiratory events are unclear. In theory, short respiratory events could reflect a low arousal threshold (propensity to wake up) but in addition an individual with unstable ventilatory control (high loop gain) may also terminate respiratory events more quickly than an individual with low loop gain. Regardless, the authors showed that short respiratory event duration was predictive of mortality in men and women. After adjusting for demographic factors (mean age, 63 years; 52% female), apnea-hypopnea index (mean, 13.8/hour; SD, 15.0), smoking, and prevalent cardiometabolic disease, individuals with the shortest-duration events had a significant hazard ratio for all-cause mortality of 1.31 (95% confidence interval, 1.11-1.54). The authors surmised that individuals with shorter respiratory events may be predisposed to increased ventilatory instability and/or have augmented autonomic nervous system responses that increase the likelihood of adverse health outcomes. Of note, however, short respiratory events are likely associated with less hypoxic burden compared to long respiratory events, leading to some inconsistency or complexity in the predictive value of various metrics. Given the interest in personalized medicine in OSA, the event duration may be one factor to consider when identifying patients at high risk of mortality from OSA.
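The idea behind hypoxic burden, the area under the desaturation curve associated with respiratory events, normalized by sleep time, can be sketched in a few lines. This is a simplified illustration under stated assumptions and not the published algorithm: here the pre-event baseline is taken as the maximum SpO2 in a fixed look-back window, the search window extends a fixed time after event end, and the function name, parameter names, default window lengths, and the synthetic trace are all hypothetical.

```python
def hypoxic_burden(spo2, events, sleep_hours, fs_hz=1.0,
                   baseline_lookback_s=100.0, window_after_end_s=45.0):
    """Simplified hypoxic burden in (% * minutes) per hour of sleep.

    spo2   : sequence of SpO2 samples in percent, sampled at fs_hz
    events : iterable of (start_s, end_s) respiratory event times in seconds
    Assumptions of this sketch: fixed windows, baseline = max SpO2 before the
    event, and no handling of overlapping windows (which would double count).
    """
    if sleep_hours <= 0:
        raise ValueError("sleep_hours must be positive")
    n = len(spo2)
    total_area_pct_s = 0.0
    for start_s, end_s in events:
        i0 = max(0, int(round((start_s - baseline_lookback_s) * fs_hz)))
        i1 = max(0, min(n, int(round(start_s * fs_hz))))
        i2 = max(0, min(n, int(round((end_s + window_after_end_s) * fs_hz))))
        if i1 <= i0 or i2 <= i1:
            continue  # no usable baseline or search window for this event
        baseline = max(spo2[i0:i1])
        # Rectangle-rule area between the baseline and the SpO2 trace.
        for value in spo2[i1:i2]:
            total_area_pct_s += max(0.0, baseline - value) / fs_hz
    return (total_area_pct_s / 60.0) / sleep_hours


# Synthetic 1 Hz trace: 96% throughout, with a 40-second dip to 92%
# following a hypothetical event scored from 200 s to 230 s.
trace = [96.0] * 600
for i in range(210, 250):
    trace[i] = 92.0

burden = hypoxic_burden(trace, [(200.0, 230.0)], sleep_hours=1.0)
print(f"hypoxic burden: {burden:.2f} %min/hour")
```

Unlike an event count, this quantity grows with both the depth and the duration of each desaturation, which is the property the text identifies as potentially important. Short events, as discussed for event duration, would tend to contribute less area per event.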

Future directions
Although the AHI as determined from laboratory-based PSG has been considered the gold standard metric of OSA severity (see Figure 3; [135]), financial pressures and the scale of the problem have driven increased reliance on home sleep apnea testing. This change has been further accelerated by the COVID pandemic, in which many patients prefer the convenience and presumed safety of home-based testing [166]. While OSA as identified using the AHI is strongly associated with neurocognitive, metabolic and vascular outcomes, it is clear that the AHI as a single metric is neither adequate nor sufficient to define the presence or characterize the severity of OSA. This notion is evidenced by the lack of reported symptoms in many patients with severely elevated AHI and by failure of the AHI alone to identify patients who experience cardiovascular benefit from PAP therapy. We are therefore highly supportive of the development of novel techniques to capture OSA occurrence and to predict its complications. Given the varying biology underlying each organ system, we expect that the ideal metric for OSA severity will vary by the complication of interest. Such severity metrics will likely require some combination of measuring (1) the magnitude of the OSA stimulus (e.g., using measures of gas exchange abnormality such as hypoxic burden, considering mechanism of obstructive vs. central apnea), (2) individual responses to the stimulus (e.g., assessment of the autonomic nervous system or EEG) and (3) individual response to therapy (e.g., improvement in sleepiness or reduction in blood pressure) [167]. Many approaches could be taken to quantify sleep apnea severity better. Notwithstanding the limitations of the AHI, it is important that, as novel metrics are developed, their ability to improve upon the AHI as markers of prognosis or predictors of response to therapy be formally tested and replicated across populations of interest. A number of strategies are proposed:
1. Evaluation of symptom subtypes. Severity of sleepiness was included in the 1999 AASM recommendations for classifying OSA severity, although in the absence of a reproducible standard for assessing sleepiness, this severity metric was not retained in subsequent recommendations. Consideration of the individual's symptomatic response to OSA therapy may be particularly important, however. Ye et al. showed via cluster analyses three distinct groups of OSA patients: those who are minimally symptomatic, those with disrupted sleep and those with EDS [63]. In the SHHS, increased risk of incident total cardiovascular disease, CHD, and heart failure was seen only in a cluster characterized by excessive sleepiness [143]. Similarly, in a different study, mortality risk was increased only in those OSA patients who reported excessive sleepiness [168]. It is speculated that the failure of recent clinical trials to demonstrate reduction in cardiovascular risk with PAP therapy may reflect the exclusion of sleepy patients from these trials. Thus, sleepiness may be a marker of individual response to OSA that also reflects susceptibility to the cardiovascular effects of OSA. This idea warrants further investigation of symptoms as a measure of OSA severity or as a metric of susceptibility to OSA complications.
2. Genetics. Genetic factors are likely to play an important role in these individual differences, as suggested by the trait-like behavior of individual vulnerability to cognitive impairment from sleep deprivation [169][170][171]. While OSA has long been recognized to be a complex heritable trait, studies evaluating the genetic causes of OSA and its component endotypes have only recently begun. This situation reflects the lack of availability of sleep testing in many longitudinal cohort studies.
Further investigation of the genetic architecture of OSA is strongly encouraged, as this approach is likely to help explain individual differences in susceptibility to OSA and its clinical consequences.
3. Blood biomarkers. Panels of biomarkers have the potential to identify causal pathways affected by OSA, and thus to provide important prognostic and predictive information [172]. For example, quantifying inflammation, autonomic function, and oxidative stress pathways should provide insights into the risk of OSA-associated cardiovascular disease. The assessment of microRNAs and exosomes has also led to important insights both in terms of OSA biomarkers and potential therapeutic targets addressing OSA complications [173][174][175][176]. In addition to such hypothesis-driven biomarker panels, hypothesis-free methods of biomarker discovery are becoming increasingly accessible to the sleep field. Metabolomic, lipidomic, proteomic, and gene expression profiles made possible by advances in mass spectrometry, microarray, and other technologies are being investigated with the potential to identify new biomarkers of OSA [177][178][179][180][181]. In theory, these techniques could identify diagnostic tests for OSA, prognostic markers for sleepiness and other consequences of OSA, new therapeutic targets to prevent OSA complications, and markers that can be followed to monitor successful therapy.
4. Machine learning. Machine learning and other hypothesis-free deep learning methods that identify complex patterns in empirical data are gaining traction in various medical applications. In addition to application to biomarker discovery, these methods could be applied to sophisticated signal processing of polysomnographic and other data to identify previously unrecognized patterns, and thus complement the hypothesis-driven approaches to alternative metric identification described in the above section on Alternative Metrics.
Such methods will require appropriate validation and training sets but are being greatly facilitated by availability of Big Data [182][183][184].
5. Wearable technologies. Wearable technologies are becoming ubiquitous and provide new opportunities to gain insight into pathophysiological abnormalities related to sleep and sleep disorders. For example, data from one device support its role compared to PSG [185][186][187]. Ongoing studies are comparing various simplified technologies for OSA diagnosis, although none is yet able to replace PSG. Wearable technologies provide the opportunity to record data inexpensively for multiple nights over extended periods of time, and future studies should assess the value of these longitudinal data (with variable parameters from each device) to improve prediction of OSA-associated morbidity, including cardiovascular risk.
Novel measures that improve upon the AHI for the diagnosis and severity classification of OSA will be particularly transformative if they help to capture the variability in OSA endophenotypes. This concept may facilitate the design of robust adaptive randomized clinical trials in which patients at risk of particular complications and amenable to specific interventions can be studied rigorously. Novel OSA metrics should be developed with an eye toward making complex measurements accessible to clinical practitioners, so that scientific advances can be readily disseminated into practice to improve patient care.