Body Mass Index and Risk of Nonalcoholic Fatty Liver Disease: Two Electronic Health Record Prospective Studies

Pfizer Worldwide Research and Development (A.K.L., S.K., C.H., V.B., M.S.L., J.D.), Groton, Connecticut 06340-5159, and Pfizer Worldwide Research and Development, New York, New York 10017; British Heart Foundation Glasgow Cardiovascular Research Centre (D.P., J.M.R.G., P.W., N.S.), University of Glasgow, Glasgow G12 8TA, United Kingdom; and Cardiovascular, Metabolic and Dermatology Genetics Unit (D.W.), GlaxoSmithKline, King of Prussia, Pennsylvania 19406

Nonalcoholic fatty liver disease (NAFLD) is currently the most common form of liver disease and the most common cause of abnormal liver function tests, and its prevalence is increasing because of the rise in obesity (1). NAFLD may progress to nonalcoholic steatohepatitis (NASH), marked by inflammation of the liver, and can further progress to fibrosis and eventually cirrhosis. Although simple NAFLD is usually benign and does not frequently progress to more advanced stages of liver disease, its high prevalence makes it an increasing public health concern and a leading cause of cirrhosis (2,3). The estimated prevalence of NAFLD and NASH in the general population varies with the diagnostic method: NAFLD prevalence is estimated to be between 6.3% and 33%, and NASH prevalence around 3%-5% (4). The prevalence of NAFLD among patients with type 2 diabetes has been estimated to be as high as 69% based on ultrasound diagnosis (5). The prevalence of NASH is more difficult to estimate accurately because liver biopsy is required, so systematic screening or prospective study in the general population is not possible (6). A recorded diagnosis of NAFLD is currently the only realistic way of obtaining detailed data on incident NAFLD in large studies, accepting that many cases of NAFLD will be missed because no country undertakes systematic screening.
Although the link between obesity and the risk of both NAFLD and NASH is widely appreciated, this evidence comes almost exclusively from cross-sectional studies. Large prospective epidemiological studies linking obesity to incident NAFLD/NASH are limited, apart from a recent 5-year follow-up of 5562 normal-weight Chinese subjects (7) and previous evidence linking obesity with a higher risk of incident cirrhosis (8). Such studies are scientifically important and inform clinical management strategies and public health messaging. For example, previously reported data from large prospective studies suggest that a body mass index (BMI) of 35 kg/m² or more is associated with an approximately 40- to 90-fold increased diabetes risk compared with reference groups with a BMI below 22-23 kg/m² (9,10). These data provide benchmarks for clinical management and public health guidance beyond cross-sectional data, and as a result both studies are widely cited. One would anticipate that the association between BMI and incident NAFLD/NASH is likewise strong, although the prospective evidence is limited. Obesity, type 2 diabetes, and insulin resistance are closely linked to NAFLD (11), and weight loss and increased physical activity are associated with reductions in liver fat (12,13).
In this study, using large, prospective, routinely collected data (comprising >2.1 million people) from 2 large electronic health record (EHR) databases, we report the relationship between BMI and the risk of a "recorded" NAFLD/NASH diagnosis, additionally stratifying by diabetes status and sex, to explore issues potentially informative to future clinical guidelines on the management of these conditions. Many patients with NAFLD are not identified in routine practice for a variety of reasons; our study therefore also provides a useful estimate of how often NAFLD is recorded in real clinical practice.

Databases and analytical sample ascertainment
This study was performed using EHR data from 2 large databases: The Health Improvement Network (THIN) database and the Humedica EHR database. THIN is a United Kingdom primary care EHR data resource that includes more than 12 million patients in total, of whom more than 3 million are current patients. Patient data in THIN are collected from United Kingdom general practitioners (GPs) and represent approximately 6% of the total United Kingdom population. During each consultation between the GP or nurse and the patient, all conditions and symptoms are recorded electronically using the Read Clinical Classification version 2. Participants are representative of the United Kingdom population by age, gender, and medical conditions (14). THIN data were acquired from Cegedim Strategic Data Medical Research (United Kingdom), which licenses the record-level anonymized data collected from the National Health Service for use in medical research. The analyses described herein included current and past patients with records available in the years 2003-2013. The Humedica EHR database contains information on approximately 25 million patients, 7 million of whom have integrated outpatient and hospital records. Medical conditions are recorded using International Classification of Diseases, Ninth Revision (ICD-9) codes. Our analysis used data from GPs, specialty care, and hospitalizations in the years 2007-2013. Our prospective analyses were limited to patients with a recorded BMI measurement between 15 and 60 kg/m², aged between 20 and 85 years in Humedica and between 20 and 90 years in THIN. Patients without any recorded BMI measurement or without a recorded smoking status were excluded. One year of active patient status was required before the baseline BMI measurement to help ensure that disease history was captured. Patients with a diagnosis or history of chronic disease (including cardiovascular disease, neurodegenerative disease, chronic respiratory disease, neoplastic disease, or fatty liver disease) before the baseline BMI date were excluded from analysis to reduce the impact of chronic diseases commonly associated with weight loss or with non-BMI-related risk of our endpoint of interest. Lack of reliable information on patient alcohol intake prevented the specific exclusion of patients with a history of alcohol abuse who may have been inaccurately diagnosed with NAFLD/NASH. However, the coding of a NAFLD/NASH diagnosis implies that potential excessive alcohol intake was considered, and rejected, as a cause of liver disease at the time the diagnosis was recorded. The final analytical sample and details of the inclusion/exclusion criteria are described in Figure 1 and the supplemental data.
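The inclusion and exclusion rules above can be expressed as a single predicate per patient. The following is a minimal sketch, not the authors' code: the `Patient` record and its field names are hypothetical (they are not the THIN or Humedica schemas), the BMI and age bounds are assumed to be inclusive, and "one year of active status" is approximated as 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record layout for illustration only.
    age_at_baseline: float
    baseline_bmi: Optional[float]        # kg/m^2; None if never recorded
    smoking_status: Optional[str]        # "never", "former", "current", or None
    registration_date: date              # start of active patient status
    baseline_date: date                  # date of the baseline BMI measurement
    prior_chronic_disease: bool = False  # CVD, neurodegenerative, chronic respiratory,
                                         # neoplastic, or fatty liver disease before baseline


# Upper age limit differs by database, as stated in the text.
MAX_AGE = {"Humedica": 85, "THIN": 90}


def is_eligible(p: Patient, database: str) -> bool:
    """Return True if the patient meets the stated inclusion/exclusion criteria."""
    if p.baseline_bmi is None or not (15 <= p.baseline_bmi <= 60):
        return False
    if not (20 <= p.age_at_baseline <= MAX_AGE[database]):
        return False
    if p.smoking_status is None:
        return False
    # At least one year of active status before the baseline BMI date (365 days assumed).
    if p.baseline_date - p.registration_date < timedelta(days=365):
        return False
    # Exclude prior chronic disease linked to weight loss or non-BMI risk.
    if p.prior_chronic_disease:
        return False
    return True
```

For example, an 87-year-old with otherwise complete data passes in THIN but not in Humedica, because only the upper age limit differs between the 2 databases.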

Endpoints
The outcome endpoint for analysis was a recorded diagnosis of NAFLD or NASH (see Supplemental Table 1). In Humedica, the outcome was defined by ICD-9 code 571.8, which is specific for nonalcoholic liver disease and includes both NAFLD and NASH. In THIN, the outcome was defined using a combination of Read codes for NAFLD and NASH. Patients were assigned to the diabetes category if they received a diabetes diagnosis (either type 1 or type 2) at any time earlier than 1 year after the baseline BMI date and before any diagnosis of NAFLD/NASH. Diabetes status was determined in Humedica by ICD-9 codes 250.xx and in THIN by Read codes C10EXXX and C10FXXX. The vast majority (~90%) of diabetes cases are likely to be type 2 diabetes.
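The code-matching and timing rules for the endpoint and the diabetes category can be sketched as below. This is an illustrative reading of the text, not the authors' implementation: the function names are hypothetical, the THIN codes written as C10EXXX/C10FXXX are assumed to mean prefix matches on "C10E" and "C10F", and "1 year" is approximated as 365 days. The THIN Read codes for NAFLD/NASH are listed only in Supplemental Table 1 and are not reproduced here.

```python
from datetime import date, timedelta
from typing import Optional


def is_humedica_nafld(icd9_code: str) -> bool:
    """Humedica endpoint: ICD-9 571.8 (nonalcoholic liver disease, NAFLD or NASH)."""
    return icd9_code.strip() == "571.8"


def is_diabetes_code(code: str, database: str) -> bool:
    """Diabetes codes: ICD-9 250.xx in Humedica; Read codes C10E*/C10F* in THIN."""
    code = code.strip()
    if database == "Humedica":
        return code.startswith("250")
    if database == "THIN":
        # Assumption: "C10EXXX"/"C10FXXX" in the text denote prefix matches.
        return code.startswith(("C10E", "C10F"))
    raise ValueError(f"unknown database: {database}")


def in_diabetes_category(diabetes_date: Optional[date],
                         baseline_bmi_date: date,
                         nafld_date: Optional[date]) -> bool:
    """True if diabetes was diagnosed earlier than 1 year after the baseline BMI
    date and before any NAFLD/NASH diagnosis (1 year approximated as 365 days)."""
    if diabetes_date is None:
        return False
    if diabetes_date >= baseline_bmi_date + timedelta(days=365):
        return False
    if nafld_date is not None and diabetes_date >= nafld_date:
        return False
    return True
```

Note that a diagnosis up to (but not including) 1 year after baseline still counts, so some patients enter the diabetes category with a diagnosis made after their baseline BMI.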

Statistical analysis
Hazard ratios (HRs) and 95% confidence intervals (CIs) for the recording of a diagnosis of NAFLD/NASH were estimated across BMI categories in the analytical sample using Cox proportional hazards models. Patients were grouped into 10 BMI categories at baseline ranging from 15 to 60 kg/m², with 20 to less than 22.5 kg/m² as the reference category. The proportional hazards assumption was tested and confirmed for all BMI categories in all models. Models were adjusted for age (continuous), sex (male or female), and smoking status (categorical: never, former, or current smoker). The analysis was conducted in all patients meeting the inclusion criteria and then repeated after stratifying patients by diabetes status and by sex. The stratified analyses included an interaction term between diabetes status (or sex) and BMI category.
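The model described above can be written out as follows. This is a minimal formalization assuming a standard Cox specification with indicator (dummy) coding of the BMI categories; the symbols ($C_k$ for category $k$, $\beta_k$, $\eta_k$, $\lambda$, $D_i$, and the adjustment vector $\mathbf{z}_i$) are introduced here for illustration and are not taken from the authors' R code.

```latex
% Main model: BMI category indicators plus adjustment for age, sex, and smoking
h_i(t) = h_0(t)\exp\Big(\sum_{k \neq \mathrm{ref}} \beta_k\,\mathbf{1}[\mathrm{BMI}_i \in C_k]
        + \boldsymbol{\gamma}^{\top}\mathbf{z}_i\Big),
\qquad \mathrm{HR}_k = e^{\beta_k}

% Stratified analyses: D_i is the stratifier (diabetes, or male sex), with its own
% main effect and an interaction with every non-reference BMI category;
% z_i then holds the remaining adjustment covariates
h_i(t) = h_0(t)\exp\Big(\sum_{k \neq \mathrm{ref}} (\beta_k + \eta_k D_i)\,\mathbf{1}[\mathrm{BMI}_i \in C_k]
        + \lambda D_i + \boldsymbol{\gamma}^{\top}\mathbf{z}_i\Big)
```

Under this parameterization, for $D_i = 1$ the HR of category $k$ relative to the reference category of the same stratum is $e^{\beta_k + \eta_k}$, whereas relative to the $D_i = 0$ reference category it is $e^{\lambda + \beta_k + \eta_k}$; this mirrors the distinction between within-group and common-reference comparisons in the stratified figures.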
In order to assess potential bias, we compared the BMI distribution of the analytical sample in each database with the published BMI distribution for each country. All statistical analyses were conducted using R version 3.1.2.
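Both the HR analysis and the comparison with national figures start by binning each baseline BMI. The sketch below assigns the 10 analysis categories and, separately, the standard WHO classes (underweight <18.5, normal 18.5-<25, overweight 25-<30, obese ≥30 kg/m²) commonly used in national survey reports. The exact category edges are an assumption: the text states only that there were 10 categories spanning 15-60 kg/m², with 20-<22.5 as the reference and 40-60 as the highest category. Whether a given national survey reports in WHO classes is also an assumption.

```python
from bisect import bisect_right
from collections import Counter

# Assumed edges (kg/m^2) giving 10 categories over 15-60: left-closed bins,
# except the top bin, which is closed at 60.
EDGES = [15, 20, 22.5, 25, 27.5, 30, 32.5, 35, 37.5, 40, 60]
LABELS = [f"{lo}-<{hi}" for lo, hi in zip(EDGES[:-1], EDGES[1:])]
LABELS[-1] = "40-60"
REFERENCE = "20-<22.5"


def bmi_category(bmi: float) -> str:
    """Map a baseline BMI to one of the 10 assumed analysis categories."""
    if not (EDGES[0] <= bmi <= EDGES[-1]):
        raise ValueError(f"BMI {bmi} outside the analysed range 15-60")
    # bisect_right returns the index of the first edge greater than bmi;
    # subtract 1 for the bin, and fold BMI == 60 into the top bin.
    i = min(bisect_right(EDGES, bmi) - 1, len(LABELS) - 1)
    return LABELS[i]


def who_class(bmi: float) -> str:
    """Standard WHO adult BMI classes."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def who_distribution(bmis: list[float]) -> dict[str, float]:
    """Proportion of a sample in each WHO class, for comparison with survey data."""
    counts = Counter(who_class(b) for b in bmis)
    n = len(bmis)
    return {k: counts[k] / n for k in ("underweight", "normal", "overweight", "obese")}
```

Keeping the analysis bins and the comparison classes separate matters because the reference category (20-<22.5) sits inside the WHO "normal" class, so the two schemes cannot be derived from one another.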

Patient characteristics
More than 50% of patients in each database had a recorded measurement of BMI (Figure 1). Patient characteristics for the analytical sample are provided in Table 1 and Supplemental Figure 1. Age, sex, smoking status, and prevalence of diabetes were broadly similar between the 2 databases. The median follow-up time in the analytical

Sex-stratified analysis
In the sex-stratified analysis (Figure 3), men had a greater risk of NAFLD/NASH diagnosis than women in every BMI category. The BMI-adjusted HR for men compared with women was 1.58 in THIN (CI, 1.47-1.70) and 1.40 in Humedica (CI, 1.34-1.46). Furthermore, the relative risk in men vs women increased with increasing BMI, although this interaction, tested as a linear effect across categories, was significant only in Humedica (P = .0159).
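Because Cox CIs are symmetric on the log scale, the standard error and Wald statistic behind a reported HR can be recovered from its 95% CI. The sketch below applies this standard back-calculation to the men vs women HRs quoted above; it is an illustration of the arithmetic, not a re-analysis, and because the published CIs are rounded to 2 decimals the results are approximate.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal critical value


def log_hr_se(lower: float, upper: float, z_crit: float = Z_95) -> float:
    """SE of log(HR) implied by a 95% CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def wald_z(hr: float, lower: float, upper: float) -> float:
    """Wald statistic for H0: HR = 1, using the CI-implied standard error."""
    return math.log(hr) / log_hr_se(lower, upper)


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value P(|Z| >= |z|), via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


# Men vs women, BMI-adjusted HRs and 95% CIs as reported in the text.
for db, hr, lo, hi in [("THIN", 1.58, 1.47, 1.70), ("Humedica", 1.40, 1.34, 1.46)]:
    z = wald_z(hr, lo, hi)
    print(f"{db}: SE(log HR) = {log_hr_se(lo, hi):.4f}, z = {z:.1f}, p = {two_sided_p(z):.1e}")
```

A useful check is that log(HR) should fall close to the midpoint of log(lower) and log(upper); for THIN, log(1.58) ≈ 0.457 and the CI midpoint on the log scale is ≈ 0.458.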

Relationship between BMI and NAFLD/NASH in patients with or without diabetes
In the analysis stratified by diabetes status, when patients were compared with the reference BMI category for their own diabetes status, the association between increasing BMI and NAFLD/NASH diagnosis was stronger in patients without diabetes than in patients with diabetes (Figure 4, A and C). In Humedica, patients without diabetes in the highest BMI category had an HR of more than 10 (HR = 10.55; 9.12-12.20) (Supplemental Table 2) compared with the reference category, whereas in patients with diabetes the HR was approximately 4 (HR = 3.67; 2.51-5.36). Results in THIN were broadly similar (Supplemental Table 3), and in both databases the interaction terms were significant (P < .00001). When patients with diabetes were compared with patients without diabetes of similar BMI as the reference, patients with diabetes had HRs nearly double those of patients without diabetes in all BMI categories above 27.5 kg/m², whereas in the lower BMI categories the risk of NAFLD/NASH associated with diabetes was generally higher, at around 3- to 5-fold (Figure 4, B and D). This finding was consistent in both the Humedica and THIN databases, such that patients with diabetes in the highest BMI category (40-60 kg/m²) had an HR of more than 20 (HR = 21.63, 18.26-25.61 [Humedica]; HR = 24.88, 16.65-37.19 [THIN]) compared with patients without diabetes in the reference BMI category. Overall, the risk of NAFLD/NASH diagnosis was about 2-fold higher in patients with diabetes than in those without diabetes after adjusting for BMI (THIN HR = 1.96, 1.75-2.20; Humedica HR = 2.30, 2.17-2.44).
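Because each HR is a ratio of hazards, dividing 2 HRs that share a group cancels that group's hazard. The sketch below applies this to the Humedica point estimates quoted above to recover the implied effect of diabetes at the reference BMI and at the highest BMI. It is exact only if all 3 estimates come from one model with a common baseline hazard (for example, the interaction model); if they come from separately fitted models the result is approximate, and no CI can be derived this way.

```python
# Humedica HRs quoted in the text; every BMI reference is 20-<22.5 kg/m^2.
hr_dm_top_vs_nondm_ref = 21.63     # diabetes, BMI 40-60 vs no diabetes, reference BMI
hr_dm_top_vs_dm_ref = 3.67         # diabetes, BMI 40-60 vs diabetes, reference BMI
hr_nondm_top_vs_nondm_ref = 10.55  # no diabetes, BMI 40-60 vs no diabetes, reference BMI

# Shared numerator (diabetes, BMI 40-60) cancels:
#   HR(diabetes, ref BMI vs no diabetes, ref BMI) = 21.63 / 3.67
implied_dm_hr_at_ref_bmi = hr_dm_top_vs_nondm_ref / hr_dm_top_vs_dm_ref

# Shared comparator (no diabetes, ref BMI) cancels:
#   HR(diabetes, BMI 40-60 vs no diabetes, BMI 40-60) = 21.63 / 10.55
implied_dm_hr_at_top_bmi = hr_dm_top_vs_nondm_ref / hr_nondm_top_vs_nondm_ref

print(f"Implied diabetes HR at reference BMI: {implied_dm_hr_at_ref_bmi:.2f}")
print(f"Implied diabetes HR at BMI 40-60:     {implied_dm_hr_at_top_bmi:.2f}")
```

The implied values (roughly 5.9 at the reference BMI and 2.1 at the highest BMI) are in line with the description of a larger diabetes effect at lower BMI and a roughly doubled risk at high BMI.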

Evaluating the generalizability of our results
The BMI in our analytical samples was broadly similar to published data during similar time periods in the United Kingdom (Health Survey for England 2012) (15

Discussion
This study provides 2 important findings. First, although NAFLD is identified only to a limited extent in the real world, the prospective risk of having a recorded diagnosis of NAFLD/NASH increased linearly with increasing BMI, such that the risk of NAFLD/NASH diagnosis was approximately 5- to 9-fold higher at a BMI of 30-32.5 kg/m², rising to around 10- to 14-fold higher at a BMI of 37.5-40 kg/m², compared with patients with a BMI of 20-22.5 kg/m². Second, in both databases the baseline prevalence of NAFLD/NASH was a fraction of the population prevalence estimated in previous studies that employed systematic hepatic imaging (17,18). This finding is expected and suggests that NAFLD is either missed or not looked for in many patients in the real world. Despite differences in clinical practice between the United Kingdom and the United States, a difference of just over 1 kg/m² in average baseline BMI, and differences in follow-up, the relative risks of NAFLD/NASH diagnosis by BMI were broadly similar in both EHR databases. We also showed that relative increases in the risk of a recorded NAFLD/NASH diagnosis with BMI were greater in individuals without diabetes than in those with diabetes; this is not unexpected, because many patients with diabetes are likely to have NAFLD at diagnosis, even those with a lower BMI. However, absolute risks were substantially higher in patients with diabetes at any given BMI, a finding that supports the strong pathophysiological link (via ectopic fat) between NAFLD and type 2 diabetes (11). Indeed, the association of diabetes with NAFLD/NASH risk appears equivalent to an increase in BMI of approximately 5 to 10 kg/m² on the nondiabetes curve; ie, the nondiabetes curve lies substantially to the right of the diabetes curve. This is particularly evident in the HRs for patients with diabetes in the healthy BMI range (20-25 kg/m²), where the risk of recorded NAFLD/NASH was more than 5-fold that of patients without diabetes.
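The "BMI-equivalent" of diabetes can be made explicit under one simplifying assumption: that log(HR) rises linearly with BMI, so that a diabetes HR $d$ matches a BMI increase of $\ln d / \ln h$, where $h$ is the HR per 1-unit BMI. The sketch below uses the per-unit HRs of 1.14 and 1.16 reported later in the Discussion (the text does not say which database each belongs to) and the diabetes HRs quoted in this paper. The log-linear model is an illustrative assumption, not the categorical model used in the analysis.

```python
import math
from itertools import product


def bmi_equivalent(hr: float, hr_per_unit_bmi: float) -> float:
    """BMI increase (kg/m^2) with the same log-hazard effect as `hr`, assuming
    log(HR) rises linearly with BMI at log(hr_per_unit_bmi) per kg/m^2."""
    return math.log(hr) / math.log(hr_per_unit_bmi)


per_unit_hrs = (1.14, 1.16)       # adjusted HR per 1-unit BMI in the 2 EHRs
diabetes_hrs = (1.96, 2.30, 5.0)  # BMI-adjusted diabetes HRs; 5.0 = ">5-fold" at healthy BMI

for d, h in product(diabetes_hrs, per_unit_hrs):
    print(f"diabetes HR {d:.2f}, per-unit HR {h:.2f}: ~{bmi_equivalent(d, h):.1f} kg/m^2")
```

Across these inputs the equivalent BMI shift runs from roughly 4.5 to 12 kg/m², which brackets the approximate 5- to 10-kg/m² range stated above.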
These novel findings concur with expectations grounded predominantly in cross-sectional observations that ectopic liver fat is common in individuals with diabetes at diagnosis (11,19), but our findings also extend such observations by providing more granular data. Men had a modestly greater absolute risk of developing NAFLD/NASH than women in all BMI categories, which fits with the greater liver fat content in men and their higher risk of type 2 diabetes at most BMIs (20). Given the strength of the findings (in particular the very high HRs at elevated BMIs), the size of the study, its prospective design, and its consistency with other work, as discussed further below, our results are potentially valuable: physicians need to be aware of the strong and near-linear relationship between BMI and NAFLD risk, and clinicians can tell patients that weight is the most important risk factor for the development of NAFLD.
We fully recognize that because fatty liver disease is often undiagnosed owing to the lack of systematic screening, the relationship between BMI and a recorded diagnosis of NAFLD/NASH presented here does not necessarily reflect the true relationship. In both databases, the baseline prevalence was a fraction of the population prevalence estimated in previous studies that employed imaging (17,18). This discrepancy is likely due to substantial underdiagnosis of NAFLD: routine screening is not recommended in any country, and many patients with NAFLD are not recognized as having it, because liver function tests can be normal and even minor elevations are often not investigated further by imaging. Future studies are needed to determine whether the recording of NAFLD diagnoses changes over time; we expect recording to increase as general physicians become more familiar with the relevant diagnostic algorithms.
Our results may underestimate relative risks if more cases are being missed in overweight/obese individuals (in whom cross-sectional imaging studies show NAFLD to be much more common) or, alternatively, may overestimate relative risks if a diagnosis of NAFLD is less likely to be sought in those with lower BMI. However, several factors lead us to believe that our results from these prospective analyses are reasonably robust and externally valid. First, cross-sectional imaging studies in adults and children report odds ratios (ORs) for NAFLD at higher BMI that are in line with, or sometimes greater than, what we have shown for BMI vs incident NAFLD/NASH. For example, in the Third National Health and Nutrition Examination Survey, NAFLD prevalence ascertained by ultrasound was approximately 4- to 8-fold higher in obese than in normal-weight individuals across ethnicities, and the excess was greater in men than in women and greater in patients with diabetes (21). Further, in our recent cross-sectional study of 1874 young males and females (mean age 17.9 y), the prevalence of fatty liver, ascertained carefully by ultrasound, was 0.4% (5 of 1226), 4.3% (12 of 279), and 22.2% (26 of 117) in normal-weight, overweight, and obese individuals, respectively, ie, more than a 50-fold risk in the obese (22). Second, the strong concordance between results from United States-based and United Kingdom-based EHR databases, in countries with different obesity rates and health care practices, increases our confidence in the conclusions. Third, a prospective study in a Chinese population that repeated ultrasound yearly over 5 years in more than 5500 subjects, with around 500 incident NAFLD cases, noted a 13-fold difference in incidence comparing extreme fifths of BMI (1.53% vs 19.96%), even though the study was limited entirely to normal-weight participants; notably, the HR per 1-unit increase in BMI in this Chinese study (1.22) was slightly greater than the 1.14 and 1.16 we observed in the 2 EHRs in adjusted analyses (7). Thus, our HRs in the 2 EHRs, ranging from 5- to 9-fold higher risk in obese individuals (BMI of 30-32.5 kg/m²) to 10- to 14-fold higher risk in severely obese individuals (BMI of 37.5-40 kg/m²) relative to lean individuals (20-22.5 kg/m²), appear consistent with cross-sectional findings and, if anything, may underestimate rather than overestimate risks. Fourth, the finding of HRs above 10 for NAFLD/NASH at higher levels of BMI also supports the associations being causal, because such high HRs are unlikely to be due simply to bias or confounding (bias and confounding more often account for much weaker associations, ie, HRs <2- to 3-fold). Fifth, the higher risks of NAFLD/NASH in patients with diabetes and in men are externally valid, because they concord with cross-sectional imaging data, as described herein. Finally, modest weight loss (~5%) can lead to reductions in NAFLD (23), whereas greater weight loss (~10%-15%) can substantially reduce NAFLD prevalence, as demonstrated over a decade ago in a small series of patients with diabetes (24) and observed many times since, again in accordance with the substantial HRs at higher BMI levels.
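The fold-differences quoted from the external studies follow directly from their reported counts and percentages. The sketch below reproduces that arithmetic as crude (unadjusted) ratios; it is a check on the figures cited in the text, not an adjusted OR or HR. Computed from the raw counts rather than the rounded percentages, the obese vs normal-weight prevalence ratio is about 54.

```python
# Young-adult cross-sectional study (22): (cases with fatty liver, participants) by BMI group.
counts = {"normal": (5, 1226), "overweight": (12, 279), "obese": (26, 117)}
prevalence = {g: cases / n for g, (cases, n) in counts.items()}

# Crude prevalence ratios relative to normal weight (no covariate adjustment).
prevalence_ratio = {g: p / prevalence["normal"] for g, p in prevalence.items()}

# Chinese cohort (7): cumulative NAFLD incidence (%) in the top vs bottom fifth of BMI.
incidence_ratio = 19.96 / 1.53

for g in counts:
    print(f"{g:>10}: {100 * prevalence[g]:.1f}% (ratio {prevalence_ratio[g]:.1f})")
print(f"Extreme-fifth incidence ratio: {incidence_ratio:.1f}")
```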
Strengths and limitations of this work require careful consideration. The use of EHRs allowed us to access data from millions of patients, enabling us to study the relationship between BMI and NAFLD/NASH diagnosis in key patient subgroups and, critically, in a much more granular fashion than previously attempted. Of course, an ideal study would measure liver fat (and all confounders) by imaging and then repeat these measurements serially over time as incident cases appear. However, as far as we know, no such study exists with the numbers required to answer the question adequately in a population broad enough to be generalizable. A further strength lies in our approach to limiting sample bias by excluding patients with chronic diseases linked to weight loss and by adjusting for smoking status, both of which could otherwise confound the relationship between BMI and risk of NAFLD or NASH. Despite the benefits related to cost, timing, and access to large datasets, limited follow-up time and incomplete patient data are 2 important limitations that can lead to bias. For example, in Humedica, patients without a recorded BMI measurement had a lower disease prevalence (data available on request). However, patients without a recorded BMI are also likely to be thinner (so health care workers do not consider its measurement relevant) and at lower risk of complications. The relatively low age of our cohort is perhaps a notable advantage, given the lower potential for chronic diseases and thus less chance of bias. We did not have access to alcohol intake data to exclude cases of fatty liver disease that may have been inaccurately diagnosed. However, focusing on the NAFLD/NASH diagnosis should limit the impact of these patients on the results, because such a diagnosis implies that potential alcohol intake was considered, and rejected, as the main cause of liver disease at the time the diagnosis was recorded by a physician. Moreover, a very recent cross-sectional study (25) that used similar ascertainment methods for NAFLD to ours (ie, ICD coding) reported ORs, adjusted for alcohol intake, broadly similar to the HRs we report here (obese vs normal weight OR = 9.59). Hence, alcohol intake is unlikely to be a major source of bias in our study. We also recognize that future studies are needed to link BMI separately to incident NASH alone; we did not have sufficient power in the present cohorts to enable such analyses. Finally, the percentages of overweight and obese patients in our analytical samples are not dissimilar to published obesity rates for the United States and United Kingdom during a similar time period (15,16), giving our results some external consistency.
In summary, using 2 distinct EHR databases comprising more than 2.1 million people and more than 11 000 incident cases, we have shown 2 things. First, a strong and striking near-linear relationship exists between BMI and future risk of recorded NAFLD/NASH, with higher absolute risks in men and in patients with diabetes. Second, NAFLD recording rates are far lower than would be expected from imaging studies, reflecting the absence of systematic screening (currently not advocated) and relatively modest recognition of NAFLD. Nevertheless, as discussed, the BMI-NAFLD relationship has strong external validity. The magnitude and consistency of the associations, namely a 5- to 10-fold increased risk in the obese and a 10- to 14-fold increased risk in the morbidly obese, highlight the importance of both preventing weight gain and reducing weight in the prevention and management of NAFLD.

Figure 2. HRs for diagnosis of NAFLD or NASH based on BMI category in Humedica (A) and THIN (B). HRs with 95% CIs are presented compared with the reference BMI category of 20 to less than 22.5 kg/m².

Figure 4. HRs for diagnosis of NAFLD or NASH based on BMI category stratified by diabetes status in Humedica (A and B) and THIN (C and D). HRs with 95% CIs are presented compared with the reference BMI category of 20 to less than 22.5 kg/m². In plots A and C, HRs are based on the reference BMI category within each group. In plots B and D, HRs are based on the nondiabetic reference BMI category.

Figure 3. HRs for diagnosis of NAFLD or NASH based on BMI category stratified by sex in Humedica (A) and THIN (B). HRs with 95% CIs are presented compared with the reference BMI category of 20 to less than 22.5 kg/m² in females.