Abstract

Introduction

Machine learning presents a unique opportunity to improve electronic cigarette (vaping) monitoring in youth. Here we built a random forest model to predict frequent vaping status among Californian youth and to identify contributing factors and vulnerable populations.

Methods

In this prospective cohort study, 1281 ever-vaping twelfth-grade students from metropolitan Los Angeles were surveyed in Fall and again 6 months later in Spring. Frequent vaping was measured at the 6-month follow-up as nicotine-containing vaping on 20 or more days in the past 30 days. Predictors (n = 131) encompassed sociodemographic characteristics, substance use and perceptions, health status, and characteristics of the household, school, and neighborhood. A random forest was developed to identify the top ten predictors of frequent vaping and interactions by sociodemographic variables.

Results

Forty participants (3.1%) reported frequent vaping at the follow-up. The random forest outperformed a logistic regression model in prediction (C-Index = 0.87 vs. 0.77). Higher past-month nicotine concentration in vape, more daily vaping sessions, and greater nicotine dependence were the top three of the ten most important predictors of frequent vaping. Interactions were found between age and perceived discrimination, and between age and race/ethnicity, as those who were younger than their classmates and either reported experiencing discrimination frequently or identified as Asian or Native American/Pacific Islander were at increased risk of becoming frequent vapers.

Conclusions

Machine learning can produce models that accurately predict progression of vaping behaviors among youth. The potential association between frequent vaping and perceived discrimination warrants more in-depth analyses to confirm if discrimination constitutes a cause of increased vaping.

Implications

This study demonstrates the utility of machine learning in predicting the status of frequent vaping over 6 months and in understanding predictors and nuanced intersectionality by sociodemographic attributes. The high performance of the random forest model has practical implications for a personalized risk calculator that supports vaping prevention programs. Public health officials need to recognize the importance of social factors that contribute to frequent vaping, particularly perceived discrimination. Youth subpopulations, including younger high school students and Asians or Native Americans/Pacific Islanders, might require specially designed interventions to help prevent habit formation in vaping.

Introduction

Electronic cigarettes (e-cigarettes or vaping products) have been the most popular tobacco product among American youth over the past seven years.1–3 While most youth reported only experimenting with e-cigarettes once or twice, a small number of them became frequent users of these products. The most recent statistics from October 2021 suggest that among American high school students who were current vapers (i.e., past 30-day users of e-cigarettes), 43.6% reported vaping on 20 or more days in the past 30 days, marking a rise from 38.9% observed in 2020.2,3 These statistics demonstrate a concerning trend in which American high school students are vaping more frequently and showing signs of dependence.4

In order to support policies and prevention programs to curb vaping among youth, literature has identified an array of factors associated with, or predictive of, progression of vaping from occasionally to more frequently. Cross-sectional survey studies have found that an unhealthy diet, use of alcohol, and recreational drugs are associated with heavier use of e-cigarettes.5,6 Studies of vaping dependence status suggested older age, being female, early onset of vaping, greater vaping frequency, past or current cigarette smoking, and the use of higher nicotine strengths in vape to be significant correlates.7–9 Longitudinal evidence has further linked lower education, use of any nicotine in e-cigarettes, and use of nontobacco flavors with increased risk of frequent vaping.10–12

Methodologically, these existing studies might have limitations inherent to the use of conventional regression approaches. Finding potential predictors (or correlates) of frequent vaping requires a comprehensive assessment of a large number of variables, a procedure not typically feasible in regression due to the risk of model overfitting. Furthermore, overreliance on the p-value to declare a statistically significant predictor has long been criticized as it could easily lead to erroneous results.13–15 Beyond finding independent predictors, the intertwined effect of multiple sociodemographic attributes (i.e., intersectionality) on the risk of becoming a frequent vaper remains unclear, owing to the challenge of having to explicitly define interaction terms in the regression equation and other statistical issues.16,17 Following the Theory of Intersectionality originally arising from the humanities literature, intersectionality analysis aims to address nonadditive effects of gender and race/ethnicity but allows more general investigation of health and disease at different intersections of identity, social position, and processes of oppression or privilege.18,19 In this context, characterizing intersectionality can lead to identification of youth subpopulations that are susceptible to frequent vaping, which could enable targeted public health and social interventions to be deployed within a general regulatory framework.

We aim to address these gaps in the literature by using machine learning—a data-mining technique that uses brute computational power—to more flexibly and comprehensively detect patterns in data.20 Despite increasing use of machine learning in public health research,21 its applications remain scarce in the vaping literature.16 In this study, we developed a random forest model to predict the status of 6-month frequent vaping among high school seniors in California. We also demonstrated two routinely performed posthoc procedures to enhance the interpretations of a black box model.22 These procedures elucidated the top predictors of frequent vaping and allowed for sociodemographic assessment of intersectionality to provide additional findings that might be meaningful to policymakers of vaping control.

Methods

Study Design and Cohort

This prospective cohort analysis used data from the ongoing Happiness & Health Study.23 In Fall 2013, 4100 9th grade students from ten public high schools in metropolitan Los Angeles, California were recruited via convenience sampling, targeting schools that represent a collective set of sociodemographically heterogeneous communities to create a diverse sample. Among them, 3396 students (82.8%) provided both youth assent and parental consent to participate in the study. A detailed survey on physical and mental health, use and perception of various substances, and household information was administered to participants every 6 months. School and neighborhood characteristics were collected in Fall 2013 and linked to survey responses. In this study, we used data from the Fall (Wave 7 survey in 2016) and Spring (Wave 8 survey in 2017) of their senior year. Analysis was restricted to those who reported having ever tried an e-cigarette in the Wave 7 survey (n = 1673, 49.3% of all participants) since key questions regarding vaping (such as typical number of puffs) were only asked of ever-vapers. We further excluded students who had not started grade 12 at the time of the Wave 7 survey (n = 18) and those who did not report frequency of vaping at the Wave 8 survey (n = 374). These procedures yielded 1281 ever-vaping 12th grade students in the cohort.
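The cohort derivation above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in this section; the variable names are illustrative and not taken from the study's code.

```python
# Counts reported in the text; variable names are illustrative assumptions.
recruited = 4100
consented = 3396
ever_vapers_wave7 = 1673
not_in_grade12 = 18
missing_wave8_frequency = 374

# Apply the two exclusions to the ever-vaping subsample.
analytic_cohort = ever_vapers_wave7 - not_in_grade12 - missing_wave8_frequency

# Percentages quoted alongside the counts.
consent_rate = round(100 * consented / recruited, 1)
ever_vaper_share = round(100 * ever_vapers_wave7 / consented, 1)

print(analytic_cohort)
print(consent_rate, ever_vaper_share)
```

The result agrees with the 1281 students, 82.8% consent rate, and 49.3% ever-vaping share stated in the text.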

Outcome, Candidate Predictors, and Data Preprocessing

We focused on a binary outcome variable denoting the status of frequent vaping at the 6-month follow-up (i.e., use of nicotine-containing e-cigarettes on at least 20 days of the past 30 days).2,24 Candidate predictors of frequent vaping were selected based on an ecological conceptual framework that described the substance use behaviors of youth as a result of risk and protective factors in various domains.25–28 As a result, 131 variables were extracted from the Wave 7 survey to encompass five domains: individual, family, school and peers, neighborhood, and society (see Supplementary Appendix S1). Through visual inspections of missing data across variables and participants, we confirmed the assumption of missing at random to be plausible to allow for imputation procedures (Supplementary Appendix S2). Due to the relative scarcity of missing data (11.2% total missing), we applied simple imputation using the median (for continuous variables) or the reference-case level (for categorical variables) in the primary analysis, and reserved a more sophisticated multiple imputation procedure for the sensitivity analysis to examine the robustness of the primary results.
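As a minimal sketch of the outcome definition and the simple imputation described above, the following Python uses hypothetical variable names and toy values; it is not the study's actual preprocessing code, and the choice of reference level for each categorical variable is an assumption left to the analyst.

```python
from statistics import median

def is_frequent_vaper(nicotine_vaping_days_past30):
    """Binary outcome: nicotine vaping on at least 20 of the past 30 days."""
    return nicotine_vaping_days_past30 >= 20

def impute_median(values):
    """Fill missing (None) entries of a continuous variable with the observed median."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def impute_reference(values, reference_level):
    """Fill missing (None) entries of a categorical variable with a reference level."""
    return [reference_level if v is None else v for v in values]

# Hypothetical toy data, for illustration only.
puffs = [3, None, 10, 5, None]
lunch = ["free", None, "full", "full"]

puffs_imputed = impute_median(puffs)
lunch_imputed = impute_reference(lunch, "full")
outcomes = [is_frequent_vaper(d) for d in (0, 19, 20, 30)]
print(puffs_imputed, lunch_imputed, outcomes)
```

Single-value imputation of this kind is simple but understates uncertainty, which is one reason a multiple imputation procedure is kept for the sensitivity analysis.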

Developing and Validating a Random Forest Model

This was a predictive modeling study where we aimed to develop a classification model to dichotomize participants into either frequent or infrequent vapers over a course of 6 months. Hence, our goal was to maximize the predictive power of the final model, rather than to convey any underlying relationships through hypothesis testing as seen in a typical explanatory modeling study.29 To align with this goal, we used the receiver operating characteristic (ROC) curve to assess the sensitivity/specificity of the classification model. The area under the ROC curve (C-Index) was also computed where a high overall classification performance was declared if it exceeded 0.80.30
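The C-Index for a binary outcome equals the probability that a randomly chosen frequent vaper receives a higher predicted risk than a randomly chosen infrequent vaper, with ties counted as one half. The sketch below computes it by direct pairwise comparison on toy values; the function name and data are illustrative assumptions.

```python
def c_index(labels, scores):
    """Area under the ROC curve via pairwise concordance (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("Both outcome classes are required.")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))

# Toy example: 1 = frequent vaper, 0 = infrequent vaper.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_c = c_index(toy_labels, toy_scores)
print(toy_c)  # 3 of 4 positive-negative pairs are ordered correctly: 0.75
```

Under the criterion stated above, a value above 0.80 on held-out data would be read as high overall classification performance.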

Random forest is a machine learning technique that uses many classification tree models as an ensemble to make predictions.31 The “randomness” occurs in two phases: a randomly bootstrapped subsample of data is used to develop each tree and a random subset of predictors is tested to create splits in each tree. These schemes ensure that individual trees are largely uncorrelated, which contributes to the high prediction accuracy and robustness of the random forest.31 We used a ratio of 6:4 to randomly split the data into a training set (n = 769) for model development and a testing set (n = 512) to evaluate model performance. On the training set, we performed an oversampling procedure using the Synthetic Minority Oversampling Technique to overcome the potential bias arising from the rarity of frequent vapers (n = 24 or 3.1% of the training set).32 Predictors were entered independently into the random forest, where two model parameters (the number of randomly selected variables tested at each tree split and the number of trees) were optimized on the training set in an internal 10-fold cross-validation procedure. In order to assess the performance of the final model on unseen data, we computed the ROC curve and the C-Index of this model outside of the training set using data from the testing set.30
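The core idea of synthetic minority oversampling is to create new minority-class records by interpolating between an existing minority record and one of its nearest minority-class neighbors. The sketch below illustrates that idea for numeric features only; it is a simplified stand-in, not the implementation used in the study, and the function name, the parameter `k`, and the toy data are assumptions.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("At least two minority records are required.")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority records by squared Euclidean distance to the base.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic

# Hypothetical minority-class records: (daily sessions, nicotine dependence score).
minority_records = [(4.0, 6.0), (5.0, 7.0), (8.0, 9.0), (6.0, 5.0)]
new_records = smote_sketch(minority_records, n_new=5)
print(new_records)
```

Oversampling is applied to the training set only, so the testing set keeps the original outcome prevalence and performance estimates are not inflated by synthetic records.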

Identifying the Top Predictors of Frequent Vaping

We estimated a relative importance score for each predictor reflecting the mean decrease in classification accuracy when this predictor was omitted from the model.33,34 One-way partial dependence plots were generated for each of the top ten predictors to illustrate their marginal effects on the predicted probability of frequent vaping while integrating out other predictors. A negative partial dependence indicated that the average predicted probability of frequent vaping decreased for that value of the predictor, and a positive partial dependence suggested otherwise.34
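A one-way partial dependence curve can be sketched as follows: for each grid value of the predictor of interest, that value is substituted into every record, the model is applied, and the predictions are averaged. The toy `predict` function and feature names below are hypothetical placeholders for a fitted random forest, and the curve is left uncentered.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` fixed at each grid value."""
    curve = []
    for value in grid:
        predictions = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(predictions) / len(predictions))
    return curve

# Hypothetical stand-in for a fitted model's predicted probability.
def toy_predict(row):
    return min(1.0, 0.02 * row["nicotine_mg"] + 0.1 * row["sessions"])

# Hypothetical records; only "sessions" varies across participants.
toy_rows = [
    {"nicotine_mg": 3, "sessions": 0},
    {"nicotine_mg": 6, "sessions": 1},
    {"nicotine_mg": 12, "sessions": 2},
]

pd_curve = partial_dependence(toy_predict, toy_rows, "nicotine_mg", [0, 10])
print(pd_curve)  # approximately [0.1, 0.3]
```

Averaging over the observed values of the other predictors is what "integrating out" refers to in the paragraph above.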

Sociodemographic Assessment of Intersectionality

We focused on nine variables that described the sociodemographic status of participants: age (years), gender (female or male), race/ethnicity (Hispanic/Latino, Asian, African American/Black, Native American/Pacific Islander, other/multiracial, or White); neighborhood characteristics: median annual household income, percentage of high school graduates and unemployment rates; characteristics of the school attended by participants: the percentages of free and reduced-cost lunch recipients; and perceived frequency of discrimination measured by the mean score (range 0–3) on the Everyday Discrimination Scale.35 This scale comprised 11 statements describing chronic, routine, and relatively minor forms of discriminatory events such as, “People act as if they think you are dishonest” (see Supplementary Appendix S1). For each statement, participants rated the frequency of experiencing such events with a score of 0–3 where 0 = never, 1 = rarely, 2 = sometimes, and 3 = often. In this analysis, we used the mean score over the 11 items to denote the overall degree of perceived discrimination.
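The scoring of the Everyday Discrimination Scale described above can be sketched as a simple mean of the 11 item ratings. The function below is illustrative; how partially completed scales were handled is not stated in the text, so this sketch assumes all 11 items are answered.

```python
def eds_mean_score(item_scores):
    """Mean of the 11 Everyday Discrimination Scale items, each rated 0 to 3."""
    if len(item_scores) != 11:
        raise ValueError("Expected ratings for all 11 items.")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("Each rating must be 0 (never) to 3 (often).")
    return sum(item_scores) / len(item_scores)

# Hypothetical respondent: ratings sum to 15 across the 11 items.
example_ratings = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2]
example_score = eds_mean_score(example_ratings)
print(round(example_score, 2))  # 1.36
```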

Following the recommended workflow for conducting a posthoc intersectionality analysis using a machine learning model,22 we assessed the strength of all 36 pairwise interactions formed by these nine sociodemographic variables using a variance-based method, which is regarded as a less computationally expensive alternative to the H-statistic.22,36 This method yields a single statistic reflecting the strength of interaction between x1 and x2.36 Intuitively, this statistic measures how the predicted probability of frequent vaping based on the pair would change when one of the pair (x1 or x2) varies, holding the other constant.37 Since there is no standard threshold of this statistic for declaring a “statistically significant” interaction effect, we identified the top two interactions and illustrated them using two-way partial dependence plots.36
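Nine variables form 9 × 8 / 2 = 36 unordered pairs. The sketch below enumerates them and illustrates one common formulation of a variance-based interaction statistic computed from a two-way partial dependence grid: the spread of one variable's effect is measured within each slice of the other, and the variability of that spread across slices is averaged over both directions. Whether this matches the cited method in every detail is an assumption, and the variable labels and toy grids are hypothetical.

```python
from itertools import combinations
from statistics import stdev

SOCIODEMOGRAPHIC = [
    "age", "gender", "race_ethnicity",
    "neighborhood_median_income", "neighborhood_hs_graduates_pct",
    "neighborhood_unemployment_rate",
    "school_free_lunch_pct", "school_reduced_lunch_pct",
    "perceived_discrimination",
]

pairs = list(combinations(SOCIODEMOGRAPHIC, 2))

def interaction_strength(pd_grid):
    """pd_grid[i][j]: two-way partial dependence at the i-th x1 and j-th x2 grid value."""
    # Spread of the x1 effect within each x2 slice (columns of the grid).
    x1_given_x2 = [stdev(column) for column in zip(*pd_grid)]
    # Spread of the x2 effect within each x1 slice (rows of the grid).
    x2_given_x1 = [stdev(row) for row in pd_grid]
    # If the effects were purely additive, both spreads would be constant.
    return (stdev(x1_given_x2) + stdev(x2_given_x1)) / 2

# Toy grids: an additive surface (no interaction) and a product surface.
additive = [[a + b for b in (0.0, 0.1, 0.2)] for a in (0.0, 0.2, 0.4)]
product = [[a * b for b in (0.0, 1.0, 2.0)] for a in (0.0, 1.0, 2.0)]

print(len(pairs))
print(interaction_strength(additive), interaction_strength(product))
```

On these toy grids the additive surface yields a statistic of essentially zero while the product surface yields a clearly positive value, which is the behavior a ranking of interaction strength relies on.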

Sensitivity Analysis

Three sets of sensitivity analyses were carried out. First, we limited the sample to current (past-month) vapers at the Wave 7 survey to assess if their predictors of becoming a frequent vaper 6 months later were systematically different from those of ever-vapers.24 Second, we repeated the analysis on five datasets that were independently imputed using the multiple imputation by chained equations (MICE) algorithm.38 Finally, two multivariate logistic regression models were estimated on the same oversampled training set to compare their performance on the same testing set with the random forest. The first model (the base model) comprised all nine sociodemographic variables as independent predictors. The second logistic model also contained the top ten predictors identified by the random forest in addition to sociodemographic variables. Analyses were performed in R version 3.6.1 (R Foundation for Statistical Computing).

Results

Descriptive Analysis

Among the 1281 twelfth-grade students who had tried vaping at least once, 40 (3.1%) reported vaping frequently 6 months later (Table 1). Compared to infrequent vapers, frequent vapers were more likely to be male (80.0% vs. 46.3%, p < .001) and were less likely to receive free or reduced-cost lunch at school (full-cost lunch recipients = 61.8% vs. 43.8%, p = .04). Mothers of frequent vapers were more educated (percentage of mothers with at least a high school diploma = 89.1% vs. 75.8%, p = .03), although the difference was absent between fathers (p = .2). The racial/ethnic profile of the two groups was different (p = .002); for example, the proportion of Hispanic/Latino participants (25.6% vs. 52.2%) was smaller among frequent vapers. Frequent vapers also reported experiencing discrimination more frequently by scoring higher, on average, on the Everyday Discrimination Scale (mean score = 1.21 vs. 0.82, p = .005). Regarding cigarette smoking, we found that compared to infrequent vapers, frequent vapers were significantly more likely to have ever had a puff of a cigarette (80.0% vs. 48.7%, p < .001), to have ever used a whole cigarette (70.0% vs. 35.4%, p < .001), or to report being a current (past 30-day) cigarette smoker (30.0% vs. 7.6%, p < .001).

Table 1.

Comparing Sociodemographic and Cigarette Smoking Characteristics of Ever-Vaping Twelfth-Grade Students by the Status of Frequent Vaping Reported 6 Months Later

| Characteristics, n (%) or mean (SD) | Infrequent vapers (n = 1241, 96.9%) | Frequent vapers (n = 40, 3.1%) | p-value |
|---|---|---|---|
| Sociodemographic characteristics | | | |
| Sex, male | 575 (46.3%) | 32 (80.0%) | <.001 |
| Age, year | 17.49 (0.40) | 17.46 (0.36) | .735 |
| Race/Ethnicity | | | .002 |
| Hispanic/Latino | 631 (52.2%) | 10 (25.6%) | |
| Asian | 139 (11.5%) | 10 (25.6%) | |
| African American/Black | 47 (3.9%) | 1 (2.6%) | |
| Native American/Pacific Islander | 62 (5.1%) | 6 (15.4%) | |
| Other/multiracial | 141 (11.7%) | 5 (12.8%) | |
| White | 188 (15.6%) | 7 (17.9%) | |
| Language used at home | | | .127 |
| Only English | 418 (35.5%) | 14 (36.8%) | |
| Mostly English | 258 (21.9%) | 13 (34.2%) | |
| Mostly or only another language | 501 (42.6%) | 11 (28.9%) | |
| Free/reduced-cost school lunch recipient status | | | .044 |
| Full-cost lunch recipients | 460 (43.8%) | 21 (61.8%) | |
| Reduced-cost lunch recipients | 119 (11.3%) | 3 (8.8%) | |
| Free lunch recipients | 432 (41.1%) | 7 (20.6%) | |
| Don’t know | 40 (3.8%) | 3 (8.8%) | |
| Father’s highest education | | | .187 |
| Less than high school | 227 (24.2%) | 4 (10.8%) | |
| High school | 248 (26.5%) | 9 (24.3%) | |
| Less than college | 177 (18.9%) | 8 (21.6%) | |
| College or above | 285 (30.4%) | 16 (43.2%) | |
| Mother’s highest education | | | .033 |
| Less than high school | 226 (22.0%) | 3 (8.1%) | |
| High school | 211 (20.5%) | 9 (24.3%) | |
| Less than college | 210 (20.4%) | 4 (10.8%) | |
| College or above | 382 (37.1%) | 21 (56.8%) | |
| Living arrangement | | | .947 |
| With both parents | 713 (58.5%) | 26 (65.0%) | |
| With mother only | 194 (15.9%) | 5 (12.5%) | |
| With father only | 203 (16.7%) | 6 (15.0%) | |
| Some mother, some father | 32 (2.6%) | 1 (2.5%) | |
| In a group home or with someone else | 77 (6.3%) | 2 (5.0%) | |
| Neighborhood median income, $USD | 68343.36 (13563.26) | 69636.18 (13258.30) | .558 |
| Neighborhood unemployment rate, % | 9.55 (1.92) | 9.59 (1.77) | .903 |
| Neighborhood high school graduates, % | 80.66 (11.10) | 82.11 (9.94) | .422 |
| Proportion of students receiving free lunch at school, % | 33.27 (19.87) | 27.99 (18.90) | .097 |
| Proportion of students receiving reduced-cost lunch at school, % | 10.36 (5.69) | 10.18 (6.07) | .844 |
| Everyday Discrimination Scale, mean score | 0.82 (0.71) | 1.21 (0.96) | .005 |
| Cigarette smoking history | | | |
| Lifetime ever puffs of a cigarette | 604 (48.7%) | 32 (80.0%) | <.001 |
| Lifetime ever use of a whole cigarette | 439 (35.4%) | 28 (70.0%) | <.001 |
| Current (past 30-day) use of cigarettes | 94 (7.6%) | 12 (30.0%) | <.001 |

Two-sample t-tests and Fisher’s Exact Tests were used to compare means and proportions, respectively. Frequent vapers were defined at the 6-month follow-up as users of nicotine-containing vape on at least 20 days of the past 30 days. SD, standard deviation.

Performance of the Random Forest

On the testing set, the random forest reached a C-Index of 0.87 (Figure 1). Using a decision threshold of 0.5 (i.e., participants with a predicted risk of 50% or above were deemed to be frequent vapers), this model had sensitivity, specificity, and accuracy of 0.31 (95% CI: 0.11–0.59), 0.97 (95% CI: 0.95–0.98) and 0.95 (95% CI: 0.93–0.97), respectively. Using a lower decision threshold of 0.25 (i.e., participants with 25% or above predicted risk were classified to be frequent vapers), the same model had sensitivity, specificity, and accuracy of 0.88 (95% CI: 0.62–0.98), 0.80 (95% CI: 0.76–0.83) and 0.80 (95% CI: 0.76–0.83), respectively.
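The trade-off between the two decision thresholds can be illustrated with a small sketch that dichotomizes predicted risks and tabulates sensitivity, specificity, and accuracy. The toy labels and probabilities below are hypothetical and do not reproduce the study's testing set.

```python
def classification_metrics(labels, probabilities, threshold):
    """Sensitivity, specificity, and accuracy when risk >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probabilities):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1 and predicted == 0:
            fn += 1
        elif y == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }

# Hypothetical toy data: 1 = frequent vaper, 0 = infrequent vaper.
toy_y = [1, 1, 0, 0, 0]
toy_p = [0.60, 0.30, 0.40, 0.20, 0.10]

at_050 = classification_metrics(toy_y, toy_p, 0.50)
at_025 = classification_metrics(toy_y, toy_p, 0.25)
print(at_050)
print(at_025)
```

As in the results above, lowering the threshold raises sensitivity at the cost of specificity; with a rare outcome, accuracy is dominated by the majority class and is therefore a less informative summary than the other two.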

Figure 1.

The receiver operating characteristic curves of the random forest and logistic models on the testing set. MICE, multiple imputation by chained equations; ROC, receiver operating characteristic; RF, random forest. Notes: Only the first MICE-imputed random forest model is shown in the figure. See Supplementary Appendix S2 for the performance of all five MICE-imputed models.

Top Predictors of Frequent Vaping at the 6-Month Follow-Up

Figure 2 shows the top ten predictors of frequent vaping identified by the random forest. They were (in descending order of importance): higher past-month nicotine concentration in vape, more daily vaping sessions in past month, greater nicotine dependence measured by the Hooked On Vaping Checklist, increased willingness to vape, more puffs per vape in past month, higher level of perceived discrimination, negative expectancies on cigarette smoking, past-month use of nicotine in vape, higher percentage of students receiving reduced-cost lunch at school, and past-month use of marijuana in vape. One-way partial dependence plots illustrating the marginal effects of these predictors on the probability of frequent vaping are presented in Supplementary Appendix S3.

Figure 2.

Top ten predictors of frequent vaping identified by the random forest model. Notes: Nicotine dependence was measured by the 10-item Hooked on Vaping Checklist. Perceived discrimination was measured by the 11-item Everyday Discrimination Scale.

Sociodemographic Assessment of Intersectionality

We evaluated 36 pairs of sociodemographic interactions (Supplementary Appendix S3). The strongest interaction was noted between age and perceived discrimination. At all ages, we found the probability of frequent vaping to increase with more frequent experiences of discrimination. A high-risk youth subgroup was identified as those who were younger than their classmates and reported moderate to high levels of discrimination (Figure 3). Age and race/ethnicity produced the second strongest interaction (Supplementary Appendix S3). Across all racial/ethnic groups, the predicted probability of frequent vaping decreased to a low level for those aged 18 or above. For participants who were either Asian or Native American/Pacific Islander, the predicted probability of vaping frequently was very high in general, and particularly among those aged 17–17.5 years. For African American/Black participants, we observed an abrupt decrease in frequent vaping for those around 17.5 years of age.

Figure 3.

Change in the probability of frequent vaping by age and perceived discrimination. Notes: Perceived discrimination was measured by the mean score on the 11-item Everyday Discrimination Scale. Both the 2-dimensional (left) and 3-dimensional (right) plots demonstrate how the expected predicted probability of frequent vaping changes according to participant age and levels of perceived discrimination.

Sensitivity Analysis

The ROC curves of alternative predictive models assessed in the sensitivity analysis are presented in Figure 1. The five MICE-imputed datasets yielded new random forest models with C-Index values ranging from 0.864 to 0.903 (Supplementary Appendix S2). Past-month use of nicotine in vape or nicotine concentration in vape, greater nicotine dependence measured by the Hooked on Vaping Checklist or the Hooked on Nicotine Checklist, and past-month daily number of vaping sessions were consistently ranked as the top three most important predictors. These models also suggested several new predictors such as expectancies on e-cigarettes, ever puffs of a cigarette, and expectancies on vaping marijuana (Supplementary Appendix S2). Among baseline current vapers (n = 123, 9.6% of all ever-vapers), 22 (17.9%) reported vaping frequently 6 months later. Restricting the analysis to current vapers led to a random forest (C-Index = 0.71) that identified race/ethnicity and age to be among the top ten predictors (Supplementary Appendix S4). The two logistic models had reduced performance, reaching a C-Index of 0.58 (base model) and 0.77 (base model plus the top ten predictors identified by the original random forest), respectively.

Discussion

Consistent with prior results, this machine learning analysis found typical predictors of frequent vaping to be higher nicotine concentration in vape, more vaping sessions per day, more puffs, and greater willingness to vape.7–10,12,39 Notably, out of 131 possible variables addressing multiple levels of socioecological influence, our results suggested nicotine concentration to be the top predictor of frequent vaping. A 2018 report by the National Academies of Sciences, Engineering, and Medicine found some evidence supporting nicotine concentration as a critical determinant of nicotine dependence.40 Regardless of the extent to which the current results generalize to other populations and settings, one implication is that regulatory policy addressing nicotine concentration might have a substantive impact on youth vaping behaviors and potential dependence. Our results also corroborated the utility of the Hooked on Vaping Checklist in identifying high school students at risk of progressing to frequent vaping.41 Specifically, we found the risk to increase abruptly for those reporting at least one of the ten symptoms (Supplementary Appendix S3).

An interesting finding that emerged from this analysis was that the probability of frequent vaping decreased with more positive expectancies on cigarettes. We offer two explanations. First, young vapers who use the device more intensively and frequently tend to be those who regard e-cigarettes as generally safe, or at least safer and healthier than cigarettes. This phenomenon has been documented in Canada, where believing vaping to be less harmful than cigarette smoking was found to predict an increased chance of being a current vaper.42 Second, it is possible that this finding reflects past smokers who have fully transitioned from smoking to vaping because they perceived the adverse health effects of smoking to be substantial. To maintain their nicotine consumption, and perhaps because vaping was not as satisfying as smoking, they started to vape nicotine frequently and ultimately showed signs of vaping dependence.43 Whether this explanation is plausible, especially given the young age of the high school students in our cohort, remains uncertain; additional studies are warranted to examine the risk of vaping addiction following successful vaping-assisted smoking cessation among young cigarette quitters.

Our analysis revealed the vital role of discrimination in frequent vaping among youth. Evidence that links discrimination with substance abuse, including nicotine dependence, and with poor physical and mental well-being in general, is abundant in the literature.44–48 We advanced the literature by showing perceived discrimination to be a potential predictor of frequent vaping and to be associated with large increases in risk among certain youth subpopulations. In particular, although younger age has not been regarded as a typical predictor of vaping dependence by previous studies7–9 or identified as an independent predictor of frequent vaping in the present analysis, we found younger age to interact with perceived discrimination to elevate the risk of frequent vaping: students who were younger than their peers (aged 17-17.5) and who also felt moderate to high levels of discrimination were at increased risk of becoming frequent vapers. Because they are less mature, younger students tend to develop more severe behavioral maladjustment as a stress response to discrimination.49 Deviant peer affiliations might also play a role, as the associated effect on substance use is more prominent among young and socially disadvantaged youth.49 Hence, future researchers need to conduct more in-depth analyses to understand the complex interplay between age, peer relations, discrimination, and substance use.

Policy Implications

Our findings have implications for a range of public health, school, and social interventions to curb the addictive potential of e-cigarettes among youth. The importance of nicotine concentration in predicting frequent vaping warrants consideration of stringent nicotine regulations for vaping products in North America similar to those currently in place in the UK, where the prevalence of youth ever-vaping and frequent vaping is lower.50 Specifically, UK authorities have restricted nicotine strength in vape to at most 20 mg/ml and banned e-cigarette tanks larger than 2 ml.51 School officials may consider routinely administering questionnaires containing the 10-item Hooked on Vaping Checklist. Students who report any of the ten symptoms may require regular follow-ups from school counselors to closely monitor the progression of these symptoms. Public health decisionmakers need to recognize the role of discrimination in progression to frequent vaping and allow prevention programs to be deployed in concert with anti-discrimination social policies. The high-risk subgroups identified in this study, such as young Asians and Native Americans/Pacific Islanders, might require targeted interventions to help them break the habit of frequent vaping.

Study Limitations

Our analysis has limitations. First, the Happiness & Health Study is based on a convenience sample of high school students in Los Angeles and thus does not reflect the demographics of high school students in other regions.23 Hence, future researchers might consider replicating this analysis on a nationally representative sample. Furthermore, because the data used in this analysis were collected before 2018, we were unable to fully account for changes in youth vaping trends due to the dominance of JUUL in the US market and the subsequent regulations placed on JUUL to restrict youth access.52 Our next step is to use the latest data from the Happiness & Health Study, which included JUUL-specific survey questions, to identify potentially distinct factors that predict frequent JUUL use. Second, our sample size is modest compared with past machine learning applications in public health research, although we did achieve a high C-Index indicating satisfactory performance.21 The small counts in specific subgroups (such as African American/Black students) might further hinder the generalizability of the findings on intersectionality. Still, we believe that the use of resampling and cross-validation has at least partially mitigated the issue, and the sensitivity analysis with MICE imputation indicated that the results were robust to missing data. Furthermore, the primary goal of this analysis was to construct a random forest model capable of accurately forecasting frequent vaping status, rather than to establish relational or causal associations. Specifically, we were unable to estimate the causal effect of the top ten predictors other than concluding that these variables might be more important than others in contributing to an increased risk of frequent vaping.53 In particular, our conclusion on a positive association between frequent vaper status and perceived discrimination measured by the 11-item Everyday Discrimination Scale needs to be interpreted with caution.

To establish a causal role of discrimination in increased vaping, future studies that incorporate a causal testing framework need to address at least three issues: first, whether a reverse relationship exists between discrimination and vaping due to stigma against vaping or vapers, i.e., whether the reported discrimination might be driven by being a frequent vaper54; second, how to rule out the effect of unmeasured confounders such as peer relations49; and lastly, whether alternatives to the 11-item Everyday Discrimination Scale, especially measures that explicitly account for vaping/vaper-associated nonacceptance by family members, would be more appropriate in this context.55 Finally, although we demonstrated in the sensitivity analysis that our results were largely robust to the small percentage of missing data (11.2%), future researchers who encounter more severe missingness might consider constructing an analytical pipeline that includes a machine learning-based imputation procedure within the predictive modeling framework.56,57

This analysis also has several strengths. With respect to accurately predicting the status of frequent vaping at the 6-month follow-up, we showed random forest to be a superior alternative to multivariate logistic regression by achieving better overall discrimination (C-Index = 0.87 vs. 0.77), although we were unable to conduct any external validation to extend this result to other settings or populations. Furthermore, while partial dependence plots are widely used as a means to interpret black boxes, studies rarely assess the statistical strength of interactions beyond visualizing their forms.58 We addressed this gap using a validated statistic and found the interaction between age and perceived discrimination to be the strongest among 36 pairs of sociodemographic interactions.36 Moreover, despite the rarity of frequent vapers at the follow-up (3.1%), we addressed this statistical challenge by combining a resampling procedure with the development of the random forest. As rare outcomes are often encountered in public health research,59 our analysis offers a practical approach for future researchers facing this data situation. Finally, our methodology allowed us to capture complex nonlinear relationships and to simultaneously consider 131 potential predictors of frequent vaping. These procedures highlight machine learning as a valuable technique for public health researchers to substantially expand the capacity and scope of their analyses.
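To make the idea of quantifying interaction strength from partial dependence more concrete, the sketch below follows the general spirit of the variance-based approach described in reference 36: compute the two-feature partial dependence on a grid, measure how much the importance of one feature (the standard deviation of its partial dependence) changes across values of the other, and average the two directions. It is a simplified illustration under stated assumptions and not the authors' implementation; the function names, the two toy risk models, and the toy data are hypothetical.

```python
from statistics import mean, stdev


def partial_dependence(predict, rows, i, j, grid_i, grid_j):
    """Mean prediction when features i and j are forced to each grid pair."""
    pd = {}
    for vi in grid_i:
        for vj in grid_j:
            forced = []
            for row in rows:
                x = list(row)
                x[i], x[j] = vi, vj
                forced.append(predict(x))
            pd[(vi, vj)] = mean(forced)
    return pd


def interaction_strength(predict, rows, i, j, grid_i, grid_j):
    """Variability of one feature's importance across values of the other."""
    pd = partial_dependence(predict, rows, i, j, grid_i, grid_j)
    # Importance of feature i (spread of its partial dependence) at each value of j.
    imp_i = [stdev([pd[(vi, vj)] for vi in grid_i]) for vj in grid_j]
    # Importance of feature j at each value of i.
    imp_j = [stdev([pd[(vi, vj)] for vj in grid_j]) for vi in grid_i]
    # With no interaction, both importance profiles are flat, so the score is ~0.
    return mean([stdev(imp_i), stdev(imp_j)])


# Hypothetical toy models (not fitted to study data).
# Feature 0: younger than classmates (0/1); feature 1: discrimination level (0-2).
def additive_risk(x):
    return 0.02 + 0.03 * x[0] + 0.01 * x[1]


def interacting_risk(x):
    return 0.02 + 0.01 * x[1] + 0.05 * x[0] * x[1]


rows = [(0, 0), (0, 1), (1, 1), (1, 2)]
age_grid, disc_grid = [0, 1], [0, 1, 2]

h_additive = interaction_strength(additive_risk, rows, 0, 1, age_grid, disc_grid)
h_interacting = interaction_strength(interacting_risk, rows, 0, 1, age_grid, disc_grid)
print(h_additive, h_interacting)
```

In this toy setting the purely additive model yields a score of essentially zero, whereas the model with an age-by-discrimination product term yields a clearly positive score, which is the kind of contrast used to rank candidate interaction pairs.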

Conclusions

Beyond typical predictors of frequent e-cigarette use, reporting a high level of perceived discrimination might be associated with progression of vaping behaviors, especially among high school seniors who are younger than their classmates and those belonging to certain racial/ethnic minority groups. These findings warrant more in-depth hypothesis-testing studies to establish the causal role of discrimination in increased vaping and to statistically confirm the presence of these vulnerable youth subgroups. Machine learning can potentially outperform conventional regression models in predicting the progression of vaping behaviors using large survey data. Future research needs to provide evidence on the feasibility of deploying a machine learning-based model to aid vaping prevention programs in real-world settings.

Supplementary Material

A Contributorship Form detailing each author’s specific involvement with this content, as well as any supplementary data, are available online at https://academic.oup.com/ntr.

Funding

This work was supported by the Canadian Institutes of Health Research Catalyst Grant (grant number 172898); the National Cancer Institute of the National Institutes of Health and FDA Center for Tobacco Products (award numbers U54CA229974, U54CA180905); Tobacco-Related Disease Research Program (grant number 27-IR-0034); and the National Cancer Institute of the National Institutes of Health & National Institute on Drug Abuse Grant (grant number R01CA229617). The funders had no role in the study design, collection, analysis or interpretation of the data, writing the manuscript, or the decision to submit the paper for publication. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the Food and Drug Administration.

Declaration of Interests

We have no conflicts of interest to disclose. The authors have full access to all data and agree to allow the journal to review the data if requested. This manuscript has not been published and is not under consideration for publication elsewhere.

Data Availability

The R code used in this analysis is available on the Open Science Framework: https://osf.io/czs9e/. Person-level survey data from the Happiness & Health Study used in this analysis are not publicly available. Requests to access these data can be made by contacting the Data Manager, Lauren Howard, at [email protected] or by visiting the website: https://eosresearch.usc.edu/happiness-health-study/.

References

1. Arrazola RA, Singh T, Corey CG, et al. Tobacco use among middle and high school students—United States, 2011–2014. Morb Mortal Wkly Rep. 2015;64(14):381–385.

2. Wang TW, Neff LJ, Park-Lee E, Ren C, Cullen KA, King BA. E-cigarette use among middle and high school students—United States, 2020. Morb Mortal Wkly Rep. 2020;69(37):1310–1312.

3. Park-Lee E, Ren C, Sawdey MD, et al. Notes from the field: E-cigarette use among middle and high school students—National Youth Tobacco Survey, United States, 2021. Morb Mortal Wkly Rep. 2021;70(39):1387–1389.

4. Vogel EA, Prochaska JJ, Rubinstein ML. Measuring e-cigarette addiction among adolescents. Tob Control. 2020;29(3):258–262.

5. Dunbar MS, Tucker JS, Ewing BA, et al. Frequency of e-cigarette use, health status, and risk and protective health behaviors in adolescents. J Addict Med. 2017;11(1):55–62.

6. McCabe SE, West BT, Veliz P, Boyd CJ. E-cigarette use, cigarette smoking, dual use, and problem behaviors among U.S. adolescents: results from a national survey. J Adolesc Health. 2017;61(2):155–162.

7. Morean ME, Krishnan-Sarin S, O’Malley SS. Assessing nicotine dependence in adolescent E-cigarette users: the 4-item Patient-Reported Outcomes Measurement Information System (PROMIS) Nicotine Dependence Item Bank for electronic cigarettes. Drug Alcohol Depend. 2018;188:60–63.

8. Foulds J, Veldheer S, Yingst J, et al. Development of a questionnaire for assessing dependence on electronic cigarettes among a large sample of ex-smoking E-cigarette users. Nicotine Tob Res. 2015;17(2):186–192.

9. Camara-Medeiros A, Diemert L, O’Connor S, Schwartz R, Eissenberg T, Cohen JE. Perceived addiction to vaping among youth and young adult regular vapers. Tob Control. 2021;30(3):273–278.

10. Vogel EA, Cho J, McConnell RS, Barrington-Trimis JL, Leventhal AM. Prevalence of electronic cigarette dependence among youth and its association with future use. JAMA Netw Open. 2020;3(2):e1921513.

11. Leventhal AM, Goldenson NI, Cho J, et al. Flavored e-cigarette use and progression of vaping in adolescents. Pediatrics. 2019;144(5):e20190789. doi:10.1542/peds.2019-0789

12. Aherrera A, Aravindakshan A, Jarmul S, et al. E-cigarette use behaviors and device characteristics of daily exclusive e-cigarette users in Maryland: implications for product toxicity. Tob Induc Dis. 2020;18:93.

13. Vidgen B, Yasseri T. P-values: misunderstood and misused. Front Phys. 2016;4. doi:10.3389/fphy.2016.00006

14. Gelman A. The problems with p-values are not just with p-values. Am Stat. 2016;70:1–2.

15. Wang B, Zhou Z, Wang H, Tu XM, Feng C. The p-value and model specification in statistics. Gen Psychiatr. 2019;32(3):e100081.

16. Fu R, Kundu A, Mitsakakis N, et al. Machine learning applications in tobacco research: a scoping review. Tob Control. Published online August 27, 2021. doi:10.1136/tobaccocontrol-2020-056438

17. Potter LN, Lam CY, Cinciripini PM, Wetter DW. Intersectionality and smoking cessation: exploring various approaches for understanding health inequities. Nicotine Tob Res. 2021;23(1):115–123.

18. Bauer GR. Incorporating intersectionality theory into population health research methodology: Challenges and the potential to advance health equity. Soc Sci Med. 2014;110:10–17.

19. Else-Quest NM, Hyde JS. Intersectionality in quantitative psychological research: I. Theoretical and epistemological issues. Psychol Women Q. 2016;40(2):155–170.

20. Murphy KP. Machine Learning: A Probabilistic Perspective. Illustrated edition. Cambridge, MA: The MIT Press; 2012.

21. Morgenstern JD, Buajitti E, O’Neill M, et al. Predicting population health with machine learning: a scoping review. BMJ Open. 2020;10(10):e037860.

22. Molnar C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 2019. https://christophm.github.io/interpretable-ml-book/. Accessed February 21, 2021.

23. Leventhal AM, Strong DR, Kirkpatrick MG, et al. Association of electronic cigarette use with initiation of combustible tobacco product smoking in early adolescence. JAMA. 2015;314(7):700–707.

24. Cullen KA, Gentzke AS, Sawdey MD, et al. e-Cigarette use among youth in the United States, 2019. JAMA. 2019;322(21):2095–2103.

25. Blum R, McNeely C, Nonnemaker J. Adolescent Risk and Vulnerability: Concepts and Measurement. Washington DC, USA: The National Academies Press; 2001. doi:10.17226/10209

26. Cleveland MJ, Feinberg ME, Bontempo DE, Greenberg MT. The role of risk and protective factors in substance use across adolescence. J Adolesc Health. 2008;43(2):157–164.

27. Hicks BM, Johnson W, Durbin CE, Blonigen DM, Iacono WG, McGue M. Gene-environment correlation in the development of adolescent substance abuse: selection effects of child personality and mediation via contextual risk factors. Dev Psychopathol. 2013;25(1):119–132.

28. Fitzgerald A, Mac Giollabhui N, Dolphin L, Whelan R, Dooley B. Dissociable psychosocial profiles of adolescent substance users. PLoS One. 2018;13(8):e0202498.

29. Shmueli G. To explain or to predict? Stat Sci. 2010;25(3):289–310.

30. Rice ME, Harris GT. Comparing effect sizes in follow-up studies: ROC Area, Cohen’s d, and r. Law Hum Behav. 2005;29(5):615–620.

31. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32.

32. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artificial Intel Res. 2002;16:321–357.

33. Breiman L, Friedman J, Olshen R, Stone C. Classification and Regression Trees. Boca Raton, FL: CRC Press; 1984.

34. Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Statistics. 2001;29(5):1189–1232.

35. Williams DR, Yu Y, Jackson JS, Anderson NB. Racial differences in physical and mental health: socioeconomic status, stress, and discrimination. J Health Psychol. 1997;2(3):335–351.

36. Greenwell BM, Boehmke BC, McCarthy AJ. A simple and effective model-based variable importance measure. arXiv preprint. 2018 [stat.ML]:1805.04755. https://arxiv.org/pdf/1805.04755v1.pdf. Accessed June 12, 2021.

37. Greenwell BM, Boehmke BC. An introduction to the vint() function. Published online January 11, 2020. https://koalaverse.github.io/vip/articles/vip-interaction.html. Accessed June 12, 2021.

38. van Buuren S, Groothuis-Oudshoorn K. Mice: multivariate imputation by chained equations in R. J Stat Soft. 2011;45(3):1–67. doi:10.18637/jss.v045.i03

39. Abadi MH, Lipperman-Kreda S, Shamblen SR, et al. The impact of flavored ENDS use among adolescents on daily use occasions and number of puffs, and next day intentions and willingness to vape. Addict Behav. 2021;114:106773.

40. National Academies of Sciences, Engineering, and Medicine. Public Health Consequences of E-Cigarettes. Washington DC, USA: The National Academies Press; 2018. https://doi.org/10.17226/24952. Accessed June 14, 2021.

41. Wheeler KC, Fletcher KE, Wellman RJ, Difranza JR. Screening adolescents for nicotine dependence: the Hooked On Nicotine Checklist. J Adolesc Health. 2004;35(3):225–230.

42. Fu R, Mitsakakis N, Chaiton M. A machine learning approach to identify correlates of current e-cigarette use in Canada. Exploration of Medicine. 2021;2:74–85.

43. Yong HH, Borland R, Cummings KM, et al. Reasons for regular vaping and for its discontinuation among smokers and recent ex-smokers: findings from the 2016 ITC Four Country Smoking and Vaping Survey. Addiction. 2019;114 Suppl 1:35–48.

44. Tran AG, Lee RM, Burgess DJ. Perceived discrimination and substance use in Hispanic/Latino, African-born Black, and Southeast Asian immigrants. Cultur Divers Ethnic Minor Psychol. 2010;16(2):226–236.

45. Okamoto J, Ritt-Olson A, Soto D, Baezconde-Garbanati L, Unger JB. Perceived discrimination and substance use among Latino adolescents. Am J Health Behav. 2009;33(6):718–727.

46. Kendzor DE, Businelle MS, Reitzel LR, et al. Everyday discrimination is associated with nicotine dependence among African American, Latino, and White smokers. Nicotine Tob Res. 2014;16(6):633–640.

47. Carter R, Lau M, Johnson V, Kirkinis K. Racial discrimination and health outcomes among racial/ethnic minorities: a meta-analytic review. J Multicult Couns Devel. 2017;45(4):232–259.

48. Britt-Spells AM, Slebodnik M, Sands LP, Rollock D. Effects of perceived discrimination on depressive symptoms among black men residing in the United States: a meta-analysis. Am J Mens Health. 2018;12(1):52–63.

49. Fergusson DM, Swain-Campbell NR, Horwood LJ. Deviant peer affiliations, crime and substance use: a fixed effects regression analysis. J Abnorm Child Psychol. 2002;30(4):419–430.

50. Hammond D, Rynard VL, Reid JL. Changes in prevalence of vaping among youths in the United States, Canada, and England from 2017 to 2019. JAMA Pediatr. 2020;174(8):797–800.

51. Medicines and Healthcare Products Regulatory Agency. E-cigarettes: regulations for consumer products. Published online February 26, 2016. https://www.gov.uk/guidance/e-cigarettes-regulations-for-consumer-products. Accessed September 15, 2021.

52. Centers for Disease Control and Prevention. Sales of JUUL E-Cigarettes Skyrocket, Posing Danger to Youth. Atlanta, GA: CDC; 2018. https://www.cdc.gov/media/releases/2018/p1002-e-Cigarettes-sales-danger-youth.html. Accessed October 17, 2021.

53. Zhao Q, Hastie T. Causal interpretations of black-box models. J Bus Econ Stat. 2019;39(1):272–281.

54. Alexander JP, Williams P, Lee YO. Youth who use e-cigarettes regularly: a qualitative study of behavior, attitudes, and familial norms. Prev Med Rep. 2019;13:93–97.

55. Fu R, O’Connor S, Diemert L, et al. Real-world vaping experiences and smoking cessation among cigarette smoking adults. Addict Behav. 2021;116:106814.

56. Marcos-Pasero H, Colmenarejo G, Aguilar-Aguilar E, Ramírez de Molina A, Reglero G, Loria-Kohen V. Ranking of a wide multidomain set of predictor variables of children obesity by machine learning variable importance techniques. Sci Rep. 2021;11(1):1910.

57. Kanwal F, Taylor TJ, Kramer JR, et al. Development, validation, and evaluation of a simple machine learning model to predict cirrhosis mortality. JAMA Netw Open. 2020;3(11):e2023780.

58. Holodinsky JK, Yu AYX, Kapral MK, Austin PC. Comparing regression modeling strategies for predicting hometime. BMC Med Res Methodol. 2021;21(1):138.

59. VanderWeele TJ. On a square-root transformation of the odds ratio for a common outcome. Epidemiology. 2017;28(6):e58–e60.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
