Abstract

Background and Aims

The built environment plays an important role in the development of cardiovascular disease. Tools to evaluate the built environment using machine vision and informatic approaches have been limited. This study aimed to investigate the association between the machine vision–based built environment and the prevalence of cardiometabolic disease in US cities.

Methods

This cross-sectional study used features extracted from Google Street View (GSV) images to measure the built environment and link them with prevalence of coronary heart disease (CHD). Convolutional neural networks, linear mixed-effects models, and activation maps were utilized to predict health outcomes and identify feature associations with CHD at the census tract level. The study obtained 0.53 million GSV images covering 789 census tracts in seven US cities (Cleveland, OH; Fremont, CA; Kansas City, MO; Detroit, MI; Bellevue, WA; Brownsville, TX; and Denver, CO).

Results

Built environment features extracted from GSV using deep learning predicted 63% of the census tract variation in CHD prevalence. Adding GSV features improved models that included only census tract-level age, sex, race, income, and education or composite indices of social determinants of health. Activation maps from the features revealed a set of neighbourhood features, represented by buildings and roads, associated with CHD prevalence.

Conclusions

In this cross-sectional study, the prevalence of CHD was associated with built environment factors derived from GSV through deep learning analysis, independent of census tract demographics. Machine vision–enabled assessment of the built environment could potentially offer a more precise approach to identify at-risk neighbourhoods, thereby providing an efficient avenue to address and reduce cardiovascular health disparities in urban environments.

Extracted features from street view images via artificial intelligence (AI) demonstrate that 63% of the variation in coronary heart disease (CHD) prevalence can be explained by these environmental factors, highlighting the significant potential of this data in informing cardiovascular health assessments. AIC, Akaike information criterion; BIC, Bayesian information criterion; DSE, demographic and socio-economic; GSV, Google Street View; LMEM: linear mixed-effects model; LRT, likelihood ratio test.
Structured Graphical Abstract

See the editorial comment for this article ‘Artificial intelligence-enhanced exposomics: novel insights into cardiovascular health’, by R. Khera, https://doi.org/10.1093/eurheartj/ehae159.

Translational perspective

This cross-sectional study utilizes deep learning and machine vision to examine the association between the built environment and coronary heart disease at the neighbourhood level. Street view–based built environment factors, analysed using deep learning, explain a significant portion of coronary heart disease prevalence independently of socio-demographics. These findings have implications for targeted interventions, urban planning, and public health policies. Incorporating machine vision–enabled identification of urban features associated with cardiovascular risk can guide the creation of heart-healthy cities, reducing the burden of coronary artery disease and benefiting population health.

Introduction

Coronary heart disease (CHD) accounts for over 50% of mortality from heart disease in the USA and was responsible for nearly 400 000 deaths in 2020.1 Despite advances in prevention and treatment over the past decade,2 CHD has remained the leading cause of death in the USA since 1950, with increasing evidence that non-conventional risk factors play a larger role than previously suspected.1,3

Socioenvironmental factors are amongst the leading non-traditional risk factors increasingly implicated in CHD development.4–6 These factors include social determinants such as race, income, education, and culture as well as the factors in the built environment and factors in the ambient environment such as noise, temperature, and air pollution, all of which have been shown to exert significant effects on CHD.5–9

Large-scale integrated assessment of the environment at the neighbourhood level can facilitate rapid and complete assessment of its impact on CHD. Such data are, however, scarce, partly because neighbourhood audits are costly and time-consuming and partly because measurements and standards for data collection are inconsistent. Street-level imagery from Google Street View (GSV), analysed with machine vision, has become an increasingly popular basis for virtual neighbourhood audits since the service launched in 2007. Google Street View image coverage has been expanding consistently in recent years, achieving almost full coverage in the USA.10 Previous studies have shown that GSV-based results are comparable with field assessments, and GSV has been used to assess built environment features such as greenspace,11,12 buildings,13 and roads.14

Google Street View images have gained popularity as a preferred data source for large-scale studies due to the widespread availability of this data set. It represents one of the largest repositories for machine vision–enabled assessments of extensive geographical areas, facilitated by standardized data collection approaches. Deep learning approaches such as convolutional neural networks (CNN) have been widely used in many studies and applications, given their excellent performance in tasks such as image classification, object detection, and image segmentation.15 The use of such approaches to rapidly assess and extract built environment features from GSV images using deep learning can help facilitate integrated assessment and capture other aspects that may not be otherwise included. The goal of this study is to use GSV images to assess the built environment and use them to estimate CHD prevalence at the census tract level.

Methods

The University Hospitals Institutional Review Board exempted this cross-sectional study from review and the need for informed consent in accordance with 45 CFR §46, as the data were deidentified and thus not considered human participant research.

Data source for coronary heart disease

The prevalence of CHD at the census tract level was obtained from the 2018 Centers for Disease Control and Prevention (CDC) Population Level Analysis and Community Estimates (PLACES), a project that provides data on chronic disease risk factors, health outcomes, and clinical preventive services. This project, a collaboration between the CDC Foundation and the Robert Wood Johnson Foundation, measures CHD prevalence using data (2015, 2016) from the Behavioral Risk Factor Surveillance System (BRFSS), in which people aged ≥18 are surveyed to report whether or not they have been told by a doctor, nurse, or other health professional that they had angina or CHD. We collected the CHD prevalence data for 789 census tracts in seven cities: Bellevue, WA; Brownsville, TX; Cleveland, OH; Denver, CO; Detroit, MI; Fremont, CA; and Kansas City, KS. For each city, we calculated the median CHD prevalence and its interquartile range (IQR), the range within which the middle 50% of tract-level prevalence values lie. The selection of the seven cities was based on population, disease burden, and geography, as outlined in the online supplementary material.

Google Street View data

Environment information was derived from ∼0.53 million GSV images for the seven cities (143 K for Detroit, 59 K for Kansas City, 70 K for Cleveland, 65 K for Brownsville, 38 K for Fremont, 35 K for Bellevue, and 120 K for Denver). The GSV images were downloaded via the GSV static application programming interface (API) from 2020 to 2021. The GSV API provides users with street-level panoramic imagery that captures the visual domain of pedestrians in thousands of cities worldwide. The GSV images of each census tract were downloaded in a grid pattern across the tract at intervals of 100 m. At each location where GSV images were retrieved, four images were gathered from different directions (i.e. the cardinal directions: N, E, S, and W), which together compose a panoramic view of the surroundings at that location. When latitude and longitude coordinates are provided, the API searches within a 50 m radius for the photograph closest to this location. The API returns no image if none is available.
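The sampling scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's download code: it assumes a rectangular bounding box in place of the actual tract polygon, a simple spherical conversion from metres to degrees, a 640x640 image size, and a placeholder API key. The function names `grid_points` and `image_requests` are hypothetical. The query parameters (`size`, `location`, `heading`, `radius`, `key`) are those of the public Street View Static API; no request is actually sent.

```python
import math
from urllib.parse import urlencode

# Endpoint of the Street View Static API; the key used below is a placeholder.
BASE_URL = "https://maps.googleapis.com/maps/api/streetview"
HEADINGS = (0, 90, 180, 270)  # N, E, S, W in compass degrees


def grid_points(south, west, north, east, spacing_m=100.0):
    """Return (lat, lng) points on a roughly spacing_m grid inside a bounding box.

    Assumption: a bounding box stands in for the tract polygon, and metres are
    converted to degrees with a simple spherical approximation.
    """
    metres_per_deg_lat = 111_320.0
    mid_lat = math.radians((south + north) / 2.0)
    metres_per_deg_lng = metres_per_deg_lat * math.cos(mid_lat)
    dlat = spacing_m / metres_per_deg_lat
    dlng = spacing_m / metres_per_deg_lng
    n_rows = int((north - south) / dlat) + 1
    n_cols = int((east - west) / dlng) + 1
    return [(south + i * dlat, west + j * dlng)
            for i in range(n_rows) for j in range(n_cols)]


def image_requests(lat, lng, api_key="YOUR_API_KEY", radius_m=50):
    """Build one request URL per cardinal direction for a single grid point."""
    urls = []
    for heading in HEADINGS:
        params = {
            "size": "640x640",            # assumed image size
            "location": f"{lat:.6f},{lng:.6f}",
            "heading": heading,
            "radius": radius_m,           # search radius in metres
            "key": api_key,
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


# Hypothetical bounding box of about 1 km by 1 km.
points = grid_points(41.49, -81.70, 41.50, -81.69)
urls = image_requests(*points[0])
```

Each grid point yields four URLs, so the number of requests per tract is four times the number of grid points that fall inside it.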

To process these images and extract environment information from them, a pre-trained deep CNN (DCNN), Places365 CNN,16 was used as the feature extractor to obtain the deep features of each image. Here, the deep features are the outputs of the deep layers in the hierarchy of the network. Compared with the shallow features in the shallow layers, these deep features represent the semantic information of the GSV images. Details of how the extraction was performed can be found in Supplementary data online, Figure S1. We used Places365 CNN as the feature extractor because the images on which it was trained are more similar to GSV images. Places365 CNN was trained on a subset of the Places Database, which contains more than 10 million images covering 400+ unique scene categories such as towers, soccer fields, streets, swimming pools, and train station platforms. Compared with the ImageNet database, the diversity of environmental features found in the Places Database was believed to be more representative of what is contained in GSV images. Through feature extraction, we obtained 4096 features representing the average built environment information for each census tract. It is noteworthy that Google emphasizes considering various factors, including weather conditions, in their image acquisition process.10 Additionally, on examination of our data set, only a very small proportion of the images were captured when a limited amount of snow was present, and we observed that this had minimal influence on the computer vision algorithm.
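The aggregation from image-level to tract-level features can be illustrated as follows. This sketch assumes a simple arithmetic mean over all images in a tract, which is one plausible reading of "average built environment information"; the function name, tract identifiers, and 3-dimensional toy vectors are hypothetical (the study used 4096-dimensional features from Places365 CNN).

```python
from collections import defaultdict


def mean_features_by_tract(records):
    """Average per-image feature vectors within each census tract.

    records: iterable of (tract_id, feature_vector) pairs, where every
    feature_vector has the same length (4096 in the study; 3 in the toy data).
    """
    sums = {}
    counts = defaultdict(int)
    for tract_id, vec in records:
        if tract_id not in sums:
            sums[tract_id] = [0.0] * len(vec)
        elif len(vec) != len(sums[tract_id]):
            raise ValueError("feature vectors must share one length")
        for k, value in enumerate(vec):
            sums[tract_id][k] += value
        counts[tract_id] += 1
    return {t: [s / counts[t] for s in total] for t, total in sums.items()}


# Toy example with hypothetical tract IDs and 3-dimensional features.
toy = [
    ("tract_A", [1.0, 0.0, 2.0]),
    ("tract_A", [3.0, 2.0, 2.0]),
    ("tract_B", [0.5, 0.5, 0.5]),
]
tract_features = mean_features_by_tract(toy)
```

The result is one fixed-length vector per tract, regardless of how many images each tract contributed, which is what allows tracts of different sizes to enter the same regression.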

Traditional demographics, socio-economic factors, and composite indices for social determinants of health

In addition to the GSV features, we obtained traditional demographic and socio-economic (DSE) factors: age (median), sex (male %), race (Black %), income (median $), and education (<high school %) from the 2018 American Community Survey 5-year estimates.17 We also collected the established composite indices for social determinants of health (SDoH). Specifically, we considered three widely used SDoH indices: 2018 Social Deprivation Index (SDI),18 2015 Area Deprivation Index (ADI),19 and 2018 Social Vulnerability Index (SVI).20 The Social Deprivation Index, provided by the Robert Graham Center, is a composite measure capturing seven demographic characteristics gathered from the American Community Survey to assess the overall socio-economic disparities at the area level. In contrast, ADI ranks neighbourhoods by socio-economic disadvantage (e.g. income, education, employment, and housing quality) at either the state or national level. Additionally, CDC’s SVI provides insights into the vulnerability of communities by considering socio-economic status, housing, transportation, and other factors. These composite indices collectively contribute to a nuanced understanding of the SDoH in a neighbourhood.

Statistical analysis

Machine learning model using raw convolutional neural network–extracted features

We utilized three machine learning (ML) models to explore the association between the raw (4096) CNN-extracted features of GSV images and the tract-level CHD prevalence. The models for this analysis included extratrees regressor (ET),21 random forest regressor (RF),22 and light gradient boosted machine regressor (LGBM).23 All models were estimated using a 10-fold cross-validation technique for a more robust result. For 10-fold cross-validation, the data set is split into 10 equal-sized subsets, and the model is trained on 9 subsets and tested on the remaining 1 subset. This process is repeated 10 times until each of the 10 subsets has been used once as the testing set. R2 values were reported as the measure of model quality, quantifying the extent to which the CNN-extracted features of GSV images fit tract-level CHD prevalence. The performance of each model was also evaluated using the mean absolute error (MAE) and root mean squared error (RMSE).
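The cross-validation procedure and the three reported metrics can be sketched without any ML library. The block below is an illustration under stated assumptions: a one-feature least-squares line stands in for the ET, RF, and LGBM regressors, the data are synthetic, and all function names are hypothetical. Only the fold logic and the metric definitions correspond to the description above.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices and split them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def regression_metrics(y_true, y_pred):
    """Return (R2, MAE, RMSE) for paired observations and predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    return 1.0 - ss_res / ss_tot, mae, rmse


def cross_validated_predictions(x, y, fit, predict, k=10, seed=0):
    """Out-of-fold predictions: each sample is predicted by a model that never saw it."""
    y_pred = [None] * len(y)
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            y_pred[i] = predict(model, x[i])
    return y_pred


# Stand-in learner (assumption): a one-feature least-squares line.
def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return slope, my - slope * mx


def predict_line(model, x_value):
    slope, intercept = model
    return slope * x_value + intercept


# Synthetic data, not study data.
x = [float(i) for i in range(50)]
y = [0.1 * v + 3.0 for v in x]
y_hat = cross_validated_predictions(x, y, fit_line, predict_line)
r2, mae, rmse = regression_metrics(y, y_hat)
```

With 789 tracts, the folds cannot be exactly equal: this splitter yields nine folds of 79 tracts and one of 78.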

Multilevel modelling with traditional demographics, socio-economic factors, and composite indices for social determinants of health

We analysed the associations of common DSE factors and of CNN-extracted features of GSV images with CHD prevalence. Details of this analysis are provided in the online supplementary material. Briefly, we followed an entirely supervised two-step modelling strategy. In the first step, a multivariate sparse partial least squares (SPLS) regression24 was applied to the CNN-extracted features to reduce their dimensionality and the effect of noise on the error rates of our inferences. In the second step, we compared the effects of CNN-extracted features with those of recognized factors by building multilevel regression models that account for both fixed and random effects. Specifically, linear mixed-effects regression models were fitted with the first few selected SPLS components augmented with DSE factors as well as other composite indices for SDoH, all treated here as fixed effects, and with city treated as a random effect. The models were all adjusted for fixed effects including age, sex, race, income, and education. We then compared the regression estimates and goodness-of-fit measures between the reduced models and the combined linear mixed-effects models (LMEMs), where each set of independent variables enters simultaneously or individually. Two sets of three models were compared in this analysis. The first set comprised (i) a model containing both DSE factors and SPLS components (DSE + GSV model), (ii) a model with DSE factors alone (DSE model), and (iii) a model with the SPLS components alone (GSV model). The second set comprised (i) a model that incorporated the three SDoH indices alongside SPLS components (SDoH + GSV model), (ii) a model comprising only the three SDoH indices (SDoH model), and (iii) a model solely featuring the SPLS components (GSV model).
In all comparisons, model performance was assessed using goodness-of-fit measures such as likelihood ratio tests (LRT, where applicable), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC), as well as marginal and conditional R2 values based on Nakagawa’s R2 for mixed models.25 For each model, we calculated the coefficient estimate of each variable with its standard error and confidence interval (CI), the intraclass (within-city) correlation coefficient (ICC) for the city random effects, the within-city random-effect residual variance (σ2), and the between-city random-effect residual variance (τ00City) to assess the reliability or consistency of the measurements. All P-values less than the α = 0.05 level were considered significant.
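The goodness-of-fit quantities named above follow standard definitions, which can be written out as a short sketch. The helper names are hypothetical, and the closed-form chi-square tail used here holds only for an even number of degrees of freedom. The worked example takes the two log-likelihoods reported for the DSE + GSV and DSE models and assumes 6 degrees of freedom, on the reading that the larger model adds the six SPLS components; the variance values passed to `icc` are made up.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2 * log_lik


def lrt_statistic(log_lik_full, log_lik_reduced):
    """Likelihood ratio statistic for two nested models."""
    return 2.0 * (log_lik_full - log_lik_reduced)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def icc(tau00, sigma2):
    """Intraclass correlation: between-city variance over total variance."""
    return tau00 / (tau00 + sigma2)


# Log-likelihoods as reported for the DSE + GSV and DSE models.
stat = lrt_statistic(-308.0, -368.0)   # 2 * (-308 - (-368))
p_value = chi2_sf_even_df(stat, df=6)  # df = 6 is an assumption
example_icc = icc(tau00=1.0, sigma2=3.0)  # hypothetical variances
```

Because the published log-likelihoods are rounded, a statistic recomputed this way can differ slightly from a tabulated LRT value.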

Features visualization using Grad-CAM

To understand the deep features of the GSV images that are associated with neighbourhood CHD prevalence, we identified the most influential GSV features contributing to the SPLS components. These top features were identified by examining the magnitudes and signs of their coefficients in the SPLS regression model, which helps in understanding how each feature is associated with CHD prevalence. Subsequently, we employed the Grad-CAM technique26 to create saliency maps that highlight these prominent features in the original GSV images. This process offers a partial explanation of which environmental features the CNN associates with neighbourhood CHD prevalence.
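The core Grad-CAM computation can be shown on a toy example. This is a sketch of the general recipe rather than the study's implementation: it assumes the gradients of the target (here, a selected deep feature rather than a class score) with respect to the last convolutional feature maps are already available, since obtaining them requires an autograd framework. The function name and all numbers are made up.

```python
def grad_cam(activations, gradients):
    """Combine feature maps into one saliency map following the Grad-CAM recipe.

    activations, gradients: lists of K feature maps, each an H x W nested list.
    gradients[k][i][j] is the derivative of the target score with respect to
    activations[k][i][j]; computing it needs autograd and is taken as given.
    """
    n_maps = len(activations)
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weights: global average pooling of the gradients.
    weights = [sum(sum(row) for row in gradients[k]) / (height * width)
               for k in range(n_maps)]
    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = [[max(0.0, sum(weights[k] * activations[k][i][j] for k in range(n_maps)))
            for j in range(width)] for i in range(height)]
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy 2-channel, 2x2 example with made-up numbers.
acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(acts, grads)
```

In practice the low-resolution map is upsampled to the image size and overlaid on the original photograph, which is how a heatmap comes to highlight objects such as buildings or trees.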

In our study, we employed Python (version 3.8.10) for deep learning and ML tasks, utilizing packages such as PyTorch (version 1.8.2), fastai (version 2.6.3), and scikit-learn (version 1.0.2). For multilevel modelling, we used R (version 4.2.1) with the packages spls (version 2.2-3) and lme4 (version 1.1-35.1), which provides the lmer function.

Results

Seven cities revealed varying levels of CHD prevalence at the census tract level. In Bellevue, the median CHD prevalence (%) was 4.70 (IQR: 3.75–5.23), whereas in Brownsville, it was 7.70 (IQR: 6.48–8.63). Cleveland exhibited a median CHD prevalence of 8.70 (IQR: 7.35–10.00). Denver had a median CHD prevalence of 4.30 (IQR: 3.45–5.20), while Detroit’s median CHD prevalence was 8.55 (IQR: 7.40–9.80). Fremont displayed a median CHD prevalence of 3.70 (IQR: 3.35–4.10), and Kansas City reported a median CHD prevalence of 7.20 (IQR: 6.10–8.30).

Machine learning model results with raw convolutional neural network features

The 4096 CNN-extracted features from GSV images were able to explain more than 63% of the variance (R2 = 0.634) in tract-level CHD prevalence in the seven cities (Figure 1). The three ML models had similar performance, with the ET achieving the best result among all models, with the lowest average MAE of 1.11 and RMSE of 1.58 (see Supplementary data online, Table S1). The actual estimates of CHD prevalence from the CDC and the model-predicted CHD prevalence were mapped for all census tracts in the seven cities (Figure 2). There was good agreement between the actual estimates and predicted CHD prevalence across all census tracts in the seven cities. We found a small number of extreme values that were underestimated by the models in certain census tracts of Detroit and Cleveland. The CHD prevalence of these underestimated census tracts was often more than 12%. When examining the CNN-extracted features using t-SNE, we noticed clustering of census tracts with similar values of CHD prevalence (see Supplementary data online, Figure S2).

Actual estimates (observed) and predicted CHD prevalence. A total of 789 census tracts in seven cities were analysed. Predicted CHD prevalence was from LGBM model trained using CNN-extracted features. The black dotted line represents the y = x line. Values are in percentage. CHD, coronary heart disease; CNN, convolutional neural network; LGBM, light gradient boosted machine
Figure 1

Maps of actual estimates (left) and predicted (right) CHD prevalence. The predicted CHD prevalence is obtained by averaging the results from 100 random trials based on k-fold cross-validation (with k = 10). Values are in percentage. Maps are not on the same scale. CHD, coronary heart disease
Figure 2

Comparison of Google Street View–sparse partial least squares components with demographics and socio-economic factors

With SPLS, an optimal model was obtained with h = 6 SPLS components (η = 0.6), yielding a model with 883 CNN-extracted features that explains R2XY > 64.8% of the variance of CHD prevalence in the census tracts (see Supplementary data online, Figure S3). We found that the combined model (DSE + GSV) demonstrated a better goodness of fit, with a statistically significantly higher log-likelihood, lower AIC/BIC, and higher R2 (conditional R2 = 0.792) when compared with the GSV model and, notably, the DSE model alone (Table 1). The DSE model had lower AIC and BIC values, with a significant LRT, and higher R2 when compared with the GSV model (Table 1). Supplementary data online, Table S2, presents the regression estimates and analysis of variance (ANOVA) results. Notably, four out of six GSV–SPLS components demonstrated statistical significance, while all DSE variables were statistically significant.

Table 1

Model performance and comparison of linear mixed-effects models for coronary heart disease prevalence: Google Street View and demographic and socio-economic variables

| LMEM | AIC | BIC | Marginal R2 | Conditional R2 | Log. lik. | Test | LRT | P-value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DSE + GSV | 632.0 | 697.0 | 0.760 | 0.792 | −308.0 | | | |
| DSE | 706.0 | 743.0 | 0.645 | 0.738 | −368.0 | DSE + GSV vs. DSE | 120 | <.001 |
| GSV + DSE | 632.0 | 697.0 | 0.760 | 0.792 | −308.0 | | | |
| GSV | 988.0 | 1030.0 | 0.608 | 0.645 | −468.0 | DSE + GSV vs. GSV | 320 | <.001 |
| DSE | 706.0 | 743.0 | 0.645 | 0.738 | −368.0 | | | |
| GSV | 988.0 | 1030.0 | 0.608 | 0.645 | −468.0 | DSE vs. GSV | 201 | <.001 |

Models: GSV, the LMEM with only the selected SPLS components (CHD: h = 6 obtained from the 4096 CNN features); DSE, the LMEM with only the demographics and socio-economic variables (city + sex + age + race + income + education); and GSV + DSE, the LMEM with both sets of independent variables from GSV and DSE.

LMEM, linear mixed-effects model; GSV, Google Street View; AIC, Akaike information criterion; BIC, Bayesian information criterion; LRT, likelihood ratio test.

Comparison of Google Street View–sparse partial least squares components with social determinants of health indices

As expected, the LMEM that incorporates both GSV–SPLS components and SDoH indices, referred to as SDoH + GSV, achieved the best goodness of fit and overall performance. It exhibited the lowest AIC/BIC and the highest log-likelihood and marginal/conditional R2 (conditional R2 = 0.785), compared with the GSV model and, notably, the SDoH model alone (Table 2). Remarkably, the performance of the SDoH model was comparable with that of the GSV model, with a non-significant LRT (P = .60), while AIC/BIC and R2 values were not significantly different. All three SDoH indices were statistically significant in the models, and the three GSV–SPLS components were statistically significant when adjusting for either DSE variables (see Supplementary data online, Table S2) or SDoH indices (see Supplementary data online, Table S3).

Table 2

Model performance and comparison of linear mixed-effects models for coronary heart disease prevalence: Google Street View and social determinants of health indices

| LMEM | AIC | BIC | Marginal R2 | Conditional R2 | Log. lik. | Test | LRT | P-value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SDoH + GSV | 796.0 | 852.0 | 0.739 | 0.785 | −379.0 | | | |
| SDoH | 924.0 | 952.0 | 0.624 | 0.680 | −467.0 | SDoH + GSV vs. SDoH | 279 | <.001 |
| GSV + SDoH | 796.0 | 852.0 | 0.739 | 0.785 | −379.0 | | | |
| GSV | 988.0 | 1030.0 | 0.608 | 0.645 | −468.0 | SDoH + GSV vs. GSV | 178 | <.001 |
| SDoH | 924.0 | 952.0 | 0.624 | 0.680 | −467.0 | | | |
| GSV | 988.0 | 1030.0 | 0.608 | 0.645 | −468.0 | SDoH vs. GSV | 1.89 | .60 |

Models: GSV, the LMEM with only the selected SPLS components (CHD: h = 6 obtained from the full CNN features); SDoH, the LMEM with only the SDoH indices (SDI + SVI + ADI); and GSV + SDoH, the LMEM with both sets of independent variables from GSV and SDoH.

LMEM, linear mixed-effects model; GSV, Google Street View; AIC, Akaike information criterion; BIC, Bayesian information criterion; LRT, likelihood ratio test; SDoH, social determinants of health; SDI, Social Deprivation Index; SVI, Social Vulnerability Index; ADI, Area Deprivation Index.

Visualization of the most influential convolutional neural network–Google Street View features

Grad-CAM was utilized to visualize the top CNN-extracted features identified from the SPLS regression. The saliency maps generated by Grad-CAM suggested that feature #2017, which seemed to highlight deteriorated buildings (suggesting neighbourhood blight), had a positive association with CHD prevalence (Figure 3A). Interestingly, other images for feature #2017 seemed to also highlight wooden utility poles (see Supplementary data online, Figure S4). Another feature (feature #458) that was positively associated with CHD was found to highlight road cracks, as shown in Figure 3B. In contrast, feature #2873 in Figure 3C had a negative association with CHD prevalence, and its heatmap highlighted trees along the road. Feature #237, which seemed to focus on well-built houses, also had a negative association with CHD prevalence (Figure 3D). More examples of Grad-CAM on the CNN-extracted GSV features, including ones with ‘noises’, are provided in the Supplementary data online, Figures S4–S8.

Feature interpretations using Grad-CAMs. (A and B) Two pairs of GSV images (left) and their activation maps (right) for the features associated with higher CHD prevalence. (C and D) Two pairs of GSV images (left) and their activation maps (right) for the features associated with lower CHD prevalence. CHD, coronary heart disease; Grad-CAMs, gradient-weighted class activation mapping; GSV, Google Street View
Figure 3

Discussion

While many epidemiological studies have examined associations between cardiovascular disease and individual built environmental features (e.g. greenspace, urban architecture, street connectivity, and food availability), our approach focused on machine vision–derived physical environment, relying on CNN and its related techniques to extract features.

Our results showed a strong model fit (R2 = 0.634), indicating that the raw CNN-extracted features from GSV images effectively predict CHD prevalence at the census tract level in seven cities (Structured Graphical Abstract). This indicates that the CNN-extracted features could capture neighbourhood features related to cardiovascular health. The CHD prevalence predicted from CNN-extracted features tended to be underestimated in certain areas compared with observed CHD prevalence, especially in Detroit and Cleveland. This may be caused by the limited number of samples with such extreme values in the data set, so that the model struggles to distinguish and predict these rare occurrences accurately. It may also suggest that certain CHD-related factors are not embedded in the environment at these locations, or that features not captured by street view images, such as demographic factors, ambient factors, and other traditional variables, play a much larger role there.

Previous studies have used CNN to detect pre-defined built environmental features from GSV and found that dilapidated buildings and visible wires were associated with an increased risk of cardiovascular diseases.27,28 Our study employed a data-driven approach, taking advantage of the knowledge that fully connected layers in the CNN contain condensed information about the input imagery that can be extracted and utilized for a variety of purposes. We utilized a pre-trained DCNN, Places365 CNN,16 so that the deep features from the CNN may be more representative of the built environment. One advantage of this approach is that pre-defined relevant features in the built environment are not required. The 4096-dimensional deep features embed all essential information of the built environment in the imagery so that we could retain relevant features as much as reasonably possible. Conversely, the disadvantage of using deep features from a pre-trained CNN is that it becomes difficult to identify the corresponding physical features that impact CHD at the neighbourhood level. To alleviate this issue and provide certain interpretations of the deep features, we utilized Grad-CAM techniques to visualize the CHD-related features with a saliency map.

The results of multilevel modelling using demographics and socio-economic factors indicate that DSE variables were still better predictors of CHD prevalence than GSV features. One explanation is that physical environmental features, even if they represent a ‘meta’ framework for other mediators, may not be sufficient to convey the risk carried by other factors that may be sparsely represented. However, incorporating GSV features into the model with regular DSE variables could help improve the prediction of CHD prevalence at the neighbourhood level (Table 1). Further comparisons between GSV features and existing SDoH indices reveal that GSV features can be on par with these established SDoH indices. Given that these existing composite indices encompass a wide variety of socio-economic and environmental factors, our single-source GSV features offer a valuable and efficient perspective on the built environment’s potential impact on health outcomes. Our results further suggest that GSV features may be helpful in highlighting specific built environment information related to CHD prevalence at the neighbourhood level, as illustrated by Grad-CAM methods, which provided a potential way of identifying built environment information.

Grad-CAM highlighted several potential built environment features that are associated with either higher or lower CHD prevalence at the neighbourhood level. Deteriorated houses and roads, a feature of urban blight, were associated with higher CHD prevalence. This feature may in turn embody other neighbourhood characteristics that drive cardiovascular risk, including lack of space for physical activity,7,29 limited access to nutritionally balanced food,30 and lack of access to health care.31 Street greenery, on the other hand, was highlighted as being associated with lower CHD prevalence. This agrees with previous studies that showed a robust association between green space and decreased cardiovascular risk.32,33 It should be noted that the Grad-CAM results for a single feature sometimes highlighted more than one type of built environment element. For instance, the feature that highlighted deteriorated houses also highlighted wooden utility poles in the images (see Supplementary data online, Figure S4). Other features may show combined physical elements, as exemplified in Supplementary data online, Figure S8, which appears to show an amalgamation of tree canopy and nearby sky. This analysis identifies potential environmental features correlated with CHD, but it is crucial to note that these correlations do not establish causality. It is possible that underlying factors such as socio-economic status, which might influence living conditions and health behaviours, play a significant role in the observed correlations.

Implications

Our study carries significant implications for health research and clinical practice. Firstly, we used street view features to assess cardiovascular risk, a novel approach that adds new dimensions to our understanding of the impact of the built environment on health. Furthermore, although our research was conducted primarily at the census tract level, the approach could be applied at a more granular level, down to that of individual patients. Such an analysis could yield more precise and personalized insights into the interplay between the built environment and individual health, thereby enabling tailored interventions and healthcare strategies. Future research in this field may explore advanced methods such as semantic segmentation to extract specific built environment features, including green space, blue space, and sidewalks. This approach promises a more detailed understanding of how the environment influences cardiovascular health and offers an avenue for more finely targeted public health interventions.

Our findings also underscore the utility of such data in broadening our understanding of environmental influences on health. The associations identified between street view features and cardiovascular risk not only open avenues for generating new hypotheses but also serve as a valuable reference for shaping public health policies. These insights, while emphasizing correlations, guide us in identifying potential areas for intervention and in designing studies aimed at exploring causal relationships. This approach, therefore, contributes to a more informed and targeted strategy in public health planning, with a focus on mitigating cardiovascular risks associated with specific environmental factors.

Study limitations

This study has several limitations. Firstly, GSV images are only available along major streets and roads, and some populations do not live in neighbourhoods covered by such imagery. However, given that most of the population lives in urban neighbourhoods where GSV images are abundant, we believe this would not significantly affect the results for the majority of census tracts. Further, although the Places365 database contains 400+ unique scene categories, it may not include all features that can be found in the built environment. Small objects such as trash, other environmental pollutants, and physical domains that may translate into better urban quality of life may be difficult for computer vision techniques such as CNNs to detect in a GSV image.34 In addition, interpreting Grad-CAM maps involves a degree of speculation, and the visualizations may not always provide a definitive understanding of the exact features being detected by the model. Finally, the census tracts with CHD prevalence data are from seven representative US cities in the CDC PLACES data set and may not generalize to all census tracts in the USA, especially those in rural areas.35 Future work is needed to examine disparities between urban and rural areas and their cardiovascular-related built environment features.

Conclusion

The built environment affects cardiovascular health outcomes. In this study, we used GSV and a scene-pre-trained CNN to assess the built environment. We found that CNN-extracted features explain a significant portion of the variation in CHD prevalence at the census tract level. Compared with traditional DSE factors or composite SDoH indices, GSV provides unique information that may relate to CHD, such as buildings, green space, and roads, as suggested by the Grad-CAM activation maps. Our results provide proof of concept for machine vision–enabled identification of urban network features associated with risk, which in principle may enable rapid identification of at-risk neighbourhoods and targeting of interventions to reduce cardiovascular burden.

Supplementary data

Supplementary data are available at European Heart Journal online.

Declarations

Disclosure of Interest

All authors declare no disclosure of interest for this contribution.

Data Availability

The data used in this analysis are publicly available. The analytic code can be made available upon request.

Funding

This work was funded by the National Institute on Minority Health and Health Disparities Award nos. P50MD017351 and 1R35ES031702-01 awarded to S.R.

Ethical Approval

Ethical approval was not required.

Pre-registered Clinical Trial Number

None supplied.

References

1

Tsao CW, Aday AW, Almarzooq ZI, Alonso A, Beaton AZ, Bittencourt MS, et al. Heart disease and stroke statistics—2022 update: a report from the American Heart Association. Circulation 2022;145:e153–639. https://doi.org/10.1161/CIR.0000000000001052

2

Health, United States, Annual Perspective, 2020–2021 [Internet]. National Center for Health Statistics (U.S.); 2022 [cited 2023 Feb 2]. Available from: https://stacks.cdc.gov/view/cdc/122044.

3

Heron M, Anderson RN. Changes in the leading cause of death: recent patterns in heart disease and cancer mortality. NCHS Data Brief 2016;254:1–8. https://pubmed.ncbi.nlm.nih.gov/27598767/

4

Havranek EP, Mujahid MS, Barr DA, Blair IV, Cohen MS, Cruz-Flores S, et al. Social determinants of risk and outcomes for cardiovascular disease. Circulation 2015;132:873–98. https://doi.org/10.1161/CIR.0000000000000228

5

Al-Kindi SG, Brook RD, Biswal S, Rajagopalan S. Environmental determinants of cardiovascular disease: lessons learned from air pollution. Nat Rev Cardiol 2020;17:656–72. https://doi.org/10.1038/s41569-020-0371-2

6

Bhatnagar A. Environmental determinants of cardiovascular disease. Circ Res 2017;121:162–80. https://doi.org/10.1161/CIRCRESAHA.117.306458

7

Sallis JF, Floyd MF, Rodríguez DA, Saelens BE. Role of built environments in physical activity, obesity, and cardiovascular disease. Circulation 2012;125:729–37. https://doi.org/10.1161/CIRCULATIONAHA.110.969022

8

Liu J, Varghese BM, Hansen A, Zhang Y, Driscoll T, Morgan G, et al. Heat exposure and cardiovascular health outcomes: a systematic review and meta-analysis. Lancet Planet Health 2022;6:e484–95. https://doi.org/10.1016/S2542-5196(22)00117-6

9

Rajagopalan S, Landrigan PJ. Pollution and the heart. N Engl J Med 2021;385:1881–92. https://doi.org/10.1056/NEJMra2030281

10

Google. Google Maps Street View. [cited 2023 Feb 2]. How Street View works and where we will collect images next. Available from: https://www.google.com/streetview/how-it-works/.

11

Seiferling I, Naik N, Ratti C, Proulx R. Green streets − quantifying and mapping urban trees with street-level imagery and computer vision. Landsc Urban Plan 2017;165:93–101. https://doi.org/10.1016/j.landurbplan.2017.05.010

12

Lu Y. Using Google Street View to investigate the association between street greenery and physical activity. Landsc Urban Plan 2019;191:103435. https://doi.org/10.1016/j.landurbplan.2018.08.029

13

Kang J, Körner M, Wang Y, Taubenböck H, Zhu XX. Building instance classification using street view images. ISPRS J Photogramm Remote Sens 2018;145:44–59. https://doi.org/10.1016/j.isprsjprs.2018.02.006

14

Nagata S, Nakaya T, Hanibuchi T, Amagasa S, Kikuchi H, Inoue S. Objective scoring of streetscape walkability related to leisure walking: statistical modeling approach with semantic segmentation of Google Street View images. Health Place 2020;66:102428. https://doi.org/10.1016/j.healthplace.2020.102428

15

LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436–44. https://doi.org/10.1038/nature14539

16

Zhou B, Lapedriza A, Khosla A, Oliva A, Torralba A. Places: a 10 million image database for scene recognition. IEEE Trans Pattern Anal Mach Intell 2018;40:1452–64. https://doi.org/10.1109/TPAMI.2017.2723009

17

US Census Bureau. Census.gov. [cited 2023 Jun 29]. American Community Survey 5-year data (2009–2021). Available from: https://www.census.gov/data/developers/data-sets/acs-5year.html.

18

Robert Graham Center—Policy Studies in Family Medicine & Primary Care. Social Deprivation Index (SDI) [Internet]. 2018 [cited 2023 Jun 30]. Available from: https://www.graham-center.org/content/brand/rgc/maps-data-tools/social-deprivation-index.html.

19

Kind AJH, Buckingham WR. Making neighborhood-disadvantage metrics accessible—the neighborhood atlas. N Engl J Med 2018;378:2456–8. https://doi.org/10.1056/NEJMp1802313

20

Flanagan BE, Gregory EW, Hallisey EJ, Heitgerd JL, Lewis B. A social vulnerability index for disaster management. J Homel Secur Emerg Manag 2011;8. https://doi.org/10.2202/1547-7355.1792

21

Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Mach Learn 2006;63:3–42. https://doi.org/10.1007/s10994-006-6226-1

22

Breiman L. Random forests. Mach Learn 2001;45:5–32. https://doi.org/10.1023/A:1010933404324

23

Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 2017;30:3149–57. https://papers.nips.cc/paper_files/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html

24

Chun H, Keleş S. Sparse partial least squares regression for simultaneous dimension reduction and variable selection. J R Stat Soc Series B Stat Methodol 2010;72:3–25. https://doi.org/10.1111/j.1467-9868.2009.00723.x

25

Nakagawa S, Johnson PCD, Schielzeth H. The coefficient of determination R² and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. J R Soc Interface 2017;14:20170213. https://doi.org/10.1098/rsif.2017.0213

26

Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, 2017, p. 618–26.

27

Phan L, Yu W, Keralis JM, Mukhija K, Dwivedi P, Brunisholz KD, et al. Google Street View derived built environment indicators and associations with state-level obesity, physical activity, and chronic disease mortality in the United States. Int J Environ Res Public Health 2020;17:3659. https://doi.org/10.3390/ijerph17103659

28

Nguyen TT, Nguyen QC, Rubinsky AD, Tasdizen T, Deligani AHN, Dwivedi P, et al. Google Street View-derived neighborhood characteristics in California associated with coronary heart disease, hypertension, diabetes. Int J Environ Res Public Health 2021;18:10428. https://doi.org/10.3390/ijerph181910428

29

Chandrabose M, Rachele JN, Gunn L, Kavanagh A, Owen N, Turrell G, et al. Built environment and cardio-metabolic health: systematic review and meta-analysis of longitudinal studies. Obes Rev 2019;20:41–54. https://doi.org/10.1111/obr.12759

30

Gondi KT, Larson J, Sifuentes A, Alexander NB, Konerman MC, Thomas KS, et al. Health of the food environment is associated with heart failure mortality in the United States. Circ Heart Fail 2022;15:e009651. https://doi.org/10.1161/CIRCHEARTFAILURE.122.009651

31

White-Williams C, Rossi LP, Bittner VA, Driscoll A, Durant RW, Granger BB, et al. Addressing social determinants of health in the care of patients with heart failure: a scientific statement from the American Heart Association. Circulation 2020;141:e841–63. https://doi.org/10.1161/CIR.0000000000000767

32

Pereira G, Foster S, Martin K, Christian H, Boruff BJ, Knuiman M, et al. The association between neighborhood greenness and cardiovascular disease: an observational study. BMC Public Health 2012;12:466. https://doi.org/10.1186/1471-2458-12-466

33

Mitchell R, Popham F. Effect of exposure to natural environment on health inequalities: an observational population study. Lancet 2008;372:1655–60. https://doi.org/10.1016/S0140-6736(08)61689-X

34

Ross CE, Mirowsky J. Disorder and decay: the concept and measurement of perceived neighborhood disorder. Urban Aff Rev 1999;34:412–32. https://doi.org/10.1177/107808749903400304

35

Loccoh EC, Joynt MKE, Wang Y, Kazi DS, Yeh RW, Wadhera RK. Rural-urban disparities in outcomes of myocardial infarction, heart failure, and stroke in the United States. J Am Coll Cardiol 2022;79:267–79. https://doi.org/10.1016/j.jacc.2021.10.045

Author notes

Zhuo Chen and Jean-Eudes Dazard contributed equally to the study.

Sadeer Al-Kindi and Sanjay Rajagopalan are co-senior authors.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)
