Breeze 2.0: an interactive web-tool for visual analysis and comparison of drug response data

Abstract Functional precision medicine (fPM) offers an exciting, simplified approach to finding the right applications for existing molecules and enhancing therapeutic potential. Integrative and robust tools ensuring high accuracy and reliability of the results are critical. In response to this need, we previously developed Breeze, a drug screening data analysis pipeline, designed to facilitate quality control, dose-response curve fitting, and data visualization in a user-friendly manner. Here, we describe the latest version of Breeze (release 2.0), which implements an array of advanced data exploration capabilities, providing users with comprehensive post-analysis and interactive visualization options that are essential for minimizing false positive/negative outcomes and ensuring accurate interpretation of drug sensitivity and resistance data. The Breeze 2.0 web-tool also enables integrative analysis and cross-comparison of user-uploaded data with publicly available drug response datasets. The updated version incorporates new drug quantification metrics, supports analysis of both multi-dose and single-dose drug screening data and introduces a redesigned, intuitive user interface. With these enhancements, Breeze 2.0 is anticipated to substantially broaden its potential applications in diverse domains of fPM.


INTRODUCTION
Development of new drugs and repurposing of existing ones for new indications is a critical and ongoing process with significant potential for enhancing future disease management strategies (1-9). With the escalating prevalence of various diseases and the growing demand for innovative drug development methods, high-throughput screening (HTS) has emerged as a systematic approach for identifying potential hits for drug discovery by profiling thousands of chemical compounds (10). However, interpreting and analyzing the vast amounts of drug response data generated from HTS experiments is a complex task that requires specialized expertise in statistical analysis and programming. Furthermore, accurate, interpretable, and reproducible results are essential to identify robust and reliable drug candidates (11,12). One of the key applications of high-throughput drug screening is drug repurposing/repositioning. Drug repurposing entails the discovery of novel therapeutic uses for existing drugs, providing a highly effective strategy for the development of drug molecules with innovative therapeutic indications. This process involves the examination of a panel of drugs against specific targets, followed by a systematic comparison with diverse datasets. In response to this challenge, we developed Breeze, a web application for interactive quality control, analysis and visualization of drug dose-response data (13).
Breeze streamlines the analysis and visualization of drug responses generated from cell-based drug screening experiments by implementing comprehensive quality control (QC) procedures, robust dose-response curve-fitting, diverse response quantification metrics, and interactive visualizations. Breeze's QC process plays a crucial role in identifying and quantifying potential errors in data generated from HTS assays, which are prone to common technical issues such as spatial plate variability, striping, and edge effects. Breeze provides a comprehensive set of QC metrics and visualizations, enabling researchers to monitor and identify technical problems, ensuring the accuracy and reproducibility of the screening results. The next critical step involves dose-response curve fitting, which utilizes mathematical modeling to describe the relationship between drug concentrations and the observed responses, such as cell viability or toxicity. Finally, the fitted drug responses are quantified and summarized into single metrics such as half-maximal inhibitory concentration (IC50), half-maximal effective concentration (EC50), area under the curve (AUC), or drug sensitivity score (DSS) to enable comparison across different compounds and concentrations, identifying clinically relevant dose ranges and the most potent and efficient compounds for a given target, patient or disease.
However, Breeze 1.0 did not allow researchers to integrate and compare analyzed datasets with publicly available drug response data, which is crucial for establishing reliable reference baselines for response comparison, validating results, and gaining insights into the broader implications of findings (14). Furthermore, the absence of user-friendly features and automated procedures in Breeze 1.0 posed some challenges for researchers with limited computational expertise. To address these limitations, we have implemented the Breeze 2.0 web-application, which introduces a curated database for easy data integration and comparison, novel interactive visualizations, new drug response metrics, and a redesigned, intuitive user interface. Breeze 2.0 supports analysis of both multi-dose and single-dose drug screening experiments, and it utilizes machine learning to flag poor-quality dose-response curves. We believe that the updated web platform will become an even more useful tool, allowing comprehensible and interpretable analysis of drug response data, thus expediting the identification of novel treatment options for various diseases.

Overview of the workflow
Breeze 2.0 introduces a number of novel features and improvements for interactive analysis and visualization of drug response data; these include: (i) a curated database of published drug screening data that facilitates easy integration and cross-comparison of user-provided data with the publicly available datasets, including standardized comparison with healthy controls or other reference datasets; (ii) novel interactive visualization options for integrative analysis of user-provided and published data; (iii) implementation of new response metrics for antiviral data analysis; (iv) implementation of a machine learning-based approach for automated identification of poor-quality dose-response curves and (v) analysis of both multi-dose and single-dose drug screening data. Additionally, Breeze 2.0 introduces a re-designed user interface that is more intuitive and user-friendly. Table 1 provides a detailed comparison of the features between Breeze releases 1.0 and 2.0. The users of the Breeze web-application provided valuable input, beta testing and suggestions for improvements, which were implemented into Breeze 2.0.

Data processing pipeline
The Breeze 2.0 pipeline starts with processing of the raw data to generate a comprehensive QC report, featuring plate-specific heatmaps, scatterplots, control barplots, and an in-depth summary of QC statistics emphasizing control well performance. For each drug-dose data point, percent inhibition/viability is determined with reference to the plate controls (Figure 1A). Subsequently, dose-response curve fitting is carried out using four-parameter logistic modeling of percent inhibition values as a function of drug concentration (Figure 1B, left panel).
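The two steps above, normalization against plate controls and the four-parameter logistic (4PL) model, can be illustrated with a minimal Python sketch. It is not the Breeze implementation (the pipeline itself is written in R). The function names, the control-based normalization formula and the particular 4PL parameterization are assumptions chosen for illustration.

```python
from statistics import mean


def percent_inhibition(raw, neg_ctrl, pos_ctrl):
    """Convert a raw well signal to percent inhibition using plate controls.

    Assumption: negative controls (e.g. vehicle only) define 0% inhibition
    and positive controls (e.g. a cell-killing agent) define 100%.
    """
    neg = mean(neg_ctrl)
    pos = mean(pos_ctrl)
    if neg == pos:
        raise ValueError("controls are indistinguishable; cannot normalize")
    return 100.0 * (neg - raw) / (neg - pos)


def four_pl(conc, bottom, top, ic50, slope):
    """Four-parameter logistic curve of percent inhibition vs. concentration.

    The response rises from `bottom` to `top` as `conc` increases (for a
    positive slope) and equals the midpoint at conc == ic50.
    `conc` and `ic50` must be positive.
    """
    return bottom + (top - bottom) / (1.0 + (ic50 / conc) ** slope)


# Hypothetical plate: vehicle wells read ~10000, killing-control wells ~1000.
inhibition = percent_inhibition(5500, [10000, 10000], [1000, 1000])
midpoint = four_pl(100.0, bottom=0.0, top=100.0, ic50=100.0, slope=1.5)
```

In practice the four parameters would be estimated from the normalized data points by nonlinear least squares; the sketch only evaluates the model for given parameters.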
The curve fitting quality can be visually evaluated using the dose-response curve fitting plots and by analyzing the fitting errors. To ensure improved accuracy, we also employed a machine learning-based model (see the Implementation section) that automatically detects curve fitting errors and flags them in the summary curve fit table. Finally, based on the curve fitting parameters, various drug quantification metrics, such as IC50, EC50, DSS and AUC, are calculated and reported in the summary table (Figure 1B, right panel). In addition, Breeze 2.0 allows calculation of relative metrics, such as selective DSS (sDSS), selective AUC (sAUC) and selectivity index (SI), that enable comparison between samples and controls, and help in the joint assessment of drug efficacy and toxicity. For example, in antiviral drug screening, SI is calculated by dividing a drug's cytotoxicity (its ability to kill cells) by its antiviral activity (its ability to inhibit viral replication), resulting in a ratio that reflects the drug's selectivity for viral targets over host cells. These metrics facilitate the identification of clinically relevant dose ranges and the most potent, effective and selective compounds for specific targets, patients or diseases.

Figure 1. [...] relative metrics, such as sDSS and SI, that allow for comparison between samples and controls. The barplot illustrates an example where the DSS score was used as the quantification metric (12). Additionally, Breeze offers the possibility to cross-compare user-provided data with previously reported drug responses incorporated into the Breeze database, serving as reference controls for comparison (right panel). (C) Next, an interactive heatmap is generated to compare drug responses across different samples (e.g. cell lines or experimental conditions), with sDSS scores shown as an example to highlight the selective efficacy of the drugs, while other metrics can also be used in the heatmap. (D) As an alternative to the heatmap, statistically significant differences in drug responses between two groups of samples can be identified using a volcano plot.
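The relative metrics can be made concrete with a short Python sketch. The text states only that SI divides cytotoxicity by antiviral activity and that sDSS compares samples with controls. The sketch therefore makes two assumptions: the two quantities are expressed as half-maximal concentrations (here called CC50 and EC50), and sDSS is taken as the sample DSS minus the mean control DSS. The function names are hypothetical.

```python
from statistics import mean


def selectivity_index(cc50, ec50):
    """Ratio of cytotoxic to antiviral half-maximal concentration.

    Assumption: cytotoxicity is summarized as CC50 and antiviral activity
    as EC50, both in the same concentration unit. A larger ratio means the
    drug inhibits the virus at doses well below those that harm host cells.
    """
    if cc50 <= 0 or ec50 <= 0:
        raise ValueError("concentrations must be positive")
    return cc50 / ec50


def selective_dss(dss_sample, dss_controls):
    """Sample DSS minus the mean DSS of the reference controls.

    Assumption: a simple difference against the control average; positive
    values indicate a response stronger than in the controls.
    """
    return dss_sample - mean(dss_controls)


# Hypothetical values: toxic at 50 uM, antiviral at 0.5 uM.
si = selectivity_index(cc50=50.0, ec50=0.5)
sdss = selective_dss(25.0, [5.0, 7.0])
```

With these illustrative inputs, the drug is 100 times more potent against the virus than against host cells, and its DSS exceeds the control average by 19 units.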

Breeze 2.0 database
Breeze 2.0 introduces a curated database that allows researchers to integrate and compare their own datasets with publicly available data, which is essential for establishing solid comparison baselines, validating results, and exploring the wider implications of their findings. The drug responses from the database can serve as a reference control for comparison of user-provided drug responses in the same condition/disease, control groups, or cell lines, helping to identify disease-specific drug responses and uncover potent targeted therapies. The Breeze 2.0 database includes the dataset from Malani et al. (15), which comprises drug sensitivity data from 186 AML patient samples and 17 healthy controls. In addition, we incorporated PharmacoDB, the most comprehensive database that consolidates pharmacogenomics cell line data from multiple sources such as CCLE, GDSC, NCI-60, CTRP and others (14). In the future, we aim to expand the Breeze database by incorporating additional curated and published datasets, to improve coverage of drug response patterns across a wider range of tissues and drug classes.

Visualizations
The results of the Breeze 2.0 pipeline are visualized in the form of multiple interactive plots such as heatmaps (Figure 1C), volcano plots (Figure 1D), barplots, scatterplots, and circular trees, allowing easy investigation of the results. The details on how to obtain and interpret each visualization plot are explained in the Breeze technical documentation: https://breeze.fimm.fi/DSRT documentation/docs.html . Within the Breeze interface, users can access the 'Curve Fitting' tab and select one or more dose-response curves from a dropdown menu. The software also allows users to incorporate reference data from the database, including information on healthy controls, enabling integrative analysis of drug response data (see e.g. Figure 1B). The resulting plots can be exported as PDF, PNG and HTML files, while a summary table displays drug quantification metrics for selected drugs and screens, which can be downloaded as a spreadsheet.

Implementation
The Breeze 2.0 web-server is powered by PHP, with MySQL for database support. The data processing pipeline utilizes the R programming language and a variety of R packages, while the interactive visualizations are created using ggplot2, Plotly and the JavaScript library D3.js. To ensure the accuracy of curve fitting, an AdaBoost machine learning classifier has been trained using Breeze's extensive in-house dataset of over 10 000 expert-curated and classified 5-point dose-response curves to flag poor-quality curve fits in the user data. The final model employs conformal prediction with a 0.8 confidence threshold, which allows users to exclude low-confidence model predictions, thereby flagging only the most confident low-quality curve fits. Currently, the model can flag dose-response curves measured at four or five drug doses.
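To illustrate how a conformal confidence threshold can restrict flagging to confident predictions, the following Python sketch applies inductive, class-conditional conformal prediction to the output of a binary classifier. It is a generic illustration, not the Breeze model. The nonconformity score (one minus the predicted class probability), the label names, the calibration scores and all function names are assumptions.

```python
def conformal_p_value(cal_scores, test_score):
    """Fraction of calibration scores at least as nonconforming as the test score."""
    at_least = sum(1 for s in cal_scores if s >= test_score)
    return (at_least + 1) / (len(cal_scores) + 1)


def prediction_set(prob_poor, cal_scores_by_label, confidence=0.8):
    """Labels that cannot be rejected at the given confidence level.

    prob_poor: classifier probability that the curve fit is poor.
    cal_scores_by_label: nonconformity scores (1 - probability of the true
    label) from a held-out calibration set, grouped by true label.
    """
    significance = 1.0 - confidence
    probs = {"poor": prob_poor, "good": 1.0 - prob_poor}
    labels = set()
    for label, prob in probs.items():
        score = 1.0 - prob  # nonconformity if this label were the true one
        if conformal_p_value(cal_scores_by_label[label], score) > significance:
            labels.add(label)
    return labels


def flag_poor_fit(prob_poor, cal_scores_by_label, confidence=0.8):
    """Flag only when 'poor' is the single label left in the prediction set."""
    return prediction_set(prob_poor, cal_scores_by_label, confidence) == {"poor"}


# Hypothetical calibration scores, identical for both classes for brevity.
scores = [0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
calibration = {"poor": scores, "good": scores}
confident_flag = flag_poor_fit(0.95, calibration)  # only "poor" survives
uncertain_flag = flag_poor_fit(0.50, calibration)  # both labels survive
```

Under this scheme a curve is flagged only when the alternative label "good" can be rejected at the chosen confidence level, so ambiguous predictions are left unflagged rather than reported as poor fits.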

RESULTS AND DISCUSSION
Breeze 2.0 represents a major upgrade to the existing drug screening data analysis pipeline presented in the first version of the tool. Designed to be accessible to researchers with no programming skills, Breeze 2.0 requires only the raw screening data as an input (either raw responses or percentage inhibition/viability), and it automatically analyzes, quantifies and visualizes the drug response data, thereby significantly reducing the manual time required for the analysis of large-scale drug screening experiments.
With the novel data exploration capabilities, users can cross-compare their data against published drug response datasets. This enables the identification of sample-specific, selective drug responses and, through comparison against responses observed in healthy controls, helps avoid the prioritization of false positive hits such as generally toxic drugs. Additionally, Breeze 2.0 enables users to better understand the dose-response relationship in the context of other drugs targeting the same pathways.
The Breeze 2.0 pipeline's flexible input format and data upload functionality make it suitable for a wide range of readouts, including numeric data obtained from microscopic images, RNAi experiments, and other sources. This makes Breeze 2.0 a versatile tool that can be used across a variety of different research applications, extending its utility beyond traditional drug development workflows. To the best of our knowledge, there are no similar and equally comprehensive web-platforms for drug response data analysis.
Moreover, Breeze 2.0 features a completely redesigned user interface that is more intuitive and user-friendly. It also includes novel interactive visualization options, which were requested by the users and are essential for avoiding false positive/negative findings. Moving forward, we plan to further expand Breeze by integrating existing and emerging large-scale drug screening resources, with an emphasis on healthy controls from diverse tissues.
Extensive documentation of all the features is available at https://breeze.fimm.fi/DSRT documentation/docs.html .
The Breeze 2.0 database includes published datasets, with links to the original sources provided below. The PharmacoDB datasets were integrated using the PharmacoGx 2.6.0 R/Bioconductor package (https://bioconductor.org/packages/release/bioc/html/PharmacoGx.html), and were downloaded using the downloadPSet function. The Malani et al. dataset can be accessed at https://zenodo.org/record/7274740 .