Blinded, Multicenter Evaluation of Drug-induced Changes in Contractility Using Human-induced Pluripotent Stem Cell-derived Cardiomyocytes

Abstract
Animal models are 78% accurate in determining whether drugs will alter contractility of the human heart. To evaluate the suitability of human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) for predictive safety pharmacology, we quantified changes in contractility, voltage, and/or Ca2+ handling in 2D monolayers or 3D engineered heart tissues (EHTs). Protocols were unified via a drug training set, allowing subsequent blinded multicenter evaluation of drugs with known positive, negative, or neutral inotropic effects. Accuracy ranged from 44% to 85% across the platform-cell configurations, indicating the need to refine test conditions. This was achieved by adopting approaches to improve signal-to-noise ratio, reduce spontaneous beat rate to ≤1 Hz, or enable chronic testing, which improved accuracy to 85% for monolayers and 93% for EHTs. Contraction amplitude was a good predictor of negative inotropes across all the platform-cell configurations and of positive inotropes in the 3D EHTs. Although contraction and relaxation times provided confirmatory readouts for positive inotropes in 3D EHTs, these parameters typically served as the primary source of predictivity in 2D. The reliance on these “secondary” parameters to detect inotropy in the 2D systems was not intuitive and may be a quirk of hiPSC-CMs; it therefore requires adapting how data from this model system are interpreted. Of the platform-cell configurations, responses in EHTs aligned most closely with the free therapeutic plasma concentration. This study supports the notion that hiPSC-CMs could add value to drug safety evaluation.

On average, 37 new drugs are launched to market each year, but the cost of development has increased from approximately $14M per drug in the 1960s to approximately $1.5Bn now, adjusted for inflation (Catapult, 2018; IFPMA, 2017). Attrition rates remain high, with only approximately 2% of the drugs entering phase 1 clinical trials actually progressing to use in patients. A key concern is cardiovascular toxicity: acute, chronic, and comorbidity effects account for 17% of the 462 drugs withdrawn from market (Onakpoya et al., 2016) and for 41% of the top 200 prescribed drugs being labeled with adverse drug reaction or black box warnings (Fuentes et al., 2018).
Altered cardiac electrophysiology was implicated in the withdrawal of 13 drugs from market between 1990 and 2006 (Shah, 2006). Such events led to the International Conference on Harmonization (ICH) S7B guidelines for proarrhythmic risk detection, which use simplified in vitro assays to measure blockade of the rapidly activating delayed rectifier potassium current (IKr), commonly referred to by the name of the gene encoding its channel, hERG (Food and Drug Administration, 2005b). Combined with the ICH E14 guidelines on electrocardiogram monitoring (Food and Drug Administration, 2005a), these measures have reduced drug withdrawals due to electrical dysfunction, with no reported incidents since 2007. These approaches have improved safety, but the relatively poor specificity of the assays for predicting human outcomes, and their overconservatism, have raised concern that promising drug candidates may be abandoned too early due to false positives (Gintant, 2011).
Greater predictivity during the early stages of the drug development pipeline will require certain attributes from the chosen assays. These include being of human origin, having suitable longevity for acute and chronic studies, being compatible with medium-throughput analysis, reflecting working cardiomyocyte physiology and function, and complying with 3Rs policies. These requirements reduce the attractiveness or relevance of many existing systems, such as animal models. The same is true for human- and animal-derived primary cardiomyocytes, which rapidly dedifferentiate in culture, lose viability, or become overrun with fibroblasts.
Alternative technologies are now showing potential in cardiovascular safety evaluation, with human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) as a key modality (Gintant et al., 2017). This is evidenced by their use in disease modeling, drug discovery, and cardiac safety studies, which culminated in the Comprehensive in vitro Proarrhythmia Assay (CiPA) and Japanese iPS cardiac safety assessment initiatives (Fermini et al., 2016; Kanda et al., 2018; Sager et al., 2014). Taking CiPA as an example, the proposed approach identifies proarrhythmic risk using several key modalities: (1) assessment of several major ion channels in transfected cell lines; (2) in silico modeling of the ion channel effects; (3) proarrhythmic assessment in hiPSC-CMs; and (4) clinical assessment of electrocardiograms from phase I human studies. Data emerging from CiPA show that optical- and multielectrode array-based platforms using commercially sourced hiPSC-CMs achieve 87% accuracy in predicting proarrhythmic liability across geographically diverse testing laboratories (Blinova et al., 2018), with similar results reported by others (Bot et al., 2018).
In contrast, there have been relatively few studies that have used hiPSC-CM contractility in predictive safety assessment (Pointon et al., 2017). This is surprising because cardiovascular liability of drugs occurs commonly via altered function of the contractile myocardium. Also, monitoring contractility in hiPSC-CMs is currently being done at low throughput or by using surrogate markers (eg, impedance) and, so far, there has been no detailed cross-site validation study to accurately assess the impact of drugs on hiPSC-CM contractility.
Within this context, a public-private partnership, the "InPulse CRACK-IT Challenge," was established between the pharmaceutical company GlaxoSmithKline (GSK) and a U.K. funding agency, the National Centre for the Replacement, Refinement & Reduction of Animals in Research (NC3Rs). The aim was to develop a medium-throughput technology platform that could measure contractility in hiPSC-CMs as a physiologically relevant functional output for use in preclinical drug safety evaluation. Parallel or simultaneous measurements of Ca2+ handling and/or voltage, potentially with physiological loading, were requested as optional parameters of the challenge to multiplex mechanistically relevant endpoints and thereby inform integrated decision making.
To this end, we established a multinational consortium comprising 4 academic teams in Germany, Holland, and the United Kingdom, 2 biotech companies (Clyde Biosciences; Ncardia), GSK, and NC3Rs. We describe the process by which 36 drugs were selected, distributed, and formulated, allowing a training set of 8 drugs to be used to unify standard operating procedures (SOPs). This enabled a multicenter study to be undertaken for blinded evaluation of up to 28 drugs with known positive, negative, or neutral inotropic effects, and led to the creation of an interactive web tool giving access to datasets for contractility, voltage, and/or Ca2+ handling (https://bjvanmeer.shinyapps.io/crackit/, last accessed May 11, 2020). Overall, the different platform-cell combinations had an accuracy of 44%-85% in correctly predicting inotropy. With simple refinements of the test conditions, namely approaches to improve signal-to-noise ratio, reduce data variability, reduce spontaneous beat rate to ≤1 Hz, and/or enable chronic testing, accuracy could be increased to 85%-93%, comparing favorably with that currently possible with in vivo animal models.

MATERIALS AND METHODS
CellOPTIQ: Adult rabbit ventricular cardiomyocytes. Rabbit hearts were excised via a thoracotomy and submerged in a modified Krebs-Henseleit (KH) solution of the following composition (mmol/l): NaCl (130), KCl (4.5), HEPES (5), NaH2PO4 (0.4), MgCl2 (3.5), and glucose (10), adjusted to pH 7.25 at 37 °C with NaOH. Hearts were removed and perfused retrogradely at 25 ml·min−1 (37 °C) with modified KH solution containing 0.75 mmol/l [Ca2+] for 5 min, followed by nominally Ca2+-free KH solution with 0.1 mmol/l EGTA for 5 min. Hearts were then perfused with KH solution containing 0.24 mmol/l [Ca2+], 1 mg/ml collagenase (type I), and 0.06 mg/ml protease (type XIV). After approximately 4-5 min, the enzyme was removed, and the left ventricular free wall was cut into strips in the recirculated enzyme solution containing 1% bovine serum albumin before being mixed to yield a single-cell suspension. Cells were maintained either in Ca2+-free KH solution or in 1 mmol/l Ca2+ (reached via stepwise increments) until use. Intact cardiomyocytes in modified KH solution containing 1.8 mmol/l Ca2+ were loaded with FluoVolt (Thermo Fisher Scientific) at 1:3000 dilution for 10 min. The incubation medium was then removed and the cells were resuspended in modified KH solution.
Cardiomyocytes were allowed to settle on a coverslip in a bath (35 mm Petri dish) at 37 °C. Cells were field stimulated at 2 Hz with 2 ms voltage pulses delivered to parallel graphite electrodes (stimulation voltage set to 1.5 times threshold) for 2 min before FluoVolt fluorescence and cell video were sampled for 10 s. After this was repeated for 10-12 cells/dish, the drug was added to the quiescent cells and left for 30 min before the same myocytes were revisited and the stimulation and data capture protocol was repeated. A parallel set of vehicle (DMSO) time-control experiments was also performed. FluoVolt fluorescence (490 nm excitation) was sampled at 10 kHz, whereas images were recorded on a CCD camera at 100 frames/s using > 700 nm light. Sarcomere length and fractional shortening of sarcomere length were subsequently extracted from the images using an FFT-based algorithm.
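The FFT-based extraction of sarcomere length can be sketched as follows, assuming a 1D intensity profile sampled along the myocyte's long axis with a known pixel size: the dominant spatial frequency of the striation pattern is located within a plausible band, and its period is taken as the sarcomere length. The helper names, the Hann window, and the 1.2-2.6 µm search band are assumptions for illustration, not details of the CellOPTIQ algorithm.

```python
import numpy as np


def sarcomere_length_um(profile, pixel_size_um, min_sl_um=1.2, max_sl_um=2.6):
    """Estimate sarcomere length (um) from a 1D striation intensity profile.

    Hypothetical helper: returns the period of the dominant spatial frequency
    inside a plausible sarcomere-length band. Precision is limited to one FFT
    bin, i.e. 1 / (n_pixels * pixel_size_um) cycles per um.
    """
    x = np.asarray(profile, dtype=float)
    x = x - x.mean()                    # remove DC so it cannot dominate
    x = x * np.hanning(x.size)          # taper edges to reduce spectral leakage
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=pixel_size_um)   # cycles per um
    band = (freqs >= 1.0 / max_sl_um) & (freqs <= 1.0 / min_sl_um)
    peak = int(np.argmax(np.where(band, power, 0.0)))
    return 1.0 / freqs[peak]


def fractional_shortening(sl_series_um):
    """(diastolic SL - systolic SL) / diastolic SL, using max/min over one beat."""
    sl = np.asarray(sl_series_um, dtype=float)
    return (sl.max() - sl.min()) / sl.max()
```

Applied frame by frame, the first function yields a sarcomere-length time course from the 100 frames/s video, and the second reduces one beat of that course to fractional shortening.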
TTM system. The TTM system was developed and used as described previously (van Meer et al., 2019). Briefly, the all-optical fluorescence system consists of a microscope capable of recording sequential frames while switching illumination wavelengths within 1 ms. With 3 LEDs (470, 560, and 656 nm) alternating frame by frame, this gives an effective recording speed of 333 Hz per parameter. Baseline measurements (7 s recordings) were made by choosing 3 areas per well and saving their positions so that the same areas could be measured after drug incubation. Data were analyzed offline and automatically with custom software to reduce user bias.
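The 333 Hz figure follows from dividing the 1000 frames/s camera rate by the 3 interlaced LED channels. A minimal sketch of the demultiplexing step, assuming frames strictly cycle through the LEDs in a fixed order beginning at frame 0 (the function name `deinterlace` and this ordering are illustrative assumptions, not the TTM software):

```python
import numpy as np


def deinterlace(frames, n_channels=3, frame_interval_ms=1.0):
    """Split an interlaced image stack (frames along axis 0) into channels.

    Assumes frames cycle through the LEDs in a fixed order starting at
    frame 0; a trailing incomplete cycle is discarded so that all channels
    stay time-aligned. Returns (list of per-channel stacks, rate in Hz).
    """
    frames = np.asarray(frames)
    n_full = (frames.shape[0] // n_channels) * n_channels
    frames = frames[:n_full]
    channels = [frames[c::n_channels] for c in range(n_channels)]
    rate_hz = 1000.0 / (frame_interval_ms * n_channels)
    return channels, rate_hz
```

For a 7 s baseline recording (7000 frames), each channel receives 2333 frames sampled at about 333 Hz.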
Black glass-bottom 96-well plates (Greiner) were coated with Matrigel (Sigma-Aldrich) diluted 1:100 in DMEM/F12 (Sigma-Aldrich). Pluricyte hiPSC-CMs were thawed and plated (40 000 cells/well in 100 µl) in Pluricyte Cardiomyocyte Medium (PCM) according to the manufacturer's instructions. To aid recovery of the cells, RevitaCell (Sigma-Aldrich) was added to each well at 1:100. Empty wells around the plated CMs were filled with 200 µl of PBS to minimize evaporation. Cells were refreshed with 200 µl PCM on day 1 and on day 4 or 5 after plating. All measurements for a given drug were performed on days 5 and 6 or days 6 and 7 after plating.
CellOPTIQ: hiPSC-CMs. The hiPSC-CMs used were R-PAT (derived and differentiated in house as described previously; Smith et al., 2018), iCell² and Pluricyte (purchased from Cellular Dynamics International and Ncardia, respectively). Manufacturer instructions were followed for the commercial hiPSC-CMs. iCell² cells were seeded at 25 000 cells/well in 96-well plates (Nunclon Delta Surface; Thermo, 167008) and maintained for 10 days before use in drug evaluation studies. R-PAT hiPSC-CMs were seeded in the same plates at 40 000 cells/well and maintained until days 20 and 21 of differentiation before use. Pluricyte hiPSC-CMs were seeded at 35 000 cells/well in 96-well plates (Greiner Bio-One; 655087) and maintained for 8 days before use. Confluent monolayers of R-PAT and iCell² hiPSC-CMs were changed into serum-free medium (SFM; Dulbecco's Modified Eagle Medium [Gibco, 21969035] + 10 mM galactose [Sigma-Aldrich, G0750] + 1 mM sodium pyruvate [Sigma-Aldrich, P2256]) 24 h before testing. Pluricyte hiPSC-CMs were changed into 50% SFM/50% Pluricyte Cardiomyocyte Medium (serum-free) 24 h before the day of testing and into SFM on the day of testing. For the refined conditions described for R-PAT hiPSC-CMs, plating and maintenance were as before, but cells received fresh RPMI-B27 medium instead of SFM 24 h before testing.
hiPSC-CMs in 96-well plates were transiently loaded with 50 µl/well of SFM containing FluoVolt (1:200 part B, 1:2000 part A; Life Tech, F10488) for 20 min at 37 °C and 5% CO2. After incubation, the medium was replaced with 200 µl/well of SFM. Plates were then incubated at 37 °C and 5% CO2 for 15 min before recordings were made using a ×40 (NA 0.6) objective at 10 kHz. For electric field stimulation, a custom-made 8-channel electrode StimStrip (Clyde Biosciences Ltd) was placed in a row of the 96-well plate and connected externally to a DC power supply (Lavota). hiPSC-CMs were paced at 1.2-1.7 Hz (or 0.7-1 Hz in the refined conditions) with an amplitude of 8 V and a pulse width of 20 ms. Recordings of 10 s were made for each well; contractility was recorded at 100 frames/s, giving 1000 frames.
Baseline recordings and drug addition were as described for the TTM system. Electrophysiology data were analyzed using CellOPTIQ proprietary software (Clyde Biosciences) and normalized to a maximum amplitude of 1 and a minimum of 0 so that traces, created in OriginPro (OriginLab, version 7.5), could be compared at a standard height. Contractility data were analyzed from pixel displacement using an ImageJ plug-in based on a sum of absolute differences algorithm.
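A sum of absolute differences (SAD) motion trace can be sketched as below, assuming grayscale frames and a single reference frame recorded at rest; some implementations instead compare consecutive frames. The trace is min-max scaled to 0-1, mirroring the normalization described for the voltage data. `sad_motion_trace` is a hypothetical helper, not the ImageJ plug-in's code.

```python
import numpy as np


def sad_motion_trace(frames, ref_index=0):
    """Sum of absolute differences of each frame against a reference frame.

    frames: array of shape (n_frames, height, width).
    Returns a 1D trace min-max scaled to 0..1 (all zeros if nothing moves);
    peaks in the trace correspond to maximal displacement during contraction.
    """
    stack = np.asarray(frames, dtype=float)
    ref = stack[ref_index]
    sad = np.abs(stack - ref).sum(axis=(1, 2))
    span = sad.max() - sad.min()
    if span == 0:
        return np.zeros_like(sad)
    return (sad - sad.min()) / span
```

Because SAD is unsigned, the trace reports how much the image changed, not the direction of motion; contraction and relaxation phases are distinguished by their timing relative to the peak.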
Automated video-optical contractility analysis was performed in 24-well cell culture plates (Nunc, 122475). Video files of EHT contractions were analyzed by automated EHT figure-recognition software (EHT Technologies, A0001), which identified the top and bottom ends of the EHT contour and followed them during the recording. Force was calculated from the shortening during contraction, the elastic propensity, and the geometry of the PDMS posts.
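Post deflection can be converted to force with beam theory. The sketch below assumes each PDMS post is a linear-elastic cylindrical cantilever loaded at its free tip, so F = 3EIδ/L³ with I = πR⁴/4. It further assumes, hypothetically, that two identical posts bend symmetrically so that each tip moves by half of the measured tissue shortening. The commercial software's calibration may differ, and the numbers in the test are illustrative only.

```python
import math


def post_force_newtons(deflection_m, youngs_modulus_pa, radius_m, length_m):
    """Force needed to deflect the free tip of a cylindrical cantilever.

    Assumes small deflections, a linear-elastic post and a point load at
    the tip: F = 3 * E * I * delta / L**3, with I = pi * R**4 / 4.
    """
    second_moment = math.pi * radius_m ** 4 / 4.0
    return 3.0 * youngs_modulus_pa * second_moment * deflection_m / length_m ** 3


def force_from_shortening(shortening_m, **post):
    # Hypothetical convention: two identical posts bend symmetrically,
    # so each tip moves by half of the measured tissue shortening.
    return post_force_newtons(shortening_m / 2.0, **post)
```

If the tissue attaches below the post tip, the load height must also enter the calculation, because the tip-load formula no longer applies.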
Ca2+ transient analysis was combined with contraction analysis in 24-well cell imaging plates (Eppendorf, 030741005). Briefly, the setup consisted of an inverted fluorescence microscope (Zeiss) with a custom-made, automated, light-tight, CO2- and temperature-controlled XY stage with a 24-well plate holder. A camera attached to the front port of the microscope was used for contraction analysis (×1.25 objective), and a camera attached to the side port was used to adjust the position of the tissue for fluorescence/Ca2+ transient measurement (×10 objective). A small predefined area (0.1 × 0.4 mm) in the center of each EHT was used to record Ca2+ transients. A mercury lamp, GFP filter set, and photomultiplier tube were used to record changes in fluorescence in EHTs expressing GCaMP6f (fluorescence increases during the contraction phase). XYZ coordinates for contraction and Ca2+ transient recordings were predefined and saved in bespoke software, which controlled the microscope settings and XYZ positioning and recorded contractions and Ca2+ transients automatically and sequentially. As for the contractility analysis, cumulative concentration-response curves were performed, here for selected concentrations based on the results of the contractility analysis.
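Amplitude, time to peak, and decay time can be extracted from a single GCaMP6f beat along the lines below. The study's exact definitions are given in its Supplementary Figure 2 and are not reproduced here. This hypothetical helper assumes instead that F0 is the minimum fluorescence before the peak, amplitude is ΔF/F0, time to peak runs from the upstroke crossing of 10% of ΔF to the peak, and decay time runs from the peak back to that 10% level (ie, 90% decay).

```python
import numpy as np


def transient_parameters(trace, fs_hz, level=0.1):
    """Amplitude, time to peak and decay time of a single transient.

    Hypothetical definitions (see lead-in): F0 = minimum before the peak,
    amplitude = (Fpeak - F0) / F0, timings measured at F0 + level * dF.
    Assumes F0 > 0, as for raw fluorescence.
    """
    f = np.asarray(trace, dtype=float)
    i_peak = int(np.argmax(f))
    f0 = f[: i_peak + 1].min()
    delta = f[i_peak] - f0
    threshold = f0 + level * delta
    # upstroke crossing: first sample after the last sub-threshold sample
    below_before = np.flatnonzero(f[: i_peak + 1] < threshold)
    i_on = below_before[-1] + 1 if below_before.size else 0
    # decay end: first sample after the peak that is back below threshold
    after = np.flatnonzero(f[i_peak:] < threshold)
    i_off = i_peak + after[0] if after.size else f.size - 1
    return {
        "amplitude_dF_F0": delta / f0,
        "time_to_peak_s": (i_peak - i_on) / fs_hz,
        "decay_time_s": (i_off - i_peak) / fs_hz,
    }
```

The same helper, applied to a force trace, gives analogous contraction amplitude and timing values, although the CT and RT definitions used in the study may differ.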
Data were normalized to a pool of time-matched controls (contractility analysis, ERC18: n = 22 EHTs/7 experiments; R-PAT: 16 EHTs/5 experiments; combined contractility and Ca2+ transient analysis: 21 EHTs/8 experiments) used for all contractility and combined contractility/Ca2+ transient measurements. Data are plotted as relative change from the mean baseline, normalized to the pooled time controls. Statistical significance was evaluated using GraphPad software by 1-way ANOVA with Dunnett's multiple comparisons test versus baseline.

Rationale for Selection, Handling, and Distribution of Training and Blinded Drug Sets
Driven by the interests of the pharmaceutical industry, and specifically GSK, the 36 drugs in this study (Supplementary Table 1) were selected on the basis of: (1) commercial availability; (2) an approximate balance in numbers between positive inotropes (PIs), negative inotropes (NIs), and no-effect drugs (NE) with respect to contractility; (3) inclusion of false positives or negatives; (4) availability of functional data on inotropic effects on cardiomyocyte and/or heart function from 1 or more species, including human, spanning in vitro, ex vivo, preclinical, and clinical use; (5) data on free therapeutic plasma concentrations (FTPCs); (6) solubility in DMSO, allowing unified testing at a maximum DMSO concentration of 0.1% v/v, which does not cause toxicity in hiPSC-CMs; and (7) a broad range of modes of action relevant to contractility and toxicity, including modulation of ion channels and pumps in the cell membrane (INaV1.5, IKr, IKATP, ICaL, If, NCX, Na+/K+-ATPase) or sarcoplasmic reticulum (SR) (RYR, SERCA), β1- and β2-adrenoceptors, Ca2+ sensitivity, signaling cascades (phosphodiesterase [PDE], adenylyl cyclase, cyclic AMP), energy production (ATP, mitochondrial stress), myofilament response and myofilaments, as well as less well-known mechanisms (eg, inhibition of tyrosine kinases).
All drugs were purchased, formulated, and distributed by GSK under a contractual material transfer agreement to ensure that the testing laboratories used the same lot numbers. Eight drugs were not blinded and were used as a "training set" to establish the working parameters and protocols for the technology platforms (isoprenaline, nifedipine, digoxin, BayK8644, EMD-57033, ryanodine, thapsigargin, caffeine). The remaining 28 "test set" drugs were encoded and blinded by GSK's Sample Management Technology department, which handled and distributed them as preweighed lots in coded brown glass vials. Samples were formulated according to prescribed, blinded instructions, using single-shot vials of DMSO where possible to minimize absorption of water from the atmosphere and hence avoid inadvertent drug dilution. Powders and formulated drugs were stored at −80 °C; solutions underwent no more than 2 freeze-thaw cycles and were used within 2 weeks.

Refining and Establishing Working Parameters With the Drug "Training Set"
The training set of 8 drugs was used to unify the methods for testing, analysis, and presentation. This helped to establish SOPs (Supplementary Table 2), which were used for subsequent blinded evaluation of the "test set" drugs. Three technology platforms were used, differing in configuration, well format, and approach to calculating contractility: the Triple Transient Measurement (TTM), CellOPTIQ (CO), and engineered heart tissue (EHT) platforms.
The TTM platform is bespoke and was the only system in this study that could measure contractility, electrophysiology, and Ca2+ handling simultaneously. Interlaced 1000 frames/s movies (ie, 1 ms per channel, sequentially) were recorded from hiPSC-CMs cultured as 2D monolayers in 96-well plates and loaded with the appropriate dyes (Sala et al., 2018; van Meer et al., 2019). CO is a proprietary system that was used to measure contractility via bright-field images and electrophysiology via voltage-sensitive dyes (Duncan et al., 2017). Here, it was used on hiPSC-CMs cultured as 2D monolayers in 96-well plates and on isolated adult rabbit cardiomyocytes. Finally, the EHT system is commercially available and uses 3D constructs fabricated from hiPSC-CMs in a fibrin hydrogel between 2 polydimethylsiloxane (PDMS) posts, multiplexed in 24-well plates. Measuring Ca2+ handling required viral transgenesis of a genetically encoded calcium indicator (GCaMP6f) during EHT fabrication. Whereas pixel displacement was used as a surrogate of contractility in the 2D systems (TTM and CO), 3D EHTs enable force of contraction to be calculated from the extent of deflection of the PDMS posts with each beat.
Action potential duration (APD30, APD90) and triangulation (APD90 − APD30) were calculated from voltage waveforms to determine whether electrophysiology was altered, including the appearance of arrhythmias. For Ca2+ handling, amplitude, time to peak, and decay time were assessed, and analogous parameters (contraction amplitude [CA], contraction time [CT], and relaxation time [RT]) were derived for contraction (Supplementary Figure 2). Contractility responses were further subdivided (Supplementary Figure 2) such that positive inotropy reflected an increase in CA, positive clinotropy a decrease in CT, and positive lusitropy a decrease in RT; negative responses were defined by the opposite changes.
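A sketch of the APD and triangulation calculation, assuming a single action potential already normalized to 0 (rest) and 1 (peak), as in the 0-1 normalization described for the CellOPTIQ traces, and taking activation time as the sample of steepest upstroke. The helper `apd_metrics` is hypothetical; analysis software may define activation differently and interpolate between samples.

```python
import numpy as np


def apd_metrics(voltage, fs_hz):
    """APD30, APD90 and triangulation (APD90 - APD30) in milliseconds.

    Assumes one action potential normalized so that rest = 0 and peak = 1.
    Activation is the sample where the largest upstroke step (dV/dt) begins.
    Resolution is one sample; no interpolation is applied.
    """
    v = np.asarray(voltage, dtype=float)
    i_peak = int(np.argmax(v))
    i_act = int(np.argmax(np.diff(v[: i_peak + 1]))) if i_peak > 0 else 0

    def apd(percent):
        level = 1.0 - percent / 100.0       # e.g. APD90 -> 10% of amplitude
        idx = np.flatnonzero(v[i_peak:] <= level)
        if idx.size == 0:
            raise ValueError(f"trace never repolarizes by {percent}%")
        return (i_peak + idx[0] - i_act) * 1000.0 / fs_hz

    apd30, apd90 = apd(30), apd(90)
    return {"APD30_ms": apd30, "APD90_ms": apd90,
            "triangulation_ms": apd90 - apd30}
```

An increase in triangulation means that late repolarization (between 30% and 90%) has slowed more than early repolarization, a shape change that can accompany proarrhythmic drug effects.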
The training set of 8 drugs was used to test and refine protocols. Representative data are shown for the PIs digoxin and isoprenaline and the NI nifedipine (Supplementary Figure 3). For each parameter, the percentage change in drug-treated samples relative to their respective vehicle controls was calculated as ([drug/drug baseline]/[vehicle/vehicle baseline]) × 100 − 100. Consistent with the modes of action of these drugs, in all cases there was a trend toward, or a significant, positive inotropic response (increased CA) in hiPSC-CMs treated with digoxin or isoprenaline, and a negative inotropic response (decreased CA) in those treated with nifedipine (Supplementary Figure 3). Digoxin tended to increase RT in 2D configurations, whereas isoprenaline decreased both CT and RT in 3D EHTs (Supplementary Figure 3). These studies allowed SOPs to be unified, although physical and technical constraints of the platforms meant that some unavoidable differences remained (Supplementary Table 2).
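The vehicle-corrected percentage change can be written as a one-line function. Dividing by the vehicle ratio cancels drift that affects all wells equally (eg, rundown during incubation), so a drug well that falls by the same fraction as its vehicle well scores 0%. The function name is hypothetical and the numbers are illustrative.

```python
def percent_change_vs_vehicle(drug, drug_baseline, vehicle, vehicle_baseline):
    """([drug/drug baseline] / [vehicle/vehicle baseline]) * 100 - 100."""
    drug_ratio = drug / drug_baseline
    vehicle_ratio = vehicle / vehicle_baseline
    return drug_ratio / vehicle_ratio * 100.0 - 100.0
```

For example, a drug well rising from 1.0 to 1.5 while its vehicle well drifts from 1.0 to 1.1 gives about +36.4%, rather than the +50% suggested by the drug ratio alone.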

Blinded Evaluation and Assignment of Drug "Test Set"
Using the SOPs established above, up to 28 blinded drugs were tested in a rank order predefined by GSK (Table 1). Contraction was the parameter common to all platforms because the primary aim of the study was to predict whether drugs were positive, negative, or neutral inotropes. Differentiation of the in-house hiPSC lines R-PAT and ERC18 provided sufficient hiPSC-CMs to test all available drugs on the CO and/or EHT platforms. Testing was restricted to 10 drugs in rabbit CMs (as a comparator) and in commercial hiPSC-CMs (Pluricyte and iCell²) due to cost, availability, and timelines, whereas throughput was the limiting factor for the TTM system. Overall, 9 drugs were tested in common across all platform-cell combinations; after unblinding, these were identified as PIs: epinephrine, forskolin, levosimendan, and pimobendan; NIs: verapamil and sunitinib; and NE: acetylsalicylic acid, atenolol, and captopril.
All data for contraction, voltage, and Ca2+ handling across the various platform-cell combinations are accessible via an interactive web tool (https://bjvanmeer.shinyapps.io/crackit/). Representative contractility data illustrate the output format for 6 drugs whose responses were predicted either correctly or incorrectly: the PIs epinephrine and levosimendan (Figure 1A), the NIs verapamil and sunitinib (Figure 1B), and the NE drugs acetylsalicylic acid and captopril (Figure 1C). In some instances, Ca2+ data and, to a lesser extent, voltage data were used to assist in predicting inotropic response, as was the case for the PI epinephrine and the NI verapamil (Supplementary Figs. 4A and 4B).
Inotropy was predicted during a 2-day face-to-face meeting of all investigators, with colleagues from GSK overseeing the process as observers rather than contributors. Each platform-cell-drug combination was evaluated individually, and the research team reached a consensus on the outcome using the terminology defined in Supplementary Figure 2 (ie, positive or negative inotropy [CA], clinotropy [CT], and lusitropy [RT]; altered electrophysiology ± arrhythmias [AE ± *]; altered Ca2+). In some cases, decisions were based on trends rather than statistical significance; this was true, for example, for the TTM:Pluricyte evaluation of epinephrine (Figure 1A, see CA and CT). Once predictions were finalized, the document was "locked," and GSK unblinded the drugs to allow comparison with their known effects (Tables 1 and 2).
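The sign conventions behind this terminology (positive inotropy = CA up; positive clinotropy = CT down; positive lusitropy = RT down) can be encoded compactly. This is only a sketch of the labeling rule, assuming significance flags are supplied by the caller; it does not reproduce the consensus process, which also weighed trends and Ca2+ or voltage data. `contractility_labels` is a hypothetical name.

```python
def contractility_labels(changes, significant):
    """Translate percentage changes in CA, CT and RT into the study's terms.

    changes: {"CA": %, "CT": %, "RT": %}; significant: {param: bool}.
    Positive inotropy = CA up; positive clinotropy = CT down;
    positive lusitropy = RT down. Opposite signs are negative responses.
    Parameters without a significant, nonzero change are omitted.
    """
    terms = {"CA": ("inotropy", +1), "CT": ("clinotropy", -1),
             "RT": ("lusitropy", -1)}
    labels = []
    for param, (term, sign) in terms.items():
        if not significant.get(param, False) or changes[param] == 0:
            continue
        direction = "positive" if changes[param] * sign > 0 else "negative"
        labels.append(f"{direction} {term}")
    return labels
```

Keeping the sign table in one place avoids the easy mistake of reading a shorter CT or RT as a "negative" effect, when in this terminology it is a positive one.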
Across the 9 drugs evaluated in common, all platforms predicted NE drugs and NIs correctly in most cases (Table 2). One exception was sunitinib, which was poorly detected by the EHT platform. This may reflect slower drug penetration, lower sensitivity, and a greater cell community effect in 3D tissues, protecting them against short-term (30 min) exposure to this chronic toxicant. Predictivity was poorest for PIs, ranging from 0% to 50% accuracy. Overall predictivity for the 9 drugs ranged from 44% to 78%. This range was similar across all drugs tested (50%-85%), with NE drugs and NIs predicted most accurately (up to 100%; Table 2). The EHT:hiPSC-CM combination tended to be more predictive of PIs (75%) than CO:hiPSC-CM (0%-25%). Together, these data implied that continuous measurements from 3D preparations of hiPSC-CMs yielded more predictive contractility data than cell motion sampled from 2D cultures.
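Because only 9 drugs were evaluated in common, each call shifts accuracy by about 11 percentage points; 44% and 78% correspond to 4/9 and 7/9 correct calls. A minimal sketch of overall and per-class accuracy follows; the predictions in the test are invented for illustration and are not the study's results.

```python
from collections import Counter


def accuracy_by_class(known, predicted):
    """Overall and per-class accuracy (fraction correct) for PI/NI/NE calls."""
    if len(known) != len(predicted):
        raise ValueError("known and predicted must have equal length")
    total, correct = Counter(), Counter()
    for k, p in zip(known, predicted):
        total[k] += 1
        correct[k] += (k == p)          # True counts as 1
    overall = sum(correct.values()) / len(known)
    per_class = {cls: correct[cls] / total[cls] for cls in total}
    return overall, per_class
```

Reporting per-class accuracy alongside the overall figure exposes the pattern described above, where strong NE and NI calls can mask weak PI detection.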
We looked for trends in the data that might point toward ways to improve prediction. The drug concentrations at which maximum responses were recorded were broadly similar across the TTM, CO, and EHT platforms, irrespective of whether the prediction was correct (Table 1). However, the response range was wider in the 2D than in the 3D platforms. Whereas the mean maximum measurable percentage changes in CA for the TTM and CO platforms ranged from −100% to +196%, they ranged from −100% to +53% for the EHT platform (Table 1). Where CA increased, the standard deviation of these mean values was 41.5 for the 2D platforms versus 12.1 for the 3D platform. A similar pattern was observed for NIs and for changes in CT and RT. This was corroborated by the wider distribution of data points in the 2D platforms relative to the 3D platform (Figs. 1A-C, Supplementary Figs. 4A and 4B, web tool). This likely relates in part to the cumulative concentration-response protocol used for EHTs (Supplementary Table 2).
These observations prompted us to examine the sensitivity of the different platforms. Where platform-cell combinations correctly predicted inotropy, data were converted into heat maps of the percentage change for PIs (Figure 2A) and NIs (Figure 2B). For 6 of the 7 correctly predicted cardioactive PIs, 1 or more contractile parameters (CA, CT, and/or RT) recorded from EHTs reached statistical significance at concentrations within, or close to, the FTPC range, whereas the 2D platforms appeared less sensitive (Figure 2A). This is illustrated by epinephrine, for which significant changes (p < .05, Dunnett's post-test) in CT and RT were detected within the FTPC range, a sensitivity 10- to 100-fold greater than that of the 2D platforms; a similar pattern was seen for forskolin (Figure 2A). These trends did not extend to NIs, presumably because the on- or off-target cardiac toxicity of a proportion of compounds in this group occurs after longer incubation times and at higher concentrations than those used for therapeutic benefit, which are reflected in the FTPC.
Finally, we investigated which parameter was most informative (Figure 3). Where prediction was correct, we asked which parameter reached significance at the lowest drug concentration in these in vitro models. Pooled data from both PIs and NIs showed no clear distinction between CA, CT, and RT.
Table 1 footnote: Drugs were ranked by GSK personnel to provide a testing order, based on modes of action of interest to them. For each platform-cell combination, the data listed represent the mean maximum measurable percentage change relative to baseline for CA, CT, and RT. The concentration at which this effect occurred is listed; where it differed for CA, CT, and RT, this is indicated by superscripts. Green indicates that the predicted effect matched the known effect on inotropy. Analysis platforms: CO, CellOPTIQ; EHT, engineered heart tissue; TTM, Triple Transient Measurement. Cardiomyocyte types were adult rabbit cardiomyocytes (as comparator) or hiPSC lines: P-Cyte, Pluricyte (Ncardia); iCell² (Cellular Dynamics International); R-PAT (University of Nottingham); ERC18 (University of Hamburg).
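The question of which parameter reached significance at the lowest concentration can be phrased as a small search over concentration-response results. This sketch assumes per-concentration p-values (eg, from Dunnett's test versus baseline) are already available and uses a hypothetical α of 0.05; ties are returned together, which is how a lack of clear distinction between CA, CT, and RT would show up. `most_sensitive_parameters` is a hypothetical name.

```python
def most_sensitive_parameters(results, alpha=0.05):
    """Parameter(s) reaching significance at the lowest tested concentration.

    results: {parameter: [(concentration, p_value), ...]} for one drug.
    Returns (sorted list of parameters, lowest concentration), or
    ([], None) if no parameter reached significance.
    """
    first_significant = {}
    for param, series in results.items():
        hits = [conc for conc, p in sorted(series) if p < alpha]
        if hits:
            first_significant[param] = hits[0]
    if not first_significant:
        return [], None
    lowest = min(first_significant.values())
    winners = sorted(k for k, c in first_significant.items() if c == lowest)
    return winners, lowest
```

Tallying the winners across all correctly predicted drugs gives the kind of per-parameter comparison summarized in the preceding paragraph.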

Understanding Why Incorrect Assignments Were Made and Improving Test Platforms
The data above highlighted several deficiencies: (1) changes in contractility assessed from intermittent video measurements with a motion algorithm in 2D systems were discriminated less well than with continuous tension measurements in 3D systems; (2) detection of PIs was challenging, particularly for 2D platforms; and (3) chronic toxicants, such as doxorubicin, were poorly predicted after acute (30 min) treatment. We reasoned that medium composition may contribute to poor signal-to-noise ratio and poor detection of PIs, because on the CO platform, switching from protein-containing (RPMI-B27) to serum-/protein-free medium increased the mean beat rate from 0.7 to ≥1.2 Hz (Supplementary Figure 5; p < .0001, Mann-Whitney). This was accompanied by a greater spread of the data for CA, CT, and RT: the standard deviations for RPMI-B27 versus serum-/protein-free medium were 12.1 versus 32.8, 14.5 versus 20.1, and 10.7 versus 26.9, respectively (Supplementary Figure 5). In native heart muscle preparations, a lower beating rate has previously been associated with a positive force-frequency relationship (up to approximately 2 Hz) and stronger inotropic effects of most PIs (Butler et al., 2015). We therefore retested the PIs from the test set using CO:R-PAT, because this platform-cell combination had proved poorly predictive (0/8) in the blinded study. Under these revised testing conditions, 6 of the 8 PIs reduced CT and/or RT (Figure 4).
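The beat-rate comparison used a Mann-Whitney test. As a textbook sketch (not the authors' software), the U statistic with mid-ranks for ties and a two-sided normal-approximation p-value can be computed with the standard library alone; in practice `scipy.stats.mannwhitneyu` provides exact and corrected variants. The beat rates in the test are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Ties receive mid-ranks; no tie or continuity correction is applied to
    the variance, so p-values are approximate for small or heavily tied
    samples. Returns (U for x, z, p).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid = (i + j) / 2.0 + 1.0       # average 1-based rank of the tie block
        for k in range(i, j + 1):
            ranks[k] = mid
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, z, p
```

A rank-based test suits beat-rate data because it does not assume normally distributed rates, which matters when one medium produces a much wider spread than the other.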
We extended these studies to the 3D EHT platform, this time slowing spontaneous beating to < 0.5 Hz with 300 nM ivabradine, a pharmacological blocker of the "funny" current, If (Bois et al., 1996). Notably, these "slowed" EHTs faithfully followed electrical pacing in increments from 0.5 to 2.5 Hz (Supplementary Figure 6). Whereas the force-frequency relationship was negative above 1.5 Hz, it was positive between 0.5 and 1.5 Hz, as evidenced by a 147% increase in force generation concurrent with significant shortening of CT and RT (Supplementary Figure 6). We therefore re-evaluated 6 of the PIs in the presence of 300 nM ivabradine with electrical pacing at 0.5, 0.7, 1.0, 1.5, and 2.0 Hz. These included epinephrine, levosimendan, and pimobendan, which had been incorrectly predicted by the EHT:R-PAT and/or EHT:ERC platform-cell combinations. For all 6 drugs, positive inotropy was evident as increases in CA at 0.5 and 0.7 Hz, and sometimes at 1 Hz, but never at 1.5 or 2 Hz (Figure 5). Consistent with this, CT shortened for all drugs acting via increased cAMP, with the largest changes often occurring at the lower pacing frequencies (Figure 5). Thus, in both 2D and 3D configurations, lower beating frequencies increased predictivity for PIs.
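Summarizing a force-frequency relationship amounts to finding the pacing frequency of maximal force and the sign of each step between frequencies. The helper below is a hypothetical sketch; the forces in the test are invented and are not the study's data, which are shown in its Supplementary Figure 6.

```python
def force_frequency_summary(freqs_hz, forces):
    """Locate the peak of a force-frequency relationship (FFR).

    Returns (frequency of maximal force, percentage increase in force from
    the lowest frequency to that peak, sign of each step between
    consecutive frequencies).
    """
    pairs = sorted(zip(freqs_hz, forces))
    f_sorted = [f for f, _ in pairs]
    force_sorted = [force for _, force in pairs]
    i_max = max(range(len(force_sorted)), key=force_sorted.__getitem__)
    gain_pct = (force_sorted[i_max] / force_sorted[0] - 1.0) * 100.0
    steps = []
    for lo, hi in zip(force_sorted, force_sorted[1:]):
        steps.append("positive" if hi > lo else "negative" if hi < lo else "flat")
    return f_sorted[i_max], gain_pct, steps
```

A positive limb followed by a negative limb, as in the test, is the shape that motivates testing PIs at pacing rates on the positive limb.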
Finally, we considered NIs that had been incorrectly predicted (or not tested), including doxorubicin and sunitinib. Although 30 min exposure of hiPSC-CMs to these anticancer drugs altered electrophysiology, their effects on contractility were predicted with variable accuracy (Table 1, web tool). We therefore asked whether longer-term exposure would affect contractility (Supplementary Figure 7). With the CO:R-PAT platform-cell combination, 24 h exposure to the highest concentration of either drug led to loss of contraction and cell death. Similarly, with the EHT:ERC platform-cell combination, doxorubicin abolished EHT contraction after 17.5 h, whereas 1 µM sunitinib caused a decline in CA over 7 days (Supplementary Figure 7). Thus, although short-term exposure to these NIs perturbed the electrophysiology of hiPSC-CMs, longer-term exposure caused overt cytotoxicity and/or negative inotropy.

DISCUSSION
Through blinded testing across multiple geographical sites, we evaluated the ability of 7 different platform-cell combinations to predict whether drugs of interest to the pharmaceutical industry were positive, negative, or neutral inotropes. We achieved this by examining contractility parameters (CA, CT, and RT) and, in some instances, by using Ca2+ transients and/or electrophysiology to assist decision making. Within the context of these in vitro models, we found that CA was the most informative parameter for NIs. Particularly after refinement involving slowing of beat rate to below 1 Hz (Figure 5, Supplementary Figure 6), CA was highly informative of PIs in the EHT system but far less so in the 2D systems. Although contraction and relaxation times provided confirmatory readouts for PIs in 3D EHTs, these parameters typically served as the primary source of predictivity in 2D (Figure 4), especially where the mode of action involved cAMP signaling.
We propose that an efficient way to predict the inotropic effect of drugs would be first to conduct acute (30 min) testing in hiPSC-CMs. Spontaneous beat rates should be < 1 Hz, which can be achieved by modifying the culture medium, by pharmacological blockade with ivabradine, and/or by selecting hiPSC-CMs with slow intrinsic rates. In EHTs, slowing the beat rate unveiled a clear positive force-frequency relationship. If no change in inotropy is detected in the acute assay, exposure times can be increased to ≥ 24 h to determine whether the drug has a chronic effect. These timelines were suggested by GSK (ie, 30 min considered acute; ≥ 24 h chronic) and proved useful, yielding a predictive accuracy of 85% in 2D monolayers and 93% in 3D EHTs.
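The tiered acute-then-chronic strategy proposed here can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the study's analysis pipeline: `tiered_inotropy_call`, `InotropyCall`, and the `run_assay` callback are hypothetical names, and `run_assay` stands in for the actual contractility measurement and statistical call at a given exposure time.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of the tiered protocol: acute (30 min) testing first,
# chronic (>= 24 h) testing only if the acute assay shows no inotropic effect.
MAX_SPONTANEOUS_RATE_HZ = 1.0  # beat rate should be below 1 Hz before testing
ACUTE_HOURS = 0.5              # 30 min, "acute" as defined in the text
CHRONIC_HOURS = 24.0           # >= 24 h, "chronic" as defined in the text


@dataclass
class InotropyCall:
    call: str             # "PI", "NI" or "NE" (positive, negative, neutral)
    exposure_hours: float


def tiered_inotropy_call(
    spontaneous_rate_hz: float,
    run_assay: Callable[[float], str],
) -> InotropyCall:
    """run_assay(exposure_hours) is a placeholder returning "PI", "NI" or "NE"."""
    if spontaneous_rate_hz >= MAX_SPONTANEOUS_RATE_HZ:
        # Slow beating first, eg via culture medium, ivabradine,
        # or a cell line with a slow intrinsic rate.
        raise ValueError("spontaneous beat rate must be below 1 Hz")
    acute = run_assay(ACUTE_HOURS)
    if acute != "NE":
        return InotropyCall(acute, ACUTE_HOURS)
    # No acute effect: extend exposure to look for a chronic effect.
    return InotropyCall(run_assay(CHRONIC_HOURS), CHRONIC_HOURS)


# Usage with a stand-in assay that only shows negative inotropy after 24 h,
# loosely mimicking the delayed doxorubicin/sunitinib responses.
result = tiered_inotropy_call(0.4, lambda hours: "NI" if hours >= 24 else "NE")
print(result)  # InotropyCall(call='NI', exposure_hours=24.0)
```

Passing the assay as a callback keeps the decision logic separate from how contractility is actually measured, so the same sketch applies to 2D monolayers or 3D EHTs.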
In reaching these refined conditions, we found, somewhat unexpectedly, that the cell preparation protocol, testing conditions, method of measuring contractility, and 2D versus 3D format mattered more than the cell type used. The cell types differed in purity/composition and in baseline electrophysiological characteristics, which partly reflect their maturity state at the single-cell level (Supplementary Table 3). However, there was no obvious correlation between these differences and predictivity. For example, initial testing of the same cell type (Pluricyte hiPSC-CMs) on 2 platforms (TTM and CO) gave different accuracies (78% vs 56%). The same was true for R-PAT hiPSC-CMs on the EHT and CO platforms (67% vs 44%).
Modifying both the culture environment (by including a protein source that slowed the beat rate, improved the signal-to-noise ratio, and reduced data variability) and the drug exposure conditions (by including both acute and chronic testing, for 30 min and ≥ 24 h, respectively) increased the accuracy of prediction for the CO:R-PAT combination to 85%. Further improvements may be possible while simultaneously shedding light on mechanism of action. For example, analysis of data during different stages of relaxation identified positive lusitropy for dobutamine (at 50% relaxation) and a late relaxation deficit for ivabradine (at 80% relaxation). During the blinded phase, we also correctly predicted that drug rank 14 was omecamtiv mecarbil on account of its unusual response of increased CT (approximately +40%) without convincing evidence for increased CA (Malik et al., 2011). This shows the value of applying pharmacological knowledge to drug responses in hiPSC-CMs to derive mechanistic information from multiparametric assessments, in this case combined evaluation of CA, CT, and RT in hiPSC-CMs configured as 3D EHTs.
Figure 3. […] showed these to be the most informative parameters for PIs. Whereas testing in serum-/protein-free medium failed to identify any PIs correctly (A, web tool), the slowed beat rate and improved signal-to-noise ratio afforded by culture in RPMI-B27 allowed correct identification of 6/8 PIs by significant decreases in CT and/or RT. Red dotted line is free therapeutic plasma concentration. Dunnett's stats versus vehicle control: *p < .05; **p < .01; ***p < .001; ****p < .0001.
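Relaxation times at fixed fractional levels (eg, RT50 and RT80, the times from peak to 50% and 80% relaxation) can be read off a single averaged twitch. The sketch below is a minimal illustration; its conventions (baseline taken as the first sample, peak as the global maximum, linear interpolation between samples) and the synthetic trace are assumptions, and the platforms in this study may define these parameters differently.

```python
def relaxation_time(t_ms, force, fraction):
    """Time (ms) from the twitch peak until force has fallen by `fraction` of
    the amplitude, eg fraction=0.5 -> RT50, fraction=0.8 -> RT80.

    Assumes one twitch per trace, baseline = first sample, and linear
    interpolation between samples (illustrative conventions only).
    """
    baseline = force[0]
    i_peak = max(range(len(force)), key=lambda i: force[i])
    amplitude = force[i_peak] - baseline
    target = force[i_peak] - fraction * amplitude
    for i in range(i_peak, len(force) - 1):
        f0, f1 = force[i], force[i + 1]
        if f0 >= target >= f1 and f0 != f1:
            # Interpolate the crossing time between the two bracketing samples.
            t_cross = t_ms[i] + (f0 - target) / (f0 - f1) * (t_ms[i + 1] - t_ms[i])
            return t_cross - t_ms[i_peak]
    raise ValueError("trace does not relax to the requested level")


# Synthetic twitch (hypothetical values): peak at 20 ms, then linear
# relaxation back to baseline by 120 ms.
t = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
f = [0.0, 0.5, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
rt50 = relaxation_time(t, f, 0.5)  # approximately 50 ms
rt80 = relaxation_time(t, f, 0.8)  # approximately 80 ms
```

Comparing RT50 with RT80 separates early from late relaxation, which is the kind of distinction that allows a change in early relaxation to be told apart from a late relaxation deficit.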
Another consideration is the difference between single cells, continuous monolayers, and 3D engineered constructs. The single rabbit cardiomyocyte assay used near-physiological stimulation rates (2 Hz); subphysiological rates (1 Hz or lower) might have improved its scope to detect PIs. In adult heart preparations, a common feature of PI interventions that raise intracellular Ca2+ is the occurrence of spontaneous diastolic SR Ca2+ release. This phenomenon is linked to negative inotropy and arrhythmic behavior in single cells and intact myocardium (Allen et al., 1985; Hess and Wier, 1984). In this study, spontaneous diastolic Ca2+ release and associated diastolic shortening were observed in isolated rabbit myocytes in response to β-adrenoreceptor stimulation and to drugs that raise cAMP directly (eg, forskolin), but this phenomenon was not reported on any of the hiPSC-CM platforms. This is consistent with the minimal involvement of the SR in excitation-contraction coupling typically seen in embryonic cardiomyocytes and hiPSC-CMs (Knollmann, 2013); in this context, the PI effect of cAMP arises mainly from cAMP-mediated stimulation of the L-type Ca2+ current.
Out of necessity, native adult CMs are dispersed and seeded as cultures of single cells. Variance in cell density is known to influence electrophysiological parameters (Du et al., 2015), particularly within and between single-cell preparations, including those of hiPSC-CMs. Therefore, although the structural and functional (eg, prevalent SR) maturity of native adult rabbit CMs is an advantage over hiPSC-CMs, this is offset by the high heterogeneity of single-cell preparations, which reduces discrimination power. Although some of the differences will be due to the physiology of rabbit CMs relative to human CMs, we have also seen higher variability in human atrial trabeculae than in hiPSC-EHTs; hence, native cells pose separate challenges. For these reasons, we elected to use hiPSC-CMs within a 2D or 3D syncytium, where cell-to-cell variability is averaged out through mechanical and electrical coupling.
Figure 5. Slowed beat rate increases the predictivity of positive inotropes (PIs) in 3D engineered heart tissues (EHTs). Contraction analysis was carried out using EHTs with 6 of the PIs from the drug test set: epinephrine (Epi), forskolin (For), levosimendan (Lev), pimobendan (Pim), dobutamine (Dob), and milrinone (Mil), which had been predicted with variable accuracy (Figure 4A, web tool). In all cases, the drugs (applied as a high-concentration bolus) increased contraction amplitude (CA), but only at 0.5 and 0.7 Hz and sometimes at 1 Hz, with the largest effects on contraction time (CT) also often occurring at these frequencies. Scatter plots show percentage changes relative to baseline at the respective frequency. Averaged force peaks are shown for baseline (BL, black peaks) versus after treatment in EHTs paced at 0.5 Hz (blue peaks) or 2.0 Hz (red peaks). Dunnett's stats versus baseline at respective frequency: *p < .05; **p < .01; ***p < .001; ****p < .0001.
In addition, several studies have shown previously that the EHT format, compared with standard 2D culture, favors maturation in terms of MDP and upstroke velocity (Lemoine et al., 2017, 2018), structure, and metabolic preference (Ulmer et al., 2018).
Blinded analysis was done using serum-/protein-free medium, with the intention of avoiding protein-drug binding that might blunt responses in the in vitro system. However, at least in the CO:R-PAT platform-cell combination, serum-/protein-free medium was a hindrance, leading to high spontaneous beat rates and poor signal-to-noise ratios. These issues were resolved by using protein-containing medium (RPMI-B27), which raised the accuracy of predicting PIs on the CO:R-PAT platform from 0% to 75%. This indicates that, for this purpose, the benefits of protein-containing medium outweigh concerns about drug binding and batch-to-batch variation in protein ingredients. Nevertheless, voltage-sensitive dyes such as FluoVolt interact with proteins and reduce the signal-to-noise ratio, which makes simultaneous recording of contraction and voltage challenging, although this may be overcome by measuring extracellular voltage.
Treatment of hiPSC-CMs with doxorubicin for 30 min caused changes in electrophysiology (eg, triangulation, see web tool) and hence gave a distinctive response compared with NEs such as acetylsalicylic acid. Nevertheless, this acute exposure did not always change inotropy, which shows the importance of examining an appropriate concentration range and/or exposure time. Increasing exposure time to approximately 1-7 days unveiled doxorubicin and sunitinib as NIs, consistent with the extended timescale over which cardiac toxicity presents clinically and in line with a previous study using tyrosine kinase inhibitors (TKIs) (Jacob et al., 2016). Interestingly, after 24 h of doxorubicin treatment, the CO:R-PAT platform-cell combination appeared to show a trend toward positive inotropy from 1 to 30 µM (increased CA; decreased CT) but cell death at 10 µM. This aligns with reports that the effects of doxorubicin are complex, in that transient positive inotropy is followed by robust negative inotropy at higher concentrations (Kim et al., 1980; Vanboxtel et al., 1978; Wang and Korth, 1995).
The levels of accuracy of up to 93% in predicting drug-induced changes in contractility in human hearts under these refined conditions compare favorably with those from animal models (Lawrence et al., 2008; Valentin et al., 2009). They are also comparable with data from 3D cardiac microtissues containing hiPSC-CMs, cardiac endothelial cells, and cardiac fibroblasts, which correctly predicted 23 of 29 (85%) inotropes across a nonblinded panel of compounds (Pointon et al., 2017). These findings are encouraging, but it is not always straightforward to translate the hiPSC-CM parameters CA, CT, RT, and chronotropy (beat rate) into clinically relevant data. We have suggested that CA data from hiPSC-CMs are informative for NIs in all platform-cell combinations and for PIs in the 3D EHT platform, whereas in the 2D systems, CT and RT may be more informative for PIs. We have considered different explanations for this observation. If 2D systems are less adept at showing an increase in CA, then decreases in CT (clinotropy; also expressed as an increase in +dF/dt) or decreases in RT (lusitropy; also expressed as an increase in −dF/dt) become more important. Because ±dF/dt values can change independently of peak force, considering them aligns the in vitro readouts with effects seen in vivo, which are often expressed as a change in dP/dt max or −dP/dt min. An alternative explanation is that, although not automatically intuitive, this may be a quirk of hiPSC-CMs; as with any model system, there are limitations, and interpretation needs to be adapted accordingly. Either way, the robustness of this notion will need to be tested on a wider range of drugs to determine whether these in vitro parameters are relevant to in vivo cardiac physiology.
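The ±dF/dt readouts discussed here are the steepest rising and falling slopes of the force trace. The sketch below is a minimal illustration using forward finite differences (real recordings would normally be filtered first); `df_dt_extrema` and both traces are hypothetical. It shows how two twitches with the same peak force can still differ in +dF/dt max.

```python
def df_dt_extrema(t_ms, force):
    """Return (+dF/dt max, dF/dt min) from forward finite differences.

    +dF/dt max reflects contraction velocity (in vitro analogue of dP/dt max);
    dF/dt min (most negative slope) reflects relaxation velocity (analogue of
    dP/dt min). No smoothing is applied here.
    """
    slopes = [
        (force[i + 1] - force[i]) / (t_ms[i + 1] - t_ms[i])
        for i in range(len(force) - 1)
    ]
    return max(slopes), min(slopes)


# Two hypothetical twitches with identical peak force (1.0) but different
# rise kinetics: only +dF/dt max distinguishes them.
t = [0, 10, 20, 30, 40]
slow_rise = [0.0, 0.5, 1.0, 0.5, 0.0]
fast_rise = [0.0, 1.0, 1.0, 0.5, 0.0]
print(df_dt_extrema(t, slow_rise))  # (0.05, -0.05)
print(df_dt_extrema(t, fast_rise))  # (0.1, -0.05)
```

Because peak force is identical in both traces, a CA-only readout would miss the faster contraction that +dF/dt max captures.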
Effects on cardiac contractility per se may not indicate or predict a clinically relevant detrimental effect on the heart. For example, cardio-active L-type calcium channel blockers, which are negative inotropes, are used to treat hypertension. Additional "structural" endpoints in hiPSC-CM-based assay platforms, such as mitochondrial membrane potential (ΔΨm), endoplasmic reticulum integrity, contractile filament expression/organization, ATP depletion, and cardiac troponin levels, may add interpretive value to functional changes (Sala et al., 2018). A limitation of our study was that some modes of action were not represented in the drug panel, such as the SERCA activator CDN1163 (Dahl, 2017; Kang et al., 2016) or the myosin inhibitor blebbistatin (Kovacs et al., 2004); these would be interesting to include in future blinded assays.
In our study, we looked for trends or significant changes in the measured parameters but did not apply a cutoff for how much change was considered meaningful (eg, > 15% change and statistical significance). Whether the analysis should be modified in this way in the future merits consideration but would need to be done with care. The NE drugs enalaprilat, tolbutamide, and pravastatin caused changes of 19%-33% in CA on the EHT platform and were incorrectly predicted as PIs. However, these changes were similar to or greater than those recorded for PIs, so adding thresholds on CA would not have improved prediction and could have been detrimental. Nor was there a strong association with the FTPC: the maximum changes of the incorrectly predicted NEs occurred at concentrations ranging from 3-fold below the FTPC (tolbutamide) to > 100-fold above it (enalaprilat, pravastatin). This also underscores the importance of technical precision and of including time controls in every experiment, particularly in strategies that employ time-consuming cumulative concentration-response analyses.
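To make the hypothetical cutoff concrete, the sketch below classifies a drug from its percentage change in CA using a magnitude threshold plus a significance criterion. The 15% threshold and alpha of 0.05 mirror the example given in the text; the function name and all input values are illustrative assumptions, not study data. Under such a rule, an NE whose 19%-33% increase in CA also reached significance would still be called a PI, which is why thresholding alone would not have corrected those misclassifications.

```python
def classify_by_ca(pct_change_ca, p_value, threshold_pct=15.0, alpha=0.05):
    """Hypothetical cutoff rule: call PI/NI only if |change| exceeds the
    threshold AND the change is statistically significant; otherwise NE."""
    if p_value < alpha and abs(pct_change_ca) > threshold_pct:
        return "PI" if pct_change_ca > 0 else "NI"
    return "NE"


# Illustrative inputs (not study data):
print(classify_by_ca(19.0, 0.01))    # PI: a 19% rise still passes a 15% cutoff
print(classify_by_ca(10.0, 0.01))    # NE: significant but below the cutoff
print(classify_by_ca(-40.0, 0.001))  # NI
print(classify_by_ca(25.0, 0.20))    # NE: large but not significant
```

A threshold only helps when true and false positives differ in effect size; where NE-induced changes overlap with or exceed PI-induced changes, as reported here, it cannot separate them.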
These observations likely reflect the complexity of drug-cell interactions: effects can differ depending on concentration, and categorizing drugs as PI, NI, or NE was difficult in some cases. Inotropic effects of high concentrations of sulphonylureas, such as tolbutamide, have been reported in in vitro systems (Huupponen, 1987). Angiotensin converting enzyme (ACE) inhibitors, such as enalaprilat, lead to bradykinin accumulation and thereby activate bradykinin B1 receptors, which elevates intracellular Ca2+ and causes a positive inotropic effect (Ignjatovic et al., 2002). Similarly, phentolamine is a neutral inotrope at free plasma concentrations of approximately 2.5 µM in patients (Wallis et al., 2015), but at concentrations above 10 µM it causes negative inotropy via modulation of fast sodium and L-type calcium channels (Rosen et al., 1971; Sada, 1978). This is consistent with our data, indicating that this compound has multiple pharmacological effects. For ivabradine, the concentrations provided were much higher than the FTPC, which can lead to off-target effects of poorly selective drugs (Choi et al., 2016). In some cases, such as zimelidine, the difficulty arose from a lack of robust in vitro data in the literature (Forsberg and Lindbom, 2009; Lindbom and Forsberg, 1981; Naranjo et al., 1984). Care must therefore be taken when (1) selecting test compounds by primary pharmacology and (2) using in vitro assay paradigms to predict in vivo, clinically relevant cardiotoxicity, whether as a "screening assay" to rank compounds or as a reflex investigative assay (eg, following detection of cardiotoxicity in animal toxicology studies).
Because PIs are cardio-active drugs, it might be expected that the FTPC aligns with responses from hiPSC-CMs. This was true for the EHT platform, where significant responses occurred within, or close to, the FTPC. However, the pharmacological and statistical sensitivities of both 2D hiPSC-CM cultures and isolated single rabbit cardiomyocytes were considerably lower. Two explanations may account for these observations. First, the greater stability of CMs in 3D constructs permitted cumulative dosing of EHTs within the same well, which created tighter datasets than in the 2D systems, where parallel dosing in separate wells was required. Second, whereas the 2D monolayers were used to evaluate hiPSC-CM movement under unloaded conditions, the EHT platform measures force of contraction under loaded conditions. This facilitates maturation and force generation that follows the Frank-Starling mechanism, leading to higher basal tone and cAMP signaling (Huupponen, 1987; Uzun et al., 2016), which are important to the modes of action of the PIs epinephrine, forskolin, and milrinone. Consistent with this, positive inotropy of milrinone was shown in "Biowire II" tissue-engineered constructs, which also place hiPSC-CMs under loaded conditions to facilitate maturation (Zhao et al., 2019).
To measure whole-heart function and its integration with neurohormonal or hemodynamic feedback, numerous assessment parameters are used, including P max, dP/dt max (maximum rate of pressure change), and left ventricular ejection fraction (Guth et al., 2015). Atenolol, a selective β1-adrenoceptor antagonist, requires intact sympathetic innervation and is an NI in the clinic but behaves as an NE in hiPSC-CMs (Kaumann and Blinks, 1980; Lemoine et al., 1988). Similarly, clonidine acts through presynaptic alpha-adrenoceptors on sympathetic neurons, reducing sympathetic drive and thereby heart function (reduced heart rate and force) (Jarrott et al., 1979; Kleiber et al., 2017), but these effects would not be detected in a "cardiomyocyte only" model. Predictive screening will therefore also benefit from inclusion of auxiliary cell types, such as neural lineages, to enable evaluation of drugs that act via the neurohormonal system.
Other issues to consider for the future include throughput, user variability, cell availability, cost, and batch-to-batch variability. Although the TTM platform allowed simultaneous measurement of contraction, Ca2+ handling, and voltage, its low throughput meant that only 10 drugs were evaluated in 1 hiPSC-CM line. User variability has been noted in other studies. For example, variation in TdP predictivity across different sites using identical instruments was reported in the CiPA study (13), and so could be a factor in the site-to-site differences we observed. Cost was another factor: each drug assessed typically required 1.6 million (40 wells of a 96-well plate) or 5 million (5 EHTs) hiPSC-CMs, corresponding to up to approximately $3000 per drug at current prices for commercial cells. For these reasons, most of the assays in this report were done using hiPSC-CMs produced "in house," where the cost of 1.6 or 5 million cells is approximately $16 or $50, respectively, albeit without the same level of quality control as commercial cells. Nevertheless, for both in-house and commercial cells, batch-to-batch variation meant that drug assays needed to be repeated. These issues become more prominent as larger tissue-engineered constructs are used. For example, "heart-in-a-jar" technologies produce miniature 3D electromechanically coupled cardiac organoid chambers that mimic the pumping action of the natural heart but require 10 million hiPSC-CMs (Li et al., 2018). It is therefore encouraging that scaled production of hiPSC-CMs is becoming relatively routine, so costs should decrease in the near future.
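The per-drug cell costs quoted in this paragraph can be checked with simple arithmetic. This sketch uses only the figures given in the text; the one added assumption, labeled in the code, is that the upper estimate of approximately $3000 per drug applies to the larger 5-million-cell EHT requirement using commercial cells.

```python
# Worked arithmetic on the per-drug cell requirements quoted in the text.
CELLS_2D = 1.6e6             # 40 wells of a 96-well plate
CELLS_3D = 5.0e6             # 5 EHTs
IN_HOUSE_USD_2D = 16.0       # in-house cost for 1.6 million cells
IN_HOUSE_USD_3D = 50.0       # in-house cost for 5 million cells
COMMERCIAL_USD_MAX = 3000.0  # "up to approximately $3000 per drug"


def usd_per_million(cost_usd, n_cells):
    """Cost per million cells."""
    return cost_usd / (n_cells / 1e6)


in_house_2d = usd_per_million(IN_HOUSE_USD_2D, CELLS_2D)  # about 10 USD per million
in_house_3d = usd_per_million(IN_HOUSE_USD_3D, CELLS_3D)  # 10 USD per million
# Assumption: the $3000 upper estimate corresponds to the 5-million-cell EHT case.
commercial_3d = usd_per_million(COMMERCIAL_USD_MAX, CELLS_3D)  # 600 USD per million
fold_difference = commercial_3d / in_house_3d                  # 60-fold
print(round(in_house_2d, 2), round(in_house_3d, 2), round(commercial_3d), round(fold_difference))
# 10.0 10.0 600 60
```

The two in-house figures are internally consistent (about $10 per million cells in either format), whereas the commercial estimate implies a roughly 60-fold higher cost per cell under the stated assumption.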
Altogether, this study suggests that, even at the current stage of technological development, hiPSC-CMs cultured in 2D and 3D may have value for predictive safety pharmacology. Given the high resources and costs involved in drug development, even modest improvements in the pipeline could have large socioeconomic and 3Rs benefits. More than 6000 putative medicines are in preclinical development, using millions of animals at an annual total cost of $11.3Bn, so each 1% reduction equates to approximately $100M. Similarly, reducing drug attrition in phase 1 clinical trials by 5% could reduce development costs by 5.5%-7.1%. Building on our work with further improvement and validation studies, conducted in a blinded manner, will facilitate uptake of hiPSC-CMs as a routine tool in safety pharmacology.

MATERIALS AND CORRESPONDENCE
General enquiries should be directed to C.D., whereas specific enquiries should be directed as follows: