DrForna: visualization of cotranscriptional folding

Abstract

Motivation: Understanding RNA folding at the level of secondary structures can give important insights into the function of a molecule. We are interested in learning how secondary structures change dynamically during transcription, and whether particular secondary structures form during or only after transcription. While different approaches exist to simulate cotranscriptional folding, current visualization strategies lag behind. New, more suitable approaches are needed to help explore the data generated by cotranscriptional folding simulations.

Results: We present DrForna, an interactive visualization app for viewing the time course of a cotranscriptional RNA folding simulation. Specifically, users can scroll along the time axis and see the population of structures present at any particular time point.

Availability and implementation: DrForna is a JavaScript project available on GitHub at https://github.com/ViennaRNA/drforna and deployed at https://viennarna.github.io/drforna

have a different secondary structure at different transcript lengths. However, if the id does not correspond to a unique set of base pairs over the whole input file, the colors above the time scale will correspond to the colors of the first structure that appears for each id.

Methods for visualization

Color scheme
As discussed in the main text, we developed a coloring scheme based on the imaginary center of each stem. Suppl. Fig. 1 shows the repetition of nine colors from the hue color circle, using an artificial example that shifts the imaginary center by 0.5. (No sequence is compatible with the structures shown in the figure; it serves only to demonstrate the color range.) Although exact color codes do not repeat within sequences shorter than 360 nt, the differences between the repeated colors are typically not distinguishable by eye.
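The following is a minimal sketch of how such a center-based hue mapping could work, not DrForna's actual implementation. It assumes that the imaginary center of a stem is the midpoint (i + j) / 2 of one of its base pairs (i, j), and it uses a hypothetical constant `HUE_PER_NT` chosen only so that the properties described above hold: a shift of 0.5 in the center moves the hue by 39.5 degrees, so roughly nine colors cycle around the circle, while the exact hue does not repeat for centers below 360. The function names are illustrative.

```javascript
// Hypothetical constant (an assumption, not taken from DrForna):
// degrees of hue per unit shift of the imaginary center.
const HUE_PER_NT = 79;

// Assumed definition: the midpoint of a base pair (i, j).
// All pairs (i + k, j - k) of an uninterrupted stem share this midpoint.
function stemCenter(i, j) {
  return (i + j) / 2;
}

// Map a center onto the hue circle (0 <= hue < 360).
function hueForCenter(center) {
  return (center * HUE_PER_NT) % 360;
}

// CSS color string for a stem; saturation and lightness are arbitrary choices.
function colorForStem(i, j) {
  return `hsl(${hueForCenter(stemCenter(i, j))}, 100%, 50%)`;
}
```

With these assumptions, two stems whose centers differ by 0.5 receive clearly different hues, whereas stems whose centers differ by 4.5 receive hues only 4.5 degrees apart, which matches the near-repetition of nine colors described above.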

The treemap layout for structure plots
We use the treemap function from the visualization library d3.js (Bostock, 2012) to adjust the size of the rectangles for plotting. As the treemap function expects hierarchical data, the input data is converted into a single-level hierarchy: one parent for each time point, to which all data for that time point is connected. Rectangles are then placed using the coordinates given by the treemap function, and each contains the associated colored secondary structure as well as the respective ID. The edges of the rectangles are drawn so that changes in occupancy are always visible: for example, if a rectangle grows only in width while the secondary structure plot is already scaled to the maximal height, the change in occupancy would otherwise go unnoticed. Note that the size of a rectangle can also increase when the sum of occupancies is smaller than in the previous time step, since the treemap distributes the available area according to relative occupancies.
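The conversion and layout described above can be sketched as follows. This is an illustration under stated assumptions rather than DrForna's source code: the record fields (`id`, `time`, `occupancy`, `structure`) and both function names are hypothetical, and the padding value is arbitrary. The d3 calls used (`d3.hierarchy`, `node.sum`, `d3.treemap().size().padding()`, `node.leaves()`) are part of the d3-hierarchy module; the library object is passed in as an argument so that the sketch does not depend on how d3 is loaded.

```javascript
// Build the single-level hierarchy for one time point:
// one parent node whose children are all structures present at that time.
// The record shape { id, time, occupancy, structure } is an assumption.
function toSingleLevelHierarchy(records, time) {
  return {
    name: `t=${time}`,
    children: records
      .filter((r) => r.time === time)
      .map((r) => ({ name: r.id, value: r.occupancy, structure: r.structure })),
  };
}

// Compute one rectangle per structure, with area proportional to occupancy.
// `d3` is the d3 (or d3-hierarchy) module object supplied by the caller.
function layoutRectangles(d3, root, width, height) {
  const hierarchy = d3.hierarchy(root).sum((d) => d.value || 0);
  d3.treemap().size([width, height]).padding(1)(hierarchy);
  return hierarchy.leaves().map((leaf) => ({
    id: leaf.data.name,
    structure: leaf.data.structure,
    x: leaf.x0,
    y: leaf.y0,
    width: leaf.x1 - leaf.x0,
    height: leaf.y1 - leaf.y0,
  }));
}

// Usage in a page where d3 is available, for example:
//   const root = toSingleLevelHierarchy(records, selectedTime);
//   const rects = layoutRectangles(d3, root, 800, 400);
```

Because the treemap fills the given width and height regardless of the summed value, a rectangle's area reflects its share of the occupancy at that time point, not its absolute occupancy.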

The interactive scale area
Structure plots are generated dynamically when a new time point is selected, which can slow down animations with many structural alternatives when many time points are selected in short succession. We therefore use a debounce function to skip time points dynamically, based on the maximal number of structures m per time point. If a time point is selected for less than t = m × 5 ms, no output is generated for it. Scaling the delay dynamically with the number of structures helps to avoid lag due to computational demands in large input files, where too much data would have to be generated, but also in small files, where only few data points are available and thus less time is spent on a single time point when hovering over the scale area.
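A generic debounce with a delay of m × 5 ms could look like the sketch below. It shows the standard trailing-edge debounce pattern, not necessarily the function used in DrForna; the record shape and all function names are assumptions made for illustration.

```javascript
// From the text: the delay is t = m * 5 ms.
const MS_PER_STRUCTURE = 5;

// m: the maximal number of structures at any single time point.
// The record shape { time, ... } is an assumption.
function maxStructuresPerTimePoint(records) {
  const counts = new Map();
  for (const r of records) {
    counts.set(r.time, (counts.get(r.time) || 0) + 1);
  }
  return Math.max(0, ...counts.values());
}

function debounceDelayMs(records) {
  return maxStructuresPerTimePoint(records) * MS_PER_STRUCTURE;
}

// Trailing-edge debounce: the callback runs only if no further call
// arrives within delayMs, so briefly visited time points are skipped.
function debounce(callback, delayMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => callback(...args), delayMs);
  };
}

// Usage, with a hypothetical render function:
//   const showTimePoint = debounce(renderStructures, debounceDelayMs(records));
//   scaleArea.addEventListener("mousemove", (e) => showTimePoint(timeAt(e)));
```

With this pattern, a time point that the pointer merely passes over never triggers rendering, while the time point on which the pointer rests is drawn after the delay has elapsed.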

DrForna example visualizations
Suppl. Figures 2 and 3 show DrForna visualizations of output from the stochastic simulators Kinfold (Flamm et al., 2000) and Kinefold (Xayaphoummine et al., 2005). Both plots compare the visualization of a single trajectory with that of an ensemble generated from 100 trajectories. In practice, users will likely want to generate data from even more individual trajectories, but then some additional post-processing (coarse-graining) will be necessary to limit the otherwise overwhelming amount of data. Both figures show the same time point, but the secondary structures are quite different. This is because Kinefold is a helix-level simulator that uses a different energy model than the base-pair-level simulator Kinfold. In addition, the Kinefold model includes pseudoknotted conformations, while Kinfold does not. It is also worth keeping in mind that the simulation time per nucleotide can have a large impact on the observed secondary structures. In both cases, the structure returned by the single trajectory is not the most occupied structure among the 100 trajectories. As the Kinefold model inserts whole helices, the visualization of the most occupied structures over time looks smoother than for Kinfold simulations, which include base-pair-level stochastic fluctuations.
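How the ensembles in these figures were assembled from individual trajectories is not detailed here. One simple possibility, shown purely as an assumption-laden sketch, is to take the occupancy of a structure at a given time as the fraction of trajectories that are in that structure at that time. The trajectory format (a time-sorted list of `{ time, structure }` entries, each marking when a structure was entered) and the function names are hypothetical.

```javascript
// Structure of one trajectory at time t: the last entry with time <= t.
// Returns null if the trajectory has not started yet.
function structureAt(trajectory, t) {
  let current = null;
  for (const step of trajectory) {
    if (step.time > t) break;
    current = step.structure;
  }
  return current;
}

// Occupancy of each structure at time t across all trajectories,
// sorted from most to least occupied.
function occupanciesAt(trajectories, t) {
  const counts = new Map();
  let total = 0;
  for (const trajectory of trajectories) {
    const s = structureAt(trajectory, t);
    if (s === null) continue;
    counts.set(s, (counts.get(s) || 0) + 1);
    total += 1;
  }
  return [...counts]
    .map(([structure, n]) => ({ structure, occupancy: n / total }))
    .sort((a, b) => b.occupancy - a.occupancy);
}
```

Under this scheme a single trajectory always yields one structure with occupancy 1, which illustrates why the structure seen in a single trajectory need not be the most occupied one in the ensemble.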

Figure 1: Demonstration of coloring helices by their imaginary centers. Nine colors are repeated; small changes of the imaginary center lead to clearly distinguishable colors. (No sequence is compatible with the example structures plotted here.)