Measuring rhythmic complexity: A primer to quantify and compare temporal structure in speech, movement, and animal vocalizations

Research on the evolution of human speech and phonology benefits from the comparative approach: structural, spectral, and temporal features can be extracted and compared across species in an attempt to reconstruct the evolutionary history of human speech. Here we focus on analytical tools to measure and compare temporal structure in human speech and animal vocalizations. We introduce the reader to a range of statistical methods usable, on the one hand, to quantify rhythmic complexity in single vocalizations, and on the other hand, to compare rhythmic structure between multiple vocalizations. These methods include: time series analysis, distributional measures, variability metrics, Fourier transform, autoand cross-correlation, phase portraits, and circular statistics. Using computer-generated data, we apply a range of techniques, walking the reader through the necessary software and its functions. We describe which techniques are most appropriate to test particular hypotheses on rhythmic structure, and provide possible interpretations of the tests. These techniques can be equally well applied to find rhythmic structure in gesture, movement, and any other behavior developing over time, when the research focus lies on its temporal structure. This introduction to quantitative techniques for rhythm and timing analysis will hopefully spur additional comparative research, and will produce comparable results across all disciplines working on the evolution of speech, ultimately advancing the field.

Three main avenues seem particularly relevant to the comparative study of the evolution of speech (Fig. 1).
Positional regularities: how the building blocks of speech or language, each taken holistically, are organized and related to each other (Fitch 2014;Kershenbaum et al. 2014;ten Cate 2016).
Spectral characteristics: how different frequencies and their intensities relate to one another, giving rise to tone in speech, liaison, vowel quality, harmonicity, etc. (Fitch 2000;Yip 2006;Spierings and ten Cate 2016).Figure 1.Three approaches to the analysis of animal vocalizations, including human speech, laying emphasis on sequential, spectral, or temporal features of the signal.The three-dimensional space presents a common multicomponent approach to speech and animal vocalizations, showing that these components constitute complementary dimensions, rather than a mutually exclusive classification.Depending on the species and scientific question, one-dimension might be particularly relevant.Research focus on structural, spectral, and temporal information is exemplified along the three main axes, with many cases of vocalizations falling between two categories.Counter-clockwise, the figure shows structures where a possible research emphasis is laid on: (a) Sequential: an abstract representation of a series of song notes, and its mirror permutation, containing the same elements and having the same duration but different sequential properties; (b) Spectro-sequential: spectrogram of a chunk of zebra finch song (Taeniopygia guttata), where complex spectral information is combined into a sequential repetitive structure (dashed boxes); (c) Spectral: spectrogram of a harbor seal pup call (Phoca vitulina), showing harmonic features, with less emphasis on structural/positional regularities; (d) Spectro-temporal: schematic representation of a spectrogram showing a singing lemur vocal duet (Indri indri), where spectral and temporal information might be interacting (Gamba et al. 2016); (e) Temporal: spectrogram of a California sea lion (Zalophus californianus)'s bark, showing a clear rhythmic, isochronous pattern: barks contain little spectral information and variability, but are produced with remarkable regularity, like a metronome; (f) Tempo-sequential: notes composing a zebra finch song, which in turn possesses an underlying rhythmic structure (Norton and Scharff 2016).Notice that these examples concern the research focus, rather than the nature of the signal: for instance, panel (e) is taken as a prototype of temporal signal, but it also has a clear spectral structure.This article focuses on useful methods to analyze cases (d), (e), and (f), that is, when emphasis is put on analysing the temporal structure of vocalizations.Panel (d) reproduced and modified from Gamba et al. (2016) and panels (a) and (f) from Norton and Scharff (2016), both published open access under the CC BY 4.0.Other panels generated using Praat (Boersma and Weeninck, 2013).

Sequential
Temporal structure: how the speech/vocal signal is organized and develops over time (Ramus et al. 1999;Goswami and Leong 2013;Bekius et al. 2016;Benichov et al. 2016;Hannon et al. 2016;Jadoul et al. 2016).Timing, together with some spectral characteristics, contributes to rhythm (see Table 1 for a definition of rhythm and other major concepts presented in this article).
Rhythmic properties of human speech (Ramus, et al. 1999;Grabe and Low 2002;Tilsen 2009;Lehiste 1977) and animal vocal communication (Saar and Mitra 2008;Norton and Scharff 2016;Ravignani et al. 2016) are often investigated.However, lack of an interdisciplinary statistical approach to rhythmicity of vocalizations across species hinders comparative and comparable research.
Here we introduce the reader to a broad range of quantitative methods useful for rhythm research within and across species.The article is structured as follows.A first distinction between methods concerns measuring either rhythmic complexity of one pattern, that is, how regular and predictable a sequence of events is (Section 2), or rhythmic relationships between multiple patterns, that is, similarities in the temporal structure of two or more sequences of events (Section 3).
Section 3 extends Section 2 by focusing on measures of temporal similarity between two patterns, namely: cross-correlation (subsection 3.1); multidimensional time series analysis, testing how one pattern can be linearly estimated (subsection 3.2) or causally predicted (subsection 3.3) from the other; statistical analysis of circular data, including visualization with rose plots (subsection 3.4).
Section 4 describes some research areas where the techniques we present may be particularly useful.Section 5 provides some overall conclusions.Section 6 describes how to access the data and scripts used in this article.To get a better understanding of the methods presented in this article, we encourage the reader to use the provided code (Supplementary data) to experiment with different patterns.New versions of the code will be uploaded at http://userpage.fu-berlin.de/phno/mrc/.

Overview
The temporal structure of vocalizations can be analyzed on the raw audio signal, or more commonly on time series extracted from the signal, like the amplitude envelope.Often however, one is interested in the timing of certain discrete events, for example, syllable onsets, amplitude peaks, or any other behavior developing over time.Some of the methods described here apply to To demonstrate the methods described in this section, we generated five sequences that differ in their rhythmic structure, each consisting of twenty-four events (Fig. 2, left column).To simplify notation, we demonstrate techniques for events with IOIs in the range of 1-6 s, though all presented techniques can be used at any time scale, depending on the observed behavior.

IOI distribution: which intervals occur in the pattern, and how often?
A histogram of all IOIs in a sequence provides a first visualization of its rhythmic structure (Fig. 2, right column).A normal distribution suggests a roughly isochronous pattern, where a lower spread indicates a more isochronous pattern (Fig. 2b).The existence of a few distinct IOI categories may appear as a multimodal distribution (Fig. 2c).Uniformly distributed IOIs might hint at a lack of such categories or a rhythmic structure in general (Fig. 2a).However, when many durational categories exist, a uniform distribution might also appear, concealing higher order structure (Fig. 2d).A particular limitation of histograms and statistical measures of durational distributions is their lack of sensitivity to structure: for instance, any permutation of the order of IOIs will change the structure while leaving distributional measures unaltered.

Kolmogorov-Smirnov D: does interval distribution differ from any hypothesized distribution?
The one-sample Kolmogorov-Smirnov test (K-S test, Table 2) can be used to evaluate how different an observed IOI distribution is from a particular, specified distribution, for example, a normal distribution in the isochronous case, or a uniform distribution in the case of no actual structure (Lilliefors 1967).The test uses the statistic D, measuring the maximum distance between two cumulative distribution functions.This measure can be used to compare, for example, different languages based on the normality of the intervals between syllable nuclei (Jadoul et al. 2016).When compared to a normal distribution, the isochronous pattern in Fig. 2b has  The second sequence is isochronous, that is, IOIs of successive events are equal for all events (in this case 2 s).This sequence with added noise is taken to loosely represent the hypothesized syllable-timed structure of some languages (Grabe and Low 2002).(c) Rhythmic sequence: The third pattern contains four repetitions of a sub-pattern consisting of six events (IOIs: 1, 1, 3, 2, 2 and 3 s).(d-e) The final two sequences are loosely based on concepts of timing division in language: stress timing and mora timing.(d) The stress sequence consists of clusters of four events each.The duration of all clusters is 8 s and the pattern of IOIs within a cluster are different each time (e.g.[0.5,1.5,3.75,2.25],[2,1,3.5,1.5], . ..).
Gaussian noise with a SD of 0.04 s (isochronous and rhythmic sequence) or 0.02 s (stress and mora sequence) was added to the timing of the events.
of 0.085, about half of the 0.14 of the mora pattern (Fig. 2e).In order to statistically test an IOI distribution for normality, a modification of the K-S test by Lilliefors can be used instead (Lilliefors 1967).Of the five example patterns in Fig. 2, only the random and rhythmic patterns are significantly non-normal according to the Lilliefors test (P ¼ 0.035 and 0.024, respectively; n ¼ 24).

2.4
The nPVI: how variable are adjacent intervals?
The nPVI is a measure originally developed to quantify temporal variability in speech rhythm (Table 2; Grabe and Low 2002;Toussaint 2013).It is computed as follows.For each pair of adjacent IOIs, one calculates their difference and divides it by their average.The nPVI equals the average of all these ratios, multiplied by 100, namely: The nPVI equals zero for a perfectly isochronous sequence, and it increases with an increasing alternation in onset timing.Some have suggested nPVI captures crosslinguistic rhythmic classes, with syllable-timed languages exhibiting low nPVI, stress-timed languages having high nPVI, and mora-timed languages lying between the two extremes (Grabe and Low 2002).The nPVI values of our computer-generated example patterns align with this classification: the isochronous pattern, representing syllable-timing, has a low nPVI of 3.3, the stress pattern a high value of 92.74 (close to the random pattern with 94.54), while the mora pattern lies in between with 52.84.Notice however that nPVI has little explanatory power beyond simple, zeroth-order distributional statistics, such as Kolmogorov-Smirnov D (Jadoul et al. 2016;Ravignani, in press).

Autocorrelation: how similar is a pattern to itself at different points in time?
Autocorrelation is a technique that can help reveal higher order structure within a pattern, specifically repeating temporal sub-patterns (Hamilton 1994;Ravignani and Sonnweber 2015).Autocorrelation correlates a time series (Fig. 3b) with a copy of itself at different time lags (Fig. 3c-f, red dotted line).It is calculated by having one copy of the signal slide stepwise across the other; at each step the products of all points in the two signals are calculated and added together.The result of the process is a function of these sums at the different time lags between the two signals (Fig. 3h).The autocorrelation starts at a time lag of zero, where the two copies overlap perfectly (Fig. 3c).At zero lag, the autocorrelation function is normalized to 1 (Fig. 3h).Since the signal is nonzero only at the time points of the events, the function will be zero at the lags at which none of the events overlap (Fig. 3d,h).When one or more events in the two signals overlap a peak appears in the function (Fig. 3e,h).The more the events overlap at a certain lag, the higher the peak in the function (Fig. 3f,h), suggesting that a sub-pattern might repeat after a duration that equals this lag.Note that unless the IOIs are exactly equal in duration, the events in the raw time series will not overlap.Real-world patterns, however, are rarely isochronous and any rhythmic structure usually contains some amount of noise or jitter.To deal with noise one can convolute the time series with a narrow normal probability density function (npdf) prior to autocorrelation, effectively replacing every single time point event with narrow Gauss curves (Fig. 3a,b).Several nearby, though not simultaneously overlapping, events can thereby partially overlap over a range of lags and together contribute to a single peak in the autocorrelation function (e.g.Fig. 3e).
Figure 2b shows an isochronous pattern (IOI % 2 s) of twenty-four events with a small amount of Gaussian noise added.At a lag of 2 s, a peak close to 1 appears in the autocorrelation function, as events 2-24 of the signal overlap with events 1-23 of the lagged signal (Fig. 3g).The height of the following peaks-regularly occurring at n Â 2 s-gradually decreases as the lagged signal moves beyond the original signal and fewer events overlap.
The 'rhythmic sequence' contains three repetitions of a random sub-pattern consisting of six events with IOIs equaling 1, 2, or 3 s (total duration ¼ 12 s; Fig. 2c).Looking at the autocorrelation plot of this pattern, the first peak appears at a lag of 1 s, where some of the events start to overlap, and all following peaks are at lags 2,3,4,. ..s (Fig. 3h).This suggests 1 s is the basic time unit of this pattern, that is, all IOIs are multiples of 1 s.The largest peak is located at 12 s lag, because 12 s is the total duration of the repeating sub-pattern.At this point all events of sub-patterns 2, 3 and 4 of the signal overlap with sub-patterns 1, 2 and 3 of the lagged signal.Note that a higher peak could potentially occur at a lower lag, if more than three-quarters of events were to overlap.Relatively high peaks are likely to appear at harmonics (i.e.multiples) of the 12 s lag, namely at 24 and 36 s where the following repetitions of the subpattern overlap.In the 'mora sequence' events occur every 3 s as well as at varying time points in between (Fig. 2e).Accordingly, the autocorrelation function shows peaks at 3, 6 s, etc., as well as some noise stemming from less-regularly occurring events (Fig. 3i).

Fourier transform: decomposing the pattern into a sum of regular, clock-like oscillators
The Fourier transform is used to express any signal as a sum of sine waves of different frequencies and phases.In acoustics, one application is to decompose a sound wave into its constituent frequencies.When applied to the time series of a rhythmic pattern, the Fourier transform finds periodicities in the timing of the events.A particularly important periodicity is the pulse: the slowest isochronous sequence to which most events align.The Fourier transform can be visualized in a power spectrum, where the x-axis denotes the frequencies of the waves and the y-axis their magnitude, that is, how much they contribute to the signal (Fig. 4).
In the 'isochronous sequence', the first and largest peak in the power spectrum is located at 0.5 Hz corresponding to the pulse of the pattern, that is, the frequency at which the events regularly occur, namely every 2 s (Fig. 4a).If a signal is a sine wave, the power spectrum will consist of only this peak.However, often the signal is a sequence of narrow spikes, hence additional waves are needed to reconstruct the signal.This explains the higher frequency waves in the power spectrum in Fig. 4a.
In the stress and mora patterns, the maximum peaks in the power spectra are located at 4 Hz (Fig. 4c,d) because all IOIs are integer multiples of 0.25 s, so a 4-Hz pulse coincides with all events.A more direct, albeit computationally intensive, approach to finding the pulse of a sequence is 'generateand-test' (GAT).First, a low frequency, isochronous grid or pulse is created in the form of regularly spaced timestamps (Fig. 5a).The deviation of each event in the pattern to its nearest grid element is measured and the root-mean-squared deviation of the whole pattern, multiplied by the grid frequency (frmsd) is calculated.The grid is then shifted forward in time in small steps in order to find the best fit (i.e.lowest frmsd) for the grid of that particular frequency (Fig. 5b).Next, the grid frequency is increased in small steps and the minimal frmsd is again calculated for each step (Fig. 5c,d).If a rhythmic sequence has an underlying isochronous regularity, its frequency can be determined by finding the grid frequency with the overall lowest frmsd value.This GAT approach was recently used to find isochrony in zebra finch song (Norton and Scharff 2016).In many cases, the GAT frequency with the lowest frmsd corresponds to the frequency of highest power in the Fourier transform.The GAT method, however, is specifically aimed at finding the slowest pulse that still fits all events in the sequence, also called the tatum (Bilmes 1993).In our computer-generated 'rhythmic sequence', for example, the tatum has a frequency of 1 Hz, because all IOIs (1, 2, and 3 s) are integer multiples of 1 s (Fig. 5f).However, the frequency of highest power in the Fourier transform is located at twice the frequency, 2 Hz (Fig. 4b).While a pulse of 2 Hz covers all events as well, the slower pulse of 1 Hz is enough to explain the underlying rhythmic structure.An advantage of the GAT method is that it provides the frmsd as a measure for goodness of fit.This can be used to statistically compare different patterns in terms of their pulse fidelity, albeit only to pulses of similar frequency, since the frmsd is frequency-dependent.

Phase portraits: how to visualize repeating rhythmic patters
Unlike IOI histograms, phase portraits visualize the durations of all IOIs of a pattern, while taking their firstorder sequential structure (i.e.adjacent IOIs) into account.Pairs of adjacent IOIs hereby serve as x-and y-  The same events converted to a time series and convoluted with a narrow npdf.(c) At zero lag, the stationary copy of the time series (blue, continuous line) and the timelagged copy (red, dotted line) overlap perfectly.For the sake of the example the time-lagged copy is only shown for the first six events (i.e. the first repetition of the sub-pattern).(d) At a time lag of 0.5 s none of the events overlap, that is, across the whole time range at least one of the two signals is zero.(e) At 1 s lag, events 1 and 2 of the lagged copy partially overlap with events 2 and 3 of the stationary signal.(f) At a lag of 2 s, three of the events overlap, resulting in a higher peak ('f' in h).The peak is also narrower than the one at 1 s lag, indicating that the events overlap more closely.(g) Autocorrelation function of the isochronous sequence, depicted in Fig. 2b.(h) Autocorrelation function of the rhythmic sequence, seen in Fig. 2c.The letters c-f point to the lags that are depicted in (c-f).(i) Autocorrelation function of the mora sequence (see Fig. 2e).
coordinates, respectively.A line connects coordinates of neighboring pairs.Together, these lines form a continuous trajectory on a two-dimensional plane.In the 'rhythmic sequence' (Fig. 2c), for example, the first two IOIs are both 1 s long, hence a dot is plotted at coordinates (1,1) (Fig. 6a).The next dot is plotted at coordinates (1,3), corresponding to the second and third IOI, which are 1 and 3 s, respectively.The trajectory connects these coordinates and moves from (1,1) to (1,3) to (3,2), and so on until it reaches (1,1) again at the end of the first sub-pattern (Fig. 6b-d).As the pattern repeats, the trajectory traces the same path three more times, albeit with slight deviations due to the noise in the timing of events (Fig. 6e).Geometrical regularities in phase portraits correspond to rhythmic regularities in the acoustic patterns (Ravignani, in press;Ravignani et al. 2016).A repeated rhythmic pattern corresponds to similar superimposed polygons.A cyclical permutation of a rhythmic pattern (e.g. from 1,3,2,2 to 2,2,1,3) will produce the same polygon from a different starting point.Reversing the pattern (e.g. from 1,2,3 to 3,2,1) will result in the same, though rotated, polygon.Patterns that are palindromic on any phase (the 'rhythmic sequence', for example, has a palindromic cyclical permutation: 1,3,2,2,3,1) appear as polygons symmetrical with respect to the diagonal (Fig. 6h).An isochronous pattern or sub-pattern appears as a relatively dense cloud of nearby points (Fig. 6g).Some other structural regularities, like the fact that sets of four IOIs in the 'stress pattern' add up to the same duration, might go undetected and resemble a 'random pattern' (Fig. 6f,i).

Recurrence plots: visualizing pairwise similarities among all intervals in a pattern
Recurrence plots are another descriptive method for visualizing rhythmic structure, with the potential to reveal repeating sub-patterns at a glance.These plots show when a time series revisits the same regions of phase space (Dale et al. 2011;Coco and Dale 2014).Recurrence plots can be used to visualize any time series or-as shown in Fig. 7-the sequence of IOIs.As such, the recurrence plot is a two-dimensional matrix in which the similarity between any two IOIs is color-coded.The color code can be a gradient or, as shown here, monochromatic, where a black square represents similarity between two IOIs above a particular threshold.The IOI indices are noted on both axes of a recurrence plot in sequential order.Looking at the bottom row from left to right, the position of black squares indicates the sequential positions of IOIs that are similar to the first IOI, the second row to the second IOI, and so on.
The plot for the 'isochronous sequence' is all black, indicating that all IOIs are similar to each other (Fig. 7b).The repetition of the sub-pattern (6 IOIs) of the 'rhythmic sequence' is plotted as repeating patterns in the horizontal and vertical directions.As the sub-pattern is palindromic, the plot is symmetrical with respect to both diagonals (Fig. 7c).Larger black patches, as seen in the 'mora sequence' (Fig. 7e), indicate that a number of neighboring events share a similar IOI.Like phase portraits, recurrence plots visualize sequential structure, but fail to capture higher level structures, such as the equal cluster duration in the 'stress pattern' (Fig. 7d).
Step-by-step visualization of the phase portrait based on the rhythmic sequence (a-e) and phase portraits of the five example patterns (f-j).Pairs of adjacent IOIs serve as x-and y-coordinates, respectively and a line connects coordinates of neighboring pairs.(f) The random pattern shows no discernable geometrical structure.(g) The isochronous sequence appears as a dense cloud of nearby points.(h) Repetitions of a sub-pattern as in the rhythmic sequence appear as superimposed polygons, slightly shifted due to the jitter introduced by the Gaussian noise.(i and j) Higher order regularities like the constant cluster duration of the stress sequence do not appear in the phase portrait.
Figure 5. Visualization of the pulse GAT method (a-d), and the result of its application to four of the example patterns (e-h).Note that the y-axis in (e-h) is inverted.(a-d) Depicted are the first nine events of the rhythmic sequence (black vertical lines) and the deviations (d1-d9, blue lines) to their nearest element of the isochronous grid (red dotted lines).Example grid frequencies are 0.5 Hz (a and b) and 1 Hz (c and d).(e) The isochronous sequence, where all events occur roughly 2 s apart, best fits a pulse of 0.5 Hz.(f) In the rhythmic sequence all IOIs are multiples of one (1, 2 and 3), so a pulse of 1 Hz has the lowest frmsd (see also d).(g and h) In the mora and stress sequences all IOIs are multiples of 0.25 s (0.25, 0.5, 0.75, . ..), so a pulse of 4 Hz fits best to both.
2.10 Time series and regressions: can the structure in a pattern be described by a linear equation?
Time series analysis denotes a broad range of statistical methods used to extract information from data points ordered in time.Time series analysis encompasses some of the techniques described above, namely autocorrelation, cross-correlation, and Fourier analysis.In addition, autoregressive moving average models (ARMA) are promising time series techniques for speech rhythm (Jadoul et al. 2016), until now commonly employed in ecology, neuroscience, demography, and finance (Hamilton 1994).While autocorrelation probes the existence of some linear relationship between intervals in a pattern, ARMA is used to test for, model, and quantify this linear relationship.In brief, an ARMA model is a parametric, linear description of a time series: one IOI is expressed as a linear combination (i.e.weighted sum) of previous IOI values, possibly previous values of another time series (e.g.intensity of previous units of speech/vocalization), and random noise.In principle, a number of metrical stress patterns in human speech could be captured by the equation: where a, b, and c are the (empirically estimated) model parameters, and e t is the random error.For instance, if , and c ¼ 0, the equation becomes and it describes both iambic and trochaic meters, namely an alternation of one short and one long interval, corresponding to a strong-weak, or weak-strong, alternation in stress.(The reader can try this herself by plugging IOI tÀ1 ¼ 1 in Equation ( 3) and iterating, i.e.
the resulting IOI t becomes the next IOI tÀ1 .)Instead, if c ¼ À1, then corresponding to the dactyl meter, that is, one long interval followed by two short ones.ARMA models are extremely powerful but, apart from few exceptions (Gamba et al. 2016;Jadoul et al. 2016), still scarcely used in human phonology and animal communication.

Between-pattern analytical techniques: comparing rhythms
3.1 Cross-correlation: are patterns linearly related?
The process of cross-correlation is similar to the autocorrelation described above, except that it correlates two different signals instead of two copies of the same signal.This makes it a useful tool to find common rhythmic properties in multiple sequences.
The cross-correlation function of two isochronous sequences (Table 3), one with noise added ('noisy-isochronous') and one without ('isochronous'), is similar to the autocorrelation function, except that the function is smaller than one at lag zero, as the events do not perfectly overlap (Fig. 8b, middle column).Cross-correlating two different 'mora sequences' reveals a common rhythmic property: a subset of events occurs regularly every 3 s in both sequences (Fig. 8d).

Multidimensional time series: how are patterns linearly related?
The ARMA models presented above for pattern can be used to model the linear relationship between two patterns' IOIs.To test for temporal structure between two individuals vocalizing synchronously, performing turn-taking, etc., the ARMA equation for one individual can be enriched by past IOI terms of the other individual (e.g.IOI 0 tÀ1 and IOI 0 tÀ2 ), leading to the equation: Statistical estimation of parameters a; b; c; b 0 ; and c 0 allows in turn to draw inference about the relative contribution of the two individuals to each other's timing.

Granger causality: can one pattern structure help predict the other?
Based on ARMA modeling, one can test whether one time series significantly affects the other, using a test for Granger causality (Hamilton 1994).This technique tests whether future values of a target time series are better predicted by considering values of another time series, rather than using past values of the target series alone.
In terms of equation ( 5), Granger causality corresponds to testing whether b 0 IOI 0 tÀ1 þ c 0 IOI 0 tÀ2 contributes to the statistical prediction of IOI(t).Granger causality has been successfully applied in neuroscience and economics, and it is being increasingly employed to analyze production of rhythmic patterns in humans and other animals (Seth 2010;Fuhrmann et al. 2014;Gamba et al. 2016).
3.4 Circular statistics: how does a pattern relate statistically to a fixed-pace clock?
Circular statistics (or directional statistics) offer a set of techniques that are useful when dealing with data from a circular distribution, like compass direction and time of day (Fisher 1995).In such distributions both ends of the scale have the same value, that is, they 'wrap around' and can be mapped to a circle.On a linear scale, for example, the mean of 355 and 5 would be 180 , which makes little sense when calculating, for example, compass directions.The circular mean on the other hand is 360 , which is identical to 0 .
Circular statistics can be used to compare any sequence against a known isochronous pattern.In such cases the distance of all events in the sequence to their nearest event in the isochronous pattern can be expressed as a circular distribution.An event that exactly coincides with an isochronous event is defined to have an angular direction of 0 .Events that lag behind the nearest isochronous event have positive angles >0 (i.e.positive delay), and events that precede have negative angles (equivalently, positive angles of <360 ).
The distribution of events can be visualized on a socalled rose plot, which is the circular equivalent of a linear histogram (Fig. 8, right column).An arrow can be added to mark the circular mean vector, which indicates the mean direction and has a length between 0 (events are uniformly distributed) and 1 (events concentrate on a single direction).(Fisher 1995;Berens 2009) Testing hypotheses on periodically occurring events Necessary for some data types; using classical statistics would be a mistake A number of assumptions, concerning periodicity of events generating the data, are required Several significance tests exist to evaluate the circularity of a distribution (Zar 2010).The Rayleigh z-test can determine whether the sample data comes from a uniform distribution; For instance, the deviations of the random sequence from the isochronous pattern are uniformly distributed (Fig. 8a, P ¼ 82.7, z ¼ 216, n ¼ 24).The z-test assumes that the data is unimodally distributed and sampled from a von Mises distribution, the circular equivalent of a normal distribution.Kuiper's test is the circular analogue of the Kolmogorov-Smirnov test and can be used to test the difference between two distributions, for example, the sample data against a von Mises distribution.The bimodal distribution of the rhythmic pattern (Fig. 8c) for example differs significantly from a von Mises distribution (Kuiper's test, P < 0.005, V ¼ 336, n ¼ 24).If an expected mean direction is known beforehand (e.g.0 in synchronization experiments), the more precise V-test can be used to test against uniformity.If directions are not uniformly distributed and the mean direction does not significantly differ from 0 , the patterns are synchronous (V-test, P < 0.001, V ¼ 23.8, n ¼ 24; Fig. 8b, isochronous pattern).

4.
Where can these methods be fruitfully used?
Here we suggest some avenues of research where to apply the methods described above.
First, the field of birdsong bioacoustics should put more emphasis on temporal and rhythmic, not only Rose plots show the mean vector as a thick black line.The mean vector can have a length between 0 (uniform distribution) and the radius of the plot (all vectors have the same direction) (a) Comparison of a random pattern and an isochronous pattern (without added noise).(b) Comparison of two isochronous patterns, one with added noise (top, 'noisy-iso'), the other without (bottom, 'iso').(c) Comparison of a 'rhythmic' pattern and an isochronous pattern (without added noise).(d) Comparison of two different mora patterns (mora A, top and mora B, bottom).In the rose plot for the mora patterns each of the twenty sections is split between values from mora A (light gray, dotted mean vector) and mora B (dark gray, solid mean vector).Both patterns are compared against an isochronous sequence with IOI ¼ 3 s.
spectral or positional, properties of the signal.Building on recent findings, songs from different birdsong species could be compared to temporal structures hypothesized a priori or across species (Benichov et al. 2016;Norton and Scharff 2016;Spierings andten Cate 2016, Janney et al. 2016), ultimately constructing 'rhythmic phylogenies'.Fourier analysis, GAT pulse matching and circular statistics might be particularly suitable to uncover isochronous structure in birdsong and human speech.
Second, the comparative study of vocal complexity has traditionally encompassed species capable of vocal production learning, putting relatively more emphasis on animals' abilities to modify spectral, rather than temporal, properties of the vocalizations (e.g.Kershenbaum et al. 2014).The methods presented here could be used to analyze rhythmic properties of acoustic behavior in animals with limited vocal learning, which however have potential for temporal flexibility, for instance, primates' chorusing (Fedurek et al. 2013;Gamba et al. 2016), sea lions barks (Schusterman 1977, Ravignani et al. 2016), ape drumming (Ravignani et al. 2013;Dufour et al. 2015), etc. Phase portraits, autocorrelation, circular statistics, and recurrence plots could be used, among others, to uncover the possibly flexible temporal structures in these acoustic behaviors.
Third, conversational turn-taking across human languages and animal species is a topic gaining increasing scientific interest (Levinson 2016;Vernes 2016).This area is, however, missing an advanced and unified quantitative analytical framework.We suggest that the methods described here can be used for exactly such purpose, enabling comparisons of temporal structure in communicative interaction across human languages, modalities, and species.Time series analysis, in particular ARMA models and Granger causality, might be suitable to investigate turn-taking.
Fourth, the methods we present make almost no topdown assumptions about the structure present in the signal.Hence, they can be employed to investigate phonology, individual timing, and rhythm in modalities and domains other than speech (e.g.Dale et al. 2011;Fan et al. 2016), most notably in sign languages ( de Vos et al. 2015).Distributional indexes and all other structural measures in Table 2 could be used to quantify temporal structure in a given modality or domain.
Fifth, and finally, the statistical methods presented can be used to test for cross-modal influences in multimodal communication (e.g.Tilsen 2009).Temporal interdependencies between modalities could arise, for instance, in the co-evolution of human rhythmic vocalizations and coordinated movement (Laland, et al. 2016;Ravignani and Cook 2016) or the simultaneous song and 'dance' display of some bird species (Dalziell et al. 2013;Ullrich et al. 2016).Levenshtein distance (Post and Toussaint 2011;Ravignani et al. 2016), and other dyadic techniques in Table 3 could be used to relate temporal structure between modalities.

Figure 2 .
Figure2.Five different example patterns that differ in their rhythmic structure (a-e, left column), as well as histograms of their inter-onset intervals (IOI, right column).(a) Random sequence: The time points of events of the first sequence are pseudorandomly drawn from a uniform distribution.(b) Isochronous sequence: The second sequence is isochronous, that is, IOIs of successive events are equal for all events (in this case 2 s).This sequence with added noise is taken to loosely represent the hypothesized syllable-timed structure of some languages(Grabe and Low 2002).(c) Rhythmic sequence: The third pattern contains four repetitions of a sub-pattern consisting of six events (IOIs: 1, 1, 3, 2, 2 and 3 s).(d-e) The final two sequences are loosely based on concepts of timing division in language: stress timing and mora timing.(d) The stress sequence consists of clusters of four events each.The duration of all clusters is 8 s and the pattern of IOIs within a cluster are different each time(e.g.[0.5,1.5,3.75,2.25],[2,1,3.5,1.5], . ..).(e) The mora sequence contains clusters of 3 s duration each, that either contain one or two events (e.g.[1,2], [3], [2.25,0.75], . ..).Gaussian noise with a SD of 0.04 s (isochronous and rhythmic sequence) or 0.02 s (stress and mora sequence) was added to the timing of the events.

Figure 3 .
Figure3.Visualization of the autocorrelation process (a-f) and the resulting autocorrelation function for three of the example patterns (g-i).(a) The first nine events of the rhythmic sequence depicted in Fig.2c.(b) The same events converted to a time series and convoluted with a narrow npdf.(c) At zero lag, the stationary copy of the time series (blue, continuous line) and the timelagged copy (red, dotted line) overlap perfectly.For the sake of the example the time-lagged copy is only shown for the first six events (i.e. the first repetition of the sub-pattern).(d) At a time lag of 0.5 s none of the events overlap, that is, across the whole time range at least one of the two signals is zero.(e) At 1 s lag, events 1 and 2 of the lagged copy partially overlap with events 2 and 3 of the stationary signal.(f) At a lag of 2 s, three of the events overlap, resulting in a higher peak ('f' in h).The peak is also narrower than the one at 1 s lag, indicating that the events overlap more closely.(g) Autocorrelation function of the isochronous sequence, depicted in Fig.2b.(h) Autocorrelation function of the rhythmic sequence, seen in Fig.2c.The letters c-f point to the lags that are depicted in (c-f).(i) Autocorrelation function of the mora sequence (see Fig.2e).

Figure 4 .
Figure 4. Power spectra of four of the example patterns.The power spectrum is the result of a Fourier transform, which decomposes a signal into waves of different frequencies.It shows in what magnitude (y-axis) the different frequencies (x-axis) contribute to the signal.

Figure 7 .
Figure 7. Recurrence plots of the five example patterns (a-e).Each square reflects the similarity of two IOIs.Black squares represent pairs of IOIs whose difference is below a certain threshold (here: 0.3 s).Recurrence plots are always symmetrical on the diagonal.(a) The random sequence shows no regular structure.(b) The plot for the isochronous sequence is all black, as all IOIs are similar to each other.(c) The repeating sup-patterns of the rhythmic sequence appear as repeating patterns in both dimensions.(d) Higher order regularities of the stress pattern do not appear in the recurrence plot.(e) Repeating IOIs of similar duration (3 s, in the middle of the mora pattern) appear as larger black patches.

Figure 8 .
Figure8.Cross-correlation functions (middle) and rose plots (right) for four sets of two patterns each, with different rhythmic structure (a-d).Rose plots show the mean vector as a thick black line.The mean vector can have a length between 0 (uniform distribution) and the radius of the plot (all vectors have the same direction) (a) Comparison of a random pattern and an isochronous pattern (without added noise).(b) Comparison of two isochronous patterns, one with added noise (top, 'noisy-iso'), the other without (bottom, 'iso').(c) Comparison of a 'rhythmic' pattern and an isochronous pattern (without added noise).(d) Comparison of two different mora patterns (mora A, top and mora B, bottom).In the rose plot for the mora patterns each of the twenty sections is split between values from mora A (light gray, dotted mean vector) and mora B (dark gray, solid mean vector).Both patterns are compared against an isochronous sequence with IOI ¼ 3 s.

Figure 9 .
Figure9.Flux diagram to select the most appropriate technique for rhythm analysis.Answers (regular font) to specific questions (in bold) guide the reader toward the appropriate technique and corresponding subsection (in italics).

Table 1 .
Definition of crucial concepts in comparative rhythm research

Table 2 .
Analytical techniques to unveil temporal structure in one pattern, including their specific function

Table 3 .
Analytical techniques, including their specific function, to compare temporal structure between two or more patterns , be it its origin, mechanisms, or function.Here we present a number of techniques to analyze rhythmic temporal structure in animal vocalizations,