Integration of anatomy ontologies and Evo-Devo using structured Markov chains suggests a new framework for modeling discrete phenotypic traits

Modeling discrete phenotypic traits for either ancestral character state reconstruction or morphology-based phylogenetic inference suffers from ambiguities of character coding, homology assessment, dependencies, and selection of adequate models. These drawbacks occur because trait evolution is driven by the two key processes – hierarchical and hidden – which are not accommodated simultaneously by the available phylogenetic methods. The hierarchical process refers to the dependencies between anatomical body parts, while the hidden process refers to the evolution of gene regulatory networks (GRNs) underlying trait development. Herein, I demonstrate that these processes can be efficiently modeled using structured Markov chains equipped with hidden states, which resolves the majority of the problems associated with discrete traits. Integration of structured Markov chains with anatomy ontologies adequately incorporates the hierarchical dependencies, while use of the hidden states accommodates hidden GRN evolution and mutation rate heterogeneity. This model is insensitive to alternative coding approaches which is shown by solving the Maddison’s tail color problem. Additionally, this model provides new insight into character concept and homology assessment. The practical considerations for implementing this model in phylogenetic inference and comparative methods are discussed.

implies that the actual trait evolution is hidden behind a "curtain" from the direct observation of morphology. In most cases, the observer (i.e. scientist) has no clue of how this process is operating, and only knows what is going on behind the "curtain" from the outcome of the process (i.e. morphological traits). The hierarchical process refers to the hierarchical relationships between traits that arise due to hierarchical dependencies between anatomical parts.
For example, digits and characters associated with them are located on limbs; loss of limbs during evolution simultaneously causes the loss of digits. Additionally, hidden processes of GRN evolution -through interacting cascades of genes -can also result in hidden dependencies among observable morphological traits.
The hidden and hierarchical processes are not accommodated by available phylogenetic methods simultaneously regardless the approach used for trait analyses, be it parsimony (Lee and Bryant 1999;Fitzhugh 2006;Brazeau 2011) or traditional Markov models. By proposing a new framework, I will demonstrate that the simultaneous inclusion of these processes, to a large extent, eliminates ambiguities associated with trait modeling. This framework is the extension of the traditional Markov model approach commonly used for modeling traits (Lewis 2001;O'Meara 2012). The most common version, hereafter referred as simple Markov chains (SMC), implies that a discrete character is a continuous-time Markov chain that moves sequentially from one character state to another over the course of evolution. Such a Markov chain is defined by the transition rate matrix containing infinitesimal rates of change between the states, and a base frequency vector specifying the initial probabilities of states at the root of a phylogenetic tree [see e.g., Huelsenbeck et al. (2003) for details].
The new framework extends SMC using the theory of structured Markov chains [StMC, (Nodelman et al. 2002)] and hidden Markov chains [HMC, (Beaulieu et al. 2013)] to accommodate complex evolutionary space and anatomical dependencies among traits. To justify the proposed framework, I will show how the anatomical dependencies can be efficiently incorporated into model using structured Markov chains and anatomy ontologies. This integration provides the solution for the well-known Maddison's tail color problem (Maddison 1993;Hawkins et al. 1997). Next, I will discuss how the correspondence between traits and their GRNs can be modeled using HMC. Finally, the unified framework for character modeling will be proposed and practical considerations on trait modeling and phylogenetic inference will be given. The central focus of this paper is morphological traits; however, the presented results can be directly extended to other discrete traits of phenotype, such as behavior. Encoding traits is a crucial step in constructing discrete morphological characters.

MODELING HIERARCHICAL PROCESS USING STRUCTURED MARKOV CHAINS
Although for some traits this can be straightforward, many real-life situations lack an unambiguous coding approach. In practice, this means selecting between (1) ordered or unordered characters, (2) coding schemes (a set of binary characters, one multistate character or a mixture of both), and (3) a way to code inapplicable observations (using reductive coding "?" versus a separate state). Each of the alternatives have their own pros and cons but none of them, due to the lack of consensus in the published studies (Maddison 1993;Pleijel 1995;Hawkins et al. 1997;Strong and Lipscomb 1999;Forey and Kitching 2000;Brazeau 2011), can be regarded as the ideal one. Additionally, there is no general consensus on the distinction and validity between character and character state. Some studies insist that the distinction is important (Pinna 1991;Hawkins et al. 1997;Wagner 2015), some suggest that both concepts are the same (Patterson 1982), while the others (Sereno 2007) view character states as mutually exclusive observations that have to be combined to perform analysis. The use of different coding approaches drastically affects the interpretation of results (Brazeau 2011).
The incorporation of anatomical dependencies between traits using structured Markov models resolves the conundrum associated with coding schemes and yields an invariance under alternative coding approaches. This invariance implies that any coding scheme produces the same result. I demonstrate the resiliency of my modeling framework by revisiting the exemplar Maddison's tail color problem (Maddison 1993;Hawkins et al. 1997) that seeks the optimal scheme for scoring tail traits in species which can have a tail with either blue or red color, or have no tail at all (Figs. 1, 2). The Maddison's problem exemplifies a common situation of scoring complex traits with dependencies between traits, and any solution to this problem can be extrapolated to the majority of ambiguous coding cases encountered in practice. The core ingredient of the proposed approach is the explicit incorporation of anatomical dependencies which can be inferred from ontological structure of anatomy ( Fig. 1), therefore I refer to this approach as "structured Markov models informed by anatomy ontology". In this chapter, I will begin with the reviewing the general properties of StMC, then I will demonstrate how various  (Nodelman et al. 2002), also known as continuous-time Bayesian Networks (Shelton and Ciardo 2014), arises from simple Markov chains. The only difference between the two is that the structured Markov models are equipped with a specific parametrization of the rate matrix that accommodates conditional dependencies between characters.
The structuring of Markov chains allows combining two or more initial characters into a single character. The initial characters can be combined either as independently or dependently evolving; the latter enables modeling correlated evolution across states between different initial characters. The states of the initial characters will be modified states in the combined character.
The rate matrix of the combined character is constructed out of the matrices X and Y using the equation (2) by merging the two matrices in a mathematically valid way (Supplementary Material, section S1): where and are the identity matrices of the same dimension as the matrices Y and X respectively, and ۪ denotes the Kronecker product. Given this, Z is: ൌ The combined matrix Z defines the four-state combined character Z. The four states of the character Z correspond to all possible permutations of states in the initial characters X and Y, which are {x 1 y 1 , x 2 y 1 , x 1 y 2 , x 2 y 2 }. The right-diagonal cells of this matrix are populated with zeros indicating that only one state of the initial characters can change during the infinitesimal time interval. The independent evolution of the initial characters constrains this matrix to possess some rate symmetries that do not hold otherwise (see the next section).
Two characters: general case of correlation. -The dependent evolution of the two characters X and Y implies that some states in one character are correlated with some states in the other. In this case, the combined character Z is characterized by the same states {x 1 y 1 , x 2 y 1 , x 1 y 2 , x 2 y 2 } as in the independent case but the rate symmetries in the combined matrix Z are different.
In character Z, each state consists of the two elements: the first corresponding to a state of X, the second to a state of Y. Let us denote by * any element in a combined state. For example, notation x 1 * indicates either state x 1 y 1 or x 1 y 2 of Z. Independent evolution implies that the transition rates x 1 *→x 2 * in Z must be the same as the transition rate x 1 → x 2 in the initial chain X (i.e., rate ߙ ); the same should apply for the rates x 2 *→x 1 * that must be equal to the rate x 2 → x 1 in X (rate ߚ ); and analogous symmetries must hold for pairs *y 1 → *y 2 and *y 2 → *y 1 whose rates have to be equal to y 1 → y 2 and y 2 → y 1 in the initial chain Y respectively. If these equalities do not hold simultaneously, then the evolution of the two initial chains is correlated. So, in the most sophisticated case of the correlated evolution, the rate matrix Z has all rates different except for the right-diagonal ones that are set to zeros as in the independent case: The analytical derivation of these matrix is given in the Supplementary Material (section S2).
In this matrix, the transitions x 1 y 1 → x 1 y 2 and x 1 y 2 → x 1 y 1 are prohibited and equal to zero due to this particular type of dependency. there is a two-state character X {x 1 , x 2 } specified by the transition rate matrix X as in (1); we can decompose the character X into two separate characters X 1 and X 2 which denote presence or absence of the initial states of X. So, each separate character is interpreted as follows: X 1 {x 1 absent , x 1 present }, and X 2 {x 2 absent , x 2 present }. Since states x 1 and x 2 are mutually exclusive as they are the parts of the same initial character, the transitions in X 1 and X 2 occur synchronously: for example, the transition x 1 absent → x 1 present in X 1 immediately causes the compliment transition x 2 present → x 2 absent in X 2 . The scale-free property of character suggests that the reverse operation -combining X 1 and X 2 into one character -is possible. Let us denote this combination of X 1 and X 2 by Z. The character Z is supposed to have four states which are permutations of the original states: {x 1 absent x 2 absent , x 1 present x 2 absent , x 1 absent x 2 present , x 1 present x 2 present }. The synchronous transitions in X 1 and X 2 precludes using the previous approaches to combine X 1 and X 2 . This happens because the structure of the previous rate matrices allows only one change over the infinitesimal time interval.
In contrast, the synchronous evolution assumes the opposite -one change transitions have to be prohibited, while the two change transitions must be allowed. Additionally, any transitions associated with the states {x 1 absent x 2 absent } and {x 1 present x 2 present } must be also prohibited as these states cannot exists given the initial condition. This yields the following transition matrix of the character Z: Further, without loose of generality this combined matrix can be reduced to that of the initial character X (1). This property of StMC provides an invariance for character decomposition and subsequent merge.
Combining arbitrary number of characters. -The techniques shown above can be extrapolated to construct combined rate matrices for any arbitrary number of initial characters and character states. To combine n independently evolving characters one needs to successively repeat the equation (2) n-1 times until all initial characters are combined. For example, in the case of the three initial characters A, B, C, the first step constructs combined matrix for characters A and B (i.e. AB) and the second steps uses matrices of AB and C to construct the final combined character. In equation form this can be expressed as: The order at which matrices are combined does not matter. In the case of the correlated character evolution, the final matrix can be constructed as that for the independent characters and then modified to accommodate the desired pattern of correlation.
The combined matrices of coevolving characters have peculiar symmetries; although such matrices can be enormous, the vast majority of their cells are zeros, while the transition rates are located along the secondary diagonals ( Fig. 3a-g). For a chain of n coevolving characters with the equal number of states ω the proportion of non-zero elements in the large matrix is: The number and pattern of secondary diagonals populated with transition rates increases as the total number of states grows . If all elementary characters share equal quantity of states, the total number of secondary diagonals is (Fig. 3a,b,f,g). Each secondary diagonal encompasses rates from a single elementary character; asymmetries in the rates within a character split diagonals into rate groups ( Fig. 3c- Maddison's problem has been intensively discussed in the framework of parsimony but the coding consensus is still lacking (Brazeau 2011). I reduce this problem to assesses the alternative schemes used for coding tail traits in the three species with: (1) absent tail, (2) blue tail, and (3) red tail (Fig. 1). Traditionally, there exist three main schemes (Hawkins et al. 1997) of scoring these tail traits (Fig. 2).
The first scheme (Fig. 2a) uses two characters: (i) tail presence with two states, and (ii) tail color with three states, to encode the observations. In this scheme, the character (ii) can be reduced to only two states (blue and red) if its state "absent" is coded as inapplicable observation using "?". In the current context, the distinction between these two flavors is irrelevant.
The second scheme ( Fig. 2b) employs three binary characters: (i) tail present, (ii) blue tail present, and (iii) red tail present. This scheme can also have an alternative version that uses only characters (ii) and (iii) to encode the observations (Hawkins et al. 1997).
Finally, the third scheme ( Fig. 2e) uses one multistate character with three states (absent, blue, red) to simultaneously summarize the observations. In phylogenetic inference, one of the prevailing ways to deal with this problem is to use the scheme #2 with inapplicable coding (Maddison 1993;Hawkins et al. 1997;Strong and Lipscomb 1999). However, this coding scheme was shown to be flawed (Maddison 1993;Strong and Lipscomb 1999). In comparative phylogenetics, the preferable coding scheme is selected to best fit the needs of an analysis.

Solution using StMC. -All schemes become invariant if anatomical dependencies between
characters are incorporated using the StMC. These ontology-informed dependencies imply that tail color (either blue or red) depends on the tail presence; if tail is absent, then color trait is "switched-off". Below, I successively incorporate these dependencies using the alternative coding schemes which, in all instances, produces the scheme #3 with the special parametrization of the rate matrix.
First, let us consider the scheme #1 that treats tail traits as two binary characters. Given the invariance under character decomposition, the synchronous dependency between tail {absent} and tail color {absent} (Fig. 2a) is a redundant observation that can be omitted without loose of generality. So, each character can be represented by a two-state rate matrix: where TL is a matrix for tail presence {a -absent, p -present}, and CR is a matrix for tail color {r -red, b -blue}. The dependencies between these two characters ( Fig. 2c) can be incorporated using the "switch-off" correlation, which results in the following combined rate matrix: This matrix contains redundancy: the states ar and ab correspond to the same observation specifying absence of tail since tail color cannot be observed when the tail is absent. Interestingly, this matrix can be reduced by aggregating states ar and ab using the rule of strong lumpability (see the "Cases of lumpable chains" section) that gives the three-state matrix: This final matrix has specific symmetries between the rate parameters characterizing the dependencies between the states. So, when anatomical relationships are incorporated, the scheme #1 of the two characters, collapses to scheme #3 of one character, and the latter becomes equipped with the special parametrization of the matrix. In fact, the parametrization of such matrix should not be necessarily restricted to that in matrix (10) The same collapse happens for the coding scheme #2 that uses three binary characters to encode tail traits. In respect to anatomy, two characters in this scheme (red tail presence and blue tail presence) are dependent on the presence of the tail. There are two synchronous changes in these characters: (i) the tail absence causes absence of the blue and red color, while (ii) the absence of the red color causes presence of the blue color (Fig. 2b). Based on the properties of the synchronous changes, these dependencies can be straightforwardly reduced to those of the scheme #1 without loose of generality. In turn, the scheme #1 can be further collapsed, as shown above, to the rate matrix (10).
To sum up, the incorporation of ontological information from anatomy using StMC produces coding invariance regardless of the scheme used. This invariance eliminates ambiguity of character coding. If the given alternative schemes are modelled onto an existing phylogenetic tree, the result is expected to be the same. So, these alternatives are just different ways of representing the same morphological observations. Moreover, incorporation of dependencies does not require using inapplicable coding, thus avoiding uncertainty associated with it. At the same time, the incorporated dependencies have biologically meaningful interpretations as they are imposed by the structure of organismal anatomy. As shown above, the ontological knowledge is important for constructing realistic models of trait evolution (Fig. 1). The importance of integrating anatomy ontologies for character analysis has been recently emphasized and discussed (Vogt 2016(Vogt , 2017a(Vogt , 2017b. At present, a computer-assisted way to formalize ontologies is turning them into a promising tool for arranging and managing knowledge of organismal anatomies (Deans et al. 2015). The formalized ontologies are available for several taxa [e.g., Mungall et al. (2012) and Yoder et al. (2010)] providing an opportunity to directly link anatomy ontologies with character matrices. Presently, this linking can be performed using the specialized software (Balhoff et al. 2010), while the dependency data can be automatically extracted from semantic descriptions using anatomy ontologies (Dececchi et al. 2016). Also, the incorporation of the dependencies can be done by can be viewed as a mapping of Markov chain state(s) characterizing GRN evolution to another set of states corresponding to the states of the discrete character. This mapping yields one of the three types of correspondence between GRN and trait primary homology (Fig. 4).
Type 1: one-to-one correspondence. -This is a case when there is one-to-one correspondence between GRN states and those of a discrete character (Fig. 4a) indicating that primary homology hypotheses correctly identify underlying GRN states. This case is ideal but it is far from being realistic, as morphological states are likely to have a complex unobservable GRN space. Nevertheless, this case is possible when changes in morphology are controlled by a single gene.
Type 2: many-to-one correspondence. -It is the case when GRN space is larger than morphological space meaning that one morphological state consists of many GRN states (Fig.   4b). This case seems to be very likely as the majority of morphological traits are realizations of the complex mechanisms controlled by multiple genetic factors  which are largely unknown to the researcher. Numerous Evo-Devo studies confirm this type of the correspondence. For instance, some males of Drosophila have a pigmented spot on the wing which they use in courtship display. Given that the spot shapes are similar across many species, it would be logical to encode this trait as a character containing two states: "spot present" and "spot absent". The study of Prud'Homme et al. (2006) demonstrates that in the clade of 29 Drosophila species this spot has been gained and lost multiple times. In all analyzed species, the pigmented spots were products of the expression of gene yellow. Interestingly, the loses of pigmentation were caused by parallel inactivation of the same cis-regulatory element controlling yellow expression. Contrary to that, the independent gains of spot were caused by co-option of different cis-regulatory elements associated with the yellow gene. This unequivocally suggests that evolution of such simple trait as wing spot is combinatorial at the underlying GRN level, and the GRN state space is larger than the observable one.
The type #2 correspondence may also arise when the space of morphological observations is underestimated due to the precision of morphological examination. This may happen when external structures are examined without referencing to skeletal structures (in vertebrates), or when external skeletal structures are examined without referencing to the underlying architecture of muscles (in invertebrates). For example, body elongation in salamanders is a result of convergent evolution that occurs by addition and elongation of vertebrae (Wake et al. 2011).
This may lead to similarly elongated body shapes by modifying different individual vertebrae. If body shape is studied without referencing to skeletal structure, then similarly elongated bodies must be considered to be homologues and coded with the same state. However, a thorough examination of skeleton will eventually find this coding misleading as the real space of observable morphological evolution is undoubtedly larger. In the model formalism, the salamanders and Drosophila cases are the same as they both assume the underestimation of the underlying evolutionary space.
Type 3: partial matching. -This type embraces cases when one complex trait governed by a single GRN is thought to represent two or more independent characters (Fig. 4c). So, different states of the same GRN are mapped onto states in different morphological characters which are separately analyzed suggesting that a focal discrete character does not necessarily exhibit an independent identity. In nature, this case seems to be common as it arises when evolution of states between separate characters is correlated. For instance, mouthpart traits in insects can undergo synchronous evolutionary changes when species get adapted to a new feeding substrate. Considering an anatomical element (e.g., mandibles) of the mouthparts as a separate character without reference to all traits of the mouthparts exhibits this type of correspondence since such character is correlated with other mouthpart traits. In some cases, the existence of correlation is obvious and can be retrieved from anatomy; however, cases with unobservable correlation, which cannot be inferred from the structure of anatomy or species biology, seem to be widespread.  Suppose, there is a four state S ={s 1 , s 2 , s 3 , s 4 } Markov chain that is defined by the base frequency vector π and for-by-four matrix Q whose entities are rates ‫ݍ‬ , (Fig. 6a). Let us assume, the aggregated chain is constructed by partitioning the state space S into two groups denoted by the partitioning scheme B ={{s 1 , s 2 }, {s 3 , s 4 }}, so the total number of states in the aggregated chain, defined by matrix ‫ۿ‬ , is two F ={f 1 , f 2 }. The analogous notification procedure can be extrapolated to any arbitrary Markov chain.
The sufficient and necessary condition for Markov chain to be strongly lumpable in respect to the given partitioning scheme and any base vector is that the row-wise sum of rates within one partition block of rate matrix must be the same for all rows within given partition block, and this property must hold for all blocks in the rate matrix (Kemeny and Snell 1960;Rubino and Sericola 2014 are the transition rates in the aggregated matrix ‫ۿ‬ (Fig. 6a).
Weak lumpability. -The weak lumpability allows lumping an original chain only under particular values of base vector (Kemeny and Snell 1960;Rubino and Sericola 2014). In other words, the weak lumpability imposes stricter dependencies between the base frequency vector and rate matrix. There is no straightforward way to find all possible dependencies between any arbitrary rate matrix and base vector that fulfill the conditions of the weak lumpability (Rubino and Sericola 2014); however, there exist a finite algorithm for elucidating the base frequency vector satisfying the conditions of weak lumpability given rate matrix and partitioning scheme Sericola 1993, 2014). Some sufficient conditions for weak lumpability are given in Kemeny and Snell (1960). An example of weakly lumpable chain is given in the Supplementary

Material (section S4).
Interesting case of weak lumpability arises when the initial vector of Markov chain coincides with the stationary distribution of that chain (meaning that the process is stationary). Since the molecular states space is significantly larger, we can assume that the aggregated phenotypic chain is composed of a few states and each state comprises large number of the molecular states.

Such chain is weakly lumpable in respect to
In the trivial case of equal evolutionary rates across all sites, the strong lumpability condition can be satisfied for numerous partitioning schemes that can be applied to the molecular rate matrix (Fig. 3b,g). Lumpability and reality. -Apparently, the exact conditions for strong and weak lumpability are unlikely to be encountered in nature due to their symmetry constraints in the rates and base frequency vector. Moreover, the weak lumpability is generally not relevant for modelling characters on phylogenetic trees as character reconstruction is largely insensitive to the base frequency vector (Yang 2006). Nevertheless, the case of weak lumpability when the original chain is stationary may occur on large trees. The stationary distribution might be an approximation to the character distribution on branches located closer to tips if the process is time-homogeneous. Since, these branches predominate on large trees, the stationary state of Markov chain will occupy majority of branches on this tree. In this respect, the stationary state will be a good approximation of the global dynamics, thereby allowing lumping the chain under any possible partitioning scheme. There are also no evidence supporting frequent occurrence of Trait homology and HMC. -In phylogenetics, every character statement is a hypothesis of primary homology (Hawkins et al. 1997). The "real" assessment of whether an observation in one species is homologous or homoplasious in respect to the other species can be done only through phylogenetic inference or reconstruction of character evolution (secondary homology). It is considered that a thoughtful statement of primary homology is a prerequisite for the successful comparative analysis and phylogenetic inference. Obviously, in simple Markov models and parsimony, the secondary homology is dependent on primary homology statement (Agnarsson and Coddington 2008;Brazeau 2011) since the construction of a discrete character is a subjective procedure. The discordance between primary and secondary homology, to a large extent, occurs due to the discordance between the morphological traits and GRNs. If a trait is modeled onto a known phylogenetic tree to reconstruct ancestral states and evolutionary rates, then the incorporation of hierarchical and hidden processes is straightforward.
The hierarchical processes can be incorporated by any software which allow specification of user-defined rate matrices. These rate matrices have to be parametrized using approaches summarized above to reflect the dependencies between anatomical parts. These dependencies can be retrieved from anatomy ontologies or prior knowledge of organismal anatomies. The flexible incorporation of hidden process, in the context of this study, is presently provided by only two software packages corHMM (Beaulieu et al. 2013;Beaulieu and O'Meara 2014) and RevBayes (Höhna et al. 2016).
CorHMM. -This likelihood-based method assumes that every observable state consists of a specified number of hidden states. The hidden states were originally proposed to model rate heterogeneity among the observable states (Beaulieu et al. 2013); however the hidden states can be equally interpreted as different GRN states. This method offers full flexibility to accommodate all necessary correspondences (type #2 and #3) between morphology and GRNs.
Although, it requires a priori specification of the number of hidden states and their mapping with the observable states, it allows testing different models by manually varying the number and mapping of the hidden states; so the best parametrization of the matrix can be selected using e.g., Akaike information criterion (Akaike 1974). Thus, the potential space of possible models can be explored. This flexibility is extremely important as the structure of transitions between hidden states as well as their number is unknown a priori, and therefore have to be tested during the analysis.
RevBayes. -This software provides a broad flexibility for developing new phylogenetic models using the graphical model concept and Rev language. The RevBayes supports implementation of hidden Markov models using expandCharacters("number of hidden states") method as well as specification of the user-defined rate matrices with fnFreeK() function. These two features are sufficient to incorporate the hierarchical and hidden process reviewed in this paper. Since the modelling capabilities of RevBayes are enormous and require programming in the Rev language, I direct interested reader to the original paper (Höhna et al. 2016) for the additional information.
HMC and data. -Testing between simple Markov chains and HMC to select the best model must be a prerequisite of any comparative analysis. However, it is worth mentioning that the performance of HMC depends on the number of taxa and thereby small trees are not likely to favor HMC over SMC (Beaulieu et al. 2013 are challenging to parametrize. This occurs because morphological states are not aligned across characters in a sense that "state 1" in one character is not the same as "state 1" in another. This issue poses a general barrier for modelling rate heterogeneity (Lewis 2001) as well as hidden processes in phylogenetic inference. Theoretically, HMC models can be used in phylogenetic inference with morphology but their implementation will be limited; it will likely require assignment of the same HMM with a priori specified hidden states to all characters of the same partition. This approach substantially restricts the possibility for exploring hidden space of evolution. However, the ontology-informed characters using structured Markov models can be implemented in inference. Presently, this can be done using the RevBayes software that allows using user-defined rate matrices with fnFreeK() function. In this respect, characters which are known to be interdependent have to be merged in one character and assigned to a separate partition. This approach does not require using inapplicable coding. Next, a user-defined rate matrix characterizing dependencies within such characters can be applied to this partition.

CONCLUSIONS
This paper lays out the theoretical consideration for improving the modeling of discrete morphological characters in phylogenetic context. The main suggestion of the paper from the theoretical perspective is to use structured Markov models equipped with hidden states to accommodate the complexity of processes driving the evolution of traits. The approaches summarized here can be used for developing new methods aiming at reconstructing correlated trait evolution and ancestral ontologies. Since the considerations summarized here are mainly theoretical, further empirical research is needed to understand the behavior of the proposed models.

SUPPLEMENTARY MATERIAL
The supplementary file is available from the BioRxiv (www.biorxiv.org).    D  e  p  e  n  d  e  n  t  a  n  d  i  n  d  e  p  e  n  d  e  n  t  c  o  e  v  o  l  u  t  i  o  n  o  f  s  e  v  e  r  a  l  c  h  a  r  a  c  t  e  r