A simulator for spatially extended kappa models

Summary: Spatial Kappa is a simulator of models written in a variant of the rule-based stochastic modelling language Kappa, with spatial extensions. Availability: The spatial kappa simulator is an open-source project licensed under the LGPLv3, with Java source, binaries and manual available at http://github.com/lptolik/SpatialKappa. Contact: oksana.sorokina@ed.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.

What is rule-based modeling?
Rule-based modeling uses rules to construct models of systems of interacting bio-molecules. The approach is especially effective for systems with combinatorial complexity, where relatively concise rule-sets may approximate information about a complex biochemical network with enough precision. Proteins that participate in signaling cascades are commonly composed of multiple subunits and binding domains. These domains can mediate more than one domain-domain interaction, and/or allow for the formation or repeated polymeric motifs, which results in many possible protein complexes. Domain-domain interactions, in turn, are governed by post-translational modifications that provide an additional level of variety. Therefore, when conventional modelling, such as ODEs, takes these elements of molecular structure into account -model sizes grow exponentially (or worse) with the number of interactions. Instead, graphs can be used to conveniently represent structured objects (such as molecules with there sites and states) and graph-rewriting rules can approximate their interactions. A graph-rewriting rule specifies an edge addition or removal when proteins are binding or unbinding. It can also change the state (label) of any component to represent post-translational modification. Each rule accounts for the class of reactions that chemicals with common structure could share. Only information that is important for a particular interaction is taken into account, so that the states and domains that are not involved are ignored. All the reactions that are implied by one rule will be parameterized by the same rate constant, which gives a significant reduction of the model complexity.

What is Kappa?
Kappa is a member of the rule-based language family. Its main simulator is KaSim (Feret and Krivine, 2012) now in version 3. To know how plain Kappa is used in a modeling context, the following short note is a good start (Danos, 2009). A longer article is also available (Danos et al., 2007a); see also this tutorial (Krivine et al., 2009).

What is this User Guide about?
This manual documents an extension of the Kappa language, called Spatial Kappa. Differently from Kappa which assumes that all agents in a model co-exist in an homogeneous and wellstirred volume, in Spatial Kappa, one can define compartments and rules for diffusion within and transport across them. The spatial Kappa language is backward compatible with basic Kappa. Potential users, already familiar with Kappa, should find the extension intuitive.
Existing Kappa models can be run without modifications -provided they do not incorporate directives related to causality analysis and causal flows. This allows one to develop a model in a space-less manner first, and to then introduce spatial aspects gradually.

Chapter 1 Kappa Language Extensions
Note on terminology: in the following 'voxel' means a subunit of a defined compartment; 'species' is used to refer either to a particular agent, or a partial complex of agents, or a full complex -when we need to be more explicit, we say agent, partial complex, or full complex; usual Kappa rules are sometimes called transform rules, as they entail the modification of their targets, to stress the difference with transport or diffusion rules which merely displace their targets.

Existing Kappa language
We start with a brief description of the current Kappa language as defined in the KaSim Reference Manual (Feret and Krivine, 2012). A formal description of the language including spatial extensions is given in Appendix A.
Basic elements in a model can be declared either as agents or tokens. Agents are used to represent complex objects with multiple binding sites that can bind/unbind and modify others. They have a discrete number of instances. Tokens correspond to simple unstructured objects. They cannot bind to other tokens and agents, and can only appear and disappear. They have continuous concentrations. An agent declaration contains its signature, that is to say the information about the agent that will be used in the model, such as its name, its interactions sites and their states. For instance, the line: %agent: A (x,y~u~p,z~0~1~2) declares an agent A with three binding sites x, y and z, where y can exist in two states (u and p, typically for phosphorylated and unphosphorylated) and z -in three states, 0, 1, and 2. It is possible to declare an agent with any number of sites, including no sites at all (in which case the agent essentially a token).
Declared agents are used in turn to define rules which describe their dynamics. A typical rule starts with a name (a string) and contains a left hand side and a right hand side expression together with a corresponding kinetic rate.

'my rule' kappa_expression -> kappa_expression @ rate
Rules can also be bi-directional, in which case one must provide two kinetic rates: The initial string, i.e. the name of the rule, is optional: A(state~old) -> A(state~new) @ 0.1 # Unnamed rule (Note above the use of a comment with comment character #.) Initial conditions are defined in the form of algebraic expressions to be evaluated before the initialization of the simulation: The above snippet of code introduces a variable 'n'. Once declared, a variable can be referenced in rates, observables, as well as in perturbations (see below). Observables define the outputs of the simulation.
%obs 'Label' A() # All agents A() %obs A() # Unnamed observation, will default to 'A()' in outputs %obs A(state~old) # All agents matching A(state~old) %obs 'binding' # Activity of the transform rule named 'binding' %var 'Named variable' B() Variables can be used to trigger specific events named perturbations during the simulation.
# When observable 'mRNA' drops below 50 set the rate of 'transcribe' to 0.5 %mod: 'mRNA' < 50 do 'transcribe' := 0.5 Perturbations can also condition events on the global time of the simulation:

New Concepts
Spatial Kappa introduces a number of new features: compartments, voxels, and channels which are used to defined transport between voxels and compartments, as well as new kinds of 'transport' rules which exploit these additional features. We go briefly over an informal presentation of these spatial features before describing the syntax of the language in the next Subsection.
Compartments: A model may have multiple compartments each containing reacting species. Each compartment has a dimension and is subdivided into voxels. Dimensions range from 0dimensional single voxels, to 1-, 2-, or 3-dimensional structures. Compartments with differences in size can be specified simply by assigning them different number of voxels, e.g. the reacting volume of a nucleus relative to the surrounding cytosol. Differences in shape can also be specified, for example the thin layer of cytosol next to the inner surface of the plasma membrane versus the rest of the cytosol. To create concisely more accurate models, predefined compartment shapes are available: circles, spheres and cylinders (hollow or filled), etc.
Channels: To establish both intra-and inter-compartment connections, the language incorporates a notion of channel. Intra-compartment channels allow the description of multiple structures: 1-dimensional linear arrays or circles, 2-dimensional square or hexagonal meshes, cylinders or tori, 3-dimensional cubes, filled cylinders, spheres, etc. Commonly used inter-and intra-compartment channels can be named for easier reference.
Locating species: Species can be placed within compartments. For instance: a DNA complex can be made to reside within the nucleus; cell receptors can be limited to the plasma membrane. It is important to note that multi-agent species do not need not be confined to a single voxel or even a single compartment, but may span multiple neighbouring voxels. We refer to them as multi-voxel species.
Locating transform rules: Similarly, rules can be placed within compartments. For instance: transcription-related rules can be confined to the nucleus. The language also allows the same transform rule to be specified with different rates depending on the location of the reacting species (which can be useful for instance when some locations are more crowded).
Translocation rules: One can define rules for the (active or diffusive) transport and diffusion of species, including multi-voxel ones. These rules can be within or between compartments and must use previously described channels. To transport multi-voxel species, one needs channels with multiple voxels as sources. When a rule specifies the translocation of a list of n > 0 partial species F i , 0 ≤ i < n, via a channel c with n input voxels, translocation will only happen if the full complexes F i , which F i belongs to, are contained in the input voxels of c, If it is not the case, one of the F i s exceeds the inputs of c, this is a clash or null event, meaning the event is cancelled (and time advances). If it is thecase, the F i full complexes will move to their destination voxels. One cannot drag along agents not in the inputs of c, or let them lag behind, neither can translocation break links from the F i s to such agents. It is the "no drag, no lag, no break" convention. Note that it is possible that F i = F j if there are links across the corresponding voxels (links which might or might not belong to the F i s themselves). Note also that the translocation of the F i s might lead to a situation where cross-voxels links are formed which have not been declared. This is another source of potential clashes and event cancellations. Rates can be made to depend on the translocated species size and composition.
Granularity of locations: It is possible to specify locations at the level of entire compartments or single voxels within a compartment. This allows models to represent, for example, a signalling cascade being initiated at one point in the cytosol, and the resulting signal molecules being diffused through the cytosol.

New language constructs
A full BNF description of the extended Kappa grammar is given in Appendix A. The new constructs are identifiable as new types of statement (%channel and %compartment) and rules, and location or channel identifiers in existing rule types prefixed with ':'.
Warning -the Spatial Kappa parser expects a file to end with an empty line.

Compartments and voxels
Compartments are defined as single voxels or regular multidimensional arrays of voxels as follows  When locations are used, it is only legal to refer to the compartment as a whole by omitting the cell indices, or to refer to a single voxel by proving as many cell indices as the dimension of the compartment. Thus the third line above is illegal. Also, variable names are only permitted in locations within channel definitions, described below.
Warning -in the example above, you can see that there are spaces before the + andoperators, e.g. in y -1; this space is necessary, as the parser reads y-1 as the name of a variable.

Predefined compartment types
Commonly used non-rectangular compartments can also be concisely defined. These include both solid and hollow (open) shapes in 2 and 3 dimensions. The available predefined compartment types are described in Table 1.1.
This allows the creation of a voxellated approximation of the shape specified, which can then be used for simulation. For the open compartment types, thickness specifies how thick in voxels the compartment reaction volume is.  The syntax to specify predefined compartments is the same as above, except that one needs to specify the type of compartment and provide the necessary arguments.

Channels
The structure of a compartment is complemented by the definition of channels which define how voxels are linked within and across compartments. Channels are used in defining both static links between agents in different voxels, and the motion of agents through the geometry of the model. The syntax is as follows: The above code defines a thin torus composed of a 2D mesh. (Note the use of \ to extend a command beyond a line.) Locations on the left hand side (lhs) of the channel definitions above may contain either constant values or single variable names; complex expressions are forbidden and repeated variables are ignored and treated as new ones. The variable names are used to define the dimensions which will be iterated through to produce links. Locations on the right hand side (rhs) allow constant values or complex expressions involving the variables defined on the left hand side of the expression. It is invalid for the rhs to use variables not defined in the lhs. If setting the values of variables references valid voxels on both the left and right, then those voxels are deemed to be linked. References to voxels outside the dimensions of the compartment are ignored, and no link is created. Locations referenced in a channel definition can belong to the same compartment, thus defining an intra-compartment channel, or to different compartments. The modulus operator % is useful in defining repeated patterns of linkage within a compartment, for example the 2D hexagonal mesh described in Appendix B.1.
Channels can make use of multiple source voxels simultaneously. For example if a model was to represent the movement of transmembrane proteins laterally along the surface of a membrane, then the channel used to describe the lateral motion would need to include simultaneous movement in two compartments (cytosol and membrane). This is represented as follows: Here, variables x, y represent locations in the membrane, and u, v represent locations in the top layer of the cytosol. The definition updates the locations in these two compartments in unison, resulting in the declaration of 5 4 different 2-input channels. Note that there must be the same number of voxels in the source and target of a channel (two in this example), and thus, channels define a one-one mapping between source and target voxels.
Further examples of compartment and channel specifications for common structures are given in Appendix B.1.

Predefined channel types
Commonly used 2-and 3-dimensional channel types can be concisely defined.

Name
Each voxel connected to 2D EdgeNeighbour 4 neighbours which share an edge in grid Hexagonal 6 neighbours which share an edge in hexagonal grid Neighbour 8 neighbours which share an edge or corner in grid 3D FaceNeighbour 6 neighbours which share a face in grid Neighbour 26 neighbours which share a face, edge or corner in grid There are also predefined directional channel types usable in both 2D and 3D compartments Name Each voxel connected to Radial neighbours both directly towards and away from compartment centre RadialIn neighbours both directly towards compartment centre RadialOut neighbours both directly away from compartment centre Lateral neighbours at same distance from compartment centre The syntax to use a predefined channel type is

Using predefined channel types between compartments
It is possible to use predefined channel types (usually 'Neighbour' see also Table 1.2) to describe movement between compartments. For this to be valid, the compartments must have compatible geometry. Compatible compartments are treated as being nested one inside the other and the channel then specifies the diffusion (or agent linkage) across the boundary between the two compartments. The compartments must fit together exactly, with no overlapping voxels or gaps. For example, a circle may only be nested inside a larger open circle whose dimensions (diameter and thickness) allow the smaller circle to fit exactly inside.

Locating agents
The definitions above can now be used to locate species within the model. Any rule or directive (i.e., variables, observables, and initial conditions) that accepts agents as part of its definition, now allows these agents to be located. For each group of agents, a prefixed location constrains the agents to that location. (Saying that n copies of A are evenly distributed among a set of N voxels, means that all voxels receive the same number of As, i.e. the quotient of n by N , the remainder being allocated randomly.) When locations are specified both for agent groups and individual agents, the individual agent location takes precedence. This allows for concise definition of agent groups where all but one of the agents in the group share a location.
Voxel wildcards may also be used when specifying agent location. For example: The above describes a model where the species AB exists in two compartments, B in the cytosol and A embedded in the membrane. When specifying agent links using channels, only one end of the link needs to specify the channel. If a link does not specify the channel, it is assumed that both agents party to the link exist in the same voxel.
Links including channels can be created or broken in the same way as basic Kappa links in rules.

Species movement
Species can move along defined channels. Movement transition rules can either constrain the movement by species chosen, or by source location. Species movement is described using the ->: operator.

Fixed location constraints
It is necessary in some models to distinguish between species which can diffuse freely within all voxels of a compartment, and species which are fixed to a single voxel. The predefined location :fixed, used on the right hand side of a transition rule, allows this. For example, in the model below, agent A() can diffuse freely, while agent B() must remain static.

Instance specific reaction rates
The rate of a transition rule can depend on the composition and size of the particular species it is being applied to. This is possible for all types of transition, not just diffusion transitions. An agent description enclosed in || may be used in a rate equation.
where agentGroup is agent definition syntax as used on the left hand side of transition rules. When applied to particular instances of complexes, the number of matching agent structures in the complex instances chosen are counted and substituted into the rate.

Voxel breakdown of observables
Observables using the constructs %obs: and %var: can be used to get the total amount of species per compartment. For example: %obs: 'A in cytosol' :cytosol A() %var: 'A in cytosol' :cytosol A() However, for some simulations, recording the value of observables in each voxel of a compartment is useful. In these cases, using the keyword voxel will allow recording of observables per voxel. This works only on observable or variable declarations which specify a compartment as the location of the declaration. For example: %obs: voxel 'A in cytosol' :cytosol A() %var: voxel 'A in cytosol' :cytosol A() The above each declare variables which record for each voxel within the compartment cytosol.
Warning -as all voxels within the compartment are recorded for each observable using the voxel keyword at each timepoint, the output data file will rapidly become very large.
For example models demonstrating the use of the language extensions, refer to Appendix B.

Obtaining the simulator
The simulator is available from GitHub as the source Eclipse project, or as a single executable jar file. Both are available at https://github.com/lptolik/SpatialKappa/.

Starting the simulator 2.2.1 Running the executable jar
The simulator can be started by running the executable jar file: java -jar SpatialKappa-v2.1.1.jar Double clicking the jar file usually works too.

Running from the Eclipse project
The main class of the simulator is org.demonsoft.spatialkappa.ui.SpatialKappaSimulator Running as a Java Application will bring up the simulator.

Using the simulator
The initial screen appears as figure 2.1.
The toolbar options are:

Running a simulation
With a successfully opened Kappa model, one can run a simulation by selecting the 'Run' button. Simulation parameters can be set on the toolbar before running. There is the option to do an event or a time based simulation. For an event based simulation, the number of steps for the The simulation can be halted at any point by selecting the 'Stop' button. Note that complex simulations may take some time to start up while data structures are being generated.
A replay file of the simulation is created in the same directory as the Kappa model. The file format is similar to that produced by KaSim.

Running a simulation replay
As the simulation runs, the state of the simulation observables are logged to disk in a replay file after every step. Once the simulation is complete, this replay file can be rerun by selecting the 'Replay' button. The 'Replay Interval' field allows a delay (in ms) to be added between each step.
The file format is similar to that produced by KaSim. Renaming KaSim data output to have the suffix .kareplay will allow KaSim output to also be visualised using this tool.

Time series chart
There is a simple visualisation panel in the simulator. This is dynamically updated as the simulation runs to give the user an idea of how the simulation is progressing. It is however basic in comparison to some of the commercial simulation data visualisation tools available.
The time series chart is similar to the standard Gnuplot output from KaSim. It is a line graph showing observable quantity against time for all observable definitions in the model. The excellent JFreeChart (Gilbert et al., 2010) library was used for generating the charting component. The chart has formatting and save capability, and is zoomable.

Spatial Kappa reference implementation
The algorithm implemented by Spatial Kappa simulator is similar to the Next Subvolume algorithm described in Ref. (Elf and Ehrenberg, 2004). In this chapter we briefly outline its main steps.
Apart from the differences coming from the rule-based nature of the Spatial Kappa simulator, the major difference between our algorithm and the classicalone is that there there is no distinction made between the application of a reaction event and a diffusion one. The reason for a different treatment of diffusion events in the original algorithm was the optimisation of performance. With the introduction of multi-compartmental complexes the benefit of such a different treatment has disappeared, so in Spatial Kappa all events are seen as equal in terms of the underpinning Gillespie algorithm.

Initialization
The Kappa grammar and its spatial extension are defined within the ANTLR 3.0 parser generation framework (see also Appendix A). This allows one to perform a number of initialization steps during the parsing phase, but most of the initialization rests on the simulator itself.
The rule-based nature of the framework allows to avoid some steps of initialization, for example, there is no need to build a connectivity matrix between voxels. The channel definitions play the role of rules defining diffusion reactions, in the same way as transform rules replace reaction definitions.
The first main task of the initialization is the distribution of agents and complexes between voxels. There are two options to do this: the model may define the exact number of agents and complexes in a particular voxel, or the total amount of agents in a compartment. In the latter case, the simulator will distribute agents homogeneously among all the compartment voxels.
The second important initialization step is to record the possible applications of reaction and diffusion rules based on the initial state. In this step one needs to calculate all possible mappings between rule left-hand-sides and the system state (a.k.a., all matchings) and their activities. The activity of the rule is the sum of activities of each of its mappings. (Rules with no mappings have zero activity.)

Main simulation loop
The main simulation loop of the Spatial Kappa simulator is the same as in Next Subvolume algorithm with modifications due to rule-based nature of the model. There are two types of model declarations that need to be processed before ordinary rules: perturbations and infinite rate rules.
A perturbation is a standard Kappa language construct that allows a model to change its state when special firing conditions are met. For example, adding a drug to the system at a particular time point, or opening ion channels when a transmembrane potential reaches some value. In the Spatial Kappa simulator perturbations are checked first in the main simulation loop and, if firing conditions are met, their modifications are applied.
It is possible to set an infinite rate constant for a rule. For example, to compare a Spatial Kappa model with an ordinary Kappa model, one can assign an infinite rate to well-chosen temporary diffusion rules (via perturbations) to 'stir' the model. In addition, rules which have rate constants specified by a function could receive an infinite rate depending on the system state. These require special treatment as their applications do not increment the system time. In the simulator, infinite rate rules are applied to the system after perturbation check until either there are no such rules left (or the cumulated number of applications exceeds a statically defined parameter -conventionally 1000 in the current implementation).

Finite rate rules
After exhausting infinite rate rules, the simulator picks one of the remaining rules and a mapping thereof and applies it to the system. The process of selecting the rule and its mapping is the same as in the Next Subvolume method: the simulator picks rule r with a probability proportional to r's activity in the current state, and selects uniformly at random a mapping of r. The rule application changes the state of the system according to r's right-hand-side, and time progresses by the same increment as in the Next Subvolume method. One then recalculates the new rule mappings and activities locally as in Ref. (Danos et al., 2007b).  B.2 Sample Spatial Kappa models