Learning optimal integration of spatial and temporal information in noisy chemotaxis

Abstract We investigate the boundary between chemotaxis driven by spatial estimation of gradients and chemotaxis driven by temporal estimation. While it is well known that spatial chemotaxis becomes disadvantageous for small organisms at high noise levels, it is unclear whether the optimal strategy switches discontinuously or transitions continuously between the two. Here, we employ deep reinforcement learning to study the possible integration of spatial and temporal information in an a priori unconstrained manner. We parameterize such a combined chemotactic policy by a recurrent neural network and evaluate it using a minimal theoretical model of a chemotactic cell. By comparing with constrained variants of the policy, we show that it converges to purely temporal and purely spatial strategies at small and large cell sizes, respectively. We find that the transition between the regimes is continuous, with the combined strategy outperforming, in the transition region, both the constrained variants and models that explicitly integrate spatial and temporal information. Finally, by utilizing the attribution method of integrated gradients, we show that the policy relies on a non-trivial combination of spatially and temporally derived gradient information, in a ratio that varies dynamically during the chemotactic trajectories.


I. INTRODUCTION
Chemotaxis, the directed motion of organisms towards or away from chemical cues, is a fundamental biological mechanism that spans biological kingdoms. For instance, prokaryotes rely on chemotaxis to find nutrients, avoid toxins or even optimize oxygen and pH levels by sensing molecular cues [1][2][3]. Single-celled eukaryotes show similar chemotactic traits [4], and countless biological processes in multicellular eukaryotes are supported by chemotaxis, such as the fighting of bacterial infections by white blood cells, the positioning of stem cells during early embryonic development, and the formation of multicellular structures in slime mould development [4][5][6]. Likewise, a hallmark of cancer metastasis is the chemotaxis of tumour cells towards blood vessels [7].
However, the ubiquity of chemotaxis in biology does not imply uniformity in the mechanisms that underlie the navigation. At the scale of microorganisms, the fluctuations of the molecules that bind to the cells' receptors are non-negligible and impose physical limits on the accuracy of the measurements and, thus, of navigation. Chemotaxis is typically dichotomized into spatial and temporal strategies (FIG. 1C) [8][9][10]. Larger cells, usually eukaryotes, primarily exploit spatial sensing, harnessing their size to directly perceive chemical concentration gradients [11], whereas smaller cells like bacteria are known to adopt temporal sensing, detecting alterations in chemical concentrations over time to deduce information on the gradient's direction, as the fluctuations across their body render spatial sensing useless [12][13][14]. These differences in sensing mechanisms have direct consequences for the possible types of navigation decision processes. This binary classification enables detailed analysis of the distinct forms of chemotaxis within each category. However, as the optimal strategy depends on continuously varying parameters such as the size and velocity of the organism as well as the chemoattractant concentration, it leaves open the question of whether organisms can utilize an integration of both spatial and temporal sensing mechanisms in their chemotactic strategies [15], and whether such a combination would be preferential in intermediate ranges of these parameters. Interestingly, it has been shown that cells thought only to use spatial sensing also rely on temporal information during chemotaxis when given periodic waves of chemoattractant [16,17]. Static temporal averaging of previous measurements has also been shown to reduce sensing noise for cells placed in shallow concentrations [11]; however, this does not take into account the effect of the motile cell itself reacting to the measurements. Previous work has proposed a more complex inclusion of both types of sensing to develop newer strategies without being able to outperform single-sensing strategies [18], showcasing that efficient integration of both strategies is probably non-trivial.

* Correspondence email address: julius.kirkegaard@nbi.ku.dk
Here, we employ deep reinforcement learning (DRL) to discover optimal chemotactic strategies that can combine spatial and temporal sensing. Previous work has successfully made use of DRL to find optimal strategies for self-propelled agents exploiting the flow in fluid environments [19] and for studying the tracking policies of flying insects relying on memory from noisy measurements to locate food or other insects [20]. Similarly, machine learning has been utilized for demonstrating the optimality of known chemotactic strategies [21,22].
We propose a minimal single-cell model and use modern policy optimization techniques [24] to identify the strategy that minimises the time it takes the cell to reach a source of chemoattractant. The model cell is endowed

FIG. 1:
A Representation of the model cell with five sensors surrounded by chemoattractant particles. Each sensor measures the number of particles M_i inside its sensing range r_s and transforms it as m_i = log(M_i + 1). B Illustration of the simulation environment, where the cell navigates towards the centre of the chemoattractant source. C Phase-space diagram of cell sizes and speeds showing the distribution of common unicellular prokaryotes and eukaryotes. The dashed line roughly indicates the binary division between temporal and spatial navigation strategies [8]. Data from Refs. [8, 23]. D Our three neural network policies output the cell's action based on the measurements and hidden states. The combined policy has access to the individual measurements of its sensors and has a hidden state used in a recurrent neural network layer, whereas the spatial and temporal policies only have one of these features. Dense: a linear NN layer connecting all inputs with all outputs. MLP: multilayer perceptron, a sequence of dense layers with non-linear activations. GRU: gated recurrent unit, a simple form of recurrent neural network module, which combines a hidden state with new input. The policy output of the model is both a mean value µ_t and a standard deviation σ_t, which define a normal distribution from which an action a_t is sampled. In our experiments, σ_t → 0 at the end of training (see SI), resulting in deterministic policies.
with distinct sensors that enable spatial gradient estimation and is given an internal memory state that allows temporal information to be derived.Based on a combination of these inputs, the cell must modify its orientation, a mapping that we leave largely unconstrained by employing deep neural networks.
We demonstrate the existence of a better performant chemotactic strategy that non-trivially combines spatial and temporal sensing.Specifically, we pinpoint a range of cell sizes where a combined sensing strategy outperforms optimal single sensing strategies.We then concentrate our analysis on this interface, comparing it to analytical ones and offering both qualitative and quantitative insights on the internal dynamics of the optimal navigational policy.

A. The simulation model
We study an exponentially decaying, two-dimensional distribution C(x) of chemoattractant particles with a concentration peak at x = 0,

C(x) = C_0 e^(−λ|x|).

We take λ = 0.01 µm⁻¹ and study C_0 varying from C_q = 16 µm⁻² to 10 C_q, which in turn sets the signal-to-noise ratio of the system. In the SI, we further give examples of algebraic and Bessel-function concentration profiles, which can e.g. arise from decaying and diffusing particles emanating from a central static source,

D ∇²C − κ C + ρ δ(x) = 0.

Here, D is the particle diffusion coefficient and κ is a particle decay rate, which together set a length scale √(D/κ); ρ is the rate of particle release at x = 0. Our cell model consists of a circular disk of radius R, equipped with K sensors uniformly spaced around its surface (FIG. 1A), whose objective is to reach the source of the chemoattractant by controlling its direction of motion depending on the environmental measurements. The cell senses the environment through molecules binding to cell-surface receptors. Still, in the interest of keeping our model as simple as possible, we neglect the complex dynamics of receptor binding and unbinding [25]. Thus, we assume each of the K sensors to possess a detection area of radius r_s = R sin(π/K) such that the entire surface of the cell is covered. We fix K = 5 for all our experiments.
Our model cell moves forward at a constant speed v, with its trajectory orientation θ(t) being modified both by its own actions and by rotational noise,

dθ = a_t dt + √(2 D_R) dW.

Here, a_t is the output of the cell's navigational policy π, and the second term is Wiener noise with rotational diffusion coefficient D_R. Rotational diffusion forces the cells' navigation policies to react to the sensor signals at least on a time scale 1/D_R [26]. In our experiments we use v = 5 µm/s and D_R = 0.025/s, and use a timestep of ∆t = 0.1 s to solve the stochastic equations.
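The dynamics above can be integrated with a standard Euler-Maruyama step. A minimal sketch, using the parameter values from the text (function names are ours):

```python
import numpy as np

def step_orientation(theta, action, dt=0.1, D_R=0.025, rng=None):
    """One Euler-Maruyama step of d(theta) = a_t dt + sqrt(2 D_R) dW."""
    if rng is None:
        rng = np.random.default_rng()
    return theta + action * dt + np.sqrt(2.0 * D_R * dt) * rng.normal()

def step_position(x, theta, v=5.0, dt=0.1):
    """Advance the cell at constant speed v along its current orientation."""
    return x + v * dt * np.array([np.cos(theta), np.sin(theta)])
```

With D_R = 0 the orientation update reduces to the deterministic part a_t ∆t, which is a quick sanity check for the noise term.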
We model the cell receptors as perfect instruments [25], and thus, at each time step of our simulation, the sensors measure the number of molecules inside their sensor range. This induces fluctuations in measurements with a signal-to-noise ratio that increases with concentration. Thus, nutrient-deprived environments with a low number of detected particles are noisy, whereas nutrient-rich environments are more deterministic.
We approximate the particle count within each sensor's area as a stochastic process sampling from a Poisson distribution. Exploiting the nearly constant particle density over the detection area, we use

M_i ∼ Poisson(π r_s² C(d_i)),

with d_i being the radial distance of the receptor centre to the source of the chemoattractant.
Simulations are initialized at random distances d_0 from the source with random orientations θ_0 and, crucially, with a rate of particle release ρ, which we sample in the range between ρ_0 and 10 ρ_0. These random initializations ensure that the cell agents cannot overtrain to specific molecule counts and specific trajectories but rather need to generalize across noise levels and become adaptable to varying concentration profiles. Finally, we do biologically inspired preprocessing of the receptor input by transforming it according to the Weber-Fechner law [27],

m_i = log(M_i + 1).

While this could have been learned directly from the data, it conveniently brings the neural network input to a tightly constrained domain that is more suitable for DRL, and also means that noise in m_i decreases not just relative to the signal but also in absolute numbers as ρ increases.
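Putting the measurement model together, a sketch of the sensing pipeline (Poisson counts over each sensor's detection area, followed by the Weber-Fechner transform) could read as follows; the sensor placement on the perimeter is our reading of the geometry described above, and the parameter values are illustrative:

```python
import numpy as np

def measure(cell_pos, theta, R=2.0, K=5, C0=160.0, lam=0.01, rng=None):
    """Sample sensor counts M_i ~ Poisson(pi r_s^2 C(d_i)) and apply
    the Weber-Fechner transform m_i = log(M_i + 1).

    Sensor centres are assumed uniformly spaced on the cell perimeter;
    C(d) = C0 * exp(-lam * d) decays from the source at the origin."""
    if rng is None:
        rng = np.random.default_rng()
    r_s = R * np.sin(np.pi / K)                  # sensor radius covering the surface
    phis = theta + 2 * np.pi * np.arange(K) / K  # sensor angles in the lab frame
    centres = cell_pos + R * np.stack([np.cos(phis), np.sin(phis)], axis=1)
    d = np.linalg.norm(centres, axis=1)          # sensor distances to the source
    mean_counts = np.pi * r_s**2 * C0 * np.exp(-lam * d)
    M = rng.poisson(mean_counts)
    return np.log(M + 1)
```

The returned vector of K values is exactly the input the policies receive; the log transform keeps it in a tight range even when ρ (here folded into C0) varies by an order of magnitude.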

B. The policy
The internal mechanisms of a chemotactic cell involve a complex set of biochemical spatio-temporal reactions. Here, we do not model these reactions explicitly, but instead directly model an input-output approximator, the cell policy π. This policy maps an internal state s_t, which in the simplest case could just be the vector of instantaneous measurements, to an action a_t. We parameterize the function using artificial neural networks (ANN) to minimize expressive restrictions on the learned policy.
To estimate the cell policy, we assume that it is an optimizer of efficient chemotaxis, which we define as minimizing the time it takes to reach a certain distance from the source. More precisely, at the end of a simulation, we calculate a reward from the final distance d to the source (which equals δ if the source has been reached) and the time τ it took to reach the source (which equals t_max if the source was not reached). The first term is a normalized reward for getting to the source fast, and the second is a bootstrapping reward that punishes cells that do not reach within the required distance δ of the source. The reward is normalized between [−1, 1], as is convention in reinforcement learning. Simulations terminate when the cell has reached a distance δ to the source or the simulation time has exceeded t_max; thus, only one term of the reward expression is nonzero at the end of the episode, with the distance reward dominating early in training and the time reward dominating at the end of training.
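The exact expression is elided above; a minimal sketch of a terminal reward with the stated properties (one term per episode, normalized to [−1, 1], rewarding fast arrival and penalizing remaining distance) might look as follows. The specific functional form is our illustrative assumption, not the paper's:

```python
def terminal_reward(d, tau, d0, t_max, delta=10.0):
    """Illustrative terminal reward (assumed form, not the paper's exact one).

    If the cell reached the delta-neighbourhood of the source, reward how
    fast it got there (in (0, 1]); otherwise penalize the remaining
    distance relative to the start (in [-1, 0)). Exactly one branch is
    active per episode, matching the description in the text."""
    if d <= delta:
        return 1.0 - tau / t_max     # fast arrival gives a reward near 1
    return -min(d / d0, 1.0)         # failure gives a penalty in [-1, 0)
```

Early in training, most episodes time out and the distance term drives learning; once cells routinely reach the source, the time term takes over, as described above.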
To find the optimal ANN policy, we employ Proximal Policy Optimization (PPO) [24], which adapts the policy π in order to maximize the average reward. We study three variants of the agents (FIG. 1D). One policy we restrict to act purely on instantaneous spatial information. This is enforced by simply designing the neural network to be a pure feedforward network, mapping measurements {m_t} to output a_t. Likewise, we design a purely temporal network, which does not receive spatially resolved information but rather the average of all receptors ⟨m_t⟩. Instead, this agent must rely on memory to provide temporal information on the particle gradients. This is achieved by introducing a recurrent layer into the policy neural network, which emulates the biochemical memory of real cells. Finally, we study a combined agent, which has access to both spatially resolved measurements and a memory that can be used to derive temporal information. This agent can execute pure spatial and pure temporal strategies but can furthermore act on any combination of this information. Network details are given in the SI.
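A combined policy of this kind can be sketched with a single GRU cell plus linear heads for the action mean µ_t and value V_t. The layer sizes, single-layer layout, and weight initialization below are illustrative assumptions, not the architecture from the SI:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class CombinedPolicy:
    """Numpy sketch of the combined agent: a GRU over the K sensor
    measurements, whose hidden state carries temporal information, with
    linear heads for the action mean mu_t and the value estimate V_t."""

    def __init__(self, n_in=5, n_hidden=16, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        s = 0.1  # illustrative initialization scale
        # GRU parameters: update gate z, reset gate r, candidate state n
        self.Wz = rng.normal(0, s, (n_hidden, n_in)); self.Uz = rng.normal(0, s, (n_hidden, n_hidden))
        self.Wr = rng.normal(0, s, (n_hidden, n_in)); self.Ur = rng.normal(0, s, (n_hidden, n_hidden))
        self.Wn = rng.normal(0, s, (n_hidden, n_in)); self.Un = rng.normal(0, s, (n_hidden, n_hidden))
        self.w_mu = rng.normal(0, s, n_hidden)  # action-mean head
        self.w_v = rng.normal(0, s, n_hidden)   # value head V_t
        self.log_sigma = -1.0                   # trained towards -inf => deterministic policy

    def step(self, m, h):
        """One policy step: measurements m, hidden state h -> (mu, sigma, V, h')."""
        z = sigmoid(self.Wz @ m + self.Uz @ h)
        r = sigmoid(self.Wr @ m + self.Ur @ h)
        n = np.tanh(self.Wn @ m + self.Un @ (r * h))
        h_new = (1 - z) * n + z * h
        return self.w_mu @ h_new, np.exp(self.log_sigma), self.w_v @ h_new, h_new
```

The spatial variant corresponds to dropping the hidden state (a pure feedforward map of m), and the temporal variant to feeding only the scalar ⟨m_t⟩ through the same recurrent core.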
Our networks also output an estimate of the final reward V_t (FIG. 1D), which the PPO algorithm uses to speed up convergence, but which does not influence the policy once trained. Further, as the nature of PPO's exploration strategy adds noise to the policy output, we also recurrently feed the cell's action back into the temporal policies, which aids the training in reaching a deterministic strategy without hindering stochastic exploration.

A. Optimizing for noise-robust strategies
Our deep reinforcement learning approach is designed principally to work at all noise levels. In nutrient-rich environments, where the input to the agents is not corrupted by noise, our DRL framework converges quickly to effective temporal and spatial strategies. The resulting trajectories in these environments are close to deterministic, as the noise from measurements is reduced and fewer mistakes in orientation corrections tend to occur. In those scenarios, spatial gradient estimation is effective in directly locating the source of chemoattractant, and noise due to rotational diffusion does not pose a challenge for the cell, which only needs to follow the strength of the sensors (FIG. 2C). Likewise, the optimal temporal sensing strategy at high concentrations is easily understood: it continuously measures the change in concentration and increases the turn when the concentration starts diminishing. As the temporal strategy contains no information about the sensors' positions, it has to spontaneously break its rotational symmetry, which is exemplified in the resulting left-turning shown in FIG. 2A, resembling e.g. the chirality of sperm chemotaxis trajectories [28].
In contrast, in the low-concentration limit, the input to the cell receptors is extremely noisy (FIG. 2D), and the identification of optimal strategies becomes less clear. Yet, our DRL approach is able to identify working strategies using both purely spatial and purely temporal sensing mechanisms (FIG. 2B). Qualitatively, we note that the identified low-concentration temporal strategy behaves very robustly against noise, as its trajectory remains smooth despite its stochastic input. This can be interpreted as low reactivity, which also shows itself in the way the temporal strategy only slowly adapts its trajectory as it nears the source. In comparison, the spatial strategy is very reactive, and while this makes it susceptible to the stochastic input, it enables it to quickly adapt its orientation once it nears the source and the concentration is relatively high. Finally, we observe a first hint that the combined strategy can outperform the two: it shows both low reactivity when it is far from the source and high reactivity once in its proximity.
While deterministic policies are fast to identify, the information which reinforces policies in the low-concentration limit is much more stochastic, making the optimization process harder. To enable learning in this very noisy regime, our reinforcement learning steps rely on averaging the results of thousands of runs and require millions of simulations to converge to a solution (see SI table). To make this feasible, we developed a custom end-to-end RL implementation which runs exclusively on GPUs (see Code Availability).
We note that DRL is not guaranteed to find the globally optimal policy. However, we find that independent runs of the DRL training procedure result in the same policies, which hints that the obtained local optima could be global.

B. Smooth transition between a temporal and a spatial strategy
For evaluation, we define a strategy's chemotactic efficiency η by how fast the cell reaches the source compared to the minimal time a cell of speed v would take to reach it from the same initial position (note that this is independent of t_max, which was used only for training). Thus, the efficiency of a strategy is given by

η = ⟨(d_0 − δ) / (v τ)⟩,

where τ is the time it takes the cell to reach the source threshold distance δ, d_0 is the initial distance to the source, and the average is taken over all realisations. We train our three variants, spatial (S), temporal (T) and combined (C), on the same simulation parameters at different cell sizes and proceed to calculate their efficiencies (FIG. 3A). At small sizes, where the positional sensor information becomes indistinguishable due to the noise, the T and C policies show the same performance. This is in accordance with previous studies showing that small cells are incapable of sensing gradients along their own body due to the fluctuations in measurements [15]. Nevertheless, as the cell size increases, C starts to outperform T, indicating that the tiny amount of available gradient information, as evidenced by the poor performance of S, can somehow be integrated into a temporally dominated strategy to improve its performance. At large cell sizes, S dominates T, and while a gap still remains between S and C at large R, C shows convergence towards the same strategy. Thus, at the largest scales, the sensors need not rely heavily on old measurements to accurately estimate the gradient. At intermediate cell sizes, we find that the optimal strategy is neither purely spatial nor purely temporal. In detail, we observe a smooth transition between strategies, indicating that there is a continuous integration of information stemming from spatial input and memory. Despite being dominated by noise, as illustrated in FIG. 2D, C is capable of taking advantage of the measurement differences between the different receptors on the cell surface to improve its efficiency. To explore
this integration, we now focus on the intermediate region where both S and T perform similarly yet are outperformed by C, at R ≈ 2 µm.
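The efficiency defined above is straightforward to compute from simulation outcomes, with (d_0 − δ)/v being the straight-line travel time:

```python
import numpy as np

def chemotactic_efficiency(tau, d0, v=5.0, delta=10.0):
    """Efficiency eta = <(d0 - delta) / (v * tau)>: the minimal straight-line
    time a cell of speed v would need, divided by the actual arrival time,
    averaged over realisations."""
    tau = np.asarray(tau, dtype=float)
    d0 = np.asarray(d0, dtype=float)
    return np.mean((d0 - delta) / (v * tau))
```

A cell that swims straight to the source achieves η = 1; detours and trapping push η towards 0.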
Inspecting the distribution of arrival times, as shown in FIG. 3B for R = 2 µm, we observe a clear difference in skewness between T and S. The distribution of arrival times for S has long tails, since cells that start far away from the source experience very low concentrations of molecules, which disproportionately affects the spatial strategy. In contrast, T shows very few cells that reach the source quickly, as this strategy relies on building up memory. Interestingly, the cells that use C are both fast and do not get trapped, enjoying the benefits of both other variants.
To evaluate the optimality of the found strategy C, we compare it against commonly proposed strategies that use memory kernels to integrate temporal information into spatial strategies. Likewise, we explore a switching strategy in which cells start by using the noise-robust T strategy and later switch to the reactive S strategy at a set threshold. This incorporates the advantages of each variant, as shown in FIG. 3B. In all cases, we find that the RL-learned strategy outcompetes these simpler explicit strategies (see Appendix).

C. Integrating temporal and spatial information
Having established that C can integrate spatial and temporal information to outcompete both T and S, we move on to studying the internals of C directly. The policy π_C is a highly non-linear, recursive function which we have parameterized by deep neural networks, at the cost of a lack of interpretability. Nonetheless, numerous techniques have been developed for gaining insight into the internals of a trained neural network, for instance, by estimating the importance of the input variables. One of the most elegant techniques to study this attribution problem is the method of integrated gradients (IG) [29], which calculates the importance of feature x_i as

I_i = (x_i − x'_i) ∫₀¹ ∂π(x' + α(x − x')) / ∂x_i dα,

where x' is a baseline, which we here simply take to be no input, x' = 0. IG is sensitive, meaning I_i is non-zero if and only if x_i contributes to the output, and satisfies completeness such that the attributions sum to the output, i.e. a_t = π(x) = π(0) + Σ_i I_i. We use IG to understand how the cell relies on previous measurements transmitted to it by the hidden state h_{t−1}, compared to current measurements m_t from the receptors. We define U_h as the relative importance of memory,

U_h = Σ_i |I_{h,i}| / (Σ_i |I_{h,i}| + Σ_j |I_{m,j}|).

A value U_h > 0.5 indicates that the hidden state contributes more to the output than the current measurement. Note that the definition sums over contributions from all hidden states and all measurements, and is thus virtually independent of e.g. the number of hidden states.
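The IG integral is typically approximated by a Riemann sum along the straight path from the baseline. A generic sketch, using finite-difference gradients for simplicity and a U_h ratio under the assumed summed-absolute-attribution definition:

```python
import numpy as np

def integrated_gradients(f, x, baseline=None, steps=256):
    """Approximate IG attributions I_i = (x_i - x'_i) * integral_0^1 of
    df/dx_i along the straight path from baseline x' to x, using a
    midpoint Riemann sum and central finite-difference gradients."""
    x = np.asarray(x, float)
    xp = np.zeros_like(x) if baseline is None else np.asarray(baseline, float)
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.zeros_like(x)
    eps = 1e-5
    for a in alphas:
        p = xp + a * (x - xp)
        for i in range(len(x)):
            dp = np.zeros_like(x); dp[i] = eps
            grads[i] += (f(p + dp) - f(p - dp)) / (2 * eps)
    return (x - xp) * grads / steps

def memory_usage(I_hidden, I_meas):
    """Relative importance of memory U_h: summed absolute attributions of
    the hidden state over the total (assumed definition)."""
    a, b = np.abs(I_hidden).sum(), np.abs(I_meas).sum()
    return a / (a + b)
```

The completeness property is a useful check: for any smooth f, the attributions should sum to f(x) − f(baseline) up to discretization error.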
FIG. 4A shows how the average memory usage U_h changes as a function of cell size and chemoattractant concentration. We observe a smooth transition of decreasing contribution of memory as the cell gets larger, in accordance with previous conclusions. This transition occurs at smaller sizes the higher the concentration. Interestingly, when evaluating U_h within a single environment (FIG. 4B), we observe a decrease in memory usage as the cell approaches the source. Thus, the cell adapts between temporally and spatially dominated strategies during a single trajectory, akin to a continuous version of the discrete switching strategy considered above.
Although the input to the neural network policy π is the current measurements m_t and the hidden state h_{t−1}, the output a_t can also be considered a function of all previous measurements {m_1, m_2, ..., m_t}, processed recursively through the sequence of hidden states. In this formulation, we can attribute the importance of all previous measurements individually to the current output. FIG. 5A shows the IG attributions of measurements for the purely spatial, the purely temporal, and the combined strategy at R = 2 µm. As the cell diagram indicates, a positive IG value translates into a contribution towards a positive reorientation, and a negative value vice versa. For a pure spatial strategy, the sensors work in opposition, while, obviously, there is no contribution from previous measurements. In contrast, for a pure temporal strategy, all sensors contribute the same, but current measurements are opposed by previous measurements. Curiously, the shape of the contributions highly resembles the bi-lobed shape of the chemotactic memory kernel measured experimentally from the impulse responses of E. coli bacteria [26].
Similar to the spatial strategy, the combined strategy shows sensors working in opposition, but the left-right symmetry is broken and compensated by temporal variation, with one side dominating early and the other side contributing late. This sensor signature of the combined strategy makes explicit the non-trivial combination of information it utilizes, and while these curves are merely IG components, they are indicative of a non-linear combination of information that asymmetrically merges spatial and temporal processing (FIG. 5A). By looking at how measurements are integrated into the combined policy for different sizes (FIG. 5B), we observe a transition from temporal towards spatial information processing. This similarity is clearly observed when comparing trajectories of the purely temporal and purely spatial policies with the combined one at the respective extreme cell sizes (FIG. 5C).

IV. DISCUSSION
In this study, we have explored the theoretical possibilities in chemotaxis that arise when traditional limitations are relaxed. Our findings show that the borders of binary classifications of chemotaxis strategies can be blurred by suitable integration of spatial and temporal information. In particular, we have shown that for cells with the ability to sense across their bodies as well as having access to memory, there is a navigation strategy that outperforms those with only one sensing ability. Without imposing any constraints on the policy, we have seen the optimal solution converge to known policies in the limits where it is known that one sensing mechanism clearly provides faster information on the chemical gradient. Here, we explored this as a function of cell size and found that for large cells the emergent combined strategy converges to relying only on spatial information, whereas for small microorganisms the gradient information is obtained strictly from temporal differences. In the intermediate range, we found no sudden switch in strategy; instead, the transition between them is continuous and smooth, with information slowly being integrated by the cell into its decision process.
Our general perspective on chemotaxis is achieved by employing artificial neural networks and optimizing these by reinforcement learning. The drawback to this is that the obtained strategies are difficult to interpret. Yet, by comparing with analytical strategies and employing integrated gradients to study feature attribution, we find that the optimal strategy that employs both spatial and temporal information is not a simple combination of known strategies, nor is its integration of information types trivial. Our analysis reveals that memory usage varies with cell size and concentration and changes dynamically throughout trajectories. This is akin to the well-known phenomenon that cells adapt their measurement sensitivity to the local concentration [30], but here we find that, in an optimal setting, the navigation strategy itself must also dynamically adapt.
Using DRL to study chemotaxis in the noise-dominated regime is computationally challenging, as it requires a large number of simulations that must be run dynamically during training. Our custom approach runs simulations and training on the GPU, avoiding slow system-to-device transfers. Here, we have employed this approach to study a simple chemotactic agent in two dimensions. An interesting avenue for future research is the move to three dimensions, where the space of possible strategies is qualitatively different. Likewise, it could be interesting to consider the consequences of a non-static source of chemoattractant or heterogeneous environments and discover the effect of this on a combined chemotactic policy. Similarly, it is of interest to extend our minimal cell model with the specificities of particular organisms, such as a thorough modelling of receptor dynamics [31], the inclusion of stochastic tumbles of peritrichously flagellated bacteria [13], or more complex behaviours such as the ones seen in C. elegans [32].

FIG. 6: Chemotactic efficiency of proposed explicit policies compared to the neural network policies found using reinforcement learning, at R = 2 µm. Green points are for policies that integrate measurements over time (lower axis), whereas orange points correspond to the policy achieved by switching between temporal and spatial strategies at a certain concentration threshold (upper axis).
on the cell's surface with respect to the swimming direction (FIG. 1A). This naive strategy is very susceptible to fluctuations in measurements and can sometimes be improved by restricting the reorientations to a certain ε. Thus, we consider policies of this thresholded form. Integrating measurements over time reduces the fluctuations in concentration measurements, as has also been shown experimentally [11]. Thus, we explore the possibility of cells relying on the average of previous measurements to set the change in orientation, with the contribution of each sensor averaged over previous measurements. Here, m̃(t) are corrected measurements at time t. Directly using the raw measurements m(t) completely ruins performance, as every time an action is performed, the information of previous measurements is no longer aligned with the cell orientation. To obtain optimal strategies, we use m̃(t), which is corrected by the action taken a_t, and thus only suffers from information decay due to rotational diffusion. We begin by studying a uniform kernel, in which all previous measurements contribute the same weight up to a timescale T. Moreover, we consider the use of an exponentially decaying kernel, which gives more weight to newer measurements. As seen in FIG. 6, the chemotactic efficiency of these models outperforms S and T when some rudimentary use of memory is allowed. We note that each reported value of the analytical strategies is evaluated with different ε, and only the best-performing one is shown. Nevertheless, we observe that a large memory timescale becomes counterproductive, as the movement of the cell makes previous measurements irrelevant and they only contribute noise to the decision. Despite the gain in efficiency, the optimal timescale for the proposed models is far from reaching the chemotactic efficiency of C.
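The two kernels can be sketched as follows; the action-based correction of past measurements is omitted here for brevity, so this operates directly on a stored sequence of (already corrected) measurement vectors:

```python
import numpy as np

def exponential_average(ms, T=5.0, dt=0.1):
    """Exponentially decaying memory kernel: running average of the
    measurement sequence ms (shape [n_steps, K]) with timescale T,
    weighting newer measurements more."""
    beta = np.exp(-dt / T)  # per-step decay factor
    avg = ms[0].astype(float)
    for m in ms[1:]:
        avg = beta * avg + (1 - beta) * m
    return avg

def uniform_average(ms, T=5.0, dt=0.1):
    """Uniform memory kernel: plain average of the last T/dt measurements,
    all weighted equally."""
    n = max(1, int(T / dt))
    return np.mean(ms[-n:], axis=0)
```

As T → 0 both reduce to the instantaneous measurement, recovering the purely spatial input; as T grows, stale measurements dominate and, as noted above, only add noise to the decision.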
We note that as T → 0, S outperforms the explicit models. This can be explained by the freedom of S to dynamically control a non-linear equivalent of ε depending on the measurements. With this in mind, we investigate a new RL agent using the same neural network as S, but whose input is given by the time-averaged measurements of Eq. (A.3). Thus, the integration of memory is fully controlled, but any non-linear action can be taken based on this input. We note that this again requires correcting previous inputs, and special attention is given to the early parts of trajectories, such that the policy only averages over known measurements. FIG. 6 shows that this indeed outperforms S and T, but cannot reach the performance of C. This suggests that C is not just combining a temporal average with a spatial strategy but is also using elements of a temporal strategy.
Spermatozoa have recently been shown to exhibit a biphasic chemotactic strategy, in which there is a concentration-dependent switch between hyperactivated phases, characterized by random changes in orientation, and the more well-known chiral motion [33]. Similarly, a switch between a temporal and a spatial strategy could achieve the best of the distinct arrival-time distributions of T and S in FIG. 3B. We implement this by setting a cutoff particle count at which we switch from T to S. As a function of this threshold, an increase in chemotactic efficiency is observed, as shown in FIG. 6, but this also does not reach the efficiency achieved by C. Nevertheless, the increase in efficiency does suggest that the contributions of temporal and spatial sensing may change dynamically with the concentration.
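The switching strategy amounts to a one-line rule; the threshold value below is arbitrary, and in the experiments it is swept over a range, with only the best result reported:

```python
def switching_action(mean_count, temporal_action, spatial_action, threshold=8.0):
    """Hard switch from the noise-robust temporal policy to the reactive
    spatial policy once the mean particle count crosses a threshold
    (threshold value illustrative; swept in the actual comparison)."""
    return spatial_action if mean_count >= threshold else temporal_action
```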
Finally, we explore the possibility of designing an agent where the effective memory timescale T is linearly dependent on the measured concentration, as suggested by FIG. 4B, such that

T = A⟨m_t⟩ + B.

We evaluate different parameters A, B, and ε on a uniform kernel. FIG. 7 shows the chemotactic efficiency at different parameters A, with the best-performing B*(A) and ε*(A). The performance of this model is similar to that of a fixed uniform kernel. While the study of integrated gradients shows the amount of memory used, it does not reveal how this memory is used, and in particular, we find here that a simple uniform kernel is far from enough to reach optimal behaviour.

FIG. 2 :
FIG. 2: A-B Example trajectories of found strategies at R = 2 µm for each variant, in nutrient-rich media (C_0/C_q = 10^4) and in nutrient-depleted environments (C_0/C_q = 1), respectively. Circles indicate δ = 10 µm. C-D Measurement values of each sensor of the cell in A and B, respectively. Each colour represents one of the K = 5 sensors used in the trajectories. The measurements correspond to those of the combined cell.

FIG. 3 :
FIG. 3: A Chemotactic efficiency of each variant at reaching the source as a function of cell size. Each value is the result of training and evaluating the policies at that cell radius for sampled values of C_0. The average efficiency is evaluated on 2^16 independent runs. A "blind" agent obtains efficiency η ≈ 0.02. B Distribution of arrival times to the source of the three cell variants at R = 2 µm. All evaluations use sampled concentrations.

FIG. 4 :
FIG. 4: A Average contribution of memory usage to the steering output during the simulation runs at different sizes and concentration levels C_0. The dashed line indicates U_h ≈ 0.5, i.e. the transition from a memory-dominated strategy to a more reactive sensing-based policy. B Distribution of memory usage values U_h during individual trajectories, evaluated at different distances to the source. R = 2 µm.

FIG. 5 :
FIG. 5: A Contribution of each sensor from past time measurements to the current action. The three variants at R = 2 µm are shown, with data coloured by sensor position as indicated in the cell diagram. For the temporal variant (dashed), only one sensor is shown, as all have the same profile per the designed symmetry. The red arrow indicates the swimming direction. Curves are obtained by averaging over ∼10^5 trajectories with initial conditions sampled as in previous plots. B Sensor contributions of the combined policy for different cell sizes. C Trajectory visualization of both small (top) and large (bottom) cells. See SI for a plot with all three variants.

FIG. 7:
FIG. 7: Chemotactic efficiency of a policy that adjusts the memory timescale according to a linear dependency on the average strength of the measurements, T = A⟨m_t⟩ + B. The efficiency is shown as a function of A, where B and ε are the optimal values for that A. The simulation parameters are the same as in FIG. 6.