Cardinality optimization in constraint-based modelling: application to human metabolism

Abstract Motivation Several applications in constraint-based modelling can be mathematically formulated as cardinality optimization problems involving the minimization or maximization of the number of nonzeros in a vector. These problems include testing for stoichiometric consistency, testing for flux consistency, testing for thermodynamic flux consistency, computing sparse solutions to flux balance analysis problems and computing the minimum number of constraints to relax to render an infeasible flux balance analysis problem feasible. Such cardinality optimization problems are computationally complex, with no known polynomial time algorithms capable of returning an exact and globally optimal solution. Results By approximating the zero-norm with nonconvex continuous functions, we reformulate a set of cardinality optimization problems in constraint-based modelling into a difference of convex functions. We implemented and numerically tested novel algorithms that approximately solve the reformulated problems using a sequence of convex programs. We applied these algorithms to various biochemical networks and demonstrate that our algorithms match or outperform existing related approaches. In particular, we illustrate the efficiency and practical utility of our algorithms for cardinality optimization problems that arise when extracting a model ready for thermodynamic flux balance analysis given a human metabolic reconstruction. Availability and implementation Open source scripts to reproduce the results are here https://github.com/opencobra/COBRA.papers/2023_cardOpt with general purpose functions integrated within the COnstraint-Based Reconstruction and Analysis toolbox: https://github.com/opencobra/cobratoolbox.


Stoichiometric consistency
A biochemical network is said to be stoichiometrically consistent if each metabolite can be assigned some positive molecular mass [1].Equivalently, each reaction in a stoichiometrically consistent network conserves mass.That is, stoichiometric inconsistencies always result in the presence of non-conserved metabolites [2].Besides external reactions, a network reconstruction may contain reactions with incompletely specified stoichiometry or molecules with incompletely specified chemical formulae, e.g., due to limitations in the available experimental evidence.The ideal scenario is to remove misspecified reactions from a reconstruction when generating any model subject to a mass balance constraint.
An external reaction is one that is heuristically identified by a single stoichiometric coefficient in the corresponding column of S, or an (abbreviated) reaction name matching a pattern (e.g.prefix EX_) or an external subsystem assignment.If a reaction is not external then it is denoted an internal reaction.Metabolites are external if they are exclusively involved in exchange reactions, and internal otherwise.The aforementioned assignments of external and internal reactions and metabolites is the result of a heuristic and might result in one or more errors, either due to misspecification or because the names of external reactions and external subsystems often vary between laboratories.A reaction is considered elementally balanced if the number of atoms of each element in substrates equals that of products, according to the chemical formulae specified for for each metabolites.A balanced metabolite refers to one that is exclusively involved in balanced reactions.Given chemical formulae for each metabolite, checking for elemental balancing of each reaction is more reliable than heuristic assignment.However, there is no guarantee that a set of supposedly elementally balanced reactions is stoichiometrically consistent because it is possible to erroneously specify the chemical formula for a metabolites.Therefore, method to detect stoichiometric consistency, without recourse to metabolite formulae, are required.
The problem of detecting the largest stoichiometrically consistent part of a biochemical network has previously been approached by maximising the number of metabolites that can be assigned a positive molecular mass subject to mass conservation, that is where w ∈ R m is proportional to the molecular mass of each metabolites, and the optimal solution w ⋆ is referred to as a maximal conservation vector.The equality constraint enforces mass conservation (for positive molecular mass).If ∥w ⋆ ∥ 0 = m then S is stoichiometrically consistent (S = N and B = 0), otherwise it is not.If w ⋆ i = 0 then the mass of the corresponding metabolites is not conserved by the reactions in the network.Mixed-integer linear [1] and linear approximations [3] to Problem (6) have previously been explored.
Gevorgyan et al. [1] reformulated (6) as a mixed-integer linear problem (MILP) Since there is no upper bound for w, the number of non-zero components of w will reach the possible maximum, i.e., globally solving (7) would give the largest possible stoichiometrically consistent part of the given biochemical network.However, there are no efficient algorithms for solving Problem (7) that can also provide a guarantee of global optimality.Moreover, approximately solving Problem (7) with an MILP solver can be time-consuming for very large-scale models.Fleming et al. [3] approximated (6) by the linear optimisation problem max w,t where t ∈ R m .Since the constraint w ≥ 0 can not be enforced numerically with floating point, it is replaced by w ≥ 1α where α is an arbitrary positive number.The scalars α, β are proportional to the smallest molecular mass considered non-zero and the largest molecular mass allowed.Our initial aim was to solve the Problem (6) by using non-convex approximations.It is clear that ( 6) is a special case of the main problem (21), in which y = w and x,z are omitted.However, in practical situations, we observed that it often occurs that one can obtain a larger value for ∥w ⋆ ∥ 0 if a stoichiometrically inconsistent reaction is first removed from the network.Even if that removal of that reaction causes a reduction in the number of metabolites in the network, it is still possible for ∥w ⋆ ∥ 0 to be larger than before.A typical example is when one removes an imbalanced reaction involving a large number of metabolites that otherwise participate in balanced reactions.This is the rationale for the new method for stoichiometric consistency testing described in Section 2.1.

Leaks and siphons
In standard notation, flux balance analysis is the linear optimisation where c ∈ R r is a parameter vector that linearly combines one or more reaction fluxes to form what is termed the objective function and where a b i < 0, or b i > 0, represents some fixed output, or input, of metabolites i.If b = 0, a metabolites may be produced from nothing (leak metabolites) or consumed for nothing (siphon metabolites) in violation of mass conservation, if all reactions are reversible and S is stoichiometrically inconsistent [1].However, even with a stoichiometrically inconsistent S, mass conservation may still not be violated if the lower and upper bounds on reaction rates, l, u ∈ R n prevent it.For instance, the reactions A + B ⇌ C and C ⇌ A are stoichiometrically inconsistent because it is impossible to assign a positive molecular mass to all metabolites whilst ensuring that each reaction conserves mass.However, if the bounds are such that both reactions are unidirectional in the forward direction, A + B → C andC → A, then leakage of B is no longer possible and so mass is conserved.
Testing for leaks or siphons has been widely used in genome-scale metabolic modelling, prior to distribution of a model for prediction of mass conserved steady-states [4].One can identify all leaks in a network with the cardinality optimisation problem wherey ⋆ i > 0 indicates that the corresponding metabolites is a leak and can be produced from nothing.To test for siphons, replace the equality constraint in (10) with Sv + y = 0.This approach is useful when double checking stoichiometric consistency, with the bounds for all reactions that are known to be stoichiometrically inconsistent set to zero.
A minimal number of reactions required to be active and result in a leak of the metabolites subset Ω can be found using the cardinality optimisation problem with v ⋆ referred to as a minimal leakage mode [1].A minimal siphon mode would be obtained if the equality constraint was replaced with Sv +y = 0. Problem 10 is a cardinality minimisation problem that we approximate by a difference of convex functions and solve using a difference of convex function algorithm specialised for cardinality optimisation (Supplementary Section 1.2).Similarly, a minimal number of reactions contributing to leakage of a minimal number of metabolites can be found using the cardinality optimisation problem where the additional inequality enforces the rate of a subset of reactions (Ω) to be active.Considering Problem (1, main text), the utility of Problems ( 11) and ( 12) is in the identification of reactions within the suspect stoichiometrically inconsistent set, to remove prior to reiteration of Problem (1, main text) with a reduced network.This requires manual curation of the results of Problems ( 11) and (12), where the advantage is that they often narrow down significantly the search for stoichiometrically inconsistent reactions, especially in the context of genome-scale networks.

Flux consistency
If one assumes that all molecules are at steady-state, the corresponding computational model should be flux consistent, meaning that each net reaction of the network has a non-zero flux in at least one feasible steady-state net flux vector.For example, consider the set of reactions In this set, the reaction D ⇌ H is flux inconsistent, as any non-zero net flux is inconsistent with the assumption that the concentration of H should be time invariant.Inclusion of flux inconsistent reactions, like D ⇌ H, in a dynamic model would be perfectly reasonable, but we omit such reactions because the focus of this paper is on modelling of steady-states.The largest set of flux consistent reactions in a network can be obtained by solving the following cardinality optimisation problem where V ∈ R r×p denotes a matrix of p ≤ r − rank(S) horizontally concatenated flux vectors and • : R r×p → R r×p denotes an entry-wise application of a step function (cf.Supplementary Figure 1), with V i,j = ς(V i,j ) = 1 if V i,j ̸ = 0 and ς(V i,j ) = 0 otherwise.For most biochemical networks, all reactions can not be simultaneously active in one flux mode, making flux consistency testing a form of matrix cardinality maximisation problem, where the maximum number of flux modes is bounded by the column rank deficiency of the stoichiometric matrix.We say a reaction, corresponding to the j th column of S, is flux consistent if the corresponding row of an optimal solution V ⋆ to Problem (14) contains at least one non-zero entry.
Previously Vlassis et al [5] used a non-convex approximation to a step function to approximate maximisation of the cardinality of a steady state flux vector.This approach took advantage of the fact that the cardinality function is quasiconvex on the positive orthant, which suited irreversible reactions but requires a heuristic approach to flip the direction of reversible reactions.In general the cardinality function is not quasiconvex.Problem 14 is a cardinality maximisation problem that can be approximated by a difference of convex functions and solved using a difference of convex function algorithm specialised for cardinality optimisation (Supplementary Section 1.2).Following Vlassis et al [5], each approximate solution to Problem (15) is obtained by using linear optimisation to maximise the approximate step function, ψ cap (t) = min{1, θ|t|}.Note that this step function may be represented as a difference between two convex functions (cf Table1).This algorithm approximately solves Problem ( 14) by approximately solving a sequence of k cardinality maximisation problems, of the form The result is a matrix The set of flux consistent metabolites is obtained by identification of each metabolite that participates in a flux consistent reaction.The remaining set of metabolites, i.e., flux inconsistent metabolites, but also referred to as dead-end metabolites in the literature [4], correspond to the metabolites that are exclusively involved in flux inconsistent reactions.

Thermodynamic flux consistency
Any stoichiometric matrix S may be split into one subset of columns corresponding to internal and external reactions, S = [N, B], where internal reactions are stoichiometrically consistent, that is ∃ℓ ∈ R m >0 such that N T ℓ = 0, and external reactions are not stoichiometrically consistent, that is ∄ℓ ∈ R m >0 such that B T ℓ = 0, as they represent net exchange of mass across the boundary of the system.Previously, Desouki et al. [6] introduced the CycleFreeFlux algorithm, which reduces a given internal reaction flux vector z ′ to its thermodynamically feasible part z ⋆ , obtained at the optimum of a linear optimisation problem of the form min where z ∈ R n and w ∈ R k are internal and external reaction fluxes, respectively.Here ∥z∥ 1 denotes the one-norm of internal reaction fluxes and c ∈ R k is a vector of objective coefficients, restricted only to external reaction fluxes (cf Desouki et al. [6], where c ∈ R r ).The last three sets of constraints require that external reaction fluxes remain unchanged and that the direction of internal reaction flux does not change, but are allowed to become equal to zero in magnitude.
Inspired by Desouki et al. [6], we observed that a thermodynamically feasible flux may be computed by a single linear optimisation problem min z,w where z and z denote lower and upper bounds on internal reaction fluxes, with the constraint that z ∈ {0, −∞} n and z ∈ {0, ∞} n , while w ∈ R k and w ∈ R k denote lower and upper bounds on external reaction fluxes, respectively.The vectors y ∈ R m , s ∈ R m and t ∈ R m , are dual variables to the steady state constraint, bounds on internal reaction rates and bounds on external reaction rates, respectively.The optimality conditions of Problem 17 are where −y ⋆ may interpreted as a vector proportional to the chemical potentials of each metabolite and −N T j y ⋆ is proportional to the change of chemical potential for reaction j.When z ∈ {0, −∞} n and z ∈ {0, ∞} n then the optimal dual variable s ⋆ j to the inequality constraints on internal reaction j, is non-zero if and only if energy conservation and the second law of thermodynamics on the optimal vector of nonzero internal reaction fluxes [7].However, when z j = 0, this is a relaxation of the thermodynamic constraint sign(z j ) = −sign(N T j y), since z ⋆ j = 0 ⇐⇒ sign(z ⋆ j ) = N T j y ⋆ + s ⋆ j = 0, so z ⋆ j = 0 ⇐⇒ N T j y ⋆ = −s ⋆ j and therefore z j = 0 ⇎ N T j y = 0. Biochemically, one may interpret this relaxation as saying that a zero internal reaction flux does not imply a zero change in chemical potential.For example, a nonzero change in chemical potential may still be consistent with zero net flux when an enzyme is absent for the corresponding reaction.To summarise, herein we define thermodynamic consistency as the requirement that any nonzero net flux be driven by a change in chemical potential for the corresponding reaction, that is However, a reactions must admit a non-zero flux that is thermodynamically consistent to be deemed thermodynamically flux consistent, so we omit reactions where zero net flux is the only thermodynamically consistent solution obtained.Problem 17 may be used to compute a single flux vector that is thermodynamically feasible.One must compute multiple thermodynamically feasible net flux vectors in order to approximate the largest thermodynamically consistent subset of a given flux consistent model, because different combinations of reactions can be active in thermodynamically feasible fluxes with disjoint sparsity patterns.Inspired by the use of randomly weighted objectives to efficiently explore the set of flux consistent reactions [8], by default, we employ a similar strategy to bias toward thermodynamically feasible flux vectors that have non-zero flux in random subsets of reactions.A greedy sequence of optimisation problems is then used to iteratively increase the number of reactions in the thermodynamically consistent subset.At each iteration the following cardinality optimisation problem is solved min z,w,p,q where g T ∥z∥ 0 denotes the weighted zero-norm of internal reaction fluxes, where the zero-norm for each reaction is weighted individually, that is Here g ∈ R n is standard random vector generated from the zero-mean normal distribution in the open interval (−0.5, 0.5) (the random part), except g j = 0 if the j th reaction has already been identified as part of the thermodynamically consistent set (the greedy part).Minimisation of the one-norm of internal net reaction flux, to promote thermodynamic feasibility [6], is achieved by constraining net flux to be equal to the difference between two non-negative vectors, that is z = p − q and minimising the sum of forward net reaction flux p ∈ R n ≥0 and reverse net reaction flux q ∈ R n ≥0 .Bounds on z and w are the same as in Problem 17.Each iteration of Problem 18 is efficiently solved using a sequence of optimisation problems, each involving a difference of convex functions [9].Depending on the signs of non-zero z j , one may identify whether a reaction admits thermodynamically consistent flux in either the forward direction, reverse direction, or both.
Given a solution Problem 18, thermodynamic consistency can also be checked using a Problem (16).The scalar weight β > 0 is chosen to trade off between cardinality optimisation of internal reaction fluxes, to explore the flux consistent subset of the model, and minimising the one norm of the sum of forward and reverse reaction rates, to promote thermodynamic consistency.By default, β = 0.1, which computational experiments determined to be approximately optimal for genome-scale models tested.If β is too small, too few of the reaction fluxes are thermodynamically feasible, while if β is too large, exploration of the thermodynamically consistent set is impeded.
Any internal reaction with a forcing bound, that is a positive lower bound (z j > 0), or a negative upper bound (z j < 0), can indirectly force steady state flux in other internal reactions, which may or may not be thermodynamically consistent, depending on the bounds on external reactions.If the direction of the bounds on external reactions is consistent, or inconsistent, with the direction of the bounds of the forced internal reaction, the directly and indirectly forced internal reactions admit, or do not admit, a thermodynamically consistent steady state flux, respectively.The difference between these alternatives can be identified by implementing the forcing bounds, by further constraining in Problem (18), allowing relaxation to zero of each bound that forces internal reaction flux, promoting non-zero flux in such reactions in Problem (16), then observing if thermodynamically feasible net flux is possible in the original direction, with magnitude equal or greater to that required by the forcing bound.The toy network illustrated in Figure 2 illustrates a key difference between flux consistency and thermodynamic flux consistency of steady state fluxes.

Sparse flux balance analysis
A typical application of flux balance analysis is to predict an optimal non-equilibrium steady-state flux vector that optimises a linear objective function, such biomass production rate, subject to bounds on certain reaction rates.It is a cardinality minimisation problem to predict the minimal number of active reactions required to carry out a particular set of biochemical transformations [10], consistent with an optimal objective derived from the result of a standard flux balance analysis problem.However, for many biochemical systems it is not clear what the correct objective is.An alternative approach is to tighten the constraints on steady-state fluxes, either by eliminating flux through reactions whose gene products are absent from the cell type being modelled or by tightening the bounds on active reactions, then minimise the number of reactions required to be active to satisfy those constraints.In each case, the corresponding problem is a cardinality minimisation problem, that we term sparse flux balance analysis, is where the last constraint is optional and represents the requirement to satisfy an optimal objective value ρ ⋆ derived from any solution to Problem (9).The optimal flux vector can be considered a steady-state biochemical pathway with minimal support, subject to the bounds on reaction rates and satisfaction of the optimal objective.Note also that Problem (19) is is a variant of Problem (12), to compute the minimal number of reactions contributing to leakage of a minimal number of metabolites.In both cases, we require minimisation of the number of active reactions, if one considers y in Problem (12) as a unidirectional external reaction.

Relaxed flux balance analysis
Every solution to Problem (9) must satisfy Sv = b and l ≤ v ≤ u, independent of any objective chosen to optimise over the set of constraints.It may occur that these constraints are not all simultaneously feasible, i.e., the system of inequalities is infeasible.This situation might be caused by an incorrectly specified reaction bound or the absence of a reaction from the stoichiometric matrix, such that a non-zero b / ∈ R(S).To resolve the infeasibility, we consider a cardinality optimisation problem that seeks to minimise the number of bounds to relax, the number of fixed outputs to relax, the number of fixed inputs to relax, or a combination of all three, in order to render the problem feasible.
The cardinality optimisation problem, that we term relaxed flux balance analysis, is min v,r,p,q where p, q ∈ R n denote the relaxations of the lower and upper bounds on reaction rates of the reaction rates vector v, and where r ∈ R m denotes a relaxation of the mass balance constraint.Non-negative scalar parameters λ and α can be used to trade off between relaxation of mass balance or bound constraints.A non-negative vector parameterλ can be used to prioritise relaxation of one mass balance constraint over another, e.g, to avoid relaxation of a mass balance constraint on a metabolite that is not desired to be exchanged across the boundary of the system.A non-negative vector parameterα may be used to prioritise relaxation of bounds on some reactions rather than others, e.g., relaxation of bounds on exchange reactions rather than internal reactions.Note that if b = 0, and rank(S) < n there always exists a non-zero solution to Sv = 0 if l = −∞ and u = ∞, that is, if b = 0 any infeasibility must come from the bounds and we can fix r = 0 leaving only p and q for relaxation of bounds.The optimal choice of parameters depends heavily on the biochemical context.A relaxation of the minimum number of constraints is desirable because ideally one should be able to biochemically justify each bound to relax, or each additional metabolites to be exchanged across the boundary of the system, by referencing the experimental literature.This task is roughly proportional to the the number of constraints proposed to be relaxed.

Cardinality optimisation and differences of convex functions
Each of the aforementioned cardinality optimisation problems is an instance, or a sequence, of the following general cardinality optimisation problem where c ∈ R p+q+r is a parameter vector in a linear objective function, the second term in the objective minimises the cardinality of the variable vector x, while the third term maximises the cardinality of the variable vector y.The data in A ∈ R s×(p+q+r) and b ∈ R s implement linear equality or inequality constraints, supplemented by box constraints on the variables, l, u ∈ R p+q+r .The parameters λ, δ > 0 can be used to trade-off between minimisation of the cardinality of x and maximisation of the cardinality of y.

Difference of convex functions
Consider the following general optimisation problem involving a difference of convex functions can be equivalently reformulated in the general form of Problem (4, main text) [11,12].A given difference of convex function has infinitely many decompositions into a difference of two convex functions (DC functions).
Therefore, there is the potential for as many different difference of convex function algorithms (DCA) as there are decompositions.The decomposition used in practice plays a critical role in determining the speed of convergence, robustness, and quality of solutions.This flexibility is of particular interest from a theoretical and an algorithmic point of view.For a complete study of DC functions and DCA, the reader is referred to [11,12] and the references therein.The overall approach of a difference of convex function algorithm [11,12] is geometrically illustrated in Figure 3 and generic algorithm is specified as follows: Difference of Convex Function Algorithm (DCA): 1. Let s 0 be any initial point andϵ be a sufficiently small tolerance.Set k = 0.
2. Computes k which belongs to the subdifferential of φ at s k :s k ∈ ∂φ(s k ).

Compute s k+1 by solving the convex optimisation problem
, and go to Step 2.

Difference of convex function approximation of cardinality optimisation
Consider the step function ς(t) : R → R where ς(t) = 1 if t ̸ = 0 and ς(t) = 0 otherwise, illustrated in Figure 1.The zero norm of a vector s∈ R n is equivalent to a separable sum of step functions, Let ψ θ (t) : R → R be an approximation of the step function ς(t).ψ θ (t) depends on a step parameter θ ∈ R and satisfieslim θ→∞ ψ θ (t) = ς(t), ∀t ∈ R. For the sake of simplicity, we will use ψ instead of ψ θ .We suppose that ψ is a DC function, i.e., where ϑ and ω are convex functions.Hence, the ℓ 0 norm of a vector s can be approximated by The objective function ζ(x, y, z) can be rewritten as 0. Consider Problem (24).Set a numerical stopping criterion ϵ.

and go to
Step 2.

Approximate step functions
Many approximate step functions are possible, but we focus on the following continuous approximate step functions: the piece-wise exponential function (ψ exp , [13]), the smoothly clipped absolute deviation penalty function (ψ SCAD , [14]), the logarithm function (ψ log , [15]), the ℓ p+ norm with 0 < p < 1 (ψ ℓ + p , [16]), the ℓ p− norm with p < 0 (ψ ℓ − p , [17]), and the capped − ℓ 1 function (ψ cap , [18]), all illustrated in Figure 4.Each of the aforementioned approximations may be represented as a DC function [9].The definition of each approximate step function and their DC decompositions are given in Table 1.For each approximate step function, Supplementary Table 2 gives the sub-gradients of their DC components, ϑ and ω.Table 1: Approximate step functions and their decomposition into a difference of convex functions ψ(t) = ϑ(t) − ω(t).Each approximate step function depends on a step parameter θ that governs the fidelity of the approximation.

Convergence properties
Based on the convergence properties of difference of convex function algorithms, certain aspects of the convergence of the DCCO algorithm are assured.Specifically, it generates a sequence(x k , ỹk , zk ) such that the sequence ζ(x k , y k , z k ) is monotonically decreasing and the sequence (x k , y k , z k ) globally converges to (x * , y * , z * ), a critical point of the objective function ζ ( [12], Theorem 3).Furthermore, when the approximate step function is the capped-ℓ 1 norm, the Problem ( 24) is a polyhedral DC, i.e, the DC components are a polyhedral convex functions.Hence, the DCCO algorithm has finite convergence.Moreover, if the function φ(x, y, z) is differentiable at (x * , y * , z * ), then (x * , y * , z * ) is a local solution of (24) (Theorem 5, [12]).For a sufficiently large parameter θ, the Capped-ℓ 1 approximate problem and the original cardinality optimisation problem are have the same set of optimal solutions [9].Moreover, the DCCO with a capped-ℓ 1 approximation, will converge to a local optimum while the other approximations will only converge to a critical point [9].
This discontinuous objective may be reformulated into a continuous objective by letting w i := max{x i , −x i } ≥ 0 and t i := max{1, θy i } ≥ 1 so Problem 25 becomes which is a linear optimisation problem subject to linear constraints.

Parametrisation
Theoretically, the greater the value of step parameter θ in Table (1), the better the approximation of a step function.However, practically, a greater value of θ, will lead to more local minima of the approximate cardinality optimisation problem.Hence, we use the following parameter updating procedure.Starting with a small value θ 0 , the parameter θ was gradually increased until a given threshold θ max .In our experiments, θ 0 was set to 0.5 and θ max was equal to 100.The tolerance for the stopping criterion of the DCCO algorithm was set with ϵ = 10 −6 .An upper bound of 10 9 was set on each w i in Problem (1 ,main text) to avoid excessively large numerical values.The threshold to distinguish non-zero and zero v j in Problem (19) was set with ε = 10 −8 as it is 100 times less than the (default) feasibility tolerance (10 −6 ) of the linear optimisation solver.Note that, in practice setting ε to be equal the feasibility tolerance of the linear optimisation solver is too stringent, since some nonzero values that smaller in magnitude (e.g.between10 −8 and10 −6 ) are necessary to satisfy the optimality conditions with a residual less than ε for a typical stoichiometric matrix, with coefficients of order one.

Hardware and software
The interface to each optimisation problem, as well as the the DCCO algorithm were implemented in MATLAB (Mathworks Inc) and integrated with the COBRA toolbox [19] RAM with a 64-bit Ubuntu 20.04 operating system.Gurobi Optimizer 6.0 (Gurobi Optimization, Inc.) was used to numerically solve all linear and mixed-integer linear sub-problems, with the exception of Supplementary Table 4, where IBM CPLEX 12.0 was used.

Sparse flux balance analysis
Table 5 provides a comparison of numerical results for flux balance analysis (FBA), one norm minimisation (ℓ 1 min) and sparse flux balance analysis with cardinality optimisation approximated by minimisation of the difference of convex functions (DCCO).Table 6 illustrates a stoichiometrically balanced cycle involving ATP hydrolysis in Recon3D.
Table 5: Comparison of numerical results for flux balance analysis (FBA), one norm minimisation (ℓ 1 min) and cardinality optimisation approximated by minimisation of the difference of convex functions (DCCO).Herein, n * denotes the number of predicted active reactions and k denotes the number of predicted active reactions that could be removed from the solution, yet satisfy the optimal objective of FBA.The best step function approximation reported by the DCCO algorithm is usually the Capped-ℓ1 approximation.
the solution (v * , p * , q * , r * ) given is q * = 0,r * = 0 and p * 5 = 10; p * i = 0, i = 1 . . .4, 6 . . .11.This means that only the lower bound of reaction r 5 needs to be relaxed in order to make the biochemical network feasible.This result perfectly corresponds to the anticipated solution.In a second setting, we disallow the relaxation of the bound on reaction r 5 .As expected, the best possible solution, to only relax the bounds of the reactions r 4 and r 6 , is returned.

Figure 2 :
Figure 2: Thermodynamic flux consistency of a toy network.(i) Network where the lower bound on external reaction Ø → A (green) and internal reaction C→X (red) force flux through parts of the network.(ii) Network with all reactions flux consistent.Note that this is only possible with thermodynamically inconsistent cyclic flux around a stoichiometrically balanced cycle, e.g., A → B → C → X → A. (iii) All internal reactions are thermodynamically consistent, but reactions C→X and X→A have zero flux, so are not thermodynamically flux consistent.(iv) A fully thermodynamically flux consistent subnetwork.Note omission of reaction C→X and X→A, which are thermodynamically flux inconsistent with the bounds given for external reaction Ø → A (green).

Figure 3 :
Figure 3: Difference of convex function algorithm.This algorithm can minimise any function ζ(s) : R n → R that can be expressed as a difference of convex functions ζ(s) := ϕ(s) − φ(s), over a convex set C. The key idea is that, at each iteration one linearly approximates the non-convex part of ζ(s) and then minimises the resulting convex function.Suppose that at iteration k, the current solution is s k .One replaces −φ(s) by its affine majorisation to obtain the convex function ζ k (s) := ϕ(s) − φ(s k ) − (s − s k ) T sk wheres k belongs to the subdifferential of φ ats k .Note that ζ(s) ≤ ζ k (s) and ζ(s k ) = ζ k (s k ).Since ζ k (s) is a convex function, one can globally solve the convex problem min s∈C ζ k (s) to obtain the solutions k+1 .Then one reiterates the previous steps until convergence of the sequence s k toward a minimal ζ(s k ).

Figure 4 :
Figure 4: Approximate step functions.ψ log and ψ ℓ − p tend to +∞ when t tends to +∞, while ψ exp ,ψ SCAD ,ψ ℓ + p , and ψ cap are step function minorants and the greater t is, the closer to the value of a step function they become.Based on Fig 1 in [9].

Table 2 :
The sub-gradients of the convex functions ϑ and ω, where ϑ and ω are the components of the difference of convex function ψ.
[3] comprehensive, cross-platform software suite for constraint-based modelling of biochemical networks, available at https://github.com/opencobra/cobratoolbox.The optimizeCardinality.m function is the general purpose interface to solve Problem 21 using our implementation of the Difference of Convex Cardinality Optimisation (DCCO) Algorithm described in Section 1.2.2.Stoichiometric consistency testing is implemented by the function findStoichConsistentSubset.m, leak and siphon testing is implemented by the function findMassLeaksAndSiphons.m, flux consistency testing is implemented by the function findFluxConsistentSubset.m, thermodynamic flux consistency testing is implemented by the function findThermoConsistentFluxSubset.m, sparse flux balance analysis is implemented by the function sparseFBA.mandrelaxedflux balance analysis is implemented by the function relaxedFBA.m.The function checkModelProperties.m provides a single interface that analyses leaks/siphons as well as stoichiometric, flux, thermodynamic flux, and a previously developed implementation that analyses a form of kinetic consistency[3].Scripts to reproduce the figures and tables are available at https://github.com/opencobra/COBRA.papers/2022_cardOpt. Furthermore, a set of tutorials illustrating each cardinality optimisation application are also available within the COBRA toolbox tutorial collection (https://github.com/opencobra/COBRA.tutorials).The COBRA toolbox provides interfaces to a wide range of commercial and open-source optimisation solvers, that can be selected to solve the optimisation subproblems that arise at each iteration of the DCCO algorithm.Numerical timing experiments were performed with MATLAB version R2021b on a desktop Intel Core i5 @3.30GHz with 16GB

Table 3 : Comparison of five approaches to identify the largest stoichiometrically consistent subset.
For each test network, the identity, citation and number of heuristically internal metabolites (m) and reactions (n) is provided.For each algorithm, the number of stoichiometrically consistent metabolites m *and reactions n * is reported, as well as the runtime in seconds.The largest stoichiometric consistency results are emphasised in bold font.

Table 4 : Temporal performance comparison for MILP versus capped-ℓ
The maximal cardinality conservation vector was computed for a range of networks using MILP and capped-ℓ 1 DCCO, and IBM CPLEX 12.0 as the MILP and LP solver.For each test network, the identity, citation and number of heuristically internal metabolites (m) and reactions (n) is provided.For each algorithm, mean and standard deviation of the solution time is given in seconds (s).The average fastest approach for each network are emphasised in bold font.

Table 7 :
Starting with Recon3Dmodel, then setting all external reaction bounds to zero, yet requiring a biomass reaction (VMH biomass_reaction https://vmh.life/#reaction/biomass_reaction)rate between a lower bound of 1 and an upper bound of 10 created an infeasible model.With RelaxedFBA, to enable biomass production, a minimal cardinality solution required the relaxation of 24 lower bounds and 2 upper bounds.