On the effectiveness of primal and dual heuristics for the transportation problem

The transportation problem is one of the most popular problems in linear programming. Over the course of time, a multitude of exact solution methods and heuristics have been proposed. Due to the substantial progress of exact solvers since the middle of the last century, interest in heuristics for the transportation problem has, over the last few decades, been reduced to their potential as starting methods for exact algorithms. In the context of ever increasing problem dimensions, this paper contributes an in-depth study of heuristics with respect to their performance in terms of computation time and objective value. Furthermore, we consider – to the best of our knowledge for the first time – some simple efficient dual heuristics that yield performance certificates based on weak duality without the need to solve the problem exactly. We test these heuristics in conjunction with state-of-the-art solvers on a variety of test problems. In doing so, we especially close the gap to the now rather outdated comparative studies from the last century, e.g. Srinivasan & Thompson (1973); Glover et al. (1974); Gass (1990). For one class of random test problems, we extend previous approaches to provide a consistent and flexible problem generator with known solution. Based on our numerical results, it can be concluded that primal and dual heuristics are able to rapidly generate good approximations for specific randomly generated problem instances but – as expected – are not able to yield satisfactory performance in most cases.

∗Email: jonas.schwinn@math.uni-augsburg.de
†Corresponding author. Email: ralf.werner@math.uni-augsburg.de


Introduction and motivation

Motivation
In this paper we present a detailed numerical study on the effectiveness of primal and dual heuristics for the transportation problem. Since the first statement of the transportation problem, in its continuous form due to Monge (1781), it has been the subject of numerous mathematical publications. The problem is known under several terms, among which are the general, classical or Hitchcock transportation problem as well as the distribution problem. We will consider the problem in its discrete, dense and uncapacitated form, without so-called "inadmissible" cells, and refer to this form, see (2.1), as the transportation problem for the remainder of this paper. Studies of this problem date back to the 1930s and the work of A.N. Tolstoi (see Schrijver, 2002) and were followed by the well-known publications of Hitchcock (1941), Kantorovich (1942) (continuous formulation), Koopmans (1948) and Dantzig (1963), who adapted his famous Simplex algorithm to the transportation problem – which we will call the (Primal) Transportation Simplex¹. In addition to exact solvers, a number of primal heuristics were proposed. Most of these heuristics, like the North-West Corner Rule or Minimum Cost Allocation methods, implement straightforward ideas and can be found in virtually every textbook on the transportation problem. More sophisticated heuristics were proposed, for example, by Houthakker (1955), Reinfeld & Vogel (1958) and Russel (1969). Concerning exact solution methods, the groundbreaking work of Glover et al.
(1974) and Srinivasan & Thompson (1973) established the Transportation Simplex as the state-of-the-art solver, thanks to new methods for basis representation and pivot operations. Formerly, Out-of-Kilter methods had been the method of choice for the transportation problem, see for instance Gass (1990). More recent publications indicate an increasing competitiveness of modern primal-dual or dual solvers from the field of combinatorial optimization, cf. Kovács (2015). Once the Transportation Simplex was established, the interest in heuristics shifted to their capability as starting methods for it. Numerical studies, cf. Srinivasan & Thompson (1973); Glover et al. (1974); Gass (1990); Gottschlich & Schumacher (2014); Schrieber et al. (2017), examine the total computation time of a heuristic and the time it takes an exact solver to compute an optimal solution from the heuristic solution.

Our contribution
In contrast to this rather narrow point of view on heuristics, we provide in the following a novel analysis of the quality of heuristic solutions in terms of the gap between primal and dual heuristics. Our analysis provides a novel aspect insofar as, up to now, primal-dual information has not been used to judge the quality of heuristics for the transportation problem. To the best of our knowledge, it also constitutes the first broad study on heuristics for the transportation problem in the last 40 years, see below. Until today, related numerical studies have usually focused on exact solvers rather than heuristics, and only consider special cases of the transportation problem, see for instance Schrieber et al. (2017), or the solution of the more general minimum cost flow problem, cf. Ahrens & Finke (1980) or Kovács (2015). En passant, we also close the gap to the almost outdated comparative studies of the last century by Srinivasan & Thompson (1973); Glover et al. (1974); Gass (1990) by investigating the performance of state-of-the-art Transportation Simplex codes on our problem instances.
Furthermore, as we are not aware of any efficient dual heuristics for the pure transportation problem², that is, heuristics running in at most O(m²), where m is the number of sources and sinks in the problem, we also add to the existing literature by providing novel efficient dual heuristics running in O(m²).
Subsequently, we provide detailed insight into the computation time and the solution quality of primal and dual heuristics on three different problem classes: two classes of random instances and one class from a real-world application (imaging sciences). Here, the generation of the random instances, see especially Appendix A, extends previous approaches to allow for a more flexible and consistent choice of the instances.

Main findings
Not unexpectedly, the quality of heuristic solutions cannot be assured across all problem classes. However, for some randomly generated problem instances, primal and dual heuristics quickly compute and verify solutions with function values within a few percent of the optimal value. Since primal heuristics are usually run as part of an exact solver whenever the Transportation Simplex is used, they come at almost no cost. Further, the cost of validating a heuristic primal solution via an efficient dual heuristic is also negligible compared to running an exact solver.
Furthermore, we find that the performance of heuristics and exact solvers differs significantly across problem classes. For specially structured problems, general heuristics might fail to deliver good results, and one has to resort to exact solvers or specifically tailored heuristics (see, e.g., Gottschlich & Schumacher, 2014). In summary, our study strongly supports the current academic and industrial practice of relying on exact solvers for the transportation problem instead of heuristics, especially when fairly accurate solutions are asked for.

The transportation problem
In order to provide an intuitive understanding of the problem and of the methods described in the remainder of this paper, let us start with the classical economic interpretation of the transportation problem. To this end, consider the transportation of a certain good from i = 1, …, m origins to j = 1, …, n destinations. Each of the origins has a supply a_i of the good, whereas each of the destinations has a demand b_j for the good. The cost of shipping one unit of the good from i to j is given by c_ij. These cost coefficients are collected in the cost matrix C := (c_ij)_{i,j=1}^{m,n}. A transportation plan (transportation matrix) X := (x_ij)_{i,j=1}^{m,n} is given by the amounts x_ij of transported units between origin i and destination j. It is called feasible if all supplies a_i are used (the row constraints are satisfied) and all demands b_j are saturated (the column constraints are satisfied). The aim of the transportation problem is then to find a feasible transportation plan at minimum cost. Mathematically, the transportation problem is given by the linear program

    min_{x ≥ 0} ∑_{i=1}^{m} ∑_{j=1}^{n} c_ij x_ij   s.t.   ∑_{j=1}^{n} x_ij = a_i for all i,   ∑_{i=1}^{m} x_ij = b_j for all j,    (2.1)

where we have c_ij ≥ 0, a_i > 0 and b_j > 0 without loss of generality. Furthermore, we make no assumption on the number of c_ij equal to zero, that is, we assume the problem to be dense in the general case. We ensure feasibility of (2.1) by assuming the transportation problem to be balanced, that is, ∑_{i=1}^{m} a_i = ∑_{j=1}^{n} b_j. By quick inspection of the constraints, we also see that (2.1) is bounded and thus possesses an optimal solution. Since the roles of the a_i and b_j are completely interchangeable, we assume m ≥ n without loss of generality for the remainder of this paper, and further call a problem symmetric whenever m ≤ 10·n and asymmetric if the opposite holds. Following the standard definitions of linear optimization, we call a set B ⊆ {(i, j) | i = 1, …, m; j = 1, …, n} with |B| = m + n − 1 a basis of (2.1) whenever the columns A_ij of the constraint matrix corresponding to the index tuples in B are linearly independent. Further, we call a solution x satisfying the equality constraints of (2.1) a basic solution with respect to a basis B whenever x_ij > 0 implies (i, j) ∈ B, and a feasible basic solution, respectively, whenever additionally x ≥ 0 is satisfied. A basic solution is called degenerate whenever x_ij = 0 for at least one (i, j) ∈ B.

Properties
Given the special structure of the constraint matrix A, one is also able to provide a very intuitive characterization of bases: a given set B ⊆ {(i, j) | i = 1, …, m; j = 1, …, n} with |B| = m + n − 1 is a basis of (2.1) if and only if no subset of its entries forms a cycle in the transportation matrix. For a detailed introduction to the characteristics of bases in minimum cost flow problems (e.g. basis triangularity and graph-theoretical interpretations) we refer for instance to Ahuja et al. (1993). Accordingly, the dual of the transportation problem is given as

    max_{u,v} ∑_{i=1}^{m} a_i u_i + ∑_{j=1}^{n} b_j v_j   s.t.   u_i + v_j ≤ c_ij for all i, j.    (2.2)

As the transportation problem is a feasible and bounded linear program, strong duality and complementarity conditions hold for (2.1) and (2.2). We follow the common notation and express the dual feasibility of a given dual solution (u, v) via the reduced costs c^uv_ij := c_ij − u_i − v_j. This allows a compact notation of the weak complementary slackness conditions

    x_ij · c^uv_ij = 0 for all i, j,    (2.3)

which are equivalent to the optimality of x and (u, v) for feasible primal and dual solutions x and (u, v).
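The weak-duality argument behind the performance certificates used later can be made concrete in a few lines. The following Python sketch (the function names are ours, purely for illustration) evaluates the objectives of (2.1) and (2.2) and returns the relative gap, which bounds the suboptimality of any feasible x:

```python
def primal_value(C, x):
    # Objective of (2.1): total cost of the transportation plan x.
    return sum(C[i][j] * x[i][j]
               for i in range(len(C)) for j in range(len(C[0])))

def dual_value(a, b, u, v):
    # Objective of (2.2): value of the dual solution (u, v).
    return (sum(ai * ui for ai, ui in zip(a, u))
            + sum(bj * vj for bj, vj in zip(b, v)))

def gap_certificate(C, a, b, x, u, v):
    # For feasible x and (u, v), weak duality gives
    # dual_value <= optimum <= primal_value (assuming a positive dual value),
    # so the relative gap certifies the suboptimality of x.
    p, d = primal_value(C, x), dual_value(a, b, u, v)
    return (p - d) / d
```

For instance, a returned gap of 0.05 certifies that x is within 5% of the optimal value, without solving the problem exactly.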

Primal and dual heuristics
In the course of modern operations research, a multitude of primal heuristics for the transportation problem has been proposed. Taking into account the development of exact solvers, only a few of these concepts remain computationally competitive today. For the sake of conciseness of our study and expressiveness of our results, we made a preselection according to the following criteria.
• From the start, we excluded any candidate where computing the heuristic solution took a disproportionate amount of time compared to the time needed to produce exact solutions. Among the heuristics excluded for this reason were Russel's Method (Russel, 1969), Houthakker's Method of Mutually Preferred Flows (Houthakker, 1955) and Vogel's Approximation Method (Reinfeld & Vogel, 1958) as well as some more recent adaptations of it. One might argue to exclude the Matrix Minimum Rule and its variants for the same reason, since it is similarly reported to be too time-consuming (see, e.g., Srinivasan & Thompson, 1973; Glover et al., 1974). However, in accordance with Gottschlich & Schumacher (2014), we found that one can substantially speed up this heuristic by sorting the cost coefficients in advance and iterating once through the sorted list to obtain an equivalent solution. We assume this concept is to some extent equivalent to the Matrix Minimum by Rank Method proposed by Lee³. In contrast to the vast amount of publications on the transportation problem, this rather obvious approach has been neglected in former comparative studies. The same strategy was applied to ensure the competitiveness of related methods like the Modified Russel's Method and the Large Amount Least Cost Rule, which was also proposed by Lee and mentioned in Srinivasan & Thompson (1973); Glover et al. (1974).
• In a second step, we excluded some methods that are sufficiently represented by other heuristics, in the sense that they employ rather similar concepts and produce comparable or worse results in terms of computation time and solution quality. Thus, we excluded methods like the Two Smallest in a Row Rule (Srinivasan & Thompson, 1973) as well as the (Alternating) Row Column Rule (Srinivasan & Thompson, 1973), which are represented by the Row Minimum Rule and the Modified Row Minimum Rule (Srinivasan & Thompson, 1973) as well as their column counterparts and the Tree Minimum Rule.
• Finally, this study's focus is on "classical" heuristics for the transportation problem, and we therefore do not cover heuristics that consist of originally exact solvers producing inexact solutions, e.g. by considering a relaxation of the problem or by terminating before optimality is reached, such as the Shortlist Method in Gottschlich & Schumacher (2014).
From a complexity point of view and assuming m ≥ n, the fastest known strongly polynomial algorithms are due to Kleinschmidt & Schannath (1995) and Brenner (2008), with complexities O(m²n log m) and O(mn²(log m + log n)), respectively.
For this reason, we focus on heuristics running in O(m²) or better and do not consider other heuristics any further. Since we are not aware of any competitive non-basic heuristics, all primal heuristics presented in Section 3.1 are primal basic heuristics, that is, they produce feasible basic solutions for (2.1) and could thus serve as starting procedures for the Transportation Simplex.
In the case of dual heuristics, we are not aware of any efficient dual heuristic running in O(m²) or faster. Therefore, we implemented a variety of straightforward ideas and present these in Section 3.2.

Primal heuristics
The primal heuristics can be separated into two main classes. The first class, which we will denote as the elimination heuristics, consists of iterative methods that consider in each step a cell, row or column of the cost matrix and subsequently eliminate a row or column from further consideration through the selection of new basis variable(s). The second class constitutes the greedy heuristics, which sort the (modified) cost matrix once in advance and then iterate through the list of sorted cost coefficients. In the degenerate case, a basic solution is ensured by running an additional subroutine after the greedy algorithm.

Elimination heuristics
In order to give a compact description of the elimination heuristics, let us consider the Prototype Elimination Algorithm. All following heuristics implement this prototype heuristic and differ only in the implementation of the selection rule in Step (1). Note further that in Step (3) either i or j is deleted from its respective set, but never both. This ensures (including the degenerate case) that after m + n − 1 iterations, B constitutes a feasible basis for (2.1).
In iteration k:
(1) Choose a new basis element (i, j) with i ∈ I and j ∈ J by a selection rule.
(2) Set x_ij := min(a_i, b_j), a_i := a_i − x_ij and b_j := b_j − x_ij.
(3) If a_i = 0, delete i from I (eliminate row i); otherwise delete j from J (eliminate column j).
(4) Add (i, j) to B.

North-West Corner Rule (NWCR) This rule implements a selection rule that disregards the objective value. It begins by taking i = j = 1. Subsequently, whenever i or j is deleted from I or J in Step (3), either (i + 1, j) or (i, j + 1) is chosen in the next visit of Step (1). Graphically speaking, the heuristic moves in a stair-like pattern from the north-west corner of the matrix C to the south-east corner of C. Since the selection rule can be applied in constant time, this leads to a complexity of O(m).
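The four steps above can be sketched directly in Python. The following is our own illustrative implementation (with a small guard in Step (3) so that the last remaining row is never eliminated prematurely in the degenerate case), together with the constant-time selection rule of the North-West Corner Rule:

```python
def eliminate(a, b, select):
    # Prototype Elimination Algorithm: Step (1) picks a cell via the selection
    # rule, Step (2) allocates min(a_i, b_j), Step (3) eliminates a row OR a
    # column (never both), Step (4) records the basis element.
    a, b = a[:], b[:]
    m, n = len(a), len(b)
    I, J = set(range(m)), set(range(n))
    x = [[0] * n for _ in range(m)]
    B = []
    for _ in range(m + n - 1):
        i, j = select(I, J, a, b)              # Step (1)
        x[i][j] = min(a[i], b[j])              # Step (2)
        a[i] -= x[i][j]
        b[j] -= x[i][j]
        if a[i] == 0 and len(I) > 1:           # Step (3), degenerate-safe guard
            I.discard(i)
        else:
            J.discard(j)
        B.append((i, j))                       # Step (4)
    return x, B

def north_west(I, J, a, b):
    # NWCR selection rule: always the smallest remaining row and column index.
    return min(I), min(J)
```

For a balanced instance this yields a feasible plan with exactly m + n − 1 basis elements, as stated above.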
Row Minimum Rule (RMR) This rule cycles once through the indices i ∈ I. Starting with i = 1, it (repeatedly) selects minimum entries j = arg min_{j∈J} c_ij of the i-th row of C and adds (i, j) to B. This procedure is repeated until a_i = 0, at which point i is deleted from I and the algorithm advances to the next element of I. This results in a complexity of O(mn).
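A direct sketch of (RMR), with our own function name; for brevity it records only the positive allocations in a dictionary and omits the bookkeeping of zero basic variables:

```python
def row_minimum(C, a, b):
    # Row Minimum Rule sketch: process the rows in order; within row i,
    # repeatedly allocate to the cheapest column that still has demand left,
    # until the supply a_i is exhausted.
    a, b = a[:], b[:]
    n = len(b)
    x = {}
    for i in range(len(a)):
        while a[i] > 0:
            j = min((j for j in range(n) if b[j] > 0), key=lambda j: C[i][j])
            q = min(a[i], b[j])
            x[(i, j)] = q
            a[i] -= q
            b[j] -= q
    return x
```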

Modified Row Minimum Rule (MRMR)
This rule is a modification of (RMR), cycling (possibly multiple times) through the remaining indices in I, where for each row i only one minimum element j is selected before the algorithm advances to the next element of I. Whenever i is the last element of I, by convention the next element considered is the first element of I. This gives the same complexity of O(mn).

Column Minimum Rule (CMR) and Modified Column Minimum Rule (MCMR)
Exchanging the roles of the rows (I) and columns (J), we obtain an equivalent Column Minimum Rule for each of the Row Minimum Rules above. The complexity of the Column Minimum Rules is O(m²).
Tree Minimum Rule (TMR) This rule starts by selecting the (i, j) corresponding to the minimal entry c_ij of the cost matrix. Subsequently, this rule alternately scans rows and columns of C. Suppose, w.l.o.g., that after the first allocation, j was deleted from J. The rule then proceeds by (repeatedly) selecting minimum entries j = arg min_{j∈J} c_ij of the i-th row of C and adding (i, j) to B, as in one step of (RMR), until i is deleted from I. Subsequently, it scans the column j of the element (i, j) selected last in Step (2) and chooses minimum entries i = arg min_{i∈I} c_ij of this column until j is deleted from J, as in one step of (CMR). Accordingly, the algorithm proceeds by scanning the row i corresponding to the last minimum entry of column j. The algorithm continues in the same fashion, alternating between rows and columns, until m + n − 1 assignments have been made. An advantage of this procedure is that the elements of the corresponding basis are visited in a depth-search sequence with respect to the spanning tree corresponding to the basis. This is advantageous, for example, for the computation of basis representations in the Transportation Simplex (see, e.g., Ahuja et al., 1993). The resulting complexity is O(m²).

Greedy heuristics
As mentioned above, the traditional Matrix Minimum Rule (and related methods) can be substantially sped up by applying a greedy methodology. We give four methods of this type in this section and further methods using the greedy methodology in Section 3.4.
Matrix Minimum Rule (MMR) This heuristic implements a greedy approach with respect to the cost coefficients. In the beginning, one sets B := ∅ and all x_ij := 0 and computes a sorting of C, that is, a mapping ρ : {1, …, mn} → {1, …, m} × {1, …, n} with c_{ρ(1)} ≤ c_{ρ(2)} ≤ … ≤ c_{ρ(mn)}. Then, in each step k = 1, …, mn, one has (i, j) := ρ(k) and sets x_ij := min(a_i, b_j), a_i := a_i − x_ij and b_j := b_j − x_ij. In the case that x_ij > 0, (i, j) is added to B. In the non-degenerate case, the method computes a feasible basic solution with respect to a basis B for the transportation problem. In the degenerate case, a subroutine is run afterwards to complete the basis B. Empirical investigation showed that the time needed by this subroutine is negligible compared to the overall time of the heuristic; its worst-case complexity is O(m + n). Since the sorting of C can be done in O(mn · log m), the overall worst-case complexity of the method is O(mn · log m).

Modified Russel's Method (MRUM) This adaptation of Russel's Method proposed by Gottschlich & Schumacher (2014) implements the Matrix Minimum Rule on a modified version of C. At the beginning of the algorithm, each entry c_ij is set to c_ij − u_i − v_j, where u_i = max_{j∈J} c_ij and v_j = max_{i∈I} c_ij. The intention is to approximate the reduced costs in the Transportation Simplex by estimating dual multipliers u_i and v_j. In contrast to the original method proposed by Russel (1969), this modification is done once at the beginning of the algorithm for the full lists I and J instead of being updated with respect to I and J in every iteration. Thereby, the greedy approach is applicable and we obtain a complexity of O(mn · log m).
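The greedy realization of the Matrix Minimum Rule described above may be sketched as follows (our own illustrative code; the basis-completion subroutine for the degenerate case is omitted):

```python
def matrix_minimum(C, a, b):
    # Greedy Matrix Minimum Rule: sort all cells by cost once, then sweep the
    # sorted list, allocating as much as supply and demand still allow.
    a, b = a[:], b[:]
    m, n = len(a), len(b)
    x = {}
    order = sorted((C[i][j], i, j) for i in range(m) for j in range(n))
    for _, i, j in order:
        q = min(a[i], b[j])
        if q > 0:                      # only positive allocations enter x
            x[(i, j)] = q
            a[i] -= q
            b[j] -= q
    return x
```

Running (MRUM) amounts to calling the same routine on the modified cost matrix with entries c_ij − u_i − v_j.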
Large Amount Least Cost (LALC) In contrast to the two methods above, this heuristic, also proposed in the Master's thesis of Lee (see above), implements a greedy approach with respect to the remaining supplies and demands. That is, it iteratively chooses the maximum entry of the updated vectors a and b and selects the cell in the corresponding row or column with the lowest cost. Using suitable data structures (e.g. Fibonacci heaps), the maximum entry can be found in O(log m) in each step. Thereby, one obtains an overall complexity of O(m log m).
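An illustrative sketch of (LALC), under our reading of the rule. For simplicity it finds the maximum remaining amount by a plain linear scan; a priority queue would recover the O(log m) max-search per step mentioned above:

```python
def large_amount_least_cost(C, a, b):
    # LALC sketch: repeatedly take the largest remaining supply or demand and
    # serve it via the cheapest cell in its row or column, respectively.
    a, b = a[:], b[:]
    m, n = len(a), len(b)
    x = {}
    while any(ai > 0 for ai in a):     # balance: demands vanish simultaneously
        i_max = max(range(m), key=lambda i: a[i])
        j_max = max(range(n), key=lambda j: b[j])
        if a[i_max] >= b[j_max]:
            i = i_max                  # largest amount is a supply:
            j = min((j for j in range(n) if b[j] > 0),
                    key=lambda j: C[i][j])     # cheapest open cell in row i
        else:
            j = j_max                  # largest amount is a demand:
            i = min((i for i in range(m) if a[i] > 0),
                    key=lambda i: C[i][j])     # cheapest open cell in column j
        q = min(a[i], b[j])
        x[(i, j)] = x.get((i, j), 0) + q
        a[i] -= q
        b[j] -= q
    return x
```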

Dual heuristics
In the following, we present a list of simple efficient dual heuristics. Here, we only consider methods producing spanning solutions (see, e.g., Glover & Klingman, 1972a) for (2.2). A feasible dual solution (u, v) is called spanning if for every i there exists a j such that c^uv_ij = 0 and for every j there exists an i such that c^uv_ij = 0. This obviously constitutes a necessary condition for (u, v) to be optimal.
Row First Method (RFM) In this rule, the dual variables are determined by first choosing u_i := min_{j=1,…,n} c_ij and subsequently setting v_j := min_{i=1,…,m} (c_ij − u_i). This can be done with complexity O(mn).
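In code, (RFM) is essentially a one-liner per vector. The sketch below (our own function name) also makes it easy to see that the resulting (u, v) is dual feasible and spanning by construction:

```python
def row_first(C):
    # Row First Method: u_i is the row minimum; v_j is then chosen maximal
    # subject to dual feasibility, i.e. v_j = min_i (c_ij - u_i).
    m, n = len(C), len(C[0])
    u = [min(row) for row in C]
    v = [min(C[i][j] - u[i] for i in range(m)) for j in range(n)]
    return u, v
```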
Column First Method (CFM) This method employs the same methodology as (RFM), with the roles of u (rows) and v (columns) exchanged. This results in the same complexity O(mn).
Dual Greedy Method (DGM) Here, u and v are initially set to zero and the objective function coefficients (a, b) are sorted in descending order. The variables u_i and v_j are then considered in the order of the corresponding entries a_i and b_j in the sorted objective function. For any variable u_i or v_j, we then set u_i := min_{j=1,…,n} (c_ij − v_j) or v_j := min_{i=1,…,m} (c_ij − u_i), respectively.

Maximal Gain Method (MGM) This method provides a variation of the dual greedy approach. The same steps as in (DGM) are executed, but with respect to an altered objective function, where a_i is set to a_i · min_{j=1,…,n} c_ij (and b_j analogously).

Pull Push Method (PPM) One major weakness of the rules above is that negative values in a dual solution cannot be realized. This is addressed here by extending the dual greedy approach presented above. Thus, u and v are initially set to zero and the objective function coefficients (a, b) are sorted in descending order. Then, we make negative assignments for the last ⌊(m + n)/2⌋ elements. Suppose therefore that u_i corresponds to the k-th element of the sorted objective function and k ≥ m + n − ⌊(m + n)/2⌋. We then set u_i := −M · k/(m + n), where M := min_{i,j} c_ij. Subsequently, the same assignments as in (DGM) are made, which leads to an overall complexity of O(m²).
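A sketch of (DGM) under our reading of the rule; the tie-breaking order among equal entries of (a, b) is our own choice. Each variable is set to the largest value compatible with the constraints of (2.2), given the variables fixed so far:

```python
def dual_greedy(C, a, b):
    # DGM sketch: visit the dual variables in descending order of their
    # objective coefficients; set each one maximal subject to
    # c_ij - u_i - v_j >= 0, treating not-yet-visited variables as zero.
    m, n = len(a), len(b)
    u, v = [0.0] * m, [0.0] * n
    items = sorted([(a[i], 'u', i) for i in range(m)] +
                   [(b[j], 'v', j) for j in range(n)], reverse=True)
    for _, kind, k in items:
        if kind == 'u':
            u[k] = min(C[k][j] - v[j] for j in range(n))
        else:
            v[k] = min(C[i][k] - u[i] for i in range(m))
    return u, v
```

Since every later assignment respects all earlier ones, the returned (u, v) is dual feasible, and by weak duality its objective value is a lower bound on the optimal cost.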

Overview of worst-case complexities
Recapitulating, Table 1 gives an overview of the described methods and their worst-case complexities.

Table 1: Overview of the worst-case complexities of all presented heuristics.

Constructing primal solutions from dual solutions
In this section, let us provide a simple scheme to construct a feasible primal solution x from a given feasible dual solution (u, v). Various ways to construct primal solutions from dual solutions can be found in the literature (see for instance Russel's Method, primal-dual methods or the Simplex Method); however, in order to guarantee low computation time (and thereby justify the use of such a heuristic before running an exact solver), we omit any strategy requiring more involved subprocedures like maximum flow or transportation algorithms. Instead, our approach is based on the greedy version of the Matrix Minimum Rule.
Our goal is to construct, for a given feasible dual solution (u, v), a feasible primal solution x such that the complementary slackness conditions (2.3) are approximately satisfied. For this reason, we apply the greedy version of the Matrix Minimum Rule to a dual feasible spanning solution (u, v) provided by one of the dual heuristics. This strategy is justified as follows: assume a feasible spanning solution (u, v) and the reduced cost matrix C^uv := (c^uv_ij)_{i,j=1}^{m,n}. Clearly, we have C^uv ≥ 0 and c^uv_ij = 0 for at least m entries of C^uv. Thus, any minimum cost allocation method applied to C^uv will focus on these entries of C^uv, and on small entries c^uv_ij in general, and thereby approximate the complementary slackness conditions (2.3) by keeping ∑_{i=1}^{m} ∑_{j=1}^{n} x_ij c^uv_ij small. The complexities of these methods are the combined complexities of the applied dual heuristic and the Matrix Minimum Rule.
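The scheme of this section can be sketched in a few lines (our own illustrative code): take any feasible (u, v), form the reduced costs, and run the greedy Matrix Minimum Rule on them, so that cells where complementary slackness can hold with equality are filled first. Combined with the weak-duality bound, this gives a primal solution together with a quality certificate:

```python
def primal_from_dual(C, a, b, u, v):
    # Greedy Matrix Minimum Rule on the reduced costs c_ij - u_i - v_j:
    # zero-reduced-cost cells are allocated first, approximating (2.3).
    m, n = len(a), len(b)
    red = [[C[i][j] - u[i] - v[j] for j in range(n)] for i in range(m)]
    a, b = a[:], b[:]
    x = {}
    for _, i, j in sorted((red[i][j], i, j)
                          for i in range(m) for j in range(n)):
        q = min(a[i], b[j])
        if q > 0:
            x[(i, j)] = q
            a[i] -= q
            b[j] -= q
    return x   # positive allocations of a feasible primal solution
```

The dual objective of (u, v) then bounds the optimum from below, and the cost of the returned x bounds it from above.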
Note that one could argue to include (the modified) Russel's Method in this class of methods, but, as argued above, we prefer to focus at this point on methods that use a feasible dual solution to produce a feasible primal solution.

Constructing dual solutions from primal solutions
Complementing the previous section, we also present a simple scheme to produce feasible dual solutions from feasible primal solutions. Again, we aim to rapidly approximate the complementary slackness conditions (2.3). That is, given a feasible primal solution x, we want to minimize c^uv_ij whenever x_ij > 0. To this end, we employ a strategy used in the Network Simplex Method: one makes use of the tree representation (depth-search order) of basic solutions and sets c^uv_ij = 0 whenever x_ij > 0 (see Ahuja et al., 1993). Apart from the optimal case, this leads to an infeasible dual solution. Therefore, in a second step, we employ the Row First Method on the cost coefficients c^uv_ij and alter the dual variables obtained in the first step to ensure a feasible spanning solution. As one needs a depth-search sequence of the primal variables to efficiently carry out the first step, we use the Tree Minimum Rule to avoid additional computations. As for the dual method, we resort to the Row First Method because of its low computation time, since other (more sophisticated) methods did not lead to significantly better solutions in our numerical experiments.
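The first step of this scheme, computing duals with c^uv_ij = 0 on all basic cells, amounts to solving u_i + v_j = c_ij on the spanning tree formed by the basis. A breadth-first traversal (used here in place of the depth-search order, purely for illustration; the function name is ours) does this in linear time, assuming the basis is connected:

```python
from collections import deque

def duals_from_basis(C, basis, m, n):
    # Fix u_0 = 0 and propagate u_i + v_j = c_ij along the basis cells,
    # which form a spanning tree of the bipartite row/column graph.
    rows = {i: [] for i in range(m)}
    cols = {j: [] for j in range(n)}
    for (i, j) in basis:
        rows[i].append(j)
        cols[j].append(i)
    u, v = [None] * m, [None] * n
    u[0] = 0.0
    queue = deque([('u', 0)])
    while queue:
        kind, k = queue.popleft()
        if kind == 'u':
            for j in rows[k]:
                if v[j] is None:
                    v[j] = C[k][j] - u[k]
                    queue.append(('v', j))
        else:
            for i in cols[k]:
                if u[i] is None:
                    u[i] = C[i][k] - v[k]
                    queue.append(('u', i))
    return u, v
```

The result may contain negative entries and may violate dual feasibility on non-basic cells, which is exactly what the second (Row First) repair step addresses.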

Exact solvers
In this section we briefly mention the two exact solvers we used as a reference point to evaluate the computation time and solution quality of the heuristics.

Transportation Simplex
In our implementation of the Transportation Simplex, we follow the numerical evaluations of Kovács (2015) for the Network Simplex. The transportation problem constitutes a minimum cost flow problem on a complete bipartite graph. Thus, in contrast to general network flow problems, there is no need to store any information on which nodes are connected by arcs; more precisely, there is no need to store any information other than the parameters C, a and b. When specializing the Network Simplex to the transportation problem, the only concession to be made is in finding the entering variable in the simplex step. All other operations and data structures remain the same as in the Network Simplex, especially all operations performed on the basis. Thus, we follow the recommendation of Kovács (2015) to "substantially improve the performance" of the algorithm and use the XTI (eXtended Threaded Index) labeling procedure proposed by Glover et al. (1979) over the commonly used ATI (Augmented Threaded Index) procedure for basis representation (see Ahuja et al., 1993). We used the Tree Minimum Rule to produce initial solutions for the Transportation Simplex, since it allows for an efficient initialization of the XTI representation.

Network Simplex
Furthermore, we include in our study the Network Simplex of IBM's ILOG CPLEX 12.6.2 for MATLAB, since our numerical investigations showed that it clearly outperforms the other LP solvers provided by ILOG CPLEX or GUROBI on the transportation problem. These findings are compatible with traditional and also more recent studies (see, e.g., Glover et al., 1974; Schrieber et al., 2017). Moreover, Kovács (2015) showed that the Network Simplex is competitive with the fastest combinatorial algorithms on a variety of general minimum cost flow problems.

Performance of exact solvers
The performance of both solvers is comparable across all problem classes, with small advantages for CPLEX on asymmetric problems and on the DOTMARK instances.For more details let us refer to Figures 2 and 4, as well as Table 3.

Further implementation details
Although all methods were implemented to the best of our knowledge, for the Large Amount Least Cost Method we only achieve a complexity of O(m² log m) instead of O(m log m). This efficiency loss is due to MATLAB not allowing for a properly efficient implementation of priority queues such as Fibonacci heaps.

Numerical experiments

Problem classes
In the following, we present three different problem classes for the transportation problem. The first class constitutes the DOTMARK benchmark considered in Gottschlich & Schumacher (2014); the other two problem classes represent randomly generated problem instances. For problem class 2, we follow common strategies to produce problem instances by drawing the parameters (C, a, b) from suitable uniform distributions. Finally, for problem class 3, we employ a problem generator which randomly generates optimal solutions and corresponding problem instances for given problem dimensions (m, n). Observe that in this study we restrict ourselves to dense problems, that is, the cost matrix C is not sparse in the general case. Moreover, to ensure comparability of these classes, we will force ∑_i a_i = ∑_j b_j = 1 and 0 ≤ C ≤ 1 for all problem instances.
In the case of randomly generated problem instances, the parameters (C, a, b) are usually drawn from the uniform distribution (see, e.g., Srinivasan & Thompson, 1973; Glover et al., 1974; Ahrens & Finke, 1980; Sandi, 1986). Moreover, reports on the use of different probability distributions, cf. Ross et al. (1975), suggest that the uniform distribution produces the hardest instances. Accordingly, all our cost coefficients c_ij are generated by sampling from a uniform distribution. When sampling a and b, one additionally has to ensure that ∑_i a_i = ∑_j b_j = 1. Thus, in order to produce statistically sound values for a and b, we uniformly generate points on the standard (m − 1)-simplex and the standard (n − 1)-simplex by sampling from a Dirichlet distribution.
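Uniform sampling on the simplex is easily realized without a dedicated Dirichlet routine: normalizing independent standard exponential (Gamma(1, 1)) draws yields a Dirichlet(1, …, 1), i.e. uniform, point on the simplex. A sketch using only the Python standard library (the function name is ours):

```python
import random

def random_marginals(k, seed=None):
    # Uniform sample from the standard (k-1)-simplex: normalize k iid
    # Gamma(1, 1) draws, which gives a Dirichlet(1, ..., 1) sample.
    rng = random.Random(seed)
    g = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    s = sum(g)
    return [gi / s for gi in g]
```

Generating a with k = m and b with k = n then automatically satisfies the balance condition ∑_i a_i = ∑_j b_j = 1.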
DOTMARK⁴ This benchmark was made publicly available by Schrieber et al. (2017) for the evaluation of discrete transportation algorithms computing the Wasserstein distance between images. It provides 10 different classes of black-and-white images, where each class contains 10 different images in resolutions from 32×32 to 512×512. Each image is defined by the gray values of its pixels, ranging from zero (white) to 255 (black). For any two images P1 and P2, a transportation problem can be generated by traversing the pixels, e.g. in row-wise order, and setting a_i equal to the gray value of the i-th visited pixel in P1 and b_j equal to the gray value of the j-th visited pixel in P2. The costs c_ij are then chosen as the squared Euclidean distance between the pixel positions of pixels i and j. In order to compare our study to Schrieber et al. (2017), we generated transportation problems for all pairs of two different images of a given class at the fixed resolution 32×32. This amounts to (10 choose 2) = 45 problem instances per class. For a detailed description of these problem instances, let us refer to Schrieber et al. (2017). Since (C, a, b) are, by construction, non-negative integer valued, we finally normalize for the sake of comparability.
UNIFORM As mentioned above, we follow the common approach (see, e.g., Srinivasan & Thompson, 1973; Glover et al., 1974; Ahrens & Finke, 1980; Sandi, 1986) for randomly generating general transportation problems. Thus, we draw the cost coefficients c_ij from a uniform distribution on [0, 1] and afterwards divide by ||C||_∞ to normalize the problem⁵. In contrast to former studies, we prefer to sample a and b by means of a Dirichlet distribution to ensure uniformly distributed a_i and b_j.

SOLGEN As mentioned above, uniformly distributed costs provide harder problem instances; therefore, we include a third problem class consisting of easier-to-solve problems. This problem class comes with the advantage that problem instances are produced together with a corresponding optimal solution. To achieve this, we apply concepts similar to those presented in Jeffrey & James (1994) for capacitated minimum cost flow problems. We improve their approach for the generation of uncapacitated transportation problems by ensuring a uniformly distributed topological structure of the corresponding optimal solution. In short, the approach consists of two steps: for given problem dimensions m and n, one begins by randomly constructing basic solutions x ∈ R^{mn}, u ∈ R^m and v ∈ R^n together with the corresponding basis B. In a second step, one obtains parameters C, a and b such that x and (u, v) are a primal-dual optimal solution pair satisfying the complementary slackness conditions (2.3). For a more detailed description of this problem class, see Appendix A. In terms of difficulty, Jeffrey & James (1994) indicate that the problems generated by their algorithm "are of comparable difficulty" to problems generated by NETGEN, a commonly used minimum cost flow problem generator.
Overview To conclude this section, Table 2 lists the expected value and the variation of the cost coefficients for the three problem classes.

Results
Let us start with problem classes SOLGEN and UNIFORM on symmetric problems with equally sized vectors a and b, i.e. m = n.
• In terms of computation time, all presented heuristics are at least 10 times faster than the exact solvers, as can be seen in Figure 2. This justifies the use of each of these heuristics over exact solvers if one is mainly interested in an approximate solution.
• In Figure 1 we illustrate approximation factors for all heuristics, obtained by dividing the function value of the respective solutions by the optimal function value.
-As depicted in Figures 1 and 2, the heuristics work quite well for the SOLGEN instances where the primal-dual gap can be reduced to less than 7% of the optimal value in 10% of the time an exact solver needs to find the optimal solution.
-In particular, we find that the primal heuristics using dual information (including Russel's Method) produce very good upper bounds (primal gap of 5%), and the more sophisticated dual methods involving sorting schemes produce lower bounds within 2% of the optimal value.
-These gaps widen to a primal gap of 50% and a dual gap of 25% in the case of the UNIFORM instances, at 14% of the time needed by an exact solver. This is due to the fact that the more time-consuming Large Amount Least Cost Method is needed to compute the upper bound.
-The Pull Push Method gives the best lower bound.
To analyze the performance on asymmetric problems we keep n = 50 fixed and only increase m.
• It can be observed in Figures 3 and 4 that, by and large, problems are better approximated by the heuristics and solved more easily by the exact solvers as the ratio m/n grows.
Figure 1: Approximation factors for the problem classes SOLGEN and UNIFORM on symmetric problems, averaged over 100 problem instances. The approximation factor is given by val/opt, where val is the function value of the heuristic solution and opt is the optimal function value for the problem. The position of each heuristic in the legend corresponds to the position of its graph in the figure. Note that we omitted certain heuristics such as NWCR and MRR in some figures when their approximation factors were far worse than the rest; see the legends for details. It can be observed that the approximation factors are significantly better on the SOLGEN instances than on the UNIFORM instances.
• Otherwise, results generally resemble the symmetric case, with the exception that we observe better approximation quality of the Modified Column Minimum Rule and the Column First Method compared to their row equivalents on the SOLGEN instances.
• Finally, the approximation factor of the Pull Push Method drops for asymmetric problems, whereas the other dual heuristics better approximate the optimal value.
In summary, in accordance with the findings of Ross et al. (1975), we conclude that for randomly generated problems a smaller standard deviation in the cost coefficients results in problems that are solved faster by exact solvers. Further, we observe that these problems are also better approximated by the heuristics. This matches the intuitive notion that, due to less variation in the cost coefficients, the number of different optimal (and close-to-optimal) solutions increases. In turn, the chance to heuristically produce good solutions increases, which also reduces the time an exact solver needs to find one of the optimal solutions.
The DOTMARK instances demonstrate that good performance on randomly generated problems does not generally transfer to specifically structured problems. For this problem class, the classic heuristics fail to produce relevant approximation factors, and one should therefore resort to specifically tailored methods (see, e.g., Schrieber et al., 2017). We highlight this in Table 3, where we present results for a selection of heuristics, including the best and worst performing ones. As for the dual methods, the poor performance is easily explained: by construction, the cost matrix of the DOTMARK problems is zero on the diagonal, which contradicts the concept of iteratively increasing the dual variables to compute spanning solutions consisting of non-negative dual variables only. This shortcoming was addressed in the Pull Push Method, where negative dual variables are possible. Although the method did not actually provide better results on the DOTMARK instances, it performed surprisingly well on other problem classes. Finally, as a negative result, the idea to use primal information to produce good dual solutions, as introduced in Section 3.5, fails to yield satisfactory performance.

Conclusion
In this paper we have analyzed the effectiveness of heuristics for the transportation problem. For this purpose we have considered a selection of the most efficient (in terms of computation time) commonly used primal heuristics, together with some novel suggestions for similarly time-efficient dual heuristics. For the specific case of Minimum Rules, it was observed, in accordance with Gottschlich & Schumacher (2014), that these can be significantly sped up by sorting the cost coefficients in advance. It was demonstrated that all primal and dual heuristics can be implemented to run in at most 10% of the time it takes an exact solver to find an optimal solution. As the main advantage of simultaneously running primal and dual heuristics, a certificate of the quality of both heuristic solutions can be obtained by weak duality.
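The weak-duality certificate can be stated in a few lines. This is an illustrative sketch with a hypothetical helper name, not code from the study: for the minimization problem (2.1), any dual-feasible value is a lower bound and any primal-feasible value an upper bound, so the optimum lies between them.

```python
def gap_certificate(primal_value, dual_value):
    """Relative optimality gap from weak duality (illustrative sketch):
    the true optimum lies in [dual_value, primal_value], so the returned
    value bounds the suboptimality of BOTH heuristic solutions."""
    if dual_value > primal_value:
        raise ValueError("weak duality violated: check feasibility")
    return (primal_value - dual_value) / max(abs(dual_value), 1e-15)
```

For instance, a primal heuristic value of 1.05 against a dual lower bound of 1.0 certifies that both solutions are within 5% of the optimum, without solving the problem exactly.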
In summary, heuristics fail to deliver satisfactory results on most problem classes. Still, their low computation time might justify the use of these heuristics before looking for optimal solutions, especially since the primal heuristic can be used as a starting point for an LP solver anyway. Only in the case of specific randomly generated problems does the use of primal and dual heuristics deliver solutions with function values within a few percent of the optimal value. Our results highlight once more the importance of testing transportation heuristics on problem instances originating from the specific application they are intended for.
Furthermore, we investigated the use of dual information in primal heuristics and vice versa. Here, it was observed that the latter concept failed to deliver satisfactory results, whereas the former delivered acceptable primal solutions, at least for the randomly generated problem instances.
In summary, it is very hard to suggest a one-heuristic-fits-all method. Based on our study, it seems reasonable to use one of the primal heuristics relying on dual information, together with the Large Amount Least Cost Method, to produce upper bounds. This should be complemented by one of the dual heuristics using sorting schemes for corresponding lower bounds, before exact solvers are invoked.
For our numerical experiments, we also improved upon the work of Jeffrey & James (1994) and suggest a more consistent problem generator called SOLGEN. This generator can produce problem instances for the transportation problem together with the corresponding optimal solution. We ensure that the topological structure of the solution is indeed uniformly distributed by employing random walks on complete bipartite graphs. Further, uniform distributions are obtained by sampling from Dirichlet distributions instead of inconsistently scaling uniform random variates.

Then, by construction, x and (u, v) are a primal-dual optimal pair for the generated problem, since they are primal and dual feasible and satisfy the complementary slackness conditions (2.3). Since the entries of the cost matrix are generated as the sum of three uniform variates (i.e. an Irwin-Hall distribution), their standard deviation is lower than for the UNIFORM problems, which generally results in easier-to-solve problem instances (see Section 4). For the sake of comparability we employ a final step where we shift the cost matrix C, i.e. we set c_ij := c_ij − min_ij c_ij in case negative values c_ij < 0 exist, and finally scale the cost matrix by dividing by max_ij c_ij.
Primal and dual degeneracy in SOLGEN One can easily adjust the SOLGEN generator so that the user can specify the primal and dual degeneracy of the optimal solution. Primal degeneracy can be achieved by setting only a portion of the basic variables to non-zero values in (A.4), whereas dual degeneracy can be achieved by setting c_ij := u_i + v_j for a specified number of non-basic dual constraints.

Hardware and software choice
All tests were performed on a standard personal computer (processor: Intel Core i5-4090, 3.30 GHz; RAM: 16 GB). The implementation of all methods was carried out …

Setting c := (c_11, c_12, ..., c_1n, c_21, ..., c_mn) and x := (x_11, x_12, ..., x_1n, x_21, ..., x_mn), we can write (2.1) in the form

    min_{x ∈ R^mn} c^T x   s.t.   Ax = e,  x ≥ 0

for e := (a, b) and the corresponding A. It is well known that in the case of the transportation problem we have rank(A) = m + n − 1.
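To make the matrix form concrete, the following sketch builds A for given m and n and checks its rank. The helper names are our own, and the fraction-based elimination is used only so the rank is exact for illustration purposes.

```python
from fractions import Fraction

def transport_matrix(m, n):
    """Constraint matrix A of the transportation LP in the variable
    ordering x = (x_11, ..., x_1n, x_21, ..., x_mn): the first m rows
    encode the supply constraints, the last n rows the demand constraints."""
    A = [[0] * (m * n) for _ in range(m + n)]
    for i in range(m):
        for j in range(n):
            A[i][i * n + j] = 1       # row sum:  sum_j x_ij = a_i
            A[m + j][i * n + j] = 1   # column sum:  sum_i x_ij = b_j
    return A

def rank(A):
    """Exact rank via Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in A]
    r = 0
    for c in range(len(M[0])):
        piv = next((k for k in range(r, len(M)) if M[k][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for k in range(len(M)):
            if k != r and M[k][c] != 0:
                f = M[k][c] / M[r][c]
                M[k] = [M[k][t] - f * M[r][t] for t in range(len(M[0]))]
        r += 1
        if r == len(M):
            break
    return r
```

The rank deficiency of exactly one reflects the fact that the supply constraints already imply the sum of the demand constraints in a balanced problem.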
… min_{j=1,...,n} c_ij, and b_j is set to b_j · min_{i=1,...,m} c_ij. Thus, the dual variables are considered in a different order than in (DGM). The complexity is again O(m²).
from dual heuristics on UNIFORM for different dual methods (see Section 3.4).

Figure 2 :
Figure 2: Average computation time for the problem classes SOLGEN and UNIFORM on symmetric problems, averaged over 100 problem instances. The position of each heuristic in the legend corresponds to the position of its graph in the figure. Observe that all heuristics are at least ten times faster than the exact solvers. LALC* denotes the suboptimal implementation of LALC. An optimal implementation should realize a computation time close to MMR.

Figure 3 :
Figure 3: Approximation factors for the problem classes SOLGEN and UNIFORM on asymmetric problems. For fixed n = 50 and increasing m ≥ 50, the evaluation points were averaged over 100 problem instances. The approximation factor is given by val/opt, where val is the function value of the heuristic solution and opt is the optimal function value for the problem. The position of each heuristic in the legend corresponds to the position of its graph in the figure. Note that we omitted certain heuristics such as NWCR and MRR in some figures when their approximation factors were far worse than the rest; see the legends for details. It can be observed again that the approximation factors are significantly better on the SOLGEN instances than on the UNIFORM instances.

Figure 4 :
Figure 4: Average computation time for the problem classes SOLGEN and UNIFORM on asymmetric problems. For fixed n = 50 and increasing m ≥ 50, the evaluation points were averaged over 100 problem instances. The position of each heuristic in the legend corresponds to the position of its graph in the figure. Observe that all heuristics are at least ten times faster than the exact solvers. LALC* denotes the suboptimal implementation of LALC. An optimal implementation should realize a computation time close to MMR.

    u_i := U([−0.5, 0.5])   for all i = 1, ..., m                    (A.1)
    v_j := U([−0.5, 0.5])   for all j = 1, ..., n                    (A.2)
    c_ij := u_i + v_j                 for (i, j) ∈ B,
            u_i + v_j + U([0, 1])     otherwise                      (A.3)
    x_ij := p_k   for the k-th entry (i_k, j_k) in B                 (A.4)

where U([a, b]) denotes the uniform distribution on the interval [a, b]. Furthermore, as for the UNIFORM instances, we ensure ∑_i a_i = ∑_j b_j = 1 by uniformly generating a vector p = (p_1, ..., p_{m+n−1}) on the unit simplex by means of a Dirichlet distribution.

Table 2 :
Analytical and empirical distribution parameters for the costs of the three problem classes. In the case of the DOTMARK instances we give the interval of the values over all classes. In the case of the UNIFORM and SOLGEN instances we give the values for m = n = 2000.

Table 3 :
Computation time and approximation quality for a selection of heuristics on the DOTMARK instances.The Modified Russel's Method delivered the best overall approximation factor.The primal North-West Corner Rule and the dual Row First Method performed worst in terms of approximation quality.