Leaf Extraction and Analysis Framework Graphical User Interface: Segmenting and Analyzing the Structure of Leaf Veins and Areoles 1

.

Interest in the geometry and topology of complex networks has grown immensely during the last few decades (Albert and Barabasi, 2002;Newman et al., 2006). Studies of the structure of river networks (Rodriguez-Iturbe and Rinaldo, 1997;Dodds and Rothman, 2000), the internet (Albert and Barabasi, 2002), and social networks (Watts, 1999) have been driven by large amounts of empirical information, often with concomitant theoretical development to explain network structure. More relevant to the study of leaf networks is a resurgence of interest in the structure of physical networks in biology and, in particular, resource delivery networks like cardiovascular networks, xylem networks, or leaf venation networks as a whole (LaBarbera, 1990;Sperry et al., 2003;Sack and Holbrook, 2006;Scarpella et al., 2006;Donner and Scarpella, 2009). However, because of the inherent difficulty in measuring physical biological networks, the growth in theory has arguably outpaced the available data needed to test theoretical predictions or assumptions (West et al., 1997(West et al., , 1999Bejan, 2000;Couder et al., 2002;Dodds, 2010).
Given the importance of physical networks in regulating the flow of biological fluid, one might think that libraries of data should exist with detailed measurements on their geometry from which one could evaluate theoretical predictions: this is not the case. While some data exist quantifying the dimensions of mammalian networks in their entirety (e.g. Zamir, 1996), less work has been done on plants (LaBarbera, 1990;McCulloh et al., 2003), and those descriptions that do exist are usually of a part of the network, not the whole. For example, there is a long history of measurements of components of the aboveground structure of xylem networks. Extensive measurements have been made of vein length distributions (Tyree and Zimmerman, 1983), width and scaling of xylem (Anfodillo et al., 2006;Weitz et al., 2006;Coomes et al., 2007;Mencuccini and Holtta, 2007), and even relative hydraulic resistance across distinct components of trees (Tyree and Sperry, 1989;Turcotte et al., 1998;McCulloh et al., 2003). In addition, there is a growing interest in describing detailed root network structure, largely applied to Arabidopsis (Arabidopsis thaliana) as well as to crop plants (French et al., 2009;Iyer-Pascuzzi et al., 2010;Le Bot et al., 2010).
Most attempts to quantify the hydraulic structure of physical networks in leaves have focused on quanti-fication of the geometry of part of the leaf vascular network. By network we mean the hierarchical vessel bundles that pervade leaves and contain xylem, phloem, and structural elements. Those empirical measurements thus far include quantification of: the distribution of branching angles in several leaf species (Bohn et al., 2002), the lengths and diameters of vessel bundles in the side lobe of a leaf (Turcotte et al., 1998), the length per unit area of the higher-order veins (Sack and Frole, 2006;Brodribb et al., 2007;Boyce et al., 2009;, or the length and width of the primary and sometimes secondary venation (Niinemets et al., 2007a(Niinemets et al., , 2007b. We are aware of only one attempt to quantify the linear dimensions of an entire leaf network that relied on an admirable but painstaking point-and-click approach in relatively small Arabidopsis leaves (Rolland-Lagan et al., 2009), similar to that found in Figure 1, and leaves commonly used in developmental studies. Quantifying the geometry of entire leaf networks has remained elusive in part because of the sheer number of measurements required and because of the difficulties in automated image segmentation.
Quantifying the geometry of leaf networks has significant implications for many areas of plant biology. (1) The structure of leaf networks has been implicated in the leaf economics spectrum, with several authors speculating that increased hydrodynamic and structural demands may lead to increased investment in vein networks as leaves become larger (Niinemets et al., 2007a(Niinemets et al., , 2007bNiklas et al., 2007). (2) Changes in leaf venation structure have the potential to influence mass-or area-based leaf photosynthetic rates via a reduction in specific leaf area, one of the principle dimensions underlying the leaf economics spectrum (Reich et al., 1997;Wright et al., 2004). (3) Recent studies have demonstrated that leaves are the major hydraulic bottleneck in plants (Sack and Holbrook, 2006); thus, detailed measures of vein geometry will inform attempts to model patterns of hydraulic conductance (Cochard et al., 2004). (4) Network structure can limit photosynthesis due to its effects on hydraulic efficiency, with recent studies implicating leaf vein density, defined as total vein length over area, as a strong predictor of photosynthetic rates (Sack and Frole, 2006;Brodribb et al., 2007), which may also have played a role in the diversification of early angiosperms . (5) Leaf vein patterning has been shown to be correlated with leaf shape, suggesting shared developmental pathways (Dengler and Kang, 2001). (6) Leaf vein impressions are arguably the most abundant plant macrofossil available to paleobotanists; thus, the ability to more rigorously quantify vein geometry has the potential to aid attempts to identify fossil samples with greater phylogenetic resolution (Behrenmeyer et al., 1992).
To help to address these and other issues we introduce a Leaf Extraction and Analysis Framework Graphical User Interface (LEAF GUI), a stand-alone, platform-independent, executable program available free for academic use. The LEAF GUI was developed in MATLAB and is built upon a series of algorithms designed to threshold, clean, and segment images of leaves in which the vein structure is visible. Vein visibility may be high due to enhancement via several means including, x-ray methods, chemical clearing, biological clearing, or backlit scanning (Bates, 1931;Lersten, 1986;Wing, 1992). Depending on user interests, the results may include information about areole shape and size, vein network connectivity, as well as number, length, width, and location of network edges and nodes. From these data, a large amount of information can be generated to test hypotheses regarding: (1) the structure of leaf hydraulic networks, (2) phylogenetic relationships among species, and (3) the ecological and evolutionary function of leaf vein networks.

OVERVIEW OF THE SOFTWARE SYSTEM
The following can be downloaded freely at http:// www.leafgui.org: (1) LEAF GUI for Windows, Mac, Figure 1. Series of Arabidopsis images including, original image (A), results of thresholding and cleaning (B), colored labeled areoles (C), and the results of a distance transformation on the areoles (D). These images serve to demonstrate the utility of the software for analyzing images, such as those that might be found in developmental studies. Image in A graciously provided by E. Scarpella and originally published by Donner and Scarpella (2009). and Linux (compiled); (2) detailed installation instructions; (3) user manual, including description of caveats and limitations (Supplemental Document S1); and (4) demonstration videos that illustrate different examples and functionality of the software. Annotated code is also available upon request from the primary author.
LEAF GUI is an interactive software program built in MATLAB. The purpose of the software is to dramatically increase the speed and accuracy of the extraction and processing of vascular and areole structure from digital images of leaves. The program incorporates many image processing and analysis tools into a single graphical user interface. The software is modular in construction, including preprocessing, image cleanup, and leaf network extraction steps. The interactivity of the software allows the user to extract features from monocot or dicot leaves that have been prepared, stained, and imaged using one of many standard protocols (Bates, 1931;Lersten, 1986;Wing, 1992). Although the processing of images is user assisted, the extraction of features is fully automated.
The overall process that a user might take to process a leaf image and measure structure within the leaf network can be broken down into five major steps: (1) setting the scale of the leaf image, (2) initial image cropping, (3) image thresholding, (4) binary image cleaning and processing, and (5) extracting leaf network features. The tools invoked in this series of steps (described in detail below) can be seen in a screenshot of the LEAF GUI with brief descriptions on the various functions it employs (Supplemental Fig. S1). A pipeline describing the workflow, statistics modules, and different image visualization options can be seen in Figure 2. Our software utilizes a number of standard routines included in the MATLAB image processing library and denoted in the code. We will not attempt to describe algorithms created by MATLAB: Extensive descriptions for these calls can be found on the MATLAB Web site (www.mathworks.com) or in the MATLAB documentation. We will describe in detail here only those algorithms that are unique to the LEAF GUI software. To simplify our introduction here, we will assume that the user is attempting to process an red, green, blue (color) image of a cleared leaf that contains some unwanted noise. Videos with examples of all of these steps can be seen on www.leafgui.org.
Many leaf networks are reticulate in contrast to hierarchical structures such as tree branches or roots. To describe the reticulate structure of a leaf network, we use conventional terms and their definitions from graph theory. In this manuscript and in the software, we refer to the vessel bundles in a leaf as edges and the point where two or more edges intersect as a node. A single individual edge is defined as the vessel bundle segment occurring between nodes. Thus, for example, the primary or midvein of a leaf would be viewed as a series of connected edges, rather than a single vein. Assuming edges can be approximated as cylinders, the geometry of each individual edge can be described using only its length and diameter. From these two measures, the surface area and volume of a leaf vein network are easily approximated.

STEP 1: SETTING THE SCALE
The conversion of pixels to a unit of length is required for network measurements. There are two options to set the scale, conventionally in pixels/mm. If the scale is known, it can be entered in the text box in the Set Scale panel. Alternatively, if the image contains a scale bar of length L (e.g. in mm), the user directs the software to measure the number of pixels Np in the given scale bar using the Measure Scale tool. The scale in this case will be set to Np/L (pixel/mm). A detailed explanation on how to set the scale is given in the manual and illustrated in Demo Video 2.

STEP 2: CROPPING THE INITIAL IMAGE
Two options are provided to crop the initial image: rectangular or polygonal cropping. Either cropping method is useful when extraneous features such as scale bars, labels, other leaves, or image noise need to be removed from the image. The choice between the two methods depends primarily on the location of the noise in the image. The cropping functions are demonstrated in Demo Video 3.

STEP 3: IMAGE THRESHOLDING AND SEGMENTATION
A necessary precursor to estimate the structure of a leaf vein network is to separate veins from the background. The identification of veins (also called segmentation) is accomplished in two steps: first, thresholding an image so that foreground regions (pixel value 1) are distinguished from background regions (pixel value 0); second, the resulting binary image is cleaned and processed (see next step). In many image analysis programs, the goal is to separate out distinct disconnected entities (like cells) from each other (e.g. Cell Profiler; Carpenter et al., 2006). Here, the objective is to identify edges (a vessel bundle segment) that are connected to each other at nodes.
Two different thresholding methods (local and global) are included in LEAF GUI to be used separately or combined to convert a leaf vein image into a binary image, where leaf veins are represented by ones and nonvein regions by zeros. Global thresholding takes a grayscale copy of the original image, where pixel values range from 0 to 255, and sets pixel values above a certain threshold to 1. Pixels with an intensity value lower than the threshold are set to 0. For example, if a threshold value is 125, then all pixels with a value of 125 or greater will constitute foreground, whereas pixel with lower values will represent background.
Unfortunately, global thresholding may produce poor results for unevenly illuminated images. Adaptive thresholding corrects for uneven illumination by comparing each pixel intensity value, p i , with the mean intensity value, I i , computed over a local window of size w centered at the pixel. If p i is greater than I i 2 X, for some fixed margin X, then the pixel becomes a part of the foreground; otherwise, its value is set to 0. The use of both thresholding methods is described in Demo Video 4.

STEP 4: BINARY IMAGE CLEANING AND PROCESSING
Once the image has been thresholded, a series of steps might be employed to further clean and enhance vein representation in the binary image. The choice and sequence are specific to the user requirements. These include removing unwanted connected regions below a certain size cutoff (e.g. all disconnected foreground regions smaller than 10 pixels), removing the leaf perimeter in single pixel-wide steps, filling single pixel holes, removing extraneous spurs (single pixelwide extrusions), filling or removing user-specified polygonal regions, clearing regions overlapping the image border, or removing unwanted labeled (color coded) regions. At any point, the user can create a complement of the binary image (inverted image) and perform the same tasks on the part of the image that was previously background. The LEAF GUI also provides an option to create a mask through the use of a very high or low threshold value. In the resulting image, the leaf (everything within the leaf margin) is entirely white and the rest of the image is black. This step is useful in removing unwanted background noise following thresholding (see Demo Video 4).
At any point during this process, the user has the option to use one of several visualization options to inspect how well the image is being segmented. It is important for the user to evaluate image preprocessing and segmentation steps because they will affect computation of different leaf, vein network, and areole statistics. Some of the further computations that the user can make with the processed leaf image include the skeleton of the vein network and a distance transform function on either the areoles or veins (Figs. 1 and  3), labeling leaf veins or areoles (assigning a numerical identifier to each contiguous vein or areole region), both of which can indicate how well the network is connected and consequently how well areoles are delineated (see Demo Video 7).

STEP 5: SUMMARY STATISTICS
There are three primary options the user can select within the Summary Statistics Panel to return descriptive statistics from the leaf image. These are broken down into three buttons corresponding to the different types of statistics: Area Stats, Vein Stats, and Areole Stats. The output is either an Excel spreadsheet (which requires that Excel be installed) or a tab-delimited text file based on user preferences. Examples of all three output options are available as Excel spreadsheets in Supplemental Tables S1-S3 or at www.leafgui.org.

Area Stats
Area Stats returns the area, perimeter, and records the scale of the image. These measures are computed for the entire leaf (Supplemental Table S1).

Areole Stats
Areole Stats returns the area (Fig. 4), convex area (area of the convex hull that just encloses the region), solidity (the ratio of areole area to convex area), eccentricity (the ratio of the major and minor axis of the ellipse that just encloses the region), equivalent diameter (the diameter of a circle with the same area as the region), length of the major and minor axes of the ellipse that just encloses the region, centroid position (x and y coordinates of the region's center of density), mean distance to the nearest vein, and its variance for each areole in the leaf (Supplemental Table S2).

Vein Stats
Vein Stats returns two tables, the first (see sheet titled Vein_Stats in the Excel output option; Supplemental Table S3) containing a connectivity matrix, which is a N E 3 3 matrix (N E is the number of edges) showing which labeled nodes (columns 2 and 3) are connected by which labeled edges (column 1). The table also contains the length (Fig. 4), width, and spatial position (centroid) of every edge and node of the vein network. In addition the software returns the two-dimensional (2D) area occupied by each edge and estimates for the surface area and volume based on the assumption that each edge is approximately cylindrical. The second table (see sheet titled Summary_Stats in the Excel output option; Supplemental Table S3) includes the total number of nodes and edges, the total length of the network, the total 2D area occupied by the network, and the mean edge length, width, 2D area, three-dimensional surface area, and volume.

NETWORK MEASUREMENT ALGORITHM
All measurement algorithms are performed on binary images where veins are represented in white, and the background is black. The binary image is first skeletonized. The skeleton of the image is obtained by repeatedly thinning the vein network until it is a single pixel wide. During thinning, boundary pixels are removed iteratively from different sides, resulting in the skeleton that consists of the central pixels of the network.
Labeling of individual edges requires a more complicated routine. First, we utilize a standard MATLAB routine to identify the skeleton branchpoints, which we refer to as nodes. However, to label the individual edges, they must be disconnected from each other. To do this, we remove all pixels that have three and more neighbors. As is apparent in Supplemental Fig. S2C, all branching points and some adjacent pixels may have three and more neighbors. By removing these pixels, we disconnect all edges of the network and then label them individually. We then separate the nodes, edges, and tips into three distinct images and then assign a unique numerical label to each individual node, edge,  Table I. Increasing distance from the nearest vein is indicated by colors at the yellow/red end of the spectrum as in the scale bar at right (mm). Note that the areoles and distance to nearest vein are generally smaller in C. caudata and larger in K. africana as also indicated in Table I. and tip, respectively. Supplemental Figure S2D shows a part of such labeled image of the vein network skeleton; the skeleton consists of edges (labels 112, 103, 120, 125, and 126) and nodes (labels 67 and 71).
We then apply a distance transformation to the original binary image, which indicates the distance from each network pixel (white) to the nearest nonnetwork pixel (black; Supplemental Fig. S2A). The pixel-by-pixel product of this distance transformation image and the skeleton image results in an image that represents the skeleton of the entire network with the distance to the nearest areole (nonvein region) recorded in each pixel (Supplemental Fig. S2B).
To calculate the dimensions of each individual edge, we treat each edge as a series of connected cylinders. Each cylinder is a single pixel in length, the diameter of which is approximated by doubling the value of the distance transform from the previous step, i.e. twice the distance from the medial axis to the nearest nonvein pixel. The total length of the edge is then calculated by walking along its medial axis counting one unit for pixels that share a border, and square root of 2 for pixels that touch on their corners. The diameter of each edge is the mean of the diameters in each skeleton pixel it contains (Supplemental Fig. S2, A and D). The total edge surface area and edge volume are then estimated as + where SkE is the number of pixels in the edge, d i is the diameter of the i-th pixel of the edge, and l is the length of a single pixel.

NETWORK CONNECTIVITY ALGORITHM
To generate a connectivity matrix, the 3 3 3 neighborhood around each node is searched to see what edges it is connected to (Supplemental Fig. S2D). The connectivity matrix is then represented as an N 3 N matrix where N is the number of nodes, rows and column indices represent the list of nodes, and matrix entries are the labels of the edges that connect nodes. This matrix is very sparse (i.e. E ,, N 2 , where E is the number of edges); thus, we store it in the condensed form as an E 3 3 matrix showing which labeled edges (column 1) are connected by which labeled nodes (columns 2 and 3).

FURTHER COMPUTATIONS
In addition to leaf area and leaf perimeter, the output from the Vein Stats algorithms allows the calculation of cumulative statistics for the entire leaf such as the total number of nodes, total number of edges, total network length, and total network 2D area. Using the distribution of edge dimensions contained in the Vein Stats table, one can also calculate the mean, median, mode, maximum, and minimum for any and all edge measures. Moreover, one can look at the distributions of edge dimensions and compare them with those predicted by theory. Using the Vein Stats table, one can easily plot the data to evaluate theoretical predictions. Similarly, the output from the Areole Stats algorithms can be used to calculate statistics such as total areole area or the frequency distribution of areole sizes or shapes. An example of such cumulative statistics computed for three different leaves is shown in Figure 4. In addition, the combined output data can allow the user to calculate additional summary measures of leaf morphology such as the vein density (network length/leaf area), network volume/leaf area, network surface area/ leaf area, or total areole area/leaf area.

THREE-LEAF ANALYSES: NETWORK AND AREOLE DIMENSIONS
To demonstrate the functional capabilities of the LEAF GUI, we present data from the analysis of threeleaf images obtained from the National Cleared Leaf Collection, housed at the American Museum of Natural History (Scott Wing curator). The three species are: Kigelia africana (Apocynaceae), Castanopsis caudata (Fagaceae), and Rockinghamia angustifolia (Euphorbiaceae). The three leaves were chosen somewhat at random, the only criteria for selection being clarity of their venation structure (Fig. 3). All three leaf images were obtained via chemical clearing methods. The data we present are for heuristic purposes only; thus, we are not testing any hypothesis per se. Rather, we are demonstrating some types of measurements and analyses that can be performed using LEAF GUI. All Table I. Processing times and summary statistics for the three leaves depicted in Figure 3 Whole-leaf statistics are denoted in rows 4 to 15. Areole summary statistics are given in rows 15 to 23, and edge summary statistics are given in rows 26 to 30. Calculation times are reported for a server with a 1.86-GHz processor with 16 GB of RAM and for a laptop with a 998-MHz processor with 2 GB of RAM, respectively. These represent the time it took to return statistics after the leaf image had been thresholded and cleaned. Note that as would be expected, many statistics are correlated with each other; for example, mean areole area and mean edge length are largest in K. africana and smallest in C. caudata, with R. angustifolia intermediate. Figure 3 Family three leaves were thresholded with a global threshold level of 160, and a local threshold of 0.01 with a local window size of 10. Contiguous regions in the image smaller than 5 pixels were removed, isolated single black pixels surrounded by white pixels were filled, and the Remove Spurs option was utilized once. Table I shows the mean and cumulative measures for the whole-leaf, vein network, and areole statistics that were generated by the LEAF GUI software. The three leaves differ in number of edges and areoles, mean edge length, mean areole area, and a number of other associated measures. These differences may be due to the hydrodynamic demands of the habitats in which these species are found, phylogenetic conservation of vein morphology, or some other factor; it is beyond the scope of this article to analyze the basis for such differences. Not surprisingly, many vein and areole measures are correlated with one another. For example, K. africana has fewer areoles per unit area than the other two species, and its mean areole size and perimeter are larger. In addition, K. africana has fewer edges per unit area, but each individual edge is larger than the other two species and the total vein density is less.

DISCUSSION
The LEAF GUI software is designed for plant biologists and ecologists who wish to analyze the macroscopic structure of veins in leaves. The software allows users to extract descriptive statistics on the dimensions and positions of leaf veins and the areoles they surround by following a series of thresholding, cleaning, and segmentation algorithms given images of leaves where veins have been enhanced relative to the background. The data returned by the LEAF GUI software have the potential to address many longstanding questions in leaf biology and functional ecology. Moreover, several theories have been proffered regarding the structure of leaf vein networks. For example,  hypothesized that leaf networks might be approximated by fractal structures with underlying power law distributed vein radii and lengths. Alternatively, Banavar and colleagues (1999) suggested power law relationships between network volume and length determined by leaf dimension. These are two examples from a growing number of recent theoretical attempts in need of empirical validation (Banavar et al., 1999;Couder et al., 2002;Runions et al., 2005;Dimitrov and Zucker, 2006;Dodds, 2010;Katifori et al., 2010). However, as mentioned above, to date there has been very limited data with which to test these theories.
The data provided in Table I and Figure 4 serve to illustrate potential tests of the aforementioned theories, and a few of the many types of analyses one could perform using the software. Many of these questions will involve quantifying the investment in vein mass or vein surface area per unit leaf mass or leaf surface area, and placing various investment strategies in ecological or phylogenetic contexts. For example recent work has shown that leaf vein density is correlated with photosynthetic rate and may have played a role in early angiosperm diversification (Brodribb et al., 2007;. In addition, a methodology to quantify the structure of leaf vein networks has the potential to add rigor to the classification and description of leaves. For example, current classification of leaves is based on the idea that individual veins fall into discrete rank classes based on vein diameter (Leaf Architecture Working Group, 1999). However, leaf vein sizes vary continuously and many veins appear to change size as they progress through the leaf lamina. While we readily admit that the traditional approach has been very useful in distilling the dizzying array of venation morphologies into useful categories, the concept of vein rank is itself somewhat subjective. Our methods can begin to address this question by providing much more detail regarding the size, position, and connectivity of leaf vein networks, enabling network descriptions and classification based on their inherent continuous nature.
Image segmentation in computer vision is not a trivial issue, and an entire field has developed to address the problem (Haralick and Shapiro, 1985;Sezgin and Sankur, 2004). Thus the extent to which the measurements produced by the LEAF GUI represent actual leaf traits depends both on initial image quality, and the effectiveness of our image processing and feature-extracting algorithms. Further, because LEAF GUI is user assisted, it has the advantage of being able to extract useful information from images of leaf venation networks with variable clearing and imaging methods. Similar approaches have also been taken in the study of root networks (e.g. Le Bot et al., 2010). However because LEAF GUI is user assisted, it is currently not possible to apply the same set of cleaning and extraction steps to a set of images automatically. We hope to address questions of automation in future versions of the software.
In summary, the network structures generated in the use of LEAF GUI are imperfect representations of leaf networks, and future developments, both in the imaging of leaf networks, and in image segmentation, will certainly improve our ability to accurately measure the dimensions of leaf networks. These caveats notwithstanding, the collection of algorithms detailed herein provides a means with which to accurately and rapidly extract network and areole information from leaves imaged under a wide range of conditions. Moreover, because the software and the underlying code are freely available, other investigators have the option to modify methods as needed to answer specific questions, or improve upon current approaches.