eHistology image and annotation data from the Kaufman Atlas of Mouse Development

Abstract
“The Atlas of Mouse Development” by Kaufman is a classic paper atlas that is the de facto standard for the definition of mouse embryo anatomy in the context of standard histological images. We have redigitized the original haematoxylin and eosin–stained tissue sections used for the book at high resolution and transferred the hand-drawn annotations to digital form. We have augmented the annotations with standard ontological assignments (EMAPA anatomy) and made the data freely available via an online viewer (eHistology) and from the University of Edinburgh DataShare archive. The dataset captures and preserves the definitive anatomical knowledge of the original atlas, provides a core image set for deeper community annotation and teaching, and delivers a unique high-quality set of high-resolution histological images through mammalian development for manual and automated analysis.


Data Description
"The Atlas of Mouse Development" [1] is a book detailing the anatomy of mouse embryo development and stands as the definitive work in the field. The atlas is based on a lifetime of work by Kaufman, who established a unique set of histological sections of about 450 mouse embryos, many of which are full serial section series, from which he selected carefully staged samples for the histological images within the book. The combination of the histological section series and the printed book represents a unique resource and captures the current understanding of classical mouse anatomy. In taxonomic terms, these physical sections are the reference specimens for the definition of mouse embryo anatomy, and the digitized images with the associated annotations are a digital holotype for the definition of anatomical terms and the progression of mouse embryo development. In addition, the paper atlas has given rise to the Mouse Atlas programme in Edinburgh [2] and to the EMAPA mouse anatomy ontology [3,4]. The original index for the book was used to develop the primary list of anatomical terms in the ontology, and EMAPA is now recognized as the standard mouse embryo ontology used to annotate mouse embryo data, including embryo phenotype data [5,6].
In generating the eHistology Atlas, new images of the histological sections were acquired at high resolution, and the annotations have been transferred to a database. These images and annotations are now freely available from the eMouseAtlas web resource as eHistology (Fig. 1) and have been described by Graham et al. [7]. The new high-resolution images and the associated image coordinates for each annotation are freely available under a Creative Commons CC BY 4.0 licence. In addition, we have an agreement with Elsevier to present the images in a form similar to the original atlas plate layout (the web resource), and Elsevier may use the new images for its own online version. Here we describe the dataset of the 937 high-resolution histology images with anatomy annotations and how they have been made available for further study and analysis.
The motivation for the eHistology resource was to capture the anatomical knowledge in a permanently accessible, open, digital form, delivered with a viewer that provides a view of the underlying histology data not possible in the printed atlas. The high-resolution images provide a rich resource of carefully staged mouse histology, which could be used for deeper analysis of tissue development and as a teaching resource. See Fig. 1 for an illustration of the resolution now available for these images. Embryogenesis is a highly dynamic process, and in Fig. 2 we highlight some of the advantages of capturing images at cellular resolution, e.g., the ability to zoom in and morphologically identify mitotically dividing cells and apoptotic cells undergoing programmed cell death. This is simply not possible in the print version of the atlas and represents a significant contribution to the community.
All the data are available under a Creative Commons licence (CC BY 4.0). In the future, we envisage the annotations being extended on a tissue-by-tissue basis through community curation. The eHistology viewer is open source and is available from the Mouse Atlas technical GitHub repository (github.com/ma-tech) (eHistology; RRID:SCR_015887).
Providing secure and long-term accessibility for research data is a difficult problem. A recent study of the longevity of 375 biomedical resources/databases [8] available on the web in 1997 found that 62.3% had ceased to be available, 14.4% were static, and only 23.3% were available as an active resource. The authors concluded that survival depended primarily on institutional interest and that a strategy dependent on external funding will very likely fail. To ensure long-term preservation of the image data and supporting annotations, we have therefore registered this dataset with the University of Edinburgh DataShare [9] repository [10], with policies registered in OpenDOAR (Directory of Open Access Repositories) [11]. Specifically, the preservation policy includes indefinite preservation of the original data with format migration to ensure continued readability and accessibility. In addition, and for convenience, these data are also hosted in the GigaDB repository [12].

Histology
Details of the mouse strains used, histological sectioning, and staining are provided by Kaufman (1994) [1]. Briefly, the embryos were "isolated from spontaneously cycling (C57BL X CBA) F1 hybrid females that had been previously mated to genetically similar F1 hybrid males." The embryos were fixed, dehydrated, embedded in paraffin wax, and sectioned at 7-micron thickness. The mounted sections were then stained with haematoxylin and eosin.

Slide digitization
Digitization of the original histology slides was accomplished using the Olympus DotSlide slide scanner system. Using a × 20 objective lens, this generated full-colour images with a pixel resolution of 0.34 microns. Calibration was accomplished as part of the digitization process, allowing the inclusion of scale bars and the option to measure the distance between 2 points. In 2 instances (Plate 5 and Plate 14), the original sections could not be sourced and were presumed lost. In these instances, the original photographic negatives were used in place of the original slides to generate cellular-resolution grey-scale images.
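The calibration described above can be illustrated with a minimal sketch that converts pixel measurements to physical distances. It assumes square pixels and coordinates taken from the full-resolution image at the quoted 0.34 microns per pixel; the function names are hypothetical and are not part of the eHistology viewer code.

```python
import math

# Pixel size quoted in the text for the x20 scans (microns per pixel).
# Individual images may differ; the per-image value should be preferred.
MICRONS_PER_PIXEL = 0.34


def distance_microns(p1, p2, microns_per_pixel=MICRONS_PER_PIXEL):
    """Return the Euclidean distance in microns between two (x, y) pixel coordinates."""
    dx = p2[0] - p1[0]
    dy = p2[1] - p1[1]
    return math.hypot(dx, dy) * microns_per_pixel


def scale_bar_pixels(length_microns, microns_per_pixel=MICRONS_PER_PIXEL):
    """Return the length in pixels of a scale bar with the given physical length."""
    return length_microns / microns_per_pixel
```

For example, two points separated by 300 pixels horizontally and 400 pixels vertically are 500 pixels apart, which corresponds to 170 microns at this resolution.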

Annotation and linking to the EMAPA ontology
Annotation was accomplished using a manual procedure whereby "flags" were positioned on points corresponding to the matching points as used in each plate in the book. The flags were placed using an editor's version of the eHistology interface [7]. Each flag was linked to the anatomical term or phrase used in the book and also an EMAPA ontology term and an associated Wikipedia link. There were more than 10 000 flag labels used to annotate the eHistology sections, and linking them to EMAPA IDs was achieved through a combination of string matching and manual assignment of terms [13]. Linking to Wikipedia was accomplished using a manual process that utilized parent terms in the partonomic ontology tree to find the closest match for a given anatomical term or tissue.
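The combination of string matching and manual assignment can be sketched as follows. This is an illustration only: the matching procedure actually used is described in [13] and is not reproduced here, the fuzzy step uses Python's standard difflib as a stand-in, and the identifiers in the toy ontology are placeholders, not real EMAPA IDs.

```python
import difflib


def normalise(label):
    """Lower-case a label and collapse whitespace so trivial variants compare equal."""
    return " ".join(label.lower().split())


def match_label(label, ontology, cutoff=0.85):
    """Return (term_id, how) for a label; how is 'exact', 'fuzzy' or 'manual'.

    'manual' means no automatic match was found and a curator must assign the term.
    """
    index = {normalise(name): term_id for name, term_id in ontology.items()}
    key = normalise(label)
    if key in index:
        return index[key], "exact"
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    if close:
        return index[close[0]], "fuzzy"
    return None, "manual"


# Hypothetical placeholder identifiers, NOT real EMAPA IDs.
toy_ontology = {
    "neural tube": "EMAPA:PLACEHOLDER_1",
    "optic vesicle": "EMAPA:PLACEHOLDER_2",
}
```

In a workflow of this kind, fuzzy matches would normally be reviewed by a curator rather than accepted automatically, and labels returned as 'manual' would go to manual assignment.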

eHistology viewer
Each eHistology image is described in an Edinburgh DataShare Digital Object Identifier (DOI), and this description includes the URL of the eHistology viewer for that image. In this way, we provide a persistent means of accessing the zoom viewer for that image. An example DOI for a single high-resolution image is dx.doi.org/10.7488/ds/1232. This link resolves to a specific page at the Edinburgh DataShare web resource [14,15], which in turn provides a link through to the current URL for the eHistology viewer.
By starting with a fully persistent DOI, the user will always be able to locate the data and is protected from any change to the hosting domain and URL of the eHistology viewer [16]. For convenience, we also provide an interactive index to the new images based on the plate and image designations of the original atlas.
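As a small illustration of DOI-based access, the sketch below normalises a DOI written in any of several common forms into a resolver URL. The helper name is hypothetical; https://doi.org/ is the standard DOI resolver. Following the resulting URL to the DataShare landing page requires network access and is not shown.

```python
def doi_to_url(doi, resolver="https://doi.org/"):
    """Turn a DOI written in a common form into a resolver URL."""
    text = doi.strip()
    lowered = text.lower()
    # Drop an optional URL scheme.
    for prefix in ("https://", "http://"):
        if lowered.startswith(prefix):
            text = text[len(prefix):]
            lowered = text.lower()
            break
    # Drop an optional resolver host or "doi:" prefix.
    for prefix in ("dx.doi.org/", "doi.org/", "doi:"):
        if lowered.startswith(prefix):
            text = text[len(prefix):]
            break
    # All DOI names begin with the directory indicator "10.".
    if not text.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return resolver + text
```

Applied to the example in the text, doi_to_url("dx.doi.org/10.7488/ds/1232") gives "https://doi.org/10.7488/ds/1232".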

Code availability
The data are provided in open-standard tif or jpeg image formats. All metadata are in plain txt format, and the Supplementary Data are in the Microsoft Excel Open XML format (xlsx). The code used for the online histology viewer is provided at the ma-tech GitHub archive, and specifically we use the WlzIIPSrv tiled image server and the eAtlasViewer JavaScript application.

Data records
Each record has an assigned DOI that resolves to a set of data files comprising a jpeg or tiff encoded image, Dublin Core and other metadata files, and the set of annotations associated with the image. The image data volumes range up to 2 GB, with a total volume of 118 GB for the full series in compressed "zip" format. Table 1 lists the files with each dataset. Each University of Edinburgh DataShare submission requires a subset of the Dublin Core (dublincore.org) data elements to be completed and allows a further set of optional elements; these are detailed in Table 2. Table 3 provides a partial listing of the datasets available as an example of the data content. The full listing of all 937 images is provided in the Supplementary Excel formatted data file SciDataKaufmanTable3.xlsx and corresponds to all of the histology section images of the original atlas for Plate numbers 2-41.
Table 2 (excerpt):
Title | Title | Title | text | true
Title | alternative | Title | Alternative Title | text | false
Type | Type | Type | Controlled text | true
The DCMI column provides the official Dublin Core term for the element, and Label is the heading for these data on the DataShare metadata listing.
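The distinction between required and optional Dublin Core elements can be checked mechanically. The sketch below uses only the three labels visible in the Table 2 excerpt (Title and Type required, Alternative Title optional); the full table lists further elements, and the record layout (a dictionary keyed by Label) is an assumption made for illustration.

```python
# Required flags copied from the three Table 2 rows visible in the text.
# The full table contains further elements that are not reproduced here.
REQUIRED = {"Title": True, "Alternative Title": False, "Type": True}


def missing_required(record, required=REQUIRED):
    """Return the required labels that are absent or empty in a metadata record."""
    return sorted(
        label
        for label, is_required in required.items()
        if is_required and not str(record.get(label) or "").strip()
    )
```

A record is complete with respect to these three elements when missing_required returns an empty list.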

Technical Validation
The images and associated data are all validated against the published atlas, which provides the detail of the genotype, defines the histological protocols, and establishes the correct staging of each embryo against the Theiler criteria. The section images used in the book are from specific tissue sections identified on the sets of microscope slides stored at the MRC Human Genetics Unit at the IGMM, University of Edinburgh. Each section was scanned digitally, then checked by a second curator to ensure validity. The annotations were originally captured using optical character recognition, and the text and spelling were checked by a second curator. All the end-point locations for the annotation terms have been double-checked, and after a series of quality control steps, inspection of the whole dataset has not revealed any errors.

Usage Notes
There are no constraints on the use of the images and associated data. The Supplementary Data file lists all samples and assays (1 for each section image) and also a "source," which is the embryonic mouse specimen used by Kaufman in producing the histological sections. The "source" can be used to identify the set of physical glass slides, archived with the Centre for Research Collections of the University of Edinburgh, on which each histological section can be found. In principle, it is possible to obtain further images of the same or other sections in the series. "Age" is defined in embryonic days post-coitum, and "stage" refers to Theiler stage, a morphological staging system used to further define mouse embryo development. "Position" describes the relative position of the section in the embryo, with 0 representing, e.g., the most cranial section in a transverse series and 1 denoting the most caudal section. We additionally include details of the pixel resolution of each image, enabling accurate measurements to be made on each high-resolution embryo atlas image.
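The "Position" value can be related to a section's place in a serial series with a simple linear mapping. The text does not state how the values were derived, so the sketch below assumes evenly spaced sections indexed from 0 (e.g., most cranial) to n - 1 (most caudal); the function names are hypothetical.

```python
def relative_position(index, n_sections):
    """Map a 0-based section index to a position in [0, 1] along the series."""
    if n_sections < 1 or not 0 <= index < n_sections:
        raise ValueError("index must lie within the series")
    if n_sections == 1:
        return 0.0
    return index / (n_sections - 1)


def nearest_section(position, n_sections):
    """Return the 0-based index of the section closest to a relative position."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    return round(position * (n_sections - 1))
```

Under this assumption, a "Position" of 0.5 in a series of 101 sections corresponds to the section at index 50, which could help locate a neighbouring section on the archived slides.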