Profiling of open chromatin in developing pig (Sus scrofa) muscle to identify regulatory regions

Abstract There is very little information about how the genome is regulated in domestic pigs (Sus scrofa). This lack of knowledge hinders efforts to define and predict the effects of genetic variants in pig breeding programs. To address this knowledge gap, we need to identify regulatory sequences in the pig genome starting with regions of open chromatin. We used the “Improved Protocol for the Assay for Transposase-Accessible Chromatin (Omni-ATAC-Seq)” to identify putative regulatory regions in flash-frozen semitendinosus muscle from 24 male piglets. We collected samples from the smallest-, average-, and largest-sized male piglets from each litter through five developmental time points. Of the 4661 ATAC-Seq peaks identified that represent regions of open chromatin, >50% were within 1 kb of known transcription start sites. Differential read count analysis revealed 377 ATAC-Seq defined genomic regions where chromatin accessibility differed significantly across developmental time points. We found regions of open chromatin associated with downregulation of genes involved in muscle development that were present in small-sized fetal piglets but absent in large-sized fetal piglets at day 90 of gestation. The dataset that we have generated provides a resource for studies of genome regulation in pigs and contributes valuable functional annotation information to filter genetic variants for use in genomic selection in pig breeding programs.


Introduction
The domestic pig (Sus scrofa) is a hugely important farmed animal species globally, contributing a source of healthy animal protein to feed the growing human population. Meeting the increased demand for healthy sustainably produced food from pigs in coming decades will require novel breeding strategies and management practices that will rely on an improved ability to predict phenotype from genotype (Clark et al. 2020). High-resolution annotations of the expressed and regulatory regions of farmed animal genomes provide a resource to accurately link genotype to phenotype (Andersson et al. 2015). Variants in putative regulatory regions have been associated with >100 phenotypes in humans (Pai et al. 2015). Recently, a functional regulatory variant in the gene myosin heavy chain 3 (MYH3) was shown to influence muscle fiber type composition in Korean native pigs (Cho et al. 2019). There is very little species-specific information about how the genome is regulated in domestic pigs. This lack of knowledge hinders efforts to identify causative variants for complex traits, and a better knowledge of genome regulation might also improve genomic prediction in breeding programs. To address this knowledge gap, we aim to identify regulatory sequences in the pig genome, starting with regions of open chromatin.
Activation of regulatory DNA drives gene expression patterns that influence phenotypic characteristics. Measurement of open chromatin gives a quantitative genome-wide profile of chromatin accessibility appearing as "peaks" in the data generated for each tissue sample (Thurman et al. 2012). These peaks can reflect the function of the adjoining regulatory DNA (Thurman et al. 2012). The Assay for Transposable Chromatin (ATAC-Seq) (Buenrostro et al. 2013(Buenrostro et al. , 2015 has been used successfully to profile regions of open chromatin in chicken, cattle, and pig genomes (Halstead et al. 2020a(Halstead et al. , 2020b. In this study, we used the Improved Protocol for the Assay for Transposase-Accessible Chromatin (Omni-ATAC-Seq) (Corces et al. 2017)  Muscle is an important tissue in commercial pig production as muscle traits (e.g., meat and carcass quality) act as economic drivers in pig breeding programs. Prior to this study knowledge of open chromatin in pig muscle was limited to data from only two adult animals (Halstead et al. 2020b) and four fetuses from three early developmental stages (Yue et al. 2021). For this study, we collected semitendinosus muscle tissues from piglets at five different stages of development (three fetal stages, one neonatal, and one juvenile stage). The developmental stages were chosen for their relevance to hyperplasic muscle development in the fetus and postnatal muscle hypertrophy (Ashmore et al. 1973;Wigmore and Stickland 1983;Rudar et al. 2019). We hypothesized that gene expression and regulation in semitendinosus muscle tissue would change as the piglets aged, allowing us to identify the transcripts and regions of open chromatin that drive myogenesis. Several studies have profiled gene expression during fetal development in pigs (Zhao et al. 2011;Yang et al. 2015;Zhao et al. 2015;Ayuso et al. 2016); however, to date only one other study has examined how chromatin openness changes as the piglet develops (Yue et al. 2021).
The number of muscle fibers in pigs is proportional to weight at birth (Aiello et al. 2018;Stange et al. 2020). Low birth weight in pigs has been shown to cause lifelong impairments in muscle development and growth (Rehfeldt and Kuhn 2006). Low birth weight piglets often display "catch up" growth, but at the expense of laying down a higher proportion of body fat compared to normal-sized littermates (Estany et al. 2017). Consistent with these observations, mesenchymal stem cells from intrauterine growthrestricted piglets show a differentiation bias toward the adipocyte lineage in comparison with their normal-sized litter mates (Weatherall et al. 2020). Low birth weight piglets tend to produce fatter, less valuable carcasses from a production perspective and as such their incidence within pig litters should be kept to a minimum (Pardo et al. 2013). Size variation within a litter is likely to be determined by many different physiological variables including variation in placental blood flow (Stenhouse et al. 2018) but may also be influenced by genetic and epigenetic factors (Wang et al. 2016;Li et al. 2020).
The study we present here used samples of muscle tissue from a common commercial breed cross (Large White Â Landrace) to generate ATAC-Seq and RNA-Seq data from the same individuals to characterize the expressed and regulatory regions of the genome during development. The aims of the study were to: (1) map regions of open chromatin in semitendinosus muscle tissue from small-, average-, and large-sized male piglets at five developmental stages (days 45, 60, and 90 prenatal and 1 and 6 weeks postnatal) and (2) analyze RNA-Seq data from the same tissues to generate gene expression profiles. The outcomes of the study will help to (1) understand the molecular drivers of muscle growth in pigs; (2) provide a foundation for functionally validating target genomic regions in vitro; and (3) identify highquality causative variants for muscle growth with the goal of harnessing genetic variation and turning it into sustainable genetic gain in pig breeding programs. The dataset we generate will also provide valuable information for annotating the pig genome, Sscrofa 11.1 (Warr et al. 2020), contributing to the efforts of the international Functional Annotation of Animal Genomes (FAANG) consortium to improve the annotation of the reference genomes of farmed animal species (Andersson et al. 2015;Giuffra and Tuggle 2019;Clark et al. 2020).

Animals
Tissue samples for this study were collected from Large White Â Landrace pigs that were euthanized, not specifically for this study but for other on-going projects on the effects of fetal size on pig development at The Roslin Institute. All gilts (n ¼ 13) were fed the same diet and kept under one controlled husbandry condition at the research farm. For each of the developmental stages, the number of litters included in the study was as follows: prenatal day 45 ¼ 1 litter, day 60 ¼ 2 litters, day 90 ¼ 5 litters, and postnatal 1 week ¼ 2 litters and 6 weeks ¼ 3 litters. Gilt identifier numbers for each litter are included in Table 1. Artificial insemination with semen from two unrelated sires was used to generate the fetuses for both pre-and postnatal time points. Fetal tissues were collected after the pregnant sow was euthanized with sodium pentobarbitone 20% w/v (Henry Schein Animal Health, Dumfries, UK) at a dose of 0.4 ml/kg by intravenous injection. Postnatal samples were collected after euthanasia by captive bolt. Tissues from the 1-week-old piglets were collected preweaning and from the 6-week-old piglets postweaning. Six-week-old piglets post weaning were fed a standard concentrate ration and kept under routine husbandry conditions.

Sample collection of frozen muscle tissue samples for ATAC-Seq and RNA-Seq
The tissue samples used for this study were from archived material (with the exception of the piglets that were 6 weeks of age) collected from the largest-, smallest-, and average-sized male piglets per litter at five different developmental stages (Table 1). The largest-, smallest-, and average-sized piglets from each litter were selected according to body weight at the time of sampling for the fetal time points and birth weight for the postnatal time points (Supplementary Table S1). Developmental stages were chosen, according to previous studies (Ashmore et al. 1973;Wigmore and Stickland 1983;Rudar et al. 2019) as follows: Day 45 of gestation-when primary muscle fibers form.
Day 60 of gestation-when secondary muscle fibers begin to form.
Day 90 of gestation-when fiber formation ceases after which subsequent muscle growth occurs through fiber hypertrophy.
One week of age-during active muscle hypertrophy.
Six weeks of age-once muscle hypertrophy has leveled off.
Due to limited sample availability, the experimental design is unbalanced (Table 1). Only one complete set of littermates (smallest, average, and largest) was included in the analysis for the day 45 (n ¼ 3) and 60 (n ¼ 3) time points. For the 6 weeks of age time point (n ¼ 5), one large sample was unavailable for inclusion in the analysis. At day 90 of gestation (n ¼ 11) three complete litters and one incomplete litter were included, while at 1 week of age samples were only available from the smallest and largest piglets from one litter (n ¼ 2). We have included a flow chart describing which samples were analyzed at each stage of this study ( Figure 1).
Samples were collected from the semitendinosus muscle from the hind leg of piglets from each developmental stage (Table 1). The only exception was day 45, when whole hind leg muscle tissue was collected, because it was not possible to differentiate specific muscle types at this early stage of development. Each sample was flash frozen in liquid nitrogen, as quickly as possible within an hour post-euthanasia and stored at À80 C for future analysis. From the 6-week-old piglets, additional samples were collected in sucrose buffer to isolate and cryopreserve nuclei according to the method described in (Halstead et al. 2020a). Cryopreserving isolated nuclei for a small number of samples would allow us to validate the data we generated from the flashfrozen material, which we were optimizing for muscle tissue. The protocol for collection of tissue samples is available via the FAANG Data Coordination Centre (FAANG-SOP1 2021).
Isolation of cryopreserved nuclei from fresh muscle tissue and preparation of tagmented nuclear DNA We used the protocol described in Halstead et al. (2020a) to isolate and cryopreserve intact nuclei from fresh muscle tissue samples from the 6-week-old piglets (Table 1). Briefly, each tissue sample was transferred to a GentleMACS C-tube (Mitenyi Biotec, Germany) with sucrose buffer and homogenized. The homogenate was then filtered and Dimethyl Sulfoxide (Sigma Aldrich, USA) added (10% final concentration), before freezing at À80 C overnight in a Mr Frosty (Nalgene, USA), and then transferring to a À80 C freezer for long-term storage. The full protocol for preparation of cryopreserved nuclei from fresh muscle tissue is available via the FAANG Data Coordination Centre (FAANG-SOP2 2021).
To prepare tagmented DNA the cryopreserved nuclei preparations were thawed slowly at room temperature by adding 500 ml of cold 1Â phosphate-buffered saline (PBS), filtered using a 70 mm corning cell strainer (Sigma Aldrich, USA) then the filtrate (containing the nuclei) centrifuged at 500g at 4 C in a swinging bucket centrifuge for 10 min. After centrifugation, the pellet was resuspended in 1 ml cold ATAC-Seq RSB Buffer þ 0.1% Tween 20 (Sigma Aldrich, USA) for lysis and centrifuged for 10 min at 500g at 4 C. The pellet of nuclei was then washed in PBS and resuspended in 50 ml transposition mix (25 ml TD buffer, 2.5ml TDE1 enzyme, Molecular Biology Grade Sterile H 2 O) from the Nextera DNA Sample Prep Kit (Ilumina, USA). The pellet was incubated with the transposition mix for 60 min at 37 C at 300 rpm on a thermomixer. The pellet of transposed nuclear DNA was purified with a MinElute PCR purification kit (Qiagen, Germany), eluted in 15 ml of Buffer EB and stored at À20 C. The full protocol is available via the FAANG Data Coordination Centre (FAANG-SOP3 2021).
Isolation of nuclei from frozen muscle tissue and preparation of tagmented nuclear DNA ATAC-Seq libraries were prepared using a version of the Omni-ATAC-Seq protocol (Corces et al. 2017). Some optimization of the protocol for flash-frozen pig muscle tissue samples was required for this study. The main modification that we introduced to the protocol was an initial dissociation step using a GentleMACS Dissociator (Mitenyi Biotec, Germany), essentially combining the Omni-ATAC-Seq protocol with the initial steps from Halstead et al. (2020a). The protocol is described in full in FAANG-SOP4 (2021) and summarized here. The components of each of the buffers are included in Supplementary Table S2. Each flash-frozen tissue sample ($200 mg per sample) was chopped into small pieces over dry ice and then dissociated in a GentleMACS C-tube (Mitenyi Biotec, Germany) in 1 ml of 1Â HB buffer (þProtease Inhibitor Cocktail). The samples were dissociated using program m_muscle_0.1_0.1 (equivalent to "E0.1c Tube") twice on a GentleMACS Dissociator (Mitenyi Biotec, Germany). Immediately after dissociation, the samples were filtered through a 70-lm corning cell strainer (Sigma Aldrich, USA) then the filtrate centrifuged at 3000g for 5 min. The pellet was resuspended in 400 ll 1ÂHB buffer and transferred to a 2 ml Eppendorf Protein Lo-Bind tube (Eppendorf, UK). Four-hundred microliter of 50% Iodixanol solution (Opti-Prep Density Gradient Medium; SLS, UK) was added to the 400 ll of cell solution (final 25% Iodixanol). An Iodixanol gradient was then created and samples transferred to a swinging bucket centrifuge and spun for 25 min at maximum speed at 4 C with no brake. A thin "whitish" band appeared between layers two and three of the gradient. Evaluation and counting of nuclei was performed by staining with Trypan Blue (Thermo Fisher Scientific, USA). One milliliter of ATAC-RSB Buffer þ 0.1% Tween 20 was then added to lyse the nuclei and the sample centrifuged for 10 min at 500g at 4 C. The pellet was then gently resuspended in 50 ml transposition mix for tagmentation as described for cryopreserved nuclei samples above.

ATAC-Seq library preparation
The library preparation protocol, adapted from Corces et al. (2017), was used for the flash-frozen tissues and the cryopreserved nuclei preparations. The protocol described in full is available via the FAANG Data Coordination Centre (FAANG-SOP5 2021). A PCR mix was set up comprising 10 ll molecular biology grade H 2 O, 2.5 ml Ad1 primer 25 mM, 2.5 ml Ad2.x primer 25 mM (variable index see Supplementary Table S3), and 25 ml 2Â NEBNext Hi-Fi PCR mix (NEB, USA) per reaction. Ten microliters of transposed DNA was added to each reaction and five amplification cycles of the following PCR performed: 72 C for 5 min, 98 C for 30 s, 98 C for 10 s, 63 C for 30 s, 72 C for 1 min. The GreenLeaf Quantitative PCR Protocol (Buenrostro et al. 2015) was used to determine the number of additional PCR amplification cycles that were required for each sample, to stop amplification prior to saturation and avoid variation across samples caused by PCR bias. Samples for which more than 5-7 additional cycles were required were discarded due to the high probability of PCR bias caused by additional cycles. Amplified ATAC-Seq libraries were then purified with a MinElute PCR purification kit (Qiagen, Germany). Library quality was checked on the Agilent 2200 TapeStation System (Agilent Genomics, USA). Libraries were assessed for quality according to an even distribution of fragments and a clearly differentiated sub-nucleosomal fragment as described in Halstead et al. (2020a). If library quality was sufficient the sub-nucleosomal fragment (150-250 bp) was size selected, to minimize the signal to noise ratio, as suggested in Halstead et al. (2020a). Size selection was performed using a Thermo Scientific E-Gel System (Thermo Fisher Scientific, USA).
To check the size of the selected fragment, an aliquot was run on the Agilent 2200 TapeStation System (Agilent Genomics, USA). After size selection, the libraries were pooled and stored at À20 C prior to sequencing.
The following parameters were measured as recommended by the ENCODE project for the validation of ATAC-Seq libraries (Davie et al. 2015(Davie et al. , 2018 ATAC-Seq peak calling using Genrich ATAC-Seq peak calling was performed using Genrich v0.5 (Gaspar 2020) under the ATAC-Seq mode while excluding PCR duplicates and mitochondrial reads (used flags: -r -j -e MT). Two rounds of peak calling were performed as follows: (1) peak calling on individual samples (n ¼ 24) and (2) aggregated multi-sample peak calling for each piglet size. All the scripts used for this analysis are found in Supplementary File S1 and the code repository (Salavati 2021).

Differential peak analysis based on read counts
A consensus set of ATAC-Seq peaks was created for the purpose of differential peak analysis. The consensus set, from individual sets that were called in all 24 samples, was created using bedtools v2.26.0 (bedtools merge -i all_samples.bed -d10 -c 4,7,10,4 -o count_distinct,mean,mode,distinct). Peaks that were within 10 nucleotides of another peak were merged into one peak, and a support value (i.e., the number of tissue samples in which the peak was present) was calculated for each peak. Peaks with a support value of <3 (i.e., they were present in <3 tissue samples) were removed, resulting in a total of 12,090 ATAC-Seq peaks, which will now be referred to as the "consensus set." A read fragment filtering and analysis workflow was devised similar to a recently published framework by Yan et al. (2020). Briefly, the mapped BAM files were filtered for high mapping quality, nonPCR duplicates and nonmitochondrial reads using samtools v1.6 (samtools view -h -f2 -q10 -F1548 -bS). A read count for each sample (using high-quality BAM files) was then generated using ht-seq v0.13.5 (Anders et al. 2015) against the consensus set of ATAC-Seq peaks (htseq-count -stranded¼no -type¼region). The library size for each BAM file was then used to normalize the read counts for multidimensional scaling analysis as described by Yan et al. (2020) using the following equation: DESeq2 v1.30.1 (Love et al. 2014) was used for differential peak analysis, of the raw read counts, to compare across developmental time points. A Likelihood Ratio Test (reduced model) was used for the analysis of the time points (design: $ size þ time; reduced: $ size). A multiple testing P-value correction was performed using the Benjamini-Hochberg (1995) method and a 10% false discovery rate (FDR) was considered as the threshold of significance.

Multidimensional scaling analysis for comparison of ATAC-Seq libraries
Nonlinear multidimensional scaling (NMDS) of the ATAC-Seq libraries was performed using the MASS::IsoMDS package (Venables and Ripley 2002) to ensure that there were no obvious outlying samples and that tissues of the same type clustered together in a biologically meaningful manner. A distance matrix (Manhattan distance) was produced using the consensus set of ATAC-Seq peaks and the normalized read counts as described in the previous section. The distance matrix was then processed for multidimensional scaling. For data validation purposes, NMDS was also used to compare the ATAC-Seq libraries prepared from either flash-frozen muscle tissue or cryopreserved nuclei from two 6-week-old piglets.

Transcription factor footprint analysis
The HMM-based Identification of Transcription factor footprints (HINT) pipeline from the Regulatory Genomics Toolbox (RGT; v0.12.3; Li et al. 2019) was used to compare transcription factor (TF) activity between developmental stages or piglet sizes. For a given comparison, the rgt-hint command in foot printing mode was used to identify TF footprints within peaks based on ATAC-Seq signal in each condition. When comparing consecutive developmental stages, the ATAC-Seq peaks identified for each stage were merged with Bedtools (v2.26.0), and footprints were identified within the merged set. When comparing different piglet sizes within a developmental time point, footprints were identified within the peak set for that time point (regardless of piglet size). ATAC-Seq signal for a given condition included aligned reads from all biological replicates (excluding libraries from cryopreserved nuclei), which were combined and filtered to remove duplicates using Samtools (v1.7). Footprints were matched to known motifs in JASPAR (Fornes et al. 2020) with rgt-motif analysis, and rgt-hint in differential mode was then used to compare the activity of each TF between two given conditions using biascorrected signal.

RNA isolation and quality control
The RNA isolation protocol is described in full in FAANG-SOP6

Poly-A-enriched library preparation and sequencing
Strand-specific paired-end reads with a fragment length of 100 bp for each sample were generated by Edinburgh Genomics, using the Illumina TruSeq mRNA library preparation protocol (poly-A selected; Illumina; Part: 15031047 Revision E). mRNA-Seq libraries were sequenced on an Illumina NovaSeq 6000 platform to generate >66 M paired-end reads per sample (min: 6.6eþ07, max: 1.21eþ08, mean: 9.17þe07).

RNA-Seq data analysis workflow
The raw sequence data were quality-controlled and trimmed using Trimmomatic (Bolger et al. 2014). The Kallisto aligner (Bray et al. 2016) was used for expression quantification of the RNA-Seq data. Briefly, a reference transcriptome fasta file of coding sequences was obtained from Sscrofa11.1 Ensembl v100 to build a Kallisto index file using default settings. The trimmed reads were then mapped for transcript-level expression quantification (de novo) in kallisto with-bias option activated. The output tabseparated value files were then imported to R using txImport package (Soneson et al. 2016) for further analysis and visualizations.
The transcript per million mapped (TPM) expression estimates for each sample were investigated using principal component analysis (PCA) in FactoMineR to identify any spurious samples that did not cluster as expected (Luo et al. 2009). Differential expression (DE) analysis was performed only on the three sizes of fetal piglet (small, average, and large) at day 90 of gestation. The Likelihood Ratio Test (LRT) model of DESeq2, including post hoc analysis, was used with small size as the reference level, i.e., denominator in log2 fold change (log2FC) [DESeqDataSetFromTxim port(txi ¼ dds, design ¼ $ Piglet size)]. After multiple correction of P-values using the BH method (Benjamini and Hochberg 1995), an FDR of 10% was considered as the significance threshold.

Overlay of differentially expressed genes and ATAC-Seq peaks
For the day 90 samples only, we reanalyzed the ATAC-Seq peaks in the three sizes of fetal piglet per litter (large, average, and small). Peak calling was performed using the same Genrich flags as previously described (aggregated multi-sample method), and we separated peaks shared between all size classes and size class-specific peaks with bedtools v2.26.0 (Quinlan and Hall 2010). The size-specific peaks generated for the fetal piglets at day 90 of gestation and the scripts used to produce them are also found in Supplementary File S1 and the code repository (Salavati 2021).
An overlay of genes that were differentially expressed between the large-and small-sized fetal piglets at day 90 was performed using ATAC-Seq peaks within 10-kb vicinity of the differentially expressed genes (either upstream or downstream). This overlay would show us which of the differentially expressed genes had an ATAC-Seq peak in their vicinity and whether that peak was present in both large-and small-sized fetal piglets, or only in one of the two sizes. The distance from the start of the gene model to the start of the ATAC-Seq peak was used as a coordinate system (i.e., positive values meant that the peak was either within the gene or within the 5 0 10-kb upstream region of the gene, and negative values corresponded to 10 kb from the 3 0 end of the gene).

Statistical analysis software and packages
All data analysis for this study was performed via bash scripting and use of R (R Core Team 2017) on the University of Edinburgh research computing facility (Edinburgh 2020). The data analysis protocol for ATAC-Seq and RNA-Seq is available in FAANG-SOP7 (2021) and FAANG-SOP8 (2021).

Results
ATAC-Seq data from frozen pig muscle tissues ATAC-Seq libraries from four different batches (24 samples in total) were multiplexed and sequenced to achieve 2.02eþ08 average reads per sample (min: 9.8eþ07, max: 3.5eþ08, median: 1.97eþ08). Reads were evenly distributed between barcodes across the first three batches. The fourth batch, which included only two samples that had a higher concentration of starting DNA, resulting in more reads per sequencing run compared to the other 22 samples (details in Supplementary File S2). Average chromosomal coverage across autosomes was 7.2Â, 4Â for X, 3.2Â for Y, and 4.49eþ04 for the mitochondrial chromosome (Supplementary Figure S1). Visual comparison of the ATAC-Seq and the RNA-Seq reads, mapped to the Sscrofa11.1 genome, was used to check for consistency between the two datasets. For example, Figure 2 shows the ATAC-Seq and RNA-Seq data as parallel tracks for a housekeeping gene (GAPDH) (Figure 2A), and a gene related to muscle development (CASQ1) ( Figure 2B). Read coverage of all 24 ATAC-Seq samples and the consensus peak called for GAPDH has also been visualized in Supplementary Figure S2.
We also measured the fragment size distribution of the 24 ATAC-Seq samples, which showed that read density was highest at the putative nucleosome-free region (approximately 50 bp insert size) followed by the mononucleosome region ($150-200 bp) and, then, the dinucleosome ($300-400 bp) region as shown in Figure 2C.

Multidimensional scaling analysis of the ATAC-Seq libraries from frozen tissue samples
NMDS was used to ensure that the ATAC-Seq dataset was biologically meaningful, reproducible and there were no outlying samples (i.e., samples from the same developmental stage should have a similar peak distribution and cluster together). NMDS was performed using the consensus set of peaks and the normalized ATAC-Seq read counts. The input matrix, which was converted to a Manhattan distance matrix prior to analysis, consisted of 12,090 consensus peaks and 24 samples. The sample separation on the first two components of the NMDS was driven by the developmental time points. The second and third components of the plot showed the greatest segregation between the developmental time points as shown in Figure 3A (NMDS components 1 and 2) and Figure 3B (NMDS components 2 and 3). The cryopreserved nuclei samples are included for comparison with the ATAC-Seq libraries prepared from flashfrozen muscle tissue. As an additional validation of the dataset, we expected these samples to cluster by developmental stage rather than nuclei isolation method, which was the case (Figure 3, A and B). A correlation heatmap of the 24 ATAC-Seq samples also indicated that the samples for each time point clustered closely together ( Figure 3C). The heatmap input  matrix was based on the deepTools v3.5.1 multiBamSummary binned (100 bp) read coverage.

Multidimensional scaling analysis of ATAC-Seq libraries prepared from either flash-frozen muscle tissue or cryopreserved nuclei
To validate the results from the Omni-ATAC-Seq protocol NMDS was also used to compare ATAC-Seq libraries prepared from either flash-frozen muscle tissue or cryopreserved nuclei from two 6-week-old piglets. The two libraries prepared for the cryopreserved nuclei samples clustered closely with the libraries prepared for flash-frozen tissue indicating that there was little difference in the data generated by the two protocols (Figure 3). Other metrics, including the percentage of ATAC-Seq peaks within promoter, proximal, distal regions, or within a gene model were also used to compare the libraries prepared from cryopreserved nuclei and flash-frozen tissue and showed little differences between the libraries (Supplementary Figure S3). There were no statistically significant differences detected between the two protocols for any of the ATAC-Seq quality control metrics chosen (ANOVA; P > 0.05) except the TSS enrichment score (TSS-ES; ttest P ¼ 0.045; cryo nuclei vs frozen tissue mean TSS-ES 8.5 vs 5.97; Supplementary Table S6 and Supplementary Figure S4).

Distribution of ATAC-Seq peaks within genomic features
The feature distribution of the ATAC-Seq peaks in all 24 samples is shown in Figure 4A. On average >6500 peaks (including overlapping regions) were called in each sample (min: 2.34eþ03, max: 1.42eþ04, median: 6.38eþ03, mean 6 SD: 6.72eþ03 6 3.63eþ03). More than 52% of the peaks were located in promoter regions in the majority of samples (19/24). There was a slight negative trend between increase in library size (depth of sequencing) and in the number of the peaks called (linear regression: slope ¼ À6eÀ05 and R 2 ¼ 0.22). Detailed metrics are found in Supplementary File S2. In five samples [day 90: large (n ¼ 3), day 90: average (n ¼ 1), and day 90: small (n ¼ 1)] most peaks were in distal intergenic regions ( Figure 4A). We could not find any batch effect, in nuclei extraction or library preparation that might account for this and as such concluded that this variation was related to the samples themselves. There was also little observable difference in how the ATAC-Seq peaks were distributed within the genome of the libraries from cryopreserved nuclei relative to the libraries from flash-frozen tissue ( Figure 4A). The breakdown of genomic feature categories in which peaks were located is presented in Table 2.

Proximity of ATAC-Seq peaks to transcription starts sites
The transcription start site enrichment score (TSS-ES) was used to validate the presence of the ATAC-Seq signal flanking transcription starts site (TSS) ( Figure 4B). ATAC-Seq libraries had a TSS-ES of 8.03 6 2.14 (mean 6 SD) on average (min 5.21; max 12.8). As noted above, the ATAC-Seq TSS-ES was higher for libraries prepared from cryopreserved nuclei relative to flash-frozen tissue. All libraries showed a uniform distribution of the TSS scores flanking TSSs as shown in Figure 4B.
Differential peak analysis of ATAC-Seq read counts using a consensus set of peaks Differential peak analysis revealed 377 ATAC-Seq peaks from the consensus peak set in which the read count differed significantly between the developmental time points. These peaks were annotated using Sscrofa11.1 corresponding to 724 unique transcripts (245 unique genes). One hundred and nine peaks were in unannotated intergenic regions. Nearly half of the peaks exhibiting differential read counts, between developmental time points, were in intronic regions and only 11.1% resided in promoters as shown in Table 3.
The read counts for the peaks that differed significantly between time points are shown in Figure 5 as normalized fragment counts. A detailed list of these peaks is included in Supplementary File S3. To test whether the fragment count distribution between the piglet sizes was different (Figure 5), we used ANOVA and found no significant differences between any of the pairwise comparisons (ANOVA þ Tukey HSD, P-value >0.05).

TF activity footprinting of the ATAC-Seq peaks (time and piglet size)
TF footprinting analysis across the developmental time points did not show any significantly different HINT scores ( Figure 6A). In the comparison between large and small piglet size at day 90 samples, five differentially active TFs (GMEB2, TFAP2C(var.2), HOXD12, FOXH1, and CEBPE) were detected using JASPAR2020 database annotation. The TF CEBPE CCAAT-Enhancer-Binding Protein-Beta showed the most extreme HINT z score (HINT z score À14.35; Figure 6B). CEBPE is known to be upregulated after muscle injury and be highly associated with muscle strength in human and mouse models (Harries et al. 2012).However, the lack of visual evidence of a TF footprint in either small or large piglets ( Figure 6B) indicates that the extreme HINT z score might be the result of a technical artifact. In comparison, GMEB2, a glucocorticoid receptor expression regulator (Kaul et al. 2000), was the only TF with significantly higher enrichment in the small size piglets (HINT z score 4.29) in comparison to the large piglets and showed visual evidence of a TF footprint ( Figure 6C). Details of the TF footprinting are shown in Table 4.

Analysis of gene expression using RNA-Seq
We generated RNA-Seq data from the same muscle tissue samples that were used to generate the ATAC-Seq libraries, to link regions of open chromatin with gene expression. The transcript expression estimates for the muscle tissue samples from the five developmental time points (26 samples in total) were calculated as TPM reads using Kallisto. The TPM expression estimates were then investigated using PCA (Figure 7) to identify any samples that did not group as expected according to the developmental time point. The samples from each developmental time point clustered together as expected in the first two dimensions of the PCA (Figure 7).
Analysis of genes that were differentially expressed between the three sizes of fetal piglet at day 90 of gestation Differential gene expression analysis was performed using the TPM values for the three sizes of fetal piglet (small, average, and large) at day 90 of gestation. Between the three sizes of fetal piglet 89 genes (FDR 10%) were found to be differentially expressed. When average-vs small-sized fetal piglets were compared, 58 upregulated and 31 downregulated genes were detected. When large-vs small-sized fetal piglets were compared, 54 upregulated and 35 downregulated genes were detected. Differentially expressed genes with an adjusted P-value (FDR <0.1) and log2FC ! 0.1 are annotated in Figure 8. The comparison between largeand small-sized fetal piglets resulted in the largest number (n ¼ 89) of differentially expressed genes. The list of differentially expressed genes and detailed analysis metrics are found in Supplementary Table S5. Many of the genes that were differentially expressed between large and small, and average and small, fetal piglets are involved in skeletal muscle function and growth (Figure 8). The gene calsequestrin 1 (CASQ1), for example, which was 1.54-fold upregulated (log2FC 0.63 6 0.17 adjusted P-value ¼ 2.0eÀ02) in large-sized relative to small-sized fetal piglets is the skeletal muscle-specific member of the calsequestrin protein family and is highly expressed in skeletal muscle in adult pigs, see (http://biogps.org/pigatlas/; Freeman et al. 2012;Summers et al. 2020;BioGPS 2021). MYBPC2, a gene that encodes myosinbinding protein C, was also two-fold upregulated (log2FC 1.02 6 0.22 adjusted P-value ¼ 3.78eÀ04) in large-sized relative to small-sized fetal piglets (Figure 8). It has also been shown to be highly expressed in the muscle of pigs (see http://biogps.org/pigat las/; Freeman et al. 2012;Summers et al. 2020;BioGPS 2021). The muscle-specific TF myogenin (MYOG) was downregulated (log2FC 0.28 6 0.09 adjusted P-value ¼ 9.0eÀ02), in small-sized relative to large-sized fetal piglets (Figure 8).
Overlay of the RNA-Seq differentially expressed genes and ATAC-Seq peaks from large vs small fetal piglets at day 90 of gestation A further overlay of the ATAC-Seq and RNA-Seq datasets was performed for the day 90 large-and small-sized fetal piglets. ATAC-Seq peaks annotated using the Sscrofa11.1 Ensembl gene track information (black) and differentially expressed genes between the large-vs small-sized fetal piglets at day 90 (green) are shown in Figure 9. This analysis allowed us to determine which of the differentially expressed genes had an ATAC-Seq peak that was specific to either large-or small-sized piglets in its vicinity. The distribution of ATAC-Seq peaks around TSSs (within a 3-kb distance) was plotted for peaks specific to the large-sized fetal piglets ( Figure 9A), or specific to the small-sized fetal piglets ( Figure 9B). Size-specific peaks within the 5 0 UTR of four differentially expressed genes, MYOG, ryanodine receptor 2 (RYR2), transmembrane 4L six family member 4 (TM4SF4), and interleukin 21 receptor (IL21R; Figure 8), were only observed in the small fetal piglets ( Figure 9B). There was no evidence of size-specific peaks near these genes in the large-sized fetal piglets ( Figure 9B). Of the four genes, MYOG is known to be highly expressed in skeletal muscle tissue (see http://biogps.org/pigatlas/; Freeman et al. 2012;Summers et al. 2020;BioGPS 2021). In some cases, a size-specific ATAC-Seq peak was located within the 5 0 UTR of a gene that was involved in muscle growth and downregulated in small relative to large piglets. MYOG, for example, was downregulated in smallsized fetal piglets (Figure 8), with a regulatory region 315 bp in size 8769 bp upstream of the TSS, that was present in the smallsized piglets ( Figure 9A) but absent in the large-sized fetal piglets ( Figure 9B).

Discussion
In this study, we used ATAC-Seq and RNA-Seq to improve our understanding of gene expression and regulation in developing pig muscle. We generated open chromatin profiles, in the form of ATAC-Seq peaks, for semitendinosus muscle from piglets from five developmental stages and compared these with gene expression profiles from the same tissue samples.
Of the 4661 ATAC-Seq peaks representing regions of open chromatin, that were identified in this study, >50% were within 1 kb of known transcription start sites. This result is consistent with studies across different species (Foissac et al. 2019;Yue et al. 2021). A study of longissimus dorsi muscle from pig embryos at days 45, 70, and 100 conducted by Yue et al. (2021) showed that 30%, 21%, and 14% of the peaks were identified in promoter regions, respectively. Of these peaks, 91% mapped to within À1 kb and þ100 bp of the TSS (Yue et al. 2021). A cross-species analysis of ATAC-Seq data showed that in mice, goats, cattle, pigs, and chicken, 10-15% of ATAC-Seq peaks were located within up to 5 kb of the TSS and were therefore considered as promoters (Foissac et al. 2019). The results from our study showed that the majority of the ATAC-Seq peak frequency was located within 61 kb of the TSS, with the remaining peaks located primarily within distal intergenic regions, particularly at day 90 of gestation. The distribution of ATAC-Seq peaks in intergenic regions at day 90 in large-and small-sized piglets indicated that piglets of different sizes show changes in genome regulation primarily at intergenic sites, which could be indicative of differential enhancer activity. Day 90 is a critical stage of muscle development when fiber formation ceases and muscle growth accelerates through fiber hypertrophy (Oksbjerg et al. 2004). Significant upregulation of genes involved in muscle growth also occurs at day 90 (Zhao et al. 2015;Ayuso et al. 2016;Yue et al. 2021). As such chromatin may be more open at this developmental stage to allow TF binding prior to the rapid muscle growth that occurs during the early postnatal period (Rudar et al. 2019).
Developmental stage-specific patterns in chromatin accessibility were observed in this study. Differential read count analysis revealed 377 ATAC-Seq defined genomic regions where chromatin accessibility differed significantly across developmental time points. Other similar studies (e.g. Yue et al. 2021) (Halstead et al. 2020c). ATAC-Seq datasets for postimplantation embryos in humans and other mammalian species are limited. ChIP-Seq analyses of a wide variety of histone markers in the brain, heart, and liver of early human embryos identified developmental stage-specific patterns in the epigenome (Yan et al. 2016). The sample size in our study was small,  with only a few biological replicates for most developmental time points, except for day 90. Even so our results are in agreement with other recent studies (e.g., Yue et al. 2021) and indicate that chromatin accessibility and regulation of gene expression change throughout development in pig semitendinosus muscle. Analyzing tissue samples from several prenatal time points can help to determine when during early development tissuespecific gene regulation might have an effect on phenotype. For example, in this study, TF footprint analysis showed that the TF GMEB2, which increases sensitivity to glucocorticoids (Kaul et al. 2000), had significantly higher TF activity in small-sized relative to large-sized piglets at day 90 of gestation. This finding is potentially phenotypically relevant because low birth weight piglets have been shown to have higher in utero-cortisol levels than their normal birth weight litter mates (Roelofs et al. 2019). Given the small sample size of this study, this result should be treated with some caution, but it does indicate that further investigation of day 90 of gestation with a larger and more balanced experiment would be warranted.
In our study, we also compared gene expression and chromatin openness between fetal piglets of different sizes (small, average, and large) at day 90 of gestation. Other studies have used a similar approach to investigate the effect of histone modification on the expression of genes involved in placental development in pigs  and chromatin accessibility in prenatal muscle development (Yue et al. 2021). Differences in open chromatin were reflected in the expression of genes involved in muscle growth. Analysis of the RNA-Seq data revealed that genes associated with muscle growth, including CASQ1, MYBPC2, and MYOG, were differentially expressed in large relative to small piglets. Differential expression of myogenic genes (e.g., MYOG) in pig muscle has been previously reported by Felicioni et al. (2020) who compared intrauterine growth-restricted piglets and normal weight piglets. CASQ1 encodes the skeletal muscle-specific member of the calsequestrin protein family, is related to muscle metabolism, and has been shown to be highly expressed in fat pig breeds (Zhao et al. 2011). In this study, CASQ1 was upregulated in large-sized relative to small-sized fetal piglets. MYBPC2 encodes the fast isoform of the myosin-binding protein C family (Weber et al. 1993). In Piedmontese (GDF8 mutant) cattle, MYBPC2 is highly expressed in fetal muscle, reflecting fast glycolytic fiber structural differentiation (Lehnert et al. 2007). In this study, MYBPC2 was highly upregulated in large vs small sized fetal piglets, potentially reflecting a greater proportion of fast glycolytic muscle fibers.
MYOG is essential for myoblast fusion during muscle development (UniProt 2021) and associated with QTLs, for body weight at birth (AnimalQTLdb 2021a) and backfat thickness (AnimalQTLdb 2021b), according to Genome Wide Association Studies (Xue and  Zhou 2006). MYOG was downregulated in small-sized fetal piglets relative to large-sized fetal piglets. Other studies measuring gene expression also found that MYOG was downregulated in muscle cell types from low birth weight piglets relative to their "normal"sized litter mates (Felicioni et al. 2020;Stange et al. 2020) and in pigs with high levels of intramuscular fat relative to those with average levels (Lim et al. 2017). When we compared the ATAC-Seq and RNA-Seq data, we identified an ATAC-Seq peak within the 5 0 UTR of MYOG (315 bp in size, 8769 bp upstream of the TSS) that was present in the small-sized fetal piglets but missing from the large-sized fetal piglets for day 90 of gestation. This result was surprising because MYOG was downregulated in the small-sized relative to the large-sized fetal piglets and as such we would have expected the chromatin to be less accessible and no peak to be present in the small-sized piglets. In future work, we plan to remove this peak using CRISPR genome editing and measure the effect on muscle progenitor cells in culture. The datasets we have generated for this study provide important functional annotation information, providing novel annotation tracks for regions of open chromatin in the pig reference genome (Sscrofa 11.1). We have made the datasets available in the public archives and via the FAANG Data Portal for this purpose. The datasets also provide a foundation for incorporating functional information in statistical analyses, to increase the precision and power with which we can fine map high-quality causal variants in pigs. This would make it possible to increase the accuracy of genomic selection and the efficiency with which breeding turns genetic variation into genetic gain. Further investigation of genetic variants within the regulatory regions identified in this study would help to validate whether they might be driving muscle and growth phenotypes. Recently, a functional regulatory variant was identified in MYH3 that influences muscle fiber type composition and intramuscular fat content in pigs (Cho et al. 2019). The next stage of the study is to leverage the ATAC-Seq and RNA-Seq data with a very large dataset of genetic variants from production pigs to determine whether any trait-linked variants are located within the open chromatin regions we have identified for muscle tissue. The characterization of regulatory and expressed regions of the genome in muscle tissues also  Proximity of ATAC-Seq peaks specific to large (A) and small (B) piglets and differentially expressed genes. Differentially expressed genes are marked in green. The x-axis (start_dist) is the distance from the start of the gene model to the start of the ATAC-Seq peak, for þve values the peak is either within the gene or within 10 kb of the 3 0 end, and for Àve values the peak is within 10 kb of the 5 0 end of the gene. The y-axis indicates the width of the peak. As the y-axis represents the width of the peak, the larger the node the wider the ATAC-Seq peak. provides a basis for genome editing to promote functional genomic variants in pig breeding programs (Jenko et al. 2015;Hickey et al. 2016;Johnsson et al. 2019), providing a route to application for FAANG data.

Conclusions
The dataset that we have generated provides a powerful foundation to investigate how the genome is regulated in production pigs and contributes valuable functional annotation information, for global FAANG efforts, and to define and predict the effects of genetic variants in pig breeding programs. The outcomes of the study will: (1) help us to understand the molecular drivers of muscle growth in pigs; (2) provide a foundation for functionally validating target genomic regions in in vitro systems; and (3) identify high-quality causative variants for muscle and growth traits with the goal of harnessing genetic variation and turning it into sustainable genetic gain in pig breeding programs.
Supplementary material is available at G3 online.