GWAS for Meat and Carcass Traits Using Imputed Sequence Level Genotypes in Pooled F2-Designs in Pigs

In order to gain insight into the genetic architecture of economically important traits in pigs and to derive suitable genetic markers to improve these traits in breeding programs, many studies have been conducted to map quantitative trait loci. Shortcomings of these studies were low mapping resolution, large confidence intervals for quantitative trait loci-positions and large linkage disequilibrium blocks. Here, we overcome these shortcomings by pooling four large F2 designs to produce smaller linkage disequilibrium blocks and by resequencing the founder generation at high coverage and the F1 generation at low coverage for subsequent imputation of the F2 generation to whole genome sequencing marker density. This lead to the discovery of more than 32 million variants, 8 million of which have not been previously reported. The pooling of the four F2 designs enabled us to perform a joint genome-wide association study, which lead to the identification of numerous significantly associated variant clusters on chromosomes 1, 2, 4, 7, 17 and 18 for the growth and carcass traits average daily gain, back fat thickness, meat fat ratio, and carcass length. We could not only confirm previously reported, but also discovered new quantitative trait loci. As a result, several new candidate genes are discussed, among them BMP2 (bone morphogenetic protein 2), which we recently discovered in a related study. Variant effect prediction revealed that 15 high impact variants for the traits back fat thickness, meat fat ratio and carcass length were among the statistically significantly associated variants.

generations and still exhibits a large genetic variation for these traits (Wellmann et al. 2013). More recent QTL-mapping experiments utilized genome-wide association studies (GWAS), which in contrast to linkage analyses, exploit historical meiosis and rely on linkage disequilibrium (LD) requiring high marker densities. The precision of GWAS is then a function of LD block lengths and the number of individuals analyzed, which in turn limits the usefulness of its application in F2 designs (Hayes and Goddard 2001). However, enormous efforts have been made in the establishment of these mapping populations, usually including extensive phenotyping far beyond what would be available in field populations. It would thus be desirable to revisit these resources using current genotyping and sequencing technologies, which would require an increase in the number of individuals and a decrease in the LD block lengths. In a recent simulation study, it was shown to be possible by pooling F2 designs, particularly when founder breeds are closely related and QTL are segregating in one founder breed . This approach has already been successfully applied based on medium density SNP chip data Stratz et al. 2018).
With the aim to overcome the aforementioned limits in mapping resolution and to fully exploit the potential of the resource populations, we pooled four well-characterized F2 designs (Table 1), three of them having the founder breed Piètrain in common. Twenty four founder animals were genotyped by high coverage whole genome sequencing (WGS) and 91 of the F1 animals were sequenced at a low coverage for subsequent imputation to a high coverage WGS level. A total of 2,657 F2 animals that were genotyped with the 62K Illumina PorcineSNP60 BeadChip (Ramos et al. 2009) were imputed to WGS levels with pedigree information and analyzed in a joint GWAS (see workflow in Figure 1). As a proof of concept four relevant production traits were analyzed: Average daily gain (ADG), back fat thickness (BFT), meat to fat ratio (MFR), and carcass length (CRCL).

Description of resource populations and phenotypes
Four well characterized experimental populations were pooled for this study. Detailed descriptions of the resource populations were done by Borchers et al. (Borchers et al. 2000) and Rückert et al. (Rückert and Bennewitz 2010), hence they will only be described briefly. The largest population was obtained from five purebred Piétrain boars and one Large White and six crossbred sows Landrace x Large White. The other three populations stemmed from a Meishan boar or Wild boar crossed with either Piétrain or Meishan females. The Wild boar and three Piétrain females were common founders in three of the crosses. The F2 generation was the result of repeatedly crossing F1 boars with F1 sows in order to obtain large full-sib families. From the crosses, a total number of 2772 animals were chosen (24 F0-generation pigs, 91 F1generation pigs, 2657 F2-generation pigs) and blood samples were used to extract genomic DNA for genotyping purposes (Table 1). The F0 and F1 animals selected for sequencing were generally chosen according to the number of F2 individuals, i.e., we prioritized individuals from which large families were derived. Four phenotypic traits were considered: ADG, BFT, MFR, and CRCL. The phenotypes were pre-corrected for systematic effects (e.g., stable, slaughter month) and for the effect of RYR1 gene (Fujii et al. 1991) using a general linear model. Trait definition, descriptive statistics and information about the pre-adjustment and the fixed effects used per cross can be found in .

Sequencing
A total number of twenty four founder animals were sequenced with an average 19x coverage at the sequencing facility University Hohenheim.
Out of 17 F1 families, 91 animals were sequenced with an average 0.9x coverage. All paired-end sequencing (read length 2 · 100 bp) was done on an Illumina HiScan SQ using TruSeq SBS v3 Kits. For the library construction, the DNA samples were fragmented on a Covaris S220 ultrasonicator. Parameters were adjusted to yield 350 bp inserts. Fragment length was measured with High Sensitivity DNA Chips on an Agilent Bioanalyzer. Sequencing adapters and indexes were ligated using Illumina's TruSeq DNA PCR-Free Library Prep Kits. Quantification of libraries was done by qPCR using KAPA Library Quant Kits. Flow cells were prepared using an Illumina cBot and TruSeq PE v3 Cluster kits. Raw sequencing data were demultiplexed and converted into FASTQ files using Illumina's CASAVA software.

Mapping and variant detection
Mapping and variant calling of the F0 generation was performed according to the GATK best practice pipeline using GATK v. 4.0 (McKenna et al. 2010) and genome assembly Sus scrofa 11.1 (GCA_ 000003025.6 provided by Swine Genome Sequencing Consortium on NCBI). Base quality score recalibration was performed with dbSNP build 150 as the knownSites dataset. Truth datasets used for Variant Quality Score Recalibration (VQSR) were as follows. SNPs: Illumina Infinium PorcineSNP60 v2 BeadChip and Affymetrix Axiom PorcineHD. INDELs: High confidence fraction (filter settings: QD 15.0, FS 200.0, ReadPosRankSum 20.0) of the PigVar database. Training dataset for SNP VQSR was also a high confidence fraction of the PigVar database (filter settings: QD 21.5, FS 60.0, MQ 40.0, MQRankSum 12.5, ReadPosRankSum 8.0, SOR 3.0) (Zhou et al. 2017). A truth sensitivity of 99.0 was chosen for SNPs and INDELs. The known dataset for SNP and INDEL VQSR was dbSNP build 150. Since SNPs were filtered with two truth datasets a Ti/Tv free recalibration according to the GATK best practice guidelines was applied to the data. Low coverage sequencing reads of F1 animals were processed according to the GATK best practice guidelines with the following deviations. SNP Calling was performed using GATK HaplotypeCaller v. 3.8 in joint mode with the settings minPruning 1 and minDanglingBranchLength 1 as well as BCFtools mpileup v 1.9 (Li et al. 2009), respectively. INDELs in the F1 variant call dataset were neglected due to low sequencing depth. An intersection variant call set between HaplotypeCaller, mpileup and the founder SNPs was created and stringently filtered with the following settings: QD 30.0, FS 60.0, MQ 40.0, QUAL 300.0.

Haplotype construction and imputation
To make use of the most recent phasing algorithms Beagle 5.0 was used for all phasing operations (Browning et al. 2018). Beagle 4.0 was applied for genotype imputation since it is the latest version that supports the n usage of pedigree information (Browning and Browning 2007). Haplotype phasing of the F0 generation variant call set was done using Beagle 5.0 and subsequent imputation with pedigree information of the F1 low coverage SNPs was achieved with Beagle 4.0. F0 and imputed F1 variants were merged with GATK CombineVariants and phased with Beagle 5.0. F2 generation 60k SNP chip data were imputed with Beagle 4.0 and pedigree information with merged and phased F0 and F1 WGS level variants as the reference panel. Local drops in imputation accuracy were determined by the construction of 24 F0 referencepanels with one animal left out. Genotype data acquired with the 60k SNP chip from each F0 individual was imputed with a reference-panel where the respective individual was missing utilizing Beagle 4.0. The 24 individual datasets were merged and together with the F0 reference dataset converted to additive coding with Plink 1.9 (Chang et al. 2015). Correlation (coefficient of determination, R 2 ) for each variant on QTL harboring chromosomes was calculated with an in house R script.
Genome wide association studies and cluster assignment Single-trait association analyses were performed with GCTA v. 1.92.4 beta 3 on the F2 population only (Yang et al. 2011). In order to perform a "leave one chromosome out" (LOCO) analysis, multiple genomic relationship matrices (GRMs) were created from the F2 60k SNP chip data by excluding each chromosome once with a minor allele frequency (MAF) cutoff of 1%. Mixed linear model association analyses (MLMAs) were performed with imputed F2 variants for each chromosome separately using the GRM where the respective chromosome was left out and a MAF cutoff of 1%. To account for the pooled population structure, covariates representing the different crosses (4 classes) were included in the MLMA. For further downstream analysis, significance threshold was established by applying Bonferroni correction (i.e., 0.05/number of independent tests). Manhattan plots were created with R, where variants with p-values . 0.001 were excluded due to software limitations. Clusters incorporating potential genomic regions of interest were defined using the Manhattan Harvester (MH) tool (Haller et al. 2019). MH provides quality assignment for each peak via a general quality score (GQS) which can be used as the main parameter for peak assessment. The GQS is generated based on a trained mixed-effects proportional odds model using 16 various parameters (e.g., maximal slope, height to width ratio) and human peak identification data. For this study, the variants with a p-value below 1.0x10 27 (option -inlimit) were included and further the clusters with a GQS . 3.5 (1 is min and 5 is max) were taken into account. Conditional association analyses were performed by including single highly associated variants as fixed effects in a LOCO analysis.

Variant effect prediction and gene enrichment analysis
To predict variant effects the Ensembl Variant Effect Predictor (VEP) release 94 was utilized, which is part of the Ensembl advanced programming interface (API) (McLaren et al. 2016). The vep command using the clusters' statistically significant variants was executed with the following settings: -merged -force_overwrite -variant_class -symbol -nearest gene. To provide further functional interpretation, the Database for Annotation, Visualization and Integrated Discovery (DAVID) (Huang da et al. 2009) was used for a systematic and integrative analysis. The gene list from the VEP output was the input for DAVID (Huang et al. 2007) and Sus scrofa genes were considered as the background. Gene Ontology (GO) terms (i.e., cellular component, molecular function, and biological process) from the functional annotation chart report which were significantly overrepresented with an EASE Score (i.e., a modified Fisher Exact P-Value) below 0.05 and with a gene count higher or equal to 5 were retained.
Statement on data and reagent availability Sequencing data, which were used to conduct this study will be made publicly available upon publication of the article in the NCBI Sequence Read Archive (SRA). Supplementary Tables have been uploaded Figure 1 Genotyping workflow. 24 Founder animals were sequenced with high coverage, variants were called with GATK 4.0 and phased with Beagle 5.0. 91 F1 animals were sequenced with low coverage and variants were called with GATK 3.8 and BCFtools mpileup. The F1 dataset was imputed using Beagle 4.0 and pedigree information with phased Founders as a reference-panel for haplotype structure. The imputed F1 was then merged with the F0 variant call data set and phased with Beagle 5.0. Finally the 2657 chip genotyped F2 individuals were imputed to WGS levels with Beagle 4.0 and pedigree information with the merged and phased Founder/F1-imputed dataset as the referencepanel.
to GSA. Supplementary Table 1 contains the coefficients of determination (R 2 ) for each variant on QTL harboring chromosomes where calculation was possible. Supplementary Table 2 containts the complete list of clusters identified in the GWAS with additional supporting information for cluster assignment. GRMs from F2 60k genotypes (File_S1.zip) were created by a "leave one chromsome out" approach using the program "Genome-wide Complex Trait Analysis (GCTA) version 1.91.4 beta3". GWAS was conducted with imputed sequence level F2 genotypes (Supplementary File_S2.zip) for each chromosome using the GRM where the respective chromosome was left out(GRM command: gcta64-bfile SG_F2_chip_wo_chrNO-autosome-maf 0.01-make-grm-out SG_F2_chip_wo_chrNO-thread-num 10-autosomenum 18 ; GWAS command: gcta64-mlma-covar Kiel_Hoh_cross.covarbfile F2_beagle4.0_ped_ChrNO-grm SG_F2_chip_wo_chrNO-pheno TRAIT.pheno-out TRAIT_chrNO-maf 0.01-thread-num 10; replace NO with the respective chromosome number and TRAIT with the respective trait to be analyzed) Phenotype files are located in Supple Table 2. The Ti/Tv of SNPs in the founder population was 2.39, whereas known SNPs had a Ti/Tv of 2.44 and novel SNPs had a Ti/Tv of 2.08. Due to the low sequencing coverage (average = 0.96 x, min = 0.58 x, max = 2.14 x) only autosomal SNPs were called in the F1 population. The raw output of Haplotypecaller consisted of 20,055,697 known and 3,529,441 novel SNPs and the raw output of mpileup contained 19,932,201 known and 3,291,758 novel SNPs. The intersection of the two datasets resulted in 19,264,662 known and 2,951,058 novel SNPs whereas removing all SNPs that were not present in the founder variant calling dataset lead to a final number of 19,224,132 known and 2,911,780 novel raw SNPs. After the application of a stringent filtering approach (see Material and Methods) 5,753,444 known and 741,155 novel SNPs remained in the variant calling dataset of the F1 population.

Identification of local drops in imputation accuracy
To detect local inaccuracies in the imputed data, we imputed chip data from each founder with the remaining 23 founders as a reference panel.
The data does not provide information about the imputation accuracy of the experiment since pedigree information could not be used. The coefficient of determination for each variant located on a chromosome harboring relevant QTL was determined where feasible. The average coefficients of determination for each chromosome analyzed are summarized in Table 3 (complete analysis results in Supplementary  Table 1).

GWAS results and clusters
From the genome-wide association study conducted in the pooled F2 population, the following number of variants exceeded the genome-wide significance threshold: 448, 17,105, 6635, and 27,641 for ADG, BFT, MFR, and CRCL, respectively. Manhattan plots of the GWAS for the four phenotypic traits are shown in Figure 2. A total of 120 clusters were designated by the MH tool (i.e., 4 for ADG, 33 for BFT, 22 for MFR and 61 for CRCL) and they were located on the following Sus Scrofa chromosomes (SSC): 1, 2, 4, 5, 7, 17, and 18. The complete cluster list with additional supporting information for cluster assignment can be found in Supplementary Table 2. From each of the defined clusters, the top 5 variants were retained. The genes incorporating or lying nearby these highly significant associations are presented in Table 4. The clusters associated with the traits overlapped on several chromosomes, specifically on SSC2, SSC4, and SSC7. The location and the extent of the overlapping clusters is depicted in Figure 3. Particular chromosomes had exclusive clusters assigned, e.g., SSC17 for CRCL and SSC18 for MFR. To evaluate all possible relations among the variants exceeding the significance threshold for each trait, a Venn diagram was used ( Figure 4). The highest number of common variants (i.e., 6,859) was between BFT and CRCL and the second highest was between BFT and MFR (i.e., 2,380). To get an estimate of systemic bias, quantile-quantile plots were generated for all p-values from each GWAS (Supplementary Figure 2). As a measure of association between observed and expected n p-values, lambda values were calculated for all four traits: l ADG = 1.282319, l BFT = 1.333425, l CRCL = 1.422044, and l MFR = 1.35587.

VEP and high impact variants
To predict functional consequences on genes the ensembl VEP tool was employed. Multiple transcripts per gene resulted in larger numbers of annotations that are reflected in the higher number of predicted effects as compared to the actual number of identified variants per trait. All inferred consequences for bonferroni corrected variants per trait and their percentage breakdown are summarized in Table 5. The large majority (over 70%) of the consequences were classified as intron variants. According to the severity of the variant consequence, intron variants are assigned to having a modifier impact, which means that predictions are difficult to be made or there is no solid evidence of impact. Variants inferred to have a disruptive impact on the protein, leading to protein truncation, loss of function or causing nonsensemediated decay were of further interest. These significant high impact variants (Table 6) were mostly located on SSC7, with the exception of SSC2:rs1110687780 (splice donor variant) affecting TCN1 for the trait MFR. For the BFT, the most severe consequences were located in the genes C6orf89, PI16, DST, and PRIM2, while for the CRCL disruptive impact variants were found in NEU1, four novel genes, ABCD4, DST, PRIM2, and LPCAT4. Notably, the same two splice donor variants affect the common genes for BFT and CRCL: DST and PRIM2. Sorting Intolerant From Tolerant (SIFT) scores were determined for all significant missense variants and are summarized in Supplementary Table 3 (Ng and Henikoff 2003).
Gene set analysis GO functional enrichment analysis revealed eleven significantly overrepresented GO terms including molecular functions (MF), biological processes (BP), and cellular components (CC). A list containing the GO terms and the associated list of genes is presented in Table 7. For BFT a GO-MF term was overrepresented and related to calcium ion binding (GO:0005509). Several olfactory receptor genes were prevalent for the GO terms assigned to MFR (e.g., GO-BP GO:0007186 G-protein coupled receptor signaling pathway, GO-MF GO:0005549 odorant binding). The gene set for the CRCL trait was associated with two BP terms (GO:0001666 response to hypoxia and GO:0008283 cell proliferation) n

Genotyping strategy
The genotyping strategy that we developed for this study is outlined in Figure 1. Briefly: 24 F0 pigs were subjected to high coverage Illumina short read sequencing and in addition 91 F1 animals were sequenced at low coverage and imputed to high coverage WGS levels in order to allow phasing. 2657 F2 animals were chip genotyped and imputed using a merged dataset of F0 and imputed F1 as reference-panel.
All imputation steps involved pedigree information. Opposed to a population-based strategy this approach does not rely on a large reference-panel but on the relatedness of individuals. In general, the genotyping strategy can be considered reliable since the majority of the QTL identified were already described for the four traits analyzed in this study (cross-reference with Pig QTL database (Hu et al. 2019)). Nevertheless, we expected to identify a variant that was associated with muscle mass and fat deposition in exon 2 of IGF2, which has been extensively described to influence muscle development (Nezer et al. 1999). The absence of IGF2 associated variants can be explained by a local drop in coefficients of determination from an average of R 2 = 0.22 to R 2 = 0.03 in the genomic region where IGF2 resides (SSC2 1469183 -1496417 bp, Figure 5). It must be pointed out that those coefficients of determination cannot be used to draw conclusions about the actual accuracy of the imputation. Since no pedigree information was included in the simulation, it can solely be used to identify local inaccuracies, which were most likely due to assembly errors in the reference genome. The genotyping approach presented in this study can be considered a reasonable strategy to radically increase the marker density of large F2 populations to WGS levels. By sequencing the founder individuals with high coverage and the F1 with low coverage, which are only a fracture of the number of F2 animals, the approach provides an affordable opportunity to improve the power and potential of otherwise obsolete datasets. Due to the relatedness of the animals deep sequencing of only a few animals is necessary, rendering it economically attractive.

Cluster identification and exploratory analysis
To fully exploit the potential of the four resource populations, the crosses were pooled and further used for conducting GWAS. The increased sample size together with the increased marker density ensures a high resolution that might allow the pinpointing of more specific causative genes and mutations. Further experiments, e.g., Sanger sequencing of promising regions could elaborate on that. Designing F2 populations implies that the LD-blocks are longer, a fact that is counteracted to some extent by jointly analyzing the four designs. Lambda values of 1.282319 to 1.422044 point to a moderate degree of p-value inflation in the GWAS, which is most likely caused by the usage of WGS data and a LOCO GWAS approach. However, to exploit the whole depth and power of the dataset we chose a LOCO analysis approach. To further comprehend the closely linked association signals from GWAS, the following approach was employed: i) clusters incorporating strong evidence for trait-associated chromosomal regions were defined, ii) the effect of the significant variants was predicted, and iii) a gene set n Table 4 Top associated genes for average daily gain (ADG), back fat thickness (BFT), meat to fat ratio (MFR), and carcass length (CRCL) identified in the GWAS. Genes incorporating or nearby the top 5 variants in the clusters are listed with chromosome and cluster numbers analysis was employed to identify sets of genes jointly associated with the traits of interest. The quantitative traits considered for this study have been investigated in the past and are mostly well represented in the Pig QTL database (Hu et al. 2019), except for MFR. The clusters assigned to each trait were compared with the QTL regions from the database. For MFR, additional fat-related traits (e.g., fat percentage in the carcass and fatcuts percentage) were considered in order to allow an adequate comparison given that the trait has few records in the database and the trait definition can be country dependent. Most of the clusters overlapped or were in the vicinity of the previously reported QTL. This was expected as the database has been recently updated and also includes our previous results  using SNP chip data and three out of the four pig populations which were taken into account here. Some of the earlier reported QTL in the database spread over large genomic regions (e.g., . 5Mb). It is assumed that many of these large QTL regions might in fact not be due to a single mutation, thus representing haplotype effects caused by several causative variants (Andersson 2009). In the current study, we were able to assign numerous clusters within these regions, which implies that a higher genomic resolution was achieved and that it may be possible to disentangle distinct quantitative trait nucleotides.
Conditional association analyses by including the top variant as a fixed effect in the MLMA were carried out in order to gather statistical evidence for putative causality (Cohen-Zinder et al. 2005) and was specifically applied to CRCL and BFT on SSC7. This chromosome exhibits the highest number of clusters (SM with Clusters) and the highest association signals. By including the top variant (rs81228492)  for BFT, only one well-supported peak was above the significance threshold (Supplementary Figure 1) meaning that there is additional genetic variation within this region. Similarly, for CRCL the two top variants (rs333021601 and rs319044994) representing the two different significant genomic regions were included alternatively in the model. After fixing the effect of the latter variant, the surrounding significant region disappeared, pointing to the possibility that there could be only one QTL responsible for CRCL on SSC7 around the 99 Mb region. An alternative or additional explanation could be the presence of long LD blocks, long-range LD and/or various epistatic interactions among the loci. The overlap among the BFT and CRCL significant variants (see Figure 3 and Figure 4) localized mostly in the genomic region 24-32 Mb indicate the existence of pleiotropic loci for the two traits. When conditioned on the top BFT variant (rs81228492) as a fixed effect for a MLMA on CRCL and the top CRCL variant (rs333021601) for MLMA on BFT, the initially associated clusters and those nearby dropped in the intensity of the association signals (Supplementary Figure 1), supporting the presence of pleiotropic loci. It is also noteworthy that CRCL might be influenced by the number of thoracolumbar vertebrae (Rohrer et al. 2015). Since the variant that has been associated with a higher number of vertebrae is a large Indel in intron 1 of the VRTN gene (Fan et al. 2013) we were n Table 5 Results of variant effect prediction for the production traits average daily gain (ADG), back fat thickness (BFT), meat to fat ratio (MFR), and carcass length (CRCL n Table 6 Statistically significant high impact variants that were discovered in the genome wide association studies for the production traits average daily gain (ADG), back fat thickness (BFT), meat to fat ratio (MFR), and carcass length (CRCL) not able to discover this variant since the genotyping pipeline applied in this study does only cover small INDELs. In order to gain insight into the possible genetic mechanisms that control the traits, an enrichment analysis of the gene function was performed with DAVID, prioritizing on the GO terms. The GO-MF calcium ion binding term found for BFT supports the relationship between the calcium ion, food intake and lipid metabolism previously described in the literature (Cui et al. 2017). Furthermore, one of the genes in this group is DST, a strong candidate gene for which high impact variants were found via VEP, which is discussed in detail below. A GO-BP term related to cell proliferation comprised the FAM83B gene, which is the gene incorporating the top variant found for CRCL. Interestingly, the majority of the genes included in the over-represented terms for MFR were olfactory receptors. This enrichment is a consequence of the MFR-identified clusters overlapping regions that are rich in various olfactory receptor genes. This particular gene family is known to have significant expansion throughout time within the Sus Scrofa genome (Nguyen et al. 2012).

Variant effect prediction
ADG: A QTL for ADG found on SSC7 comprises 115 statistically significant intron variants and 83 variants upstream (min. p-value 8.71 · 10 214 ) of the HMGCLL1 gene, which was shown by Comuzzie et al. to be associated with childhood obesity in the Hispanic population and to influence creatinine levels. Another QTL on SSC2 contains 112 intron variants in SHANK2 (min. p-value 1.33 · 10 212 ). SHANK2 was also shown to be associated with childhood obesity in the same study and to have an influence on estradiol blood concentrations (Comuzzie et al. 2012). A third QTL on SSC4 harbors 2 intron and 12 downstream variants (min p-value 1.06 · 10 213 ) affecting LYN, which encodes for the LYN proto-oncogene, which was also identified by Comuzzie et al. and correlated with the amount of fat mass in obese children (Comuzzie et al. 2012). Six additional variants in the QTL on SSC4 (min. p-value 2.44 · 10 212 ) lie in an intergenic region 13,463 -14,460 bp downstream of RPS20, a gene which in interplay with GNL1 is critical for cell growth (Krishnan et al. 2018). Another likely candidate SNP to influence ADG is an intron variant in the PLAG1 transcription factor (p-value 1.32 · 10 211 ), which is a regulator of IGF2 expression (Zatkova et al. 2004).
BFT: A QTL for BFT with a very prominent peak was detected on SSC7. The SNP with the lowest p-value (6.63 · 10 254 ) is an intron variant in gene C6orf106. C6orf106 is a target of the human miRNA has-miR-192, which has been identified to have regulatory functions in type 2 diabetes mellitus (Cui et al. 2016). The second top scoring SNP is an intron variant in the RIPOR2 gene (p-value 4.34 · 10 250 ). RIPOR2 expression and protein levels are upregulated during muscle cell differentiation in human fetal muscle cells (Yoon et al. 2007). Another gene containing top scoring variants on SSC7 is KIFC1 (7 intron variants, min p-value 3.12 · 10 247 ). Overexpression of KIFC1 promotes cell proliferation in non-small cell lung cancer (Liu et al. 2016). 21 intron and 8 downstream n Table 7 Most significant Gene Ontology (GO) terms from DAVID for the top associated genes that were identified in genome wide association studies for the the traits back fat thickness (BFT), meat to fat ratio (MFR), and carcass length (CRCL) variants in BMP5 (min. p-value 1.91 · 10 229 ), which induces cartilage and bone formation (Wozney et al. 1988), are also located in a cluster on SSC7. 6 variants downstream of the aforementioned RPS20 (min. p-value 1.90 · 10 215 ) were found in the cluster on SSC4.
MFR: GWAS for the MFR trait revealed a strong QTL on SSC2 with variant rs81327136 upstream of KRTAP5-5-like being the most significant (p-value 1.59 · 10 223 ). Of 72 variants 6 were located in KRTAP5-5-like introns and 66 in the vicinity of the gene. KRTAP5-5 was shown to control cytoskeletal function and cancer cell vascular invasion (Berens et al. 2017). Other variants found in clusters on SSC2 are located in or adjacenct to DEAF1 (8 intron variants, min p-value 3.47 · 10 229 ), which is a transcription factor that regulates proliferation of epithelial cells (Barker et al. 2008) and that forms a dominant-negative splice isoform in type 1 diabetes, which correlates with disease severity (Yip et al. 2015). Clusters on SSC2 also harbor variants associated with SHANK2 (1,714 intron variants, 3 59 UTR variants, min p-value 2.18 · 10 226 ) and CTTN (188 up-and downstream variants, min p-value 1.53 · 10 225 ). CTTN's protein product Cortactin binds to and is indirectly phosphorylated by obesity factor PTP1B (Stuible et al. 2008). A noteworthy intron variant is located in the vitamin D pathway gene DHCR7 (p-value 3.06 · 10 225 ), which has been associated with obesity traits in humans (Vimaleswaran et al. 2013). A total of 14 DHCR7 intron variants were above the significance threshold. A less prominent QTL on SSC4 harbors variants in or close to the aforementioned genes RPS20 (17 downstream variants, min p-value 1.51 · 10 225 ) and in SNTG1 (19 intron variants, min p-value 1.48 · 10 214 ), which has been associated with type 2 diabetes (Ban et al. 2010). A third, rather minor QTL on SSC18, contains 21 variants downstream of MDFIC (min p-value 1.97 · 10 215 ), a gene which has been linked to improved piglet birth weight (Zhang et al. 2014). 25 intron and 42 downstream variants were found for the PPP1R3A gene (min p-value 6.92 · 10 215 ), which in a whole exome sequencing study was found to be associated with type 2 diabetes in a Mayan population (Sánchez-Pozos et al. 2018).
CRCL: In the GWAS for CRCL 52 clusters were identified on SSC7.
Although not located in one of the clusters, the two lowest p-values (min p-value 5.40 · 10 249 ) were found in the intron and coding region (silent mutation) of FAM83B (or C6orf143) respectively. A total of 62 significant variants in FAM83B were discovered comprising of 60 intron variants, 1 silent mutation, and 1 missense mutation. Cipriano et al. demonstrated that overexpression or mutation of FAM83B leads to EGFR hyperactivation by direct interaction and consequent hyperactivation of the EGFR downstream effector phospholipase D1, which was previously associated with BMI in humans (Davenport et al. 2015). An intron variant in the RIPOR2 gene with a p-value of 5.08 · 10 247 is the same SNP, which was found in the GWAS for BFT. A total of 85 mostly intronic RIPOR2 variants were found for the CRCL trait. A second, less prominent QTL on SSC7 harbors 9 intron, 12 downstream and 317 upstream variants (min p-value 3.62 · 10 231 ), which have been assigned to the RSL24D1 gene. RSL24D1 has been identified as a potential target in familial hypercholesterolemia (Li et al. 2015). One of the clusters identified for CRCL on SSC17 contains 230 variants 122,416-126,520 bp downstream of BMP2 (min p-value 7.21 · 10 238 ), a bone formation inducing factor (Wang et al. 2013). In addition, 18 intron and 114 variants upstream of TMX4 were discovered. TMX4 was associated with feed conversion ratios in chickens (Shah et al. 2016).
High impact variants: Various high impact variants were discovered by variant effect prediction. A splice donor variant (rs80834233) in DST, the gene encoding Dystonin, is associated with BFT (p-value 1.98 · 10 219 ) and CRCL (p-value 1.25 · 10 219 ). Knockout of DST leads to intrinsic muscle weakness and instability of skeletal muscle cytoarchitecture in mice (Dalpé et al. 1999). Variant rs793752812 leads to a probable start codon loss in NEU1 and is associated with CRCL (p-value 1.49 · 10 212 ). A deficiency of the NEU1 gene product Neuraminidase 1 leads to vertebral deformities in humans (Sphranger et al. 1977), which is reasonable considering CRCL is largely determined by the number of vertebrae. Furthermore one frameshift variant in AURKA (rs693811701, p-value 2.95 · 10 212 ) and one splice donor variant in NUTM1 (rs331245426, p-value 1.38 · 10 29 ), both oncogenes (Umene et al. 2015) (Schaefer et al. 2018), are associated with CRCL. The splice donor variant rs1110687780, which affects the gene coding for placenta-specific protein 1-like, was detected in the GWAS for MFR. In humans, PLAC1 has been found to be highly expressed in various types of tumors (Koslowski et al. 2007). Application of results in breeding programs and follow up studies Functional validation studies based on appointed candidate genes and genetic variants will be considered in follow-up studies. Besides understanding the underlying molecular mechanisms of ADG, BFT, MFR and CRCL, the results of GWAS can render a substantial increase in the reliability of genomic predictions in breeding programs. This concept was demonstrated in several studies in cattle (Brøndum et al. 2015;Porto-Neto et al. 2015;van den Berg et al. 2016) and in Drosophila melanogaster  by including pre-selected variants from GWAS results in the prediction models. Even though implementing genomic selection is becoming a common practice, the usage of marker-assisted selection or genomic screening is not obsolete pointing out that the identification of relevant genetic markers via GWAS and post-GWAS analyses is still of practical importance in pig breeding.

Conclusion
Putting the results of previous simulation studies to test, we conducted GWAS in four pooled F2 designs, which have been imputed to sequence level based on high coverage founder and low coverage F1 sequencing. We found that by pooling the designs the sequence level marker density can be exploited efficiently. QTL for four well-characterized traits were identified in agreement with previous mapping studies and candidate genes and pathways were unraveled, that should be subject to further studies. Thus, the approach applied herein is a feasible strategy to efficiently utilize extremely well phenotyped experimental designs that have been established in the past.