The genetic architecture of dog ownership: large-scale genome-wide association study in 97,552 European-ancestry individuals

Abstract Dog ownership has been associated with several complex traits, and there is evidence of genetic influence. We performed a genome-wide association study of dog ownership through a meta-analysis of 31,566 Swedish twins in 5 discovery cohorts and an additional 65,986 European-ancestry individuals in 3 replication cohorts from Sweden, Norway, and the United Kingdom. Association tests with >7.4 million single-nucleotide polymorphisms were meta-analyzed using a fixed effect model after controlling for population structure and relatedness. We identified 2 suggestive loci using discovery cohorts, which did not reach genome-wide significance after meta-analysis with replication cohorts. Single-nucleotide polymorphism-based heritability of dog ownership using linkage disequilibrium score regression was estimated at 0.123 (CI 0.038–0.207) using the discovery cohorts and 0.018 (CI −0.002 to 0.039) when adding in replication cohorts. Negative genetic correlation with complex traits including type 2 diabetes, depression, neuroticism, and asthma was only found using discovery summary data. Furthermore, we did not identify any genes/gene-sets reaching even a suggestive level of significance. This genome-wide association study does not, by itself, provide clear evidence on common genetic variants that influence dog ownership among European-ancestry individuals.


Introduction
Historically, humans domesticated wolves into dogs, and a cooperative relationship has been documented in which dogs have mainly been used to help humans herd other animals, hunt, and protect their homes (Lord et al. 2016). Today, dogs also provide companionship and social interactions with humans, not only as domestic pets but also in interventions for the rehabilitation of prisoners (Dell and Poole 2015) and in healthcare settings (Jones et al. 2019). Dog companionship is associated with important physical and psychosocial health benefits for humans across their life span. For example, dog ownership has been associated with decreased all-cause and cardiovascular mortality in a Swedish population-based epidemiological study (Mubanga et al. 2017), although other large studies did not find such a relationship (Ding et al. 2018). Dog owners, particularly those living alone and the elderly, have been found to be more physically active, less lonely, and to have an improved perception of well-being (Müllersdorf et al. 2010; Gee et al. 2017). Children growing up with dogs have a lower risk of asthma (Fall et al. 2015, 2018) but not of type 1 diabetes mellitus (Wernroth et al. 2017).
However, it is unclear whether the observed epidemiological associations between dog ownership and health outcomes are due to the dog adoption itself or to underlying differences in personality, health, genetics, and lifestyle between dog owners and non-dog owners. One possible explanation, which has not been addressed in previous literature, is a pleiotropic effect or genetic correlation influencing both the individual's choice of owning a dog and other traits. Evidence suggests that dog ownership has a strong genetic component. For example, we have previously provided some evidence of latent genetic components explaining 51-57% of population variation of dog ownership in a classical twin study (Fall et al. 2019). Further, twin and family studies have also reported equivalent heritability for traits such as personality (Vukasović and Bratko 2015), body mass index (Elks et al. 2012), and other human disease traits (Polderman et al. 2015). However, no previous studies have tried to identify genetic variation associated with dog ownership. Identifying the genetic factors that influence dog ownership will improve our understanding of (1) this complex human behavior trait; (2) shared genetic architecture; and (3) the potential causal relationship between dog ownership and other complex traits. Furthermore, a mechanistic understanding of dog ownership at the molecular level may also allow its beneficial effects on health outcomes to be attained through non-pharmacological intervention.

Investigation. https://doi.org/10.1093/g3journal/jkae116. Advance Access Publication Date: 31 May 2024.
Therefore, our aim was to explore the molecular genetic architecture of dog ownership. Specifically, we aimed to identify loci associated with dog ownership in a genome-wide association study (GWAS) in a large sample, to estimate single-nucleotide polymorphism (SNP)-based heritability and genetic correlation, and to provide gene-based functional analysis.

Discovery cohorts
We performed separate GWAS and a meta-analysis on dog ownership in 5 independent discovery cohorts of twins from the Swedish Twin Registry (STR) (Pedersen et al. 2002; Lichtenstein et al. 2006; Magnusson et al. 2013; Zagai et al. 2019). Two sub-cohorts from the Screening Across the Lifespan Twin (SALT) study, i.e. SALTY and TwinGene, included 6,053 and 10,911 participants born in 1958 or earlier who previously participated in a telephone interview between 1998 and 2002 and donated saliva samples or blood samples. The Study of Twin Adults Genes and Environment (STAGE) cohort included 8,568 twins born 1959-1985 who participated in a web-based questionnaire and donated saliva samples. The Young Adult Twin Study in Sweden (YATSS) and the Child and Adolescent Twin Study in Sweden (CATSS) included 3,271 and 6,705 twins born 1986-1992 and 1992-1997 who participated in web-based questionnaires (YATSS) or parental telephone interviews (CATSS) and provided saliva samples.
Information on dog ownership in individuals older than 18 years was retrieved via individual data linkage to the 2 dog ownership registers held by the Swedish Board of Agriculture and the Swedish Kennel Club, with available information from 1 January 2001 to 31 December 2016. As it is mandatory that every dog in Sweden is registered by its owner in the dog registers, the coverage of registered dogs is excellent (83%) (Statistics Sweden 2013). Dog ownership was defined as registration in either dog register during the period, enabling at least 1 year of follow-up for the youngest twins born in 1997. We also retrieved additional information on birth year, sex, and zygosity from the initial questionnaire sent to participants.

Replication cohorts
Three cohorts were used as replication samples to assess SNP associations and the loci found in the meta-analysis of discovery cohorts: (1) the English Longitudinal Study of Aging (ELSA), a cohort of individuals aged ≥50 years living in England in 2002 (Steptoe et al. 2013) who also attended the wave 5 follow-up survey (2010-2011); (2) wave 1 and wave 2 of genotyped individuals from the Swedish prospective birth cohort BAMSE (Swedish abbreviation for the Children, Allergy, Milieu, Stockholm, Epidemiology) who were born during 1994-1996 in Stockholm and followed at 1, 2, 4, 8, 12, 16, and 24 years of age (Wickman et al. 2002); and (3) the Trøndelag Health Study (HUNT), including people aged ≥20 years living in the Nord-Trøndelag region during 1984-1986, 1995-1997, 2006-2008, and 2017-2019 (Brumpton et al. 2022). A detailed description of each replication cohort, covering recruitment, the definition of dog ownership, SNP genotyping, imputation, and statistical analysis, is provided in the Supplementary Methods and Supplementary Tables 1 and 2. Additionally, 23andMe, Inc. research participants (507,249 dog owners and 452,782 non-dog owners) were used to replicate the 2 suggestive loci.
In short, the dog ownership information in ELSA was based on the responses to the wave 5 data collection (2010-2011) regarding the question "Do you keep any household pets inside your house/flat?" followed by items about specific pets (dog, cat, bird, other furry pet, or "other" type of pet). In BAMSE, dog ownership was defined from participants' responses to the questions "Are there/Have there been any pets at home? (Yes/No)" and "Which pet, including dogs, cats, or other (specify)" at 24 years of age. In HUNT, we retrieved information on dog ownership based on participants' responses to the question "Is there a dog in your home? (Yes/No)." In 23andMe, dog ownership was defined by a yes/no reply to "Do you own a dog?".

Genotyping, imputation, and association analyses
DNA from the saliva or blood of the study participants from the STR was extracted using the automated CheMagic (STAR Instrument, Hamilton Robotics) or Puregene (Gentra Systems, Minneapolis, MN, USA) systems at the KI Biobank. SNP-based genome-wide genotyping was performed at the SNP&SEQ Technology Platform in Uppsala, Sweden, using the Illumina Infinium PsychArray-24 BeadChip (for CATSS, TwinGene, and SALTY) and the Illumina Global Screening Array Multi-Disease (GSA-MD) BeadChip (for YATSS and STAGE). Zygosity information encoded as a case/control phenotype was used for quality control. The genotyped samples were processed using the Ricopili pipeline (Lam et al. 2020) for quality control (QC). SNPs with missingness > 0.05 were removed. Samples failed QC due to any of the following: per-sample call rate < 0.98; excessive heterozygosity (FHET outside ±0.2); and sex mismatch. Markers failed SNP QC due to any of the following: per-SNP call rate < 0.98; invariant; Hardy-Weinberg disequilibrium (P < 1 × 10⁻⁶ in controls and P < 1 × 10⁻¹⁰ in cases); and difference in call rate between cases and controls > 0.02. Post-QC genotypes were imputed to the Haplotype Reference Consortium panel release 1.1 (HRC 1.1) (McCarthy et al. 2016). After QC and imputation, the genotypes of monozygotic co-twins were imputed from their genotyped co-twin and added to the main dataset. Thus, the 5 STR sub-cohorts consisted of 31,566 individuals and ∼47 M available markers, of which over 7 M were common variants with high imputation quality [INFO score > 0.6, imputation dosage R² > 0.8, and minor allele frequency (MAF) ≥ 1%; see Supplementary Table 1 for the steps of post-QC SNP filtering]. Similar QC and imputation processes for the genotype data from ELSA, BAMSE, and HUNT are described in detail in the Supplementary Methods and Supplementary Tables 1 and 2. The genotyping, imputation, and GWAS of 23andMe samples have been described elsewhere.
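The marker-level thresholds quoted above can be expressed as two simple predicates. The following is only an illustrative sketch: the class and function names are hypothetical, it is not the Ricopili pipeline's interface, and it assumes the per-variant summary values have already been computed by other tools.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Per-variant summary values (hypothetical field names)."""
    call_rate: float        # fraction of samples with a genotype call
    maf: float              # minor allele frequency
    hwe_p_controls: float   # Hardy-Weinberg P-value in controls
    hwe_p_cases: float      # Hardy-Weinberg P-value in cases
    call_rate_diff: float   # absolute case/control difference in call rate


def passes_genotype_qc(v: VariantStats) -> bool:
    """Apply the pre-imputation marker thresholds quoted in the text."""
    if v.call_rate < 0.98:
        return False
    if v.maf == 0.0:  # invariant marker
        return False
    if v.hwe_p_controls < 1e-6 or v.hwe_p_cases < 1e-10:
        return False
    if v.call_rate_diff > 0.02:
        return False
    return True


def passes_imputation_filter(info: float, dosage_r2: float, maf: float) -> bool:
    """Post-imputation filter: INFO > 0.6, dosage R^2 > 0.8, MAF >= 1%."""
    return info > 0.6 and dosage_r2 > 0.8 and maf >= 0.01
```

A variant with a call rate of 0.97 would fail the first predicate, and an imputed variant with MAF 0.5% would fail the second regardless of its imputation quality.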
We carried out sub-cohort-specific GWAS of dog ownership for STR and HUNT participants after excluding individuals with non-European ancestry, using the 2-step SAIGE method developed by Zhou et al. (2018). Step 1 fits a null logistic mixed model for dog ownership, with adjustment for calendar year of birth (standardized for each cohort), sex, and population stratification using the top 3 principal components (PCs) of genotypic variance, and generates a genetic relatedness matrix (GRM). The GRM was estimated from directly genotyped markers after applying linkage disequilibrium (LD) pruning to exclude markers with r² > 0.1 in a sliding window covering 10,000 markers with a 1,000-marker increment. Step 2 fits a mixed model with the respective SNP dosage data, the GRM, and the other covariates included in step 1.
The ELSA GWAS of dog ownership was run using SNPTEST with adjustment for year of birth, sex, and population stratification (top 4 PCs) in the logistic model. The BAMSE GWAS of dog ownership was run using EPACTS with adjustment for sex, standardized calendar year of birth, and the top 3 PCs for population stratification.
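For orientation, the 2-step mixed-model analysis can be summarized with the generic form of a logistic mixed model. The notation below is an assumption introduced here for illustration and does not appear in the original text:

```latex
\operatorname{logit}(\mu_i) = X_i \alpha + G_i \beta + b_i,
\qquad b \sim N(0, \tau \Psi)
```

Here $\mu_i$ is the probability that individual $i$ is a dog owner, $X_i$ holds the covariates (birth year, sex, and PCs), $G_i$ is the SNP dosage, $\Psi$ is the GRM, and $\tau$ is a variance component. Step 1 fits the model under the null $\beta = 0$, and step 2 tests $\beta = 0$ for each SNP in turn.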
Meta-analyses of all 5 discovery cohorts and 3 replication cohorts were performed as inverse variance weighted fixed effect models using METAL (Willer et al. 2010). Multi-allelic variants were excluded from all cohorts before running the meta-analyses. Q-Q plots and genomic inflation factors were generated to assess the potential inflation of test statistics from residual population stratification for each cohort. The association estimates between SNPs and dog ownership were reported as the primary analysis for all individual cohorts and for the meta-analyses, with Manhattan plots. Variants with P < 5 × 10⁻⁸ were considered genome-wide significant and those with P < 5 × 10⁻⁷ suggestive in our meta-analyses. We also reported the heterogeneity of genetic effect estimates for selected variants (with P < 5 × 10⁻⁶) across cohorts using the I² index (i.e. HetISq). The SNPs with the lowest association P-values from the meta-analyses were looked up in PhenoScanner to identify associated genes and functions and any previously reported associations with other diseases and traits using the GWAS Catalog (Staley et al. 2016; Buniello et al. 2019).
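To make the pooling step concrete, the sketch below computes an inverse variance weighted fixed-effect estimate, Cochran's Q, and the I² index for a single variant, and then labels the result with the thresholds stated above. It is an illustrative re-implementation of the standard formulas with hypothetical function names, not METAL's code or interface.

```python
import math
from statistics import NormalDist


def ivw_fixed_effect(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis of one variant.

    betas, ses: per-cohort effect estimates (log odds ratios) and their
    standard errors. Returns pooled beta, pooled SE, two-sided P-value,
    and the I^2 heterogeneity index in percent.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))
    # Cochran's Q measures dispersion of cohort estimates around the pooled one.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return beta, se, p, i2


def classify(p):
    """Label a P-value using the thresholds stated in the text."""
    if p < 5e-8:
        return "genome-wide significant"
    if p < 5e-7:
        return "suggestive"
    return "not significant"
```

With two cohorts reporting identical estimates, the pooled effect equals the shared estimate, the standard error shrinks by a factor of √2, and I² is 0.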
The 23andMe team performed the association analysis using logistic regression assuming an additive model for allelic effects, adjusting for age, sex, the top 5 PCs, and genotyping platform, and shared the results for the 2 SNPs that we requested for replication based on the meta-analysis of the discovery cohorts.

Heritability estimation using GWAS summary statistics
We estimated SNP-based heritability for dog ownership and genetic correlation with 30 complex human traits, including lifestyle-related traits, physical measures, and common somatic and psychiatric conditions, using LD score regression, i.e. regressing GWAS summary statistics on LD scores (Bulik-Sullivan et al. 2015). The summary statistics of the complex trait GWAS used in the LD score regression are listed in the Supplementary Materials (see Supplementary Table 10). We report the false discovery rate (FDR)-corrected estimates based on the summary statistics from both GWAS meta-analyses (including and excluding replication cohorts) and the individual cohorts. Furthermore, the 23andMe team also reported SNP-based heritability based on LD score regression analyses.
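The idea behind LD score regression is that, for a polygenic trait, the expected χ² statistic of a SNP increases linearly with its LD score, with slope N·h²/M (N the sample size, M the number of SNPs) and an intercept near 1 in the absence of confounding. The sketch below recovers h² from that slope with a plain unweighted least-squares fit. It is a deliberately simplified illustration with a hypothetical function name: the published method uses weighted regression and a block jackknife for standard errors, neither of which is shown here.

```python
def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Estimate SNP heritability from an unweighted fit of
    chi2 = intercept + slope * ld_score, where slope = N * h2 / M.

    Returns (h2, intercept) on the observed scale.
    """
    m = len(chi2)
    mean_l = sum(ld_scores) / m
    mean_c = sum(chi2) / m
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    h2 = slope * n_snps / n_samples
    return h2, intercept


# Noise-free toy data generated from the model itself (hypothetical values).
N, M, TRUE_H2 = 30_000, 1_000_000, 0.12
toy_ld = [10.0, 50.0, 100.0, 200.0, 400.0]
toy_chi2 = [1.0 + N * TRUE_H2 * l / M for l in toy_ld]
h2_est, intercept_est = ldsc_h2(toy_chi2, toy_ld, N, M)
```

On these noise-free toy values the fit returns the heritability and intercept used to generate them; real summary statistics are noisy, which is why the confidence intervals reported for dog ownership are wide.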

Functional annotation and gene-based analyses
Positional mapping, gene/gene-set-based analyses using generalized gene-set analysis of GWAS data (MAGMA v1.06), and tissue expression analysis (using Genotype-Tissue Expression v8 datasets) were performed using the functional mapping and annotation (FUMA) SNP2GENE pipeline/platform (Watanabe et al. 2017).

Ethical statement
The study was approved by the Regional Ethical Review Authority in Stockholm, Sweden, and informed consent was obtained from all study participants in the STR and the replication cohorts. All data were pseudonymized prior to analyses.

Characteristics of discovery and replication cohorts
The basic characteristics of the 5 discovery and 3 replication cohorts are reported in Table 1. The discovery cohorts consisted of 35,358 participants: 3,207 (9.1%) were identified as dog owners in the registers during 2001-2016, and 32,157 were non-dog owners. Compared with the latter, the dog owners were more frequently female, married or cohabiting with a partner, and educated to compulsory or high school level.

Identifying novel loci in discovery and replication cohorts
After QC, exclusion of individuals with non-European ancestry, and GWAS analyses, association results for >7.4 million SNPs of 31,566 out of 35,358 twins (89%) were meta-analyzed using a fixed effect model. The ratio of the observed to the expected median association χ² statistic (λ) was 1.02. Twelve SNPs located in chromosomes 5 and 17 (5q34 and 17q21.33) were suggestive with P < 5 × 10⁻⁷, but did not reach the genome-wide level of significance (see Supplementary Table 3 and Fig. 1). The association results for suggestive SNPs in the 5 individual twin cohorts showed similar effect sizes (see Supplementary Table 4 and Supplementary Fig. 1). Minimal population stratification was observed based on Q-Q plots and λ GC (see Supplementary Fig. 2).
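The inflation factor λ described above can be computed directly from a list of association P-values. The sketch below is a minimal illustration using only the Python standard library; the function name is hypothetical, and it assumes two-sided P-values from 1-degree-of-freedom tests.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Ratio of the observed median 1-df chi-square statistic to its
    expected median under the null hypothesis."""
    nd = NormalDist()
    # A two-sided P-value p corresponds to chi2 = z^2 with z = Phi^-1(p / 2).
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2  # about 0.455 for 1 df
    return median(chi2) / expected_median


# Evenly spaced P-values mimic a null distribution, so lambda is close to 1.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_like)
```

Values close to 1, such as the 1.02 reported here, indicate little inflation of the test statistics.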
After including the eligible participants of the 3 additional replication cohorts (n = 65,986), the meta-analyzed results based on 9,257,239 SNPs (of which 79.8% were present in at least 7 out of 8 cohorts) are displayed in Supplementary Table 5. We did not observe any SNPs that reached genome-wide significance; however, we found 8 suggestive SNPs at 3 genetic risk loci (i.e. 1p31.1, 9q34.11, 16p13.3) with P < 5 × 10⁻⁷ (Supplementary Tables 6 and 7). The nearest genes looked up in the GWAS Catalog were LRRC7, NCS1, AL360004, and HAGH (Supplementary Table 6). The associations of the suggestive SNPs with dog ownership in the discovery cohorts were not significant in the replication cohorts (Supplementary Table 4 and Supplementary Fig. 3). Nor were the loci on chromosome 5 associated with dog ownership in the 23andMe research participants (Supplementary Table 8).

Functional annotation and gene-based analyses
Lastly, we examined whether aggregating suggestive SNP associations from the meta-analyzed results of all cohorts to the nearest genes (i.e. LRRC7, NCS1, AL360004.1, and HAGH) and linking them to tissue expressions might provide biological insights for dog ownership. We did not identify any target genes/gene-set elements that were significantly correlated with dog ownership after FDR correction (Supplementary Tables 12 and 13 and Supplementary Fig. 5). Furthermore, the tissue expression analysis for sets of the mapped genes did not reveal any significantly enriched tissue types (Supplementary Fig. 6).

Discussion
In this study, we performed the largest GWAS (n = 97,552) to date on the dog ownership phenotype in European-ancestry individuals from 5 large discovery cohorts in Sweden, meta-analyzed with external European-ancestry cohorts from the United Kingdom and Norway. We found no common variants associated with dog ownership that reached the genome-wide significance threshold. Six suggestive loci were discovered in relation to dog ownership, and a weak SNP-based heritability estimate was observed. Moderate genetic correlations between dog ownership and complex traits including educational attainment and ADHD were identified, which indicates some evidence of shared genetic signatures, albeit with wider confidence intervals upon replication. Furthermore, the null result of the gene-based analyses has broadened the scope of our understanding of the genetic architecture of dog ownership to some extent.
Our study presents the first GWAS analysis focused on the trait of dog ownership. The primary finding is that no genetic locus reached genome-wide significance for dog ownership in the meta-analysis, including discovery and replication sets. This null finding may be due to phenotypic heterogeneity or misclassification, i.e. different dog ownership definitions in the discovery and replication cohorts. For example, the replication cohorts' measurement of dog ownership is based on self-reported answers to "Are there dogs at home," which does not necessarily indicate that the participant is the actual dog owner. The twin data are based on 2 dog ownership registers, which means that an individual within the household has to be motivated to register as the owner and be responsible for the dog. We could not access data on spouses/cohabiting partners to define the phenotype based on household dog ownership, which is likely to explain the lower observed prevalence of dog ownership in the STR cohorts compared with the general population. In this case, the observed results when leveraging large samples from different geographic areas (United Kingdom and Norway) could be diluted or biased toward the null (Burstein et al. 2023). Furthermore, misclassification and changes in dog ownership status over time might have skewed the phenotype distribution, biasing GWAS results toward the null. The dog ownership ascertainment among the younger and older twins was likely to be the main cause of these biases, as the parents of younger twins could have been the "registered" owners and thus not captured in CATSS or YATSS, in contrast to self-reported dog ownership in BAMSE (prevalence 3-9% vs 15%). The longitudinal data from the dog ownership registers, with excellent population coverage from 2001 onward, may not account for some of the previous owners in this study (8% in TwinGene vs 12% in SALTY).
Furthermore, as a primary parameter of the genetic architecture, the SNP-based heritability we identified was low in the combined discovery and replication samples (∼1.8%) and in 23andMe research participants (∼3.1%). This is comparable to the heritability estimates for some complex diseases such as Alzheimer's disease (∼3.1%) (Andrews et al. 2023). The "missing heritability" presented here is in line with findings from other complex diseases, highlighting the multifactorial etiology (Levinson et al. 2014). Even the SNP-based heritability estimate of ∼12% for the discovery cohorts, after adjusting for the liability scale, is still much lower than the broad-sense heritability estimates of 51-57% for dog ownership in the same samples (Fall et al. 2019). There are several potential explanations for the "missing heritability." First, there may be gene-environment interactions, where the effect of some genetic loci on the population variation of dog ownership depends on certain environmental exposures (e.g. living area characteristics that could facilitate dog ownership) (Young 2019). Second, common variants with small effects may remain hidden in the noise at the current sample size or because of the phenotype misclassification mentioned above, as we observed a dilution of both SNP-based heritability and genetic correlations when using the summary data based on the discovery and replication cohorts. Third, imputed data allowed us to evaluate associated common variants that are not directly genotyped. However, rare variants may contribute to the heritability estimated from the classical twin model, but their effects are much less likely to be captured in the current GWAS, which relied on the HRC reference panel. Fourth, the twin heritability may be overestimated because of gene-gene and gene-environment interactions or violation of the equal environment assumption (e.g. presence of assortative mating) (Border et al. 2022), and the common genetic components determining dog ownership may be minimal.
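For readers unfamiliar with the liability-scale adjustment mentioned above, the sketch below applies the commonly used transformation for a binary trait: the observed-scale estimate is multiplied by K²(1−K)² / (z²·P(1−P)), where K is the population prevalence, P is the proportion of cases in the sample, and z is the height of the standard normal density at the liability threshold. The function name is hypothetical, and the prevalence values in the example are illustrative assumptions, since the text does not state which values were used.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs, pop_prevalence, sample_prevalence):
    """Convert an observed-scale heritability for a binary trait to the
    liability scale under a threshold model."""
    nd = NormalDist()
    k, p = pop_prevalence, sample_prevalence
    threshold = nd.inv_cdf(1.0 - k)  # liability threshold for prevalence k
    z = nd.pdf(threshold)            # normal density at the threshold
    return h2_obs * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


# Hypothetical inputs: 5% observed-scale heritability, 9% prevalence in
# both the population and the sample.
example = observed_to_liability(0.05, 0.09, 0.09)
```

Because the conversion factor exceeds 1 for any prevalence, liability-scale estimates are always larger than their observed-scale counterparts.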
The possible genetic correlation underlying dog ownership and other complex traits, including educational attainment, ADHD, type 2 diabetes, smoking initiation, neuroticism, major depressive disorder, intelligence, gastroesophageal reflux disease, body mass index, and asthma, has been observed in epidemiological studies (Saunders et al. 2017; Zijlema et al. 2019). However, the phenotypic correlations are inconsistent, with opposite directions reported for some associations. Should these findings not be attributable to false positives, the genetic overlap may suggest potential pleiotropic effects of dog ownership with these somatic and mental health outcomes. Whether these findings could be translated into a causal relationship deserves further investigation using powerful SNPs as instrumental variables.
The major strength of our study is the good coverage of dog ownership data in the Swedish general population and the capacity to link these data with the Swedish Twin Registry. However, there are several important limitations that are worth mentioning. First, we could not rule out misclassification with regard to the exposure time of dog ownership, which was defined as ever versus never, assuming this is a heritable trait. Furthermore, dog ownership information could be misclassified if the dog is registered under the partner's name, for which we lack information. Ideally, we would like to consider a time-to-event GWAS, accounting for censoring and for changes in dog ownership over time and at different ages. Second, our findings, based on genotyped data from European populations, may not be generalizable to populations with other ancestries. Third, adjustments for population stratification based on common variants may be insufficient. Further investigations using family-based association analyses, PC adjustment of rare variants, or identity by descent in even larger GWAS will be beneficial.
In conclusion, our meta-analysis of GWAS does not, by itself, provide clear evidence on common variants that influence dog ownership among European-ancestry individuals. The "missing heritability" should be further investigated in larger samples.

* Age by December 2016 for the five discovery cohorts and age at study participation for replication cohorts.

Fig. 1. Manhattan plots displaying GWAS results for dog ownership from the meta-analysis of discovery cohorts with or without replication cohorts. The upper panel represents meta-analyzed GWAS results of discovery cohorts, and the lower panel represents results from discovery and replication cohorts. The red and blue lines indicate the P-value thresholds of 5 × 10⁻⁸ and 5 × 10⁻⁷, respectively.

Table 1. Basic characteristics of discovery and replication cohorts.