Japonica Array NEO with increased genome-wide coverage and abundant disease risk SNPs

Abstract Ethnic-specific SNP arrays are becoming more important to increase the power of genome-wide association studies in diverse population. In the Tohoku Medical Megabank Project, we have been developing a series of Japonica Arrays (JPA) for genotyping participants based on reference panels constructed from whole-genome sequence data of the Japanese population. Here, we designed a novel version of the SNP array for the Japanese population, called Japonica Array NEO (JPA NEO), comprising a total of 666,883 markers. Among them, 654,246 tag SNPs of autosomes and X chromosome were selected from an expanded reference panel of 3,552 Japanese, 3.5KJPNv2, using pairwise r2 of linkage disequilibrium measures. Additionally, 28,298 markers were included for the evaluation of previously identified disease risk markers from the literature and databases, and those present in the Japanese population were extracted using the reference panel. Through genotyping 286 Japanese samples, we found that the imputation quality r2 and INFO score in the minor allele frequency bin >2.5–5% were >0.9 and >0.8, respectively, and >12 million markers were imputed with an INFO score >0.8. From these results, JPA NEO is a promising tool for genotyping the Japanese population with genome-wide coverage, contributing to the development of genetic risk scores.

population. Here, we designed a novel version of the SNP array for the Japanese population, called Japonica Array NEO (JPA NEO), comprising a total of 666,883 markers. Among them, 654,246 tag SNPs of autosomes and X chromosome were selected from an expanded reference panel of 3,552 Japanese, 3.5KJPNv2, using pairwise r 2 of linkage disequilibrium measures. Additionally, 28,298 markers were included for the evaluation of previously identified disease risk markers from the literature and databases, and those present in the Japanese population were extracted using the reference panel. Through genotyping 286 Japanese samples, we found that the imputation quality r 2 and INFO score in the minor allele frequency bin >2.5-5% were >0.9 and >0.8, respectively, and >12 million markers were imputed with an INFO score >0.8. From these results, JPA NEO is a promising tool for genotyping the Japanese population with genome-wide coverage, contributing to the development of genetic risk scores.
Abbreviations: ASA, Asian Screening Array; DQC, dish quality control; GWAS, genome-wide association study; HLA, human leukocyte antigen; JPA, Japonica Array; KIR, killer cell immunoglobulinlike receptor; LD, linkage disequilibrium; LIMS, laboratory information management system; MAF, minor allele frequency; PRS, polygenic risk score; QC, quality control; SNP, single-nucleotide polymorphism; TMM, Tohoku Medical Megabank; ToMMo, Tohoku Medical Megabank Organization; WGS, whole-genome sequencing The Tohoku Medical Megabank (TMM) Project was launched as part of reconstruction efforts following the Great East Japan Earthquake on March 11, 2011, and aims to establish a next-generation medical system for precision medicine and personalized healthcare (1). To accomplish the purpose, we have been conducting prospective genome cohort studies in connection with the establishment of an integrated biobank. Between 2013 and 2017, the Tohoku Medical Megabank Organization (ToMMo) and the Iwate Tohoku Medical Megabank Organization recruited 157,602 participants and conducted a baseline assessment, including the collection of biospecimens in Miyagi and Iwate Prefectures. The study population comprised two cohorts: the TMM Community-Based Cohort Study (TMM CommCohort Study) cohort, consisting of 84,073 adults (2), and the TMM Birth and Three-Generation Cohort Study (TMM BirThree Cohort Study) cohort, consisting of 73,529 pregnant women and their family members (3).
We have performed genome/omics analyses within the TMM project and established an integrated biobank that includes biospecimens, health and clinical information and genome/omics data to develop a research infrastructure for genomic medicine (4). Taking advantage of the two abovementioned cohorts, we planned a strategy for genomic analysis as follows: development of a whole-genome reference panel using the TMM CommCohort, large-scale genotyping and genotype imputation of both cohorts, and collection of accurate haplotype information from the TMM BirThree Cohort. Based on this strategy, we first established an allele frequency panel called 1KJPN, which includes the whole-genome sequencing (WGS) data of 1,070 participants (5). The reference panel was sequentially expanded to the latest version, 3.5KJPNv2, which consists of 3,342 and 210 samples from the participants of the TMM project and other cohorts in western Japan, respectively (6). Based on the updated reference panel, we have developed and refined custom single-nucleotide polymorphism (SNP) arrays for genotyping all 157,602 participants, as described below.
The International HapMap and the 1000 Genomes Project have shown that human genomes comprise regions with an extended linkage disequilibrium (LD) and of limited haplotype diversity, depending on the population (7,8), and that SNPs within regions could be inferred from genotypes of a smaller number of SNPs. Carlson et al. showed that the selection algorithms of the set of SNPs for genotyping (referred to as tag SNPs) based on the r 2 , which are widely used for pairwise LD measures using reference genome sequences from different populations (9). Using tag SNPs, untyped sites can be complemented by genotype imputation using a reference genome to increase the number of SNPs that can be used for further association studies (10,11). In large-scale multiethnic studies, four kinds of ethnic-specific SNP arrays were first designed for European, East Asian, African American and Latino populations with simulations of genotype imputation (12,13). Similarly, biobanks and/or cohort projects developed ethnic-specific SNP arrays, such as the UK biobank Axiom Array (14), the Axiom-NL Array based on GoNL reference data in Netherlands (15) and the Axiom Array for Finnish of the FinnGen project. In East Asia, ethnic-specific custom arrays were also developed by the Taiwan Biobank as the TWB Array, based on the Axiom Genome-Wide CHB 1 Array (16); by the Korean biobank as Axiom KoreanChip, based on 2,576 WGS data (17) and by the Axiom China Kadoorie Biobank Array (18).
Most of these large-scale projects adopted the Axiom system because of the flexibility of the manufacturing array, the highly automated assay process and the robust sample tracking with a 96-array layout. Concurrent with the trend towards developing ethnic arrays, we also selected the Axiom system to design a Japanese-specific SNP array [the Japonica Array (JPA)]. We designed the first version of the Japonica Array (JPAv1) in 2014 (19). JPAv1 contains tag SNPs selected by means of a statistical measure called 'mutual information' with minor allele frequency (MAF) 0.5% to cover rare variants from a reference panel comprising 1,070 Japanese genomes, and a number of characterized SNPs from the genome-wide association studies (GWAS) catalog plus some other databases. In 2017, we updated JPAv1 and developed the second version (JPAv2) by increasing nontag SNPs, such as human leukocyte antigen (HLA), killer cell immunoglobulin-like receptor (KIR) regions and Y chromosome, and by replacing markers that were not working in JPAv1. Genotyping of TMM participants was conducted primarily with JPAv2 until 2018.
To enhance direct genotyping of previously identified disease risk variants and to obtain maximum genomic coverage with the expanded whole-genome reference panel including nearly 4,000 Japanese individuals (6), we aimed to design a novel and substantially revised version of the JPA, which we call Japonica Array NEO (JPA NEO). In this paper, we describe how we have improved upon JPAv2 to create JPA NEO. Of the various improvements, one salient point is that we have changed the selection algorithm for tag SNPs from using the mutual information criteria to the global standard of using r 2 of the LD measure, aiming to improve imputation accuracy and to standardize the data for use in meta-analyses conducted anywhere in the world. We also report the progress of genotyping TMM participants by using all three versions of the JPA.

Materials and Methods
Tag SNP selection for JPA NEO A target set was constructed using founders in the repository of our new genome reference panel consisting of genomes from 3,820 Japanese participants (6). For X chromosome, only females of above panel (2,066 Japanese individuals) were used. Tag SNPs were selected by the standard greedy pairwise algorithm based on the pairwise r 2 of LD statistics (9,12,13). Briefly, starting with a set of target sites with an MAF higher than a specific threshold, one site with the maximum number of others exceeding the r 2 threshold was selected. Then, this maximally informative site and all other associated sites were grouped as a bin of tag SNPs and removed from the target set. These steps were iterated until the total number of tag SNPs matched that of JPAv2. When multiple tag SNPs were selected in the same step, we prioritized them according to the following three criteria: (1) the maximum score of annotation by ANNOVAR (20) (exonic or splicing ¼ 6, ncRNA ¼ 5, 5 0 -UTR or 3 0 -UTR ¼ 4, intronic ¼ 3, upstream or downstream ¼ 2, intergenic ¼ 1 and no annotation ¼ 0); (2) not A-T or G-C of alternative-reference alleles and (3) yielding the maximum variance of base-pair positions.
Selection of disease-related markers for JPA NEO Disease-related markers were selected primarily from published lists of disease-related genes and GWAS results of Japanese populations, with expert advice. In addition, markers in the NHGRI GWAS catalog (21) and the UK Biobank Array (14) were also selected. From the latter, we extracted markers present in the Japanese population by referring to the 3.5KJPNv2 panel.

Development of JPA NEO
The list of tag SNPs and disease-related markers was combined with those of Y chromosome and mitochondrial markers. Based on the combined list, the array was produced using the Axiom myDesign service (Themo Fisher Scientific, Inc.). Multiple probes were designed for markers that were not included in the Axiom TM validated probe sets. Then, control markers were added, and the total number of markers was adjusted to the maximum number for the Axiom 96-array layout. The full marker list and detailed list of disease-related SNPs are available at the jMorp website (https://jmorp. megabank.tohoku.ac.jp/downloads/#jpa).

DNA samples
Isolation and quality control (QC) of genomic DNA from blood and saliva samples in the TMM biobank were performed as described previously (22). Genomic DNA samples isolated from the blood of TMM participants, but not those included in the reference panel, were used to evaluate JPA NEO. The study was approved by the Research Ethics Committee of ToMMo, Tohoku University.
Genotyping with JPAs A genotype assay was performed according to the manufacturer's protocol (i.e. Axiom TM 2.0 Genotyping Assay User Manual for 8plate workflow). Briefly, target DNA was enzymatically amplified and fragmentated, and after confirmations of concentration and fragment length by NanoDrop (Thermo Fisher Scientific) and TapeStation System (Agilent Technologies), hybridization, ligation and scanning were processed by a semiautomated machine, GeneTitan TM Multi-Channel Instrument (Thermo Fisher Scientific). These processes were conducted using liquid-dispensable robots (Nimbus TM , Hamilton; Biomek FX P , Beckman Coulter) and managed by a laboratory information management system (LIMS, LabVantage Solutions). For the QC, the dish quality control (DQC), sample QC call rate and plate pass rate were analysed using control markers for the Axiom platform (around 19,000) according to the Axiom TM Genotyping Solution Data Analysis Guide using Axiom TM Power Tools (APT, version 1.16.1). Genotyping data satisfying the criteria were used for the following QC analyses.

QC analysis and genotype imputation
Genotyping data were further analysed for SNP QC, sample QC and plate QC using all markers per plate, in accordance with the abovementioned analysis guide. After filtering out variant sites with low call rates, low MAF, or showing substantial deviation from Hardy-Weinberg equilibrium, SHAPEIT2 (23) and IMPUTE2 (24) were used to conduct prephasing and genotype imputation, respectively. The imputation accuracy was evaluated using the squared correlation, r 2 , with leave-one-out SNP masking methods (12,13,25). Briefly, genotype imputation was performed by masking an input SNP and the imputed SNP was compared with the masked one to obtain r 2 , after which the average r 2 in each MAF bin was calculated. Another metric, the information measure (INFO score) given by IMPUTE2, was used to analyse the imputation quality for each marker, where the value 0-1 indicated the uncertainty about the imputed genotype (11).

Tag SNP selection for improved genome-wide coverage
In JPA NEO, our updated version of the JPA, we used the maximum number on a single array of the Axiom 96-array layout, and the total of nearly 670,000 markers was divided into about 650,000 tag SNPs and tens of thousands of disease-related markers. The selection process of JPA NEO is essentially the same as previous versions of the JPA. However, we have selected these markers by using the latest version of our genome reference panel, which contains the genomes of 3,552 Japanese individuals (3.5KJPNv2) (6), which is about three times greater than that used for the previous versions (JPAv1 and JPAv2). Of note, while the previous two versions of the JPA used mutual information for tag SNP selection (19), in JPA NEO we decided to change the method for selecting tag SNPs to one based on the standard protocol using pairwise r 2 (9) ( Table I). This has the advantage of allowing us to harmonize our data with those of other studies. We believe that it is of great importance to perform meta-analyses with other large-scale GWAS utilizing the same concept.
A comparison of the design of JPA NEO with those of JPAv1 and JPAv2 is summarized in Table I.
To optimize the selection of tag SNPs, we first selected tag SNPs from chromosome 10 of the 3.5KJPNv2 reference panel by using greedy pairwise algorithm (9) with different combinations of thresholds of MAF; i.e. 0.005, 0.01 or 0.05 and pairwise r 2 of LD measures; r 2 0.5 or 0.8. Two metrics were used to evaluate tag SNP performance: (1) genomic coverage, which is the proportion of untyped sites with at least one tag SNP with r 2 greater than a given threshold and (2) the number of variants obtained by genotype imputation above the threshold of a given INFO score, which is an index of imputation accuracy. When tag SNPs were selected by pairwise r 2 0.8 and MAF 0.01, the genomic coverage with r 2 0.8 and the number of imputed variants from the 2KJPN reference panel (2,049 Japanese genomes) with INFO > 0.9 were better or comparable to those of JPAv2 and Infinium Omni2.5-8 (Fig. 1). Based on these results, we decided to select tag SNPs with pairwise r 2 of LD measures 0.8 and MAF 0.01 from the target set of autosomes and the X chromosome. For the design of JPA NEO, a substantial number, >1,000 of sex-chromosome SNPs on two pseudoautosomal regions were newly selected, whereas only about 10 SNPs on these regions were available in JPAv1 and JPAv2.
We also selected Y chromosomal markers for the Y haplogroup classification of the International Society of Genetic Genealogy (26) and from those in JPAv1 and JPAv2, which were selected using preexisting Axiom arrays for Asian populations. Mitochondrial markers were extracted mainly from 3.5KJPNv2 by removing those with MAF < 0.5% as well as those with multiple alleles. Most markers corresponding to  the HLA and KIR regions were taken over from those adopted for JPAv1 and JPAv2.

Selection of disease-related markers based on published evidence
For the selection of disease-related markers, we picked 9,000 markers present in the Japanese population, mainly from among published lists of disease genes and GWAS-identified risk variants. The former includes known and candidate functional variants on gene lists from the American College of Medical Genetics and Genomics (27) (14), respectively. We used reference panel 3.5KJPNv2 to extract the markers present in the Japanese population. The novel Axiom SNP array specific to the Japanese population was developed as JPA NEO.
JPA NEO has genome-wide coverage and contains disease risk SNPs The developed JPA NEO contains a total of 666,883 markers; the number of markers in each category is shown in Table II in comparison with JPAv1/JPAv2. In JPAv1/JPAv2, tag SNPs from autosomes and the X chromosome account for 98% (>650,000 SNPs). In contrast, nearly 8,500 SNPs from the Y chromosome (779 markers), mitochondria (409 markers) and HLA and KIR regions (6,757 and 532 markers, respectively) were also included to realize genome-wide coverage and genotyping of specific functional variants.
Although there is some overlap with the above SNPs, a total of 28,298 disease-related SNPs in 12 disease categories and pharmacogenomics are included as well (Table III). These SNPs include risk alleles for complex diseases, including dementia, depression and autism spectrum disorder among psycho-neurologic diseases (5,556 markers), type 2 diabetes and hyperlipidaemia among metabolic diseases (2,948 markers) and asthma and atopic dermatitis among immunological diseases (6,426 markers). In addition, variants related to physical traits (height, blood protein levels, etc.), expression quantitative trait locus, and so on are categorized as 'others'.
Of note, 3,472 markers (0.52%) in JPA NEO were MAF < 1% as confirmed by 3.5KJPNv2 (Supplementary Table S2). This is due to the adoption of some disease-related markers regardless of their MAF in 3.5KJPNv2. We have compiled the full list of disease-related markers with keywords and disease categories as a Supplementary Table S3, which can be downloaded from the jMorp website (30).

High imputation performance of JPA NEO
We modified the tag SNPs for JPA NEO from the previous versions with the aim of improving the imputation coverage of the microarray. To verify this point, we analysed the performance of JPA NEO in comparison with that of JPAv2. To this end, the same 286 samples, which were not included in the 3.5KJPNv2 reference panel, were genotyped using both JPA NEO and JPAv2. We found that the median call rates of JPAv2 and JPA NEO for all markers per sample were >99.6% and 99.8%, respectively (Supplementary Table S4), indicating that the call rate of JPA NEO is slightly better than that of the JPAv2.
More than 99% of markers were polymorphic in both JPAv2 and JPA NEO, as we intended (Table IV). Some microarrays are designed to cover a Extracted markers present in Japanese population by 3.5KJPNv2. wide range of ethnicities, which is in contrast to the aim and scope of our Japanese-specific arrays. We hypothesized that the former type of microarrays may have lower performance compared with ethnic-specific ones. To address this point, we compared the performance of JPAv2 and the Infinium Asian Screening Array (ASA), which covers a wide range of Asian populations, including Japan, by using the genomes of 191 Japanese in the TMM cohorts. We found that >17% of markers were monomorphic in the ASA array, while >99% worked as polymorphic markers in JPAv2 (Table IV) with a median call rate of >99% for both arrays (Supplementary Table S4). This observation supports our contention that ethnic-specific microarrays are critical for analysing each ethnic population. When we closely inspected the MAF distributions of JPA NEO in comparison with those of JPAv2, we noticed that JPAv2 showed low numbers of MAF markers (15-25%) compared with JPA NEO (Fig. 2). We envisage that this may be due to the method for selecting tag SNPs. However, our new selection method has significantly improved the marker distribution in this region.
We performed genotype imputation of autosomes by using the haplotype reference panel of 3.5KJPNv2 and evaluated the imputation accuracy according to two metrics, imputation quality r 2 and INFO score. The mean r 2 and INFO score were >0.9 and 0.8, respectively, in MAF bin >2.5-5% of two arrays (Fig.  3), indicating reliable imputation accuracy for both JPAv2 and JPA NEO. However, importantly, we also noticed that there was a significant decrease in mean r 2 in the region over MAF 20% in JPAv2. Whereas the precise reason for this decrease remains to be clarified, the decrease has been abrogated in JPA NEO.
As shown in Table V, slightly but clearly more imputed markers with INFO > 0.8 were obtained from genotyping data by JPA NEO than JPAv2, especially those with MAF < 1% (1.08-fold). We found that a total of >12 million markers were imputed by the small-scale analyses of the two arrays. These results indicate that while both JPA NEO and JPAv2 provide sufficient power for genotyping the Japanese population and following genotype imputation, JPA NEO shows better imputation performance without any bias throughout MAF bins. Thus, we conclude that JPA NEO is the most reliable imputation array ever developed for the Japanese population.

Large-scale genotyping by JPAs in the TMM project
To establish a solid research infrastructure for genomic medicine in Japan, the TMM project aimed to generate as much genotype data as possible from its 150,000 participants. To this end, we have been genotyping TMM cohort participants using the JPA since 2014. To complete such as large-scale genotyping efficiently, we established an elaborate three-group system from sample selection to genotyping, which connects to the data qualification.
We prepared our own special workflow for the ToMMo analysis, which ensures efficient and reliable sample processing and supports high-throughput measurement (Fig. 4). The first step is preparing the  target sample lists containing the thousands of participants corresponding to a specific purpose, such as the TMM CommCohort participants with respiratory function data. The selection of participants and availability of DNA samples or biospecimens are supported by LIMS at the TMM biobank (22). This step is conducted by Center for Genome Platform Projects. The second step is extracting and dispensing the DNA into 96-well plates. To divide samples into individual plates in a well-ordered and formulated manner, the correspondence between sample identifier (ID) and well position is manifested by creating the plate map before dispensing the DNA samples. This step is conducted by Group of Biobank. The final step is transporting the DNA plates and plate maps to the genotyping facility attached to the TMM Biobank, which is operated using LIMS by Group of Microarray-based Genotyping Analysis. For security control, different sample IDs were used for sample collection, storage and analysis (31). Capitalizing on this workflow, in May 2020, we obtained JPA data of 130,000 participants who met the criteria for QC analysis using control markers. The dataset comprises 2,000 JPAv1, 101,000 JPAv2 and 27,000 JPA NEO data (Table VI). We have already analysed >63,000 samples from the TMM CommCohort by using JPAv2, whereas the TMM BirThree Cohort samples were analysed by either JPAv2 or JPA NEO. Considering further association analyses, we are in the process of designing a rigid protocol that would allow each family unit to be analysed by the same JPA.
We have been using DNA samples obtained primarily from peripheral or cord blood. When samples from these sources were not available, mostly those from the children of TMM BirThree Cohort participants, DNA from saliva samples was used and analysed separately with the one from blood. In our operation, the QC pass rate has been >99% for blood samples using both JPAv2 and JPA NEO. In contrast, that of saliva samples as 90% using JPAv2, likely due to the presence of lower-quality samples. We believe that with this accomplishment, JPA NEO now has enough control data of the resident population to be an important and useful array for the entire Japanese population.

Discussion
The TMM project is one of the first large-scale prospective genome cohort studies in Japan and aims to realize precision medicine and personalized healthcare. To construct genome research infrastructure, we had to consider a cost-effective and high-throughput strategy for the acquisition of genomic data of >150,000 participants. Based on previous studies on genomic variants in diverse populations (32), we recognized that commercial arrays for global or even Asian populations were not sufficient for our purpose. Therefore, we decided to develop a custom ethnic-specific SNP array, the JPA, to maximize the acquisition of polymorphic markers in the Japanese population and provide genomic coverage with reliable genotype imputation accuracy while reducing cost.
In the TMM project, the whole-genome reference panel was expanded from 1KJPN to 2KJPN and 3.5KJPN. The latest version, 3.5KJPNv2, was constructed not only with an increased number of singlenucleotide variants but also added those from the X    chromosome and mitochondria (6). JPA NEO was designed by reselecting the tag SNPs of autosomes and the X chromosome from this panel. The haplotype reference panel for genotype imputation was also updated from 2KJPN to 3.5KJPNv2. This update to the imputation panel yielded an increase of >5 million imputed variants in the preliminary analysis of 335 samples using JPAv2 data compared with those obtained by genotype imputation with 2KJPN in the same sample analysis (data not shown). Thus, 3.5KJPNv2 is more effective than previous reference panels in providing genome-wide coverage in terms of both tag SNP selection and genotype imputation. The genotype imputation performance of JPA NEO was evaluated in comparison with that of JPAv2 by performing a small-scale analysis. In the genotyping data obtained by both arrays, monomorphic markers were scarcely observed and the large number of variants were imputed with a high imputation quality r 2 and INFO score. Of note, JPA NEO showed better statistics compared with JPAv2 but without any bias, suggesting that JPA NEO is the best-ever SNP array developed for the Japanese population. The compatibility of markers in JPAv2 and JPA NEO is 40% (data not shown). Therefore, it seems important to develop a method for utilizing the genotyping data obtained by different JPA array platforms, which we plan to provide as a user guideline when the full data of all 150,000 TMM participants are released.
JPA NEO incorporates nearly 30,000 diseaserelated variants previously reported in the literature and stored in databases, to allow for the evaluation of known functional risk alleles in the Japanese population. Because some SNPs with MAF < 1% were included, their SNP cluster plots and the concordance with genotypes obtained by WGS analysis must be carefully assessed. However, qualified disease risk variants can be used for association studies along with phenotype data.
The JPAs have been used to perform large-scale genotyping of TMM samples. Whereas we have not experienced any issues with plate QC assessments  conducted so far, we are planning to carefully implement batch-based as well as statistical genetic QC analyses to assess whether a plate effect is caused by sample selection bias. Indeed, in the UK Biobank, a sample picking algorithm has been used for genotyping experiments to prevent clustering of participants in the same plate by time or date of collection, collection centre, geography or participant phenotypes (33).
In contrast, we did not intentionally randomize sample picking; we selected samples according to the aim of our analysis. For example, TMM CommCohort samples with respiratory function data for GWAS were selected and analysed using the same plates. Therefore, each plate should include samples collected from the same periods, regions and families. Among the 130,000 genotyping data of TMM participants that we have processed so far, samples satisfying the criteria of sample DQC and QC call rate are quite high, especially when using blood samples (>99.8%). However, the pass rate of saliva samples was slightly worse (>89.8%) than that of blood samples in JPAv2. The use of saliva has been reported yield a low rate in other large-scale genotyping projects, for instance, 93.8% in the Genetic Epidemiology Research on Adult Health and Aging cohort (34). This may be due to the lower quality of saliva-derived samples, which is sometimes observed by electrophoresis as DNA degradation; this is likely due to problems during sample collection by participants, such as when they mix the sample with Oragene preservative solution. We are sharing the direct and imputed genotyping data with the research community upon completion of the QC analyses and genotype imputation. More than 54,000 JPA data have already been released as of June 2020, with associated data such as biochemical examinations and questionnaires. The full genotype data of the TMM project is expected to be released soon.
Data obtained by the genotype imputation array have been successfully utilized for GWAS. Summary statistics of large-scale GWAS are precious for the development of genetic risk scores, such as the polygenic risk score (PRS) (35). PRSs will be used to identify groups of individuals for therapeutic intervention, initiation and interpretation of disease screens and life planning (36). So far, the number and scale of GWAS in the European population greatly exceed those in non-European populations (37). However, the application of PRSs based on European cohorts to other populations is limited due to biases originating from the genomic diversity among populations, for instance, the difference in LD structure around causal variants. Further studies are required to develop PRSs for the Japanese population, and to evaluate their clinical utility used together with conventional clinical risk scores as pointed out recently (38,39).
We believe that our future efforts should be focussed on acquiring genotype data from all participants of the TMM cohorts as well as implementing a GWAS to develop and evaluate genetic risk scores, including PRSs, optimized for the Japanese population. Genomic data obtained by the TMM project will serve as an excellent control for the GWAS executed using other biobanks/cohorts in Japan, and it will also be exploited for GWAS of associated phenotypes and omics data from the TMM project. We also believe that the JPA should continue to be updated. For the next version, we are planning to design a medical checkup array with a minimal set of tag SNPs that nevertheless contains abundant risk SNPs. These efforts will also contribute to further identifying genetic determinants of diseases in those of East Asian ancestry (32).
In conclusion, we designed a new version of the JPA, JPA NEO, to improve both genome-wide coverage and genotyping of disease risk variants. Disease risk variants were selected from the literature and filtered by our reference panel to extract those expected to be present in the Japanese population. Experimental verification using the developed JPA NEO showed greater imputation performance without any bias through a wide range of MAF and with increased imputed variants compared with the previous version. Large-scale genotyping of TMM samples using JPA NEO is now underway. JPA NEO will provide highly accurate, efficient and cost-effective genotyping for the Japanese population. Combining the JPA data of TMM participants with those of other Japanese biobanks/cohorts will be helpful for better understanding the genetic risks of complex diseases, leading to its application for disease risk prediction and prevention and consequently personalized healthcare.

Supplementary Data
Supplementary Data are available at JB Online.