A pan-cancer atlas of somatic mutations in miRNA biogenesis genes

Abstract It is a well-known and intensively studied phenomenon that the levels of many miRNAs are differentiated in cancer. miRNA biogenesis and functional expression are complex processes orchestrated by many proteins cumulatively called miRNA biogenesis proteins. To characterize cancer somatic mutations in the miRNA biogenesis genes and investigate their potential impact on the levels of miRNAs, we analyzed whole-exome sequencing datasets of over 10 000 cancer/normal sample pairs deposited within the TCGA repository. We identified and characterized over 3600 somatic mutations in 29 miRNA biogenesis genes and showed that some of the genes are overmutated in specific cancers and/or have recurrent hotspot mutations (e.g. SMAD4 in PAAD, COAD and READ; DICER1 in UCEC; PRKRA in OV and LIN28B in SKCM). We identified a list of miRNAs whose level is affected by particular types of mutations in either SMAD4, SMAD2 or DICER1 and showed that hotspot mutations in the RNase domains in DICER1 not only decrease the level of 5p-miRNAs but also increase the level of 3p-miRNAs, including many well-known cancer-related miRNAs. We also showed an association of the mutations with patient survival. Eventually, we created an atlas/compendium of miRNA biogenesis alterations providing a useful resource for different aspects of biomedical research.


INTRODUCTION
Since the first reports of microRNA (miRNA) contributions to B-cell chronic lymphocytic leukemia (1), we have observed a substantial increase in reports describing the role of these small regulatory RNA molecules in different human diseases, including cancers (summarized in (2)). It was suggested that miRNAs are globally downregulated in cancer (3) and that upregulation or downregulation of certain miRNAs acting either as oncogenes or tumor suppressors may contribute to cancer development and progression (4,5). Numerous miRNA profiling studies have led to the identification of many miRNAs specifically altered in different types or subtypes of cancer. Many of these miRNAs play an important role in carcinogenesis and the regulation of different cancer-related processes, such as cell growth and differentiation, cell migration, apoptosis, and epithelial-tomesenchymal transition (MET). Additionally, many miR-NAs have been implicated as diagnostic and prognostic biomarkers and/or as potential therapeutic targets in cancer [e.g. (6)(7)(8)(9)]. miRNAs are generated through a multistage process of miRNA biogenesis tightly controlled by various proteins consecutively nursing primary miRNA transcripts (pri-miRNAs) from their transcription to their cellular function within the miRNA-induced silencing complex (miRISC) (10)(11)(12)(13). The major steps of canonical miRNA biogenesis include nuclear pri-miRNA processing by the microprocessor complex, whose core is formed by RNase DROSHA acting together with DGCR8 dimer and several other regulatory proteins, including P68 (DDX5) and P72 (DDX17), to release from the pri-miRNA hairpin-shaped secondary precursor (pre-miRNA). Next, pre-miRNA is exported to the cytoplasm by the Exportin-5(XPO5):Ran-GTP(RAN) complex, where it is intercepted by the multiprotein miRISC loading complex (RLC) containing the RNase DICER1, which cuts off the pre-miRNA apical loop to release an ∼22-bp-long miRNA duplex in assistance of partner proteins such as PRKRA (PACT) or TARBP1 (TRBP). Within the miRISC, the miRNA duplex is unwound (supported by, i.a., GEMIN4 and MOV10) to select a miRNA guide strand (mature miRNA) that recognizes mRNA targets by complementary interaction and silences them with the as-sistance of AGO and TNRC6A (GW182) proteins by translation repression and/or RNA deadenylation and degradation. Each step of this process may be further regulated by additional mechanisms/proteins that either increase or decrease the miRNA biogenesis rate (11,14,15). For example, LIN28A/B binds to the apical loop of specific pre-miRNAs, including pre-let-7, and upon its uridylation by ZCCHC11 or ZCCHC6 TUTases leads to pre-miRNA degradation by DIS3L2 exonuclease (16,17). It should also be noted that there are alternative pathways of miRNA biogenesis, such as the generation of miRNAs (mirtrons) from specific short introns (DROSHA-independent) (18) or DICER1independent processing of miR-451a by AGO2 (19,20).
There are currently >2600 human miRNAs deposited in miRBase (21,22). It is speculated that miRNAs regulate the expression of most protein-coding genes (23,24). miRNA levels and consequently levels of controlled genes may be affected by various factors and processes. First, the expression of miRNA genes, such as the expression of protein-coding genes, is regulated by various transcription factors, such as MYC, TP53 or SMAD4 (25)(26)(27)(28). It was shown that many miRNA genes are located in copy number-variable (CNV) regions and are frequently amplified or deleted in cancer (29)(30)(31). miRNA genes may also be affected by aberrant DNA methylation and histone acetylation, leading to the silencing of the miRNA genes (32)(33)(34)(35)(36)(37)(38). Additionally, methylation of miRNA precursors may facilitate their processing (39) as well as impair the ability of miRNAs to downregulate their targets (40). It was also shown that the occurrence of single nucleotide polymorphisms (SNPs) (41)(42)(43)(44) and germline or somatic mutations (45)(46)(47)(48)(49) may affect miRNA processing (level) and the ability of miRNAs to recognize their targets. Mature miRNAs may be captured and inactivated by cellular miRNA sponges such as lncRNAs, circR-NAs, or pseudogenes (50,51). Finally, the deficiency or impairment of the components of miRNA gene transcription activators, miRNA processing machinery and the miRISC complex (for simplicity, cumulatively called miRNA biogenesis genes/proteins) described in the previous paragraph may affect miRNA levels and the effectiveness of miRNA gene silencing, respectively (52,53).
A body of evidence has indicated that deleterious germline mutations in the DICER1 gene are responsible for DICER1 syndrome, an inherited disorder characterized by an increased frequency of various types of malignant and benign tumors that occur predominantly in infants and young children, the most common and most characteristic of which is pleuropulmonary blastoma (54,55). It was shown that in cancers associated with DICER1 syndrome as well as other early childhood cancers (e.g. Wilms' tumor), a specific pattern of somatic DICER1 second-hit missense mutations occurs. All these mutations are located in or adjacent to metal-ion-binding residues (hotspots; predominantly D1709 and E1813) of the RNase IIIb domain (RIIIb) (54). Later, in similar types of childhood cancers, a similar pattern of somatic mutations was also identified in the corresponding residues (E1147, D1151) of the RIIIb in DROSHA (56)(57)(58)(59). Functional analyses revealed that the mutations in DICER1 lead to less effective generation of 5p-miRNAs (60)(61)(62), whereas mutations in DROSHA affect the generation of miRNAs from both pre-miRNA arms (56), as reviewed in (54,63,64). Very recently, it was also shown that the recurring mutation in the RNase IIIa domain (RIIIa) of DICER1 occurring predominantly in uterine carcinoma may cause the same effect as the mutations in RIIIb (65). Interestingly, the occurrence of DROSHA mutations in Wilms' tumor coincides with the occurrence of mutations in SIX1 and SIX2, transcription factor genes that are also frequently mutated in the tumor (66). Another hotspot mutation commonly occurring in Wilms' tumor is E518K in the double-stranded RNA-binding domain (dsRBD) of DGCR8 (57,58,66). It was also shown that in cancers with a high rate of microsatellite instability (MSI), such as colon, gastric, and endometrial tumors, specific indel hotspots occur in TRBP and C-terminal positions of XPO5 (67,68); however, these mutations were not further analyzed in other studies. Knowledge of the germline and somatic variation in miRNA biogenesis genes is summarized in (64). Additionally, SMAD4, encoding the SMAD4 transcription factor activating many genes in response to transforming growth factor beta (TGFB)/bone morphogenetic protein (BMP) signaling (69,70), is a well-known tumor suppressor gene that is highly mutated in many cancers, including pancreatic and colorectal cancers (71). Although SMAD4 was also implicated in the transcription of miRNA genes (25,(72)(73)(74), the effect of SMAD4 mutations has never been tested in the context of the activation of miRNA genes. Furthermore, it was shown that some SNPs in miRNA biogenesis genes are associated with the risk of various cancers. Examples include (i) the rs3742330 (A>C) SNP located in the 3 UTR of DICER1 that affects DICER1 mRNA stability and is associated with susceptibility and malignancy in gastric cancer (75,76), increased survival of T-cell lymphoma patients (77) and lower prostate cancer aggressiveness (78); (ii) the rs78393591 SNP in DROSHA and rs114101502 SNP in ZCCHC11 (TUTase responsible for pre-miRNA uridylation and subsequent DICER1 cleavage inhibition) associated with the risk of breast cancer (79) and (iii) the rs11786030 and rs2292779 SNPs in AGO2, rs9606250 SNP in DGCR8, and rs1057035 SNP in DICER1 associated with the survival of breast cancer patients (80). Additionally, the SNPs rs2740348 C>G and rs7813 C>T in GEMIN4, a gene involved in miRISC formation and miRNA-duplex unwinding, were implicated in the risk of several cancers, although the results were not conclusive (81)(82)(83)(84)(85).
In this study, we took advantage of the data generated within The Cancer Genome Atlas (TCGA) project to analyze the somatic mutations in miRNA biogenesis genes. As a result, in a wide panel of 33 cancer types consisting of over 10 000 samples, we identified hundreds of mutations and many recurrently mutated hotspot positions and showed that some of the genes are specifically overmutated in particular cancer types. We also confirmed the common occurrence of deleterious mutations in SMAD4 and further characterized the specific hotspot mutations in SMAD4, SMAD2 and DICER1, the last group of which were previously reported mostly in childhood cancers. We followed up on the consequences of some of the mutations and showed characteristic changes in miRNA profiles resulting from specific mutation types in DICER1, SMAD4 and SMAD2. We also showed the associations of the mutations with can-Nucleic Acids Research, 2021, Vol. 49, No. 2 603 cer characteristics and patient survival. Additionally, the specific hotspot mutations in DROSHA and DGCR8 commonly observed in Wilms' tumor and other childhood cancers were absent in adult cancers.

Data resources
We used molecular and clinical data (Level 2) for 33 cancer types generated and deposited in the TCGA repository (http://cancergenome.nih.gov). These data included the results of somatic mutation calls in whole-exome sequencing (WES) datasets of 10 369 samples (later limited to 10 255) analyzed against matched normal (noncancer) samples with the use of the standard TCGA pipeline. Hypermutated samples were defined as samples with >10 000 mutations in the whole exome. As in general, the TCGA datasets include only one sample from each cancer specimen, in the analysis, we did not consider cancer stromal heterogeneity. Copy number data were obtained via Xena UCSC as a 'gene-level copy number (gistic2 thresholded)' dataset of the TCGA Pan-Cancer (PANCAN) cohort. The crystal structure of the phosphorylated SMAD2/SMAD4 heterotrimeric complex (PDB code: 1U7V) (86) was visualized with the use of PyMOL (Schrödinger, LLC, New York, NY, USA).

Data processing
We analyzed somatic mutations in 29 miRNA biogenesis genes (coding exons were extended by 2 nt on each side to enable identification of definitive splicing mutations). The genomic coordinates of the analyzed genes/regions are shown in Supplementary Table S1. From the WES data generated with the use of four different algorithms (MuSE, MuTect2, VarScan2 and SomaticSniper), we extracted somatic mutation calls with PASS annotation. The extraction was performed as described in (45) with a set of in-house Python scripts available at (https://github.com/martynaut/ mirnaome somatic mutations). Briefly, the lists of somatic mutations detected by different algorithms were merged such that variants detected by more than one algorithm were not multiplicated. To further increase the reliability of the identified somatic mutations (and avoid the identification of uncertain mutations), we additionally removed those that did not fulfill the following criteria: (i) at least two alternative allele-supporting reads in a tumor sample (if no alternative allele-supporting read was detected in the corresponding normal sample); (ii) at least a 5× higher frequency of alternative allele-supporting reads in the tumor sample than in the corresponding normal sample; (iii) a somatic score parameter (SSC) > 30 (for VarScan2 and So-maticSniper) and (iv) a base quality (BQ) parameter for alternative allele-supporting reads in the tumor sample >20 (for MuSE and MuTect2). All mutations were designated according to HGVS nomenclature at the transcript and protein levels, and the effects of mutations were predicted using the Ensembl Variant Effect Predictor (VEP) tool (87). For visualization of mutations on the gene maps, we used ProteinPaint from St. Jude Children's Research Hospital -PeCan Data Portal (88). The protein domains visualized on gene maps were positioned according to UniProt (89)

miRNA expression analysis
We obtained miRNA expression data via Xena UCSC as batch-effects normalized data for TCGA Pan-Cancer data (note that due to normalization, some miRNA levels were below 0). Expression data from the set of ∼700 miRNAs (annotated in miRBase) were filtered to exclude miRNAs with undetectable signals (level = 0) in more than 30% of pan-cancer samples (or 10% when the analysis was performed for specific cancers). As high-confidence miRNAs, we considered those deposited in MirGeneDB v2.0 which were defined based on criteria that include careful annotation of the mature versus passenger miRNA strands and evaluation of evolutionary hierarchy (90,91). Next, to enable pan-cancer comparisons, we normalized the variation (range of −1 to 1) and median (median = 0) of miRNA levels to be equal in each cancer type. The normalized miRNA level changes (in pan-cancer) were calculated/expressed as differences, and the changes in raw (non-pan-cancer-normalized) miRNA levels (in individual cancers) were calculated/expressed as log 2 fold changes. For the differential analysis of miRNA levels, normalized against the level of miR-451a (see Results) in UCEC, we additionally used miRNA level data expressed only as reads per million miRNA mapped (RPM) values (non-batch effects normalized), downloaded from Fire-Browse (illuminahiseq mirnaseq-miR gene expression). The analysis of isomiRs was performed with the use of the miRNA isoform dataset (illuminahiseq mirnaseq-miR isoform expression) from FireBrowse.

Statistics
Unless stated otherwise, all statistical analyses were performed with statistical functions in the Python module scipy.stats. Particular statistical tests are indicated in the text, and if not stated otherwise, P < 0.05 was considered significant. If necessary, P-values were corrected for multiple tests. Mutation density was calculated as the number of detected mutations divided by the length of analyzed genes (total length of all coding exons). For patient survival analyses, we used a log-rank test (from the lifelines v0.24.8 library (Davidson-Pilon C et al., https://zenodo.org/record/ 3833188#.X8fbMs1KhPY)). To determine the direction of mutation effects on survival, we used Cox's proportional hazard model. Survival plots were created using Kaplan-MeierFitter from the lifelines library. A comparison of the occurrence of mutations in miRNA biogenesis genes among cancer stages (with correction for cancer type) was performed with the Cochran-Mantel-Haenszel test for pancancer and Fisher's exact test for specific cancers.

Distribution of somatic mutations in miRNA biogenesis genes across different cancer types
For analysis, we selected 29 miRNA biogenesis genes encoding proteins playing roles in (i) the transcription of primary miRNA precursors (pri-miRNAs), (ii) pri-miRNA to pre-miRNA processing in the nucleus, (iii) the export of pre-miRNA from the nucleus to the cytoplasm, (iv) pre-miRNA Table 1. List and characteristics of the selected miRNA biogenesis genes Gene/protein ID (alias) miRNA-related function

SMAD4
Activates miRNA precursor transcription upon TGFB/BMP activation (74,132,136) FUS Facilitates cotranscriptional DROSHA recruitment to pri-miRNAs; shown to bind a terminal loop of specific neuronal pri-miRNAs and to enhance their processing (141) SRSF3 (SRP20) Enhances mammalian pri-miRNA processing upon binding to the CNNC motif in the 3p flanking sequence (142) DROSHA RNase III; catalyzes pri-miRNA to pre-miRNA for processing/cleavage (11,143,144) DGCR8 Cofactor of DROSHA; coordinates the recognition of pri-miRNA at the dsRNA-ssRNA junction; functions as a molecular anchor and direct DROSHA to cleave pri-miRNA ∼11 bp before the junction (11,145) DDX5 (P68) Plays a role in recognition/binding of pri-miRNAs by the DROSHA complex; recruits DROSHA/DGCR8 to pri-miRNA (146,147) DDX17 (P72) Plays a role in recognition/binding of pri-miRNAs by the DROSHA complex (146,148)

GSK3B
Facilitates pri-miRNA binding by DROSHA and enhances DROSHA association with cofactors DGCR8 and P72 (149); DROSHA phosphorylation/stabilization (150) SMAD2 Accelerates pri-miRNA processing by DROSHA; in complex with SMAD4 activates miRNA precursor transcription upon TGFB/BMP signaling (72,136,137) Export to cytoplasm XPO5 (EXP5) Plays a role in the nuclear export of pre-miRNA to the cytoplasm (151,152); facilitates the nuclear cleavage of clustered pri-miRNAs (153) RAN Interacts with XPO5; plays a role in the export of pre-miRNA to the cytoplasm (151) Cytoplasmic steps of miRNA biogenesis DICER1 (DICER) RNase III; catalyzes pre-miRNA to miRNA-duplex processing by cutting off the pre-miRNA terminal loop (11,(154)(155)(156) TARBP2 (TRBP) Coordinates pre-miRNA recognition by DICER1 and the precision of DICER1 cleavage (157,158) PRKRA (PACT) Coordinates pre-miRNA cleavage by DICER1 and assures the precision of the cleavage; plays a role in miRISC assembly and thus participates in miRNA stabilization and accumulation in the cell (158,159) ADAR Double-stranded RNA-specific adenosine deaminase; plays a role in pri-and pre-miRNA stem editing (A to I), which makes miRNA precursors resistant to DICER1 cleavage (160) KHSRP (FUBP2, KSRP) Binds to the terminal loop sequence of a subset of miRNA precursors, promoting their maturation (161) LIN28A and LIN28B Bind to the terminal loops of specific pre-miRNAs (including pre-let-7 and pre-miR-9) and, upon recruitment of ZCCHC11 or ZCCHC6 that induces pre-miRNA uridylation, inhibit DICER1 processing (17,162,163) ZCCHC11 (TUT4) and ZCCHC6 (TUT7) Play a role in pre-miRNA uridylation and thus inhibit DICER1 processing (17,164)

GEMIN4
Binds to the miRNA guide strand and facilitates the formation of a miRISC by unwinding the miRNA duplex (83,168) MOV10 Upon interaction with the miRNA-loaded AGO-protein complexes, plays a role in mRNA degradation; present in P-bodies (169,170)

FMR1
Interacts with DICER1 and AGO1 during mRNA degradation (171); controls DROSHA expression (172) TNRC6A (GW182) Component of P-bodies; plays a role in mRNA degradation upon interaction with Argonaute proteins (173,174) processing and miRNA maturation in the cytoplasm and (v) miRNA:target recognition/interaction and regulation of downstream silencing effects (Table 1 and Figure 1). To identify somatic mutations in the miRNA biogenesis genes, we took advantage of WES datasets of 10 369 paired tumor/normal samples generated within the TCGA project. The collected samples cover 33 different cancer types (analyzed together as a pan-cancer cohort) (Supplementary Table S2). The list of all cancer types (full names and abbreviations) is shown in Figure 2 (to avoid confusion, we will use the abbreviations for the TCGA sample sets but not generally for particular types of cancer; in the latter case, we will use full cancer-type names). Applying the rigorous criteria described in the Materials and Methods, we identified a total of 5483 mutations in the pan-cancer cohort. However, a substantial fraction (n = 1834, ∼30%) of the mutations were identified in a relatively small number (n = 114, ∼1%) of hypermutated samples. As shown in Supplementary Figure S1, the number of mutations in hypermutated samples strongly depends on the general burden of mutations in these samples, implying enrichment of random, most likely passenger mutations in the hypermutated samples. Therefore, to reduce the proportion of confounding mutations, we removed hypermutated samples from subsequent analyses. The removed, hypermutated samples originated mostly from SKCM, UCEC and COAD (Supplementary Figure S1).
After removing the hypermutated samples, we continued analysis on 10 255 samples with 3649 mutations, including 2196 (60%) missense mutations, 774 (21%) synonymous mutations and 625 (17%) definitive deleterious mutations, consisting of 341 frameshift, 222 nonsense and 62 splice-site mutations (Supplementary Table S3). Other types of mutations, such as start or stop codon mutations or complex mutations, were also present in small fractions. At least one mutation was detected in 2104 (21%) samples, including numerous samples with more than one mutation ( Figure 2A and B). The frequency of mutated samples differed substantially among cancer types, ranging from 45% (SKCM) to 0% (KICH) (Figure 2A, Supplementary Table S3), and roughly corresponded to the mutational burden in particular cancer types.

Frequency of somatic mutations in miRNA biogenesis genes across 33 cancers
As shown in Figure 2C, TNRC6A, DICER1 and SMAD4 are among the most highly mutated genes (∼3% of samples in pan-cancer), and the least mutated genes (∼0.3%) are LIN28A, RAN and SRSF3. Although there is some correlation between the frequency of mutations in the particular genes and the length of their protein-coding sequences (R 2 = 0.62), the overall mutation frequency cannot be simply explained by the length of the genes. Additionally, none of the genes are well-known highly mutated genes or are located in late-replicating regions known to be overmutated in cancer (92).
Although there is a general correlation of mutation frequency in particular genes between cancers, there are also apparent striking exceptions of genes specifically overmutated in particular cancers ( Figure 2C). This contrasts with observations that over-and undermutated regions generally overlap between different cancer types (92) and suggests the nonrandom occurrence and potentially functional nature of the overmutations. To identify overmutated genes, we statistically compared the frequency of mutations (overmutation) in particular genes (versus all other genes) in particular cancers with corresponding frequencies in pan-cancer ( Figure 2C, Supplementary Table S4). The most striking example of an overmutated gene is SMAD4 overmutated in PAAD (22% of samples), READ (16%), COAD (14%), STAD (9%) and ESCA (8%). Other interesting examples included AGO4, LIN28A and SMAD2 overmutated in BLCA (an otherwise extremely low-mutation cancer with no mutations in other genes); LIN28B overmutated in SKCM; PRKRA overmutated in OV; and DDX5 overmutated in BRCA and KIRP. There are also other genes with increased mutation frequency, e.g. TNRC6A in STAD and DICER1 in SKCM and UCEC (nominal P > 0.005), but these overmutations are only nominally significant. Consistent with the above findings, the overmutated genes are outliers in terms of the correlation of mutation frequencies in particular genes between particular cancer types and the remaining pan-cancer (Supplementary Figure S2).

Distribution of mutations in the miRNA biogenesis genes--identification of hotspot mutations
To illustrate the mutation distribution along the protein sequences, all the mutations were visualized in lollipop plots ( Figure 3 and Supplementary Figure S3). As shown in the plots, although most of the mutations are quite evenly distributed along the genes, there are also some hotspot regions/mutations suggesting the functional nature of these changes. The most striking example is a cluster of eight hotspots of recurrently mutated amino acid (AA) residues (i.e. D351, G352, D355, P356, R361, H382, G386 and D537) occurring in the MH2 domain of SMAD4 (Figure 3A). The most prominent hotspot position is R361, which by itself acquired 37 missense mutations, accounting for 23% of all SMAD4 missense mutations. There are also two recurrently mutated AA residues (i.e. P305 and R321) in the MH2 domain of SMAD2 ( Figure 3B). The hotspot mutations in the MH2 domains (both in SMAD4 and SMAD2) likely affect SMAD4:SMAD2 heterotrimer formation. Additionally, there are two recurrent proteintruncating nonsense mutations in the MH2 domain of SMAD2, i.e. p.S306Ter and p.S464Ter, the latter of which occurs 13 times is localized in the last exon of SMAD2 and likely truncates the last five AAs of the protein just before two phosphorylation sites (S465 and S467) critical for activation of SMAD2 upon TGFB/BMP signaling (93).
It is also worth noting the hotspot missense mutation, i.e. p.P127L in the DRBM 2 domain of PRKRA ( Figure  3D), occurring 13 times in OV, COAD, GBM and LUAD  Table 1. (6, 3, 3 and 1 mutations, respectively). As the DRBM 2 domain plays a role in interaction with other proteins (e.g. protein kinase R, PKR), the mutation may affect the interactions. Consistently with the frequent occurrence of the mutation in OV, the changes in the PRKRA level were recently linked with the resistance of mucinous ovarian cancer to the miR-515-3p dependent platinum-based (oxaliplatin) treatment (95). Although the mutation overlaps with the SNP (rs75862065), the fact that it occurs predominantly in OV, which is otherwise a moderately mutated cancer, combined with the overall overmutation in PRKRA, argues against the accidental occurrences of the mutation as artifacts of the mutation calling process. There are also two hotspot synonymous mutations in FUS (p.G222= and p.G227=) (Supplementary Figure S3). A role of such hotspots cannot be excluded (96); however, we did not investigate them further in this study. Similarly, recurring missense mutations in the other genes may be important for specific cancers; however, they occur much less frequently.
There are also some hotspot indel mutations, e.g., p.N58IfsTer8 in ZCCHC11, p.Y948MfsTer16 in ZCCHC6, p.K121SfsTer8 and p.G163EfsTer20 in DDX17, p.W804GfsTer99 and p.R1183GfsTer7 in TNRC6A, p.L17CfsTer99 in AGO1, and pA603RfsTer71 in AGO2 (Figure 3 and Supplementary Figure S3). However, it must be noted that the exact position of indel hotspot mutations may not be necessarily driven by cancer advantage but by sequence properties (e.g., the presence of short tandem repeat motifs). In fact, many indel mutations and some of the indel hotspots occur at sequence motifs often mutated as a result of MSI. Additionally, as most indels in coding sequences result in frameshifts and premature termination of translation, triggering nonsense-mediated mRNA decay (NMD) and leading to the complete loss of mRNA, the exact position of indel hotspots may not be that important.
In the next step, we looked for hotspot mutations previously observed in the miRNA biogenesis genes, i.e. (i) E969 and E993 in RIIIa and E1147, D1151, Q1187 and E1222 in RIIIb in DROSHA, (ii) p.E518K in the dsRDB1 domain in DGCR8 and (iii) p.R440Ter in XPO5 observed in different pediatric cancers, especially in Wilms' tumor (56,57,59). Of these mutations, we found p.E518K in DGCR8 occurring in two of 495 (0.4%) cases of THCA, p.R440Ter in XPO5 in one case of UCEC and one case of SKCM, and p.D1151E in DROSHA in one case of COAD (Supplementary Figure  S3). Of the indel hotspots in XPO5 and TARBP2 detected previously in colon, gastric, and endometrial tumors with MSI (67,68), we detected only two cases of the C insertion in the poly-C track (p.M145HfsTer13) in TARBP2: one in UCEC and one in STAD (both cancers often characterized by MSI).

Functional consequences of SMAD4 and SMAD2 mutations
From the visual investigation ( Figure 2C and Figure 3A), it is apparent that SMAD4 is the gene with the highest density of mutations (273 mutations, 160 mut/kbp) and the largest proportion of deleterious mutations (33%). As mentioned in the previous paragraph, there are also eight hotspots of missense mutations, seven of which are located in a relatively small region (D351 to G386) of the MH2 domain, playing a role in heterotrimer formation with other receptor-dependent SMAD proteins (R-SMADs, e.g. SMAD2) to mediate the TGFB/BMP transcriptional response (97,98). The collection of a relatively large number of mutations associated with different cancer types reveals that the proportion of hotspot and deleterious mutations [in pan-cancer, 71 (44%) versus 91 (56%), respectively] differs substantially between cancers ( Figure 4A) and is the highest in READ (86% versus 14%; P = 0.001) and the lowest in PAAD (27% versus 73%; P = 0.032). This may suggest a  different role of SMAD4 in different cancers and different effects of these two types of mutations. Most of the hotspots (five of eight hotspots, 83% of all hotspot mutations) coincide with charged (basic or acidic) AAs, affecting the electrostatic properties of MH2 important for interaction with R-SMADs (86). To visualize the location of the hotspot AA residues, we marked them on a crystal structure of the heterotrimeric SMAD4:(SMAD2) 2 complex (86). As shown in Figure 4B and Supplementary S4, all hotspot residues are located on the surfaces of the SMAD4:SMAD2 interaction, which is consistent with the notion that hotspot mutations prevent SMAD complex formation and thus avert transcription of SMAD-controlled genes in response to TGFB/BMP signaling. As in SMAD4, the SMAD2 hotspot residues localize at the SMAD4:SMAD2 interaction surfaces ( Figure 4B and Supplementary S4).
The copy number analysis of SMAD4 showed that SMAD4 deletions are significantly more frequent in sam-   Figure 4C). SMAD4 mutations were previously shown to affect the expression of many genes. Although it was proposed that the TGFB/SMAD pathway also controls the expression of miRNAs (25,27,28,73,99,100), the effect of the mutations on miRNA levels has never been tested. To do so at the pan-cancer scale, we took into account only 522 miRNAs whose level was >0 in at least 70% of tested TCGA samples. We normalized miRNA levels to make the median (equal to 0) and variance of these levels comparable between cancers. As shown in Figure 4D, there is an excess of downregulated miRNAs in samples with hotspot mutations (e.g. at the level of P-value < 0.05, 78% and 22% of miRNAs are downregulated and upregulated, respectively), which is consistent with the expected impairment of SMAD complex formation and transcription factor activity. A similar effect was observed when the analysis was performed only for R361, the most frequently mutated hotspot residue ( Figure 4D (Supplementary Table S5), there are many miRNAs with welldocumented cancer-related functions, for example, the recently discovered suppressormiRs miR-23a-3p (reviewed in (101)), miR-196b-5p (102,103), and miR-1247-5p (104). Noteworthy, such an effect of excessive miRNA downregulation is not visible for deleterious mutations ( Figure 4D and Supplementary Figure S5A). Among the most significantly downregulated or upregulated miRNAs in samples with deleterious mutations are let-7c-5p and miR-125b-5p or miR-215-5p, miR-21-5p and miR-210-5p, respectively, all well-known cancer-related miRNAs (e.g. (105)). The frequent observation of pairs of 5p and 3p-miRNAs (e.g. let-7c-5p and let-7c-3p) and groups of miRNAs generated from miRNA clusters (e.g. miR-99a/let-7c or miR-1/133a cluster) among the consistently altered miRNAs in samples with both hotspot and deleterious mutations indicates that at least some of the miRNA alterations result from aberrations at the transcriptional level. As shown in Figure 4E and F, the levels of let-7c-5p and miR-196-5p, which are examples of miRNAs most significantly altered in pan-cancer, often show similar alterations in the most commonly mutated cancers, i.e. PAAD, COAD and READ, although it must be noted that due to the very limited number of mutations, the results in individual cancers are not perfectly consis-tent and have to be interpreted with caution. Nonetheless, among the miRNAs downregulated by either hotspot or deleterious SMAD4 mutations are many miRNAs demonstrated before to be activated by TGFB/SMAD signaling. It includes miR-23a-3p, miR-27a and miR-24 constituting a miRNA cluster that is the first and most well-studied group of miRNAs regulated by SMAD4 (74), miR-181b-5p (106), 199a-3p (73) and miR-494-3p (107). To identify pathways/processes enriched in the genes regulated by the downregulated miRNAs, we performed KEGG pathway enrichment analysis with miRPath v3.0. The analysis showed that miRNAs downregulated both by the hotspot and deleterious SMAD4 mutations are associated (adjusted P < 0.01) with similar cancer-related processes, including 'Pathways in cancer', 'ErbB signaling pathway', 'Glioma', and 'Proteoglycans in cancer' (Supplementary Table S6). It is worth noting that among the enriched pathways is also the 'TGF-beta signaling pathway' regulated by 32 of the 49 (65%, P = 0.0002) and 20 of the 29 (69%; P = 0.00005) miR-NAs downregulated by the hotspot and deleterious mutations, respectively.
A comparison of SMAD4 mutations with clinical cancer data did not show an association of mutations with tumor staging but showed an association with decreased overall survival (OS) at the pan-cancer level (log-rank test: P = 0.049 and P = 0.0006 for hotspot and deleterious mutations, respectively) ( Figure 4G). As the analyses performed at the pan-cancer level may be affected by the unequal distribution of mutations among cancer types, we repeated the survival analysis with the most frequently mutated individual cancers. Although the analyses of individual cancer types were of very limited statistical power because of a low number of mutations, some cancer types also showed trends toward decreased survival of patients with the mutations, i.e. COAD and READ ( Figure 4H).
Although SMAD2, with a total of 101 mutations, including 35 deleterious mutations, is much less densely mutated (70 mut/kb) than SMAD4 ( Figure 3B), it also contains recurrently mutated hotspot AA residues in the MH2 domain. The hotspots include two AA residues, i.e. P305 and R321 mutated 3 and 4 times, respectively, and the nonsense mutation p.S464Ter truncating the protein by five AAs, which is the most common SMAD2 mutation (13 occurrences in cancers such as COAD, STAD, and BRCA). As in SMAD4, the SMAD2 hotspot residues localize at the SMAD4:SMAD2 interaction surfaces ( Figure 4B and Supplementary Figure S4).
We separately analyzed miRNA level profiles in samples with hotspot missense mutations (N = 7), the p.S464Ter  Table S5). A similar effect was also observed in the group of high-confidence miRNAs (Supplementary Figure  S5B). The excess of downregulations is not visible for the hotspot missense mutations; however, it must be noted that due to a very low number of hotspot mutations, the analysis of miRNA levels for this type of mutation is of very low statistical power. Nonetheless, we observed one miRNA, miR-329-3p, playing a role in different cancers (108)(109)(110)(111)), which was consequently downregulated in all three mutational groups. Other downregulated miRNAs playing a role in cancer include (i) miR-1247-5p and miR-1247-3p (in samples with p.S464Ter), recently identified as tumor suppressor miRNAs (104), and observed as highly expressed in embryonic/placenta cells which may indicate their role in quickly dividing cells (112); (ii) miR-7-5p (in samples with deleterious mutations), a well-recognized suppressormiR, playing a role in downregulation of the growth, metastasis, and prognosis of various tumors (reviewed in (113)); and (iii) miR-380-5p (in samples with hotspot missense mutations), downregulating TP53 to control cellular survival (114). As shown in Figure 5B and C, the examples of miR-329-3p and miR-380-5p (both members of the MIR-154 family playing a role in pulmonary fibrosis and being a target of TGFB signaling (115)) illustrate that miRNA level changes identified in the pan-cancer analysis are also reflected in the most frequently mutated cancers, i.e. COAD and READ. SMAD2, similar to other R-SMADs, may also posttranscriptionally increase the level of a specific group of miRNAs, facilitating the processing of their pri-miRNAs by DROSHA (72). Among the 44 miRNAs shown to be upregulated by R-SMADs or containing specific R-SMADinteracting sequence motifs (23 of them tested in this study), miR-421 (a member of the MIR-95 family) (P = 0.01), miR-188-5p (P = 0.03), and miR-877-5p (P = 0.04) were downregulated in samples with the p.S464Ter mutation. A comparison of the SMAD2 mutations with clinical characteristics of cancers did not reveal an association of mutations with cancer stages, but p.S464Ter showed a trend toward decreased OS of patients with the mutations ( Figure 5D) in pan-cancer and in the individual cancers with >1 p.S464Ter occurrence in informative samples.
As shown in Figure 5E, although there is some overlap between miRNAs altered by the hotspot missense mutations in SMAD2 and SMAD4, generally the overlaps between the effects of different mutation types are small. The small overlap, may mainly result from the low statistical power of the analysis and the fact that different mutations predominate in different cancer types (with different miRNA profiles), but it may also result from various involvement of the proteins in distinct regulatory processes, i.e. (i) SMAD4 plays a role in the activation of miRNA transcription and in this process it may interact with different R-SMADs (not only SMAD2; different mutations may differentially affect interactions with different R-SMADs) whereas (ii) SMAD2 (but not SMAD4) plays a role in post-transcriptional regulation of miRNA precursors. Additionally, secondary regulatory processes may also play a role.

Functional consequences of DICER1 mutations
Another highly mutated gene with several characteristic hotspots is DICER1, with a total of 272 mutations (46 mut/kb), including 38 deleterious mutations and 27 missense mutations located in the aforementioned hotspots in the RIIIa (N = 6) and RIIIb (N = 21) domains ( Figure 3C). As shown in Figure 6A, the occurrence of DICER1 mutations does not correlate with gene copy number alterations, and only a small fraction of the DICER1 mutations coincide with gene deletion or amplification. The miRNA expression analysis at the pan-cancer level showed that the hotspot mutations in both the RIIIa and RIIIb domains were associated with the global downregulation of 5p-miRNAs with simultaneous upregulation of 3p-miRNAs ( Figure 6B). This effect of 5p-miRNA downregulation was observed before and was explained by the fact that the hotspot mutations in RIIIb but also in RIIIa (65) affect the activity of the RIIIb domain, preventing cleavage of miRNA precursors at the 5p arm and the release of 5p-miRNAs. The increase in 3p-miRNAs was considered an artifact, i.e. an apparent effect counterbalancing the global decrease in 5-miRNAs, resulting from the standard miRNA level normalization procedure that standardized the amount of each miRNA against the total number of miRNA reads. Surprisingly, we observed a similar effect of decreased 5p-miRNAs and increased 3p-miRNAs in samples with deleterious mutations that do not affect RIIIb but are assumed to lead to complete loss of DICER1 (Figure 6 B and C; for comparison, please see the effect of SMAD4 mutations in which 5p-and 3p-miRNAs are more or less equally distributed between the decreased and increased miRNAs). The effect of the asymmetrical distribution of altered miRNAs was not observed for other nonhotspot missense mutations outside the RNase domains and synonymous mutations for which, as expected, miRNA level changes were very low ( Figure 6C). To avoid potential biases associated with pan-cancer normalization, we performed expression analysis separately for UCEC, which is the cancer type with the most frequent mutations in DICER1 (∼9%), with 2 mutations in RIIIa, 10 mutations in RIIIb and 9 deleterious mutations. Although of lower statistical power, the UCEC analysis showed a similar effect of the hotspot and deleterious mutations on globally decreased levels of 5p-miRNAs and increased levels of 3p-miRNAs ( Figure 6B). To directly check whether the increase in 3p-miRNAs in samples with the DICER1 mutations is an effect of normalization against the total number of miRNA-specific reads, we normalized the miRNA levels against the level of miR-451a used as a reference gene. miR-451a is a miRNA whose biogenesis is not dependent on DICER1 processing (116); therefore, its level should not be affected by DICER1 mutations. As shown in Figure 6C, normalization against miR-451a in samples with hotspot mutations abolished the effects of neither 5p-miRNA decreases nor 3p-miRNA increases. A consistent effect was observed when the analysis was performed on RPM (nonbatch-effects normalized) miRNA levels and when the analysis was narrowed down to the high-confidence miRNAs    Table S5 and Supplementary Figure S5C and D, respectively). This indicates that hotspot mutations, as reported previously (60)(61)(62)65,94), decrease 5p-miRNA levels, but contrary to previous reports, they also increase the levels of numerous 3p-miRNAs. KEGG pathway enrichment analysis (Supplementary Table S6) showed that upregulated 3p-miRNAs, among others, are strongly associated with various cancer-related processes, such as 'TGFbeta signaling pathway', 'Pathways in cancer', 'ErbB signaling pathway' and 'Hippo signaling pathway', which were found among the top ten most significant associations (adjusted P <0.00001). Consistent with the predicted loss-offunction effect of the deleterious mutations, after normalization against miR-451a, the deleterious mutations are associated almost exclusively with a decrease in miRNA levels ( Figure 6C). Although an excess of 5p-miRNAs is observed among the decreased miRNAs, there is also a substantial fraction (n = 52, 26%) of 3p-miRNAs.
Finally, we checked whether the DICER1 hotspot mutations affect the 5 end heterogeneity of 3p-miRNAs and proportions of generated isomiRs. The heterogeneity of many miRNAs has been documented before (117)(118)(119)(120) and it was shown that the proportion of isomiRs may differ substantially in different conditions, including cancer (117,121). To reliably estimate fractions of isomiRs, for analysis we took only 3p-miRNAs (n = 132) undergoing high expression (≥100 RPM) in UCEC. For each of the miRNAs, we compared the level of their isoforms with the most commonly generated 5 end (main miRNAs) versus level of minor (either upstream or downstream) isomiRs. As shown in Figure  6D the mutations do not affect the global level of main miR-NAs, which fraction among the analyzed miRNAs ranged from ∼0.6 to 1 with a median in all cases >0.95. However, in some of the individual miRNAs, the level of major miR-NAs is significantly altered (mostly decreased) in samples with the DICER1 hotspot mutations (Supplementary Table  S8 and Figure 6D (right panel)). Although there is striking concordance between the effects of the mutations in RIIIa and RIIIb (Supplementary Table S8) the overall changes in fractions of the main miRNAs are very small. It is unlikely that such small changes in fractions of main miRNAs may cause any meaningful biological effect, however that may result in more substantial changes in some of the minor isomiRs ( Figure 6D (right panel)).
The lists of miRNAs differentiated in pan-cancer and UCEC with different types of DICER1 mutations are shown in Supplementary Table S5, it includes a substantial number of the high-confidence miRNAs. Among the altered miRNAs are many miRNAs whose function is well recognized in cancer and many upregulated 3p-miRNAs are passengers of such miRNAs, examples of which are listed in Supplementary Table S7. A comparison of the DICER1 mutations with clinical characteristics of cancers did not reveal any significant associations.

DISCUSSION
In this study, we provide a comprehensive pan-cancer analysis of somatic mutagenesis in a panel of 29 miRNA biogenesis genes in 33 adult-onset cancers. In total, we identified 3,649 mutations, composed of 60% missense and 17% deleterious mutations. Overall, ∼21% of the samples had at least one mutation, but the percentage differed substantially between the cancer types, reaching >40% in SKCM or COAD. The most frequently mutated genes were TNRC6A, DICER1, SMAD4 and ZCCHC11, with mutations in 2-3% of pan-cancer samples. The frequency of mutations in the miRNA biogenesis genes is similar to that in cancer-specific drivers, such as MYC, ALK, CACNA1A, POLE, BCL2, NOTCH2, MET, HRAS, FGFR and PIK3R1, with pancancer mutation frequencies up to ∼3% (122). Consistent with the potential cancer-specific role of the genes, some of them are much more frequently mutated in particular cancers, for example, SMAD4 in PAAD (22%), READ (16%), COAD (14%), and ESCA (8%); DICER1 in SKCM and UCEC (∼9%); TNRC6A in STAD (9%) and UCEC (7%); ZCCHC6 in ACC (6.5%); SMAD2 in COAD and READ (∼5%); and PRKRA in OV (4.3%). The frequencies of mutations in some of the genes are specifically and significantly enriched in particular cancer types.
The high frequency of mutations in SMAD4 is consistent with the findings of many previous studies in which SMAD4 was analyzed as a transcription factor playing a role in the activation of many cancer-related genes in response to TGFB/BMP signaling (123)(124)(125)(126). Comprehensive analysis of a large number of samples allowed us to define a profile of SMAD4 mutations in most common human cancers covered by TCGA. It confirmed the high frequency of SMAD4 mutations in cancers such as PAAD, COAD and READ but also revealed a relatively high frequency of the mutations in STAD, LUAD and ESCA and lower frequencies in many other cancers (e.g. CESC, LUSC, HNC and CHOL) that are much less studied in the context of SMAD4 deficiency. With the large collection of mutations, we have identified 8 hotspot residues in the MH2 domain, including some not reported before, and revealed that the proportions of hotspot and deleterious SMAD4 mutations differ substantially between cancers, with the fraction of hotspot mutations significantly enriched in READ (86%) and the lowest in PAAD (27%). This may suggest different mechanisms and effects of SMAD4 inactivation in different cancer types. We showed that most of the hotspots localize in residues of charged AAs, which may affect the electrostatic interactions of the MH2 domain with corresponding domains in R-SMADs (e.g. SMAD2) critical for SMAD complex formation (86). Such an effect of both lossof-charge and gain-of-charge SMAD4 mutations was computationally predicted in a previous study (127). Consistently, we showed that all the SMAD4 hotspots as well as much less frequently mutated hotspot residues in the corresponding MH2 domain in SMAD2 localize at surfaces of the SMAD4:SMAD2 interaction, confirming the role of hotspot mutations in destabilizing the SMAD complex and hence revealing an inhibitory role of these mutations in downstream TGFB/BMP signaling. We showed that both the deleterious and hotspot SMAD4 mutations frequently (∼80%) coincide with gene deletions, which indicates inactivation of the second allele and is consistent with previous observations of frequent LOH in the region (128)(129)(130)(131). Although it was shown that SMAD4 plays a role in the activation of miRNA gene transcription (summarized in (132)), the effect of SMAD4 mutations on miRNA expression has never been tested. We identified numerous miRNAs (predominantly high-confidence miRNAs) differentially expressed in samples with SMAD4 mutations and showed different patterns of miRNA level changes induced by hotspot and deleterious mutations. While hotspot mutations predominantly downregulate miRNAs, deleterious mutations both increase and decrease miRNA levels. This finding is consistent with the results obtained for SMAD4 knockdown by RNAi in hepatic stellate cells (73) and again shows a different effect of deleterious mutations that, as a result of NMD, most likely leads to complete loss of SMAD4 and hotspot mutations that modify SMAD4 structure, affecting its interactions with R-SMADs and other proteins. Therefore, the effect of hotspot mutations may result not only from impeded R-SMADs:SMAD4 complex formation but also from a shifted balance in the binding of competing coactivators and corepressors (such as p300/CBP and Ski or SnoN, respectively), whose contribution determines the outcome of signaling events (reviewed in (125,133)).
Similar effects of predominant downregulation of miRNA levels were observed for p.S464Ter, the most frequent SMAD2 hotspot mutation, as well as for deleterious SMAD2 mutations. The p.S464Ter mutation is localized in the last exon of SMAD2 and truncates the last 5 AAs of the protein but most likely does not activate NMD. As the truncated fragment is important for efficient SMAD2 phosphorylation and includes two serine residues (S465 and S467) whose phosphorylation upon TGFB signaling is critical for SMAD4(SMAD2) 2 complex formation (93,134,135), the mutation precludes the SMAD complex formation and TGFB/BMP signaling (123). We showed that the mutation is associated with decreased survival of cancer patients. Of note, two BRCA patients with the mutations showed a strikingly short OS. Regardless of its role in complexes with SMAD4, SMAD2 (along with other R-SMADs) may directly interact with a specific set of miRNA precursors, accelerating their processing by DROSHA (72,136). This posttranscriptional regulation of miRNA processing may also be affected by SMAD2 mutations, as the MH2 domain of R-SMADs was shown to interact with the P68 RNA helicase participating in the recruitment of DROSHA and DGCR8 to pri-miRNAs (137).
Another gene with characteristic hotspot mutations is DICER1. We found 21 hotspot mutations affecting 4 metalion-binding AA residues in the RIIIb domain (E1705, D1709, D1810 and E1813) and 6 hotspot mutations in one AA residue (S1344) in RIIIa. These hotspots were previously detected and functionally characterized in various pediatric cancers, including cancers associated with DICER1 syndrome (e.g. pleuropulmonary blastoma) and Wilms' tumor, as summarized in (54,64). More recently, they were also investigated in thyroid adenomas (94) and the TCGA cohort, mostly in UCEC samples (65). It was shown that RIIIb hotspot mutations affect the RIIIb cleavage of the 5p-arm of pre-miRNAs, resulting in inefficient production of 5p-miRNAs (60)(61)(62)94). In a very recent study, it was also shown that the hotspot mutations of the S1344 residue, although located in RIIIa, spatially interfere with RIIIb, resulting in the same effect as that observed for the mutations in RIIIb (65). The miRNA profiling performed in our study showed a global decrease in 5p-miRNAs and a global increase in 3p-miRNAs in samples with both hotspot and deleterious DICER1 mutations. A similar effect has been observed before (61,62,65,94,138,139). The global decrease in 5p-miRNAs has been interpreted as the result of inefficient 5-miRNA processing, but the increase in 3p-miRNAs was considered an artifact of miRNA level normalization (against the total number of miRNA reads) that, as a reflection of the global 5p-miRNA deficit, resulted in a relative increase in unaffected 3p-miRNAs (65). To eliminate this potential bias, we normalized the miRNA levels against the level of miR-451a, which is a DICER1-independent miRNA whose level is not affected by DICER1 deficiency (116). However, normalization against the miR-451a level did not eliminate the asymmetrical effect of the hotspot mutations also visible in the high-confidence miRNAs, showing that both the 5p-miRNA decreases and 3p-miRNA increases are real. We have additionally shown that the increased 3p-miRNAs strongly associate with cancer-related terms/pathways and include many miRNAs well recognized in cancer (Supplementary Table S6). The increase in 3p-miRNAs may result from a lack of competition with 5p-miRNAs during transfer to and loading onto the RISC, observed as 5p/3p arm shifting or switching (140). However, this would require some alternative mechanism of releasing 3p-miRNAs from partially processed (nicked only at 3p-arms) pre-miRNAs and transferring them to the RISC, bypassing the miRNA duplex stage. Although it was previously speculated that AGO2 may play some role in such a process (19,20), in our opinion, this step warrants further investigation. Unlike the hotspot mutations, the deleterious mutations (after normalization against miR-451a) almost exclusively decrease miRNA levels, which confirms their loss-of-function nature. Although the effect of deleterious mutations is more profound for 5p-miRNAs, the mutations also affect 3p-miRNAs (∼30%).
With exception of hotspot mutations found in DICER1, we found only a few mutations in other miRNA biogenesis genes, i.e. DROSHA, DGCR8 and XPO5, which have been recently observed in different childhood cancers associated with DICER1 syndrome and in Wilms' tumor (56)(57)(58)(59)66). This may indicate specific functions of the mutations/genes characteristic of childhood cancers but not playing a role in adult-onset cancers. We also excluded the frequent occurrence of specific indel hotspots in TRBP and XPO5 previously reported at high frequencies in cancers associated with MSI (67,68).
In summary, in this study, we present a comprehensive pan-cancer analysis of somatic mutations accumulated in genes involved in miRNA biogenesis and function. We showed that some of these genes are specifically mutated in particular cancers. We identified many hotspot mutations, including some recurring in specific cancer types. We extended knowledge about the types and distribution of SMAD4 mutations and showed their effect on the expression of miRNA genes. We also showed that all hotspot mutations in SMAD4 and SMAD2 affect AA residues located at the surface of the SMAD4:SMAD2 interaction. Moreover, we distinguished and further characterized the effects of deleterious and hotspot missense mutations in DICER1, among others, showing that hotspot mutations in the RI-IIa and RIIIb domains not only decrease the levels of 5p-miRNAs but also increase the levels of 3p-miRNAs. As a result, we have identified numerous miRNAs that are significantly increased or decreased in samples with particular mutation types, including many well-known cancer-related miRNAs. We also linked some of the mutations with the patients' clinical outcomes. Furthermore, we created a compendium of information presented as an atlas and maps of mutations in miRNA biogenesis genes that may be useful resources of information for studying a particular gene or cancer type.

DATA AVAILABILITY
The results published here are based upon data generated by the TCGA Research Network: https://www.cancer.gov/ tcga. The set of in-house Python scripts is available at (https: //github.com/martynaut/mirnaome somatic mutations).