The human DNA ends proteome uncovers an unexpected entanglement of functional pathways

Abstract
DNA ends become exposed in cells as a result of either normal or dysfunctional cellular processes or molecular events. Telomeres need to be protected by the shelterin complex to avoid junctions occurring between chromosomes, whereas failing topoisomerases or the processing of clustered DNA damage may produce double-strand breaks that require swift repair to avoid cell death. The rigorous study of the many proteins involved in the maintenance of DNA integrity is a challenging task because of the innumerable nonspecific electrostatic and/or hydrophobic DNA-protein interactions that arise from the chemical nature of DNA. We devised a technique that discriminates the proteins recruited specifically at DNA ends from those that bind to DNA because of a generic affinity for the double helix. Our study shows that the DNA ends proteome comprises proteins of an unexpectedly wide functional spectrum, ranging from DNA repair to ribosome biogenesis and the cytoskeleton, and including novel proteins of undocumented function. A global mapping of the identified proteome onto published DNA repair protein networks demonstrated the excellent specificity and functional coverage of our purification technique. Finally, the native nucleoprotein complexes that assembled specifically onto DNA ends were shown to be endowed with a highly efficient DNA repair activity.


Sampling scheme
For the data mining described in this work, the set of samples, divided into the two categories, is shown below (sample numbers are categorized by band identity). The protein identification data obtained from all the listed samples populate the published database. Each sample is compared to its corresponding control sample. For example, sample vb96 is the control sample of vb81. Note that this control sample is used twice, as the control experiment for both vb81 and vb85 (two samples that come from comparable experiments performed under the same conditions, but that are not technical replicates).

Mass spectrometry
The raw mass data were converted to XML-formatted data files using the msconvert program of the ProteoWizard software suite running on MS Windows. The output format was mzXML, with TPP compatibility enabled and 64-bit binary encoding precision. All the remaining data processing steps were performed on a Debian GNU/Linux platform (http://www.debian.org) using the following set of software programs: the X!Tandem protein identification software, which uses tandem mass spectrometry data along with the SwissProt protein database restricted to the human proteins; the X!TandemPipeline software, which interfaces with X!Tandem and provides useful features to both filter and group X!Tandem-generated protein identifications (http://pappso.inra.fr/bioinfo/xtandempipeline); the Sqlite3 database (http://www.sqlite.org); and in-house software. Two programs were developed in-house in C++: the first parses the XML-formatted data generated by X!TandemPipeline and injects the mass data into the Sqlite3 database; the second, freeDnaEndsProteome, allows easy data mining of the bio-structural data in that populated database. This software and its detailed user manual are made freely available, along with the database itself (see below).
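To illustrate how a populated database of this kind can be mined, the following minimal sketch stores identifications in an in-memory SQLite database and lists the proteins found in a sample but not in its control. The two-table schema, the function names and the protein names (PROT_A, PROT_B, PROT_C) are hypothetical placeholders, not the schema or content of the published database, and the presence/absence comparison is only one plausible way of comparing a sample to its control. Only the sample names and the sample-to-control pairing come from the Sampling scheme section.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database using a HYPOTHETICAL schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sample (
            name    TEXT PRIMARY KEY,
            control TEXT              -- name of the control sample, NULL for controls
        );
        CREATE TABLE identification (
            sample  TEXT NOT NULL,
            protein TEXT NOT NULL,
            UNIQUE (sample, protein)
        );
        """
    )
    # Pairing taken from the text: vb96 is the control of both vb81 and vb85.
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?)",
        [("vb96", None), ("vb81", "vb96"), ("vb85", "vb96")],
    )
    # Placeholder identifications, for illustration only (not real results).
    conn.executemany(
        "INSERT INTO identification VALUES (?, ?)",
        [
            ("vb96", "PROT_A"),
            ("vb81", "PROT_A"),
            ("vb81", "PROT_B"),
            ("vb85", "PROT_B"),
            ("vb85", "PROT_C"),
        ],
    )
    return conn


def specific_proteins(conn, sample):
    """Return the proteins identified in `sample` but absent from its control."""
    rows = conn.execute(
        """
        SELECT i.protein
        FROM identification AS i
        JOIN sample AS s ON s.name = i.sample
        WHERE i.sample = ?
          AND i.protein NOT IN (
              SELECT c.protein FROM identification AS c
              WHERE c.sample = s.control)
        ORDER BY i.protein
        """,
        (sample,),
    ).fetchall()
    return [row[0] for row in rows]
```

Keeping the control pairing in the sample table lets a single control, such as vb96, serve several samples without duplicating its identifications.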
The X!Tandem database searching program was configured with the following settings: maximum precursor ion charge: 3; fragment mass error: 0.5 Da; minimum ion count: 4; maximum missed cleavage sites: 1; fragment ion types: b and y. A protein was considered identified only if at least two unique peptides matched its sequence.
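The two-peptide acceptance rule stated above can be sketched as follows. This is an illustration only, not the filtering code of X!TandemPipeline: it assumes that "unique" means distinct peptide sequences, and the function name and the (protein, peptide) input format are hypothetical.

```python
from collections import defaultdict

MIN_UNIQUE_PEPTIDES = 2  # threshold stated in the text


def identified_proteins(matches, min_peptides=MIN_UNIQUE_PEPTIDES):
    """Keep the proteins matched by at least `min_peptides` distinct peptides.

    `matches` is an iterable of (protein_accession, peptide_sequence) pairs,
    one pair per peptide-spectrum match (hypothetical input format).
    """
    peptides_by_protein = defaultdict(set)
    for protein, peptide in matches:
        # A set collapses repeated observations of the same peptide sequence.
        peptides_by_protein[protein].add(peptide)
    return sorted(
        protein
        for protein, peptides in peptides_by_protein.items()
        if len(peptides) >= min_peptides
    )
```

With this rule, a protein supported by many spectra of one and the same peptide is still rejected, because only distinct sequences are counted.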
The false discovery rate (FDR) assessment was performed by applying the target-decoy approach at the X!Tandem database search step. The spectra were thus searched twice by X!Tandem, once against the target database and once against a decoy database prepared by reversing the sequence of all the proteins. In our set of data, the FDR peaked at 0.6%; overall, the samples contained at most a hundred identified proteins each.
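The principle of the target-decoy approach can be sketched as follows. This is not the computation performed by X!Tandem or X!TandemPipeline. It assumes one common estimator for separate target and decoy searches, namely the number of accepted decoy hits divided by the number of accepted target hits, and the function names are hypothetical.

```python
def reverse_sequence(sequence):
    """Build a decoy entry by reversing a protein sequence."""
    return sequence[::-1]


def target_decoy_fdr(n_target_hits, n_decoy_hits):
    """Estimate the FDR as decoy hits / target hits (assumed estimator).

    Both counts are the numbers of hits accepted at the same score
    threshold in the target search and in the decoy search.
    """
    if n_target_hits <= 0:
        raise ValueError("at least one target hit is required")
    return n_decoy_hits / n_target_hits
```

For instance, 2 accepted decoy hits against 400 accepted target hits (made-up counts) would give an estimated FDR of 0.5%.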
For the PAI calculation, only the theoretical tryptic peptides in the mass range 800-2500 Da were considered to be "observable".
(B) Scheme representing the production of bibiotinylated (1) and monobiotinylated (2) DNA duplexes. Restriction by Sma I produces two fragments of 85 and 101 bp (3). DNA oligonucleotides shorter than 186 bp (58 bp and 74 bp) were tested initially but did not yield a correct differential between the sets of proteins purified on the monobiotinylated phase and those purified on the bibiotinylated phase.
... or endowed with such ends (DNA ends phase)
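For the PAI calculation mentioned in the Mass spectrometry section, the count of "observable" peptides can be sketched as below. The sketch takes the PAI as the number of observed peptides divided by the number of observable peptides. It rests on several assumptions that are not stated in the text: trypsin cleaves after K or R except before P, no missed cleavage is allowed, and masses are monoisotopic and unmodified. The function names are hypothetical and this is not the code of freeDnaEndsProteome.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.010565


def tryptic_peptides(sequence):
    """Digest in silico: cleave after K or R, but not before P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


def observable_peptides(sequence, low=800.0, high=2500.0):
    """Tryptic peptides whose mass lies in the 800-2500 Da range of the text."""
    return [p for p in tryptic_peptides(sequence) if low <= peptide_mass(p) <= high]


def pai(n_observed, sequence):
    """PAI = observed peptides / observable peptides."""
    n_observable = len(observable_peptides(sequence))
    if n_observable == 0:
        raise ValueError("no observable peptide for this sequence")
    return n_observed / n_observable
```

Restricting the denominator to the 800-2500 Da range keeps very short and very long theoretical peptides from lowering the index of a protein.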

FIGURES AND TABLES
[The scheme only shows the process for the bibiotinylated DNA but is identical for the monobiotinylated DNA.] Biotinylated duplex DNA oligonucleotides are incubated with the streptavidin-coated beads. The beads are magnet-sedimented and the supernatant is recovered (S1). The beads are washed once more (S1'). The pelleted beads are then resuspended and treated with Sma I. Following the endonucleolytic digestion, the supernatant is collected (S2) and the beads are subjected to UV irradiation to detach the remaining DNA material, which is collected in the supernatant (S3). All of the collected supernatants are loaded onto an agarose gel. Lanes S1 and S1' of the agarose gel show that, following incubation of the biotinylated DNA duplex oligonucleotides with the beads, almost all of that material was effectively bound to them. The S2 supernatant contains a significant amount of material for the DNA ends phase; a faint band for the control phase shows that the Sma I restriction of the bibiotinylated oligonucleotides released only a very low amount of DNA material. Following the photocleavage of the material still bound to the beads, the S3 supernatants were recovered and analysed. Lanes S3 show a thick band that contains two DNA species: the 85- and 101-bp fragments expected following digestion of the oligonucleotides with Sma I. The band is more intense for one phase than for the other because the amount of DNA released from the former is twofold the amount released from the latter. Overall, these results show that the bibiotinylated duplex DNA oligonucleotide did indeed loop onto itself, leading to the production of a chromatographic phase effectively devoid of free DNA ends. C: 0.35 µg of undigested control DNA, corresponding to a fifth of the DNA that was loaded onto the beads. MM: 50 bp molecular markers.
Fig. S3: Verification of the potential effects of UV irradiation on the DNA oligonucleotides.
The monobiotinylated duplex DNA oligonucleotide was used to monitor the potential UV-induced DNA damage. The purification process was performed with the usual procedure on a monobiotinylated chromatographic phase, either with (+ +) or without (− +) the nuclear extract. The irradiated control sample was monobiotinylated DNA that had not interacted with either the beads or the proteins (− −). Non-irradiated monobiotinylated DNA was loaded in lane C as a migration control.
Following the purification, the samples were irradiated, the beads were magnet-pelleted and the supernatant underwent the following steps (same treatment for the control sample that migrated in the (− −) lane above):
1. Protein digestion was performed with Proteinase K at 55 °C for 30 min. The mixture was then phenol/chloroform/isoamyl alcohol-extracted and ethanol-precipitated;
2. Treatment with the DNA damage-specific nuclease that nicks at the position of cyclobutane pyrimidine dimers was performed using the T4 PDG endonuclease according to the manufacturer's instructions (New England Biolabs);
3. Treatment with the T7 endonuclease to convert nicks to double-strand breaks was performed by bringing the sample to 10 mM Tris-HCl and 10 mM MgCl2 (pH 8.5);
4. A second protein digestion (Proteinase K) and phenol/chloroform/isoamyl alcohol extraction were performed, followed by ethanol precipitation;
5. Gel electrophoresis was performed on a 1.8% agarose gel.
Fig. S5: Distribution of the proteins differentially distributed into one or more bands after purification with either the control or the DNA ends phase. The proteins accounted for in this diagram were identified at least 3 times over the 6 experiments. Proteins were sorted by their relative abundance in the 720, 480, 240 kDa and common bands, in that order. For each functional category, the bottom histogram corresponds to the same data as the top one, but after the filtering was applied (see Materials and Methods). The names of the proteins that are filtered out are followed by a dash. Band color code (see Fig. 3A of the main text): blue, 720 kDa; red, 480 kDa; yellow, 240 kDa; green, common. (Continued in Fig. S6.)
Legend. chr: chromosome, cyt: cytoskeleton, dnarr: DNA repair & recombination, dnarep: DNA replication, exo: exosome, mit: mitochondrial biogenesis, rib: ribosome biogenesis, spli: spliceosome, trf: transcription factors, trm: transcription machinery, ub: ubiquitin systems, uncl: unclassified.
X5: XRCC5 (Ku80), X6: XRCC6 (Ku70), PK: PRKDC (DNA-PKcs). The asterisk next to the protein name indicates that the protein was found to be part of the core DNA ends proteome, as described in Fig. 3A of the main text. Parenthesized numerical values indicate the standard deviation of the associated percentage value.