Zhaolei Zhang, Nicholas Carriero, Deyou Zheng, John Karro, Paul M. Harrison, Mark Gerstein; PseudoPipe: an automated pseudogene identification pipeline. Bioinformatics 2006; 22 (12): 1437-1439. doi: 10.1093/bioinformatics/btl116
Motivation: Mammalian genomes contain many ‘genomic fossils’, i.e. pseudogenes. These are disabled copies of functional genes that arose through gene duplication or retrotransposition events and have been retained in the genome. Pseudogenes are important resources for understanding the evolutionary history of genes and genomes.
Results: We have developed a homology-based computational pipeline (‘PseudoPipe’) that can search a mammalian genome and identify pseudogene sequences in a comprehensive and consistent manner. The key steps in the pipeline involve using BLAST to rapidly cross-reference potential ‘parent’ proteins against the intergenic regions of the genome and then processing the resulting ‘raw hits’: eliminating redundant ones, clustering together neighbors, and associating and aligning clusters with a unique parent. Finally, pseudogenes are classified based on a combination of criteria including homology, intron-exon structure, and the existence of stop codons and frameshifts.
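The hit-processing steps described above can be illustrated with a minimal sketch. This is not the PseudoPipe source code: the `Hit` record, the function names, the greedy overlap rule, the `max_gap` value of 60 and the two-way classification rule are all assumptions chosen for illustration, and the alignment of each cluster to its parent protein is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """A hypothetical record for one BLAST hit of a parent protein on the genome."""
    chrom: str
    strand: str
    start: int     # genomic start, inclusive
    end: int       # genomic end, inclusive
    parent: str    # identifier of the query ("parent") protein
    evalue: float  # BLAST E-value; lower is better


def overlaps(a: Hit, b: Hit) -> bool:
    """True if two hits share at least one base on the same chromosome and strand."""
    return (a.chrom, a.strand) == (b.chrom, b.strand) and a.start <= b.end and b.start <= a.end


def remove_redundant(hits: List[Hit]) -> List[Hit]:
    """Assumed rule: visit hits from best to worst E-value and keep a hit
    only if it overlaps none of the hits already kept."""
    kept: List[Hit] = []
    for h in sorted(hits, key=lambda x: x.evalue):
        if not any(overlaps(h, k) for k in kept):
            kept.append(h)
    return kept


def cluster_neighbors(hits: List[Hit], max_gap: int = 60) -> List[Hit]:
    """Merge hits to the same parent, on the same chromosome and strand,
    that lie within max_gap bases of each other (max_gap is an assumed value).
    Hits to other parents lying in between are ignored in this simplification."""
    clusters: List[Hit] = []
    ordered = sorted(hits, key=lambda x: (x.chrom, x.strand, x.parent, x.start))
    for h in ordered:
        last = clusters[-1] if clusters else None
        same_locus = (
            last is not None
            and (last.chrom, last.strand, last.parent) == (h.chrom, h.strand, h.parent)
            and h.start - last.end <= max_gap
        )
        if same_locus:
            # Extend the current cluster and remember its best E-value.
            last.end = max(last.end, h.end)
            last.evalue = min(last.evalue, h.evalue)
        else:
            # Start a new cluster from a copy so the input hits are not mutated.
            clusters.append(Hit(h.chrom, h.strand, h.start, h.end, h.parent, h.evalue))
    return clusters


def classify(retains_introns: bool) -> str:
    """Toy rule (an assumption): a copy that keeps the parent's intron-exon
    structure is called 'duplicated'; an intron-less copy is called 'processed'."""
    return "duplicated" if retains_introns else "processed"


# Made-up example data, for illustration only.
raw_hits = [
    Hit("chr1", "+", 100, 200, "P1", 1e-20),
    Hit("chr1", "+", 150, 220, "P2", 1e-5),    # overlaps a better hit
    Hit("chr1", "+", 230, 300, "P1", 1e-10),   # close neighbor of the first hit
    Hit("chr1", "+", 5000, 5100, "P1", 1e-8),  # distant hit
]
clusters = cluster_neighbors(remove_redundant(raw_hits))
```

In this sketch the redundancy filter runs before clustering, so each genomic position ends up supported by a single parent. A real pipeline would also count stop codons and frameshifts in the cluster-to-parent alignment before classifying.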
Availability: The PseudoPipe program is implemented in Python and can be downloaded at
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact firstname.lastname@example.org