Analysis of the proteins registered in the PIR protein database suggested that most relatively large proteins are associated with important functions in higher multicellular organisms, yet only a small number of large proteins have been registered to date. To establish a protocol for the efficient analysis of cDNA clones encoding large proteins, we constructed a series of strictly size-fractionated cDNA libraries from human brain, in which the average insert sizes of the cDNA clones ranged from 3.3 kb to 10 kb. Hybridization analysis with probes derived from mRNAs of known sizes showed that the libraries with insert sizes of up to at least 7 kb contained clones corresponding to full-length transcripts, in addition to truncated products of longer transcripts, but few chimeric clones. Using one of the fractionated libraries, with an average insert size of 7 kb, we determined single-pass sequences from both ends of randomly sampled clones and searched them against DNA databases. Approximately 90% of the clones were new with respect to their 5′ sequences, whereas their 3′ sequences were frequently similar to registered expressed sequence tags (ESTs). Examination of their protein-coding capacity in an in vitro transcription/translation system showed that about 20% of the clones directed the synthesis of proteins with apparent molecular masses greater than 50 kDa. The set of libraries constructed here should be very useful for accumulating sequence data on large proteins expressed in the human brain.

Author notes

* Communicated by Mituru Takanami