Motivation: The yeast Saccharomyces cerevisiae was the first eukaryotic organism to have its genome completely sequenced. Since then, several large-scale analyses of the yeast genome have provided extensive functional annotations of individual genes and proteins. One fundamental property of a protein is its subcellular localization, which provides critical information about how the protein functions in a cell. An important project was therefore the creation of the yeast GFP fusion localization database by the University of California, San Francisco, USA (UCSF). This database provides localization data for 75% of the proteins believed to be encoded by the yeast genome. These proteins were classified into 22 distinct subcellular location categories by visual examination. Building on our past success in developing automated systems to classify subcellular location patterns in mammalian cells, we sought to create a similar system for yeast.
Results: We developed computational methods to automatically analyze the images created by the UCSF yeast GFP fusion localization project. The system was trained to recognize the same location categories that were used in that study. We applied the system to 2640 images, and it assigned the same label as the previous visual assignments to 2139 of them (81%). When only the highest confidence assignments were considered, 94.7% agreement was observed. Visual examination of the proteins for which the two approaches disagreed suggests that at least some of the automated assignments may be more accurate. The automated method provides an objective, quantitative and repeatable assignment of protein locations that can be applied to new collections of yeast images (e.g. for different strains or the same strain under different conditions). Notably, this performance was achieved without requiring colocalization with any marker proteins.
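
The agreement figures above are simple proportions, and the following minimal Python sketch illustrates how such numbers could be computed. It is not the authors' code. The function names, the per-image confidence score and the threshold are hypothetical, since the abstract does not state how confidence was defined or where the cutoff was placed.

```python
from typing import Sequence, Tuple


def agreement_rate(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of images whose automated label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    if not predicted:
        raise ValueError("at least one image is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)


def high_confidence_agreement(
    predicted: Sequence[str],
    reference: Sequence[str],
    confidence: Sequence[float],
    threshold: float,
) -> Tuple[float, int]:
    """Agreement restricted to images whose confidence is >= threshold.

    Returns (agreement rate, number of images retained). The confidence
    score and threshold are assumptions made for illustration only.
    """
    if not (len(predicted) == len(reference) == len(confidence)):
        raise ValueError("all sequences must have the same length")
    kept = [
        (p, r)
        for p, r, c in zip(predicted, reference, confidence)
        if c >= threshold
    ]
    if not kept:
        raise ValueError("no images pass the confidence threshold")
    matches = sum(p == r for p, r in kept)
    return matches / len(kept), len(kept)


# Overall agreement reported in the abstract: 2139 of 2640 images.
overall = 2139 / 2640
print(f"{overall:.1%}")  # 81.0%
```

Restricting the comparison to high-confidence images trades coverage for agreement, so the number of retained images should be reported alongside the rate.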