Zinc fingers are the most abundant DNA-binding motifs encoded by eukaryotic genomes and one of the best understood DNA-recognition domains. Each zinc finger typically binds a 3-nt target sequence, and it is possible to engineer zinc-finger arrays (ZFAs) that recognize extended DNA sequences by linking together individual zinc fingers. Engineered zinc-finger proteins have proven to be valuable tools for gene regulation and genome modification because they target specific sites in a genome. Here we describe ZiFDB (Zinc Finger Database; http://bindr.gdcb.iastate.edu/ZiFDB), a web-accessible resource that compiles information on individual zinc fingers and engineered ZFAs. To enhance its utility, ZiFDB is linked to the output from ZiFiT—a software package that assists biologists in finding sites within target genes for engineering zinc-finger proteins. For many molecular biologists, ZiFDB will be particularly valuable for determining if a given ZFA (or portion thereof) has previously been constructed and whether or not it has the requisite DNA-binding activity for their experiments. ZiFDB will also be a valuable resource for those scientists interested in better understanding how zinc-finger proteins recognize target DNA.
The C2H2 zinc-finger motif is approximately 30 amino acids in length and conforms to the pattern (F/Y)-X-C-X2–5-C-X3-(F/Y)-X5-ψ-X2-H-X3–5-H, where X is an amino acid and ψ is a hydrophobic residue (1). The amino acids fold into a ββα structure, which is stabilized by a zinc ion coordinated by the two conserved cysteine and histidine residues. In binding DNA, each finger typically recognizes three adjacent nucleotides: amino acids at positions −1, +3 and +6 (relative to the beginning of the α-helix) contact nucleotides at the 3′-end, middle and 5′-end of the target nucleotide triplet. Zinc fingers can function as monomers, making it possible to link zinc fingers together in extended arrays that recognize unique DNA sequences. Such artificial or engineered zinc-finger arrays (ZFAs) are often fused to effector domains (e.g. transcriptional activation domains or nucleases), and engineered zinc-finger fusion proteins are proving increasingly valuable as reagents for gene regulation and genome modification (2).
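The motif pattern given above maps naturally onto a regular expression. The Java sketch below is illustrative only: the class name `C2H2Motif` is hypothetical, and the residue set used for ψ (A, V, I, L, M, F, W or Y) is an assumption, since the pattern only specifies "a hydrophobic residue".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class C2H2Motif {
    // (F/Y)-X-C-X2-5-C-X3-(F/Y)-X5-psi-X2-H-X3-5-H
    // Assumption: psi is taken to be one of A, V, I, L, M, F, W or Y.
    static final Pattern MOTIF = Pattern.compile(
        "[FY].C.{2,5}C.{3}[FY].{5}[AVILMFWY].{2}H.{3,5}H");

    /** Returns true if the protein sequence contains at least one motif match. */
    static boolean containsMotif(String protein) {
        return MOTIF.matcher(protein).find();
    }

    /** Returns the first segment that matches the motif, or null if none does. */
    static String firstMatch(String protein) {
        Matcher m = MOTIF.matcher(protein);
        return m.find() ? m.group() : null;
    }
}
```

A pattern of this kind only tests conformity to the consensus spacing; it says nothing about whether a matching segment folds or binds DNA.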
A simple approach for engineering ZFAs is known as ‘modular assembly’, wherein individual fingers are joined together to create an array that recognizes a novel target sequence. To facilitate modular assembly, efforts have been made to obtain zinc fingers that recognize all 64 possible triplets. Approaches to create such a finger archive have included parallel selection from random libraries (3–7), rational design (8,9) and characterization of naturally occurring zinc-finger proteins (10). Zinc fingers have thus far been identified that recognize all 16 GNN triplets as well as most CNN, ANN and some TNN triplets. Modular assembly assumes that individual zinc fingers are not influenced by neighboring fingers in an array. However, systematic testing of a large number of ZFAs constructed with available zinc-finger modules resulted in success rates of 56%, 20%, 4% and 0% for 9-bp target sites composed of three, two, one or no GNN triplets, respectively (11). It is clear from this work that zinc fingers are not always modular and that a better understanding of context-dependent effects is needed to increase the reliability of modular assembly.
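The dependence of modular-assembly success on GNN content can be made concrete with a small helper. The following Java sketch is an illustration, not part of ZiFDB: the class and method names are hypothetical, the site is assumed to be written 5′ to 3′, and the percentages are simply the values quoted above from ref. 11.

```java
final class ModularAssemblyOdds {
    /** Counts GNN triplets in a 9-bp target site written 5' to 3'. */
    static int countGnnTriplets(String site) {
        if (site == null || site.length() != 9) {
            throw new IllegalArgumentException("expected a 9-bp target site");
        }
        int count = 0;
        for (int i = 0; i < 9; i += 3) {
            // A triplet is GNN when its first (5') base is G.
            if (Character.toUpperCase(site.charAt(i)) == 'G') {
                count++;
            }
        }
        return count;
    }

    /** Success rate (percent) quoted in the text for 0, 1, 2 or 3 GNN triplets. */
    static int reportedSuccessPercent(int gnnTriplets) {
        int[] percentByGnnCount = {0, 4, 20, 56};
        if (gnnTriplets < 0 || gnnTriplets > 3) {
            throw new IllegalArgumentException("GNN count must be 0-3");
        }
        return percentByGnnCount[gnnTriplets];
    }
}
```

For example, the site GAAGCTTAC contains two GNN triplets (GAA and GCT), which corresponds to the 20% category.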
To minimize the context dependence associated with modular design, sequential selection (12), bipartite selection (13) and bacterial-based selection methods (14,15) have been described. These methods typically involve screening large libraries of ZFA variants. The few arrays in the library that recognize a specific target with high affinity and specificity are identified using phage display or a bacterial two-hybrid system in which target binding activates expression of a reporter gene. An advantage of these selection-based strategies is that context-dependent effects are addressed through the process of selection. Selection-based approaches, however, require considerable molecular biological expertise and are time-consuming. Recently, a selection platform called OPEN (Oligomerized Pool ENgineering) was described that is simpler and faster than traditional selection-based methods and much more effective than modular assembly (11,16). However, even the simplified OPEN strategy remains more labor-intensive and difficult to perform than modular assembly.
Because many scientists are interested in generating zinc-finger proteins for gene regulation and genome modification, we implemented the Zinc Finger Database (ZiFDB), an archive of information about zinc fingers and engineered ZFAs. The database is one of several integrated resources provided by the Zinc Finger Consortium (http://www.zincfingers.org), a group of academic laboratories dedicated to improving methods for engineering ZFAs and developing new applications for their use. The Consortium has previously created a repository of zinc-finger-encoding plasmids that simplify and facilitate modular assembly of ZFAs (17). Consortium labs also developed and validated the OPEN platform for ZFA engineering (11,16). Further, Consortium-supported software, called ZiFiT (Zinc Finger Targeter), allows users to scan DNA sequences to identify potential sites for which ZFAs might be engineered by either modular assembly or OPEN (18). A particularly important feature of ZiFDB is its linkage to ZiFiT, which allows users to readily determine whether other investigators have previously made ZFAs that recognize a target sequence of interest.
ZiFDB currently contains 716 zinc fingers, each of which is assigned a unique numerical designator. When investigators have placed the DNA-recognition helix of an existing finger into a different zinc-finger backbone, the resulting finger is given its own numerical designator, since the backbone may affect zinc-finger function. The database was initially populated with fingers generated and characterized by four different research groups:
Sangamo BioSciences: Sangamo BioSciences designed zinc fingers that recognize GNN triplets at each of the three positions in a three-finger array (7). It is assumed that these fingers function optimally in the position for which they were designed. Sangamo has also generated several zinc fingers that recognize non-GNN triplets, and these are also included in the database (17,19,20).
Barbas laboratory: The Barbas laboratory at the Scripps Research Institute developed a large number of zinc fingers that recognize diverse nucleotide triplets. The fingers are presumed to be modular and function at any position within a ZFA (3,4,6,21).
ToolGen: ToolGen Inc. assembled a collection of zinc fingers derived from naturally occurring zinc-finger proteins encoded in the human genome (10).
Joung laboratory: The Joung laboratory at Massachusetts General Hospital used the OPEN platform to generate a large number of ZFAs (16). All of these ZFAs are three-finger domains that recognize a 9-bp target. Unique zinc fingers comprising ZFAs generated by OPEN are also included in the database.
The following information is provided for each zinc finger:
Triplet target: The sequence is provided for the cognate 3-bp DNA target of each finger.
Recognition helix: The sequence is provided for the seven amino acids that constitute the recognition helix: positions −1 to +6 relative to the start of the α-helix that contacts DNA in the finger.
Position within a three-finger array: As indicated above, some fingers were designed to function optimally at certain positions within a three-finger array. The fingers described by Sangamo BioSciences, for example, were selected at different positions within a three-finger protein, and the recognition helices for the same triplet target at different positions in a three-finger protein usually differ (7). These differences are likely due to position- and context-dependent effects. In addition, the fingers generated by OPEN were selected to function in a specific position within a three-finger protein (16). The database records position information for the fingers described by Sangamo BioSciences and generated by OPEN. By convention, finger 1 (F1) specifies the 3′-triplet in a 9-bp target, F2 specifies the middle triplet and F3 specifies the 5′-triplet.
Source: The name of the group that generated the zinc finger is provided. The groups at present include the Barbas and Joung laboratories, Sangamo BioSciences (SGMO) and ToolGen.
Amino acid sequence: The amino acid sequence of the entire finger is provided.
Article: A link to the relevant citation is provided so the user can get more detailed information about a particular finger.
Experiment: A link directs the user to information describing the method used to generate the finger, relevant functional data and other observations made by the experimenter.
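The position convention described in the list above (F1 binds the 3′-triplet, F2 the middle triplet and F3 the 5′-triplet) is easy to get backwards, so a short sketch may help. The Java class below is hypothetical and assumes the 9-bp target is written 5′ to 3′.

```java
final class SubsiteAssignment {
    /**
     * Splits a 9-bp target (written 5' to 3') into the subsites bound by
     * each finger. The result is indexed by finger: [0] = F1, [1] = F2, [2] = F3.
     */
    static String[] subsitesByFinger(String site) {
        if (site == null || site.length() != 9) {
            throw new IllegalArgumentException("expected a 9-bp target site");
        }
        return new String[] {
            site.substring(6, 9), // F1 binds the 3' triplet
            site.substring(3, 6), // F2 binds the middle triplet
            site.substring(0, 3)  // F3 binds the 5' triplet
        };
    }
}
```

For the target GAAGCTTAC, this assigns TAC to F1, GCT to F2 and GAA to F3.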
At present, ZiFDB houses 652 three-finger arrays collected from the published literature. Although engineered ZFAs of four or more fingers have been reported, the engineering platforms for which the Zinc Finger Consortium has developed reagents are only designed for constructing three-finger arrays. Larger arrays, therefore, are currently not supported by ZiFDB. Each three-finger array in the database has a unique numerical designator, and the following information is provided:
Identification numbers (IDs) of the component fingers: The IDs correspond to the zinc-finger designator and are hyperlinks that allow a user to look up detailed information about individual fingers in an array.
Binding subsites: These sequences correspond to the 3-bp nucleotide target of the corresponding component finger.
Recognition helix: For each finger in an array, the sequence is provided of the seven amino acids that make up the recognition helix.
Article: A link to the relevant citation is provided so the user can get more detailed information about that array.
Experiment: A link is provided to information describing the method used to generate the arrays and relevant functional data or other observations made by the experimenter.
To retrieve information about a particular zinc finger, any combination of three search criteria can be used: triplet target, finger position and/or finger source. When the user provides only some of these criteria, the database returns a list of all fingers matching the input (Figure 1). Clicking the link in the article or experiment column displays detailed information about the related citations or experiments.
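A finger search with optional criteria can be sketched as a simple filter in which an unspecified criterion matches everything. The Java code below is a minimal illustration under stated assumptions: the `Finger` record, its field names and the use of `null` for "not specified" are all hypothetical and do not describe the actual ZiFDB query code.

```java
import java.util.ArrayList;
import java.util.List;

final class FingerSearch {
    /** Minimal stand-in for a finger record; field names are hypothetical. */
    static final class Finger {
        final int id;
        final String triplet;
        final Integer position; // 1-3, or null when no position is recorded
        final String source;

        Finger(int id, String triplet, Integer position, String source) {
            this.id = id;
            this.triplet = triplet;
            this.position = position;
            this.source = source;
        }
    }

    /** Returns fingers matching every non-null criterion; null means "any". */
    static List<Finger> search(List<Finger> fingers, String triplet,
                               Integer position, String source) {
        List<Finger> hits = new ArrayList<>();
        for (Finger f : fingers) {
            boolean matches =
                (triplet == null || triplet.equalsIgnoreCase(f.triplet))
                && (position == null || position.equals(f.position))
                && (source == null || source.equalsIgnoreCase(f.source));
            if (matches) {
                hits.add(f);
            }
        }
        return hits;
    }
}
```

Calling `search(fingers, "GAA", null, null)` would return every finger recorded for the GAA triplet regardless of position or source, which mirrors the partial-information behavior described above.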
When searching for information about an array, the user provides the nucleotide triplets (subsites) recognized by each finger in the array. The database returns not only the arrays recognizing the three input subsites, but also arrays matching any two of the three triplet subsites (Figure 2). This partial information may inform the design of novel ZFAs by providing the user with a two-finger protein that could be modified and tested for function. For arrays that match only two of the input subsites, the matching subsites and their recognition helices are colored in red for easy identification. The output also provides the amino acid sequence of the array, IDs for each finger in the array and IDs for related articles or experiments. By clicking the relevant ID, the user is directed to information on specific fingers, articles or experiments.
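The full-plus-partial matching behavior of the array search can be sketched as follows. This Java example is hypothetical: it assumes that subsites are compared position by position (F1 with F1, and so on) and that full matches are listed before two-of-three matches; neither detail is specified in the text, and the record type is a stand-in.

```java
import java.util.ArrayList;
import java.util.List;

final class ArraySearch {
    /** Minimal stand-in for a three-finger array record (hypothetical fields). */
    static final class ArrayRecord {
        final int id;
        final String[] subsites; // subsites for F1, F2, F3, in that order

        ArrayRecord(int id, String f1, String f2, String f3) {
            this.id = id;
            this.subsites = new String[] { f1, f2, f3 };
        }
    }

    /** Counts finger positions at which the query and the record agree. */
    static int matchingPositions(String[] query, ArrayRecord record) {
        int matches = 0;
        for (int i = 0; i < 3; i++) {
            if (query[i].equalsIgnoreCase(record.subsites[i])) {
                matches++;
            }
        }
        return matches;
    }

    /** Returns full matches first, then records matching exactly two subsites. */
    static List<ArrayRecord> search(List<ArrayRecord> records, String[] query) {
        List<ArrayRecord> full = new ArrayList<>();
        List<ArrayRecord> partial = new ArrayList<>();
        for (ArrayRecord r : records) {
            int n = matchingPositions(query, r);
            if (n == 3) {
                full.add(r);
            } else if (n == 2) {
                partial.add(r);
            }
        }
        full.addAll(partial);
        return full;
    }
}
```

Keeping the per-position match count available also makes it straightforward to highlight which two subsites matched, as the web output does.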
Interface with pre-existing ZiFiT software
To enhance its utility, ZiFDB is interfaced directly with ZiFiT, a web-accessible software package that assists biologists with zinc-finger protein design (http://bindr.gdcb.iastate.edu/ZiFiT/) (16,18). ZiFiT provides users with a list of target sites within their DNA sequence of interest as well as the corresponding Consortium-developed reagents available for engineering the ZFAs. Target site hyperlinks within the ZiFiT output directly query ZiFDB to determine if any previously constructed arrays exist that bind to completely or partially matched target sequences. In addition, ZiFiT users can query ZiFDB for finger information for a specific triplet subsite by clicking on the triplet. Thus, ZiFiT and ZiFDB work synergistically to aid in ZFA design.
ZiFDB stores zinc-finger information as a set of objects defined by Java classes. The core data are held in two main classes (Zinc Finger and Zinc Finger Array). Both classes reference additional Article and Experiment classes. The Article class includes the title, journal, volume, page, year and authors for the article being represented. The Experiment class includes information on the submitter, assay, results and other comments provided by the author or submitter. The authors of articles and experiments are organized using an Author class. Each finger or array object can be associated with multiple Articles or Experiments. Instances of these classes are mapped to a MySQL version 4.0.15 database using Hibernate.
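The object model described above might look roughly like the following plain Java classes. This is a sketch under stated assumptions rather than the ZiFDB source: field names and types, the shared `Entry` base class and the `targetSite()` helper are all hypothetical, and the Hibernate mappings are omitted entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the object model; Hibernate mappings are omitted. */
final class ZiFDBModel {
    static final class Author {
        final String name;
        Author(String name) { this.name = name; }
    }

    static final class Article {
        final String title, journal, volume, page;
        final int year;
        final List<Author> authors = new ArrayList<>();
        Article(String title, String journal, String volume, String page, int year) {
            this.title = title; this.journal = journal;
            this.volume = volume; this.page = page; this.year = year;
        }
    }

    static final class Experiment {
        final String submitter, assay, results, comments;
        Experiment(String submitter, String assay, String results, String comments) {
            this.submitter = submitter; this.assay = assay;
            this.results = results; this.comments = comments;
        }
    }

    /** Assumed base class: fingers and arrays can each cite many articles and experiments. */
    abstract static class Entry {
        final int id; // unique numerical designator
        final List<Article> articles = new ArrayList<>();
        final List<Experiment> experiments = new ArrayList<>();
        Entry(int id) { this.id = id; }
    }

    static final class ZincFinger extends Entry {
        final String tripletTarget, recognitionHelix, source, aminoAcidSequence;
        final Integer position; // 1-3, or null when no position is recorded
        ZincFinger(int id, String tripletTarget, String recognitionHelix,
                   Integer position, String source, String aminoAcidSequence) {
            super(id);
            this.tripletTarget = tripletTarget;
            this.recognitionHelix = recognitionHelix;
            this.position = position;
            this.source = source;
            this.aminoAcidSequence = aminoAcidSequence;
        }
    }

    static final class ZincFingerArray extends Entry {
        final List<ZincFinger> fingers; // F1, F2, F3, in that order
        ZincFingerArray(int id, List<ZincFinger> fingers) {
            super(id);
            if (fingers.size() != 3) {
                throw new IllegalArgumentException("three fingers expected");
            }
            this.fingers = new ArrayList<>(fingers);
        }

        /** Target site written 5' to 3': F3 subsite, then F2, then F1. */
        String targetSite() {
            return fingers.get(2).tripletTarget
                 + fingers.get(1).tripletTarget
                 + fingers.get(0).tripletTarget;
        }
    }
}
```

Holding the article and experiment lists in one place reflects the statement that each finger or array can be associated with multiple Articles or Experiments; a real persistence layer would express these as many-to-many associations.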
Apache Tomcat version 5.5.15 is used as the servlet engine for the web site. The web application uses Struts 1.2 to implement the Model-View-Controller (MVC) pattern. Java Server Pages (JSP) are used to implement the user's view. The web application was developed using Eclipse 3.0.
ZiFDB is one of an increasing repertoire of tools for zinc-finger protein engineering provided by the Zinc Finger Consortium. ZiFDB houses information on individual zinc fingers, including the 141 zinc-finger modules currently made available as a clone library by the Consortium. Importantly, the database also provides users with information on ZFAs and their target sequences, and we believe this feature will be particularly valuable for the rapidly growing number of molecular biologists interested in generating ZFAs for modification or regulation of their genome locus of interest. Further, analysis of the data housed in ZiFDB will also likely provide new insight into how zinc-finger proteins recognize their DNA targets, which, in turn, may lead to additional improvements in ZFA engineering.
Future versions of ZiFDB will allow users to directly input information on novel zinc fingers or ZFAs. To ensure data quality, a shadow database will be created that is associated with the persistent database. Users will be permitted to submit data to the shadow database where it will be held pending approval by the curator and subsequent loading into the persistent database. This feature will ensure that the database capitalizes upon new information generated by the scientific community, including unpublished ZFAs, and thereby should expand the utility of the database.
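The planned submission workflow amounts to a two-stage store in which records move from a pending area to the curated set only on approval. The Java sketch below illustrates that idea with in-memory lists; the class name, method names and the `reject` operation are assumptions, and it does not describe how the shadow database will actually be implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of a shadow store feeding a curated store. */
final class CuratedStore<T> {
    private final List<T> shadow = new ArrayList<>();     // submissions awaiting review
    private final List<T> persistent = new ArrayList<>(); // curator-approved records

    /** User submissions always land in the shadow store first. */
    void submit(T record) {
        shadow.add(record);
    }

    /** Moves a pending record to the persistent store; false if it was not pending. */
    boolean approve(T record) {
        if (!shadow.remove(record)) {
            return false;
        }
        persistent.add(record);
        return true;
    }

    /** Assumption: a curator may also discard a pending record. */
    boolean reject(T record) {
        return shadow.remove(record);
    }

    List<T> pending() {
        return new ArrayList<>(shadow);
    }

    List<T> published() {
        return new ArrayList<>(persistent);
    }
}
```

Because nothing reaches the persistent list except through `approve`, unreviewed submissions can never appear in query results drawn from it.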
The National Science Foundation (0501678 and 0209818 to D.F.V.); National Institutes of Health (GM069906 and GM078369 to J.K.J.). Funding for open access charge: National Science Foundation 0501678.
Conflict of interest statement. None declared.