Ai-Sheng Xiong, Ri-He Peng, Jing Zhuang, Feng Gao, Yi Li, Zong-Ming Cheng, Quan-Hong Yao, Chemical gene synthesis: strategies, softwares, error corrections, and applications, FEMS Microbiology Reviews, Volume 32, Issue 3, May 2008, Pages 522–540, https://doi.org/10.1111/j.1574-6976.2008.00109.x
Abstract
Chemical synthesis of DNA sequences provides a powerful tool for modifying genes and for studying gene structure, expression and function. Modified genes, and consequently modified proteins and enzymes, can bridge genomics and proteomics research or facilitate commercial applications of gene and protein technologies. In this review, we summarize various strategies, design software and error correction methods for chemical gene synthesis, particularly for the synthesis and assembly of long DNA molecules based on polymerase cycling assembly. We also briefly discuss some of the major applications of chemical synthesis of DNA sequences in basic research and applied areas.
Introduction
Chemical synthesis of DNA sequences offers a highly effective technique to elucidate gene functions and analyze protein–nucleic acid interactions (Beattie et al., 1988; Engels & Uhlmann, 1988). In many cases, chemical synthesis may be the only choice because template DNAs are not readily available. Furthermore, characterization of gene function often requires expression in a heterologous system (Daly & Hearn, 2005; Jana & Deb, 2005; Macauley et al., 2005; Peng et al., 2006a), and in these cases, codon optimization is often necessary in order to achieve a high level of expression (Sharp et al., 1986; Murray et al., 1989; Kurland, 1991; Kane et al., 1995; Xiong et al., 2005, 2006a). In addition, chemical gene synthesis may be preferable to avoid tedious and costly site-directed mutagenesis and subcloning (Gao et al., 2003; Shevchuk et al., 2004; Xiong et al., 2004a).
The design and construction of synthetic genes was a dream of many scientists 30 years ago. A few organic chemists, foremost among them HG Khorana, developed one of the first sets of technologies to synthesize oligonucleotides in the 1960s and early 1970s. The earliest enzymatic gene synthesis, e.g. of tRNA genes, can be traced back to the Khorana group as early as the 1960s (Gupta et al., 1968a, b; Kleppe et al., 1976). In 1976, two small genes, one coding for a regulatory lac operator and another for a tyrosine suppressor, were chemically synthesized and cloned (Heyneker et al., 1976; Kleppe et al., 1976). The first protein-encoding gene, somatostatin, was chemically synthesized in 1977 (Itakura et al., 1977). Subsequently, a large number of protein-encoding genes were chemically synthesized and expressed in Escherichia coli in the 1970s and early 1980s (Goeddel et al., 1979; Tanaka et al., 1982; Ohsuye et al., 1983). These studies also demonstrated that chemical DNA synthesis methods are capable of making biologically functional genes (Itakura & Riggs, 1980). The efficiency of chemical synthesis of DNA sequences was further improved when synthesis of oligonucleotides longer than 100 nt became possible in the mid 1980s (Caruthers et al., 1985). Oligonucleotides were assembled into functional genes through enzymatic ligation (Smith et al., 1982; Edge et al., 1983; Jay et al., 1984; Sproat & Gait, 1985) or the FokI method (Mandecki & Bolling, 1988). However, the DNA sequences synthesized with these early techniques were generally shorter than 1.0 kb (Beattie et al., 1988; Engels & Uhlmann, 1988).
In the 1990s, PCR-based strategies were used to improve chemical synthesis and assembly of DNA molecules. These methods included self-priming PCR (Dillon & Rosen, 1990; Ciccarelli et al., 1991; Prodromou & Pearl, 1992), dual asymmetrical PCR (DA-PCR) (Sandhu et al., 1992), PCR-based assembly (Stemmer et al., 1995) and template-directed ligation (TDL) (Strizhov et al., 1996). PCR is a key component in most of the recently developed methods for gene synthesis, including thermodynamically balanced inside-out (TBIO) synthesis (Gao et al., 2003), two-step total gene synthesis coupling dual asymmetrical PCR with overlap extension PCR (Young & Dong, 2004), PCR-based two-step DNA synthesis (PTDS) (Xiong et al., 2004a), successive extension PCR (Xiong et al., 2004b, 2006a) and microchip-based technology for multiplex gene synthesis (Tian et al., 2004). High-throughput synthesis of oligonucleotides has also become readily achievable with DNA synthesis machines (Rayner et al., 1998; Cheng et al., 2002; Livesay et al., 2002; Pon & Yu, 2004, 2005). Computer-assisted oligonucleotide design has helped to reduce the cost of oligonucleotide synthesis, making chemical synthesis of DNA molecules popular in modern biological research and biotech applications (Rayner et al., 1998; Cheng et al., 2002; Livesay et al., 2002). It is now possible to synthesize complicated genes at reduced cost and with a shorter turnaround time. The lengths of chemically synthesized DNA sequences have also reached 5, 10 and even 32 kb (Cello et al., 2002; Smith et al., 2003; Kodumal et al., 2004; Xiong et al., 2004a, 2006b). Some of the DNA sequences that were synthesized based on polymerase cycling assembly strategies are listed in Table 1. Here, we describe in detail some key synthesis strategies, particularly for the synthesis and assembly of long DNA molecules based on polymerase cycling assembly, as well as oligonucleotide design software, error correction methods and applications of chemical gene synthesis.
Table 1. Some examples of DNA sequences that were synthesized based on polymerase cycling assembly strategies
Name of synthesized DNA fragment | Length | Major methods | Year | References |
A DNA fragment | 254 bp | PCR-based | 1990 | Barnett & Erfle (1990) |
A DNA fragment | 180 bp | PCR-based | 1990 | Barnett & Erfle (1990) |
HIV-2 rev gene | 303 bp | PCR-based | 1990 | Dillon & Rosen (1990) |
rev gene | 393 bp | PCR-based | 1991 | Ciccarelli et al. (1991) |
nef gene | 655 bp | PCR-based | 1991 | Ciccarelli et al. (1991) |
Isozyme c | 924 bp | PCR-based | 1991 | Jayaraman et al. (1991) |
The gene for putative metalloproteinase | 546 bp | PCR-based | 1992 | Ye et al. (1992) |
Human colipase | 297 bp | PCR-mediated | 1992 | Jayaraman & Puccini (1992) |
Porcine colipase | 309 bp | PCR-mediated | 1992 | Jayaraman & Puccini (1992) |
OmpA signal peptide and hirudin | 250 bp | Stepwise elongation of sequence-PCR | 1992 | Majumder et al. (1992) |
High molecular weight multimers of oligonucleotides | 30–100 repeats | PCR-based | 1994 | Hemat & McEntee (1994) |
β-Lactamase-encoding gene | 1.1 kb | Polymerase cycling assembly | 1995 | Stemmer et al. (1995) |
A full-length plasmid | 2.7 kb | Polymerase cycling assembly | 1995 | Stemmer et al. (1995) |
Long primers and single-stranded DNA | 206 bp | Asymmetric PCR | 1996 | Wooddell & Burgess (1996) |
Human interleukin-5 | 350 bp | Two step ligation and PCR | 1997 | Mehta et al. (1997) |
Plasmodium falciparum gene | 2.1 kb | PCR-based | 1999 | Withers et al. (1999) |
cryIA(c) Bt | 1800 bp | Successive extension-PCR | 2001 | Peng et al. (2001) |
Full-length poliovirus cDNA | c. 7.5 kb | Polymerase cycling assembly | 2002 | Cello et al. (2002) |
Modified phytase gene fphy | 1300 bp | Successive extension-PCR | 2002 | Peng et al. (2002) |
Signal peptide sequences MF4I | 400 bp | Successive extension-PCR | 2003 | Xiong et al. (2003) |
PDK1 gene | 1712 bp | TBIO | 2003 | Gao et al. (2003) |
PDK1 gene | 1712 bp | TBC | 2003 | Gao et al. (2003) |
phiX174 bacteriophage genome | 5386 bp | Polymerase cycling assembly | 2003 | Smith et al. (2003) |
A human interleukin-18 | 471 bp | Polymerase cycling assembly | 2003 | Li et al. (2003) |
A human alpha interferon | 500 bp | Polymerase cycling assembly | 2004 | Neves et al. (2004) |
Polyketide synthase gene cluster | 32 kb | Polymerase cycling assembly | 2004 | Kodumal et al. (2004) |
Kinds of DNA fragments | 470 bp to 1.2 kb | Two-step PCR | 2004 | Young & Dong (2004) |
Vip3aI gene | 2382 bp | PTDS | 2004 | Xiong et al. (2004a) |
phyI1s gene | 1350 bp | Successive extension-PCR | 2004 | Xiong et al. (2004a) |
A fusion DNA fragment | 20 kb | Polymerase cycling assembly | 2004 | Shevchuk et al. (2004) |
Chicken anemia virus apoptin gene | 366 bp | Polymerase cycling assembly | 2005 | Chen et al. (2005) |
Modified phytase gene phyA-sh | 1347 bp | Successive extension-PCR | 2005 | Xiong et al. (2005) |
Modified phytase gene mphy2 | 1353 bp | Successive extension-PCR | 2006 | Peng et al. (2006a, b) |
Modified phytase gene phy-pl-wt | 1230 bp | PTDS | 2006 | Xiong et al. (2006a) |
Modified phytase gene phy-pl-sh | 1230 bp | Successive extension-PCR | 2006 | Xiong et al. (2006a) |
Renilla reniformis luciferase gene | 936 bp | One-step PCR based | 2006 | Wu et al. (2006) |
Pur operon | 12 kb | PAS | 2006 | Xiong et al. (2006b) |
phyCs gene | 1074 bp | Polymerase cycling assembly | 2006 | Zou et al. (2006) |
GFP segment 1 | 531 bp | Parallel gene synthesis in a microfluidic device based PCA method | 2007 | Kong et al. (2007) |
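GFP segment 2 | 529 bp | Parallel gene synthesis in a microfluidic device based PCA method | 2007 | Kong et al. (2007) |
hjc gene | 390 bp | Parallel gene synthesis in a microfluidic device based PCA method | 2007 | Kong et al. (2007) |
Randomized alba gene | 327 bp | Parallel gene synthesis in a microfluidic device based PCA method | 2007 | Kong et al. (2007) |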
Manganese peroxidase gene | 1.0–1.5 kb | Modified overlap extension PCR | 2007 | Dong et al. (2007) |
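Laccase gene | 1.0–1.5 kb | Modified overlap extension PCR | 2007 | Dong et al. (2007) |
Cip1 peroxidase gene | 1.0–1.5 kb | Modified overlap extension PCR | 2007 | Dong et al. (2007) |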
Human hepatitis B virus large surface antigen gene | 1245 bp | PTDS | 2007 | Lou et al. (2007) |
OsDREB1BI gene | 657 bp | PTDS | 2007 | Qin et al. (2007) |
hlacz-sh gene | 1533 bp | PTDS | 2007 | Xiong et al. (2007a) |
Materials and methods
PCR-based single-step assembly
A gene can be synthesized from synthetic oligonucleotides, each 30–60 nt in length and with a 6–9 nt overlap with neighboring oligonucleotides, which are then assembled in a single-step PCR. This method, called the PCR-based single-step assembly method, was first used to synthesize a 924-bp gene coding for an isozyme of horseradish peroxidase (Jayaraman et al., 1991). In their procedure, the oligonucleotides were first ligated, and the product, the entire gene, was then PCR amplified using the 5′ and 3′ outermost oligonucleotides as primers.
Stemmer et al. (1995) described a slightly different method for DNA synthesis from oligonucleotides. Their method does not rely on DNA ligase but uses DNA polymerase for PCR assembly of a large number of 40-nt oligonucleotides. A 1.1-kb gene encoding the TEM-1 beta-lactamase was assembled in a single reaction from 56 oligonucleotides, each 40 nt long. Using the same method, the group also synthesized a 2.7-kb plasmid (Fig. 1). Later, Withers et al. (1999) further optimized the PCR assembly method and used it to synthesize a 2.1-kb Plasmodium falciparum gene (pfsub-1).

Fig. 1. Protocol for assembly of a synthetic plasmid (modified from Stemmer et al., 1995). A total of 134 oligonucleotides, collectively encoding both strands of a synthetic plasmid, were synthesized: 132 oligonucleotides of 40 nt, one of 47 nt and one of 56 nt. The overlap between complementary oligonucleotides was 20 nt. The 134 oligonucleotides were combined and assembled in a three-stage PCR. The high molecular mass assembled product was digested with BamHI and ligated.
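As a rough illustration of the oligonucleotide layout described in this caption, the following Python sketch tiles a linear sequence with 40-nt oligonucleotides on alternating strands so that neighbors share a 20-nt overlap. The function names (`reverse_complement`, `design_pca_oligos`) and the toy sequence are hypothetical and are not taken from Stemmer et al. (1995); the sketch ignores circular targets, melting temperatures and the longer end oligonucleotides.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def design_pca_oligos(seq: str, oligo_len: int = 40, overlap: int = 20) -> list[str]:
    """Tile `seq` with oligonucleotides on alternating strands.

    Consecutive oligonucleotides start `oligo_len - overlap` nt apart, so
    each one overlaps its neighbor by `overlap` nt. Even-numbered pieces
    are top-strand, odd-numbered pieces are bottom-strand; all are
    returned written 5'->3'.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    seq = seq.upper()
    step = oligo_len - overlap
    oligos = []
    for index, start in enumerate(range(0, len(seq) - overlap, step)):
        piece = seq[start:start + oligo_len]
        oligos.append(piece if index % 2 == 0 else reverse_complement(piece))
    return oligos


if __name__ == "__main__":
    toy_sequence = "ACGT" * 25  # 100-nt placeholder, not a real gene
    for number, oligo in enumerate(design_pca_oligos(toy_sequence), start=1):
        print(number, len(oligo), oligo)
```

With the default 40-nt length and 20-nt overlap, the top-strand pieces abut each other and each bottom-strand piece bridges two of them, which is the arrangement that lets the polymerase extend the annealed set without a ligase.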
Efficient synthesis of long, accurate DNA sequences has become possible in recent years. Smith et al. (2003) synthesized the full-length phiX174 bacteriophage genome (5386 bp in length) in 14 days using an improved PCR-based overlap extension strategy. Their method improves on the traditional overlap extension-PCR (OE-PCR) technique by introducing a ligation step before the OE-PCR. Fully infectious phiX174 virions were recovered after electroporation into E. coli. The accuracy of the synthetic genomes of several infectious isolates was verified by sequence analysis. Because infectivity was used to select clones of synthetic phiX174 virions, the reported lethal error rate, one per 500 bp, may have been underestimated.
PCR-based two-step DNA synthesis
With a large number of overlapping oligonucleotides, PCR-based two-step DNA synthesis methods have been used to synthesize long genes (Kodumal et al., 2004; Xiong et al., 2004a, b, 2006b; Reisinger et al., 2006). The oligonucleotides used in these methods are designed to cover the entire sequence of both strands of the gene to be synthesized. The full-length sequence is generated progressively in a single PCR reaction by overlap extension, followed by PCR amplification with the two outermost primers (Xiong et al., 2004a; Young & Dong, 2004). One advantage of this approach is its relatively low cost, because the primers do not need to be phosphorylated (Peng et al., 2003, 2006b; Xiong et al., 2004a, b, 2006b), whereas phosphorylation is required in many other DNA synthesis methods (Strizhov et al., 1996; Smith et al., 2003). In those methods, the synthetic oligonucleotides must be phosphorylated before gene assembly: the 5′ ends are phosphorylated either chemically in the final step of oligonucleotide synthesis or enzymatically with a kinase. A kinase such as T4 polynucleotide kinase catalyzes the transfer of the γ-phosphate from ATP to the 5′-terminus of polynucleotides or to mononucleotides bearing a 5′-hydroxyl group (Berkner & Folk, 1977; Sambrook & Russell, 2001).
Young & Dong (2004) combined dual asymmetrical PCR (DA-PCR) with overlap extension PCR (OE-PCR) and eliminated the need to optimize reaction conditions. In the initial DA-PCR, only four oligonucleotides are mixed in each tube, which considerably reduces the problem of nonspecific annealing. After the DA-PCRs, the resulting adjacent fragments overlap each other, and the entire sequence is then amplified by OE-PCR (Fig. 2). One advantage of this approach is its relatively low cost, because neither phosphorylation nor gel purification of the primers is needed. Another advantage is that the shorter oligonucleotides (<25 nt) used in the gene synthesis can also serve directly as sequencing primers, eliminating the need for additional sequencing primers and further reducing the overall cost. With the introduction of T7 endonuclease-mediated cleavage of the heteroduplexes that result from mutations, the method also reduces the proportion of mutated products. This is one of the fastest gene synthesis methods, with assembly, cloning and sequence verification all achieved in less than a week. Because only one reaction condition is required, the method is also easily amenable to automation.

Fig. 2. Schematic diagram of the two-step PCR gene synthesis method (modified from Young & Dong, 2004). The target DNA is dissected into oligonucleotides of between 25 and 50 nt in length. Each set of four adjacent primers is mixed in a separate tube. After dual asymmetrical PCR, fragments adjacent to each other are joined together up to 90 nt, and the terminal fragments can be easily extended to full length in the overlap extension PCR step by using the 5′ and 3′ outermost primers.
Xiong et al. (2004a) described another PCR-based two-step DNA synthesis (PTDS) method for gene synthesis (Fig. 3). The protocol involves two key steps: synthesis of individual fragments (c. 500 bp in length) and assembly of the individual fragments into the complete gene. They compared the PTDS method directly with several previously published methods with regard to error rates, costs and DNA product quality. The entire PTDS process can be completed in 5–7 days at a low cost (about US $712.4 for 2370 bp) and with a low error rate (an average of 0.12%). Using this method, they have synthesized the 657-bp rice transcription factor gene OsDREB1B (Qin et al., 2007), the 1230-bp Peniophora lycii phytase gene (Xiong et al., 2006b), the 1245-bp HBV large surface antigen gene PRS-S1S2S (Lou et al., 2007), the 2382-bp vip3aI gene and the 5367-bp CrtEBWY gene (Xiong et al., 2004a). The protocol should also be suitable for the synthesis of genes with a high G+C content, repetitive sequences or complex secondary structures.
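To put the reported cost and error figures in perspective, the short Python sketch below converts them into a cost per base pair and an expected number of errors per clone. It assumes that the 0.12% error rate is a per-base rate and that errors occur independently (a Poisson approximation); both assumptions, and the function names, are illustrative and are not stated in the source.

```python
import math


def cost_per_bp(total_cost_usd: float, length_bp: int) -> float:
    """Average synthesis cost per base pair."""
    return total_cost_usd / length_bp


def expected_errors(error_rate_per_base: float, length_bp: int) -> float:
    """Expected number of errors in one clone of the given length."""
    return error_rate_per_base * length_bp


def fraction_error_free(error_rate_per_base: float, length_bp: int) -> float:
    """Fraction of clones with no error, assuming independent (Poisson) errors."""
    return math.exp(-expected_errors(error_rate_per_base, length_bp))


if __name__ == "__main__":
    length = 2370        # bp, from the text
    cost = 712.4         # US dollars, from the text
    rate = 0.12 / 100    # assumed to be errors per base

    print(f"cost per bp: ${cost_per_bp(cost, length):.2f}")
    print(f"expected errors per clone: {expected_errors(rate, length):.2f}")
    print(f"error-free fraction: {fraction_error_free(rate, length):.3f}")
```

Under these assumptions a 2370-bp clone carries roughly three errors on average, which is one way to see why sequencing several clones and applying an error correction step remain part of such protocols.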

Fig. 3. The PCR-based two-step DNA synthesis (PTDS) strategy used for gene synthesis (modified from Xiong et al., 2004a). All 60-nt oligonucleotides are divided into an appropriate number of groups of 12 oligonucleotides, and each group is used to synthesize one 400–500-bp DNA fragment. For each group, 1.5 pmol of each of the inner oligonucleotides and 30 pmol of each of the two outer oligonucleotides are combined to synthesize the 400–500-bp DNA block. To obtain the full-length gene, the products of the first PCR reactions are mixed and used as the template for the second PCR reaction, with the two outermost oligonucleotides as primers. Approximately equal amounts of each first-PCR product should be used in order to obtain high-quality full-length DNA from the second PCR.
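The grouping and primer amounts given in this caption can be sketched in Python as follows. This is a minimal illustration rather than the authors' software: the function name `ptds_first_round` and the oligonucleotide labels are hypothetical, and the sketch simply splits an ordered list into consecutive groups without modeling how neighboring blocks overlap for the second PCR.

```python
def ptds_first_round(oligo_names, group_size=12, inner_pmol=1.5, outer_pmol=30.0):
    """Plan the first-round PCRs: one reaction per group of oligonucleotides.

    Returns a list of dicts mapping oligonucleotide name -> amount in pmol.
    The first and last oligonucleotide of each group act as the outer
    primers and receive `outer_pmol`; all others receive `inner_pmol`.
    """
    reactions = []
    for start in range(0, len(oligo_names), group_size):
        group = oligo_names[start:start + group_size]
        amounts = {}
        for position, name in enumerate(group):
            is_outer = position == 0 or position == len(group) - 1
            amounts[name] = outer_pmol if is_outer else inner_pmol
        reactions.append(amounts)
    return reactions


if __name__ == "__main__":
    names = [f"P{i}" for i in range(1, 25)]  # 24 placeholder oligonucleotides
    for number, reaction in enumerate(ptds_first_round(names), start=1):
        print(f"first-round PCR {number}: {reaction}")
```

Keeping the inner oligonucleotides at a low amount and the two outer ones in excess biases each reaction towards amplifying the full block rather than partial intermediates.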
An improved PCR-based accurate synthesis (PAS) method has been used to synthesize the pullulanase gene (pula, 2766 bp) and a 12-kb DNA fragment of the Bacillus subtilis pur operon (Xiong et al., 2006b). The protocol involves five steps: (1) design of oligonucleotides to cover the entire DNA sequence, oligonucleotide synthesis and quality control, (2) a first PCR to synthesize DNA fragments, (3) a second PCR to assemble the products of the first PCR into the full-length DNA sequence, (4) cloning and verification of the synthetic DNA by sequencing and (5) error correction using an OE-PCR technique. The drawback of the PAS protocol is that preparing oligonucleotides with PAGE purification is costly and labor intensive.
Another two-step PCR-based gene synthesis method is the thermodynamically balanced inside-out (TBIO) method (Gao et al., 2003). The TBIO method involves five key steps. First, the TBIO primers are designed and synthesized: overlapping sense-strand primers code for the N-terminal half of the gene sequence, and overlapping antisense-strand primers code for the C-terminal half. Second, four to six pairs of TBIO primers (c. 60 nt) are used for template-free inside-out bidirectional elongation PCR until the initial DNA fragment is generated. Third, the resulting DNA fragment (0.4–0.5 kb), corresponding to the full length of the DNA to be elongated, is gel-purified. Fourth, the gel-purified ‘inside’ fragment is added to a final concentration of c. 40–60 nM and used as the template for further inside-out bidirectional elongation with the next set of ‘outside’ primer pairs; this cycle of elongation and gel purification is repeated until the full-length target sequence is obtained. Finally, the fully elongated synthetic gene is gel-purified and ligated into a plasmid vector for DNA sequencing (Fig. 4). Gao et al. (2003) compared the TBIO method with the thermodynamically balanced conventional (TBC) method for the synthesis of the human protein kinase genes PKB2, S6K1 and PDK1. Among the 15 genes sequenced, the error rate of the TBIO PCR-based gene synthesis method ranged from 0 to 0.3%, lower than that reported for many other methods (0.1–1%) (Hoover & Lubkowski, 2002; Smith et al., 2003; Xiong et al., 2004a; Binkowski et al., 2005).

Fig. 4. The TBIO strategy for synthesizing a gene (modified from Gao et al., 2003). The sense and antisense primers are marked in red and blue, respectively. The initial ‘inside’ double-stranded DNA fragment is generated from the first five pairs of the TBIO primer set. This fragment is gel-purified and used as the template for inside-to-outside bidirectional elongation with the next five pairs of outside primers. PCR generates the fully extended and amplified DNA fragment in one step, which is gel-purified and used as the template for bidirectional elongation with the next set of outside primer pairs.
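The inside-out order in which primer pairs are used can be sketched as a simple schedule. The Python example below is only an illustration under stated assumptions: it assumes equal numbers of sense and antisense primers, each listed from the centre of the gene outwards, and the function name `tbio_rounds` and the primer labels are hypothetical rather than taken from Gao et al. (2003).

```python
def tbio_rounds(sense_primers, antisense_primers, pairs_per_round=5):
    """Group TBIO primer pairs into successive elongation rounds.

    Both lists must be ordered from the innermost primer (nearest the
    centre of the gene) to the outermost. Round 1 uses the innermost
    pairs without a template; each later round uses the gel-purified
    product of the previous round as its template.
    """
    if len(sense_primers) != len(antisense_primers):
        raise ValueError("expected the same number of sense and antisense primers")
    pairs = list(zip(sense_primers, antisense_primers))
    return [pairs[i:i + pairs_per_round] for i in range(0, len(pairs), pairs_per_round)]


if __name__ == "__main__":
    sense = [f"S{i}" for i in range(1, 13)]      # S1 is innermost
    antisense = [f"A{i}" for i in range(1, 13)]  # A1 is innermost
    for number, pairs in enumerate(tbio_rounds(sense, antisense), start=1):
        template = "none (template-free)" if number == 1 else f"product of round {number - 1}"
        print(f"round {number}: template = {template}; primer pairs = {pairs}")
```

The `pairs_per_round` default of five follows the figure caption; the text gives a range of four to six pairs per round, so the value would be chosen per gene.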
Successive extension PCR
Another strategy, named successive extension PCR, has been developed for gene synthesis (Xiong et al., 2003, 2004b, 2005; Peng et al., 2003, 2006a). Using this method, Peng et al. (2003) synthesized the Bt cryIA(c) gene with 26 oligonucleotides. Each oligonucleotide was about 90 nt long, with a 20-nt overlap between neighboring primers. Primers 1–13, located at the 5′ end of the template, matched the sequence of the Bt cryIA(c) gene, while primers 14–26, at the 3′ end, matched the target DNA sequence. When all primers were combined in one PCR reaction in equal amounts, the yield of the target product was very low, and smeared DNA bands were sometimes observed when the DNA fragment being synthesized was longer than 1.0 kb (Peng et al., 2003; Xiong et al., 2004a). These problems could be prevented with a minor modification: the entire DNA sequence was assembled in a PCR reaction using a low amount (1.5 pmol) of each inner primer and a high amount (30 pmol) of each of the two outermost primers. A second PCR reaction was then performed with the two outermost oligonucleotides as primers to enrich the full-length DNA (Fig. 5b; Xiong et al., 2004a).

The successive extension PCR strategy for synthesizing a gene. (a) The one-step successive extension PCR strategy for synthesizing a gene (modified from Peng et al., 2003; Xiong et al., 2004b). (b) The modified two-step successive extension PCR strategy for synthesizing a gene (modified from Xiong et al., 2004a).
Simplified gene synthesis
Conventionally, most PCR-based gene synthesis uses two successive PCR reactions (Gao et al., 2003; Xiong et al., 2004a; Young & Dong, 2004). Recently, Wu et al. (2006) synthesized three genes with a simplified method that combines these two PCR steps into one (Fig. 6). A single solution containing all oligonucleotides (0.4 μM each) was used for a one-step PCR reaction for gene assembly. They found that the efficiency of this one-step method, named simplified gene synthesis (SGS), is affected by multiple parameters of the PCR reaction. In particular, the choice of polymerase is the most critical for successful one-step assembly. Other important factors include the concentrations of oligonucleotides and amplification primers. Under optimal conditions, the SGS method can be used to synthesize a DNA sequence with high fidelity and may be further optimized towards complete automation of gene synthesis (Wu et al., 2006).

The simplified gene synthesis (SGS) method based on Wu et al. (2006). Each oligonucleotide to be assembled (40 nt, with an overlapping region of 18–20 nt) is represented as an arrow. The primers used for amplification are denoted as dotted arrows.
Strizhov et al. (1996) synthesized a 1907-bp cryIC gene by ligation of oligonucleotide modules using another simple strategy called the template-directed ligation-PCR method (TDL-PCR). The method uses single-stranded DNA derived from a wild-type gene as a template (Fig. 7) and requires the synthesis of oligonucleotides for only one strand, in contrast to other methods that require the synthesis of both strands. Thermostable Pfu DNA ligase was used to perform thermal cycling for the assembly, selection and ligation of full-length DNA molecules as well as for linear amplification of the TDL products. In combination with chemical phosphorylation, the TDL method provides a sequence-specific selection for phosphorylated full-length oligonucleotides from a complex mixture of nonphosphorylated products, and yields synthetic cryIC DNA segments generated by ligation (Strizhov et al., 1996). The major advantage of the TDL-PCR strategy is that only one strand of the target DNA needs to be synthesized; consequently, the cost is much lower than that of strategies requiring the synthesis of both strands. The limitation of this method, however, is that it requires template DNA from a wild-type gene.

The template-directed ligation-PCR (TDL-PCR) method for gene synthesis modification (modified from Strizhov et al., 1996). This method uses oligonucleotides that are only partially homologous to the single-strand template; when ligated, they can produce a DNA molecule that differs from the original template. For example, if the original double-stranded sequence is (plus strand) 5′AAAATTTT3′/(minus strand) 3′TTTTAAAA5′ and the partially homologous primer is 5′AAAGTTTT3′, the final synthesized (designed) double-stranded sequence becomes (plus strand) 5′AAAGTTTT3′/(minus strand) 3′TTTCAAAA5′.
Synthons and ligation by selection
Cello et al. (2002) reported that full-length poliovirus cDNA can be synthesized by assembling oligonucleotides of plus and minus strand polarity. The strategy used is to assemble a full-length cDNA from three large, overlapping DNA fragments. First, each DNA fragment was obtained by combining overlapping segments of 400 to 600 bp. The DNA segments were synthesized by assembling purified oligonucleotides (c. 69 nt) with overlapping complementary sequences at their termini. The segments were then ligated into a plasmid vector and assembled stepwise to yield full-length cDNA via common unique restriction endonuclease cleavage sites. Their results show that it was possible to synthesize an organism by in vitro chemical–biochemical means solely by following instructions from a written sequence.
Kodumal et al. (2004) developed and implemented a strategy for high-throughput synthesis of long, accurate DNA sequences. Unpurified 40-nt synthetic oligonucleotides are assembled into 500–800 bp ‘synthons’ with low errors by automated PCR-based gene synthesis. These synthons are then efficiently joined into multisynthon segments of c. 5 kb using three endonucleases, in a process dubbed ‘ligation by selection’ (LBS). These large segments can be subsequently assembled into a very long sequence by conventional cloning (Fig. 8). They validated the approach by building a synthetic 31 656-bp polyketide synthase gene cluster whose functionality was demonstrated by its ability to produce the mega-enzyme and its polyketide products in E. coli (Kodumal et al., 2004). The advantage of this method, which combines PCR and ligation, is that smaller synthons first assembled from oligonucleotides can be easily verified by DNA sequencing reactions and can be further assembled into 5-kb DNA segments in about 2 weeks. The 31.7-kb polyketide synthase gene cluster is the largest chemically synthesized DNA segment reported so far (Reisinger et al., 2006).

The strategy of combined PCR and ligation (synthons and ligation by selection) to synthesize a contiguous 32-kb polyketide synthase gene cluster (modified from Kodumal et al., 2004).
Vendors for gene synthesis
Chemical synthesis of genes can be carried out by commercial vendors, such as Slonomics™ (Sloning Biotechnology GmbH, Puchheim, Germany, http://www.sloning.de), GeneMaker (Blue Heron Bio Company, Bothell WA, USA, http://www.blueheronbio.com), Generay (Generay Biotech, Shanghai, China, http://www.generay.com.cn) and Sangon (Sangon Biotech, Shanghai, China, http://www.sangon.com). The Blue Heron technology is based on a solid-phase support strategy and enables automation (Mulligan et al., 2002, 2007; Parker & Mulligan, 2003; Mulligan & Tabone 2003, 2006). The GeneMaker is a fully automated, high-throughput gene synthesis platform (Stewart et al., 2002; Ball et al., 2004; Herrera et al., 2005; Schmidt et al., 2006; Bugl et al., 2007). Sloning Building Block technology, in the trade name of Slonomics™, is another advanced method based on a ligation-based strategy for chemical gene synthesis (Schatz & O'Connell, 2003; Schatz et al., 2004, 2006). Generay and Sangon use PCR-based technologies for gene synthesis.
Designing softwares
Designing oligonucleotides for synthesis of long DNA sequences can be a time-consuming, difficult and confusing process because many factors, such as codon usage, GC content, restriction enzyme sites and secondary structures, need to be considered. User-defined sequences, such as restriction enzyme sites, should also be included to facilitate subsequent cloning and other manipulations. Concomitant with progress in gene-synthesis technologies, computational software has been developed to optimize codons, to incorporate or eliminate restriction sites and secondary structures and to allow modular exchange of segments. There is also substantial interest in implementing bioinformatics tools for designing oligonucleotides. Several gene design software programs are currently available.
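Two of the checks mentioned above, GC content and the presence of restriction sites, are simple enough to sketch directly. The Python below is a minimal illustration and not taken from any of the programs reviewed here; the function names and the three-enzyme table are assumptions chosen for the example. Because the three recognition sequences listed are palindromic, scanning one strand is sufficient for them.

```python
# Recognition sequences of three commonly used enzymes (all palindromic).
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_sites(seq: str, sites: dict[str, str] = RESTRICTION_SITES) -> dict[str, list[int]]:
    """Map each enzyme name to the 0-based positions where its site occurs."""
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        positions = []
        pos = seq.find(site)
        while pos != -1:
            positions.append(pos)
            pos = seq.find(site, pos + 1)  # allow overlapping occurrences
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    design = "ATGGAATTCGGATCCTAA"  # made-up sequence for illustration
    print(gc_content(design))
    print(find_sites(design))
```

A design tool would typically run such a scan after every codon change, since a synonymous substitution can create or destroy a site.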
Dnaworks
The dnaworks program (http://mcl1.ncifcrf.gov/lubkowski.html), which automates the process of oligonucleotide design for synthetic gene construction, requires simple input information, i.e. the amino acid sequence of the target protein and melting temperature (needed for the gene assembly) of synthetic oligonucleotides. The program outputs a series of oligonucleotide sequences with optimized codons for expression in an organism of choice. Those oligonucleotides are characterized by highly homogeneous melting temperatures and a minimized tendency for hairpin formation. dnaworks provides an automated method for designing oligonucleotides for PCR-based gene synthesis (Hoover & Lubkowski, 2002).
Gems
gems, or the gene morphing system (http://software.kosan.com/GeMS), is another advanced, user-friendly software package (Jayaraj et al., 2005). This software has broad utility in the design of synthetic genes by PCR assembly of short oligonucleotides. The software comprises a composite suite of programs, and is also provided as a stand-alone tool that automatically performs many tasks in designing a gene, including restriction site prediction, codon optimization for expression in a specific host, inclusion or exclusion of restriction sites, separation of a long gene into synthesizable fragments, Tm and stem loop determinations, optimal oligonucleotide component design and design verification/error-checking. The user interface also accommodates inexperienced users, with explanatory notes provided. The software has been used to design and successfully synthesize over 400 genes, many of which exceeded 5 kb in length, and the longest one was 32 kb (Kodumal et al., 2004). gems can be used to design synthetic genes to be made by the PCR assembly of short oligonucleotides and should also be adaptable to ligation methods. gems offers automatic spacing or manual placement of unique or redundant restriction sites along the gene sequence (Jayaraj et al., 2005).
Gene2Oligo
Rouillard et al. (2004) developed gene2oligo (http://berry.engin.umich.edu/gene2oligo/), a web-based tool that divides a long input DNA sequence into a set of adjacent oligonucleotides representing both DNA strands. The length of the oligonucleotides is dynamically optimized to ensure both the specificity and the uniform melting temperatures necessary for in vitro gene synthesis. Rouillard's group has successfully designed over 30 kb of synthetic DNA with help from the software (Rouillard et al., 2004). gene2oligo can perform most, if not all, functions needed for gene design.
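The idea of letting segment length vary so that melting temperatures stay uniform can be illustrated with a deliberately crude sketch. The code below is not the gene2oligo algorithm: it uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough estimate intended for short oligonucleotides, as a stand-in for a proper thermodynamic model, and a simple greedy cut. The names `wallace_tm` and `segment_by_tm` and the 60 °C target are assumptions made for this example.

```python
def wallace_tm(seq: str) -> int:
    """Rough melting temperature estimate (Wallace rule), in degrees Celsius."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def segment_by_tm(seq: str, target_tm: int = 60) -> list[str]:
    """Greedily cut seq into adjacent segments whose estimated Tm reaches target_tm.

    GC-rich regions yield shorter segments and AT-rich regions longer ones,
    so segment lengths vary while the estimated Tm stays within a few
    degrees of the target. A trailing remainder may fall below the target.
    """
    segments = []
    start = 0
    for end in range(1, len(seq) + 1):
        if wallace_tm(seq[start:end]) >= target_tm:
            segments.append(seq[start:end])
            start = end
    if start < len(seq):
        segments.append(seq[start:])
    return segments


if __name__ == "__main__":
    for piece in segment_by_tm("A" * 30 + "G" * 15):
        print(len(piece), wallace_tm(piece))
```

Because each added base raises the estimate by 2 or 4, every complete segment ends within 3 °C above the target, which is the sense in which the cut is "balanced" in this toy model.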
Vector nti
vector nti, a freeware program for academic users, allows scientists to study and analyze biological molecules (http://www.invitrogen.com/). It has a centralized database and five application modules: vector nti, alignx, bioannotator, contig express and genom bench (Lu & Moriyama, 2004). The software provides a number of tools for the construction and manipulation of DNA sequences, including primer design for chemical gene synthesis. vector nti advance is a more robust and highly integrated application available in the market for desktop sequence analysis and molecular biology data management, with robust data management capabilities, unique genomic sequence analysis features, superior graphics, nonproprietary sequence file formats and excellent professional technical support (Gorelenkov et al., 2001; Zhang et al., 2006, 2007). Although some improvements are necessary, vector nti appears to be a well-balanced integrated software package (Lu & Moriyama, 2004; Tippmann et al., 2004).
Others
Richardson et al. (2006) described another set of web-based programs (http://slam.bs.jhmi.edu/gd), genedesign, for optimization of protein expression and/or redesign of a gene of interest for detailed structural/functional studies (e.g., mutagenesis). genedesign combines many modules to provide a platform for the design of large genes for rapid synthesis. Rydzanicz et al. (2005) described a computer program, assembly pcr oligo maker (http://publish.yorku.ca/~pjohnson/AssemblyPCRoligomaker.html), for automatic design of oligonucleotides for the PCR-based construction of long DNA molecules. Andersson et al. (2005) developed a method, implemented in the software dualprime (http://www.biotech.kth.se/molbio/microarray/), which reduces the number of primers required to amplify genes from two different genomes. dualprime identifies regions of high sequence similarity and designs PCR primers shared between the genomes in these regions, such that either one or, preferentially, both primers in a given PCR can be used for amplification from both genomes.
Chemical synthesis of genes or genomes in vitro requires a user-friendly, easy and powerful software package for design. dnaworks is an automated method for designing oligonucleotides for PCR-based gene synthesis (Hoover & Lubkowski, 2002). gems is a user-friendly, advanced software package for designing synthetic genes, which is suitable for the PCR-mediated assembly of short oligonucleotides and should also be adaptable to ligation methods. gems offers automatic spacing or manual placement of unique or redundant restriction sites along the gene sequence (Jayaraj et al., 2005). gene2oligo can perform all of the functions needed for gene design in a directed, step-wise manner (Rouillard et al., 2004). vector nti is a robust and highly integrated application for sequence analysis and molecular biology data management but it takes time for users to become skilled (Lu & Moriyama, 2004; Tippmann et al., 2004).
Error corrections
Current oligonucleotide synthesis technologies have a tendency to produce oligonucleotides that are either prematurely terminated or, more detrimentally, contain internal deletions in the sequence. Accurate chemical synthesis of a DNA sequence also depends on precise DNA amplification by DNA polymerases (Cline et al., 1996; Andre et al., 1997). Error rates of 1–10 errors per kb of DNA have been reported frequently, and the error frequency increases as the length of an oligonucleotide increases (Hoover & Lubkowski, 2002; Smith et al., 2003; Xiong et al., 2004a; Binkowski et al., 2005). High error rates are a substantial obstacle to fast, ultra-low-cost gene synthesis. Verification of synthetic DNA sequences and subsequent correction of errors are expensive and also delay turnaround. Several error correction strategies for chemical gene synthesis have been developed.
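To see why error rates in this range matter, it helps to estimate how many clones are expected to be error-free. The following Python sketch assumes, purely for illustration, that errors occur independently and uniformly along the sequence, so that the error-free fraction is (1 − e)^L for a per-base error rate e and length L. Real error distributions need not follow this model, and the function names are hypothetical.

```python
def error_free_fraction(length_bp: int, errors_per_kb: float) -> float:
    """Probability that a molecule of length_bp carries no error.

    Assumes independent errors at a constant per-base rate.
    """
    per_base = errors_per_kb / 1000.0
    return (1.0 - per_base) ** length_bp


def expected_clones_to_screen(length_bp: int, errors_per_kb: float) -> float:
    """Expected number of clones sequenced until one error-free clone is found."""
    return 1.0 / error_free_fraction(length_bp, errors_per_kb)


if __name__ == "__main__":
    for rate in (1, 3, 10):  # errors per kb, spanning the range quoted above
        frac = error_free_fraction(1000, rate)
        print(f"{rate} errors/kb, 1-kb gene: {frac:.2e} error-free")
```

Under this model the error-free fraction falls exponentially with both length and error rate, which is why even a modest reduction in errors per kb has a large effect on the sequencing effort for long genes.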
Oligonucleotide purification and their lengths
The success of chemical gene synthesis largely depends on the quality and purity of the oligonucleotides. Errors in oligonucleotides may produce undesirable or detrimental mutations, increase costs because error corrections are expensive and prolong the delivery time (Carr et al., 2004; Xiong et al., 2004a, 2006b; Binkowski et al., 2005). Use of polyacrylamide gel electrophoresis (PAGE, denatured, 7 M, with urea) to purify oligonucleotides can reduce error rates in the final products several fold because insertion or deletion mutations can be discarded (Ausubel et al., 1995; Sambrook & Russell, 2001; Xiong et al., 2006b). However, one needs to note that PAGE purification cannot eliminate mutant species with nucleotide substitutions, and is less effective in identifying species with one nucleotide insertion or deletion (Young & Dong, 2004). Also, purification of oligonucleotides with PAGE is costly and labor intensive.
The length of the oligonucleotides used for assembly is an important factor that influences error rates of the final products of DNA synthesis. The lengths of oligonucleotides vary from 40 bp (Stemmer et al., 1995; Wu et al., 2006), 42 bp (Smith et al., 2003), 60 bp (Xiong et al., 2004a, 2006b), 90 bp (Xiong et al., 2004b, 2005), and even over 100 bp (Ciccarelli et al., 1990; Dillon & Rosen, 1990; Ye et al., 1992; Strizhov et al., 1996; Li et al., 2003). It is conceivable that shorter oligonucleotides should have fewer carry-over errors (Xiong et al., 2004a; Young & Dong, 2004) but are more expensive to synthesize because they require more overlap sequences in comparison with longer ones. Xiong et al. (2004a) considered that 60-bp oligonucleotides provided a reasonable balance between low error rates and a low production cost. In a comparison of fidelity using proofreading and nonproofreading polymerases in different buffers for PCR to amplify the target sequence from genomic DNA and for conversion PCR, Day (1999) tested different thermostable DNA polymerases in several PCR buffers, and found PCR conditions that improved fidelity in some cases.
Enzymatic mismatch cleavage
For all the currently known methods of chemical gene synthesis, the quality of the product is directly dependent on the accuracy of the oligonucleotides. Single base-pair mismatches/substitutions, insertions or deletions cannot be avoided during PCR-mediated amplification and assembly (Modrich et al., 1991; Fuhrmann et al., 2005; Zeglis & Barton, 2007). Several methods have been developed to remove mismatches, with varying degrees of success. One is to use HPLC or PAGE techniques to remove oligonucleotides with deletions and additions. Another method is the MutHLS-mediated removal of mutant sequences produced during PCR amplification (Smith & Modrich, 1997). Escherichia coli MutS, MutL and MutH are used together to cleave flawed DNA products. The method can easily repair mismatches caused by G–T, A–C, G–G, A–A substitutions and small insertions or deletions but is less effective at correcting other types of mismatches. Carr et al. (2004) used a DNA mismatch-binding protein, MutS from Thermus aquaticus, to remove the synthetic DNA containing errors. This method yielded one error per 10 kb produced, a 15-fold reduction compared with conventional DNA synthesis methods. With this improvement, larger genes can be synthesized conveniently, without additional cloning steps or excessive sequencing. The approach can also be iterated multiple times for greater fidelity.
Consensus shuffling has been used to reduce random errors significantly in synthetic DNA (Binkowski et al., 2005). In this method, errors are revealed as mismatches by rehybridization of the population of DNA molecules containing multiple errors. The DNA duplexes containing mismatches can be removed from the population by affinity capture with immobilized mismatch binding protein (MutS). After two iterations of consensus shuffling, a synthetic green fluorescent protein (GFPuv) gene had only about one error per 3500 bp, a 3.5- to 4.3-fold decrease from those without consensus shuffling (Binkowski et al., 2005).
Huang et al. (2002) developed a mutation scanning method that combines thermostable endonuclease V (Endo V) and DNA ligase. Variant and wild-type PCR amplicons are generated using fluorescent-labeled primers and heteroduplexed. Thermotoga maritima (Tma) Endo V recognizes and cleaves primarily heteroduplex DNA one base 3′ to the mismatch, in addition to nicking matched DNA at low levels. Thermus species (Tsp.) AK16D DNA ligase reseals the background nicks. Fluorescent products are separated on a DNA sequencing gel, which reveals the approximate position of the mutation. Although this method has been used successfully to detect mutations in some cases, cleavage of some exons by Endo V can make it difficult to distinguish the correct mutation cleavage signal (Pincas et al., 2004). In an improved two-step mutation scanning method (Pincas et al., 2004), thermostable Endo V first nicks the DNA at mismatches, and a proofreading step with thermostable DNA ligase then reseals incorrectly or nonspecifically nicked sites.
Fuhrmann et al. (2005) used specific endonucleases to remove undesirable sequence variants from primary gene synthesis products. Single base-pair mismatches, insertions and deletions can be cleaved with specific endonucleases, such as phage T4 endonuclease VII, T7 endonuclease I and E. coli endonuclease V. Fuhrmann et al. (2005) tested the ability of endonucleases to cleave double-stranded DNA containing a single mismatched base pair in the bacterial chloramphenicol-acetyltransferase (cat) gene. Use of enzymatic mismatch cleavage to improve the quality of primary synthesis products allows one to increase the size of single-step assemblies to over 1 kb, because the error frequency can be reduced considerably. Statistical analysis of error numbers in synthetic genes as determined by DNA sequencing revealed that the use of T4 and E. coli endonucleases reduced the occurrence of mutations in synthesized genes about 400-fold compared with synthesis without the enzymatic mismatch cleavage step (Fuhrmann et al., 2005).
Functional selection
Functional selection of synthetic genes has also been used to yield genes with antibiotic resistance or a replicative bacteriophage (Smith et al., 2003). However, in many cases, functional screening is time-consuming, difficult or impossible. For instance, if one wants to synthesize a gene that will be expressed in flowers of higher plants, functional screening at the gene assembly and synthesis stages is not practical (Xiong et al., 2004a).
Site-directed mutagenesis
Site-directed mutagenesis has been widely used in molecular biological studies and genetic engineering and to study protein structure–function relationships (Akopian & Marshall, 2005; Yuan et al., 2005; Foley & Burkart, 2007; Woycechowsky et al., 2007). The method can also be a valuable tool to correct errors in gene synthesis. The availability of commercial mutagenesis kits allows efficient mutagenesis without subcloning (Salerno et al., 2005). One of the most widely used methods is the QuikChange® Site-Directed Mutagenesis Kit (Stratagene, La Jolla, CA), which produces mutation efficiencies of greater than 80% and uses a simple, one-day protocol (Wang & Malcolm, 1999). In the QuikChange method, point mutations are introduced by annealing two complementary oligonucleotides to a plasmid DNA template and extending the mutant primers in a linear cyclic amplification reaction with robust, high-fidelity PfuTurbo DNA polymerase. Extension products are digested with DpnI to eliminate methylated (parental plasmid) and hemi-methylated (parental/mutant hybrids) DNAs selectively, and, upon transformation, the majority of clones contain the desired mutation(s) (Hogrefe et al., 2002). The Stratagene QuikChange mutagenesis kit is an efficient and rapid method for the mutagenesis of DNA (Wang & Malcolm, 2002; Steffens & Williams, 2007). It can introduce point mutations at up to five sites simultaneously in plasmid DNA templates (Hogrefe et al., 2002; Scott et al., 2002; Cabre et al., 2004). Multi site-directed mutagenesis and the creation of randomized amino acid libraries with the QuikChange kit eliminate multiple subcloning steps, and are therefore quite rapid (Wang & Malcolm, 1999; Hogrefe et al., 2002; Miyazaki & Takenouchi, 2002; Kelley & Momany, 2003; Arendt et al., 2007).
Numerous mutagenesis methods have also been developed based on PCR techniques. Examples are overlap extension PCR-mediated site-directed corrections (Aiyar et al., 1996; Mikaelian & Sergeant, 1996; Pogulis et al., 1996; Ling & Robinson, 1997; Rabhi et al., 2004; An et al., 2005; Peng et al., 2006b; Xiong et al., 2006b; Heckman & Pease, 2007). However, these methods require at least two rounds of cloning and sequencing, as well as additional oligonucleotide synthesis.
Applications
Because of improvements in efficiency and fidelity, cost reduction and process automation, chemical gene synthesis has been used in a wide array of basic and applied areas in molecular biology and biotechnology. Examples of these applications include gene and protein manipulation for production in heterologous systems such as plants and microorganisms, creation of artificial (synthetic) life via synthetic genomes, gene disruption, directed molecular evolution and large-scale, high-quality cDNA fragment production. Some scientists have also proposed ways to address concerns over the security of research involving commercial DNA synthesis (Bhattacharjee et al., 2007; Bugl et al., 2007).
Codon optimization for heterologous expression
One of the common and key applications is to express chemically synthesized genes in different microorganism and plant hosts to produce target proteins. There are many reasons for using modified genes produced from chemical synthesis and heterologous expression systems. For instance, with modifications, target genes can be more efficiently expressed in a heterologous system and expressed protein may be more readily extracted and purified (Macauley et al., 2005; Terpe et al., 2006). Heterologous systems often offer many advantages because proteins produced in these systems can be adapted more for large-scale extraction and purification and quality control, thus leading to a reduction in the production cost and better quality of final products (Bergquist et al., 2002; Haefner et al., 2005). Also, using heterologous systems to produce pharmaceuticals of human or animal origins can reduce health concerns when compared with the production in their native systems (Baez et al., 2005; Singh & Bhalla, 2006; Forstner et al., 2007). Historically, the expression of chemically synthesized genes in E. coli can at least be dated back to 1982 with a novel opiate peptide α-neo-endorphin (Tanaka et al., 1982).
The frequency of codon usage varies significantly among organisms. Codon bias has been identified as one of the most important factors affecting prokaryotic and eukaryotic gene expression (Alvager et al., 1989, 1990; Kurland et al., 1991; Akashi et al., 1997; Gustafsson et al., 2004; Sorensen & Mortensen, 2005). For a gene to be highly expressed in a foreign host, codon optimization is often required, and it has become a standard molecular biology protocol for overcoming poor gene expression. Preferential codon usage, whether in prokaryotes, such as E. coli (Grosjean & Fiers, 1982; Makrides et al., 1996; Jana & Deb, 2005; Henry & Sharp, 2007; Stoletzki & Eyre-Walker, 2007), P. fluorescens (Peng et al., 2003) and B. subtilis (Shields & Sharp, 1987; Sharp et al., 1988; Moszer et al., 1999), or in eukaryotes, such as yeast (Grosjean & Fiers, 1982; Sharp et al., 1986; Zhao et al., 2000) and plants (Murray et al., 1989; Gustafsson et al., 2004), has been documented widely. Fuglsang (2003) developed a freeware tool for codon optimization, called Codon Optimizer. Gao et al. (2004) developed another public software tool, which can be used to optimize any gene of interest (http://www.vectorcore.pitt.edu/upgene.html). In their software, the DNA optimization algorithm is integrated with PCR primer design for optimal gene expression.
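The simplest form of codon optimization replaces every codon with a single preferred synonymous codon, leaving the encoded protein unchanged. The Python sketch below illustrates only that idea and is not the algorithm of Codon Optimizer or of the software by Gao et al. The `PREFERRED` table is a placeholder chosen for the example rather than measured usage data for any host, and the function names are hypothetical; a real design would take the table from the host's codon usage statistics and would also weigh GC content, restriction sites and secondary structure.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Placeholder "preferred codon" table (one valid codon per residue).
# Illustrative only: replace with real usage data for the chosen host.
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence with the standard genetic code."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(GENETIC_CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna: str, preferred: dict[str, str] = PREFERRED) -> str:
    """Swap every codon for the preferred synonymous codon of its residue."""
    return "".join(preferred[aa] for aa in translate(dna))


if __name__ == "__main__":
    wild_type = "ATGCTTAGATCTGGA"  # made-up example sequence
    optimized = recode(wild_type)
    print(optimized, translate(optimized) == translate(wild_type))
```

Checking that the translation is unchanged after recoding, as in the last line, is a cheap safeguard worth keeping in any such pipeline.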
Pichia pastoris, a methylotrophic yeast utilizing methanol as the sole carbon source, has become a popular system for heterologous protein expression. It can grow in a simple defined medium to reach a very high cell density, and can accumulate an extremely high concentration of intracellular protein under the control of the methanol-regulated alcohol oxidase (AOX1) promoter (Gellissen et al., 2000). However, to achieve a high expression level of a foreign protein, such as a phytase, in P. pastoris, modification of its coding sequences is a must in many cases. With codon optimization and other modifications in the DNA sequence, Xiong and his colleagues have expressed a novel phytase from Aspergillus niger 113, a recombinant thermostable phytase and two recombinant acidic phytases in P. pastoris and obtained a yield of c. 10 g L−1 (Peng et al., 2002; Xiong et al., 2004b, 2005, 2006a). Xiong et al. (2005) used P. pastoris preferred codons in the recombinant acid phytase gene (A. niger SK-57) and obtained a 14.5-fold increase in the production/activity of phytase in P. pastoris with the modified MF4I signal peptide. The use of the modified coding sequence of the phytase gene with biased codon usage led to twice as much phytase activity compared with its wild-type version (Xiong et al., 2003). The synthetic gene encoding P. lycii phytase using the modified codons led to 4.4 times the phytase yield compared with the wild-type counterpart (Xiong et al., 2006a). A phyCs gene encoding neutral phytase was designed and synthesized according to the methylotrophic yeast P. pastoris codon usage bias without altering the wild-type protein sequence. The yield of total extracellular phytase activity was 17.6 U mL−1 at the flask scale, a 90-fold increase compared with the wild-type isolate (Zou et al., 2006).
Chemically synthesized phytase genes have also been expressed successfully in Saccharomyces cerevisiae (Hase et al., 1987; Antoniukas et al., 2006) and Pseudomonas fluorescens (Peng et al., 2003) and in plants (Peng et al., 2006a). Peng et al. (2006a) expressed an Aspergillus phytase gene in canola (Brassica napus). Phytase expression in the transgenic plants was enhanced when codon usage was modified according to plant-preferred codons. Their studies illustrated that the modified A. niger phytase gene with codons biased toward plant usage is highly effective in increasing the level of the phytase protein in plants. A synthetic gene encoding yeast (Schwanniomyces occidentalis) phytase, whose codon usage was changed to be more similar to that of rice, led to a marked increase in enzyme activity when introduced into rice, from 0.039 U g−1 (fresh weight) up to 4.6 U g−1 (Hamada et al., 2005). These results indicate that codon modification using chemically synthesized DNA, combined with the use of other sequences such as a secretory signal sequence, can improve the yield and quality of phytase in heterologous systems.
Chemical gene synthesis has been used widely in crop improvement for traits like yield, quality and resistance to insects and diseases (Estruch et al., 1997). A chemically synthesized Bacillus thuringiensis (Bt) toxin gene with expression optimization has been introduced into tobacco (Burgess et al., 2002), potato (Gulina et al., 1994), cotton (Wu et al., 2003), rice (Khanna & Raina, 2002), pine (Tang & Tian, 2003) and poplar (Zhang et al., 2002). Manjunath (2007) obtained high tryptophan maize using a chemically synthesized gene. Modified synthetic genes have also been expressed successfully in animal and human cells for enhancing the efficacy of DNA vaccines (Manoj et al., 2004), optimizing cardiovascular gene therapy (Kibbe et al., 2000) and increasing the expression of an active HIV-1 integrase in human cells (Cherepanov et al., 2000).
Synthesis of vectors and genomes
The ability to synthesize long, accurate DNA sequences efficiently is becoming increasingly important in order to take advantage of the huge potential of whole-plasmid and whole-genome sequence information. A full-length plasmid (2.7 kb) containing the bla gene, the α-fragment of the lacZ gene and the pUC origin of replication was synthesized (Stemmer et al., 1995). There were two successful synthetic replication-competent viral genomes. The first was that of the poliovirus reported by Cello et al. (2002). The second was that of a synthetic full-length phiX174 bacteriophage genome by Smith et al. (2003). With the rapid progress in the chemical synthesis of long DNA sequences and a reduction in cost, artificial life forms can now be created via chemically synthesized genomes (Zimmer, 2003; Check, 2005; Ball, 2007).
Gene disruption construct
Chemical synthesis of DNA sequences may provide a useful tool for creating gene disruption constructs for the large-scale production of mutants. The understanding of physiological processes has been considerably facilitated by the creation of mutant strains or knock-out lines. Targeted deletions of almost any gene are possible because of the development of a series of molecular biology techniques. Because some entire genomes are known, such as those of S. cerevisiae and Dictyostelium discoideum, construction and disruption of specific sequences, followed by analysis of the phenotype, is one of the most powerful genetic tools for those organisms (Kreppel et al., 2004). The classical strategy for gene disruption requires isolation of a gene and digestion with restriction enzymes (Rothstein et al., 1983) but the lack of adequate restriction sites can make this difficult. Several PCR-based and other strategies to obtain disruption cassettes have been reported (Lorenz et al., 1995; Wach et al., 1996; Kaur et al., 1997; Nikawa & Kawabata, 1998; González, 1999; Queiros et al., 2001; Kuwayama et al., 2002; Zaragoza et al., 2003; Walker et al., 2005; Szewczyk et al., 2006). A rapid and efficient method to generate multiple gene disruptions using a single selectable marker and the Cre-loxP system was established by Faix et al. (2004). However, many of those methods require several PCR steps, which are suboptimal for gene disruptions requiring large DNA fragments (Kuwayama et al., 2002; Walker et al., 2005). Furthermore, the lack of adequate restriction sites is another disadvantage when carrying out double or multiple gene disruptions (Zaragoza et al., 2003). Because chemical gene synthesis can potentially lead to assembly of any DNA sequence, its use in the construction of gene disruption cassettes should make gene disruption simpler, more rapid and relatively inexpensive.
Molecular evolution
Molecular evolution in vitro is a powerful engine for the creation of new phenotypes. Many useful enzymes and peptides have been created by artificial evolution. DNA shuffling, high-throughput screening and chemical synthesis are important tools for the optimization of many commercially available enzymes for which selections do not exist. Stemmer introduced the method of DNA shuffling for the in vitro formation of recombinant genes from a set of parental genes (Stemmer et al., 1994a, b). DNA shuffling and high-throughput screening offer a systematic approach to the creation of new genes and proteins and to understanding protein complexity, structure and function. Biologically active proteins have been used widely for medical (Locher et al., 2005), industrial (Otten & Quax, 2005) and environmental purposes (Furukawa et al., 2004) and for crop improvement (Lassner & Bedbrook, 2001).
It is unclear, however, whether it is more efficient to mutate an enzyme randomly or to mutate active sites or key sites specifically (Morley & Kazlauskas, 2005; Xiong et al., 2007a). Some DNA shuffling experiments have shown that amino acid changes distant from the active site can affect substrate specificity (Flores & Ellington, 2002; Xiong et al., 2007a). Such changes may alter the orientations of active site residues or the conformational dynamics of the entire protein, and therefore their effects on protein activity are difficult to predict. A conventional view is that changes of amino acids near the substrate-binding site are more likely to modify substrate specificity (Zhang et al., 1997; Geddie & Matsumura, 2004; Morley & Kazlauskas, 2005). Nevertheless, the structure of a desired protein is not always solved, residues that interact directly with the atom in question are not always known and active sites are not always identified. Xiong et al. (2007a) developed a semi-rational strategy for directed evolution that integrates chemically synthesized DNA sequences, semi-rational design, degenerate oligonucleotides and DNA shuffling. To achieve a high expression level of β-galactosidase, the 1553-bp gene was synthesized and optimized for codon usage, GC content and mRNA secondary structure. A set of synthesized genes was used to obtain mutants with high β-galactosidase activity (Xiong et al., 2007b) based on a high-efficiency and high-throughput system of directed evolution (Xiong et al., 2007c, d). Rational design and modification of genes via chemical gene synthesis therefore offer a short path toward directed evolution, which has recently emerged as an attractive approach for the elucidation of protein functions and the improvement of protein activities (Xiong et al., 2006c).
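Two of the design quantities mentioned above, GC content and degenerate oligonucleotides, are simple to compute. The following Python sketch is an illustration only and is not taken from the cited studies: the function names gc_content and expand_degenerate are hypothetical, and the only external fact assumed is the standard IUPAC nucleotide ambiguity code. It reports the GC fraction of a sequence and lists every concrete sequence represented by a degenerate oligonucleotide.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def gc_content(seq):
    """Return the fraction of G and C bases in an unambiguous DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in ("G", "C") for base in seq) / len(seq)


def expand_degenerate(oligo):
    """List every concrete sequence encoded by a degenerate oligonucleotide.

    Hypothetical helper for illustration; raises KeyError on a symbol
    that is not an IUPAC nucleotide code.
    """
    choices = [IUPAC[base] for base in oligo.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    print(gc_content("ATGC"))               # half of the bases are G or C
    print(expand_degenerate("GCN"))         # the four alanine codons
    print(len(expand_degenerate("NNK")))    # size of an NNK codon library
```

The number of variants grows multiplicatively with each degenerate position, so in practice a library would usually be sized from the product of the per-position choices rather than enumerated in full.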
Conclusion and future perspectives
Chemical DNA synthesis provides a powerful tool for basic biological research and biotechnological applications. During the last 15 years, chemical gene synthesis technologies have improved considerably. The lengths of DNA that can be synthesized have extended from <1 kb to >30 kb. The development of various software tools has also facilitated better oligonucleotide design and synthesis strategies. Because of rapid progress in refining existing technologies and developing new ones, we foresee that gene synthesis will become more efficient, less expensive and more accurate. At the same time, gene synthesis will have much broader applications in biological fields, particularly when combined with gene shuffling, codon optimization and targeted mutagenesis. The facile design and writing of DNA fragments that encode entire gene sequences, especially the rapid design and writing of ORFs for expressed proteins, potentially has widespread applications in biological analysis and engineering (Cox et al., 2007). Rapid progress in the chemical synthesis of long DNA or long RNA sequences and a reduction in cost could transform protein engineering and production for protein design, synthetic biology and structural analysis (Check, 2005; Ball, 2007; Masuda et al., 2007). The ability to design and chemically synthesize DNA can create new proteins (Kuhlman et al., 2003). When this technology is combined with other genomics, post-genomics and proteomics approaches, our power to elucidate broad biological mechanisms will expand exponentially. The resulting knowledge will be transferred to new technologies for the improvement of human health; food, feed, fiber and bio-fuel security; and environmental stewardship (Miranda & Alewood, 2000; Fadiel et al., 2007; Pierce et al., 2007; Wu et al., 2007).
Acknowledgements
The research described here in Yao's Laboratory was supported by the Shanghai Rising-Star Program (Genzong); Shanghai Subject Chief Scientist (06XD14017); Shanghai Project for International Scientific and Technological Cooperation (055407068); The Shanghai Key Basic and laboratory Research Project (06DZ19103-07dz22011); and the 863 Program (2006AA10Z117-06Z358). The research in Cheng's Laboratory was supported by the University of Tennessee Agricultural Experiment Station. The research in Li's Laboratory was supported by the University of Connecticut Agriculture Experiment Station and USDA grants.
Statement
This paper has an additional corresponding author: Zong-Ming Cheng, Department of Plant Sciences, University of Tennessee, Knoxville, TN 37996, USA. Tel.: +1 865 974 7961; fax: +1 865 974 5365.
References
Editor: Jiri Damborsky