Gene synthesis
Gene synthesis is the process of synthesizing a gene in vitro without the need for initial template DNA samples.
Synthesis of the first complete gene, a yeast tRNA, was demonstrated by Har Gobind Khorana and coworkers in 1972. Synthesis of the first peptide- and protein-coding genes was performed in the laboratories of Herbert Boyer and Alexander Markham, respectively.
Commercial gene synthesis services are now available from numerous companies worldwide. Current gene synthesis approaches are most often based on a combination of organic chemistry and molecular biological techniques, and entire genes may be synthesized “de novo,” without the need for precursor template DNA. Gene synthesis has become an important tool in many fields of recombinant DNA technology, including heterologous gene expression, vaccine development, gene therapy and molecular engineering. The synthesis of nucleic acid sequences is often more economical than classical cloning and mutagenesis procedures.
Gene Optimization
While the ability to make increasingly long stretches of DNA efficiently and at lower prices is a technological driver of this field, increasing attention is being focused on improving the design of genes for specific purposes. Early in the genome sequencing era, gene synthesis was used as an (expensive) source of cDNAs that were predicted from genomic or partial cDNA information but were difficult to clone. As higher-quality sources of sequence-verified cloned cDNA have become available, this practice has become less urgent. However, producing large amounts of protein from gene sequences found in nature (or at least from their protein-coding regions, the open reading frames) can sometimes prove difficult. Many of the most interesting proteins sought by molecular biologists are normally regulated to be expressed in very low amounts in wild-type cells. Redesigning these genes offers a means to improve gene expression in many cases. Rewriting the open reading frame is possible because of the redundancy of the genetic code: up to about a third of the nucleotides in an open reading frame can be changed while still producing the same protein. The number of alternative designs possible for a given protein is astronomical. For a typical protein sequence of 300 amino acids there are over 10^150 codon combinations that will encode an identical protein. Optimization methods such as replacing rarely used codons with more common codons sometimes have dramatic effects. Further optimizations, such as removing RNA secondary structures, can also be included. At least in the case of E. coli, protein expression is maximized by predominantly using codons corresponding to tRNAs that retain amino acid charging during starvation (14). Computer programs written to perform these and other optimizations simultaneously are used to handle the enormous complexity of the task.
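The size of this design space can be illustrated with a short calculation. The following is a minimal sketch, assuming the standard genetic code; the function names are illustrative and do not come from any particular design program. It multiplies the number of synonymous codons available at each position of a protein sequence.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code
# (one-letter amino acid codes; the 61 sense codons are distributed as below).
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein: str) -> int:
    """Exact number of distinct DNA sequences that encode the given protein."""
    return math.prod(DEGENERACY[aa] for aa in protein.upper())


def log10_encodings(protein: str) -> float:
    """Base-10 logarithm of the count, convenient for long proteins."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein.upper())


if __name__ == "__main__":
    # A tripeptide Met-Lys-Leu: 1 * 2 * 6 possible encodings.
    print(count_encodings("MKL"))
```

Because the count is a product over all residues, it grows exponentially with protein length and depends on amino acid composition: proteins rich in leucine, serine and arginine (six codons each) have more possible encodings than proteins rich in methionine or tryptophan (one codon each).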
A well-optimized gene can improve protein expression 2- to 10-fold, and in some cases improvements of more than 100-fold have been reported. Because of the large number of nucleotide changes made to the original DNA sequence, the only practical way to create the newly designed genes is to use gene synthesis.
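How many nucleotides change during recoding can be seen in a toy example. The sketch below is deliberately naive and is not a real optimization algorithm: the PREFERRED table is a hypothetical placeholder rather than measured codon usage data, and all function names are illustrative. It swaps each codon for a preferred synonymous codon, checks that the encoded protein is unchanged, and counts the nucleotide differences.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical preferred codons for illustration only (not real usage data).
PREFERRED = {"L": "CTG", "R": "CGT", "K": "AAA"}


def translate(orf: str) -> str:
    """Translate an open reading frame; '*' marks a stop codon."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def recode(orf: str, preferred: dict = PREFERRED) -> str:
    """Replace each codon by the preferred synonymous codon, if one is listed."""
    for aa, codon in preferred.items():
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    protein = translate(orf)  # also validates the input
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    recoded = "".join(preferred.get(CODON_TABLE[c], c) for c in codons)
    assert translate(recoded) == protein  # the protein must not change
    return recoded


def count_changes(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    original = "ATGCTAAGACTTAAGTAA"
    optimized = recode(original)
    print(optimized, count_changes(original, optimized))
```

In this example 5 of the 18 nucleotides change while the encoded protein stays the same. Changes of this density across a whole gene are impractical to introduce by site-directed mutagenesis.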
Standard Methods
1) Chemical Synthesis of Oligonucleotides
Oligonucleotides are chemically synthesized from building blocks called phosphoramidites: normal nucleotides carrying protecting groups that prevent their amine, hydroxyl and phosphate groups from reacting incorrectly. One phosphoramidite is added at a time: the 5' hydroxyl group of the growing chain is deprotected and the next base is coupled, and so on, so that the chain grows in the 3' to 5' direction (backwards relative to biosynthesis). At the end, all the protecting groups are removed. Nevertheless, because this is a chemical process, several incorrect interactions occur, leading to some defective products. The longer the oligonucleotide being synthesized, the more defects accumulate, so the process is only practical for producing short sequences of nucleotides. The current practical limit is about 200 nucleotides for an oligonucleotide of sufficient quality to be used directly in a biological application. HPLC can be used to isolate products with the proper sequence. Large numbers of oligos can now be synthesized in parallel on gene chips. For optimal performance in subsequent gene synthesis procedures, however, they should be prepared individually and at larger scales.
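The length dependence can be made concrete with a simplified model. The sketch below assumes that every coupling step succeeds independently with the same efficiency, and that an oligonucleotide of n nucleotides requires n - 1 couplings because the first nucleoside is already attached to the solid support. The 99% efficiency used in the example is an assumed value for illustration, not a figure from this article.

```python
def full_length_fraction(length: int, coupling_efficiency: float) -> float:
    """Fraction of chains that reach full length under a constant per-step yield.

    Simplified model: each of the (length - 1) coupling steps succeeds
    independently with probability `coupling_efficiency`.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    return coupling_efficiency ** (length - 1)


if __name__ == "__main__":
    # Assumed 99% coupling efficiency; compare short and long oligonucleotides.
    for n in (20, 50, 100, 200):
        print(n, round(full_length_fraction(n, 0.99), 3))
```

Under this model the full-length fraction falls exponentially with length, which is why long oligonucleotides need purification or are avoided altogether.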
2) Annealing-Based Connection of Oligonucleotides
By connecting a series of oligonucleotides and cloning the resulting sequence into a plasmid, gene-sized pieces of DNA can be assembled. Usually, a set of individually designed oligonucleotides is made on automated solid-phase synthesizers, purified and then connected by specific annealing and standard ligation or polymerase reactions. To improve the specificity of oligonucleotide annealing, the synthesis step relies on a set of thermostable DNA ligase and polymerase enzymes. To date, several methods for gene synthesis have been described, such as the ligation of phosphorylated overlapping oligonucleotides, the Fok I method and a modified form of ligase chain reaction for gene synthesis. Additionally, several PCR assembly approaches have been described. They usually employ overlapping oligonucleotides 40-50 nt long. These oligonucleotides are designed to cover most of the sequence of both strands, and the full-length molecule is generated progressively by overlap extension (OE) PCR, thermodynamically balanced inside-out (TBIO) PCR or combined approaches. The most commonly synthesized genes range in size from 600 to 1,200 bp, although much longer genes have been made by connecting previously assembled fragments of under 1,000 bp. In this size range it is necessary to test several candidate clones, confirming the sequence of the cloned synthetic gene by automated sequencing methods.
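The layout of overlapping oligonucleotides for a PCR assembly can be sketched as follows. This is a minimal illustration, assuming fixed-length oligonucleotides with a fixed overlap that alternate between the top and bottom strand. It does not reproduce any specific published design method, and the function names and default values (40 nt oligonucleotides, 20 nt overlaps) are assumptions chosen for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tile_oligos(seq: str, oligo_len: int = 40, overlap: int = 20) -> list:
    """Split a sequence into overlapping oligonucleotides on alternating strands.

    Even-numbered oligos are taken from the top strand; odd-numbered oligos
    are the reverse complement of their region, so neighbours can anneal
    through the shared overlap. The last oligo may be shorter than oligo_len.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    seq = seq.upper()
    step = oligo_len - overlap
    oligos, start, index = [], 0, 0
    while True:
        fragment = seq[start:start + oligo_len]
        oligos.append(fragment if index % 2 == 0 else reverse_complement(fragment))
        if start + oligo_len >= len(seq):
            break
        start += step
        index += 1
    return oligos


if __name__ == "__main__":
    gene = "ATGGCTAGCTAGGATCCGAT" * 5  # arbitrary 100 nt test sequence
    for i, oligo in enumerate(tile_oligos(gene)):
        print(i, len(oligo), oligo)
```

A real design would additionally adjust the oligonucleotide boundaries so that the overlaps have similar melting temperatures and avoid repeats or strong secondary structure.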
3) Limitations
Because the assembly of the full-length gene product relies on the efficient and specific alignment of long single-stranded oligonucleotides, critical parameters for synthesis success include extended sequence regions comprising secondary structures caused by inverted repeats, extraordinarily high or low GC content, or repetitive structures. Usually these segments of a particular gene can only be synthesized by splitting the procedure into several consecutive steps and a final assembly of shorter sub-sequences, which in turn leads to a significant increase in the time and labor needed for its production. The result of a gene synthesis experiment depends strongly on the quality of the oligonucleotides used. For these annealing-based gene synthesis protocols, the quality of the product is directly and exponentially dependent on the correctness of the employed oligonucleotides. Alternatively, when gene synthesis is performed with oligos of lower quality, more effort must be made in downstream quality assurance during clone analysis, which is usually done by time-consuming standard cloning and sequencing procedures. Another problem associated with all current gene synthesis methods is the high frequency of sequence errors that results from the use of chemically synthesized oligonucleotides. The error frequency increases with longer oligonucleotides, and as a consequence the percentage of correct product decreases dramatically as more oligonucleotides are used. The mutation problem could be addressed by using shorter oligonucleotides to assemble the gene. However, all annealing-based assembly methods require the primers to be mixed together in one tube. In this case, shorter overlaps do not always allow precise and specific annealing of complementary primers, resulting in the inhibition of full-length product formation. Manual design of oligonucleotides is a laborious procedure and does not guarantee the successful synthesis of the desired gene.
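The exponential dependence on oligonucleotide correctness, and its effect on the number of clones that must be sequenced, can be illustrated with a simple probability model. The sketch below assumes that errors occur independently at a constant rate per base. The error rate in the example is an assumed value for illustration, and the function names are hypothetical.

```python
import math


def fraction_correct(gene_length: int, error_rate: float) -> float:
    """Probability that a clone of the given length contains no errors,
    assuming independent errors at a constant per-base rate."""
    return (1.0 - error_rate) ** gene_length


def clones_to_screen(p_correct: float, confidence: float = 0.95) -> int:
    """Smallest number of clones giving at least `confidence` probability
    that one or more of them is error-free."""
    if not 0.0 < p_correct <= 1.0:
        raise ValueError("p_correct must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    if p_correct == 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_correct))


if __name__ == "__main__":
    assumed_error_rate = 1 / 500  # assumption for illustration only
    for length in (600, 1200):
        p = fraction_correct(length, assumed_error_rate)
        print(length, round(p, 3), clones_to_screen(p))
```

Under this model, doubling the gene length squares the fraction of correct clones, so the sequencing effort rises steeply with gene size unless oligonucleotide quality improves.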
For optimal performance of almost all annealing-based methods, the melting temperatures of the overlapping regions should be similar for all oligonucleotides. The necessary primer optimization should be performed using specialized oligonucleotide design programs. Several solutions for automated primer design for gene synthesis have been presented so far.
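A rough check of how well the overlaps are balanced can be sketched with the Wallace rule, which estimates the melting temperature of a short oligonucleotide as 2 °C per A or T plus 4 °C per G or C. This is only a coarse approximation chosen here for brevity. Dedicated design programs typically use more accurate thermodynamic models, and the function names and the tolerance parameter below are hypothetical.

```python
def wallace_tm(seq: str) -> float:
    """Approximate melting temperature in degrees Celsius (Wallace rule)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def tm_spread(overlaps: list) -> float:
    """Difference between the highest and lowest overlap Tm estimates."""
    tms = [wallace_tm(o) for o in overlaps]
    return max(tms) - min(tms)


def unbalanced_overlaps(overlaps: list, tolerance: float = 4.0) -> list:
    """Overlaps whose Tm estimate deviates from the mean by more than `tolerance`."""
    tms = [wallace_tm(o) for o in overlaps]
    mean_tm = sum(tms) / len(tms)
    return [o for o, tm in zip(overlaps, tms) if abs(tm - mean_tm) > tolerance]


if __name__ == "__main__":
    example = ["ATGGCTAGCTAGGATCCGAT", "GGGCCCGGGCCCGGGCCCGG", "ATATTAATATTTAAATATTA"]
    print([wallace_tm(o) for o in example], tm_spread(example))
    print(unbalanced_overlaps(example))
```

Overlaps flagged by such a check would be lengthened or shortened until the estimates converge, which is the kind of adjustment that automated design tools perform.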
4) Error Correction Procedures
To overcome problems associated with oligonucleotide quality, several elaborate strategies have been developed, employing either separately prepared fishing oligonucleotides, mismatch-binding enzymes of the MutS family, or specific endonucleases from bacteria or phages. Nevertheless, all these strategies increase the time and cost of gene synthesis based on the annealing of chemically synthesized oligonucleotides.
Gene synthesis market
The market for gene synthesis has grown steadily over the past years. Experts estimated its volume at US$40 million by the end of 2007. Active gene synthesis providers in the market include GenScript USA Inc., Eurofins MWG Operon, GENEART, DNA2.0, GeneWIZ Inc, Epoch Biolabs, BioBasic, Biosearch Technologies and Biomatik. Major applications of synthetic genes include synthesizing DNA sequences that have been identified by high-throughput sequencing but never cloned into plasmids, synthesizing genes optimized for protein expression in a particular host, and safely obtaining genes for vaccine research without the need to grow the full pathogens. Increasingly, genes are ordered in sets that include functionally related genes or multiple sequence variants of a single gene. Virtually all of the therapeutic proteins in development, such as monoclonal antibodies, are optimized by testing many gene variants for improved function or expression.