Tuning genetic control through promoter engineering
Hal Alper, Curt Fischer, Elke Nevoigt and Gregory Stephanopoulos; Tuning genetic control through promoter engineering; PNAS; Sept. 2005; vol. 102; pages 12678-12683
Introduction
In synthetic biology, molecular biology and biotechnology, scientists express foreign proteins in simple organisms such as Escherichia coli, either to exploit them or to determine their function. The branch of molecular biology that deals with defining the function of genes and proteins is called functional genomics. Its main goal is to understand the relationship between an organism's genome and its phenotype [1]. Gene function studies, however, usually compare only the wild type, strong overexpression and complete deletion of a gene (gene knockout). The information from these studies is incomplete because only three discrete levels of gene expression are examined; what happens in between remains unknown. Gene expression is also regulated by numerous factors in the cell, including promoter strength, cis- and trans-acting factors, cell growth stage, the expression level of various RNA polymerase-associated factors and other gene-level regulation. Besides the level of expression, additional problems arise in connection with splicing and post-translational modifications [2, 3].
This is why many groups are looking for better solutions. As an example I took two articles that share a common solution: H. Alper et al. developed a library of engineered promoters of varying strengths for bacteria, and E. Nevoigt et al. did the same for S. cerevisiae. Both groups also demonstrated the usefulness of their libraries.
Developing a library of engineered promoters for bacteria
H. Alper et al. used different strains of E. coli (K12, K12 PT5-dxs, PT5-idi, PT5-ispFD) for their promoter engineering examples. As the template for nucleotide analogue mutagenesis they used the plasmid pZE-gfp(ASV) and appropriate primers. The target DNA was a derivative of the constitutive bacteriophage PL-λ promoter. They mutagenized it by PCR in the presence of the triphosphates of 8-oxo-2´-deoxyguanosine (8-oxo-dGTP) and 6-(2-deoxy-β-D-ribofuranosyl)-3,4-dihydro-8H-pyrimido-[4,5-c][1,2]oxazin-7-one (dPTP) [2]. Both compounds are mutagenic dNTP analogues that can be incorporated into DNA by PCR with standard Taq polymerase. When both analogues are used, the rate of mutagenesis can be controlled through the number of PCR cycles: it is 6 % after 10 cycles, 11 % after 20 cycles and 19 % after 30 cycles. 8-Oxo-dGTP can mispair with A, leading to A-to-C and G-to-T transversion mutations. When dPTP is used alongside 8-oxo-dGTP, both transition mutations (A-to-G and G-to-A) and transversion mutations (A-to-C and G-to-T) can be produced [4].
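To make the cycle-dependent mutagenesis rates above more concrete, the following minimal sketch estimates how many positions of a fragment would be expected to mutate. It assumes that the quoted percentages are the fraction of mutated nucleotide positions, which the text does not state explicitly; the function name is hypothetical, and the 151 bp length is the PCR product length reported in the study.

```python
# Mutagenesis rates quoted in the text for PCR with both analogues
# (assumption: each rate is the fraction of nucleotide positions mutated).
MUTATION_RATE_BY_CYCLES = {10: 0.06, 20: 0.11, 30: 0.19}


def expected_mutations(fragment_length_bp, cycles):
    """Expected number of mutated positions for a fragment of the given length."""
    if cycles not in MUTATION_RATE_BY_CYCLES:
        raise ValueError("no rate reported for this number of cycles")
    return fragment_length_bp * MUTATION_RATE_BY_CYCLES[cycles]


if __name__ == "__main__":
    fragment_length = 151  # length of the PCR products in the study, in bp
    for cycles in sorted(MUTATION_RATE_BY_CYCLES):
        estimate = expected_mutations(fragment_length, cycles)
        print(cycles, "cycles:", round(estimate, 1), "expected mutated positions")
```

Under this assumption, choosing the number of cycles is a direct way of choosing how far the library members drift from the parent promoter.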
PCR, the polymerase chain reaction, is a technique for amplifying a piece of DNA across several orders of magnitude, yielding thousands to millions of copies of a particular DNA sequence. It is a common and often indispensable technique in medical and biological research laboratories. Its applications include DNA cloning for sequencing, DNA-based phylogeny, functional analysis of genes, the diagnosis of hereditary diseases, the identification of genetic fingerprints (used in forensic science and paternity testing), and the detection and diagnosis of infectious diseases. The technique relies on thermal cycling to melt and enzymatically replicate the target DNA. Primers contain sequences complementary to the target region, so the choice of primer sequence determines which region is replicated. The DNA polymerase that replicates the DNA is usually heat-stable. As PCR progresses, the DNA generated is itself used as a template for further replication, setting up a chain reaction in which the DNA template is amplified exponentially. PCR methods cycle through a defined series of temperature steps. In the first step, the two strands of the DNA double helix are physically separated at high temperature, a process called DNA melting. In the second step, the temperature is lowered and the primers anneal to the two single strands, which then serve as templates for DNA polymerase to selectively amplify the target DNA in the third step [1].
In the study, the 151 bp PCR products were purified, digested and ligated into a reporter plasmid upstream of a low-stability GFP gene. The resulting plasmids were transformed into E. coli strain DH5α, and the transformed cells were plated on minimal-medium agar plates [2]. A minimal medium is a culture medium that contains only the minimal necessities for growth of the wild type: inorganic salts, a carbon source and water. Agar is added to solidify the medium [5]. Approximately 30,000 colonies grew on these plates. The authors picked 200 colonies that spanned a wide range of fluorescence intensity. Fluorescence could be observed because the plasmid carried the gene for green fluorescent protein (GFP) downstream of the inserted promoter fragment: when the promoter drove expression, fluorescence could be detected, and its intensity depended on the level of expression [2].
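The exponential amplification described above can be sketched numerically. This is an idealized model, not a description of any particular protocol: the efficiency parameter (the fraction of templates copied per cycle) is an assumption introduced here for illustration, and real reactions plateau in late cycles.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a given number of PCR cycles.

    efficiency is a hypothetical per-cycle efficiency between 0 and 1;
    1.0 means every template is copied in every cycle (perfect doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for cycles in (10, 20, 30):
        print(cycles, "cycles:", pcr_copies(1, cycles), "copies from one template")
```

With perfect doubling, a single template gives about a thousand copies after 10 cycles and about a million after 20, which matches the "thousands to millions" range mentioned in the text.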
Characterization
After appropriate treatment of these clones they performed flow cytometry [2]. This is a laser-based biophysical technique used in cell counting, cell sorting, biomarker detection and protein engineering. The cells are suspended in a stream of fluid and passed through an electronic detection apparatus [1].
From the results they calculated the geometric mean of the fluorescence distribution of each clonal population. After eliminating clones with non-monovariate fluorescence distributions and after sequencing, only 22 promoter sequences remained; these were chosen to form a functional promoter library. Fig. 1 on page 12679 illustrates the procedure up to this step. Because the concept of promoter strength is uncertain and systems with only one reporter gene are not sufficiently reliable, they performed a multifaceted characterization of each library member (see Fig. 2 on page 12680). They measured culture turbidity and fluorescence as a function of time. Combining a previously published model with their own results, they arrived at an equation for promoter strength that included all relevant factors and measurements. They also measured the relative mRNA levels of GFP transcripts by quantitative RT-PCR [2]. This PCR-based technique combines two approaches. RT-PCR (reverse transcription PCR) qualitatively detects gene expression by creating complementary DNA transcripts from RNA. Quantitative PCR amplifies and simultaneously detects or quantifies a targeted DNA molecule. Two methods are commonly used to detect the products: non-specific fluorescent dyes that intercalate with any double-stranded DNA, or sequence-specific DNA probes, that is, oligonucleotides labelled with a fluorescent reporter that permits detection only after the probe hybridizes with its complementary sequence [1]. Furthermore, they checked the constitutive nature of the created promoters by inserting them into new constructs with the reporter gene cat [2]. This gene encodes chloramphenicol acetyltransferase, which detoxifies the antibiotic chloramphenicol and thereby confers chloramphenicol resistance.
The enzyme covalently attaches an acetyl group from acetyl-CoA to chloramphenicol, which prevents the antibiotic from binding to ribosomes [1]. The cultures were grown on rich solid medium, which lets bacteria grow faster and raises protein expression levels. For each clone they determined the lowest concentration of chloramphenicol that inhibited growth [2]. The library exhibited a high dynamic range and behaved similarly regardless of the regulated gene [2]. The use of two different, contrasting media, and consequently different growth environments, further highlighted the constitutive nature of the library promoters [2].
The difference between the number of initially selected promoters (200) and finally selected promoters (22) shows that a detailed analysis of each promoter was needed. Numerous mutations can change promoter strength, but there is no assurance that they lead to a reproducible, homogeneous and linear relationship between promoter strength and reporter [2].
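Determining the lowest chloramphenicol concentration that inhibits growth amounts to a simple lookup once growth has been scored at each tested concentration. The sketch below assumes growth is recorded as a yes/no observation per concentration; the function name, data layout and example values are hypothetical and are not taken from the study.

```python
def lowest_inhibitory_concentration(growth_by_concentration):
    """Return the lowest tested concentration at which no growth was seen.

    growth_by_concentration maps a concentration (any consistent unit)
    to True if the clone grew and False if it did not.
    Returns None if the clone grew at every tested concentration.
    """
    inhibiting = [conc for conc, grew in growth_by_concentration.items() if not grew]
    return min(inhibiting) if inhibiting else None


if __name__ == "__main__":
    # Hypothetical observations for one clone (illustrative values only).
    observations = {0: True, 50: True, 100: True, 200: False, 400: False}
    print(lowest_inhibitory_concentration(observations))
```

A clone carrying a stronger promoter in front of cat would be expected to tolerate more antibiotic, so this value serves as a second, independent readout of promoter strength.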
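The first step of the characterization, summarizing each clonal population by the geometric mean of its fluorescence distribution, can be sketched as follows. The fluorescence values are hypothetical; the geometric mean is computed in log space, which suits fluorescence data that span several orders of magnitude.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values, computed via the mean of logarithms."""
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical per-cell fluorescence readings for one clone.
    fluorescence = [120.0, 150.0, 95.0, 180.0, 130.0]
    print(round(geometric_mean(fluorescence), 1))
```

Compared with the arithmetic mean, the geometric mean is less dominated by a few very bright cells, which is one plausible reason for preferring it when comparing clones.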
Application
The group applied their functional promoter library to gain precise control at the transcriptional level when investigating specific genetic effects on a cellular phenotype. They replaced the native promoters of the target genes with promoters from their library. The target genes were ppc and dxs, and their effects on two divergent phenotypes, growth yield and lycopene production, were investigated [2].
The gene ppc encodes phosphoenolpyruvate carboxylase, which catalyzes the formation of oxaloacetate from phosphoenolpyruvate and a bicarbonate ion and plays a key anaplerotic role in bacteria by supplying oxaloacetate to the TCA cycle. It influences the growth yield of bacteria [6, 7]. The authors cultivated the prepared mutants and periodically monitored biomass and glucose concentrations. They found that increasing the level of phosphoenolpyruvate carboxylase raised the biomass yield only up to a certain point; higher levels had a negative effect on the biomass yield. These results revealed an optimal expression level of ppc, which was higher than the endogenous level of expression [2].
The second gene, dxs, encodes D-1-deoxyxylulose 5-phosphate synthase, which catalyzes the first biosynthetic step of the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway that leads to the production of isoprenoids. Lycopene, whose accumulation they investigated, is an isoprenoid derived from the products of this pathway [8]. The results show that increasing dxs expression increases lycopene accumulation only up to a certain point. Above this level, dxs expression is detrimental to lycopene production. In a strain that overexpressed downstream genes of the isoprenoid pathway, by contrast, the relationship between dxs expression and lycopene production was linear. Additionally, they noticed that cell density was significantly reduced in both strains when the constructs contained low-strength promoters. This was expected, because dxs is an essential gene [2].
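Biomass yield can be derived from the two quantities that were monitored, biomass and glucose concentration. The sketch below assumes the common definition of yield as biomass formed per glucose consumed between two time points; the text does not give the authors' exact formula, and the function name and numbers are hypothetical.

```python
def biomass_yield(biomass_start, biomass_end, glucose_start, glucose_end):
    """Biomass formed per unit of glucose consumed (e.g. g biomass per g glucose).

    Assumes both concentrations are measured in the same culture at the
    same two time points and expressed in compatible units.
    """
    glucose_consumed = glucose_start - glucose_end
    if glucose_consumed <= 0:
        raise ValueError("no glucose was consumed between the two time points")
    return (biomass_end - biomass_start) / glucose_consumed


if __name__ == "__main__":
    # Hypothetical measurements in g/L for one promoter-replacement mutant.
    print(biomass_yield(0.1, 2.1, 5.0, 1.0))
```

Plotting this value against promoter strength for each library member is what would reveal an optimum of the kind described for ppc.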
Conclusions of the study
In this way they created a fully characterized, homogeneous, broad-range, functional promoter library and demonstrated its applicability to the analysis of genetic control. By characterizing the strength of these promoters quantitatively with various metrics and subsequently integrating the constructs into the genome, it is possible to deduce the precise impact of gene dosage on the desired phenotype. They demonstrated a basic approach to precisely controlling gene expression in vivo. It allows a specific gene to be expressed at any desired level, gene expression to be optimized for maximal or minimal pathway function, and the distribution of genetic control over pathway behavior to be analyzed [2].
Extension of the Promoter Library to S. cerevisiae
They also extended the promoter engineering concept to S. cerevisiae strain BY4741 [2]. This work is described in more detail in the second article mentioned above, which comes from the same laboratory and is effectively a continuation of the first. The same approach was used, and the library for S. cerevisiae was expanded. They used the yeast strain BY4741 and generated a promoter library by error-prone PCR of the TEF1 promoter [3]. TEF1 is a gene that encodes translation elongation factor 1 alpha, and its promoter is constitutive [9, 10]. They obtained 14,000 clones, with the promoter variants inserted into a CEN/ARS plasmid upstream of GFP [3]. A CEN/ARS plasmid is a yeast shuttle vector, which means that it can be used in two different host species. It has components that allow replication and selection in both E. coli cells and yeast cells. The E. coli component includes an origin of replication and a selectable marker. The yeast component includes an autonomously replicating sequence (ARS), a yeast centromere (CEN) and a yeast selectable marker [1].
Characterization
Using fluorescence-activated cell sorting, only 11 yeast clones with gradually increasing fluorescence were selected from this library (see Fig. 1 on page 5269). To confirm that the differences in specific fluorescence resulted from the mutations introduced into the plasmid-based TEF1 promoter, the chosen plasmids were isolated and retransformed into yeast. The clones were tested on two different synthetic media, which confirmed that the relative strengths of the TEF1 promoter mutants were independent of the growth medium. The quantity of GFP mRNA was also measured by quantitative RT-PCR. All 11 selected mutants were then sequenced (see Fig. 2 on page 5270). Sequencing showed that the number of mutations relative to the native promoter ranged from 4 to 71. They were randomly distributed throughout the whole promoter sequence. However, a few positions were mutated, or left completely untouched, in several mutants. The mutant with higher activity than the wild-type TEF1 promoter had only a deletion of one nucleotide [3].
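Counting mutations relative to the native promoter, as in the sequencing analysis above, can be sketched as a position-by-position comparison of two aligned sequences. The example assumes that the sequences have already been aligned to equal length with "-" marking deleted positions; the function name and the short sequences are hypothetical and are not the TEF1 promoter.

```python
def list_mutations(reference, mutant):
    """Return (position, reference_base, mutant_base) for every difference.

    Both sequences must be pre-aligned and of equal length; positions are
    1-based and a '-' in the mutant marks a deleted nucleotide.
    """
    if len(reference) != len(mutant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, ref_base, mut_base)
        for index, (ref_base, mut_base) in enumerate(zip(reference.upper(), mutant.upper()))
        if ref_base != mut_base
    ]


if __name__ == "__main__":
    native = "ACGTACGTAC"   # hypothetical reference fragment
    variant = "ACGAAC-TAC"  # hypothetical mutant: one substitution, one deletion
    for position, ref_base, mut_base in list_mutations(native, variant):
        print(position, ref_base, "->", mut_base)
```

Tallying the positions returned for all library members would show which sites are mutated repeatedly and which are never touched.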
Generation of promoter replacement cassettes
In addition, they constructed plasmids that can serve as templates for generating promoter replacement cassettes. These plasmids contain selectable and removable markers upstream of the different TEF1 promoter versions; the cassettes are integrated into the yeast genome to regulate the expression of a desired gene [3].
Application
To show the usefulness of the promoter library they studied the control of glycerol production by GPD1 expression in S. cerevisiae. Previous studies suggested that GPDH (glycerol-3-phosphate dehydrogenase) catalyzes the rate-limiting step of glycerol biosynthesis. It has been shown that deletion of GPD1, one of the two isogenes encoding GPDH, slows glycerol production compared with the GPD1 wild type. In contrast, strong overexpression of GPD1 can increase the rate and yield of glycerol production. Multicopy overexpression of GPD1, however, has a negative impact on the features mentioned. For these reasons GPD1 expression has to be optimized to achieve high biomass and glycerol yields [3]. The authors took five members of their TEF1 promoter library and inserted them into the yeast genome upstream of the GPD1 gene (see Fig. 3 on page 5271). They then measured the specific GPDH activities and normalized them to the activity obtained with the unmutated TEF1 promoter. Promoter strength was determined from fluorescence and transcript-level measurements, as for the E. coli library. They confirmed a linear correlation between relative promoter strength and relative GPDH activity. They also measured glycerol and biomass yields and plotted them as a function of specific GPDH activity. The results did not completely confirm the previously known data, which shows the value of these promoter libraries compared with the limited data from a three-point analysis (knockout, wild type and overexpression). The results confirmed that GPDH expression is the rate-limiting step in the synthesis of glycerol. The linear dependence does not continue indefinitely, however, and ends well before the level obtained by multicopy overexpression of GPD1 is reached. In addition, biomass yield was significantly reduced when GPDH was overexpressed [3].
Developing a library of engineered promoters for bacteria: a second approach
I also looked for other approaches to engineering promoters in order to build a library of promoters with varying strengths, and I found an article by De Mey M. et al. Rather than mutating a native promoter, they designed and synthesized a degenerate oligonucleotide sequence that encoded the consensus sequences of E. coli promoters, separated by spacers of random sequence and flanked by non-degenerate multiple cloning sites. The consensus sequences of prokaryotic promoters were already known, and the authors extracted the consensus sequence for an E. coli promoter. A large part of this sequence is well conserved, especially the nucleotides around the -10 and -35 positions. They also included the parts that are somewhat less conserved. The 57 bp sequence therefore contained 24 conserved, 13 semi-conserved and 20 random nucleotides. They added non-degenerate flanks with multiple recognition sites for restriction endonucleases at both ends to allow cloning. The final sequence was 119 nt long. They then converted this mixture of degenerate oligonucleotides into double-stranded DNA fragments using the Klenow fragment of DNA polymerase I and a short oligonucleotide primer complementary to the 3' end of the non-degenerate flank. Finally, they cloned this mixture of degenerate DNA fragments into a promoter-probing vector and transformed it into competent E. coli cells, which resulted in several clones [10].
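The normalization and the check for a linear correlation described above can be sketched with plain Python. All numbers are hypothetical placeholders rather than values from the article, and the Pearson coefficient is used here as one simple way of quantifying linearity; the article's own statistical treatment may differ.

```python
import math


def normalize(values, reference):
    """Express each value relative to a reference (e.g. the unmutated TEF1 promoter)."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return [value / reference for value in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical raw measurements for five promoter variants.
    promoter_strength = [0.2, 0.4, 0.6, 0.8, 1.0]   # relative to unmutated TEF1
    gpdh_activity = [0.05, 0.11, 0.16, 0.19, 0.26]  # arbitrary activity units
    relative_activity = normalize(gpdh_activity, reference=0.26)
    print(round(pearson_r(promoter_strength, relative_activity), 3))
```

A coefficient close to 1 over the tested range would support a linear relationship, but it says nothing about behaviour outside that range, which is exactly where the multicopy data deviate.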
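The idea of a degenerate oligonucleotide, with fixed consensus positions and randomized spacer positions, can be illustrated with IUPAC nucleotide codes. The pattern below is purely illustrative: it uses the textbook E. coli -35 (TTGACA) and -10 (TATAAT) hexamers with a fully random 17 nt spacer and is not the 57 bp sequence designed by De Mey et al.; the function names are hypothetical.

```python
import random

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Illustrative pattern only: consensus hexamers separated by a random spacer.
PATTERN = "TTGACA" + "N" * 17 + "TATAAT"


def sample_sequence(pattern, rng):
    """Draw one concrete sequence from a degenerate IUPAC pattern."""
    return "".join(rng.choice(IUPAC[symbol]) for symbol in pattern)


def library_size(pattern):
    """Number of distinct sequences the degenerate pattern can encode."""
    size = 1
    for symbol in pattern:
        size *= len(IUPAC[symbol])
    return size


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    for _ in range(3):
        print(sample_sequence(PATTERN, rng))
    print("theoretical library size:", library_size(PATTERN))
```

For the numbers given in the text, the 57 bp core plus flanks totalling 119 nt implies 62 nt of non-degenerate flanking sequence; semi-conserved positions would correspond to two- or three-base codes such as R or H rather than N.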
Characterization
First they developed a green fluorescent protein assay. They determined the best buffer and incubation temperature and used these conditions to measure the strength of the promoters produced. The results showed that the clones covered a wide range of promoter activities, spanning 3 to 4 logs in small steps of activity change [10]. In addition, a comprehensive analysis of the promoter sequences led them to conclude that the strengths appear to be randomly distributed in the phylogenetic tree. There was no clear relationship between promoter strength and the degree of alignment. Strength was also not correlated with the promoter sequence as a whole, although some positions could be identified as having a high influence. Finally, they built a PLS (partial least squares) model that correlates promoter strength with sequence [10].
Conclusion
The PLS model that was built and validated can be an extremely useful tool for rationally designing a suitable promoter, enabling fine-tuning of gene expression within the framework of model-based metabolic engineering [10].
Alternative methods for controlling gene expression
There are three main alternative methods for controlling gene expression.
The first is to use native promoters of various strengths to control the expression of the desired gene. The results of this method are hard to predict, because endogenous promoters may have numerous different regulators even when they are designated as “constitutive.” In addition, native promoters of the desired strength are hard to find and may not be available at all [3].
The second is to use vectors with different copy numbers to adjust the expression level of the desired gene. The biggest limitation of this technique is that plasmids are not available for every copy number. Maintaining high-copy-number plasmids also imposes a high metabolic burden. Another possible problem is cell-to-cell heterogeneity in expression, because not all cells in a culture carry the same number of plasmids [3].
The third is to titrate an inducible promoter system with various concentrations of its inducer. Possible problems with these systems are inducer toxicity and inducer-mediated pleiotropic effects. Most inducers are relatively expensive, which makes them unsuitable for use at industrial scale. It is also hard to ensure that the level of induction is the same in all cells of the population, which again results in cell-to-cell heterogeneity [3].
Every method has its advantages and disadvantages. Over years of research, however, each of them has improved and will continue to improve. There is always potential to improve known methods and to develop new ones.
References
1. Wikipedia – different pages [6.1.2015].
4. TriLink Biotechnologies, [23.12.2014] http://www.trilinkbiotech.com/cart/Scripts/prodView.asp?idproduct=2746
5. Biology glossary, 26.7.2004, [7.1.2015] http://groups.molbiosci.northwestern.edu/holmgren/Glossary/Definitions/Def-M/minimal_medium.html
6. PortEco, 27.5.2013, [7.1.2015] http://ecoliwiki.net/colipedia/index.php/ppc:Quickview
7. InterPro, [7.1.2015] http://www.ebi.ac.uk/interpro/entry/IPR022805