Combinatorial synthesis of genetic networks
The article I selected to explain was written by Guet C.C., Elowitz M.B., Hsing W. and Leibler S. and was published in Science in 2002.
Introduction
Living cells respond to information from their environment on the basis of the interactions of a large yet limited number of molecular species that are arranged in complex cellular networks [1-3]. However, despite growing knowledge about the molecular components of the cell, the dynamics of even simple cellular networks are not well understood. A central problem in biology is determining how genes interact as parts of functional networks. Creation and analysis of synthetic networks, composed of well-characterized genetic elements, provide a framework for theoretical modeling. Simple and modular experimental systems are needed to study how the genetic structure and connectivity of cellular networks are related to their function. To this end, an in vivo synthetic system that enables the generation of combinatorial libraries of genetic networks was devised in E. coli. These networks were composed of genes encoding the three well-characterized prokaryotic transcriptional regulators LacI, TetR, and lambda CI, as well as the corresponding promoters [3].
Transcription is the first step of gene expression, in which a particular segment of DNA is copied into RNA by the enzyme RNA polymerase. Transcription is tightly regulated, and many different molecules are involved. Transcriptional regulators control the rate of gene transcription, for example by helping or hindering the binding of RNA polymerase to DNA. A single gene can be regulated in a range of ways, from altering the number of RNA copies that are transcribed to controlling when the gene is transcribed. This control allows the cell or organism to respond adequately to a variety of intra- and extracellular signals. For example, mRNA is produced to encode enzymes that adapt the cell to a change in food source, gene products involved in cell cycle specific activities, and gene products responsible for cellular differentiation in higher eukaryotes [4].
Transcriptional regulators
Transcriptional regulators are transcription factors (TFs) and other proteins that work in concert to finely tune the amount of RNA produced through a variety of mechanisms [4]. Any given gene is likely controlled by a specific combination of factors, which is called combinatorial control. In a hypothetical example, the combination of factors A and B might regulate a different set of genes than the combination of factors A and C. This combinatorial nature extends to complexes of far more than two proteins, and allows a very small subset (less than 10%) of the genome to control the transcriptional program of the entire cell [4]. The three TFs used in this experiment are briefly described below.
LacI is the lac repressor, which inhibits the expression of genes coding for proteins involved in the metabolism of lactose in bacteria. These genes are repressed when lactose is not available to the cell, ensuring that the bacterium invests energy in producing the machinery for uptake and utilization of lactose only when lactose is present. When lactose becomes available, it is converted into allolactose, which inhibits the lac repressor's DNA binding ability. Loss of DNA binding by the lac repressor is required for transcriptional activation of the operon [5]. An operon is a functional unit of genomic DNA containing a cluster of genes under the control of a single promoter [6].
The tetracycline repressor (TetR) regulates the most abundant resistance mechanism against the antibiotic tetracycline in gram-negative bacteria. The TetR protein and its mutants are commonly used as control elements to regulate gene expression in higher eukaryotes [7]. In the absence of tetracycline, basal expression of TetR is very low, but expression rises sharply in the presence of tetracycline through a positive feedback mechanism. TetR is used in artificially engineered gene regulatory networks because of its capacity for fine regulation [8].
cI is a transcription inhibitor of bacteriophage Lambda, also known as Lambda Repressor. cI is responsible for maintaining the lysogenic life cycle of phage Lambda. This is achieved when two repressor dimers bind cooperatively to adjacent operator sites on the DNA. The cooperative binding induces repression of the cro gene and simultaneous activation of the cI gene, which code for proteins Cro and cI [9, 10].
Generating combinatorial libraries
A combinatorial library was therefore generated from these three transcriptional regulatory genes and their corresponding promoters, with varying connectivity. The binding state of LacI and TetR can be changed with the small-molecule inducers isopropyl β-D-thiogalactopyranoside (IPTG) and anhydrotetracycline (aTc), respectively [3]. An inducer is a molecule that starts gene expression. It can bind to repressors or activators.
The authors also chose five promoters regulated by these proteins, which cover a broad range of regulatory characteristics such as repression, activation, leakiness, and strength [3]. A promoter is a region of DNA that initiates transcription of a particular gene. Promoters are located near the transcription start sites of genes, on the same strand and upstream on the DNA. They can be about 100–1000 base pairs long [11]. Two of the promoters are repressed by LacI, one is repressed by TetR, and the remaining two are regulated by λ cI, one positively and one negatively.
The genetic assembly scheme they used ensures that each network in the library has the following structure: Pi-lacI–Pj-λcI–Pk-tetR, wherein each of Pi, Pj, Pk represents any of the five promoters (Fig. 1). The regulatory genes on each plasmid activate or repress one another in various ways in vivo, generating networks with diverse connectivities. Altogether, 5^3 = 125 different networks are possible [3].
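The count of 125 follows from choosing one of five promoters independently for each of the three genes. A minimal sketch of that enumeration is given here; the promoter labels are plain-text stand-ins for the names used in the Methods section, and the function name is an illustrative assumption, not something taken from the article.

```python
from itertools import product

# Plain-text labels for the five promoters (assumed spelling, for illustration only).
PROMOTERS = ["PL1", "PL2", "PT", "Plambda-", "Plambda+"]
# Gene order is fixed by the assembly scheme: lacI, then lambda cI, then tetR.
GENES = ["lacI", "cI", "tetR"]


def enumerate_networks():
    """Return every promoter assignment as a tuple of (promoter, gene) pairs."""
    return [
        tuple(zip(combo, GENES))
        for combo in product(PROMOTERS, repeat=len(GENES))
    ]


networks = enumerate_networks()
print(len(networks))  # 5 * 5 * 5 possible networks
```

Because the gene order is fixed, only the promoter choice varies, so the library size is simply the number of promoters raised to the number of genes.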
Methods
In Figure 1, they present the modular genetic cloning strategy used to generate combinatorial libraries of logical circuits. First (A), all 15 possible promoter-gene units were built. For this, individual promoters and genes were first amplified by PCR. The genes [denoted -lite in (B)] have an ssrA tag that reduces the half-life of the proteins encoded by the modified gene [12]. The five promoters used were PL1 and PL2 (repressed by LacI), PT (repressed by TetR), and Pλ- and Pλ+ (repressed and activated by λ cI, respectively). The transcriptional terminator T1 was present at the end of each gene [3].
They amplified the promoter region and, separately, the region containing the gene. Identical RBS sites were used as internal primers for the subsequent fusion PCR step to form promoter-gene units [13]. In order to control the number of promoter-gene units and the position of a given gene in the network, Bgl I sites were incorporated in the PCR primers, as shown. The special recognition and restriction properties of Bgl I [14] allow various sticky ends to be produced by Bgl I cleavage. Here, they designed the Bgl I sites such that specific cohesive ends x and y were associated with each regulatory gene (for lacI, xlac = GCC, ylac = TTC; for λ cI, xcI = AAG, ycI = GTG; and for tetR, xtet = CAC, ytet = TCG). Note that ylac is compatible with xcI, ycI is compatible with xtet, and so on [3].
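The compatibility of the cohesive ends can be checked with a short script. This is only a sketch under one assumption: that the triplets are written so that two compatible ends are base-by-base complements of each other (the text does not state the strand orientation convention). The helper names are hypothetical.

```python
from itertools import permutations

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# (x, y) cohesive ends for each regulatory gene, as listed in the text.
ENDS = {
    "lacI": ("GCC", "TTC"),
    "cI": ("AAG", "GTG"),
    "tetR": ("CAC", "TCG"),
}


def complement(seq):
    """Base-by-base complement (no reversal; see the stated assumption)."""
    return "".join(COMPLEMENT[base] for base in seq)


def compatible(upstream, downstream):
    """True if the y end of `upstream` pairs with the x end of `downstream`."""
    y_end = ENDS[upstream][1]
    x_end = ENDS[downstream][0]
    return complement(y_end) == x_end


# Gene orders in which every neighbouring pair of units can ligate.
valid_orders = [
    order
    for order in permutations(ENDS)
    if all(compatible(a, b) for a, b in zip(order, order[1:]))
]
print(valid_orders)
```

Under this assumption, only one ordering of the three genes has matching ends at both junctions, which agrees with the fixed order described for the ligation products.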
In step B, when all 15 possible fusion PCR products were mixed together and ligated, the resulting products contained exactly three promoter-gene units in one particular order (lacI, λ cI, tetR). These products were cloned into a low copy number plasmid (3-4 copies/cell) [15] carrying the reporter gene gfpmut3 under the control of Pλ-. This is a fourth transcriptional unit, coding for green fluorescent protein (GFP) and controlled by the λ cI repressible promoter. The fluorescent signal acts as the network output, whereas the levels of the two chemical inducers were used as inputs [3].
The plasmid library was transformed into two different host strains of E. coli, CMW101 (lacI-, tetR-) and DH10β (lacI+, tetR-), which differed most significantly by the presence of a wild-type copy of lacI at a chromosomal locus. Each clone was grown under four conditions, with or without IPTG and with or without aTc. GFP fluorescence was monitored simultaneously during cell growth. In this way, they searched the library for circuits in which the output is a binary logical function of both inducers. Examples of such logical circuits are NAND, NOR, or NOT IF [3].
Logic circuits
Logical circuits are based on Boolean functions, each of which performs a logical operation on one or more logical inputs and produces a single logical output. In our case the circuits are binary, so there are exactly two logical inputs.
For example, a NAND gate (NOT-AND gate) is equivalent to an AND gate followed by a NOT gate. The output of a NAND gate is high if any of the inputs are low. The symbol is an AND gate with a small circle on the output that represents inversion [16].
A NOR gate (NOT-OR gate) is equivalent to an OR gate followed by a NOT gate. The output of a NOR gate is low if any of the inputs are high. The symbol is an OR gate with a small circle on the output. The small circle represents inversion [16].
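The truth tables of these gates can be written out explicitly. The sketch below does so for NAND and NOR, and also for NOT IF, which is mentioned earlier as a third example. The NOT IF definition used here (output on only when the first input is present and the second is absent) is an assumption about the usual meaning of the term, and all function names are illustrative.

```python
from itertools import product


def nand(a, b):
    """NOT-AND: low only when both inputs are high."""
    return not (a and b)


def nor(a, b):
    """NOT-OR: high only when both inputs are low."""
    return not (a or b)


def not_if(a, b):
    """Assumed definition: high only when a is high and b is low."""
    return a and not b


def truth_table(gate):
    """List (a, b, output) for all four input combinations."""
    return [(a, b, gate(a, b)) for a, b in product([False, True], repeat=2)]


for gate in (nand, nor, not_if):
    print(gate.__name__)
    for a, b, out in truth_table(gate):
        print(f"  {int(a)} {int(b)} -> {int(out)}")
```

In the experiment, the two inputs would correspond to the presence or absence of IPTG and aTc, and the output to GFP fluorescence.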
Results
Figure 2 presents a detailed analysis of two binary logical circuits (D038 and D052). (A) Here we have two host strains of E. coli (CMW101 and DH10β) transformed with each of the two networks. Logical circuit behavior can be observed directly on agar plates, where the fluorescence of colonies defines “on” and “off” states. Cells containing the indicated network were patched onto minimal agar media containing all four combinations of the two inducers. To increase the fluorescent signal and to show that reporter expression is not cis-dependent, cells contained plasmids deleted for gfp and were co-transformed with compatible plasmids (~15 copies/cell) containing an equivalent Pλ- -yfp transcriptional unit. The reporter unit was therefore located on a separate plasmid [3].
To explain what it means not to be cis-dependent, I will take the example of a cis-regulatory module. This is a stretch of DNA where a number of TFs can bind and regulate the transcription rates of nearby genes. Such modules are labelled cis because they are typically located on the same DNA molecule as the genes they control [17]. So here, the authors wanted to make sure that expression of the reporter does not depend on its proximity to the genes coding for the TFs, and that the reporter still works when it is not located on the same DNA molecule.
Cells were also grown in liquid culture and populations were analyzed with FACS for distributions of GFP expression. FACS or fluorescence-activated cell sorting is a specialized type of flow cytometry. It provides a method for sorting a heterogeneous mixture of biological cells into two or more containers, one cell at a time, based upon the specific light scattering and fluorescent characteristics of each cell [18]. In each set of histograms, the blue curve shows the fluorescence distribution without inducers, the green curve shows when IPTG alone was present, the red curve indicates aTc alone, and the cyan curve shows the distribution when both inducers were present simultaneously. Single-peaked distributions are observed, but, in some cases, peaks contain long tails that overlap, corresponding to “leaky” or “fuzzy” logical circuits. (B) Sequencing was used to determine the connectivity of each of the two networks. Their logical behavior is different but both networks have the same connectivity, as can be inferred from the corresponding diagrams of the interactions between the repressors and promoters. The schematic connectivity or topology diagrams shown at the bottom are identical for the two networks [3].
In Figure 3, we can see the distribution of logical phenotypes in the two strains. (A) Definition of the logic operations performed by the circuits. In the top row, + and - indicate the presence or absence of each inducer input. The output (fluorescence) is indicated in the lower rows by “On” or “Off”. We do not distinguish here between the two inputs and, thus, between two different types of NOT IF logic functions. Colored bars act as legends for (B) and (C). Histograms show the fraction of networks qualifying as logical circuits of each type for varying values of a threshold parameter. A single universal threshold value was applied simultaneously to all networks. Besides being greater than this threshold, the minimal “on” value in each particular network was also required to be at least fourfold greater than the maximal “off” value [3].
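The qualification rule described for Figure 3 (a universal threshold plus a fourfold separation between the lowest “on” and the highest “off” value) can be sketched as a small classifier. The fluorescence numbers in the usage lines are made up for illustration, the condition order and function name are assumptions, and NOT IF is assumed to mean that the output is on under exactly one single-inducer condition.

```python
# Assumed condition order: (no inducer, IPTG only, aTc only, both inducers).
PATTERNS = {
    (True, False, False, False): "NOR",
    (True, True, True, False): "NAND",
    (False, True, False, False): "NOT IF",  # the two inputs are not distinguished
    (False, False, True, False): "NOT IF",
}


def classify(fluorescence, threshold, min_ratio=4.0):
    """Return the logic type for four fluorescence values, or None.

    A value counts as "on" if it exceeds the universal threshold. The
    lowest "on" value must also be at least `min_ratio` times the highest
    "off" value, otherwise the network does not qualify.
    """
    on_values = [f for f in fluorescence if f > threshold]
    off_values = [f for f in fluorescence if f <= threshold]
    if on_values and off_values and min(on_values) < min_ratio * max(off_values):
        return None
    pattern = tuple(f > threshold for f in fluorescence)
    return PATTERNS.get(pattern)  # None for always-on, always-off, or other patterns


# Hypothetical fluorescence readings (arbitrary units).
print(classify([100, 5, 8, 3], threshold=20))
print(classify([100, 30, 8, 3], threshold=50))
```

Scanning `threshold` over a range of values and counting how many networks receive each label would give histograms of the kind described for panels (B) and (C).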
Figure 4. Genetic structure and behavior of selected networks. A subset of 30 plasmids was characterized in the lac- host strain (CMW101), and their genetic composition was determined by sequencing. Levels of GFP expression in each of the four conditions are indicated by the color or intensity of the corresponding box on a linear intensity scale (see color bar). In many cases, logical behavior is strain dependent (i.e., it is different in the lac+ strain DH10β). The promoters incorporated in the network determine its connectivity diagram. The 13 connectivity diagrams corresponding to different networks are drawn. A and B refer to either of the inducible repressors (LacI and TetR), C always denotes λ cI, and G denotes GFP. Activation is denoted by sharp arrows (↓), while repression is denoted by blunt arrows (┴) [3].
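How a choice of three promoters fixes the connectivity diagram can be illustrated in a few lines. The sketch below maps each promoter to its regulator, using the regulation rules stated for the five promoters, and collects the resulting signed edges. The labels and function name are assumptions for illustration. Unlike the diagrams in Figure 4, this sketch keeps LacI and TetR as distinct nodes, so its diagram count is not comparable to the 13 topologies reported there.

```python
from itertools import product

# Promoter -> (regulating protein, effect), as described for the five promoters.
REGULATION = {
    "PL1": ("LacI", "represses"),
    "PL2": ("LacI", "represses"),
    "PT": ("TetR", "represses"),
    "Plambda-": ("cI", "represses"),
    "Plambda+": ("cI", "activates"),
}
# Proteins encoded by the three genes, in their fixed order on the plasmid.
GENE_PRODUCTS = ["LacI", "cI", "TetR"]


def connectivity(p_lac, p_ci, p_tet):
    """Return the signed edges (regulator, effect, target) of one network."""
    edges = set()
    for promoter, target in zip((p_lac, p_ci, p_tet), GENE_PRODUCTS):
        regulator, effect = REGULATION[promoter]
        edges.add((regulator, effect, target))
    # The reporter is always driven by the cI-repressible promoter.
    edges.add(("cI", "represses", "GFP"))
    return frozenset(edges)


diagrams = {connectivity(*combo) for combo in product(REGULATION, repeat=3)}
print(len(diagrams))  # PL1 and PL2 give the same edges, so fewer than 125
```

Since the two LacI-repressed promoters produce the same edge, several promoter combinations collapse onto one diagram, which is one way networks with identical connectivity can still differ in promoter strength or leakiness.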
Figure 5. Dependence of phenotypic behavior on network connectivity. (A) A single change of promoter can completely modify the behavior of the logical circuit. The network on plasmid D133, which is always in the on state, differs from the networks encoded in plasmids D038 and D016 by the exchange of a single promoter placed in front of the lacI gene (shown in red). Both circuits show diverse logical behaviors, which differ in the two bacterial host strains. (B) Networks can differ in their connectivity but have qualitatively the same logical function. Both D016 and D052 behave as NOR logical circuits (in both lac+ and lac- strains), but they have different connectivity, as shown schematically on the right side of the table. The intensity scale indicating the levels of GFP expression is the same as in Fig. 4 [3].
Explanation of results
In many cases, the output fluorescence levels of an individual culture in their library were sufficiently distinct for different inputs that an unambiguous binary output value could be assigned to each input state. In other cases, the designation was somewhat arbitrary [3]. A single universal output threshold, in this case a fluorescence threshold, can be imposed on the entire library, and a large number of logical circuits whose “on” and “off” states differ significantly are still obtained (Fig. 3). Naturally, the spectrum of logical behaviors differed in the two hosts (Fig. 3) because the chromosomal lacI gene present in one strain (DH10β) acts as a network component and thus may drastically change the resulting phenotype (Fig. 2A) [3]. A phenotype is the composite of an organism's observable characteristics or traits, such as its morphology, development, biochemical or physiological properties, behavior, etc. [19].
Phenotypic variation in organisms often arises through mutations in the protein coding regions of the DNA. Another important contribution to phenotypic variability comes from changes in the cis-regulatory connections of existing genetic elements [3, 20, 21]. To determine the origin of the phenotypic variability observed in the library, 30 clones with a variety of different behaviors were retransformed into both hosts, rescreened, and sequenced (Fig. 4). There was a low level of point mutations, which in some cases modified the logical behavior of the networks. However, a large variety of behaviors remained among the many networks that did not have mutations in their regulatory regions. By sequencing, they identified the three promoters incorporated in each plasmid; connectivity between different genetic elements varies from network to network, so that 13 different “topologies” can be distinguished among the sequenced networks (Fig. 4). This variety of network connectivity is evidently the major source of phenotypic diversity in the library. In fact, the sequence data show that single-step changes to the network connections, in which one promoter replaces another, frequently converted network operation from one logical function to another [3]. Replacing a single promoter in a network that is always in the “on” state (D133 in Fig. 5A) gives a network (D038) that acts as a NOT IF in one strain and as a NAND in the other. Alternatively, a different promoter replacement gives network D016, which acts as a NOR circuit in both strains [3].
A set of genes and promoters thus has the potential to switch among a variety of computational functions through just a one-step change in connectivity. Once a simple set of genes and cis-regulatory elements is in place, it should be possible to jump from one functional phenotype to another just by modifying the regulatory connections. Such changes can also be achieved in evolution by natural combinatorial mechanisms like transposition, recombination, or gene duplication [3], but the process is much slower if it is not driven by the effects of successive point mutations.
The system is even more complex because the connectivity of a network does not uniquely determine its behavior. Some networks share the same connectivity yet perform different logical operations (Fig. 2). There are also examples of networks with different connectivity that exhibited qualitatively similar behaviors. For instance, the two networks shown in Fig. 5B (D016 and D052) both perform NOR operations despite their different connectivity. So the behavior of even simple networks built out of a few well-characterized components cannot always be inferred from connectivity diagrams alone [3].
Discussion
The question is whether the behavior of the logical circuits obtained here can be predicted. Boolean-type models of gene regulation, in which only discrete values of the biochemical variables and parameters are considered, are often used to intuitively understand the operation of genetic networks. That leads to commonly used reasoning, such as: gene product A is produced, it inhibits the expression of gene product B, which is thus absent, etc. [22, 23]. This description is adequate for some of the present networks but seems not to apply to others. Boolean-type models neglect many potentially important intracellular phenomena, including stochastic fluctuations in the levels of components and the detailed biochemistry of protein-DNA interactions.
They determined all Boolean network structures theoretically possible in the library for the lac- strain CMW101. Only three network structures were expected to depend on both inputs, and they all behaved as NOT IF. However, experimentally they also found NORs. As shown in Fig. 2, the Boolean description is consistent with the NOT IF behavior of network D038 but not with the NOR behavior of network D052. It is possible that even for these well-studied transcriptional regulators, subtle additional regulation may be at work among the plasmid-encoded elements [3]. Genetic networks are nonlinear, stochastic systems in which the unknown details of interactions between components might be of crucial importance. Combinatorial libraries of simple networks should be useful in the future to uncover the existence of such additional regulation mechanisms and to explore the limits of quantitative modeling of cellular systems [3].
Combinatorial techniques inspired by recombination, such as DNA shuffling, have often proven successful in enhancing or changing the enzymatic activities of proteins and pathways [24, 25] without requiring an understanding of the mechanisms by which they work. DNA shuffling is a way to rapidly propagate beneficial mutations in a directed evolution experiment [26]. It is used to rapidly increase DNA library size [27] because it recombines different DNA species carrying different mutations [26]. However, combinatorial methods in simple and well-controlled systems can and should also be used to gain a better understanding of system-level properties of cellular networks for further practical applications [3].
Conclusion
The present results show that a small number of interacting genetic elements can generate a surprisingly large diversity of complex behaviors. The current system uses a small number of building blocks restricted to transcriptional regulation. Both the number of elements and the range of biochemical interactions can be extended by including other modular genetic elements [3].
There are also some ideas for the future. The approach can be taken beyond the intracellular level by linking input and output through cell-cell signaling molecules, such as those involved in quorum sensing [3]. The latter is a system of stimulus and response correlated to population density. Many species of bacteria use quorum sensing to coordinate gene expression according to the density of their local population [28]. Lastly, this combinatorial strategy can be used to search for other dynamic behaviors such as switches, sensors, oscillators, and amplifiers, as well as for high-level structural properties, such as robustness or noise-resistance [29].
References
1. Bray, D. Protein molecules as computational elements in living cells. Nature, 1995, vol. 376(6538), p. 307-312.
2. Szathmary, E., Jordan, F., Pal, C. Molecular biology and evolution. Can genes explain biological complexity? Science, 2001, vol. 292(5520), p. 1315-1317.
3. Guet, C.C., et al. Combinatorial synthesis of genetic networks. Science, 2002, vol. 296 (5572), p. 1466-1470.
4. Transcriptional regulation - Wikipedia, the free encyclopedia [online]. 13.12.2014. [cited 1.1.2015]. http://en.wikipedia.org/wiki/Transcriptional_regulation
5. Lac repressor - Wikipedia, the free encyclopedia [online]. 17.12.2014. [cited 1.1.2015] http://en.wikipedia.org/wiki/Lac_repressor
6. Operon - Wikipedia, the free encyclopedia [online]. 20.12.2014. [cited 1.1.2015] http://en.wikipedia.org/wiki/Operon
7. Orth, P., et al. Structural basis of gene regulation by the tetracycline inducible Tet repressor-operator system. Nature Structural Biology, 2000, vol. 7(3), p. 215-219.
8. TetR - Wikipedia, the free encyclopedia [online]. 30.11.2014. [cited 1.1.2015]. http://en.wikipedia.org/wiki/TetR
9. Stayrook, S., et al. Crystal structure of the lambda repressor and a model for pairwise cooperative operator binding. Nature, 2008, vol. 452(7190), p. 1022-1025.
10. Lambda repressor - Proteopedia, life in 3D [online]. 1.1.2015. [cited 1.1.2015]. http://proteopedia.org/wiki/index.php/Lambda_repressor
11. "Analysis of Biological Networks: Transcriptional Networks - Promoter Sequence Analysis". Tel Aviv University. Retrieved 30 December 2012.
12. Elowitz, M.B., Leibler, S. A synthetic oscillatory network of transcriptional regulators. Nature, 2000, vol. 403(6767), p. 335-338.
13. Mullinax, R.L., et al. Expression of a heterodimeric Fab antibody protein in one cloning step. Biotechniques, 1992, vol. 12(6), p. 864-869.
14. Berger, S.L. Expanding the potential of restriction endonucleases: use of hapaxoterministic enzymes. Analytical Biochemistry, 1994, vol. 222(1), p. 1-8.
15. Lutz, R., Bujard, H. Independent and tight regulation of transcriptional units in Escherichia coli via the LacR/O, the TetR/O and AraC/I1-I2 regulatory elements. Nucleic Acids Research, 1997, vol. 25(6), p. 1203-1210.
16. Basic Logic Gates [online]. 8.12.2005. [cited 1.1.2015]. http://www.ee.surrey.ac.uk/Projects/CAL/digital-logic/gatesfunc/index.html#truth
17. Cis-regulatory module - Wikipedia, the free encyclopedia [online]. 17.12.2014. [cited 1.1.2015]. http://en.wikipedia.org/wiki/Cis-regulatory_module
18. Flow cytometry - Wikipedia, the free encyclopedia [online]. 17.12.2014. [cited 1.1.2015]. http://en.wikipedia.org/wiki/Flow_cytometry#Fluorescence-activated_cell_sorting_.28FACS.29
19. Phenotype - Wikipedia, the free encyclopedia [online]. 20.12.2014. [cited 1.1.2015]. http://en.wikipedia.org/wiki/Phenotype
20. Davidson, E.H. Genomic regulatory systems : development and evolution. 2001, San Diego; London: Academic Press.
21. Carroll, S.B. Endless forms: the evolution of gene regulation and morphological diversity. Cell, 2000, vol. 101(6), p. 577-580.
22. Glass, L., Kauffman S.A. The logical analysis of continuous, non-linear biochemical control networks. Journal of Theoretical Biology, 1973, vol. 39(1), p. 103-129.
23. Thomas, R., D'Ari, R. Biological feedback. 1990, Boca Raton, Fla.: CRC Press.
24. Stemmer, W.P. DNA shuffling by random fragmentation and reassembly: in vitro recombination for molecular evolution. PNAS USA, 1994, vol. 91(22), p. 10747-10751.
25. Zhang, Y.X., et al. Genome shuffling leads to rapid phenotypic improvement in bacteria. Nature, 2002, vol. 415(6872), p. 644-646.
26. DNA shuffling - Wikipedia, the free encyclopedia [online]. 1.12.2014. [cited 1.1.2015]. http://en.wikipedia.org/wiki/DNA_shuffling
27. Cohen, J. How DNA shuffling works. Science, 2001, vol. 293(5528), p. 237.
28. Quorum sensing - Wikipedia, the free encyclopedia [online]. 18.12.2014. [cited 1.1.2015]. http://en.wikipedia.org/wiki/Quorum_sensing
29. Hartwell, L.H., et al. From molecular to modular cell biology. Nature, 1999, vol. 402(6761 Suppl), p. C47-52.