Multiplex genome engineering using CRISPR/Cas systems
Uroš Stupar
Introduction
This article describes how the type II prokaryotic CRISPR (clustered regularly interspaced short palindromic repeats)/Cas system from Streptococcus pyogenes can be used to edit genome sequences efficiently and precisely. In the study summarized here, the authors engineered two different type II CRISPR/Cas adaptive immune systems so that the Cas9 nuclease, guided by short RNAs, induces cleavage at defined genomic loci in both human and mouse cells. They also converted Cas9 into a nicking enzyme that facilitates homology-directed DNA repair. Finally, they encoded several guide sequences in a single CRISPR array, so that the CRISPR/Cas9 system could edit multiple sites within a genome at the same time.
Besides serving as an invaluable research tool, targeted genome engineering in cells and organisms could open the way to revolutionary applications in human medicine, molecular biology, biotechnology and microbial engineering. Methods for modifying the genome exploit endogenous DNA repair pathways that are triggered by site-specific double-stranded DNA breaks. Site-specific genomic mutations in cell cultures became possible with methods based on zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs). These enzymes target specific genomic DNA sequences and cleave them. Error-prone non-homologous end joining (NHEJ) DNA repair then generates mutations. Engineering efficient ZFNs requires extensive technical expertise and empirical testing, which takes a lot of time and work. The TALEN technology provides an attractive alternative to ZFNs, but like ZFNs it requires the assembly of two relatively large DNA-binding proteins for each target, so it is still time consuming. New technologies that are affordable and easy to engineer were therefore needed. Most recently, a new class of genome editing tool based on the type II prokaryotic CRISPR (clustered regularly interspaced short palindromic repeats) adaptive immune system has been developed.
CRISPR/Cas system
The type II CRISPR/Cas9 system is used by bacteria as an RNA-guided defense system against invading viruses and plasmids. The Streptococcus pyogenes SF370 type II CRISPR locus consists of four genes, including the Cas9 nuclease, as well as two noncoding CRISPR RNAs (crRNAs): a trans-activating crRNA (tracrRNA) and a precursor crRNA (pre-crRNA) array, which contains spacers (nuclease guide sequences). The spacers are separated by identical direct repeats. Once a spacer has been processed into a mature crRNA, it base-pairs (hybridizes) through hydrogen bonding with the complementary target DNA sequence. The tracrRNA recruits the crRNA into the Cas9 complex, and the Cas9 endonuclease is thereby guided to the target site. Cas9 then induces a double-stranded break in the targeted DNA. This system evolved to cleave foreign DNA (viral DNA, for example).
This system can be modified and adapted to induce site-specific genome mutations and thus edit target DNA sequences. The mature crRNA and tracrRNA can be fused into a single synthetic guide RNA (sgRNA). The guide RNA recruits Cas9 to the target site, where the catalytic activity of Cas9 induces a double-strand break. Previous studies have shown that tracrRNA, pre-crRNA, RNase III and the Cas9 nuclease are sufficient to induce double-stranded DNA breaks in vitro and in prokaryotic cells. Cells try to repair the damaged DNA through a variety of mechanisms. Non-homologous end joining (NHEJ) is one of the mechanisms most commonly used by cells to repair double-stranded breaks in the DNA backbone.
NHEJ (Non-homologous end joining)
Non-homologous end joining (NHEJ) is a pathway that repairs double-strand breaks in DNA. It is referred to as non-homologous because the break ends are directly ligated without the need for a homologous template, in contrast to homologous recombination, which requires a homologous sequence to guide repair. The term was first used by Moore and Haber in 1996. NHEJ typically uses short homologous DNA sequences called microhomologies to guide repair. These microhomologies are often present in single-stranded overhangs at the ends of double-strand breaks. When the overhangs are perfectly compatible, NHEJ usually repairs the break accurately. Imprecise repair leading to loss of nucleotides can also occur, but is much more common when the overhangs are not compatible. Inappropriate NHEJ can lead to translocations and telomere fusion, which are present in tumor cells. NHEJ is evolutionarily conserved throughout all kingdoms of life and is the predominant double-strand break repair pathway in mammalian cells. The choice between NHEJ and homologous recombination for repair of a double-strand break is regulated at the initial step of recombination, 5' end resection. In this step, the 5' strand of the break is degraded by nucleases to create long 3' single-stranded tails. Double-strand breaks that have not been resected can be rejoined by NHEJ, but resection of even a few nucleotides strongly inhibits NHEJ and effectively commits the break to repair by recombination. NHEJ is active throughout the cell cycle, but is most important during G1, when no homologous template for recombination is available. This regulation is accomplished by the cyclin-dependent kinase Cdk1 (Cdc28 in yeast), which is turned off in G1 and expressed in S and G2. Cdk1 phosphorylates the nuclease Sae2, allowing resection to initiate. So when the Cas9 complex induces double-strand breaks in the target DNA sequence, the cell tries to repair them using the NHEJ pathway.
Because repair by NHEJ is error-prone, the repaired DNA sequence can carry mutations at the target site, such as small insertions or deletions.
Effectiveness of the CRISPR/Cas system for targeted cleavage of mammalian chromosomes
In this study, nuclear localization signals were attached to codon-optimized Cas9 and RNase III from Streptococcus pyogenes, because previous research showed that this helps transport the two enzymes into the nucleus, where the genomic DNA to be edited is located. In addition, GFP and mCherry coding sequences were attached so that the localization of the Cas9 nuclease and RNase III could be followed in vivo (see Figure 1A). Expression of the human codon-optimized Cas9 (hSpCas9) and RNase III (hSpRNase III) genes was driven by the elongation factor 1α (EF1α) promoter. The tracrRNA gene was placed under the RNA polymerase III U6 promoter, as was the pre-crRNA, which contained a single nuclease guide sequence flanked by two identical direct repeats. The guide sequence was designed to bind a 30 base-pair target sequence called the protospacer, which in this case is located in the human EMX1 gene (see Figure 1B). This gene encodes a member of the EMX family of transcription factors. EMX1, along with its family members, is expressed in the developing cerebrum (also known as the telencephalon). It plays a role in the specification of positional identity, the proliferation of neural stem cells, the differentiation of layer-specific neuronal phenotypes and the commitment to a neuronal or glial cell fate. The protospacer is followed by three nucleotides, NGG. This is called the protospacer-adjacent motif (PAM) and is necessary for the spacer to bind its target successfully.
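The protospacer and PAM arrangement described above can be illustrated with a minimal sketch that scans one strand of a DNA sequence for candidate protospacers immediately followed by an NGG motif. The function name `find_protospacers` and the toy sequence are hypothetical and chosen only for illustration; this is not the design procedure used in the study. The default length of 30 follows the target length mentioned in the text.

```python
import re

def find_protospacers(seq, length=30, pam_regex="[ACGT]GG"):
    """Return (start, protospacer, pam) for every site on the given strand
    where `length` bases are immediately followed by a PAM (default NGG).

    Only the strand passed in is scanned; the reverse strand would need
    to be searched separately on the reverse complement.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

# Toy example with a shortened protospacer length (hypothetical sequence).
print(find_protospacers("TTACGTACGATGGCCA", length=5))
# [(5, 'TACGA', 'TGG')]
```

Making the PAM a regular expression keeps the sketch usable for nucleases with other PAM requirements, which the conclusion of this article mentions as a possible direction.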
The components of this modified Streptococcus pyogenes CRISPR/Cas system were transfected into 293FT cells in different combinations to test whether cleavage of the DNA strands could be achieved. Figure 1C shows how base pairing between the crRNA and the target sequence should occur; the cleavage site is marked by the red arrow. The SURVEYOR assay was used for detection. SURVEYOR Mutation Detection Kits are a simple and robust method for detecting mutations and polymorphisms in DNA. The key component of the kits is Surveyor Nuclease, a member of the CEL family of mismatch-specific nucleases. Surveyor Nuclease recognizes and cleaves mismatches caused by single nucleotide polymorphisms (SNPs) or small insertions or deletions. To confirm the results, the samples were also analyzed by Sanger sequencing. The results showed that the combination of all four components (Cas9, RNase III, tracrRNA and pre-crRNA) gave the most efficient cleavage of the protospacer, although RNase III was not necessary for the induction of double-stranded breaks (see Figure 1D). It seems that tracrRNA and pre-crRNA can be processed in its absence. The authors suppose that mammalian cells already contain RNases that help with the maturation of these two RNAs. Removing any of the other three components abolished cleavage, so they concluded that a system with a minimum of three components (Cas9, tracrRNA and pre-crRNA) can be used to induce double-stranded breaks in a DNA molecule. This was the first step of the research.
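To make the readout of such a mismatch-cleavage assay more concrete, the following sketch estimates the fraction of modified alleles from gel band intensities. The formula is an assumption brought in for illustration and is not stated in this article: it is the commonly used estimate in which the cleaved fraction f_cut = (b + c) / (a + b + c) is converted to an indel percentage as 100 × (1 − sqrt(1 − f_cut)), where a is the intensity of the uncut band and b and c are the two cleavage products. The function name and the example intensities are hypothetical.

```python
import math

def surveyor_indel_percent(uncut, cut1, cut2):
    """Estimate the indel percentage from band intensities of a
    mismatch-cleavage assay (assumed formula, see lead-in).

    uncut      -- intensity of the undigested PCR product
    cut1, cut2 -- intensities of the two cleavage products
    """
    total = uncut + cut1 + cut2
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    f_cut = (cut1 + cut2) / total
    # The square root accounts for re-annealing: only heteroduplexes
    # (mutant paired with wild type or a different mutant) are cleaved.
    return 100.0 * (1.0 - math.sqrt(1.0 - f_cut))

# Hypothetical intensities: 25 % of the signal is in the cleaved bands.
print(round(surveyor_indel_percent(75, 15, 10), 1))  # 13.4
```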
Generalizability of RNA-guided genome editing in eukaryotic cells
In the next part of the research the authors encoded Cas9, tracrRNA and pre-crRNA on the same specifically designed vector. They also made chimeric RNA hybrids, which they encoded in the vector instead of separate tracrRNA and pre-crRNA. In this chimeric crRNA-tracrRNA hybrid, a processed, mature crRNA containing the guide sequence is fused to only a part of the tracrRNA through a synthetic stem loop, mimicking the mature crRNA-tracrRNA duplex (see Figure 2B). The purpose of this test was to compare the effectiveness of pre-crRNA plus tracrRNA with that of the chimeric crRNA-tracrRNA hybrid, and also to find out whether the CRISPR/Cas system could target other protospacers in the EMX1 locus. They chose five protospacers in the EMX1 locus (see Figure 2A). The results showed that not all designed RNAs could cleave the targeted sequences. They therefore decided to target additional sequences in other genes, namely the human PVALB gene and the mouse Th gene. As for EMX1, the first step was to design the correct pre-crRNAs and chimeric RNAs to target those sequences. The results were similar. Using the correct pre-crRNA, which then forms a crRNA-tracrRNA duplex, proved to be the most efficient method: double-strand breaks were identified at all three analyzed targets in the mouse Th gene and at one of the two human PVALB targets. As the electrophoresis gel in Figure 2C clearly shows, cleavage of the same DNA targets using chimeric RNA was either less efficient or undetectable. This probably occurred because the chimeric RNA is less stable or is expressed at a lower level. It could be that the endogenous RNA interference machinery degraded the chimeric RNA. Another possibility is that the chimeric RNA was simply not efficient enough at recruiting Cas9 or recognizing specific targets because of its secondary structure.
Specificity of the CRISPR/Cas system for targeted cleavage
Next they analyzed how mismatches between the guide sequence and the protospacer affect cleavage, because cleavage must be not only efficient but also specific (see Figure 3A). They found that a single-base mismatch within 11 base pairs 5' of the PAM, which lies immediately next to the 3' end of the guide sequence, abolishes cleavage of the protospacer by Cas9. Mismatches further upstream than 11 base pairs, however, do not abolish the cleaving activity of the Cas9 nuclease (see Figure 3B). Similar results had been obtained in previous bacterial and in vitro studies, and this study confirmed them.
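The rule described above can be written down as a small sketch. It is a deliberate simplification under stated assumptions: the guide and target are compared base by base as DNA strings of equal length, position 1 is the base directly next to the PAM, and any mismatch in positions 1 to 11 is treated as abolishing cleavage. The function names and the 20-base example guide are hypothetical, and real cleavage behaviour is likely more graded than this yes/no model.

```python
def mismatch_positions_from_pam(guide, target):
    """Return mismatch positions counted from the PAM-proximal (3') end.

    Position 1 is the base immediately 5' of the PAM.
    """
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length")
    n = len(guide)
    return [n - i for i in range(n) if guide[i].upper() != target[i].upper()]

def predicted_to_cleave(guide, target, seed_length=11):
    """Simplified rule from the text: a mismatch within `seed_length`
    bases of the PAM abolishes cleavage, mismatches further upstream do not."""
    return all(pos > seed_length
               for pos in mismatch_positions_from_pam(guide, target))

guide = "ACGTACGTACGTACGTACGT"        # hypothetical 20-base guide
pam_proximal = guide[:-1] + "A"       # mismatch at position 1
pam_distal = "T" + guide[1:]          # mismatch at position 20
print(predicted_to_cleave(guide, guide))         # True
print(predicted_to_cleave(guide, pam_proximal))  # False
print(predicted_to_cleave(guide, pam_distal))    # True
```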
Targeted modification of genomes
Double-stranded breaks in DNA induced by the type II CRISPR/Cas9 system can be repaired through different mechanisms. These mechanisms evolved to prevent lethal DNA damage in cells. There are two main mechanisms. The first is homology-directed repair (HDR), which can be used when a homologous piece of DNA is present in the nucleus. It occurs mostly in the S and G2 phases of the cell cycle. HDR is important for suppressing the formation of cancer. It maintains genomic stability by repairing the broken DNA strand, and the repair is assumed to be error free because a template is used. When no homologous DNA is available, another process called non-homologous end joining (NHEJ) can take place instead. When a double-strand DNA lesion is repaired by NHEJ, there is no DNA template to validate the repair, so the resulting DNA strand may differ from the original and information may be lost. A different nucleotide sequence in the DNA can result in a different protein being expressed in the cell. This protein may malfunction, and processes in the cell may fail or take a different path. In this case, however, the aim is to induce mutations, so the error-prone NHEJ mechanism is exactly what is needed.
Wild-type Cas9 induces site-specific double-strand breaks in DNA. In the study, a mutation was introduced in the RuvC I domain of Cas9 that converts an aspartate into an alanine (see Figure 4A). With this modification the enzyme becomes a DNA nickase (Cas9n), because it cuts only one DNA strand. Nicked DNA is usually repaired through high-fidelity homology-directed repair, which means that mutations rarely occur. The target site was then nicked with Cas9n and analyzed with the SURVEYOR assay and by Sanger sequencing of 327 amplicons. No deletions or insertions were detected (see Figure 4B). In some rare cases, however, nicked DNA can also be repaired through intermediates with double-stranded breaks and the NHEJ mechanism, as shown in previous research. The authors also wanted to test Cas9-mediated homology-directed repair at the EMX1 locus. To do this, a homology repair template was added so that repair of the break would introduce restriction sites for the HindIII and NdeI restriction enzymes near the protospacer (see Figure 4C). Introduction of the restriction sites was successful with both Cas9 and Cas9n. This was confirmed by digestion with the HindIII restriction enzyme and electrophoretic separation of the restriction fragments. In both cases they obtained a 2281 base-pair fragment, which is the undigested fragment, and two fragments of 1189 and 1092 base pairs. The two smaller fragments show that Cas9-mediated and Cas9n-mediated HDR succeeded with approximately the same efficiency (see Figure 4D). The results were verified once more by Sanger sequencing. The advantage of the nickase is that it might reduce off-target mutations.
Multiplex genome engineering using a single CRISPR array
As described before, the pre-crRNA sequence in the CRISPR locus contains spacers, which bind to target DNA sequences and are separated by identical direct repeats. There can be a single spacer, but there can also be multiple spacers for targeting multiple genes.
Finally, the scientists engineered a CRISPR array containing two spacers, one targeting the EMX1 locus and one targeting the PVALB locus. Figure 4F shows the design of the crRNA array. Cleavage was tested using gel electrophoresis, and as the gel shows, both protospacers were cleaved efficiently (see Figure 4F).
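The layout of such an array, with spacers separated and flanked by identical direct repeats, can be sketched as simple string assembly. The function name `build_crispr_array` and all sequences below are hypothetical placeholders; they are not the actual direct repeat or the EMX1 and PVALB spacers used in the study.

```python
def build_crispr_array(spacers, direct_repeat):
    """Assemble repeat-spacer-repeat-...-repeat from a list of spacers.

    Every spacer ends up flanked by the same direct repeat on both sides,
    as described in the text for the pre-crRNA array.
    """
    if not spacers:
        raise ValueError("at least one spacer is required")
    parts = [direct_repeat]
    for spacer in spacers:
        parts.append(spacer)
        parts.append(direct_repeat)
    return "".join(parts)

# Placeholder sequences: lower case marks the repeat, upper case the spacers.
array = build_crispr_array(["AAAACCCC", "GGGGTTTT"], direct_repeat="gtttta")
print(array)  # gttttaAAAACCCCgttttaGGGGTTTTgtttta
```

Adding a further target in this sketch only means appending another spacer to the list, which mirrors why a single array is attractive for multiplex editing.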
Conclusion
The type II CRISPR/Cas system was therefore shown to be a promising, relatively new method for inducing RNA-guided mutations by cleaving both strands of the DNA molecule at target sites and relying on the error-prone NHEJ mechanism to introduce mutations when the DNA is repaired. Many research-tool companies have already launched products for CRISPR/Cas systems. These generally involve web-based bioinformatic tools to design gRNAs, plus RNA or plasmid vectors that encode Cas9, along with fluorescent proteins or other expression markers. Although this method is effective, specific and able to target multiple sites, it can still be improved further to increase its effectiveness. As mentioned before, it can only target sequences that lie in the genome right next to a three-nucleotide PAM sequence, NGG. Such motifs are found on average only about once every 8 base pairs, which means that not every part of a sequence can serve as a protospacer. Furthermore, complex secondary structures of the crRNA and DNA methylation states can reduce the ability of guide sequences to bind an otherwise accessible protospacer. The family of Cas9 nucleases should therefore be explored further to find or engineer systems with better crRNA secondary structures, for easier access to target sequences, and with different PAM requirements, to extend the choice of protospacers that can be used for mutagenesis. Specificity could also be improved, for example with systems that require multiple crRNA-Cas9 complexes for activity, by reducing activity while increasing cooperativity and, most importantly, by a careful choice of guide sequences in the crRNA. Cas9 proteins from species with larger genomes than Streptococcus pyogenes could also be more specific.
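The figure of roughly one NGG motif per 8 base pairs can be checked with a back-of-the-envelope sketch. It rests on an assumption not made in the article: all four bases are equally likely and independent, which real genomes do not satisfy exactly. Under that assumption a GG dinucleotide occurs with probability 1/16 at a given position on one strand, and counting both strands doubles this to 1/8. The function names are hypothetical.

```python
import random

def expected_bp_per_pam(p_g=0.25):
    """Expected spacing between NGG sites counting both strands,
    assuming independent bases with P(G) = P(C) = p_g."""
    return 1.0 / (2 * p_g ** 2)

def count_ngg_pams(seq):
    """Count NGG sites on both strands of `seq`.

    A reverse-strand NGG appears as CCN on the strand that is passed in.
    """
    seq = seq.upper()
    forward = sum(1 for i in range(1, len(seq) - 1) if seq[i:i + 2] == "GG")
    reverse = sum(1 for i in range(0, len(seq) - 2) if seq[i:i + 2] == "CC")
    return forward + reverse

def simulated_bp_per_pam(n=100_000, seed=0):
    """Average spacing between PAM sites in a random sequence of n bases."""
    rng = random.Random(seed)
    seq = "".join(rng.choices("ACGT", k=n))
    return n / count_ngg_pams(seq)

print(expected_bp_per_pam())   # 8.0
print(simulated_bp_per_pam())  # close to 8 for a long random sequence
```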
Despite the improvements that still need to be made, the CRISPR/Cas system is cheaper and less time consuming than other genome engineering methods, such as zinc finger nucleases and TALE nucleases, and it seems to have the potential to become a powerful tool for future research in molecular biology, biotechnology and medicine.