4.3 Finding genes: DNA Libraries

We now have all the tools at hand that we need to actually clone a gene. Each new cloning problem is different from the previous one, and our experimental approach must be planned carefully. The first major decision is whether to clone from the genomic DNA or to start with the messenger RNA that carries the final code for the protein of interest.

Why do the two approaches differ? You may have to review this part of the molecular biology section before proceeding. Recall that the DNA of both bacteria and higher organisms has information at the 5' end of a gene that is related to control of that gene--turning it on or off. You may or may not want to include this information in your cloned gene.

Most important, the typical gene from a higher organism contains both exons and introns, and the intron sequences must be spliced out of the RNA copy before it can be used to code for a protein. Bacteria cannot carry out this splicing. Thus, if you want to express a gene from a higher organism (such as a human) in a bacterium (such as E. coli) to produce a protein, you will want to start with the sequences found in mature mRNA. We then need to make a DNA copy (called cDNA) of the mRNA so that we may use the cloning techniques we have just learned. If the host cell is to be a eukaryotic one, you most likely will want to isolate the complete genomic DNA.
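The difference between a genomic sequence and its mature mRNA can be illustrated with a small Python sketch. The transcript and the exon coordinates below are invented for illustration only and do not correspond to any real gene; exons are written in upper case and introns in lower case simply to make them easy to see.

```python
def splice(primary_transcript, exons):
    """Join the exon segments (0-based, end-exclusive) and drop the introns."""
    return "".join(primary_transcript[start:end] for start, end in exons)

# Hypothetical primary transcript: three exons separated by two introns.
primary_transcript = (
    "AUGGCC" + "guaaguuuucag" + "UUUAAA" + "guaugcccuag" + "GGCUAA"
)
# Hypothetical exon coordinates within the transcript above.
exons = [(0, 6), (18, 24), (35, 41)]

mature_mrna = splice(primary_transcript, exons)
print(mature_mrna)
```

A bacterium given the unspliced sequence would try to read straight through the lower-case stretches, which is why the mature mRNA is the better starting point for expression in E. coli.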

cDNA libraries

In our example, we will devise an approach to isolate a set of mRNA molecules from cells that are actively expressing the gene of interest, make DNA copies of the RNA and clone these DNA molecules into E. coli.

Figure 34. Constructing a cDNA library

Isolate mRNA from cells that are actively expressing the gene of interest. Most cells express hundreds of genes at the same time, but often we can take advantage of some specialized tissue that is producing much more of the protein we are interested in than do other cells or tissues. It then follows that these cells will be making increased amounts of the specific mRNA that we want. Examples include liver cells producing albumin, lymphocytes making antibodies, and our favorite: beta-cells of the pancreas producing insulin (the only cells of the body to do so). If we isolate mRNA from these special tissues, we increase our chances of being able to capture the correct mRNA in our cDNA library.

Use reverse transcriptase to make DNA copies of the RNA. Reverse transcriptase is an enzyme that is found in some viruses, where it makes DNA copies of RNA viral genes. We also can make use of another trick: most eukaryotic mRNAs have a string of adenine nucleotides at their 3' end, so we can isolate the whole mRNA on a column containing poly dT bound to beads. The poly A tail will anneal to the poly dT and be captured on the column. The column may be washed to remove other nucleic acids and protein, and then the mRNA may be eluted. Remember also that polymerases need a primer; we can use poly dT for this purpose as well. We might also save only the mRNAs that are of the approximate size that we suspect would be needed to code for our protein.

Use terminal transferase + dCTP to add a poly dC tail which will serve as a primer site for second-strand synthesis. We need a primer to start the second-strand synthesis, just as we needed one to start synthesis of the first strand. A convenient way of providing this is to use an enzyme known as terminal transferase to add cytidine residues to the 3' end of the newly synthesized DNA, using dCTP as the substrate.

If we separate the strands in alkali, the original RNA will be hydrolyzed and only our new cDNA copy will be left. We may now add an oligo dG primer and synthesize the second strand using DNA polymerase. We have now created a full-length double-stranded DNA copy of the original mRNA. We may also add restriction endonuclease sites to the ends of the molecules (see below).
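The sequence of steps described so far (oligo dT priming, first-strand synthesis, dC tailing, and oligo dG primed second-strand synthesis) can be followed on a toy sequence. This is only a string-level sketch: the message, the tail lengths, and the function names are assumptions made for illustration, and the enzymes are reduced to simple base-pairing rules.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand(mrna):
    """Reverse transcription: DNA complement of the mRNA, written 5' to 3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))

def reverse_complement(dna):
    """Second-strand synthesis: DNA complement of a DNA strand, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))

# Hypothetical mRNA with a short poly A tail at its 3' end.
mrna = "AUGGCCUUUAAAGGCUAA" + "A" * 10

cdna_first = first_strand(mrna)          # starts with the oligo dT primer region
cdna_tailed = cdna_first + "C" * 8       # terminal transferase adds dC to the 3' end
cdna_second = reverse_complement(cdna_tailed)  # starts with the oligo dG primer

print(cdna_first)
print(cdna_tailed)
print(cdna_second)
```

Note that the second strand has the same sequence as the original mRNA (with T in place of U), preceded by the dG stretch that paired with the dC tail.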

Insert the DNA copies into a vector (a plasmid or bacteriophage lambda) after creating "sticky ends" for annealing and to provide sites for restriction endonucleases. One of several methods may be used at this point. For example, we could add dC tails to each end of the cDNA using terminal transferase. Then we might cut the plasmid with PstI and add dG tails to the exposed sticky ends, again using terminal transferase. The DNA now inserts readily into the vector.

Figure 36. Joining and extending

The gaps may be filled in with DNA polymerase after annealing. Note that the PstI site is preserved in the recombinant DNA and will provide an easy way to clip out the new DNA after amplification and isolation. Another method would be to use blunt-end ligation (which occurs with low efficiency) to ligate "linker" oligonucleotides to each end of the cDNA. The linkers would contain one or more restriction sites.

Isolate many different clones that contain the vector with inserted DNA sequences to create a cDNA library. Presumably we now have a complex mixture of cDNAs inserted into vectors. The cDNAs represent sequences of all the major mRNAs that were present in the cells. We might wish to save all those colonies that contain plasmids with functional inserts (detected by antibiotic resistance tests, etc., as described before). We could even go back to this library at a later date and screen for other genes of interest that might be represented among the cDNAs.

Screen for the desired clone using any of the techniques discussed in the previous sections. Antibodies specific for the new protein or radioactive nucleic acid probes are particularly useful. We now need to screen our library more specifically for the gene of interest. We need quite a specific test in order to select the correctly transformed cell colonies. If we are looking for the insulin gene, we might test with antibodies for insulin (radioactive antibodies will bind to those colonies producing insulin). Since we know the amino acid sequence for insulin, we could synthesize a radioactive nucleic acid probe that will anneal to the cDNA and perform this test (as described previously).
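One practical point when designing a nucleic acid probe from an amino acid sequence is that most amino acids are specified by more than one codon, so a given stretch of protein could be encoded by many different DNA sequences. The sketch below counts those possibilities using the standard genetic code. The two peptides are invented examples, not taken from insulin, and the function name is our own.

```python
# Number of codons that specify each amino acid in the standard genetic code.
CODONS_PER_AMINO_ACID = {
    "M": 1, "W": 1,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "I": 3,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "L": 6, "R": 6, "S": 6,
}

def probe_degeneracy(peptide):
    """Count the distinct DNA sequences that could encode the peptide."""
    count = 1
    for amino_acid in peptide:
        count *= CODONS_PER_AMINO_ACID[amino_acid]
    return count

# Hypothetical peptides (one-letter amino acid codes).
print(probe_degeneracy("MWKDF"))  # few codon choices per residue
print(probe_degeneracy("LSR"))    # six codon choices per residue
```

A stretch rich in methionine and tryptophan therefore needs a much simpler probe mixture than one rich in leucine, serine, or arginine, which is worth keeping in mind when choosing the region of the protein on which to base the probe.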

Figure 37. A library from mRNA

Genomic Libraries

The goal of creating genomic libraries is to fragment the entire genome of an organism into a series of overlapping fragments and incorporate them all into suitable vectors. The library (or clone bank) may be stored as transformed bacteria (in the case of plasmid vectors) or infected bacteria or bacteriophages (in the case of lambda vectors).

Estimate the number of clones needed to produce a library that will include all of the genes of an organism. This number depends heavily on the size of the genome and will be much lower for the genome of a bacterium than for that of a human. The formula N = ln(1 - P) / ln(1 - a/b) estimates the number of clones needed, where P is the probability that the desired sequence will be represented in the library (usually 0.95-0.99), a is the average size of the DNA fragments, and b is the size of the genome (in the same units as a). For the human genome, we might have to screen about one half million clones with an average insert size of 20 kilobase pairs in order to find a gene present as a single copy, because the human genome has a total of 3 x 10^9 base pairs!
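The estimate is easy to evaluate directly. The following Python sketch implements the formula as given in the text; the function and parameter names are our own, and the E. coli genome size of roughly 4.6 million base pairs is an approximate figure added here for comparison, not a value from the text.

```python
import math

def clones_needed(probability, insert_size, genome_size):
    """N = ln(1 - P) / ln(1 - a/b), rounded up to a whole number of clones."""
    fraction = insert_size / genome_size
    # log1p(-x) computes ln(1 - x) accurately when x is very small.
    return math.ceil(math.log(1 - probability) / math.log1p(-fraction))

HUMAN_GENOME = 3e9      # base pairs, as stated in the text
E_COLI_GENOME = 4.6e6   # base pairs, approximate value assumed for comparison
INSERT = 20_000         # 20 kilobase pair average insert

for p in (0.95, 0.99):
    print(p, clones_needed(p, INSERT, HUMAN_GENOME),
          clones_needed(p, INSERT, E_COLI_GENOME))
```

With P = 0.95 the human figure comes out a little under half a million clones, in line with the estimate above, and raising P to 0.99 increases it by roughly half again. The bacterial library needs only on the order of a thousand clones or fewer.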

Cut genomic DNA randomly into fragments suitable for cloning. DNA may be sheared (simply by passing DNA solutions through a fine hypodermic needle), but no restriction sites will be generated. We noted that restriction endonucleases that recognize six base pairs give fragments averaging about 4096 base pairs (4^6), which is on the small side if we hope to capture complete genes. Also, we know nothing about the actual distribution of restriction sites (such as for EcoRI) within a given gene. For these reasons, the usual procedure is to use a restriction enzyme that cuts frequently (such as Sau3A, which cuts on average once every 256 bases) but to allow only a partial digestion of the DNA. Under these conditions, cutting is nearly random and restriction endonuclease sticky ends are produced. Either the time of the reaction or the enzyme concentration may be adjusted to obtain the desired size range of fragments.
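The effect of partial digestion can be explored with a small simulation. The sketch below is a simplified model under stated assumptions: the "genome" is a random sequence with equal base frequencies, each GATC site is cut independently with a fixed probability, and the function names and the cut probability of 0.0128 (chosen to give fragments near 20 kilobases) are our own.

```python
import random
import re

def expected_spacing(site_length):
    """Average distance between sites in random DNA with equal base frequencies."""
    return 4 ** site_length

def digest(dna, site, cut_probability, rng):
    """Cut at the start of each site with the given probability; return fragment lengths."""
    cuts = [m.start() for m in re.finditer(site, dna) if rng.random() < cut_probability]
    edges = [0] + cuts + [len(dna)]
    return [end - start for start, end in zip(edges, edges[1:]) if end > start]

rng = random.Random(0)  # fixed seed so the run is repeatable
genome = "".join(rng.choices("ACGT", k=1_000_000))  # hypothetical random genome

complete = digest(genome, "GATC", 1.0, rng)     # every site cut
partial = digest(genome, "GATC", 0.0128, rng)   # about 1 site in 78 cut

print(expected_spacing(6), expected_spacing(4))
print(sum(complete) / len(complete))
print(sum(partial) / len(partial))
```

In this model the complete digest gives fragments averaging close to 256 bases, while the partial digest gives fragments in the tens of kilobases. Because any of the many sites may be the one that happens to be cut, the resulting fragments overlap one another from molecule to molecule, which is what a genomic library requires.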

Figure 38. Selecting sizes of nucleic acid to clone. Genomic DNA was cleaved with increasing amounts of Sau3A giving progressively smaller fragments. The desired size may be isolated from the gel and used for cloning.

DNA of suitable size for cloning may be purified by gel electrophoresis. Bacteriophage lambda or cosmids usually are the vectors of choice for genomic libraries because fragments of 20 kb and above can be inserted and cloned.

The vector is opened with a suitable restriction endonuclease, mixed with the DNA fragments, annealed and ligated. If Sau3A was used in the fragmentation of DNA, the lambda vector may be opened with BamHI to produce the correct cohesive ends. If lambda is used, the recombinant molecules may be incorporated into particles in vitro and the library stored in this way.
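The reason BamHI can be paired with Sau3A is that both enzymes leave the same single-stranded overhang. The sketch below checks this from the recognition sequences and cut positions, which are standard values not given in the text (Sau3A cuts before GATC, BamHI cuts between the two Gs of GGATCC, and EcoRI cuts after the G of GAATTC). The table layout and function names are our own, and the simple slicing rule applies only to palindromic sites that leave 5' overhangs.

```python
# Recognition site and position of the cut on the top strand (0 = before first base).
ENZYMES = {
    "Sau3A": ("GATC", 0),
    "BamHI": ("GGATCC", 1),
    "EcoRI": ("GAATTC", 1),
}

def five_prime_overhang(enzyme):
    """Single-stranded 5' overhang left by a palindromic site cut symmetrically."""
    site, offset = ENZYMES[enzyme]
    return site[offset:len(site) - offset]

def compatible(enzyme_a, enzyme_b):
    """Ends can anneal to each other if the overhangs are identical."""
    return five_prime_overhang(enzyme_a) == five_prime_overhang(enzyme_b)

for name in ENZYMES:
    print(name, five_prime_overhang(name))

print(compatible("Sau3A", "BamHI"))
print(compatible("Sau3A", "EcoRI"))
```

Sau3A fragments and a BamHI-cut vector both carry GATC overhangs and so anneal to each other, whereas an EcoRI-cut vector, with its AATT overhang, would not accept them.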

Bacteriophage lambda libraries may be amplified by one passage through E. coli, stored and used to screen for many different genes. Indeed, libraries based on lambda are available commercially and represent an easy starting point for screening for human genes.

Specific screening techniques (such as antibodies or nucleic acid probes) are used to select transformed clones of interest.