2.2 Proteins and amino acids

Proteins are made up of long sequences of amino acids. There are 20 common amino acids used for synthesis of the many thousands of proteins found in living organisms. The properties of a protein depend upon the exact sequence of its amino acids, which in turn is determined by the genetic code discussed below. A major goal of genetic engineering is to make useful quantities of peptides and proteins, ranging from peptides as small as a dozen amino acids up to proteins containing hundreds of amino acids. The proteins we want to make (perhaps using bacteria to produce them) might be human hormones, antibodies, and other rare proteins that are difficult or impossible to obtain by other means.

Figure 4. Structure of the common amino acids

Many proteins found in cells are enzymes that catalyze metabolic reactions. These proteins must have a three-dimensional structure that encourages binding of substrates and one or more active sites where the reactions take place. Peptide hormones must have a precise structure in order to bind to their receptors on cells.

The exact amino acid sequence and the precise folding of a protein are essential for fulfilling its biological role. Proteins fold into characteristic structures determined by (guess what) hydrogen bonding, usually into helical form (α-helix) or sheet-like structures (β-sheets). A single protein often has several of each of these folding patterns. Also, during protein folding, the amino acids with aliphatic and aromatic side chains tend to locate in the interior of the protein, away from water, with charged amino acids at the surface.
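The interior/surface tendency just described can be written down as a rough rule of thumb. The Python sketch below uses only the side-chain groups named in this section; the function name likely_location is made up for illustration, and real proteins have many exceptions, so treat the result as a tendency rather than a prediction.

```python
# Side-chain groups as named in this section (one-letter codes).
ALIPHATIC = set("AVLI")  # alanine, valine, leucine, isoleucine
AROMATIC = set("FYW")    # phenylalanine, tyrosine, tryptophan
CHARGED = set("HKRDE")   # histidine, lysine, arginine, aspartic acid, glutamic acid


def likely_location(residue):
    """Rule-of-thumb location for one residue (hypothetical helper, not a real predictor)."""
    if residue in ALIPHATIC or residue in AROMATIC:
        return "interior"
    if residue in CHARGED:
        return "surface"
    # Amino acids not assigned to a group in this section are left unclassified.
    return "unclassified"


# Example: leucine, lysine, phenylalanine, aspartic acid, glycine.
for residue in "LKFDG":
    print(residue, likely_location(residue))
```

Amino acids that the text does not place in one of these groups (glycine, for example) are deliberately left unclassified rather than guessed at.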

Amino acids have trivial (non-systematic) names, often taken from the source in which they were first found (asparagine was isolated from asparagus).

The usual structure is a carboxyl group and an amino group covalently bound to the same carbon atom (the α-carbon):

H2N-CH(R)-COOH

Various side chains (the R-group; see table) are attached to the α-carbon. Several amino acids have only carbon and hydrogen in the R-group (alanine, valine, leucine, isoleucine), several have a positive charge at neutral pH (lysine, arginine, and, only partly, histidine), several are acidic at neutral pH (aspartic acid, glutamic acid), and several have aromatic rings (phenylalanine, tyrosine, tryptophan). The amino acid cysteine (R = HS-CH2-) is of particular interest because disulfide bonds between two cysteine residues often cross-link two peptide chains or two parts of a single peptide (to give R-CH2-S-S-CH2-R'). The amino acids are often represented by three-letter abbreviations (Ala, Val, Leu, Ile, etc.) and even one-letter abbreviations (A = alanine, R = arginine, D = aspartic acid, etc.) to save space when writing out sequences.
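To show how the abbreviations work in practice, here is a minimal Python sketch that rewrites a sequence given in one-letter codes using three-letter codes. The table holds the standard codes for the 20 common amino acids; the function name to_three_letter and the example peptide are made up for illustration.

```python
# Standard one-letter -> three-letter codes for the 20 common amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(sequence):
    """Convert a one-letter sequence to hyphen-joined three-letter codes (hypothetical helper)."""
    # A letter outside the 20 common codes raises KeyError rather than being guessed.
    return "-".join(ONE_TO_THREE[residue] for residue in sequence.upper())


print(to_three_letter("ARDC"))  # Ala-Arg-Asp-Cys
```

The one-letter form is what you will usually see for long sequences, since a protein of several hundred amino acids would otherwise fill a page.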


Figure 5. Proteins are made up of combinations of helical and sheet structures.

During synthesis of proteins, amino acids are added one at a time according to the genetic code using complex structures called ribosomes. Each amino acid is joined to the preceding one by splitting out a molecule of water between the carboxyl group of one amino acid and the amino group of the next amino acid. The resulting bond between the two is known as the peptide bond:

-COOH + H2N- → -CO-NH- + H2O

It follows that the first amino acid of a protein will usually have a free amino group (the amino end) and the last amino acid will usually have a free carboxyl group (the carboxy end). I say "usually" because the ends of proteins are sometimes modified after synthesis.
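The splitting out of water at each peptide bond can be checked with simple atom bookkeeping: a linear peptide of n amino acids has n - 1 peptide bonds, so its formula is the sum of the free amino acid formulas minus n - 1 molecules of water. The Python sketch below does this for a handful of amino acids; the function name peptide_formula is made up for illustration, and the table covers only the few amino acids listed.

```python
from collections import Counter

# Molecular formulas (atom counts) of a few free amino acids.
FREE_AMINO_ACIDS = {
    "Gly": Counter({"C": 2, "H": 5, "N": 1, "O": 2}),
    "Ala": Counter({"C": 3, "H": 7, "N": 1, "O": 2}),
    "Ser": Counter({"C": 3, "H": 7, "N": 1, "O": 3}),
    "Cys": Counter({"C": 3, "H": 7, "N": 1, "O": 2, "S": 1}),
}
WATER = Counter({"H": 2, "O": 1})


def peptide_formula(residues):
    """Atom counts of a linear, unmodified peptide (hypothetical helper)."""
    total = Counter()
    for name in residues:
        total.update(FREE_AMINO_ACIDS[name])  # add the free amino acid
    for _ in range(len(residues) - 1):
        total.subtract(WATER)                 # one water lost per peptide bond
    return dict(total)


# Glycine joined to alanine: C5H10N2O3.
print(peptide_formula(["Gly", "Ala"]))
```

The sketch assumes a single unbranched chain with unmodified ends and no disulfide bonds, each of which would change the count.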