Borrowed from Tisdall's Beginning Perl for Bioinformatics, pp. 30-31:
For expediency, the names of the nucleic acids and the amino acids are often represented as one- or three-letter codes, as shown in Table 4-1 and Table 4-2. (This book mostly uses the one-letter codes for amino acids.)

Code | Nucleic Acid(s) |
---|---|
A | Adenine |
C | Cytosine |
G | Guanine |
T | Thymine |
U | Uracil |
M | A or C (amino) |
R | A or G (purine) |
W | A or T (weak) |
S | C or G (strong) |
Y | C or T (pyrimidine) |
K | G or T (keto) |
V | A or C or G |
H | A or C or T |
D | A or G or T |
B | C or G or T |
N | A or G or C or T (any) |

One-letter code | Amino acid | Three-letter code |
---|---|---|
A | Alanine | Ala |
B | Aspartic acid or Asparagine | Asx |
C | Cysteine | Cys |
D | Aspartic acid | Asp |
E | Glutamic acid | Glu |
F | Phenylalanine | Phe |
G | Glycine | Gly |
H | Histidine | His |
I | Isoleucine | Ile |
K | Lysine | Lys |
L | Leucine | Leu |
M | Methionine | Met |
N | Asparagine | Asn |
P | Proline | Pro |
Q | Glutamine | Gln |
R | Arginine | Arg |
S | Serine | Ser |
T | Threonine | Thr |
V | Valine | Val |
W | Tryptophan | Trp |
X | Unknown | Xxx |
Y | Tyrosine | Tyr |
Z | Glutamic acid or Glutamine | Glx |

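
As a small illustration of how the amino acid codes in Table 4-2 might be used in a program, the sketch below stores the one-letter and three-letter codes in a Python dictionary and converts a short protein string. The names `ONE_TO_THREE` and `to_three_letter` are hypothetical, chosen for this example; they do not come from the book, which works in Perl.

```python
# One-letter to three-letter amino acid codes, copied from Table 4-2.
ONE_TO_THREE = {
    "A": "Ala", "B": "Asx", "C": "Cys", "D": "Asp", "E": "Glu",
    "F": "Phe", "G": "Gly", "H": "His", "I": "Ile", "K": "Lys",
    "L": "Leu", "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln",
    "R": "Arg", "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp",
    "X": "Xxx", "Y": "Tyr", "Z": "Glx",
}


def to_three_letter(protein):
    """Convert a one-letter protein string to hyphen-separated three-letter codes.

    Lowercase input is accepted; a letter that is not in Table 4-2
    raises KeyError (hypothetical helper, for illustration only).
    """
    return "-".join(ONE_TO_THREE[residue] for residue in protein.upper())


print(to_three_letter("MKV"))  # Met-Lys-Val
```

Raising an error on an unlisted letter is one possible design choice; a program could equally map such letters to `Xxx`.
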
The nucleic acid codes in Table 4-1 include letters for the four basic nucleic acids; they also define single letters for all possible groups of two, three, or four nucleic acids. In most cases in this book, I use only A, C, G, T, U, and N. The letters A, C, G, and T represent the nucleic acids for DNA. U replaces T when DNA is transcribed into ribonucleic acid (RNA). N is the common representation for "unknown," as when a sequencer can't determine a base with certainty. Note that the lowercase versions of these single-letter codes are also used on occasion, frequently for DNA, rarely for protein.
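
The group codes and the T-to-U substitution described above can be sketched in a few lines of Python. This is a minimal example under stated assumptions, not code from the book: the dictionary `IUPAC_DNA` and the functions `iupac_to_regex` and `transcribe` are hypothetical names, and the dictionary covers only the DNA codes of Table 4-1 (U is left out).

```python
import re

# Each DNA code from Table 4-1 mapped to the bases it can stand for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "N": "ACGT",
}


def iupac_to_regex(pattern):
    """Build a regular expression matching any DNA string that fits the pattern."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC_DNA[code]
        # A single base matches itself; a group becomes a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def transcribe(dna):
    """Return the RNA form of a DNA string by replacing T with U, keeping case."""
    return dna.replace("T", "U").replace("t", "u")


print(iupac_to_regex("GAYN"))                               # GA[CT][ACGT]
print(bool(re.fullmatch(iupac_to_regex("GAYN"), "GATC")))   # True
print(transcribe("ACGTacgt"))                               # ACGUacgu
```

The generated expression is uppercase only; to match lowercase sequences as well, the `re.IGNORECASE` flag can be passed when matching.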