IUB_IUPAC Acid Codes

Standard IUB/IUPAC Amino and Nucleic Acid Codes

Borrowed from Tisdall's Beginning Perl for Bioinformatics, pp. 30-31:

For expediency, the names of the nucleic acids and the amino acids are often represented as one- or three-letter codes, as shown in Table 4-1 and Table 4-2. (This book mostly uses the one-letter codes for amino acids.)

Table 4-1. Standard IUB/IUPAC nucleic acid codes
Code	Nucleic Acid(s)
A	Adenine
C	Cytosine
G	Guanine
T	Thymine
U	Uracil
M	A or C (amino)
R	A or G (purine)
W	A or T (weak)
S	C or G (strong)
Y	C or T (pyrimidine)
K	G or T (keto)
V	A or C or G
H	A or C or T
D	A or G or T
B	C or G or T
N	A or G or C or T (any)

Table 4-2. Standard IUB/IUPAC amino acid codes
One-letter code	Amino acid	Three-letter code
A	Alanine	Ala
B	Aspartic acid or Asparagine	Asx
C	Cysteine	Cys
D	Aspartic acid	Asp
E	Glutamic acid	Glu
F	Phenylalanine	Phe
G	Glycine	Gly
H	Histidine	His
I	Isoleucine	Ile
K	Lysine	Lys
L	Leucine	Leu
M	Methionine	Met
N	Asparagine	Asn
P	Proline	Pro
Q	Glutamine	Gln
R	Arginine	Arg
S	Serine	Ser
T	Threonine	Thr
V	Valine	Val
W	Tryptophan	Trp
X	Unknown	Xxx
Y	Tyrosine	Tyr
Z	Glutamic acid or Glutamine	Glx
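
As a rough illustration of how Table 4-2 might be used in a program, the sketch below stores the one-letter to three-letter mapping in a Python dictionary and converts a short peptide string. The names ONE_TO_THREE and to_three_letter are hypothetical, chosen for this example only; they do not come from the book, whose own examples are in Perl.

```python
# One-letter to three-letter amino acid codes, copied from Table 4-2.
ONE_TO_THREE = {
    "A": "Ala", "B": "Asx", "C": "Cys", "D": "Asp", "E": "Glu",
    "F": "Phe", "G": "Gly", "H": "His", "I": "Ile", "K": "Lys",
    "L": "Leu", "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln",
    "R": "Arg", "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp",
    "X": "Xxx", "Y": "Tyr", "Z": "Glx",
}


def to_three_letter(peptide, sep="-"):
    """Convert a one-letter peptide string to three-letter codes.

    Hypothetical helper for illustration. Lowercase input is folded to
    uppercase; a character that is not in Table 4-2 raises KeyError.
    """
    return sep.join(ONE_TO_THREE[residue] for residue in peptide.upper())


if __name__ == "__main__":
    print(to_three_letter("MKV"))
```

Raising KeyError on an unlisted character is one possible design choice; a caller that prefers to tolerate stray characters could map them to "Xxx" instead.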

The nucleic acid codes in Table 4-1 include letters for the four basic nucleic acids; they also define single letters for all possible groups of two, three, or four nucleic acids. In most cases in this book, I use only A, C, G, T, U, and N. The letters A, C, G, and T represent the nucleic acids for DNA. U replaces T when DNA is transcribed into ribonucleic acid (RNA). N is the common representation for "unknown," as when a sequencer can't determine a base with certainty. Note that the lowercase versions of these single-letter codes are also used on occasion, frequently for DNA, rarely for protein.
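
The groupings in Table 4-1 can be sketched as a lookup from each code to the set of bases it stands for. The Python example below is a minimal sketch under stated assumptions: it expands a pattern written with ambiguity codes into a regular expression, and models transcription as a plain T-to-U substitution, as described above. IUPAC_BASES, iupac_to_regex, and transcribe are hypothetical names introduced for this example, and lowercase input is folded to uppercase as one possible way of handling mixed case.

```python
import re

# Each code from Table 4-1 mapped to the bases it can stand for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def iupac_to_regex(pattern):
    """Expand ambiguity codes into a regex string (hypothetical helper).

    Single-base codes are kept as literals; multi-base codes become
    character classes. Unknown characters raise KeyError.
    """
    parts = []
    for code in pattern.upper():
        bases = IUPAC_BASES[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def transcribe(dna):
    """Return the RNA form of a DNA string by replacing T with U."""
    return dna.upper().replace("T", "U")


if __name__ == "__main__":
    regex = iupac_to_regex("GAYN")
    print(regex)
    print(bool(re.fullmatch(regex, "GACT")))
    print(transcribe("gattaca"))
```

Because the regex is built from uppercase bases only, a sequence should be uppercased (or matched with re.IGNORECASE) before searching.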