Parsing the FEATURES Table
The full definition of the syntax of the FEATURES table can be
found at http://www.ncbi.nlm.nih.gov/collab/FT/index.html.
For an overview, you might first read the NCBI gbrel.txt
document mentioned earlier.
Features
The GenBank entries in the book are very simple and include only
three features. The FEATURES table supports many more feature types, however;
they are listed here (borrowed from Tisdall's book):
- allele: Obsolete; see the variation feature key
- attenuator: Sequence related to transcription termination
- C_region: Span of the C immunological feature
- CAAT_signal: CAAT box in eukaryotic promoters
- CDS: Sequence coding for amino acids in protein (includes stop codon)
- conflict: Independent sequence determinations differ
- D-loop: Displacement loop
- D_segment: Span of the D immunological feature
- enhancer: Cis-acting enhancer of promoter function
- exon: Region that codes for part of spliced mRNA
- gene: Region that defines a functional gene, possibly including upstream (promoter, enhancer, etc.) and downstream control elements, and for which a name has been assigned
- GC_signal: GC box in eukaryotic promoters
- iDNA: Intervening DNA eliminated by recombination
- intron: Transcribed region excised by mRNA splicing
- J_region: Span of the J immunological feature
- LTR: Long terminal repeat
- mat_peptide: Mature peptide coding region (doesn't include stop codon)
- misc_binding: Miscellaneous binding site
- misc_difference: Miscellaneous difference feature
- misc_feature: Region of biological significance that can't be described by any other feature
- misc_recomb: Miscellaneous recombination feature
- misc_RNA: Miscellaneous transcript feature not defined by other RNA keys
- misc_signal: Miscellaneous signal
- misc_structure: Miscellaneous DNA or RNA structure
- modified_base: The indicated base is a modified nucleotide
- mRNA: Messenger RNA
- mutation: Obsolete; see the variation feature key
- N_region: Span of the N immunological feature
- old_sequence: Presented sequence revises a previous version
- polyA_signal: Signal for cleavage and polyadenylation
- polyA_site: Site at which polyadenine is added to mRNA
- precursor_RNA: Any RNA species that isn't yet the mature RNA product
- prim_transcript: Primary (unprocessed) transcript
- primer: Primer binding region used with PCR
- primer_bind: Noncovalent primer binding site
- promoter: A region involved in transcription initiation
- protein_bind: Noncovalent protein binding site on DNA or RNA
- RBS: Ribosome binding site
- rep_origin: Replication origin for duplex DNA
- repeat_region: Sequence containing repeated subsequences
- repeat_unit: One repeated unit of a repeat_region
- rRNA: Ribosomal RNA
- S_region: Span of the S immunological feature
- satellite: Satellite repeated sequence
- scRNA: Small cytoplasmic RNA
- sig_peptide: Signal peptide coding region
- snRNA: Small nuclear RNA
- source: Biological source of the sequence data represented by a GenBank record; mandatory feature, one or more per record; for organisms that have been incorporated within the NCBI taxonomy database, an associated /db_xref="taxon:NNNN" qualifier will be present (where NNNN is the numeric identifier assigned to the organism within the NCBI taxonomy database)
- stem_loop: Hairpin loop structure in DNA or RNA
- STS: Sequence Tagged Site; operationally unique sequence that identifies the combination of primer spans used in a PCR assay
- TATA_signal: TATA box in eukaryotic promoters
- terminator: Sequence causing transcription termination
- transit_peptide: Transit peptide coding region
- transposon: Transposable element (TN)
- tRNA: Transfer RNA
- unsure: Authors are unsure about the sequence in this region
- V_region: Span of the V immunological feature
- variation: A related population contains stable mutations
- - (hyphen): Placeholder
- -10_signal: Pribnow box in prokaryotic promoters
- -35_signal: -35 box in prokaryotic promoters
- 3'clip: 3'-most region of a precursor transcript removed in processing
- 3'UTR: 3' untranslated region (trailer)
- 5'clip: 5'-most region of a precursor transcript removed in processing
- 5'UTR: 5' untranslated region (leader)
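Because the list above is a closed vocabulary, a parser can check each feature key it reads against it. The following Python sketch stores the keys from the list in a set and flags the two that the list marks as obsolete (allele and mutation, both superseded by variation). The names FEATURE_KEYS, OBSOLETE, and check_key are hypothetical, the comparison is deliberately case-sensitive, and the set is only as current as the list it was copied from; the official feature table definition remains the authority.

```python
# Feature keys copied from the list above (65 keys, including the "-" placeholder).
FEATURE_KEYS = frozenset("""
allele attenuator C_region CAAT_signal CDS conflict D-loop D_segment
enhancer exon gene GC_signal iDNA intron J_region LTR mat_peptide
misc_binding misc_difference misc_feature misc_recomb misc_RNA
misc_signal misc_structure modified_base mRNA mutation N_region
old_sequence polyA_signal polyA_site precursor_RNA prim_transcript
primer primer_bind promoter protein_bind RBS rep_origin repeat_region
repeat_unit rRNA S_region satellite scRNA sig_peptide snRNA source
stem_loop STS TATA_signal terminator transit_peptide transposon tRNA
unsure V_region variation - -10_signal -35_signal 3'clip 3'UTR
5'clip 5'UTR
""".split())

# Keys the list marks as obsolete, mapped to the key that replaces them.
OBSOLETE = {"allele": "variation", "mutation": "variation"}


def check_key(key: str) -> str:
    """Classify a feature key as 'ok', 'obsolete', or 'unknown' (case-sensitive)."""
    if key not in FEATURE_KEYS:
        return "unknown"
    if key in OBSOLETE:
        return "obsolete"
    return "ok"
```

For example, check_key("allele") returns "obsolete", and OBSOLETE["allele"] names variation as the key to use instead.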
Each of these features can also have its own "sub-feature" keys,
called qualifiers (for example, /gene or /db_xref).
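In the feature table definition, these sub-feature keys are called qualifiers. They are written as /name=value, with free-text values in double quotes, and some (such as /pseudo) take no value at all. The Python sketch below splits one qualifier string into a name/value pair and pulls the NCBI taxonomy identifier out of the source feature's /db_xref="taxon:NNNN" qualifier described in the list above. The function names parse_qualifier and taxon_id are hypothetical. The sketch assumes the caller has already joined any wrapped qualifier lines into one string, and it treats a doubled quote ("") inside a quoted value as a single literal quote.

```python
from typing import Iterable, Optional, Tuple


def parse_qualifier(text: str) -> Tuple[str, Optional[str]]:
    """Split '/name=value' into (name, value); value is None for flags like /pseudo."""
    if not text.startswith("/"):
        raise ValueError(f"not a qualifier: {text!r}")
    name, sep, value = text[1:].partition("=")
    if not sep:
        return name, None  # qualifier with no value
    if len(value) >= 2 and value[0] == value[-1] == '"':
        # Strip the enclosing quotes; a doubled quote inside stands for one quote.
        value = value[1:-1].replace('""', '"')
    return name, value


def taxon_id(qualifiers: Iterable[str]) -> Optional[int]:
    """Return N from a /db_xref="taxon:N" qualifier, or None if there is none."""
    for q in qualifiers:
        name, value = parse_qualifier(q)
        if name == "db_xref" and value is not None and value.startswith("taxon:"):
            return int(value[len("taxon:"):])
    return None
```

Keeping the qualifier's value as a string, rather than converting it eagerly, leaves interpretation to code that knows what each qualifier means; taxon_id is one such consumer.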