GenBank

From MetaBase

Text from the NAR 2007 Database Issue (Database Summaries) by permission of Oxford University Press.

GenBank® is NAR Database No. 3.

Database Description

GenBank® is a comprehensive sequence database that contains publicly available DNA sequences for more than 170,000 different organisms, obtained primarily through the submission of sequence data from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the BankIt (Web) or Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive worldwide coverage. GenBank data is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical literature via PubMed. Sequence similarity searching is provided by the BLAST family of programs. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. NCBI also offers a wide range of World Wide Web retrieval and analysis services based on GenBank data. The GenBank database and related resources are freely accessible via the NCBI home page at http://www.ncbi.nlm.nih.gov.
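As a rough illustration of retrieval through Entrez, the sketch below builds a request URL for a single record. It assumes NCBI's E-utilities EFetch endpoint and its `db`, `id`, `rettype` and `retmode` parameters, none of which are named in the text above; the helper name `build_efetch_url` is hypothetical, and the accession is only an example. The code constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Assumption: base URL of NCBI's E-utilities EFetch service (not given in the text above).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession: str) -> str:
    """Return a URL asking for one nucleotide record in GenBank flat-file format."""
    params = {
        "db": "nuccore",     # nucleotide database
        "id": accession,     # accession number of the record
        "rettype": "gb",     # GenBank flat-file format
        "retmode": "text",   # plain text rather than XML
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


# Example accession, used here only for illustration.
url = build_efetch_url("U49845")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client.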

Authors

  • Missing

Divisions

Information mostly taken from PMID 18073190

The GenBank database is divided into 18 different divisions. Traditionally, divisions roughly corresponded to taxonomic groups, for example bacteria (BCT), viruses (VRL), primates (PRI) and rodents (ROD). In recent years, however, divisions have been added to support specific sequencing strategies. These include:

WGS - whole genome shotgun sequences 
Contains contigs from ongoing Whole Genome Shotgun (WGS) sequencing projects. These records can contain annotations, and an entire project is updated as sequencing progresses. See http://www.ncbi.nlm.nih.gov/Genbank/wgs.html
EST - expressed sequence tags 
What, why, how? GenBank EST data are processed into the companion databases dbEST and UniGene. See http://www.ncbi.nlm.nih.gov/dbEST/
STS - sequence tagged sites
What, why, how? See http://www.ncbi.nlm.nih.gov/dbSTS/
GSS - genome survey sequences 
Sequences in this division are the products of as many as 80 different experimental techniques, including a large number of BAC end sequences. See http://www.ncbi.nlm.nih.gov/dbGSS/
ENV - environmental sample sequences 
This division accommodates non-WGS sequences obtained via environmental sampling methods.
HTG - high throughput genomic sequences 
This division was created to accommodate a growing need to make unfinished genomic sequence data rapidly available to the scientific community. It contains unfinished DNA sequences generated by the high-throughput sequencing centres. These records are designated as Phase 0–3 depending on the quality of the data. Upon reaching Phase 3, the finished state, HTG records are moved into the appropriate organism division of GenBank. See http://www.ncbi.nlm.nih.gov/HTGS/
HTC - high-throughput cDNA sequences 
This division accommodates high-throughput cDNA sequences. HTCs are of draft quality but may contain 5' UTRs and 3' UTRs, partial coding regions and introns. HTC sequences that are finished and of high quality are moved to the appropriate organism division of GenBank.
TPA - Third Party Annotation 
TPA records support the reporting of published sequence annotation by a scientist other than the original submitter of the primary sequence.
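
The division codes listed above can be kept in a small lookup table. The sketch below is a minimal illustration and not an NCBI tool. It assumes that a flat-file record's LOCUS line carries the division code as the token just before the date, which the text above does not state; the sample LOCUS line and the helper names `division_from_locus` and `htg_destination` are hypothetical. The second helper encodes the rule described for HTG records: only Phase 3 records move to an organism division.

```python
# Division codes and names as given in the text above (12 of the 18 divisions).
DIVISIONS = {
    "BCT": "bacteria",
    "VRL": "viruses",
    "PRI": "primates",
    "ROD": "rodents",
    "WGS": "whole genome shotgun sequences",
    "EST": "expressed sequence tags",
    "STS": "sequence tagged sites",
    "GSS": "genome survey sequences",
    "ENV": "environmental sample sequences",
    "HTG": "high throughput genomic sequences",
    "HTC": "high-throughput cDNA sequences",
    "TPA": "Third Party Annotation",
}


def division_from_locus(locus_line: str) -> str:
    """Return the division code from a LOCUS line.

    Assumption: the code is the whitespace-separated token before the date.
    """
    tokens = locus_line.split()
    if len(tokens) < 3 or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    return tokens[-2]


def htg_destination(phase: int, organism_division: str) -> str:
    """Return the division an HTG record belongs in, given its phase (0-3)."""
    if phase not in (0, 1, 2, 3):
        raise ValueError("HTG phase must be 0, 1, 2 or 3")
    # Phase 3 is the finished state; earlier phases stay in HTG.
    return organism_division if phase == 3 else "HTG"


# Hypothetical LOCUS line, for illustration only.
sample = "LOCUS       XX000001    1200 bp    DNA     linear   BCT 01-JAN-2000"
code = division_from_locus(sample)
print(code, "-", DIVISIONS.get(code, "not listed above"))
print(htg_destination(1, "PRI"), htg_destination(3, "PRI"))
```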

See also: GenBank in Wikipedia.

