GenBank
From MetaBase
- Galperin MY. The Molecular Biology Database Collection: 2007 update. Nucl. Acids Res. 2007. 35: D3-D4; doi:10.1093/nar/gkl1008. http://nar.oxfordjournals.org/cgi/content/abstract/35/suppl_1/D3
GenBank® is NAR Database No. 3.
Database Description
GenBank® is a comprehensive sequence database that contains publicly available DNA sequences for more than 170,000 different organisms, obtained primarily through the submission of sequence data from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the BankIt (Web) or Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive worldwide coverage. GenBank data is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical literature via PubMed. Sequence similarity searching is provided by the BLAST family of programs. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. NCBI also offers a wide range of World Wide Web retrieval and analysis services based on GenBank data. The GenBank database and related resources are freely accessible via the NCBI home page at http://www.ncbi.nlm.nih.gov.
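As a rough illustration of programmatic retrieval through Entrez, the sketch below builds a request URL for a single record in GenBank flat-file format. It assumes NCBI's E-utilities `efetch` endpoint and the `nuccore` database name, neither of which is mentioned in the description above. The function name `genbank_record_url` and the accession `U49845` are used purely as examples. The code only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumption: NCBI's E-utilities efetch endpoint (not named in this article).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_record_url(accession: str) -> str:
    """Build a URL asking Entrez for one record in GenBank flat-file format."""
    params = {
        "db": "nuccore",    # Entrez nucleotide database
        "id": accession,    # accession number of the record
        "rettype": "gb",    # GenBank flat-file format
        "retmode": "text",  # plain text rather than XML
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


# Example accession, chosen only for illustration.
print(genbank_record_url("U49845"))
```

The resulting URL can be opened in a browser or passed to any HTTP client. Bulk downloads are probably better served by the FTP releases described above.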
Authors
- Missing
External Links
- GenBank® homepage
- GenBank® abstract in the NAR 2006 Database Issue
Contact Email
Divisions
Information mostly taken from PMID 18073190
The GenBank database is divided into 18 different divisions. Traditionally, divisions roughly corresponded to taxonomic groups, for example bacteria (BCT), viruses (VRL), primates (PRI) and rodents (ROD). In recent years, however, divisions have been added to support specific sequencing strategies. These include:
- WGS - whole genome shotgun sequences
- Contains contigs from ongoing Whole Genome Shotgun (WGS) sequencing projects. These records can contain annotations, and an entire project is updated as sequencing progresses. See http://www.ncbi.nlm.nih.gov/Genbank/wgs.html
- EST - expressed sequence tags
- GenBank EST data are processed into the companion databases dbEST and UniGene. See http://www.ncbi.nlm.nih.gov/dbEST/
- STS - sequence tagged sites
- See http://www.ncbi.nlm.nih.gov/dbSTS/
- GSS - genome survey sequences
- Sequences in this division are the products of as many as 80 different experimental techniques, including a large number of BAC end sequences. See http://www.ncbi.nlm.nih.gov/dbGSS/
- ENV - environmental sample sequences
- This division accommodates non-WGS sequences obtained via environmental sampling methods.
- HTG - high throughput genomic sequences
- This division was created to accommodate a growing need to make unfinished genomic sequence data rapidly available to the scientific community. It contains unfinished DNA sequences generated by the high-throughput sequencing centres. These records are designated as Phase 0–3 depending on the quality of the data. Upon reaching Phase 3, the finished state, HTG records are moved into the appropriate organism division of GenBank. See http://www.ncbi.nlm.nih.gov/HTGS/
- HTC - high-throughput cDNA sequences
- This division accommodates high-throughput cDNA sequences. HTCs are of draft quality but may contain 5' UTRs and 3' UTRs, partial coding regions and introns. HTC sequences that are finished and of high quality are moved to the appropriate organism division of GenBank.
- TPA - Third Party Annotation
- TPA records support the reporting of published sequence annotation by a scientist other than the original submitter of the primary sequence.
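The division codes above lend themselves to a simple lookup table. The following is a minimal sketch, covering only the codes named in this article rather than all 18 divisions. The names `DIVISIONS`, `describe_division` and `htg_is_finished` are hypothetical helpers introduced for illustration. The HTG check encodes only the rule stated above: Phase 3 is the finished state, at which point a record moves to its organism division.

```python
# Division codes as listed in this article (a subset of the 18 divisions).
DIVISIONS = {
    "BCT": "bacteria",
    "VRL": "viruses",
    "PRI": "primates",
    "ROD": "rodents",
    "WGS": "whole genome shotgun sequences",
    "EST": "expressed sequence tags",
    "STS": "sequence tagged sites",
    "GSS": "genome survey sequences",
    "ENV": "environmental sample sequences",
    "HTG": "high throughput genomic sequences",
    "HTC": "high-throughput cDNA sequences",
    "TPA": "Third Party Annotation",
}


def describe_division(code: str) -> str:
    """Return the description for a division code, ignoring case."""
    return DIVISIONS.get(code.upper(), "unknown division")


def htg_is_finished(phase: int) -> bool:
    """Return True when an HTG record has reached Phase 3 (finished)."""
    if phase not in range(4):
        raise ValueError("HTG phase must be 0, 1, 2 or 3")
    return phase == 3


print(describe_division("est"))
print(htg_is_finished(3))
```

A plain dictionary keeps the mapping easy to extend with the remaining division codes.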
See also
- GenBank in Wikipedia
