1. Online Updated Molecular Biology Database Collection 2016
1.1. 1. Nucleic Acid Sequence and Structure
1.1.1. CEGA
1.1.1.1. Noncoding sequences highly conserved between groups of vertebrates (potential promoters, enhancers and other regulatory elements)
1.1.2. JuncDB
1.1.2.1. exon-exon junction sequences
1.1.3. dbSUPER, SEA
1.1.3.1. sequences of super-enhancers
1.1.4. Dfam
1.1.4.1. human DNA repeat families
1.1.5. ARESite
1.1.5.1. AU-rich elements in vertebrate UTRs
1.1.6. NPIDB
1.1.6.1. Nucleic acid-Protein Interaction DataBase; proposes a classification of DNA-protein complexes
1.1.7. JASPAR, HOCOMOCO, ORegAnno, RegulonDB
1.1.7.1. Transcriptional regulation
1.1.8. BIGNASim
1.1.8.1. DNA dynamics: molecular dynamics simulations
1.2. 2. Protein Sequences and Structure
1.2.1. ELM, NBDB, UET
1.2.1.1. predicted protein functional sites
1.2.2. sORFs.org
1.2.2.1. proteomics database: ribosome profiling
1.2.3. PRIDE, dbPTM
1.2.3.1. updated databases of proteomic peptide identifications and post-translational modifications
1.2.4. PDBe
1.2.4.1. significantly improves the added value and accessibility of protein structure data
1.3. 3. Metabolic and Signaling Pathways
1.3.1. KEGG, MetaCyc, Reactome, WikiPathways, ECMDB, BiGG Models & MNXref/MetaNetX
1.3.1.1. Metabolomics data: metabolite standards, protocols, tutorials & analysis tools
1.4. 4. Viruses, Bacteria, Protozoa and Fungi
1.4.1. MG-RAST, EBI Metagenomics & probeBASE, Human Pan-Microbe Communities
1.4.1.1. metagenomics resources
1.4.2. BacWGSTdb
1.4.2.1. identifies bacterial strains in samples isolated from infections
1.4.3. Ensembl Genomes, Bacterial Diversity (BacDive)
1.4.3.1. organism and genome diversity
1.5. 5. Genomes of Human and Model Organisms
1.5.1. DMDD
1.5.1.1. Collects phenotypic data of mouse mutant embryos.
1.5.2. dbMAE
1.5.2.1. allele-specific expression of autosomal genes, where the transcriptional activity of the two alleles is epigenetically controlled
1.6. 6. Human Diseases and Drugs
1.6.1. ClinVar, GWASdb, HaploReg
1.6.1.1. human genetic variation
1.6.1.2. resources on patented drugs, potential drug targets
1.6.1.3. side effects, withdrawn drugs
1.7. 7. Plants
1.7.1. IC4R
1.7.1.1. all aspects of rice research
1.8. 8. Others (Mitochondrial and Chemical Compounds)
1.8.1. MitoCarta, MitoMiner
1.8.1.1. mitochondrial proteins
1.8.2. MitoAge
1.8.2.1. mitochondrial DNA properties from various organisms
2. Numbers available
2.1. total: 1685 databases
2.1.1. 88 new resources
2.1.2. 23 obsolete websites removed
3. Why do we need to group these databases?
3.1. the large number of databases makes it difficult for bioinformatics researchers to keep track of the databases within their own specific area
3.2. ease of search
3.3. aim to provide structural annotation
3.4. rapid publication of papers by following categories
3.5. allows users to narrow down the links they need
4. References
4.1. 1. http://www.avatar.se/strbio2001/databases/why.html
4.2. 2. http://www.charite.de/sysbio/people/koenig/presentation/100505_koenig_biological_databases.pdf
4.3. 3. Galperin, Michael Y. "The Molecular Biology Database Collection: 2007 Update." Nucleic Acids Research 35.Database issue (2007): D3–D4. PMC. Web. 4 Mar. 2016.
4.4. 4. Rigden, Daniel J., Xosé M. Fernández-Suárez, and Michael Y. Galperin. "The 2016 Database Issue of Nucleic Acids Research and an Updated Molecular Biology Database Collection." Nucleic Acids Research 44.D1 (2016): D1–D6.
4.5. 5. http://www.oxfordjournals.org/our_journals/nar/database/c/
4.6. 6. http://m.nar.oxfordjournals.org/content/37/suppl_1/D1.full
5. Online Molecular Biology Databases Collection
5.1. 1. Nucleotide Sequence Databases
5.2. 2. RNA Sequence Databases
5.3. 3. Protein Sequence Databases
5.4. 4. Structure Databases
5.5. 5. Genomics Databases (Non-Vertebrate)
5.6. 6. Metabolic and Signaling Pathways
5.7. 7. Human and other Vertebrate Genomes
5.8. 8. Human Genes and Diseases
5.9. 9. Microarray Data and other Gene Expression Databases
5.10. 10. Proteomics Resources
5.11. 11. Other Molecular Biology Databases
5.12. 12. Organelle Databases
5.13. 13. Plant Databases
5.14. 14. Immunological Databases
5.15. 15. Cell Biology
6. Why are databases created and shared?
6.1. To make biological data available to scientists
6.1.1. not all data is published explicitly in a single article, e.g. genome sequences
6.1.2. searching for all the data is very time-consuming
6.1.3. published data may be difficult to find or access unless it is collected in one single place, e.g. a book, site or database
6.2. To make biological data available in computer-readable form
6.3. the need to store and communicate large datasets has grown tremendously
6.4. convenient method for storing, searching and retrieving necessary data
7. Why are some databases dropped from the collection?
7.1. superseded by newer and more advanced databases
7.1.1. Crow21
7.1.2. PIR-NREF
7.2. the database is no longer owned by its former organization
7.2.1. INFOBIOGEN
7.2.1.1. HugeMap
7.2.1.2. GenetPig
7.2.1.3. DbCat
7.3. the database has migrated to a new website
7.3.1. INFOBIOGEN
7.4. tightening of budget
7.4.1. ExDom, a database of the exon-intron structure of genes in seven eukaryotic genomes, had to be removed because it no longer provides a free version