Mind Map: Orthology

1. tests / benchmarks

1.1. benchmarks

1.1.1. done by: Altenhoff; Dutilh; Pryszcz

1.1.2. hampered by: availability of results; heterogeneous datasets; taxonomic biases; differences in underlying methodology; sparse documentation

1.1.3. did what: compared orthology methods; as large as; phylogenetic; phylogenomic; synteny-based; dataset; latent class analysis (meta-analysis)

1.2. test data

1.2.1. functional data: e.g. GO terms, gene expression data, enzyme numbers, gene neighborhood conservation, phylogenetic congruence, KEGG orthology numbers, HAMAP family accession numbers; problematic because of: limited availability of annotations, pure annotations, assumptions

1.2.2. phylogenetic data: problematic because of: only gene families, assumptions; existing databases: TreeFam, COG, MetaPhOrs, HOGENOM, PhylomeDB, Ensembl Compara

1.2.3. simulated data: Arvestedt's software (aladen) simulates gene trees given a species tree, then MSA

1.2.4. reference benchmarks: TreeFam; human-mouse orthologs; multi-domain proteins

1.2.5. domain fusion/fission

1.3. findings

1.3.1. similarity scores: raw score vs. bit score vs. E-value vs. identity; different configurations: BLAST vs. Smith-Waterman, hard vs. soft masking
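
The relationship between three of the scores named in this node can be sketched with the standard Karlin-Altschul conversions: bit score S' = (lambda * S - ln K) / ln 2, and E-value E = m * n * 2^(-S'). The function names and the lambda and K values below are illustrative assumptions. Real search tools report their own statistical parameters and use effective rather than raw sequence lengths, so this is a rough sketch and not a reimplementation of any tool.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance hits scoring at least `bits`: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative parameters only (assumed, not taken from the mind map).
LAMBDA, K = 0.267, 0.041

bits = bit_score(100, LAMBDA, K)
print(f"bit score: {bits:.1f}")
print(f"E-value against 1e6 residues: {e_value(bits, 300, 1_000_000):.2e}")
```

The raw score depends on the scoring matrix, the bit score removes that dependence, and the E-value additionally depends on the search space. This is one reason why the choice of score can change which hit counts as "best".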

1.3.2. use of external information e.g. gene neighborhood

1.3.3. latent class analysis: overlap of existing methods

1.4. studies

1.4.1. human vs. 7 others as done by Dolinski

1.4.2. yeast orthologs as done by

1.5. Methods

1.5.1. Species overlap described in
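
A minimal sketch of the species-overlap idea as it is commonly described: an internal node of a gene tree is labeled a duplication when its two child subtrees share at least one species, and a speciation otherwise; genes separated by a speciation node are then called orthologs. The tree encoding (nested 2-tuples with "species:gene" leaf strings) and the function names are assumptions made for this example, not taken from the reference this node points to.

```python
def label_events(tree):
    """Return (species set, leaf list, events) for a binary gene tree.

    A leaf is a "species:gene" string; an internal node is a 2-tuple.
    Each event is (label, left_leaves, right_leaves).
    """
    if isinstance(tree, str):
        return {tree.split(":", 1)[0]}, [tree], []
    left, right = tree
    l_species, l_leaves, l_events = label_events(left)
    r_species, r_leaves, r_events = label_events(right)
    # Species overlap rule: shared species below the node means duplication.
    label = "duplication" if l_species & r_species else "speciation"
    events = l_events + r_events + [(label, l_leaves, r_leaves)]
    return l_species | r_species, l_leaves + r_leaves, events


def ortholog_pairs(tree):
    """Collect gene pairs whose last common node is a speciation."""
    pairs = set()
    for label, left, right in label_events(tree)[2]:
        if label == "speciation":
            pairs.update(frozenset((a, b)) for a in left for b in right)
    return pairs


# Hypothetical gene family: a duplication before the human/mouse split.
tree = ((("human:A1", "mouse:A1"), ("human:A2", "mouse:A2")), "fly:A")
pairs = ortholog_pairs(tree)
print(frozenset(("human:A1", "mouse:A1")) in pairs)  # orthologs
print(frozenset(("human:A1", "mouse:A2")) in pairs)  # paralogs via duplication
```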

2. Goals

2.1. Identification of orthologs + functional genomics

2.2. Gold standard

2.3. automated biological function interpretation from gene phylogeny

2.4. Accurate function prediction

3. challenges

3.1. Tree reconciliation

3.1.1. use counterexamples of assumptions

3.1.2. selection of genes to build tree: What kinds of topologies make a tree difficult to partition?

3.1.3. Accuracy of tree reconciliation methods

3.1.4. Identification of functionally divergent nodes: History of gene much more complex than duplications; Map functional genomics data onto tree

3.2. BBH linkage

3.2.1. testing assumptions: How many BBH pairs are not functionally identical? Does the number of BBHs vary between closely and distantly related species? if so

3.2.2. Which genes always ambiguous in OG construction?

3.2.3. improvements: mainly concerns; reduce false positives; include persistent genes

3.3. General

3.3.1. Benchmarks: Clade-specific genes: do they have unique features? Do they allow implications about interactions with environmental factors? Simulation studies: assess influence of

3.3.2. biological: Insufficient masking of low-complexity regions; protein mosaics/protein subfamilies; recent duplications; Definition of terms: orthologs/paralogs, co-orthologs/in-paralogs/out-paralogs/super-orthologs/ultra-paralogs; protein function; alternative splicing; function prediction; sub-families similar in sequence, but different in domain architecture; Horizontal gene transfer

3.3.3. technical: Standards for; Computation: scalability; Database: stay up-to-date; expand representation of orthologs to

4. Orthology prediction methods

4.1. ab initio - building groups of similar genes

4.1.1. formation of groups: Similarity search; Approaches; Similarity scores; Tools; Biases induced by; lead to

4.1.2. expanding groups: adding in-paralog; idea; approaches; improve orthology detection; External knowledge; clustering

4.2. post-processing - building ortholog groups

4.2.1. are based on

4.2.2. and use phylogenetic trees: required pre-steps are; as done by; approaches; goal; issues

4.3. hybrids

4.3.1. combination of existing dbs: as done by YOGY, MetaPhOrs

4.3.2. combination of existing methods: advantage: scalable as; use phylogenetic information as; as done by Ensembl Compara, HomoloGene, OrthoParaMap, PhIGs, PHOG, PhyOP, TreeFam, eggNOG, P-POD

5. Background

5.1. Biological

5.1.1. Reasons for bias: Gene loss (single gene loss, reciprocal gene loss); Gene gain; Horizontal gene transfer (leads to; occurs within); Incomplete lineage sorting; Mosaics of proteins; Outcome; Processes; Alternative Splicing

5.2. Definitions

5.2.1. issues: orthology constraint; General problem about definitions: different definitions -> no quality assessment; ortholog group defined with respect to; Synteny wrongly used because of

5.2.2. definitions: ortholog group defined as; gene function (defined as; problem with; proven by); homology; orthology; paralogy; xenologs; subtree-neighbors; Basic units of orthology: domain, gene, sequence / proteins; original definition; Horizontal gene transfer (defined as; found in); Conserved gene neighborhood defined as; non-transitivity of phylogenetic relationships (defined as; examples)

5.3. Assumptions

5.3.1. Best bidirectional hit: true for two genes with same function; implied assumptions: function by single gene, present in both species; Transitivity of orthologs
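
The best-bidirectional-hit (BBH) criterion can be sketched as follows: two genes from different species form a BBH pair when each is the other's highest-scoring hit. The input format (dictionaries keyed by (query, subject) with a similarity score), the gene names, and the function names are assumptions for illustration; ties are broken by first occurrence, which a real pipeline would need to handle more carefully.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: {(query, subject): similarity score}
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores between genes of species A and species B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b1"): 120, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 198, ("b2", "a1"): 140, ("b2", "a2"): 95}

bbh = bidirectional_best_hits(a_vs_b, b_vs_a)
print(bbh)
```

In this toy input only a1 and b1 choose each other; a2 and b2 are left without a partner, which illustrates how the "single gene present in both species" assumption drops duplicated genes.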

5.3.2. smallest reciprocal distance: implied assumptions: most similar = most likely orthologous; true for two genes with same function

5.3.3. gene evolution = species evolution: implied assumption: duplication leads to same evolutionary pattern

5.3.4. orthologs = similar/same function: implied assumption: Gene neighborhood implies orthology; similar/same; fails for: Genes that lost/changed function

5.3.5. General: graph-based vs. tree-based approaching a problem

5.3.6. Addition of inparalogs: allowed if genes are closer to the ortholog of the same species than to any gene of other species
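
The in-paralog rule stated in this node can be sketched directly: a candidate gene is added to a group when its similarity to the seed ortholog of its own species exceeds its similarity to every gene of the other species. The similarity lookup, gene names, and function name below are hypothetical and chosen only to make the rule concrete.

```python
def in_paralogs(seed, same_species_genes, other_species_genes, similarity):
    """Return genes closer to `seed` than to any gene of another species.

    similarity: callable taking two gene names and returning a score
    (higher means more similar).
    """
    accepted = []
    for candidate in same_species_genes:
        if candidate == seed:
            continue
        to_seed = similarity(candidate, seed)
        best_other = max(
            (similarity(candidate, other) for other in other_species_genes),
            default=float("-inf"),
        )
        if to_seed > best_other:
            accepted.append(candidate)
    return accepted


# Hypothetical symmetric similarity table; missing pairs score 0.
table = {
    frozenset(("a1", "a1b")): 300,
    frozenset(("a1b", "b1")): 180,
    frozenset(("a1", "a3")): 50,
    frozenset(("a3", "b1")): 60,
}


def similarity(x, y):
    return table.get(frozenset((x, y)), 0)


added = in_paralogs("a1", ["a1", "a1b", "a3"], ["b1"], similarity)
print(added)
```

Here a1b joins the group because it is closer to a1 than to b1, while a3 is rejected because a gene of the other species is closer to it than the seed is.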

5.3.7. transitivity of orthologous relationship violated by
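
One well-known way the transitivity assumption can fail, offered here as an illustration and not as the specific case this node refers to: after a lineage-specific duplication, a gene a1 is orthologous to both copies b1 and b2 (co-orthologs), yet b1 and b2 are paralogs of each other. A minimal sketch that lists such non-transitive triples in a set of pairwise orthology calls, with hypothetical gene names and function name:

```python
from itertools import combinations


def non_transitive_triples(pairs):
    """Find (x, hub, y) where x~hub and hub~y are called orthologs but x~y is not."""
    neighbors = {}
    for a, b in pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    violations = []
    for hub, linked in neighbors.items():
        for x, y in combinations(sorted(linked), 2):
            if y not in neighbors[x]:
                violations.append((x, hub, y))
    return violations


# a1 has two co-orthologs in species B; b1 and b2 are paralogs, so no b1-b2 pair.
calls = [("a1", "b1"), ("a1", "b2")]
violations = non_transitive_triples(calls)
print(violations)
```

Clustering methods that merge every connected pair into one group implicitly treat such triples as transitive, which is why the assumption matters for ortholog group construction.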

5.3.8. Xenologs: implied assumption: They often appear as true orthologs in genome comparisons and might exhibit variable functions