Orthology by Mind Map: Orthology

Orthology

tests / benchmarks

benchmarks

done by, Altenhoff, Dutilh, Pryszcz

hampered by, availability of results, heterogeneous datasets, taxonomic biases, difference in underlying methodology, sparse documentation

did what, compared orthology methods using, phylogenetic, phylogenomic, synteny-based datasets, latent class analysis (meta-analysis)

test data

functional data, e.g., GO terms, score btw. 0 and 1 according to hierarchical structure of GO, score, bias, Circular dependency of GO, gene expression data, enzyme numbers, gene neighborhood conservation, as done by, LOFT, phylogenetic congruence, KEGG orthology numbers, HAMAP family accession numbers, problematic b/o, limited availability of annotations, poor annotations, assumptions, only orthologs same functions, orthologs have same functions
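A GO-based score between 0 and 1 can be derived from the hierarchy in several ways. The following is a minimal sketch that assumes a Jaccard index over the ancestor sets of two terms. The term IDs and the tiny DAG are hypothetical, and the benchmarks above may use a different measure (information-content-based measures also exist).

```python
# Hypothetical toy GO fragment: term -> set of direct parent terms (is_a edges only).
GO_PARENTS = {
    "GO:A": set(),            # root
    "GO:B": {"GO:A"},
    "GO:C": {"GO:A"},
    "GO:D": {"GO:B"},
    "GO:E": {"GO:B", "GO:C"},
}


def ancestors(term, parents):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def go_similarity(t1, t2, parents=GO_PARENTS):
    """Jaccard index of ancestor sets: 1.0 for identical terms, lower as paths diverge."""
    a1, a2 = ancestors(t1, parents), ancestors(t2, parents)
    return len(a1 & a2) / len(a1 | a2)
```

Terms that share a deep common ancestor (GO:D and GO:E share GO:B) score higher than terms that meet only at the root (GO:D and GO:C).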

phylogenetic data, problematic b/o, Only gene families, assumptions, rate of evolution of species, equals rate of genes, existing databases, TreeFam, TreeFam-A, TreeFam-B, COG, computed trees from COG groups (LOFT), MetaPhOrs, HOGENOM, PhylomeDB, Ensembl Compara

simulated data, Arvestedt's software (aladen), simulates gene trees given a species tree, then MSA
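The idea of simulating gene trees inside a species tree can be sketched with a discrete-time duplication-loss process. This is not the software named above. It is a hypothetical illustration that assumes a hand-written species tree, per-step duplication and loss probabilities, and no sequence evolution (the MSA step is omitted).

```python
import itertools
import random

# Hypothetical species tree: (species name, branch length in discrete steps, children).
SPECIES_TREE = ("root", 0, [
    ("mammals", 2, [("human", 3, []), ("mouse", 3, [])]),
    ("fly", 5, []),
])


def _join(event, subtrees):
    """Combine surviving lineages under an event node; collapse if fewer than two survive."""
    kept = [s for s in subtrees if s is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    return (event, *kept)


def simulate(node, rng, dup, loss, ids, steps=None):
    """Evolve one gene lineage down species-tree `node`.

    Returns a nested tuple ("S" or "D", child, child), a leaf label such as
    "human_3", or None if every copy was lost.
    """
    name, length, children = node
    steps = length if steps is None else steps
    for t in range(steps):
        r = rng.random()
        if r < loss:
            return None                     # lineage lost on this branch
        if r < loss + dup:                  # duplication: two copies share the rest of the branch
            rest = steps - t - 1
            copies = [simulate(node, rng, dup, loss, ids, rest) for _ in range(2)]
            return _join("D", copies)
    if not children:                        # end of a terminal branch: an extant gene
        return f"{name}_{next(ids)}"
    return _join("S", [simulate(c, rng, dup, loss, ids) for c in children])


if __name__ == "__main__":
    print(simulate(SPECIES_TREE, random.Random(1), dup=0.1, loss=0.05,
                   ids=itertools.count(1)))
```

With both rates at zero, the gene tree mirrors the species tree. Raising `dup` and `loss` produces the in-paralogs and missing genes that the benchmarks try to probe.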

reference benchmarks, TreeFam, Human-mouse orthologs, multi-domain proteins

domain fusion/fission

findings

similarity scores, raw score vs. bit score vs. e-value vs. identity, different configurations, BLAST vs. Smith-Waterman, hard vs. soft masking
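The compared scores are related through Karlin-Altschul statistics. A bit score normalizes a raw score as S' = (λS − ln K) / ln 2, and E = m·n·2^(−S'). This is a minimal sketch that ignores effective-length (edge) corrections. The λ = 0.267 and K = 0.041 values are commonly cited for gapped BLOSUM62 (11/1) and should be treated as assumptions. Percent identity also has several conventions; the one below divides identical columns by alignment length.

```python
import math


def bit_score(raw, lam, k):
    """Normalized bit score from a raw alignment score (Karlin-Altschul)."""
    return (lam * raw - math.log(k)) / math.log(2)


def e_value(raw, lam, k, m, n):
    """Expected chance hits for query length m and database length n (no edge correction)."""
    return m * n * 2 ** (-bit_score(raw, lam, k))


def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns as a percentage of alignment length (one of several conventions)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * matches / len(aln_a)


# Assumed gapped BLOSUM62 parameters; the same raw score gives a different
# E-value as soon as the database size n changes.
LAMBDA, K = 0.267, 0.041
print(bit_score(100, LAMBDA, K), e_value(100, LAMBDA, K, 300, 1_000_000))
```

Raw scores depend on the scoring matrix, bit scores do not, and E-values additionally depend on search-space size. This is one reason results differ between configurations.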

use of external information, e.g., gene neighborhood, app. half of BBH hits were out-paralogs, because of, reciprocal gene loss

latent class analysis, overlap of existing methods

studies

human vs. 7 others, as done by, Dolinski

yeast orthologs, as done by

Methods

Species overlap, described in

Goals

Identification of orthologs + functional genomics

Gold standard

automated biological function interpretation from gene phylogeny

Accurate function prediction

challenges

Tree reconciliation

use counterexamples of assumptions

selection of genes to build tree, What kinds of topologies make a tree difficult to partition?

Accuracy of tree reconciliation methods

Identification of functionally divergent nodes, History of gene much more complex than duplications, Map functional genomics data onto tree

BBH linkage

testing assumptions, How many BBH pairs are not functionally identical?, Does number of BBH vary btw. closely/distantly-related species?, if so, Correlation btw. phylogenetic distance and # false positives of BBH

Which genes always ambiguous in OG construction?

improvements, mainly concerns, reduce false positives, include, persistent genes, that are, widely distributed genes, leading to, dense BBH networks, use for that, phylogenetic/evolutionary profiles, imply, strong selective pressure, high functional consistency, indispensability in extant species
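Phylogenetic profiles make "widely distributed" measurable. The following is a minimal sketch that assumes a family is summarized by the set of genomes with at least one member. The genome names, the families, and the 0.9 persistence threshold are hypothetical choices.

```python
# Hypothetical data: ordered genome list and, per family, genomes with >= 1 member.
GENOMES = ["E_coli", "B_subtilis", "S_cerevisiae", "H_sapiens"]
FAMILIES = {
    "famA": {"E_coli", "B_subtilis", "S_cerevisiae", "H_sapiens"},
    "famB": {"E_coli", "B_subtilis"},
}


def phylogenetic_profile(present_in, genomes):
    """Presence/absence vector of a family across the ordered genome list."""
    return [1 if g in present_in else 0 for g in genomes]


def persistence(profile):
    """Fraction of genomes in which the family occurs."""
    return sum(profile) / len(profile)


def persistent_families(families, genomes, threshold=0.9):
    """Families present in at least `threshold` of the genomes (threshold is arbitrary)."""
    return sorted(
        fam for fam, present in families.items()
        if persistence(phylogenetic_profile(present, genomes)) >= threshold
    )
```

Restricting BBH networks to such persistent families is one way to apply the idea above. The threshold trades coverage against the expected drop in false positives.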

General

Benchmarks, Clade-specific genes, do they have unique features?, do they allow inferences about interactions with environmental factors?, Simulation studies, assess influence of, gene loss, conserved gene neighborhood

biological, Insufficient masking of low-complexity regions, protein mosaics/protein subfamilies, recent duplications, example, C. elegans 7-transmembrane proteins, Definition of terms, orthologs/paralogs, in relation to, one speciation/duplication event, co-orthologs/in-paralogs/out-paralogs/super-orthologs/ultra-paralogs, in relation to, particular sequence of speciation and/or duplication events, protein function, alternative splicing, function prediction, sub-families similar in sequence, but different in domain architecture, Horizontal gene transfer

technical, Standards, for, output, OrthoXML, protocols, Computation, scalability, Database, stay up-to-date, expand representation of orthologs, to, graph-networks, to account for, protein mosaics

Orthology prediction methods

ab initio - building groups of similar genes

formation of groups, Similarity search, Approaches, Best Hit, single linkage, aka, Best bidirectional hit, symmetrical best hit, reciprocal best hit, done by, Inparanoid, OMA, complex linkage, examples are, best triangular hit, done by, COG, eggNOG, OrthoDB, P-Pod, works as, example, evaluation, works fine, example, bacteria, fails, example, eukaryotes, because of, much higher duplication rates --> more likely subfunctionalization, discards whole groups, in presence of, gene loss, domain architecture, as done by, DODO, Similarity scores, E-value, Percent Identity, Bit score, Raw score, Reciprocal smallest distance, done by, RSD, Roundup, advantage, acts more globally than BLAST-based scores, Reciprocal nearest neighbor, done by, PHOG, problematic, greater evolutionary distance, Tools, sequence-based, local, Smith-Waterman, ParAlign, Heuristics, FASTA, BLAST, profile-based, rpsBLAST, HMMER, Biases, induced by, Alternative splicing, approaches to deal with, use, longest transcript, supertranscript, best suitable, as done by, PhyOP, Horizontal gene transfer, so far, detect recently acquired genes, protein mosaics, approaches to deal with, use, overlap sequence length cutoff of, 50%, as in, Inparanoid, example, unequal rates of evolution, example, missing close relative /, missing closely-related database sequence, example, lead to, False positives, caused by, BBH with different function, Violating transitive closure, Protein mosaics, (Single/multiple) Gene loss, solved by, tree-based methods, leading to, Wrong group assignment, Assignment to just one instead of several groups, False negatives, caused by, Presence of recent inparalogs, Too distantly related (above threshold), solved by, increase number of close taxa, Subfunctionalization, leading to, Missing at least one ortholog
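The best bidirectional hit rule can be sketched in a few lines. This is a minimal illustration that assumes precomputed all-vs-all scores between two genomes, stored as `{(query, subject): bit_score}`. The gene names and scores are hypothetical. Ties are broken by whichever hit is seen first, which real tools handle more carefully.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical bit scores: a2's best hit b2 prefers a1, so a2-b2 is not a BBH.
A_VS_B = {("a1", "b1"): 250, ("a1", "b2"): 120, ("a2", "b2"): 90,
          ("a2", "b1"): 80, ("a3", "b1"): 60}
B_VS_A = {("b1", "a1"): 245, ("b1", "a3"): 70, ("b2", "a1"): 118, ("b2", "a2"): 95}
print(reciprocal_best_hits(A_VS_B, B_VS_A))
```

The toy data show the false-negative case listed above: a2 and b2 may well be orthologs, but the one-to-one requirement drops them once b2's best hit is a different gene.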

expanding groups, adding in-paralog, idea, dist A1-A2 < A1-B1, approaches, Inparanoid, OrthoDB, improve orthology detection, External knowledge, from, Considering domain architecture, done by, eggNOG, DODO, available from, Pfam, CDD database, Interpro, Conserved gene neighborhood, done by, fails if, literature, done by, P-Pod, third species about gene duplications, Outgroup species to evaluate BBH, done by, Inparanoid, Ortholuge, OMA, benefits, might provide extra resolution and specificity, disadvantage, decrease sensitivity by removing authentic orthologs, problems, requires phylogeny, requires good outgroup, app. constructing quartet trees, similarity, Blast-score, done by, QuartetS, idea, uses BLAST-Score instead to avoid quartet trees, clustering, approaches, MCL / more densely connected linkages, done by, OrthoMCL, Jaccard Clustering, done by, TIGR, P-Pod, drawback, hits might not be RBH
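The in-paralog rule "dist A1-A2 < A1-B1" can be sketched directly. This is a simplified illustration, not Inparanoid's full procedure. It assumes a seed ortholog pair (A1, B1) and pairwise distances stored under `frozenset` keys. All names and distances are hypothetical.

```python
def _d(dist, x, y):
    """Symmetric distance lookup."""
    return dist[frozenset((x, y))]


def add_inparalogs(seed_a, seed_b, genes_a, genes_b, dist):
    """Add genes closer to their own species' seed than the two seeds are to each other."""
    cutoff = _d(dist, seed_a, seed_b)
    group_a = {seed_a} | {g for g in genes_a if g != seed_a and _d(dist, seed_a, g) < cutoff}
    group_b = {seed_b} | {g for g in genes_b if g != seed_b and _d(dist, seed_b, g) < cutoff}
    return group_a, group_b


# Hypothetical distances: A2 is a recent duplicate of A1; A3 and B2 are too far away.
DIST = {
    frozenset(("A1", "B1")): 0.5,
    frozenset(("A1", "A2")): 0.2,
    frozenset(("A1", "A3")): 0.9,
    frozenset(("B1", "B2")): 0.7,
}
print(add_inparalogs("A1", "B1", ["A1", "A2", "A3"], ["B1", "B2"], DIST))
```

A2 joins A1 as an in-paralog because the duplication that produced it must postdate the speciation separating A1 and B1. A3 and B2 stay out, as presumptive out-paralogs.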

post-processing - building ortholog groups

are based on

and use, phylogenetic trees, required pre-steps are, multiple sequence alignment, has issues in, wrong alignment, the faster genes evolve, the more distantly-related species are, tree reconstruction, has issues in, reliability, can be assessed by, bootstrapping, MCMC sampling, correctly placing root, by, midpoint rooting, manual outgroup selection, minimize dissimilarity btw. gene and species tree, as done by, unsupervised, automated methods as, PhylomeDB, PANTHER, TreeFam, Ensembl Compara, OrthologID, works as, example, Overview of OrthologID. Maximum parsimony trees are generated and diagnostic characters are determined through an automated process: (1) sequences are retrieved from the OrthologID Database, clustered using the Gene Family Creator and aligned using the Alignment Constructor (which interfaces with MAFFT); (2) phylogenetic trees are generated using the Tree Builder (which interfaces with PAUP); and (3) diagnostic characters are ascertained using the Diagnostic Generator (which interfaces with CAOS). Each OrthologID module, shown as trapezoids, is designed to function independently and allows the use of any processing tool (e.g. one could use ClustalW instead of MAFFT for sequence alignment)., GETHOGs, works as, Inferring hierarchy, for, orthology graph, pure reconciliation methods as, SYNERGY, RAP, pros, The algorithm can handle unresolved trees and take both bootstrap values and branch lengths into account for the reliability of trees. The RAP program is freely available, cons, RAP cannot be used as a command line tool., TreeBeST, Speciation duplication inference (SDI), cons, SDI cannot root the input trees and requires fully resolved trees, RIO, works as, example, A simple example of the RIO procedure. Four bootstrap-resampled gene trees are shown. Letters represent sequence names/"functions". "A" (nematode and wheat) are true orthologs of the human query sequence, whereas "B" (rat) is a true paralog of the query (i.e. the first tree happens to be the real one). In 3 out of 4 trees nematode "A" appears orthologous to the query, in 3 out of 4 trees wheat "A" appears orthologous to the query. Rat "B" never appears to be orthologous. For an example of actual RIO output see Figure 7., manual curation, as in, TreeFam, PANTHER, in absence of species tree, as done by, LOFT, works as, example, Numbering starts at the top with base number 1. Intermediate duplications result in sub-orthologous groups, e.g. 1.1 and 1.2, indicating fully paralogous genes descending from a gene in base group 1. An intermediate duplication of a gene in group 1.2 will result in sub-orthologous groups 1.2.1 and 1.2.2, which are full paralogs descending from a gene in orthologous group 1.2. In the event that another gene from orthologous group 1.2 is also duplicated, LOFT will assign numbers 1.2.3 and 1.2.4 to these paralogous groups, because numbers 1.2.1 and 1.2.2 have already been assigned., evaluation, pros, LOFT infers orthologs and paralogs from pre-computed homologs in a hierarchical framework without a species tree., cons, LOFT cannot be executed as a command line tool without the GUI. The ‘species-overlap’ is not adjustable, COCO-CL, pros, requires no tree, cons, does not implement tree-reconciliation, Orthostrapper, HOPS, pros, HOPS provides domain-based orthologs. The Orthostrapper program is freely available, cons, HOPS dataset is not available for download and the web server does not work, use, hierarchical clustering, species overlap rule, works as, This simple heuristic implies that a speciation event is only assigned to an internal node if its branches contain mutually exclusive sets of species., Tree reconciliation on unresolved tree, used by, PhIGs, approaches, reconcile with, species tree, function-oriented partition, functional divergence cut, closer to root, poss. making separate groups, less likely for a large number of species to retain two paralogs with the same function, closer to leaves, hard to determine: dupl. event --> sub-/neofunctionalization, manual curation of groups, based on, goal, partitioning tree similar to BBH linkage, reconciliation, mapping gene duplication events on tree, functional divergence cut, determine gene duplications resulting in functional divergence, issues, biased by, wrong tree, species tree, in the case of, different evolutionary rates, computed gene tree, unequivocal duplication events, happens with, multicellular organisms with many conserved duplications, violation of gene tree != species tree, limited by, computational complexity, phylogenetic signal contained in MSA, lack of automation of outgroup selection
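The species overlap rule needs no species tree and can be sketched on a rooted gene tree given as nested tuples. This is a minimal illustration that assumes a rooted tree and a hypothetical leaf-to-species mapping. An internal node is labeled a speciation when its child subtrees share no species, and leaves on opposite sides of a speciation node are reported as orthologs.

```python
def leaves(tree):
    """All leaf labels below `tree` (a leaf is a plain string)."""
    return [tree] if isinstance(tree, str) else [l for c in tree for l in leaves(c)]


def species_overlap_orthologs(tree, leaf_species):
    """Ortholog pairs implied by the species overlap rule on a rooted gene tree."""
    pairs = set()

    def walk(node):
        if isinstance(node, str):
            return {leaf_species[node]}
        child_sets = [walk(child) for child in node]
        union = set().union(*child_sets)
        # Mutually exclusive species sets -> speciation; any overlap -> duplication.
        if sum(len(s) for s in child_sets) == len(union):
            groups = [leaves(child) for child in node]
            for i in range(len(groups)):
                for j in range(i + 1, len(groups)):
                    pairs.update(tuple(sorted((a, b))) for a in groups[i] for b in groups[j])
        return union

    walk(tree)
    return sorted(pairs)


# Hypothetical example: a human-specific duplication below a human/mouse split.
SPECIES = {"h1": "human", "h2": "human", "m1": "mouse"}
print(species_overlap_orthologs((("h1", "h2"), "m1"), SPECIES))
```

Because h1 and h2 come from a duplication below the speciation node, both are reported as (co-)orthologs of m1. The outcome depends on where the root is placed, which is why rooting appears among the issues above.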

hybrids

combination of existing dbs, as done by, YOGY, MetaPhOrs

combination of existing methods, advantage, scalable as, graph-based methods, use phylogenetic information as, tree-based methods, as done by, Ensembl Compara, HomoloGene, OrthoParaMap, PhIGs, PHOG, PhyOP, TreeFam, eggNOG, is based on, COG, P-POD, works as, example

Background

Biological

Reasons for bias, Gene loss, single gene loss, reciprocal gene loss, Gene gain, Horizontal gene transfer, leads to, xenologs, occurs within, prokaryotes, eukaryotes, Incomplete lineage sorting, Mosaics of proteins, Outcome, Chimeras of proteins, Processes, gene fission, domain gain, domain loss, gene shuffling, fusion, Alternative Splicing

Definitions

issues, orthology, constraint, in given context, that is, taxonomic sampling, General problem about definitions, different definitions -> no quality assessment, ortholog group, defined with respect to, last common ancestor of genes, as done by, COCO-CL, LOFT, Synteny, wrongly used b/o, originally denoted gene loci on the same chromosome regardless of whether or not they are genetically linked.

definitions, ortholog group, defined as, a collection of homologous sequences from at least two species., gene function, defined as, Gene function - Relationship to other biological objects in the cell, including its interactions with other genes, proteins, chromosome intergenic regions, etc., problem with, proven by, biochemical and/or structural studies, homology, orthology, defined as, homologous sequences derived by a speciation event from a single ancestral sequence in the last common ancestor of the species being compared., special case is, super-orthology, defined as, a subset of orthologs selected on a rooted gene tree such that only speciation events are assigned to each internal node on their connecting path, paralogy, inparalogy, defined as, paralogs that result from lineage-specific duplication(s) subsequent to a given speciation event (sometimes termed ‘recent’ paralogs)., aka, co-orthologs, outparalogy, ultra-paralogy, defined as, a subset of paralogs selected on a rooted tree such that the internal nodes connecting them represent only duplication events (in-paralogs), xenologs, defined as, homologous sequences, the history of which involves transfer of genetic information between species (HGT)., subtree-neighbors, defined as, Given a completely binary and rooted gene tree, the k-subtree-neighbors of a sequence q are defined as all sequences derived from the k-level parent node of q, except q itself (the level of q itself is 0, q's parent is 1, and so forth), Basic units of orthology, domain, as used by, RIO, HOPS, PHOG, gene sequence / proteins, as used by, most methods, original definition, as originally defined, in context of, distinction between homologous and analogous proteins, Horizontal gene transfer, defined as, an evolutionary process that involves transfer of genetic material between species but does not follow the vertical descent from a parental lineage to its offspring., found in, HGT is an important phenomenon in the evolution of prokaryotes and eukaryotes, Conserved gene neighborhood, defined as, Conserved gene neighborhood (CGN): refers to conserved genomic segments containing orthologous genes in a similar collinear order between species., non-transitivity of phylogenetic relationships, defined as, orthology, paralogy and xenology are strictly pairwise and non-transitive relationships between (groups of) genes, examples, This can best be understood using the following example: if two genes, a and b, are equally (co-)orthologous to gene c, it does not imply that a and b must also be orthologous to each other
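The k-subtree-neighbor definition translates directly into a tree walk. This is a minimal sketch that assumes a rooted binary gene tree encoded as nested tuples with string leaves. The example tree and the function names are hypothetical.

```python
def path_to(tree, q):
    """Nodes from the root down to leaf q, or None if q is absent."""
    if tree == q:
        return [tree]
    if isinstance(tree, str):
        return None
    for child in tree:
        sub = path_to(child, q)
        if sub is not None:
            return [tree] + sub
    return None


def leaves(tree):
    """All leaf labels below `tree`."""
    return [tree] if isinstance(tree, str) else [l for c in tree for l in leaves(c)]


def subtree_neighbors(tree, q, k):
    """All sequences below q's k-level parent, except q (level 0 is q itself)."""
    path = path_to(tree, q)
    if path is None:
        raise ValueError(f"{q!r} is not a leaf of the tree")
    if k >= len(path):
        raise ValueError("k exceeds the depth of q")
    ancestor = path[-1 - k]
    return [leaf for leaf in leaves(ancestor) if leaf != q]


# Hypothetical rooted binary tree (((a, b), c), d): neighbors of "a" widen with k.
TREE = ((("a", "b"), "c"), "d")
for k in range(1, 4):
    print(k, subtree_neighbors(TREE, "a", k))
```

Increasing k moves one node toward the root at each step, so the neighbor set grows monotonically until it covers every other leaf.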

Assumptions

Best bidirectional hit, true for, two genes with same function, implied assumptions, function by single gene, destroyed by, subfunctionalization, biological example, bias example, _, present in both species, destroyed by, missing selective pressure on that function, presence of compensatory pathway, Transitivity of orthologs, destroyed by, example, implied assumption, orthologs mutually most similar in all species

smallest reciprocal distance, implied assumptions, most similar = most likely orthologous, true for, two genes with same function

gene evolution = species evolution, implied assumption, duplication leads to, sub-/neofunctionalization, true for, most cases, exceptions, exist, same evolutionary pattern, violated by

orthologs = similar/same function, implied assumption, Gene neighborhood implies orthology, violated by, low sequence similarity, reciprocal gene losses, works fine for, closely-related species, similar/same, EC number, GO categories, violated by, sophisticated regulatory differences btw. species, expression profiles, fails for, Genes that lost/changed function

General, graph-based vs. tree-based, approaching a problem, globally (tree), tree-based, locally (pairwise), graph-based

Addition of inparalogs, allowed if, genes are closer to ortholog of same species than to any gene of others

transitivity of orthologous relationship, violated by

Xenologs, implied assumption, They often appear as true orthologs in genome comparisons and might exhibit variable functions