Why must we sort orthologs from paralogs?
Orthologs are genes
related by common descent through speciation, i.e., "true" homologs. The
copies are generated by speciation, not by gene duplication. An
example would be the beta-hemoglobin genes of human and chimpanzee.
Paralogs are genes
related by gene duplication. Examples would be the beta-hemoglobin
of human and the delta-hemoglobin of chimpanzee, or the beta- and
delta-hemoglobin of the same organism.
Why does this matter?
In the absence of biochemical assays, the best possible inference
for gene function is that it is shared by orthologs, and that gene
duplications allow one copy to diverge to take on a new function
or to be otherwise specialized (e.g., in timing or location of expression).
The figure below
illustrates how the commonly used method of reciprocal best-BLAST
matching leads to incorrect assignment of gene identities (and their
correlate, gene function). In this scenario (which arises in many
real-world cases), the evolutionary split between the two organisms
has occurred after a gene duplication that generated paralogs named
"Gene-A" and "Gene-B". Genes do not all evolve
at the same rate and, in this example, we're imagining that it is
Gene-B in organism 1 and Gene-A in organism 2 that happen to have
the slower rates. That being the case, the reciprocal best matches
are between Gene-B of organism 1 and Gene-A of organism 2, so these
paralogs are erroneously inferred to be orthologous and assigned
the same function. The other two genes are assigned no function
at all, since the best match to Gene-A of organism 1 is Gene-A of
organism 2, but this is not reciprocal, and similarly for Gene-B
of organism 2.
Only a complete phylogenetic
reconstruction using accurate methods, such as is done in the PHRINGE
pipeline, can recover this history and guide the proper inference
of orthology and functional assignment.