Genome Project Solutions
Partnering for Discovery
Genome Comparison and Analysis Toolbox: For Whole Genome Evolutionary Analysis

Why must we sort orthologs from paralogs?

Orthologs are homologous genes whose copies were generated by speciation, not by gene duplication; they are sometimes called "true" homologs. An example is the beta-hemoglobin genes of human and chimpanzee.

Paralogs are genes related by gene duplication. Examples are the beta-hemoglobin of human and the delta-hemoglobin of chimpanzee, or the beta- and delta-hemoglobin of the same organism.

Why does this matter? In the absence of biochemical assays, the best available inference about gene function is that orthologs share it, whereas a gene duplication allows one copy to diverge and take on a new function or become otherwise specialized (e.g., in timing or location of expression).

The figure below illustrates how the commonly used method of reciprocal best-BLAST matching can lead to incorrect assignment of gene identities (and their correlate, gene function). In this example, which reflects a pattern found in many real-world cases, the evolutionary split between the two organisms occurred after a gene duplication that generated paralogs named "Gene-A" and "Gene-B". Genes do not all evolve at the same rate, and here we imagine that Gene-B in organism 1 and Gene-A in organism 2 happen to have the slower rates. As a result, the reciprocal best matches are between Gene-B of organism 1 and Gene-A of organism 2, so these paralogs are erroneously inferred to be orthologous and assigned the same function. The other two genes are assigned no function at all: the best match to Gene-A of organism 1 is Gene-A of organism 2, but that match is not reciprocal, and likewise the best match to Gene-B of organism 2 is Gene-B of organism 1, which is not reciprocal either.
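The scenario above can be reproduced with a small numerical sketch. The branch lengths, gene labels, and function names below are hypothetical values chosen for illustration, and a simple additive tree distance stands in for a BLAST score (smaller distance meaning a better match); this is not output from any real genome comparison.

```python
# Hypothetical gene tree ((org1_GeneA, org2_GeneA), (org1_GeneB, org2_GeneB)):
# the root is the duplication; each inner node is the speciation event.
INTERNAL = 0.2            # assumed branch from duplication to each speciation node
SLOW, FAST = 0.1, 1.0     # assumed tip branch lengths (substitutions per site)

tip = {
    "org1_GeneA": FAST,
    "org2_GeneA": SLOW,
    "org1_GeneB": SLOW,
    "org2_GeneB": FAST,
}

def organism(gene):
    return gene.split("_")[0]

def family(gene):
    return gene.split("_")[1]

def distance(x, y):
    """Path length between two genes on the tree (stand-in for a BLAST score)."""
    d = tip[x] + tip[y]
    if family(x) != family(y):
        d += 2 * INTERNAL  # the path crosses the duplication node
    return d

def best_hit(query):
    """Closest gene in the other organism."""
    candidates = [g for g in tip if organism(g) != organism(query)]
    return min(candidates, key=lambda g: distance(query, g))

def reciprocal_best_hits():
    """Pairs of genes that are each other's best hit."""
    pairs = set()
    for gene in tip:
        hit = best_hit(gene)
        if best_hit(hit) == gene:
            pairs.add(frozenset((gene, hit)))
    return pairs

for gene in tip:
    print(gene, "-> best hit:", best_hit(gene))
print("reciprocal pairs:", [sorted(p) for p in reciprocal_best_hits()])
```

With these assumed rates, the only reciprocal pair is Gene-B of organism 1 with Gene-A of organism 2, which are paralogs; the two fast-evolving genes each point at their true ortholog but are not chosen in return.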

Only a complete phylogenetic reconstruction using accurate methods, such as is done in the PHRINGE pipeline, can recover this history and guide the proper inference of orthology and functional assignment.
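To illustrate why a tree helps, here is a minimal sketch of one simple way to read orthology off a gene tree once it has been reconstructed and rooted: a node whose two subtrees share an organism is treated as a duplication, otherwise as a speciation. This species-overlap rule, the nested-tuple tree encoding, and the gene labels are assumptions made for the example; it is not a description of how the PHRINGE pipeline works internally.

```python
# Hypothetical rooted gene tree as nested 2-tuples; leaves are "organism_gene".
tree = (("org1_GeneA", "org2_GeneA"), ("org1_GeneB", "org2_GeneB"))

def organism(leaf):
    return leaf.split("_")[0]

def leaves(node):
    """All leaf names under a node."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)

def classify_pairs(root):
    """Split gene pairs into orthologs and paralogs by the species-overlap rule."""
    orthologs, paralogs = set(), set()

    def visit(node):
        if isinstance(node, str):
            return
        left, right = node
        left_leaves, right_leaves = leaves(left), leaves(right)
        shared = {organism(x) for x in left_leaves} & {organism(y) for y in right_leaves}
        # Shared organisms on both sides imply the node is a duplication.
        target = paralogs if shared else orthologs
        for x in left_leaves:
            for y in right_leaves:
                target.add(frozenset((x, y)))
        visit(left)
        visit(right)

    visit(root)
    return orthologs, paralogs

orthologs, paralogs = classify_pairs(tree)
print("orthologs:", sorted(sorted(p) for p in orthologs))
print("paralogs: ", sorted(sorted(p) for p in paralogs))
```

On this tree the two Gene-A copies and the two Gene-B copies come out as orthologs regardless of how fast each copy evolved, and every Gene-A/Gene-B pair is labeled a paralog. The result is only as good as the tree it is given, which is why the accuracy of the reconstruction matters.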
