TreeFam
Tree families database The Sanger Institute Beijing Genomics Institute

Introduction

TreeFam (Tree families database) is a database of phylogenetic trees of animal genes. It aims to develop a curated resource that gives reliable information about ortholog and paralog assignments and about the evolutionary history of gene families.

TreeFam defines a gene family as a group of genes that evolved after the speciation of single-metazoan animals. It also tries to include outgroup genes from yeast (S. cerevisiae and S. pombe) and plant (A. thaliana) to reveal these distant members.

TreeFam is also an ortholog database. Unlike ortholog databases based on pairwise alignments, TreeFam infers orthologs by means of gene trees. It fits a gene tree into the universal species tree and finds historical duplication, speciation and loss events. TreeFam uses this information to evaluate tree building, guide manual curation, and infer complex ortholog and paralog relations.
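To illustrate what fitting a gene tree into a species tree means, here is a minimal sketch of the classic least-common-ancestor reconciliation rule. It is not the treeBeST implementation: the nested-tuple tree format, the "<gene>_<species>" leaf naming and all function names are assumptions made for this example, and loss events are not counted.

```python
# Minimal sketch of LCA-based gene tree / species tree reconciliation.
# Assumptions (not from TreeFam): trees are nested tuples, leaves are strings,
# and gene leaves are named "<gene>_<species>".

def leaves(tree):
    """Return the leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

def species_of(gene_leaf):
    """Extract the species from a gene leaf such as 'A1_human'."""
    return gene_leaf.rsplit("_", 1)[1]

def clades(species_tree):
    """Return one frozenset of species per node of the species tree."""
    if isinstance(species_tree, str):
        return [frozenset([species_tree])]
    result = []
    for child in species_tree:
        result.extend(clades(child))
    result.append(frozenset().union(*result))  # the clade of this node itself
    return result

def map_to_species_tree(species_set, species_clades):
    """Smallest species-tree clade containing species_set (the LCA node)."""
    return min((c for c in species_clades if species_set <= c), key=len)

def reconcile(gene_tree, species_clades, events):
    """Label each internal gene-tree node and return the clade it maps to."""
    if isinstance(gene_tree, str):
        return frozenset([species_of(gene_tree)])
    child_maps = [reconcile(child, species_clades, events) for child in gene_tree]
    mapped = map_to_species_tree(frozenset().union(*child_maps), species_clades)
    # A node that maps to the same species-tree node as one of its children
    # is inferred to be a duplication; otherwise it is a speciation.
    kind = "duplication" if mapped in child_maps else "speciation"
    events.append((sorted(leaves(gene_tree)), kind))
    return mapped

species_tree = (("human", "mouse"), "fly")
gene_tree = ((("A1_human", "A1_mouse"), ("A2_human", "A2_mouse")), "A_fly")

events = []
reconcile(gene_tree, clades(species_tree), events)
for members, kind in events:
    print(kind, members)
```

In this toy example, A1_human and A1_mouse are joined by a speciation node and are therefore orthologs, while A1_human and A2_mouse are joined by a duplication node and are paralogs.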

The basic elements of TreeFam are gene families that can be divided into two parts: TreeFam-A and TreeFam-B families. TreeFam-B families are automatically created. They might contain errors given complex phylogenies. TreeFam-A families are manually curated from TreeFam-B ones. Family names and node names are assigned at the same time. The ultimate goal of TreeFam is to present a curated resource for all the families.


Publications and Methods

TreeFam was published in Li et al., Nucleic Acids Res. 2006 Jan 1;34(Database issue):D572-80. We have also published a TreeFam update paper in Ruan et al., Nucleic Acids Res. 2008 (Database issue). Please cite these publications if you use TreeFam.

The main software used to build TreeFam, treeBeST (previously called NJTREE), is available from SourceForge at treesoft.sourceforge.net. Many of the methods used in treeBeST are described in a PhD thesis. We intend to provide further web pages describing the methods used to build TreeFam in the near future.

How to get the TreeFam data

There are several ways to access the TreeFam data:

  • Firstly, you can browse the data on the TreeFam website, by searching for your favourite genes on the TreeFam search page.
  • Secondly, we provide flat files of the data in the TreeFam MySQL database, such as sequences, alignments, trees, and orthologs and paralogs, on the TreeFam download page.
  • Thirdly, if you know Perl, you can use the TreeFam Perl API to extract data such as sequences, alignments, trees and orthologs from the TreeFam MySQL database. We provide many example Perl scripts that use the TreeFam API on the TreeFam Perl API website.
  • Fourthly, if you know MySQL, you can connect to our publicly accessible TreeFam MySQL database by using the command:
    mysql -uanonymous -P3308 -hdb.treefam.org
  • Fifthly, if you do not know Perl or MySQL but would like to extract a large data set from TreeFam for your experiments, please contact the TreeFam mailing list: treefam@sanger.ac.uk.
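If you prefer to script the MySQL connection, the command above can be assembled programmatically. The sketch below only builds the command line from the user, port and host given above; the function name is hypothetical, the query is a generic MySQL statement rather than anything TreeFam-specific, and no database name is assumed.

```python
import shlex

def treefam_mysql_command(query=None, host="db.treefam.org", port=3308, user="anonymous"):
    """Build the argument list for the mysql client shown above.

    If a query is given, it is passed with the client's -e (execute) option.
    """
    command = ["mysql", f"-u{user}", f"-P{port}", f"-h{host}"]
    if query is not None:
        command += ["-e", query]
    return command

command = treefam_mysql_command("SHOW DATABASES")
# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run on a machine where the mysql client is installed and the server is reachable.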
Need more detailed instructions?

Contact

If you need help, or have a question about TreeFam, please feel welcome to contact us at the email address: treefam@sanger.ac.uk.

News

TreeFam 7.0 Released - 2009.02.10

TreeFam 7.0 is released today. Gene sets were updated as usual. TreeFam-7 is mainly based on Ensembl v50. 17 species were added in this new release. They are: Cavia porcellus (guinea pig), Oryctolagus cuniculus (rabbit), Equus caballus (horse), Dasypus novemcinctus (armadillo), Sorex araneus (Eurasian shrew), Ochotona princeps (American pika), Otolemur garnettii (galago), Erinaceus europaeus (hedgehog), Microcebus murinus (mouse lemur), Loxodonta africana (elephant), Echinops telfairi (tenrec), Felis catus (cat), Myotis lucifugus (bat), Pongo pygmaeus (orangutan), Spermophilus tridecemlineatus (squirrel), Bombyx mori (silkworm), and Tupaia belangeri (common tree shrew). Gene sets of all species were updated in Sep. 2008. There are now 777,321 genes in total in 16,141 different TreeFam families.

TreeFam 6.0 Released - 2008.06.06

The main gene set sources for release 6.0 came from Ensembl v47. The total number of genes in families comes to 697,829.

We introduced a new method to integrate Family A/B with Family C. By comparing gene overlaps between families from the two sets, we increased the threshold step by step to merge families or delete weak connections. As a result, Family C has disappeared as of this release.

TreeFam 5.0 Released - 2007.12.04

TreeFam 5.0 was released. Gene sets were updated. TreeFam-5 was mainly based on Ensembl v42. We included many low-coverage genomes from Ensembl v42: Dasypus novemcinctus, Echinops telfairi, Erinaceus europaeus, Felis catus, Myotis lucifugus, Loxodonta africana, Cavia porcellus, Oryctolagus cuniculus, Otolemur garnettii, Spermophilus tridecemlineatus and Tupaia belangeri. We also added Dictyostelium discoideum from dictyBase and ten flies from FlyBase.

We promoted about 1000 C families to B families because they were gene families new to family A/B. To avoid confusion between family C and family A/B, we separated family C from family A/B in the list of search results.

We now provide HMMER files for family A/B. They can be found in the "Plain File" list on the left side of the family page, and the TreeFam Perl API supports them too.

TreeFam 4.0 Released - 2007.03.07

TreeFam 4.0 is released today. In this new release, we introduce clustering-based families (TF5 families) to give more complete coverage of all annotated genes. Previously, building automated TreeFam families always started from the original PhIGs clusters. However, as the number of fully sequenced species is growing rapidly and gene annotations are becoming more accurate over the years, sticking to old clusters made TreeFam miss many genes. To increase the coverage of all annotated genes, we decided to do clustering for each new release. The resultant clusters become TF5 families. Consequently, each TreeFam gene is classified in two ways: by the conventional competitive method used in TreeFam-2 and by the new clustering method. Searching for one gene usually leads to two results, representing the two classification methods.

Gene sets were updated as usual. TreeFam-4 is mainly based on Ensembl v41. Four species were added in this new release: Ciona savignyi, Gasterosteus aculeatus, Oryzias latipes and Aedes aegypti. Apis mellifera genes have been dropped because Ensembl no longer provides the annotations. Gene sets of all the other species were also updated in October 2006.

TreeSoft Project Launched - 2006.11.01

TreeSoft was registered at SourceForge.net. TreeSoft is a collection of software that builds, displays or manipulates phylogenetic trees. It is also the code base for software developed for TreeFam (Tree families database). At the same time, TreeSoft provides brief introductions and links to other software, databases and web services for phylogenetic trees. TreeSoft is an open source project hosted by SourceForge.net. The SourceForge.net project page is at http://sourceforge.net/projects/treesoft/. TreeSoft provides downloads and documentation for most of the source code developed for TreeFam.

HGNC Links to TreeFam - 2006.10.24

The HUGO Gene Nomenclature Committee (HGNC) has started to provide cross-reference links to TreeFam. These links are available in both gene pages and HCOP (HGNC Comparison of Orthology Predictions) pages. Examples are provided here and also here.

Search for External Accessions - 2006.10.22

The search page has been updated to support searching by external accessions from GenBank, UniProt, PDB, and even Pfam, GO and so on. The cross-reference table was imported from Ensembl. Although the earlier version also supported this function, the new one is more flexible now that the Xref table has become part of TreeFam MySQL.

Links to TreeFam pages by cross-reference have been updated accordingly. Now people can link to TreeFam family pages in a new way, for example:

For a complete list of dbid values, please refer to this page. Detailed information such as dbid and spec should be supplied whenever possible. A single xref, especially an integer accession, may exist in several databases. In this case, only one result is shown.

TreeFam 3.0 Released - 2006.06.26

It has been over half a year since the last release. Although TreeFam 3.0 looks much like TreeFam 2.0, we have added a number of new features that may interest you. During this period, we stabilized the automatic pipeline, which will make it possible to update TreeFam more swiftly. We have also brought back the ortholog table that was missing in 2.0. Compared with the old ortholog table of TreeFam 1.0, the new version is more complete and much more accurate because it uses more sophisticated algorithms. Other notable new features or improvements are:

Link to TreeFam Pages - 2006.06.02

Various TreeFam pages can now be accessed by providing TreeFam gene identifiers or external gene accessions that are stored by other databases such as HGNC, MGI and GenBank. The following are some examples. Details are provided here.

TreeFam 2.0 Released - 2005.12.30

TreeFam 2.0 comes as a new year's present. Several essential improvements were developed in this new release: pipelines rewritten, bugs fixed, more species added, new features introduced, and web pages updated accordingly. Notable improvements are:

  • Data Sets:
  • Pipelines:
    • Competitive method. In TreeFam 2.0, each sequence is assigned to the single family that gives the sequence the highest HMMER score. Overlapping families, which were the main problem with TreeFam 1.0, no longer cause trouble.
    • Clean tree. A clean tree was built by merging several trees together: the Phyml-AA-WAG, Phyml-NT-HKY, NJ-dS and NJ-dN trees. Our preliminary tests suggest this is the most accurate automatic method for building trees that we have tried.
  • Web Pages:
    • Alignment View was added to the family page. Pfam domains and splicing sites are visualized in a mapped picture.
    • A sidebar was introduced. The look and feel was improved.
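The competitive method listed under Pipelines can be pictured with a short sketch. It is only an illustration of the "highest score wins" rule, not the TreeFam pipeline: the function name, the sequence and family identifiers and the scores are all made up, and the scores are assumed to have been computed already by HMMER.

```python
def assign_competitively(scores):
    """Assign each sequence to the one family with its highest score.

    scores maps a sequence ID to a dict of {family ID: HMMER score}.
    Sequences with no scores are left unassigned. On a tie, the family
    listed first wins (an arbitrary choice made for this sketch).
    """
    return {
        sequence: max(family_scores, key=family_scores.get)
        for sequence, family_scores in scores.items()
        if family_scores
    }

# Hypothetical identifiers and scores, for illustration only.
scores = {
    "seq1": {"familyA": 310.5, "familyB": 120.0},
    "seq2": {"familyA": 80.2, "familyB": 95.7},
    "seq3": {},
}

assignment = assign_competitively(scores)
print(assignment)
```

Because every sequence ends up in at most one family, families built this way cannot overlap.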
At present, TreeFam 2.0 has not been completely finalized. Because we hope users can try the new features after reading our paper published today, we have brought v2.0 out in a hurry. We apologize for the inconvenience and will update the remaining parts in the next few days. In the meantime, the older release v1.x is still temporarily available at http://platform.humgen.au.dk:8080/, hosted by the Institute of Human Genetics of Aarhus University.
Last Modified Thu Dec 06 09:48:01 2007 treefam@sanger.ac.uk