|
TreeFam (Tree families database) is a database of phylogenetic trees
of animal genes. It aims to develop a curated resource that gives reliable
information about ortholog and paralog assignments and the
evolutionary history of various gene families.
TreeFam defines a gene family as a group of genes that evolved after
the speciation of single-metazoan animals. It also tries to include
outgroup genes from yeast (S. cerevisiae and S. pombe) and plant
(A. thaliana) to reveal distant family members.
TreeFam is also an ortholog database. Unlike ortholog databases based on pairwise alignments,
TreeFam infers orthologs by means of gene trees. It fits a gene tree into
the universal species tree and finds historical duplication, speciation
and loss events. TreeFam uses this information to evaluate tree building,
guide manual curation, and infer complex ortholog and paralog relations.
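The tree-fitting step described above can be illustrated with a minimal sketch of LCA-based reconciliation. This is not treeBeST's implementation; it assumes trees are written as nested tuples, that gene leaves follow a hypothetical "<gene>_<species>" naming convention, and it only labels duplications and speciations (losses are not counted).

```python
def leaf_sets(tree):
    """Return (leaves under tree, list of leaf sets for every node in tree)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves, clades = frozenset(), []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def lca(species_clades, species):
    """Smallest species-tree clade that contains every species in the set."""
    return min((c for c in species_clades if species <= c), key=len)


def reconcile(gene_tree, species_clades, events):
    """Map each gene-tree node to a species-tree clade and record its event."""
    if isinstance(gene_tree, str):
        # Assumed naming convention: "<gene>_<species>".
        return frozenset([gene_tree.rsplit("_", 1)[1]])
    child_maps = [reconcile(child, species_clades, events) for child in gene_tree]
    here = lca(species_clades, frozenset().union(*child_maps))
    # A node mapped to the same clade as one of its children is a duplication.
    event = "duplication" if here in child_maps else "speciation"
    events.append((sorted(here), event))
    return here


def infer_events(gene_tree, species_tree):
    """Return a post-order list of (species clade, event) for internal gene nodes."""
    _, species_clades = leaf_sets(species_tree)
    events = []
    reconcile(gene_tree, species_clades, events)
    return events


if __name__ == "__main__":
    species_tree = (("human", "mouse"), "fly")
    gene_tree = ((("A_human", "A_mouse"), ("B_human", "B_mouse")), "C_fly")
    for clade, event in infer_events(gene_tree, species_tree):
        print(clade, event)
```

In this toy example the node joining the A and B subtrees maps to the same human/mouse clade as both of its children, so it is labelled a duplication. Genes separated by it (such as A_human and B_mouse) would be paralogs, while A_human and A_mouse, separated by a speciation, would be orthologs.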
The basic elements of TreeFam are gene families, which are divided into
two types: TreeFam-A and TreeFam-B families. TreeFam-B families
are automatically created. They may contain errors when
phylogenies are complex. TreeFam-A families are manually curated from TreeFam-B ones.
Family names and node names are assigned at the same time.
The ultimate goal of TreeFam is to present a curated resource for
all the families.
TreeFam was published in Li et al, Nucleic Acids Res. 2006 Jan 1;34(Database issue):D572-80.
We have also published a TreeFam Update paper in Ruan et al, Nucleic Acids Res.
2008 (Database issue). Please cite these publications if you use TreeFam.
The main software used to build TreeFam, treeBeST (previously called NJTREE), is available from SourceForge
at treesoft.sourceforge.net.
Many of the methods used in treeBeST are described in a PhD thesis.
We intend to provide further web pages describing the
methods used to build TreeFam in the near future.
There are several ways to access the TreeFam data:
Need more detailed instructions?
If you need help or have a question about TreeFam, please feel free to contact us
at treefam@sanger.ac.uk.
TreeFam 7.0 is released today.
Gene sets were updated as usual. TreeFam-7 is mainly based
on Ensembl v50. Seventeen species were added in this new
release: Cavia porcellus (guinea pig), Oryctolagus
cuniculus (rabbit), Equus caballus (horse), Dasypus
novemcinctus (armadillo), Sorex araneus (Eurasian shrew),
Ochotona princeps (American pika), Otolemur garnettii
(galago), Erinaceus europaeus (hedgehog), Microcebus murinus
(mouse lemur), Loxodonta africana (elephant), Echinops telfairi
(tenrec), Felis catus (cat), Myotis lucifugus (bat), Pongo
pygmaeus (orangutan), Spermophilus tridecemlineatus (squirrel),
Bombyx mori (silkworm), and Tupaia belangeri (common tree
shrew). Gene sets of all species were updated in Sep. 2008.
There are now 777,321 genes in total in 16,141 different
TreeFam families.
The main gene set sources for release 6.0 came from Ensembl v47, and the total number of genes in families came to 697,829.
We introduced a new method to integrate families A/B with family C. By comparing gene overlaps between families from the two sets,
we increased the threshold step by step to merge families or delete weak connections. As a result, family C has disappeared as of this release.
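The overlap-based integration described above can be sketched roughly as follows. The release note does not specify the overlap measure or the threshold schedule, so both are assumptions here: overlap is taken as the shared genes divided by the size of the smaller family, and families linked at or above the threshold are merged with a union-find structure, while weaker links are ignored. All function names are hypothetical.

```python
def overlap(a, b):
    """Assumed measure: shared genes as a fraction of the smaller family."""
    return len(a & b) / min(len(a), len(b))


def merge_families(families_ab, families_c, threshold):
    """Merge A/B and C families whose overlap is at least `threshold`.

    Families are non-empty sets of gene identifiers. Links weaker than the
    threshold are dropped, so the families they connect stay separate.
    """
    families = list(families_ab) + list(families_c)
    parent = list(range(len(families)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    n_ab = len(families_ab)
    for i in range(n_ab):
        for j in range(n_ab, len(families)):
            if overlap(families[i], families[j]) >= threshold:
                parent[find(j)] = find(i)

    merged = {}
    for i, genes in enumerate(families):
        merged.setdefault(find(i), set()).update(genes)
    return list(merged.values())


if __name__ == "__main__":
    ab = [{"g1", "g2", "g3"}, {"g7", "g8"}]
    c = [{"g2", "g3", "g4"}, {"g8", "g9", "g10", "g11"}]
    # Raising the threshold step by step keeps fewer connections.
    for threshold in (0.5, 0.6, 0.7):
        print(threshold, len(merge_families(ab, c, threshold)))
```

Running the function at increasing thresholds shows how weak connections fall away: a link that justifies a merge at a permissive threshold is discarded at a stricter one.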
TreeFam 5.0 was released. Gene sets were updated.
TreeFam-5 was mainly based on Ensembl v42. We included many low-coverage genomes from Ensembl v42:
Dasypus novemcinctus, Echinops telfairi, Erinaceus europaeus, Felis catus, Myotis lucifugus,
Loxodonta africana, Cavia porcellus, Oryctolagus cuniculus, Otolemur garnettii,
Spermophilus tridecemlineatus and Tupaia belangeri. We also added
Dictyostelium discoideum from dictyBase and ten fly species from FlyBase.
We promoted about 1000 C families to family B because they represented gene families not yet present in families A/B.
To avoid confusion between family C and families A/B, we separated family C from families A/B in the list of search results.
We have begun to provide HMMER files for families A/B. They can be found in the "Plain File" list on the left side of the family page, and the TreeFam Perl API supports them too.
TreeFam 4.0 is released today. In this new release, we introduce clustering-based families (TF5 families)
to give more complete coverage of all annotated genes.
Previously, building automated TreeFam families always started from the original PhIGs clusters.
However, as the number of fully sequenced species grows rapidly and gene annotations become
more accurate over the years, sticking to the old clusters made TreeFam miss many genes.
To increase the coverage of all annotated genes, we decided to perform clustering for each new release.
The resultant clusters become TF5 families. Consequently, each TreeFam gene is classified in two ways:
the conventional competitive method used in TreeFam-2 and the
new clustering method.
Searching for one gene usually leads to two results, representing the two classifying methods.
Gene sets were updated as usual. TreeFam-4 is mainly based on Ensembl
v41. Four species were added in this new release. They are:
Ciona savignyi,
Gasterosteus aculeatus,
Oryzias latipes and
Aedes aegypti.
Apis mellifera genes have been dropped because Ensembl no longer provides the annotations.
Gene sets of all the other species were also updated in October, 2006.
TreeSoft was registered at SourceForge.net.
TreeSoft is a collection of software that builds, displays or manipulates phylogenetic trees. It is also
the code base for software developed for TreeFam
(Tree Families database). In addition, TreeSoft provides brief introductions
and links to other software, databases and web services for phylogenetic trees. TreeSoft is an open source project hosted by
SourceForge.net. The SourceForge.net project page is
at http://sourceforge.net/projects/treesoft/.
TreeSoft provides downloads and documentation for most of the source code developed for TreeFam.
The HUGO Gene Nomenclature Committee (HGNC)
has started to provide cross-reference links to TreeFam. These links are available on
both gene pages and HCOP
(HGNC Comparison of Orthology Predictions) pages. Examples are provided
here and also here.
The search page has been updated to support searching by external accessions from GenBank,
UniProt, PDB, Pfam, GO and other databases. The cross-reference table was imported from
Ensembl. Although earlier versions also supported this function, the new one is more flexible
now that the Xref table has become part of the TreeFam MySQL database.
Links to TreeFam pages by cross-reference have been updated accordingly. People
can now link to TreeFam family pages in a new way, for example:
For a complete list of dbid values, please refer to this page.
The detailed parameters dbid and spec should be supplied whenever possible. One xref,
especially an integer accession, may exist in several databases. In that case, only one result is shown.
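The ambiguity described above can be illustrated with a small sketch. It does not reproduce the TreeFam schema or URL scheme: the table layout, the function name and every value in it are hypothetical placeholders. Only the parameter names dbid and spec come from the text.

```python
# Hypothetical cross-reference rows: (accession, dbid, spec, family).
# Every value here is a placeholder, not real TreeFam data.
XREF_ROWS = [
    ("1017", "DB_ONE", "HUMAN", "family_1"),
    ("1017", "DB_TWO", "MOUSE", "family_2"),
    ("P00001", "DB_THREE", "HUMAN", "family_1"),
]


def find_family(accession, dbid=None, spec=None, rows=XREF_ROWS):
    """Return the family of the first row that matches the given filters."""
    for row_accession, row_dbid, row_spec, family in rows:
        if row_accession != accession:
            continue
        if dbid is not None and row_dbid != dbid:
            continue
        if spec is not None and row_spec != spec:
            continue
        # Only one result is reported, even if several rows match.
        return family
    return None


if __name__ == "__main__":
    # The bare integer accession matches two databases; only one is returned.
    print(find_family("1017"))
    # Supplying dbid (or spec) selects the intended entry.
    print(find_family("1017", dbid="DB_TWO"))
```

Without dbid or spec, the lookup silently returns whichever matching row comes first, which is why supplying both is the safer habit.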
It has been over half a year since the last release. Although TreeFam 3.0 looks
much like TreeFam 2.0, we have added a number of new features that may interest you.
During this period, we stabilized the automatic pipeline, which will make it possible
to update TreeFam more swiftly. We also brought back the ortholog table that was missing
from 2.0. Compared with the old ortholog table of TreeFam 1.0, the new
version is more complete and much more accurate thanks to more sophisticated algorithms.
Other notable new features or improvements are:
Now various TreeFam pages can be accessed by providing TreeFam gene identifiers or external
gene accessions that are stored by other databases such as HGNC,
MGI, GenBank, etc.
The following are some examples. Details are provided here.
TreeFam 2.0 comes as a new year's present. Several essential improvements were
developed in this new release: pipelines rewritten, bugs fixed, more species added,
new features introduced, and web pages updated accordingly.
Notable improvements are:
- Data Sets:
- Pipelines:
  - Competitive method. In TreeFam 2.0, each sequence is assigned to
    exactly one family: the one that gives the sequence the highest HMMER score. Overlapping
    families, which were the main problem with TreeFam 1.0, no longer cause
    trouble.
  - Clean tree. A clean tree is built by merging several trees, including the
    Phyml-AA-WAG, Phyml-NT-HKY, NJ-dS and NJ-dN trees.
    Our preliminary tests suggest this is the most accurate automatic method for building trees that we have tried.
- Web Pages:
  - Alignment View was added to the family page.
    Pfam domains and splice sites are visualized in a mapped picture.
  - A sidebar was introduced, and the look and feel was improved.
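The competitive method from the Pipelines item above amounts to a best-score assignment. The following is a minimal sketch, not the TreeFam pipeline code: it assumes the HMMER scores have already been computed and are available as (sequence, family, score) triples, and the identifiers are hypothetical.

```python
def assign_competitively(scores):
    """Assign each sequence to the single family with its highest score.

    `scores` is an iterable of (sequence_id, family_id, score) triples.
    On a tie, the family seen first is kept (an arbitrary choice made here).
    """
    best = {}
    for sequence_id, family_id, score in scores:
        if sequence_id not in best or score > best[sequence_id][1]:
            best[sequence_id] = (family_id, score)
    return {seq: family for seq, (family, _) in best.items()}


if __name__ == "__main__":
    # Hypothetical scores: seq1 matches two overlapping families.
    scores = [
        ("seq1", "famA", 120.5),
        ("seq1", "famB", 310.0),
        ("seq2", "famA", 95.2),
    ]
    print(assign_competitively(scores))
```

Because every sequence ends up in exactly one family, two families can no longer share members, which is how this approach sidesteps the overlapping-family problem.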
At present, TreeFam 2.0 has not been completely finalized. Because we hope users can try
the new features after they read
our paper published today,
we have brought v2.0 out in a hurry. We apologize for the
inconvenience and will update the remaining parts in the next few days. In the meantime,
the older release v1.x is still temporarily available at
http://platform.humgen.au.dk:8080/,
hosted by the Institute of Human Genetics of Aarhus University.
|