The TreeCmp application was designed to compute distances between arbitrary (not necessarily binary) phylogenetic trees. It offers implementations of metrics that allow trees with a large number of leaves to be compared.
In the first step, select one of the four available comparison modes (Overlapping pair, Window, Matrix (default), Reference trees to all input trees), as shown in Fig. 1. In Window mode, you have to enter the window width in the additional field, as shown in Fig. 2.
Fig. 1
Fig. 2
In each of those available modes, different trees are compared:
Fig. 3
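To make the difference between the modes concrete, the following minimal Python sketch lists which tree pairs each mode would compare. It is an illustration only, not TreeCmp code. The function names are hypothetical, and the pairing rules are assumptions read from the mode names: neighbouring trees for Overlapping pair, every pair inside consecutive non-overlapping windows for Window, every pair for Matrix, and each reference tree against each input tree for Reference trees to all input trees.

```python
from itertools import combinations


def overlapping_pairs(n):
    """Neighbouring trees: (1, 2), (2, 3), ... (assumed rule)."""
    return [(i, i + 1) for i in range(1, n)]


def window_pairs(n, width):
    """Every pair inside consecutive, non-overlapping windows (assumed rule)."""
    pairs = []
    for start in range(1, n + 1, width):
        window = range(start, min(start + width, n + 1))
        pairs.extend(combinations(window, 2))
    return pairs


def matrix_pairs(n):
    """Every pair of input trees."""
    return list(combinations(range(1, n + 1), 2))


def ref_to_all_pairs(n_refs, n_inputs):
    """Each reference tree paired with each input tree."""
    return [(r, t) for r in range(1, n_refs + 1) for t in range(1, n_inputs + 1)]
```

Under these assumptions, ten input trees with a window width of 2 give five pairs, while Matrix mode on the same input gives 45.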
The TreeCmp software was designed to support NEXUS tree specification data files (BEAST and MrBayes), in which phylogenetic trees are stored in the Newick format.
Note that plain text files containing only trees in this format are supported as well.
In the next step, enter the trees to be compared. This can be done in a few ways, as shown in Fig. 4. You can:
Fig. 4
In the next step, select at least one of the available phylogenetic metrics. Metrics are available for rooted and for unrooted trees, and you can select several of them at once, as shown in Fig. 5, or even all of them.
Fig. 5
Additional options are available (Fig. 6):
In the last step, click the Compute button to generate the report (Fig. 6).
Fig. 6
After the calculations are completed, the results appear in a new tab, as shown in Fig. 7. Each row (excluding the header) contains the calculated metrics for one pair of compared trees. The first three columns contain the ordinal number and the numbers of the compared trees (Fig. 7A). The subsequent columns contain the metrics selected in the previous step (Fig. 7B).
Each pair of compared trees can be drawn by clicking the corresponding row. The visualized trees are displayed in a new pop-up window using the Phylo.io application. Detailed information on manipulating the displayed trees is given in the manual available in that window.
When the Normalized distances option is used, two additional columns per chosen metric (Fig. 7C) appear in the output file. These columns contain the value of the distance in a particular metric divided by its empirical average value. If the number of common leaves in the compared trees is outside the supported range (from 4 to 1000), the value “N/A” is inserted. For details regarding generating phylogenetic trees under the Yule and uniform models see (McKenzie and Steel 2000; Semple and Steel 2003).
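The normalization rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not TreeCmp's implementation: the function name and its parameters are hypothetical, and the empirical average is passed in as a plain number because the pre-computed averages themselves are not reproduced here.

```python
def normalized_distance(distance, empirical_average, common_leaves):
    """Divide a distance by its empirical average, or return "N/A".

    Only the supported range of common leaves (4 to 1000) comes from the
    text above; the argument names and the call shape are assumptions.
    """
    if not 4 <= common_leaves <= 1000:
        return "N/A"
    return distance / empirical_average
```

A value well below 1 then indicates that the two trees are closer to each other than random trees of the same size typically are.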
By default, the data are sorted in ascending order by the first column, which contains the ordinal numbers. Both the sort column and the order (ascending or descending) can be changed by clicking the appropriate column header. To show only the rows that contain a certain phrase, use the search control (Fig. 7D).
The displayed data (excluding filtered-out rows) can be copied to the system clipboard, saved as CSV, Excel, or PDF, or printed using the appropriate button (Fig. 7E).
Fig. 7
The following table maps the column names used in the report to the full names of the available metrics.
Metric name in the output file | Full metric name |
---|---|
MatchingSplit | Matching Split distance |
RF | Robinson-Foulds distance |
PathDiffernce | Path difference distance |
Quartet | Quartet distance |
UMAST | Unrooted maximum agreement subtree distance |
RFWeighted | Weighted Robinson-Foulds distance |
GeoUnrooted | Geodesic Unrooted distance |
MatchingCluster | Matching Cluster metric |
RF_Cluster | Robinson-Foulds metric based on clusters |
NodalSplitted | Nodal Splitted metric with L2 norm |
Triples | Triples metric |
MatchingPair | Matching Pair metric |
MAST | Rooted maximum agreement subtree metric |
CopheneticL2Metric | Cophenetic Metric with L2 norm |
RFClusterWeighted | Weighted Robinson-Foulds metric based on clusters |
NodalSplittedWeighted | Weighted Nodal Splitted metric with L2 norm |
GeoRooted | Geodesic Rooted metric |
CopheneticL2WeightedMetric | Weighted Cophenetic Metric with L2 norm |
Using the Include summary option adds an additional section to the report with the following format (Fig. 7F):
Name | Avg | Std | Min | Max | Count |
---|---|---|---|---|---|
Metric name 1 | Average value | Standard deviation value | Minimal value | Maximal value | Number of analyzed values |
Metric name 2 | … | … | … | … | … |
… | … | … | … | … | … |
Metric name n | … | … | … | … | … |
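For readers who want to reproduce such a summary row outside TreeCmp, here is a minimal Python sketch. The function name is hypothetical, and the use of the population standard deviation is an assumption: the text above does not say whether TreeCmp reports the population or the sample standard deviation.

```python
import statistics


def summarize(values):
    """Build one summary row (Avg, Std, Min, Max, Count) for a metric column."""
    return {
        "Avg": statistics.mean(values),
        # Assumption: population standard deviation. Swap in
        # statistics.stdev for the sample version if that matches the report.
        "Std": statistics.pstdev(values),
        "Min": min(values),
        "Max": max(values),
        "Count": len(values),
    }
```

Comparing the result for one metric column with the Std value in the report shows which of the two definitions applies.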
When the Prune trees option is used, the following three columns additionally appear in the report (not used in our example).
Tree1_taxa | Tree2_taxa (or RefTree_taxa) | Common_taxa |
---|---|---|
Number of taxa in the first tree | Number of taxa in the second (or reference) tree | Number of taxa in common |
By clicking the Raw report button (Fig. 7G), you can save the report as a text file. This is a tab-separated text file (TSV), which means that it can be easily read by various data analysis software (e.g. MS Excel, R, OpenOffice.org). A report file consists of two sections (Fig. 8). The first section contains, in rows, the values of the distances in the selected metrics. The second (optional) section contains summary data computed from all rows that appear in the first section.
Fig. 8
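As an illustration of how the first section of such a tab-separated report might be read programmatically, here is a minimal Python sketch using only the standard library. The sample text is made up for the example: the column names No, Tree1 and Tree2 are assumptions, MatchingSplit is taken from the metric table above, and the report is assumed to have been generated without the summary section.

```python
import csv
import io

# Hypothetical report content; a real report would be read from a file.
SAMPLE_REPORT = (
    "No\tTree1\tTree2\tMatchingSplit\n"
    "1\t1\t2\t4.0\n"
    "2\t3\t4\t0.0\n"
)


def read_distance_rows(text):
    """Parse tab-separated report text into a list of dicts keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


rows = read_distance_rows(SAMPLE_REPORT)
# All parsed values are strings, so convert the metric column explicitly.
distances = [float(row["MatchingSplit"]) for row in rows]
```

For a report saved on disk, the same function can be applied to the text returned by reading the file.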
Select Window comparison and set the window width to 2, as shown in Fig. 9a.
Fig. 9a
Enter the following NEXUS-format content (File. 1) into the main Newick trees window (Fig. 9b). This is the first of the ways of entering trees described in the Compared trees section.
File. 1 (testBSP.newick)
Fig. 9b
Choose Matching Split distance from the available unrooted metrics (Fig. 9c) and the Include summary option from Other options (Fig. 9d).
Fig. 9c
Fig. 9d
As a result (Fig. 9e), we obtain 5 comparisons and their summary.
Fig. 9e
These calculations can be saved to the report.txt file (Fig. 9f) by clicking the Save report button.
Fig. 9f
Reporting distances divided by pre-computed empirical average values for random trees (generated according to the Yule and uniform models; Normalized distances option) can help in interpreting the similarity level of the analyzed trees in the chosen metric. In the following example, the distance in the Matching Split (MS) metric of each tree from a given set (File. 2) to the reference tree (File. 3) is computed. The analyzed trees have 15 leaves.
File. 2 (test_set.trees)
File. 3 (testBSP.newick)
Select Ref-to-all comparison and enter the trees from the given set (File. 2) and the reference tree (File. 3) into the main Newick trees window, as shown in Fig. 10a.
Fig. 10a
Choose Matching Split distance from the available unrooted metrics and the Normalized distances option from Other options, as shown in Fig. 10b.
Fig. 10b
As a result (Fig. 10c), we obtain 12 comparisons without a summary.
Fig. 10c
Basic interpretation of the result.txt file (Fig. 10d):
Fig. 10d
The most convenient comparison mode for this purpose is the Matrix (default) mode. In the following example (File. 4), the Matching Split distance is used.
File. 4 (plain2.trees)
Trees number 2, i.e. (a,b,(c,(d,e))), and 3, i.e. (((a,b),c),d,e), in the input are the most similar. In fact, they have the same topology (the trees are treated as unrooted because a metric for unrooted trees is used), since their distance is 0, as shown in Fig. 11 (report.txt file).
Fig. 11
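That the two trees above share one unrooted topology can be checked independently by comparing their sets of non-trivial splits (bipartitions of the leaf set induced by internal edges). The following Python sketch does this for topology-only Newick strings. It is not TreeCmp's implementation and does not compute the Matching Split distance; it counts differing splits, as the Robinson-Foulds distance does. All function names are hypothetical, and the small parser assumes plain leaf labels with no branch lengths.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested lists of leaf names."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
            return children
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return parse_node()


def leaf_set(node):
    """Collect all leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def nontrivial_splits(newick):
    """Return the set of splits with at least two leaves on each side."""
    tree = parse_newick(newick)
    all_leaves = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        cluster = leaf_set(node)
        rest = all_leaves - cluster
        if len(cluster) > 1 and len(rest) > 1:
            # A split is unordered, so store both sides as a frozenset pair.
            splits.add(frozenset([cluster, rest]))
        for child in node:
            visit(child)

    visit(tree)
    return splits


tree2 = nontrivial_splits("(a,b,(c,(d,e)))")
tree3 = nontrivial_splits("(((a,b),c),d,e)")
differing_splits = len(tree2 ^ tree3)  # size of the symmetric difference
```

Both trees yield the splits ab|cde and abc|de, so `differing_splits` is 0, which agrees with the distance of 0 shown in the report.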
The report can be copied to the clipboard, saved in one of three formats (MS Excel, CSV, or PDF), or printed; see Fig. 12.
Fig. 12
In order to pass data to R (http://www.r-project.org/), it is convenient to have the TreeCmp result file in a simple tabular form. It is therefore recommended to avoid the Include summary option, because it generates the summary section, which disturbs the tabular layout.
Such files can be easily read into the R environment, for example with the read.table function, as follows:
treeCmpData <- read.table("C:\\Program Files\\TreeCmp\\examples\\plain\\result.txt", header = TRUE, sep = "\t")
In this example, the file to read, “result.txt”, is located in the “C:\Program Files\TreeCmp\examples\plain” folder.