Account for weighted trees in data ind
Bug #1404162 reported by
Jon Hill
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Supertree Toolkit | Confirmed | Medium | Unassigned |
Bug Description
When data are downweighted to deal with identical sources, take those weights into account when recalculating data ind.
This is actually part of a wider issue. The output from data ind is a bit confusing. For example, say tree_1 == tree_2 == tree_3 (i.e. they all have the same taxa and characters, but were constructed using different algorithms). These are non-independent and identical, but you would get:
tree_2 == tree_1
tree_3 == tree_2
in the present output. It's therefore not clear that tree_1 == tree_2 == tree_3. It also makes it harder to weight the trees automatically (i.e. you need to figure out to downweight by 1/3, not the more obvious 1/2).
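To illustrate the problem above, here is a minimal sketch of how the pairwise "a == b" lines could be merged into full groups of identical trees. It is not the toolkit's code: the function name group_identical and the representation of the pairs as tuples of tree names are assumptions made purely for illustration.

```python
def group_identical(pairs):
    """Merge pairwise 'a == b' reports into groups of mutually identical trees.

    `pairs` is a hypothetical representation of the current output:
    an iterable of (tree_name, tree_name) tuples.
    """
    parent = {}

    def find(name):
        # Follow parent links to the representative of this tree's group.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # shorten the chain as we go
            name = parent[name]
        return name

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # join the two groups

    groups = {}
    for name in parent:
        groups.setdefault(find(name), []).append(name)
    # Sort for a stable, readable result.
    return sorted(sorted(group) for group in groups.values())


pairs = [("tree_2", "tree_1"), ("tree_3", "tree_2")]
groups = group_identical(pairs)
print(groups)
```

With the two pairs from the example, all three trees end up in a single group, so the group size (3) gives the 1/3 weight directly.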
I propose altering the output of data_independence to include:
a list of lists of identical trees:
[[tree_1, tree_2, tree_3]]
a list of lists of subsets, where item 0 is the larger tree:
[[tree_1, tree_2, tree_3]]
which would mean tree_2 and tree_3 are subsets of tree_1
The step that generates the new phyml can then also down-weight the identical trees easily.
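As a rough sketch of that down-weighting, assuming the identical trees arrive as a list of lists of tree names: every tree in a group of size n gets weight 1/n, and all other trees keep weight 1. The function name weights_from_groups and its arguments are hypothetical, not part of the existing code.

```python
from fractions import Fraction


def weights_from_groups(identical_groups, all_trees):
    """Return a weight per tree: 1/n for trees in a group of n identical
    trees, 1 for every other tree. Both arguments are hypothetical inputs:
    a list of lists of tree names, and the full list of tree names."""
    weights = {name: Fraction(1) for name in all_trees}
    for group in identical_groups:
        for name in group:
            weights[name] = Fraction(1, len(group))
    return weights


identical = [["tree_1", "tree_2", "tree_3"]]
all_trees = ["tree_1", "tree_2", "tree_3", "tree_4"]
weights = weights_from_groups(identical, all_trees)
for name in all_trees:
    print(name, weights[name])
```

Fraction is used here only to keep 1/3 exact; a float would work equally well if the weights are written straight to a file.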
In future we also want to be able to select and remove trees individually (see https://bugs.launchpad.net/supertree-toolkit/+bug/1404157) but I think this might be easier with the new output format.