Miscellaneous R Phylogenetics Code
By Eliot Miller
I'm slowly learning some tricks in R, and I have written a few functions that might be of use to others. Since I'm no expert and was just trying to get the job done, the functions tend to require more packages than they probably need. Here they are if you want to play around. A link to each ".R" file follows the description of what that function does.
Phylogenetic Independent Contrasts Across a Table with Missing Data
Requires: Picante
A standard Picante trait table has species as the row names (rather than as the first column), with each column as a trait. The normal pic function can't deal with missing data, which means that for each pair of traits you need to subset the table to the species with data for both, prune the tree accordingly, and then run the PICs. This function does that automatically.
The output is a list of three matrices, each with one cell per pair of traits: the first is the r of the Pearson correlation, the second is the degrees of freedom of the PIC correlation, and the third is the p-value of the Pearson correlation. NOTE THAT THE CONTRASTS ARE SCALED BY BRANCH LENGTH. If you don't want that, you need to modify the function accordingly. If the output is too large to look at in R, save it as a table, e.g. write.table(yourResult, file="yourResult.csv", sep=","). Because the row names have no header, the column names come out shifted one column to the left in Excel; either bump the entire series of column names one column over by hand, or add col.names=NA to the write.table call, which writes a blank header cell above the row names.
R Script
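For readers who want to see the logic without Picante, here is a minimal base-R sketch of the same idea; it is not the linked script. It assumes an ape-style phylo list ($edge as parent/child pairs, $edge.length, $tip.label, tips numbered 1..n, root n + 1) that is fully bifurcating, and the names pic_na and pic_table are hypothetical. Rather than physically pruning the tree, pic_na skips tips with missing data and carries the surviving sister's branch length up to the branch above, which gives the same contrasts as dropping those tips and merging the leftover branches. Contrasts are scaled by branch length and correlated with cor.test, as described above.

```r
# Base-R sketch, not the linked script. Assumes an ape-style "phylo" list:
# $edge (parent, child), $edge.length, $tip.label; tips 1..n, root n + 1;
# fully bifurcating. Function names are hypothetical.

# Scaled contrasts for a named trait vector x. NA tips are skipped and the
# surviving sister's branch length is carried upward, which matches pruning.
pic_na <- function(x, tree) {
  n <- length(tree$tip.label)
  con <- numeric(0)
  visit <- function(node) {
    if (node <= n) return(list(val = unname(x[tree$tip.label[node]]), add = 0))
    kids <- lapply(which(tree$edge[, 1] == node), function(i) {
      k <- visit(tree$edge[i, 2])
      k$v <- tree$edge.length[i] + k$add   # effective branch length
      k
    })
    ok <- !vapply(kids, function(k) is.na(k$val), logical(1))
    if (!any(ok)) return(list(val = NA_real_, add = 0))
    if (sum(ok) == 1) {                     # only one side has data
      k <- kids[[which(ok)]]
      return(list(val = k$val, add = k$v))
    }
    a <- kids[[1]]; b <- kids[[2]]
    con <<- c(con, (a$val - b$val) / sqrt(a$v + b$v))
    list(val = (a$val / a$v + b$val / b$v) / (1 / a$v + 1 / b$v),
         add = a$v * b$v / (a$v + b$v))
  }
  visit(n + 1)
  con
}

# Pairwise Pearson correlations of contrasts; traits has species as row names.
pic_table <- function(traits, tree) {
  k <- ncol(traits)
  blank <- matrix(NA_real_, k, k,
                  dimnames = list(colnames(traits), colnames(traits)))
  r <- dof <- p <- blank
  for (i in seq_len(k)) for (j in seq_len(k)) {
    both <- !is.na(traits[, i]) & !is.na(traits[, j])
    if (i == j || sum(both) < 4) next       # cor.test needs >= 3 contrasts
    x <- setNames(ifelse(both, traits[, i], NA), rownames(traits))
    y <- setNames(ifelse(both, traits[, j], NA), rownames(traits))
    ct <- cor.test(pic_na(x, tree), pic_na(y, tree))
    r[i, j] <- ct$estimate; dof[i, j] <- ct$parameter; p[i, j] <- ct$p.value
  }
  list(r = r, dof = dof, p = p)
}
```

Each pair of traits gets its own complete-case subset, so different cells of the three matrices can rest on different numbers of species.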
Positivize x-axis for plotting PICs (sensu Garland et al. 1992)
Requires: Nothing
Let's be honest, this is kind of a lame function, but some folks like to positivize the x-axis when plotting phylogenetic independent contrasts. Because the sign of each contrast is arbitrary, this function takes two vectors (e.g. the x and the y contrasts) and, following Garland et al. (1992), flips the sign of both members of every pair in which the x contrast is negative.
R Script
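A minimal sketch of positivizing, assuming the contrasts arrive as two equal-length numeric vectors; the name positivize is hypothetical and NA contrasts are left untouched.

```r
# Hypothetical helper: wherever the x contrast is negative, flip the signs
# of both the x and the y contrast of that pair.
positivize <- function(x, y) {
  stopifnot(length(x) == length(y))
  flip <- !is.na(x) & x < 0
  x[flip] <- -x[flip]
  y[flip] <- -y[flip]
  list(x = x, y = y)
}

p <- positivize(c(-1, 2, -3), c(4, -5, 6))
p$x  # 1 2 3
p$y  # -4 -5 -6
```

Flipping both members of a pair leaves each product x*y (and each square) unchanged, so a regression through the origin on the positivized contrasts gives the same slope as on the raw ones.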
Mean Phylogenetic Distance to the Root (MRD sensu Algar et al. 2009)
Requires: Picante, Plyr
The metric MRD goes back further than Algar et al., but their paper is a good place to start if you want to learn what it does. It is the number of nodes between each species and the root of the tree, averaged across all the members of a given "community."
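To make the definition concrete, here is a base-R sketch that counts nodes to the root from an ape-style edge matrix (parent/child pairs, tips numbered 1..n). Two assumptions worth flagging: the count includes the root itself, and the community mean is not weighted by abundance. The helper names nodes_to_root and mrd are hypothetical.

```r
# Hypothetical helpers. Assumes an ape-style edge matrix (parent, child),
# tips numbered 1..n. Counts every internal node on the path to the root,
# including the root itself (an assumption about the MRD convention).
nodes_to_root <- function(tree) {
  n <- length(tree$tip.label)
  parent <- integer(max(tree$edge))       # 0 = no parent (the root)
  parent[tree$edge[, 2]] <- tree$edge[, 1]
  counts <- vapply(seq_len(n), function(tip) {
    k <- 0L
    node <- parent[tip]
    while (node != 0) {
      k <- k + 1L
      node <- parent[node]
    }
    k
  }, integer(1))
  setNames(counts, tree$tip.label)
}

# Unweighted mean over the species present in one community.
mrd <- function(species, tree) mean(nodes_to_root(tree)[species])
```

Shifting the convention (excluding the root) subtracts 1 from every species, so comparisons among communities on the same tree are unaffected.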
The function here takes a Phylocom community data matrix, in which the first column is the name of the community, the second is the abundance, and the third is the species name, and converts it into a Picante-style matrix (communities as rows, species as columns). Amy Zanne wrote the guts of this script, which I generalized and put into a function.
R Script
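The conversion step on its own can be sketched in base R with tapply, assuming the three columns are in the order described above (Picante also provides sample2matrix for this job). The name phylocom_to_matrix is hypothetical; species absent from a community get 0.

```r
# Hypothetical helper: Phylocom sample (community, abundance, species; one
# row per occurrence) -> matrix with communities as rows, species as columns.
phylocom_to_matrix <- function(samp) {
  m <- tapply(samp[[2]], list(samp[[1]], samp[[3]]), sum)
  m[is.na(m)] <- 0   # empty cells come back as NA from tapply
  m
}

samp <- data.frame(comm = c("site1", "site1", "site2"), abund = c(3, 1, 2),
                   sp = c("Aus_bus", "Cus_dus", "Aus_bus"))
phylocom_to_matrix(samp)
```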
Explore where in a phylogeny a node occurs
Requires: Ape, Plyr, Phylobase
Sometimes we work with large phylogenies with nodes specified only by a number. Certain metrics may return nodes, but drilling down into the phylogeny to understand what that node implies can be difficult. Which taxa descend from it? Which taxa are sister to it? This is meant to be an exploratory function to help with those problems.
Given a node, this function returns what descends from it and what it is sister to. The output is: 1) the names of the daughter nodes of the node in question; 2) lists of the taxa descended from each of those daughter nodes; 3) the identity of the node ancestral to the node in question; 4) a list of the taxa descended from that ancestral node but not from the node in question ("the outgroup").
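A rough base-R sketch of those four outputs, assuming an ape-style tree (edge matrix of parent/child pairs, tips numbered 1..n). The names explore_node and tips_below are hypothetical, and no genus or family collapsing is done here.

```r
# Hypothetical helper: all tip labels descended from a node.
tips_below <- function(tree, node) {
  n <- length(tree$tip.label)
  if (node <= n) return(tree$tip.label[node])
  kids <- tree$edge[tree$edge[, 1] == node, 2]
  unlist(lapply(kids, function(k) tips_below(tree, k)))
}

# Hypothetical helper: daughters, their taxa, the ancestor, and the outgroup.
explore_node <- function(tree, node) {
  daughters <- tree$edge[tree$edge[, 1] == node, 2]
  ancestor  <- tree$edge[tree$edge[, 2] == node, 1]  # empty for the root
  ingroup   <- tips_below(tree, node)
  outgroup  <- if (length(ancestor)) {
    setdiff(tips_below(tree, ancestor), ingroup)
  } else character(0)
  list(daughters = daughters,
       daughter.taxa = lapply(setNames(daughters, daughters),
                              function(d) tips_below(tree, d)),
       ancestor = ancestor,
       outgroup = outgroup)
}
```

Checking several nodes is then just a loop, e.g. lapply(c(8, 9), function(nd) explore_node(tree, nd)).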
Currently, the function can take only a single node. To check multiple nodes, put the function into a for loop, as shown in an example at the bottom of the R script. By default the returned taxa are collapsed to genus but not to family; set collapse.to.genus=FALSE if you don't want the genus collapsing. The function assumes an underscore between the genus and species names; if there isn't one, it won't be able to collapse to genus or family. To collapse to family, you need to load a csv file WITHOUT column headers (make sure to set header=FALSE when you read it in), in which the first column is the genus and the second is the family that genus belongs to. No effort has been made to deal with potential mismatches between the phylogeny and the family lookup table, so the onus is on the user to check beforehand. The combination collapse.to.genus=FALSE and collapse.to.family=TRUE is not supported; always collapse to genus if you want families returned in the end.
R Script
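The genus and family collapsing can be sketched as two small helpers, assuming "Genus_species" labels and a lookup table read with header=FALSE (genus in column 1, family in column 2). The names to_genus and to_family are hypothetical, and unlike the script described above, to_family warns about genera missing from the lookup table instead of failing silently.

```r
# Hypothetical helper: "Genus_species" -> unique genera. Labels without an
# underscore are returned unchanged, so they cannot be collapsed.
to_genus <- function(taxa) unique(sub("_.*$", "", taxa))

# Hypothetical helper: genera -> unique families via a two-column lookup,
# e.g. lookup <- read.csv("families.csv", header = FALSE)  # hypothetical file
to_family <- function(genera, lookup) {
  fam <- lookup[[2]][match(genera, lookup[[1]])]
  if (anyNA(fam)) warning("genera missing from lookup: ",
                          paste(genera[is.na(fam)], collapse = ", "))
  unique(fam)
}
```

Because to_family works on genera, it has to be fed the output of to_genus, which mirrors the rule that family collapsing requires genus collapsing.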