POY on the cluster


POY version 3.0.11 is installed on the cluster.

From the POY download site:

POY is a program for phylogenetic analysis of sequence and other data that implements a number of heuristic procedures to search for the tree or trees that have a minimum edit cost for the given data.

POY implements classic heuristic tree search strategies like branch swapping, treedrifting, treefusing, and ratcheting and heuristic procedures to optimize sequences on trees such as Direct Optimization and Fixed States. These heuristics for calculating minimal edit costs for sequences are tightly integrated with the tree search heuristics. As a byproduct of sequence optimization, POY can also output tree dependent multiple alignments of input sequences.

You will need to read the POY manual to understand how to prepare a properly formatted data set, to prepare a parameter file, and to understand how to set the various options.

To run an analysis in parallel on the cluster:

  1. Download the runjobs.poy script (right-click the link and save the file).

  2. Open the script in a text editor (TextWrangler, BBEdit, ...) and find the line containing "poy -parallel". This is the line you need to modify to run your specific analysis: add the names of your data file(s), your parameter file, and any options. Complete the line as you would if you were working directly from the command line (see the POY manual).

  3. Save the modified runjobs.poy file, making sure it has UNIX line breaks.

  4. Log in to the cluster and create a folder. Upload your data file(s), parameter file (if you have one), and the modified runjobs.poy script into this folder. These files must have UNIX line breaks.

  5. From inside that folder, make the runjobs.poy script executable by typing: "chmod 755 runjobs.poy".

  6. To begin your run, change into the directory with the newly uploaded files and type: "runjobs.poy" (or "./runjobs.poy" if the current directory is not on your PATH). You will be prompted to give your run a name, so that you can recognize it on the cluster, and to choose how many processors you want to use.

  7. When the run finishes, an email will be sent to your UMSL email account (i.e., yourgatewayid@umsl.edu). Depending on the size of your data set, the run may take some time (possibly days).

  8. The output will be written to any output file that you specified, and also to the files ending in .err and .out. These files will be in the same directory as your input data sets. To check the progress of your run, view the .err file by typing at the prompt: "cat *.err".
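
The line-break requirement in steps 3 and 4 and the progress check in step 8 can be sketched as two small shell helpers. This is only an illustration: to_unix and show_progress are hypothetical names, not part of POY or of the runjobs.poy script, and to_unix assumes the files currently have Windows (CRLF) line breaks.

```shell
#!/bin/sh
# Hypothetical helpers for the upload and monitoring steps; neither is
# part of POY or of the runjobs.poy script.

# to_unix FILE...: rewrite each file with UNIX (LF) line breaks.
# Assumes the files currently use Windows (CRLF) line breaks.
to_unix() {
    for f in "$@"; do
        # Delete every carriage return, then replace the original file.
        tr -d '\r' < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# show_progress [DIR] [N]: print the last N lines (default 20) of every
# .err file in DIR (default: the current directory).
show_progress() {
    dir="${1:-.}"
    n="${2:-20}"
    for f in "$dir"/*.err; do
        # If the glob matched nothing, the literal pattern is left as-is.
        if [ ! -e "$f" ]; then
            echo "no .err files in $dir yet"
            return 1
        fi
        echo "== $f =="
        tail -n "$n" "$f"
    done
}
```

For example, after pasting these definitions at the prompt, "to_unix runjobs.poy" converts the script in place. Run it before the chmod in step 5, because the rewritten file is created with default permissions. Typing "show_progress" inside the run directory prints the last 20 lines of each .err file, which is easier to read than "cat *.err" once the files grow long.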