the challenge: run as many phylogenetic analyses at a time as you want, without any more hardware than you have.
the strategy: use ec2 to launch and terminate virtual machines.
Amazon’s ec2 (elastic compute cloud) web service is well suited for phylogeneticists with fluctuating computational requirements. Basically, ec2 allows you to zip up a virtual hard drive, complete with an operating system, applications and data, and to run one or many copies of that virtual machine somewhere in the sprawling universe of Amazon’s computers. In ec2 parlance, you launch one or more instances of an AMI (amazon machine image). When you no longer have a need for a particular instance, you terminate it. Instances are classified into types, depending on the amount of memory and processor speed. It will cost you between 10 and 80 cents an hour for each running instance, depending on type.
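As a back-of-the-envelope check on the pricing above, the arithmetic is just instances times hours times the hourly rate. Here is a minimal sketch in shell. The function name estimate_cost_cents is made up for illustration, and the rate is the low end quoted above, not a current price.

```shell
#!/bin/sh
# Rough cost estimate: instances x hours x hourly rate (in cents).
# estimate_cost_cents is a hypothetical helper, not part of any Amazon tool.
estimate_cost_cents() {
    instances=$1
    hours=$2
    cents_per_hour=$3
    echo $(( instances * hours * cents_per_hour ))
}

# Example: 4 instances at 10 cents an hour, each running for 24 hours.
estimate_cost_cents 4 24 10    # prints 960 (cents), i.e. $9.60
```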
Amazon provides a set of command line tools written in java to work with ec2. Amazon also provides good documentation for how to use these tools: a getting started guide, and something more comprehensive.
If working with the command line is not your thing, there is a Firefox plugin, Elasticfox. With Elasticfox you can keep your hands off the keyboard - at least for a little longer.
What I thought I would do here is focus on the specific task of using Elasticfox to set up an AMI instance to run a MrBayes analysis.
ec2 with Elasticfox
First follow the instructions here to sign up for Amazon’s s3 (simple storage service) and ec2 (elastic compute cloud). You’ll receive an access key id which points to a secret access key. You’ll also get an X.509 certificate and private key which you must place in a hidden directory (starts with a ‘.’) within your home directory. All of this is to make it hard for some villain to rip you off.
Once that is out of the way, get Elasticfox here.
To open Elasticfox, go to Tools->Elasticfox
Configure
Click on the credentials button in the upper left. Fill in your access key id and secret access key.
Then click on the KeyPairs tab and press the ‘Create a new key pair’ button - the green one in the upper left. You’ll need to give your key a name. Press OK and it will squirt a .pem file onto your desktop.
If you click on the Tools button in the upper right, a dialog will appear with a ‘SSH key template’ field. The path in this field is where Elasticfox will look for the private key you just generated. By default it will look in a directory called ec2-keys under your home directory. You can change this if you want.
You need to do a couple of things to your key file in addition to placing it in the directory specified in the Tools dialog. One, you need to prepend ‘id_’ to the name and lose the .pem extension (e.g. sledghammer.pem becomes id_sledghammer). Two, you need to restrict the permissions on the file. If you are on a Mac or Linux, open Terminal, cd into the ec2-keys directory and use the command chmod:
chmod 400 <id_yourkey>
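If you would rather do the copy, rename and chmod in one go, something like the following should work. This is only a sketch. The function name install_key is made up, and it assumes the default ec2-keys directory unless you pass another one.

```shell
#!/bin/sh
# Hypothetical helper: copy a downloaded .pem key into the key directory,
# rename it to id_<name> (no extension), and make it readable by the owner only.
install_key() {
    pem_file=$1                      # e.g. ~/Desktop/sledghammer.pem
    key_dir=${2:-"$HOME/ec2-keys"}   # default directory mentioned in the text
    name=$(basename "$pem_file" .pem)
    mkdir -p "$key_dir" || return 1
    cp "$pem_file" "$key_dir/id_$name" || return 1
    chmod 400 "$key_dir/id_$name" || return 1
    echo "$key_dir/id_$name"         # print where the key ended up
}

# Usage (the path is an example only):
#   install_key ~/Desktop/sledghammer.pem
```

The original .pem stays on your desktop, so delete it yourself once you are sure the copy works.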
Launch AMI
Click the AMIs and Instances tab. Hit the Refresh button (blue) in the upper left. You should see a long list of public AMIs. You can launch one or many instances of any of them. These ready-made machine images are diverse, and there is probably something close to what you want - so long as you don’t want anything other than an open-source operating system. You can run a remote desktop, and you might want to do so just for the thrill, but desktop images are much larger than base installs and will take much longer to load. All we want to do is run a MrBayes analysis, so we’ll go fast and light with a base install of Ubuntu.
Type ‘ubuntu’ in the search field on the right side of the tool bar. This will filter the list of public AMIs. Find ami-179e7a7e, a base install of Ubuntu Hardy Heron. Right click on it (or Ctrl-click on a Mac).
A menu will appear with a ‘Launch instance(s) of this AMI’ option. Take it.
A complicated dialog window will follow. You can ignore most of it, but you need to specify a KeyPair. You may also want to choose another instance type - if you want to buy more processor speed or memory (on a medium sized instance I can run a MrBayes analysis about twice as fast as I can on my MacBook Pro).
Press OK and a row will appear in the ‘your instances’ area at the bottom of the window. The initial value in the State field will be ‘pending’. After a few moments hit the refresh button. Repeat until the state is ‘running.’
Running. Your virtual machine has been created. It’s in the cloud… somewhere. Now you have to access it with ssh (secure shell). Right click on the instance and select ‘SSH onto public DNS name.’ If everything is configured properly, you should see an RSA key fingerprint and be asked if you are sure you want to connect. Say yes and you’re in! But in what?
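If you ever need to connect without Elasticfox, the menu item above amounts, roughly, to an ordinary ssh call with your key file. Here is a small sketch that just prints that command. It assumes the key lives in the default ec2-keys directory and that you log in as root. The function name and the host name are placeholders, not anything Amazon or Elasticfox provides.

```shell
#!/bin/sh
# Hypothetical helper: print the ssh command for a given key name and host.
# Assumes the key was saved as ~/ec2-keys/id_<name>, as described earlier.
ec2_ssh_command() {
    key_name=$1      # the key pair name you chose in Elasticfox
    public_dns=$2    # the instance's public DNS name, as shown in Elasticfox
    echo "ssh -i $HOME/ec2-keys/id_$key_name root@$public_dns"
}

# Example with a placeholder host name (not a real instance):
ec2_ssh_command sledghammer ec2-host.example.com
```

Copy the printed line into a terminal, with the real public DNS name of your instance in place of the placeholder.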
Start typing
You are logged into a Linux shell, a command line user interface to the Ubuntu operating system. You are logged in as the super user ‘root,’ which lets you do whatever you want. If you feel uncomfortable in a shell, you may want to learn a few basic Linux commands.
Time to do some system administration. Let’s install MrBayes. Installing programs in Ubuntu is easy. Linux operating systems maintain extensive collections of programs in software repositories that are accessed through the network by package managers. In Ubuntu the package manager is apt (Advanced Packaging Tool). Type:
apt-get install mrbayes
That’s it! To run MrBayes just type:
mb
OK, but what about data? Well, if you have access to an ftp server you can use ftp (if you don’t, check out DynDNS and set up an ftp server on your local machine). Since all we have to do is schlep around a few text files, we can cheat. On your local machine open your Nexus file in a text editor and copy everything to the clipboard. Then in your AMI instance type:
pico
That will launch pico, a simple text editor. Relevant commands will be shown at the bottom of the window. Paste in the contents of your clipboard. Hit Ctrl-O, and give the file a name. Hit Ctrl-X to close pico. Bang.
Now you’ve got your data and your phylogeny program ready to go. Set up and run your analysis. In the meantime, you can do whatever you want on your local machine. Play video games. Turn it off. Whatever. Actually, if you are going to log out of your shell session you will need to do a couple things to keep the program running in the background. First, make sure that you have all of your MrBayes commands in the Nexus file. Something like this:
begin mrbayes;
    set autoclose=yes nowarn=yes;
    lset nst=6 rates=gamma;
    mcmc ngen=1000000 samplefreq=1000 printfreq=1000 nchains=4;
end;
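If you would rather not type the block into pico by hand, you can append it to your Nexus file from the shell with a here-document. This is a sketch only. The function name append_mrbayes_block is made up, and the settings are simply the ones shown above, so adjust them to your analysis.

```shell
#!/bin/sh
# Hypothetical helper: append the MrBayes block shown above to a Nexus file.
append_mrbayes_block() {
    nexus_file=$1
    # The quoted 'EOF' stops the shell from expanding anything inside the block.
    cat >> "$nexus_file" <<'EOF'

begin mrbayes;
    set autoclose=yes nowarn=yes;
    lset nst=6 rates=gamma;
    mcmc ngen=1000000 samplefreq=1000 printfreq=1000 nchains=4;
end;
EOF
}

# Usage (the file name is an example):
#   append_mrbayes_block your_data.nxs
```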
Then, launch your analysis like this:
nohup mb <your_data.nxs> &
The command nohup will keep MrBayes chugging along after you log out. The trailing ampersand puts the process in the background. (Replace <your_data.nxs> with the name of your file, without the angle brackets.) Type Ctrl-D to exit the shell.
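By default nohup writes the program's output to a file called nohup.out. If you want the output in a log file of your choosing, and the process ID handy for checking on the job later, a small wrapper like this may help. The name run_detached and the file names are my own inventions.

```shell
#!/bin/sh
# Hypothetical helper: run a command under nohup in the background,
# send its output (stdout and stderr) to a log file, and print the process ID.
run_detached() {
    log_file=$1
    shift
    nohup "$@" > "$log_file" 2>&1 &
    echo $!
}

# Usage on the instance (file names are examples):
#   run_detached mb.log mb your_data.nxs
#   tail -f mb.log     # watch progress; Ctrl-C stops tail, not MrBayes
```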
When the analysis is complete, ssh back onto your instance and cd to the directory where you have your data. Open your Nexus file in pico and comment out the mcmc command. Now you can run MrBayes (without erasing your data), execute your file and do all of the summarizing you need to - plot, sump, sumt.
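Nexus comments go in square brackets, so commenting out the mcmc command means wrapping it in [ ]. If you prefer not to do that in pico, here is a sed sketch. It assumes the mcmc command is written in lower case, sits on a line of its own and ends with a semicolon. The function name is made up, and it keeps a .bak copy of your file just in case.

```shell
#!/bin/sh
# Hypothetical helper: wrap the mcmc command in Nexus comment brackets [ ].
# Assumes a lower-case mcmc command on its own line, ending with a semicolon.
comment_out_mcmc() {
    nexus_file=$1
    cp "$nexus_file" "$nexus_file.bak" || return 1   # keep a backup copy
    # Keep the leading indentation (\1) and bracket the command itself (\2).
    sed 's/^\([[:space:]]*\)\(mcmc[[:space:]].*;\)[[:space:]]*$/\1[\2]/' \
        "$nexus_file.bak" > "$nexus_file"
}

# Usage (the file name is an example):
#   comment_out_mcmc your_data.nxs
```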
Whenever you want to stop shooting a dime or two at Amazon every 60 minutes, open Elasticfox, right-click on your instance and select ‘Terminate Instances.’
And it’s gone
One thing to be aware of is that when you terminate your instance you will lose all of your work. The next time you launch an instance you will have to start from scratch, unless you install the ec2-ami-tools on your instance. These allow you to bundle and register your own AMI, which you can store in s3 (cheap, but not free). That way you can create an image pre-loaded with all of your favorite phylogeny programs and data. You can find out how here.
Cheers,
Nate


