Today I learned a few painful lessons. First, EC2 instances die, and a simple reboot will not recover them. Second, unlike many web hosts, Amazon doesn’t offer any level of monitoring. Third, backups are only useful if they’re current. 🙁
When I originally built my EC2 instance to host this site (and a few other applications), I was still learning about Amazon’s EC2, so I spent a good amount of time trying to document my server setup as thoroughly as I could. The result was an install script, “install_server”, that effectively repeats all the steps I performed when I first turned on the server. The script goes something like this:
#!/bin/sh
yum -y upgrade
yum -y install sendmail
yum -y install httpd
yum -y install php
yum -y install php-mysql
yum -y install php-pecl-memcache
yum -y install memcached
yum -y install subversion
pear install HTTP_Request
cp configs/freetds.conf /etc/freetds.conf
cp configs/httpd.conf /etc/httpd/conf/httpd.conf
cp configs/memcached /etc/sysconfig/memcached
cp configs/php.ini /etc/php.ini
cp configs/fstab /etc/fstab
tar -xzf webroot.01.tar.gz
/etc/init.d/memcached restart
/etc/init.d/httpd restart
/etc/init.d/sendmail restart
echo configs/crontab.txt
echo NOW SETUP CRONTAB!!!
This effectively copies into place all of the settings I need for a server. This is nice, because I am able to bring a new server online within 10 minutes or so of the old one going down. I should probably just automate the creation of a server-specific AMI once or twice a day, but I’m just not there yet. Also, I know there are some weak points in the startup script. I’ll be working those out soon, now that I see it’s really a useful tool.
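One weak point in a script like this is that a plain sh script keeps running even when a step fails, for example when a config file is missing from the bundle. Below is a minimal sketch of one way to guard the copy steps. The helper name `install_config` is hypothetical and not part of the original script.

```shell
#!/bin/sh
# Sketch only: install_config is a hypothetical helper, not part of the
# original install_server script.

# Copy one config file into place. Returns non-zero (and prints a message
# to stderr) if the source file is missing or the copy itself fails.
install_config() {
    src=$1
    dest=$2
    if [ ! -f "$src" ]; then
        echo "missing config: $src" >&2
        return 1
    fi
    cp "$src" "$dest" || {
        echo "failed to copy $src to $dest" >&2
        return 1
    }
}

# Example use inside install_server (commented out so the sketch has no
# side effects when run on its own):
# install_config configs/php.ini /etc/php.ini || exit 1
# install_config configs/httpd.conf /etc/httpd/conf/httpd.conf || exit 1
```

Stopping at the first failed copy means the services are never restarted against a half-installed set of configs.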
Monitoring of EC2 instances needs to be done by an external service. Fortunately, not too many people care what’s going on with this site between Saturday night and Sunday morning. I’m looking at options in two realms: first, a basic alert that there’s a problem, and second, a more proactive approach that can do some instance killing and restarting on its own.
Right now I’m looking at these services for quick and dirty SMS alerts about the status of my instances:
And I’m looking at these for a more holistic approach to monitoring, but I’m gun-shy about relying on them to manage the instances until I learn more about them:
Any experience anyone has had with any of these products is always appreciated.
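For the first realm, a basic alert, a check can also be run from cron on any machine outside EC2. The following is only a rough sketch of that idea, not any of the services mentioned above. The function names, the script path and the alert address are hypothetical, and it assumes `curl` and a working `mail` command on the monitoring machine. Sending to an email-to-SMS address, where a carrier offers one, would turn the mail into a text message.

```shell
#!/bin/sh
# Sketch only: function names, paths and addresses here are hypothetical.

# Succeeds if the given HTTP status code is 2xx or 3xx.
is_healthy_status() {
    case $1 in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Fetch a URL and print only its HTTP status code.
# curl prints 000 when it cannot connect at all.
fetch_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Check one URL and mail an alert if it looks unhealthy.
check_site() {
    url=$1
    alert_to=$2
    status=$(fetch_status "$url")
    if ! is_healthy_status "$status"; then
        echo "$url returned status $status" | mail -s "Site down: $url" "$alert_to"
        return 1
    fi
}

# Example call and cron entry (every five minutes), commented out so the
# sketch does nothing when run on its own:
# check_site "http://www.example.com/" "alerts@example.com"
# */5 * * * * /path/to/check_site.sh
```

The check has to live somewhere other than the instance it watches, since a dead instance cannot report its own death.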
Last but not least are backups. I have another script, aptly named “backup_server”, that takes a snapshot of the settings and configurations every 24 hours during off-peak times and stores the data on an Elastic Block Store (EBS) volume that I have mounted on the application server. That goes something like this:
crontab -l > configs/crontab.txt
cp /etc/php.ini configs/php.ini
cp /etc/httpd/conf/httpd.conf configs/httpd.conf
cp /etc/sysconfig/memcached configs/memcached
cp /etc/fstab configs/fstab
tar -czvf backups/configs.01.tar.gz configs
tar -czvf backups/webroot.01.tar.gz /mnt
Where I got burned is that my cron job only backed up my data once every 24 hours, at 4am, and the application server crashed and burned at 2am EDT, so I lost nearly a full day of changes. Clearly I need to consider something like rsync to prevent this type of data loss. Rsync can grab the incremental changes hourly, reducing the work lost between the full backups every 24 hours. As a stopgap, I’ve increased the frequency of the backups until I can get back to the system and set up rsync.
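Here is a rough sketch of what that hourly rsync step might look like. The function name `sync_webroot`, the destination path and the script path in the cron entry are all hypothetical, and it assumes rsync is installed on the server.

```shell
#!/bin/sh
# Sketch only: sync_webroot and the example paths are hypothetical.

# Copy new and changed files from a source directory into a backup
# directory. -a preserves permissions, timestamps and symlinks, and rsync
# only transfers files that differ from what is already in the backup.
sync_webroot() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    rsync -a "$src/" "$dest/"
}

# Example call and crontab entry (top of every hour), commented out so
# the sketch does nothing when run on its own:
# sync_webroot /mnt /path/to/backups/webroot-hourly
# 0 * * * * /path/to/sync_webroot.sh
```

Without `--delete`, files removed from the source stay in the backup copy, which seems like the safer default for a backup. The nightly tarballs would still provide the full snapshots.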