EC2 Instances Die and Other Lessons From The Cloud

Today I learned a few painful lessons. First, EC2 instances die, and a simple reboot will not recover them. Second, unlike many web hosts, Amazon doesn’t offer any level of monitoring. Third, backups are only useful if they’re current. 🙁

Lesson 1:

When I originally built my EC2 instance to host this site (and a few other applications), I was still learning about Amazon’s EC2, so I spent a good amount of time trying to be as thorough as possible in documenting my server setup. The result was an install script, “install_server”, that effectively repeats all the steps I took when I first turned on the server. The script goes something like this:

yum -y upgrade
yum -y install sendmail
yum -y install httpd
yum -y install php
yum -y install php-mysql
yum -y install php-pecl-memcache
yum -y install memcached
yum -y install subversion
pear install HTTP_Request
cp configs/freetds.conf /etc/freetds.conf
cp configs/httpd.conf /etc/httpd/conf/httpd.conf
cp configs/memcached /etc/sysconfig/memcached
cp configs/php.ini /etc/php.ini
cp configs/fstab /etc/fstab
tar -xzf webroot.01.tar.gz 
/etc/init.d/memcached restart
/etc/init.d/httpd restart
/etc/init.d/sendmail restart
# Reinstall the saved cron jobs from the backed-up crontab
crontab configs/crontab.txt

This effectively copies into place all of the settings I need for a server. This is nice because I can bring a new server online within 10 minutes or so of the old one going down. I should probably just automate the creation of a server-specific AMI once or twice a day, but I’m just not there yet. Also, I know there are some weak points in the startup script. I’ll be working those out soon now that I see it’s really a useful tool.
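The AMI idea mentioned above could be sketched roughly as follows. This is not something the setup described here actually runs: it assumes the modern `aws` command-line tool is installed and configured, and that the instance is EBS-backed (which `create-image` requires). The function names `ami_name` and `create_server_ami`, the instance ID and the name prefix are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: create a dated AMI from a running, EBS-backed instance.

# Builds an image name of the form PREFIX-YYYYMMDD-HHMM so that
# repeated runs do not collide on the same name.
ami_name() {
    echo "$1-$(date +%Y%m%d-%H%M)"
}

# $1 = instance ID, $2 = name prefix (both supplied by the caller).
# --no-reboot keeps the server running while the image is taken, at the
# cost of no guarantee that the filesystem is in a consistent state.
create_server_ami() {
    aws ec2 create-image \
        --instance-id "$1" \
        --name "$(ami_name "$2")" \
        --no-reboot
}

# Example call (placeholder values):
#   create_server_ami i-0123456789abcdef0 webserver
```

Run from cron once or twice a day, something like this would complement the install script rather than replace it, since the script still documents what is on the box.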

Lesson 2:

Monitoring of EC2 instances needs to be done by an external service. Fortunately, not too many people care what’s going on with this site between Saturday night and Sunday morning. I’m looking at options in two realms: first, a basic alert that there’s a problem, and second, a more proactive approach that can do some instance killing and restarting on its own.

Right now I’m looking at these services for quick and dirty SMS alerts about the status of my instances:

And I’m looking at these for a more holistic approach to monitoring, but I’m gun-shy about relying on them to manage the instances until I learn more about them:

If you have experience with any of these products, I’d appreciate hearing about it.
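For the “basic alert” half of the problem, a home-grown check is also possible. The sketch below is only an illustration, not one of the services under consideration: it assumes `curl` is available and that the script runs on some machine other than the EC2 instance being watched. The function names are hypothetical, and delivering the alert line by email or SMS is left out.

```shell
#!/bin/sh
# Hypothetical external health check; run it from a machine OTHER than
# the server being monitored.

# Succeeds only for HTTP 2xx and 3xx status codes.
is_healthy_status() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the HTTP status code for a URL. curl reports "000" when it
# cannot connect at all, which is what a dead instance would look like.
fetch_status() {
    curl --silent --output /dev/null --max-time 10 \
        --write-out '%{http_code}' "$1"
}

# Checks one URL; prints an alert line and fails if it looks down.
check_site() {
    status=$(fetch_status "$1")
    if is_healthy_status "$status"; then
        return 0
    fi
    echo "ALERT: $1 returned status $status"
    return 1
}

# Example call (placeholder URL):
#   check_site "http://www.example.com/"
```

Called every few minutes from cron, the non-zero exit status and the alert line give whatever notification mechanism is in place something to act on.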

Lesson 3:

Last but not least are backups. I have another script, aptly named “backup_server”, that takes a snapshot of the settings and configurations every 24 hours during off-peak times and stores the data on an Elastic Block Store volume that I have mounted on the application server. It goes something like this:

crontab -l > configs/crontab.txt
cp /etc/php.ini configs/php.ini
cp /etc/httpd/conf/httpd.conf configs/httpd.conf
cp /etc/sysconfig/memcached configs/memcached
cp /etc/fstab configs/fstab
tar -czvf backups/configs.01.tar.gz configs
tar -czvf backups/webroot.01.tar.gz /mnt

Where I got burned is that my cron job only backed up my data once every 24 hours, at 4am, and the application server crashed and burned at 2am EDT, almost a full day after the last backup. Clearly I need to consider something like rsync to prevent this type of data loss. Rsync can grab the incremental changes hourly, reducing the work lost between the full backups every 24 hours. As a stopgap, I’ve increased the frequency of the backups until I can get back to the system and set up rsync.
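A minimal sketch of what the hourly rsync step might look like, assuming `rsync` is installed on the server. The function name `sync_backup` is made up, and the destination path in the example call is a placeholder, since the actual mount point of the EBS volume isn’t given here.

```shell
#!/bin/sh
# Hypothetical sketch: mirror a directory onto the backup volume,
# transferring only files that changed since the last run.
sync_backup() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # -a preserves permissions, timestamps and symlinks.
    # --delete removes files from the mirror that no longer exist in
    # the source, so the mirror always matches the live data.
    # The trailing slash on "$src/" copies the directory's contents
    # rather than the directory itself.
    rsync -a --delete "$src/" "$dest/"
}

# Example call (destination is a placeholder path):
#   sync_backup /mnt /path/to/ebs/webroot-mirror
```

Because `--delete` makes the mirror follow deletions too, this only narrows the window of lost work. The daily tarballs are still what you would reach for to recover a file that was removed by mistake.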

This entry was posted in Amazon Web Services, Linux, Software, Technology.

6 Responses to EC2 Instances Die and Other Lessons From The Cloud

  1. Bret Clement says:

    Hi Erik, do you find memcached’s lack of HA/failover to be a problem?

  2. Erik says:

    @bret – No, but I’ve designed my apps to run either with or without memcached, so it’s simply offloading work from MySQL, meaning that the lack of HA in memcached doesn’t really affect me.

  3. Neill Turner says:

    I would recommend installing applications and logs on EBS and using links to link them to the OS. Then you don’t need to use EBS as a backup. But you still need to create snapshots of the EBS volume to back that up, or back up a database to S3 using a database utility.

    Neill Turner

  4. Mikayel says:

    I will suggest using, for monitoring your instance, because it provides not only external monitoring but also resource utilization of your instances. Give it a try and you will like it.

  5. Fuji says:

    Hi Erik,

    I just got started on cloud computing stuff. I was looking at AWS and, to be honest, was a bit afraid of the complexity. Scalr seems to be a good solution and doesn’t seem to cost an arm and a leg like RightScale.

    Have you used Scalr yet? Do you recommend it?


  6. Erik says:


    I haven’t used Scalr or RightScale myself. Now that Amazon has added auto scaling and elastic load balancing, I’ve opted to roll my own solution with those technologies.