Archive for the ‘software’ Category

Simple File Based Caching in PHP

Friday, July 30th, 2010


Depending on the access pattern for your data, a file based cache is often more economical than an in-memory cache like Memcached or APC, despite being slower. Disk space is cheap, which makes it a good option for caching data that’s not frequently accessed but is expensive to generate. That’s exactly where this class comes in. On a small EC2 instance, caching API calls reduced the time per call from 0.4 seconds to 0.001 seconds. This reduced the overall latency of the application and freed up system resources to work on other tasks.

API result sets can be very large (consider a friend list from Facebook, which may be 10-50K per person), so caching them in memory can push out more valuable data and reduce the effectiveness of the cache. The file based cache is of course persistent and can also be distributed across multiple servers using rsync (or, if you’re feeling adventurous, you could alter this code to store the data in an S3 bucket).

One drawback of this approach is that the cache is not limited in size: files stay on disk until they’re deleted or replaced with new data. For that reason, I recommend you store the cache on a different drive where possible. File based caches also carry a potential security risk if the data is stored in a web accessible directory. This risk is not unique to this implementation. A clever hacker could discover your cache and gain access to sensitive data or, worse, compromise your server by storing executable code in it and executing it. You can reduce this risk by encrypting your data before storing it in the cache and by moving the cache to a directory outside of the web root.
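Since nothing removes stale files on its own, a small cleanup job can keep the directory from growing forever. The following is only a sketch: the function name prune_cache and the example path are hypothetical, and it goes by each file's modification time rather than the expiry stored inside the file, so pick an age at least as long as the longest TTL you use.

```shell
#!/bin/sh
# Hypothetical helper: delete cache files that have not been modified in the
# last N minutes. It does not read the expiry stored inside each file, so N
# should be at least as large as the longest TTL passed to set().
prune_cache() {
    cache_dir=$1
    max_age_minutes=$2
    # -maxdepth 1: the cache writes flat files directly under its root
    # -type f:     only regular files are removed, never directories
    # -mmin +N:    last modified more than N minutes ago
    find "$cache_dir" -maxdepth 1 -type f -mmin +"$max_age_minutes" -print -delete
}

# Example (path is an assumption): remove anything older than one day
# prune_cache /var/cache/filecache 1440
```

Run from cron, something like this keeps the cache bounded by age, although it still does nothing to cap the total size.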

To use this cache, save the source below to disk (I recommend a filename like FileCache.php) and include it in the code files where you will be using it. Then, start caching your data! See the usage example in the comments for how to get started. Because the files are not encrypted, you can easily look in them to see what your data looks like.

<?php
 
	/**
	*
	* @package FileCache - A simple file based cache
	* @author Erik Giberti
	* @copyright 2010 Erik Giberti, all rights reserved
	* @license http://opensource.org/licenses/gpl-license.php GNU Public License
	*
	* Class to implement a file based cache. This is useful for caching large objects such as
	* API/Curl responses or HTML results that aren't well suited to storing in small memory caches 
	* or are infrequently accessed but are still expensive to generate.
	*
	* For security reasons, it's *strongly* recommended you set your cache directory to be outside
	* of your web root and on a drive independent of your operating system.
	*
	* Uses JSON to serialize the data object.
	*
	* Sample usage:
	*
	* $cache = new FileCache('/var/www/cache/');
	* $data = $cache->get('sampledata');
	* if(!$data){
	*      $data = array('a'=>1,'b'=>2,'c'=>3);
	*      $cache->set('sampledata', $data, 3600);
	* }
	* print $data['a'];
	*
	*/
 
 
// Requires the native JSON library
if (!function_exists('json_decode') || !function_exists('json_encode')) {
  throw new Exception('Cache needs the JSON PHP extensions.');
}
 
class FileCache {
 
	/**
	* Value is pre-pended to the cache, should be the full path to the directory
	*/
	protected $root = null;
 
	/**
	* For holding any error messages that may have been raised
	*/
	protected $error = null;
 
	/**
	* @param string $root The root of the file cache. 
	*/
	function __construct($root = '/tmp/'){
		$this->root = $root;
	}
 
	/**
	* Saves data to the cache. Anything that evaluates to false, null, '', boolean false, 0 will 
	* not be saved.
	* @param string $key An identifier for the data
	* @param mixed $data The data to save
	* @param int $ttl Seconds to store the data
	* @returns boolean True if the save was successful, false if it failed
	*/
	public function set($key, $data = false, $ttl = 3600){
		if(!$key) {
			$this->error = "Invalid key";
			return false; 
		}
		if(!$data){
			$this->error = "Invalid data";
			return false;
		}
		$key = $this->_make_file_key($key);
		$store = array(
			'data' => $data,
			'ttl'  => time() + $ttl,
		);
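		$status = false;
		$payload = json_encode($store);
		if($payload === false){
			$this->error = "Unable to encode data";
			return false;
		}
		// fopen() and flock() report failure through their return values, not
		// exceptions. Mode "c" opens without truncating, so the file is only
		// emptied once we hold the exclusive lock.
		$fh = @fopen($key, "c");
		if($fh === false){
			$this->error = "Unable to open cache file for writing";
			return false;
		}
		if(flock($fh, LOCK_EX)){
			ftruncate($fh, 0);
			$status = (fwrite($fh, $payload) !== false);
			fflush($fh);
			flock($fh, LOCK_UN);
		}
		fclose($fh);
		if(!$status){
			$this->error = "Unable to write cache file";
		}
		return $status;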
	}
 
	/**
	* Reads the data from the cache
	* @param string $key An identifier for the data
	* @returns mixed Data that was stored
	*/
	public function get($key){
		if(!$key) {
			$this->error = "Invalid key";
			return false; 
		}
 
		$key = $this->_make_file_key($key);
		$file_content = null;
 
		// A missing file is simply a cache miss
		if(!is_file($key)){
			return false;
		}
 
		// Get the data from the file
		$fh = @fopen($key, "r");
		if($fh === false){
			$this->error = "Unable to open cache file for reading";
			return false;
		}
		if(flock($fh, LOCK_SH)){
			$file_content = stream_get_contents($fh);
			flock($fh, LOCK_UN);
		}
		fclose($fh);
 
		// Empty or unparseable files are treated as a miss
		$store = $file_content ? json_decode($file_content, true) : null;
		if(!is_array($store) || !isset($store['ttl']) || !array_key_exists('data', $store)){
			$this->error = "Invalid cache file";
			return false;
		}
		if($store['ttl'] < time()){
			unlink($key);	// remove the file
			$this->error = "Data expired";
			return false;
		}
		return $store['data'];
	}
 
	/**
	* Remove a key, regardless of its expiry time
	* @param string $key An identifier for the data
	*/
	public function delete($key){
		if(!$key) {
			$this->error = "Invalid key";
			return false; 
		}
 
		$key = $this->_make_file_key($key);
 
		// unlink() raises a warning rather than an exception, so check its result
		if(is_file($key) && !@unlink($key)){
			$this->error = "Unable to delete cache file";
			return false;
		}
 
		return true;
	}
 
	/**
	* Reads and clears the internal error
	* @returns string Text of the error raised by the last process
	*/
	public function get_error(){
		$message = $this->error;
		$this->error = null;
		return $message;
	}
 
	/**
	* Can be used to inspect internal error
	* @returns boolean True if we have an error, false if we don't 
	*/
	public function have_error(){
		return ($this->error !== null) ? true : false;
	}
 
	/**
	* Create a key for the cache
	* @todo Beef up the cleansing of the file.
	* @param string $key The key to create
	* @returns string The full path and filename to access
	*/	
	private function _make_file_key($key){
		$safe_key = str_replace(array('.','/',':','\''), array('_','-','-','-'), trim($key));
		return $this->root . $safe_key;
	}
}

Is Desktop Software Dead?

Tuesday, December 1st, 2009

I’ve always been a huge fan of desktop software. It allows developers to create a unique experience tailored to a specific task. It promotes consistency within the OS: always knowing that the close window button is in the same location is a huge boon to usability. It’s generally faster and can work where your internet connection doesn’t. Last but not least, you have that copy on your hard drive that you can back up, put on a thumb drive or even print out as a hex dump if you’re so inclined.

Lately, though, it seems that more and more of the software I use on a daily basis relies heavily on its client connecting to the real back end over the internet. Email, instant messaging, Skype and web based documents are quickly becoming the primary conduit for my communication with clients. Add in the dizzying array of Twitter, Facebook, FriendFeed, YouTube, LinkedIn and Flickr, and suddenly there are a lot of different ways to get a hold of me, none of which are dependent on my desktop software. The dark side of this cloud based interaction is that I don’t have a nice easy way to interact – a single touchpoint, like email, through which I can be part of the conversation.

Enter Raindrop, an amazing new project from Mozilla Labs that promises to make email relevant again. In addition to email, it will bring all of my conversation channels into one place, becoming a communication hub for me. This software is hugely exciting, taming the communication channels in a way that Tweetdeck and Seesmic Desktop can only dream of, while keeping all the information I gather in one central location. If this project ever launches and does only 1/2 of what it says it will do, it’s going to be awesome!

Today, TechCrunch posted an article about a new project called Inbox2. This project is web based and even has a Facebook application. It also promises to tame your communication environment! While the Raindrop application will definitely rock, it can’t touch the flexibility that the web version can reach. Raindrop is doomed: because all of the data it’s taming is online, it only makes sense to build the application online as well.

This brings me to my original question: is desktop software dead? Google’s new Chrome OS is betting it is. In my experience, many software applications and iPhone apps are little more than a thin wrapper around a series of web services and API calls. Even when they’re not, like Pages or Word, I’m usually going to share the result with a client via email, so why not write it in Google Docs to begin with? Obviously, the line is already blurring. As a developer, it seems easier to me to prototype an application in just about any web language and simply run it in a browser window than it is to build a client for each device: iPhone, Mac, PC, Android, Linux, you get the idea…

What do you think, is the end here for the Desktop Software market?

Amazon AWS Command Line Tool Help

Monday, June 29th, 2009

Amazon’s Web Services are very handy, although sometimes the command line tool syntax is a little awkward to remember and the documentation, while extensive, is not quite as simple to navigate as I’d like. I’m providing these help files as a reference for anyone who might need them. As you are no doubt aware, you can also get this content directly by issuing <aws_command_name> --help in the shell. For me it’s much easier to have these up in a browser window so I can quickly toggle between it and the command line without losing my place. I’ve added references for Auto Scaling, CloudWatch, Elastic Compute Cloud and Elastic Load Balancing.

Message Queue Solutions

Tuesday, April 28th, 2009

While I’m a fan of Gearman so far, I thought it prudent to look at alternative solutions. This is a survey of alternative solutions I’ve located so far. Most of my clients are LAMP(hp) and so I’ll probably be ignoring the language specific packages that don’t support PHP. After a cursory overview from the list below I know I’ll be checking in on Amazon SQS and Beanstalkd before I make my final selection.

Linden Lab (publishers of Second Life) posted their evaluation of Messaging Queue Software. Of course they’re an edge case in terms of scale, so some of these may work just fine for your uses despite being eliminated by them.

Getting up and Running with Gearman

Monday, April 27th, 2009

Gearman is a job scheduling service and I’m very excited about it. I’m using it in a development capacity, so your mileage may vary in production, but I wanted to share my experience thus far. As I said, I’m very bullish on this project and I see it as hugely helpful in eliminating latency in applications that often get bogged down during unnecessary synchronous communications.

Compiling gearman required installing a package that wasn’t part of my default Fedora Core install and that, for me, wasn’t intuitive to locate. The UUID header file was located in the package e2fsprogs-devel, which I found using yum provides "*/uuid.h". After that it was rather smooth to get it up and running. gearmand -d -u nobody started it as a daemon and I was able to connect to it using telnet over port 4730. Next I compiled the source for the PHP client and hooked it into PHP by adding an extension include file in /etc/php.d to load the module, then restarted Apache so it would be loaded there too.

Process to install and get running:

# First the server
tar -xzvf gearmand-0.5.tar.gz.tar
cd gearmand-0.5
yum install e2fsprogs-devel
./configure && make && make install
gearmand -d -u nobody
 
# Next the PHP client (back in the directory that holds the tarballs)
cd ..
tar -xzvf gearman-php-ext-0.2.tar.gz.tar
cd gearman-php-ext-0.2
phpize
./configure && make && make install
echo "extension=gearman.so" > /etc/php.d/gearman.ini
service httpd restart

So now to do some work, even if it’s useless, that takes a long time. It just so happens that creating a file with 1,000,000 sequential numbers takes a few seconds on a small EC2 instance, perfect for my test. I realize this is a highly insecure process, NEVER pass filenames as parameters in production code. Here’s the worker that creates a file (passed as the parameter) on the current system’s /tmp directory.

$worker = new gearman_worker();
$worker->add_server('127.0.0.1', 4730);
$worker->add_function('fill_file', 'fill_file_fn');
 
while(1) $worker->work();
 
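function fill_file_fn($job){
	// The workload is the filename sent by the client (insecure, demo only)
	$data = $job->workload();
	$fh = fopen("/tmp/" . $data, "w");
	if($fh === false){
		return;	// could not open the file, nothing to do
	}
	// Write the numbers 1 through 1,000,000, one per line
	for($i=1;$i<=1000000;$i++){
		fwrite($fh, $i . "\n");
	}
	fclose($fh);
	return;
}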

The calling client just invokes this 20 times in the background.

$client = new gearman_client();
$client->add_server('127.0.0.1', 4730);
for($i=0; $i<20; $i++){
	$client->do_background('fill_file', 'file' . $i . '.txt');
}

Workers are started from the command line with something like this, “php worker.php &” and if you want more, just run more of them. You can also kill off some if they’re no longer needed.
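Starting several workers by hand gets tedious, so a small loop helps. This is only a sketch: start_workers is a hypothetical helper name, and it assumes the worker script above is saved as worker.php in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: start COUNT copies of a command in the background and
# print each PID so individual workers can be killed later.
# Usage: start_workers COUNT COMMAND [ARGS...]
start_workers() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" &
        echo "started worker $((i + 1)) with PID $!"
        i=$((i + 1))
    done
}

# Example (assumes worker.php is in the current directory):
# start_workers 5 php worker.php
```

To scale back down, kill any of the printed PIDs; the remaining workers keep pulling jobs from the queue.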

The client completes its run in about 5 seconds while 5 worker processes toil away in the background until they get their work done about 3 minutes later. The use cases from the gearman team show the utility of this as a spider and for image manipulation. I see uses for sending mass emails to distribution lists: the worker, instead of the client, takes a template and substitutes parameters to create a unique email for each person. That reduces the processing time to get the mail ready and speeds up delivery by using multiple workers for sending (which can even be on remote machines). This product is definitely worth checking out.

Hopefully this helps you get up and running with Gearman!

Subversion Hosting Part 2 of 2

Thursday, April 2nd, 2009

This is the second part of an article looking at how to effectively host a small subversion based project that is no longer going through rapid development. The first part looked at using EC2 to run Subversion and S3 for persistent storage. While an intriguing solution, it raised some concerns.

The alternative solution is to look at outsourcing the hosting of Subversion and ticket management to another provider. The size of our repository is less than 1GB and so I’m using that as the price point. Additionally, there are 2-3 developers who’ll require access to the repository. There are many great “free” services including Google, but this is not an open sourced project so it’s out. In the hosted subversion realm, there are a number of providers with basic accounts to handle this size repository. The following table is a price comparison at the 1GB storage level. Many providers offer a free service for smaller projects with different limitations for bandwidth, tickets and so on so YMMV.

ProjectLocker $2.50
Wush $6.67
SVNRepository.com $6.95
CVSDude (2GB) $6.99
Hosted-Projects $7.00
Assembla [1] $8.00
Code Spaces $9.99
Beanstalk (3GB) [2] $15.00
Versionshelf (3GB) [2] $19.00
Unfuddle (2GB) [2] $24.00
DevGuard (2GB) [2] $29.95
  • [1] Pricing dependent on storage and developers
  • [2] Offers a cheaper or free plan with less than 1GB of storage.

The real benefit of a hosted solution is the addition of services such as Trac, user management, automated backups and more. If you are looking at building a project with multiple developers who are not in the same physical location, hosting your project with a service is definitely the way to go. It’s cheaper and the overhead of configuring and maintaining your own EC2 instance (or even a dedicated server) increases the costs significantly.

Subversion Hosting Part 1 of 2

Thursday, April 2nd, 2009

Over the last few weeks I’ve been considering some options for cutting development costs for myself and a few clients. One of the continuing questions is how to manage the code base. Keeping a development server on hand is great during periods of active development and work, but when the site reaches maturity and only bug fixes are required, development servers sit idle for weeks on end without use. This got me thinking about how to best manage the source in a persistent way. This first post looks at how this might be accomplished using EC2.

I’ve been thinking about moving a development environment to Amazon EC2 from a dedicated server. The problem is, at least for this project, development only occurs a few hours per day and may go entire weeks without anyone working on it. Obviously a small instance at $0.10/hour is sufficient for load. That would cost roughly $72/month. But, even if I’m working on the application 40 hours per week, I should be able to reduce that charge to $16/month to cover the time the server is actually on. Additionally, after being burned with an instance failure last weekend, I want to be sure the data is securely backed up as well. I thought about using EBS but, as readers have pointed out, even they can fail. Furthermore, I don’t want to create a drive sized in GB if I only need a few MB of storage. Lastly, but not least, if I need to scale the drive – I don’t want to re-create the AMI each time to reflect new drive ids.
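The instance figures above are just the hourly rate multiplied by hours of run time. Here is a quick sketch of that arithmetic, assuming a 30 day month (720 hours) for the always-on case and four 40 hour weeks (160 hours) for the part-time case; monthly_cost is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: instance cost = hourly rate x hours of run time.
# Usage: monthly_cost RATE_PER_HOUR HOURS
monthly_cost() {
    awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours }'
}

monthly_cost 0.10 720   # always on, 24 hours x 30 days: prints 72.00
monthly_cost 0.10 160   # 40 hours/week x 4 weeks:       prints 16.00
```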

My initial thought was to start with a public Fedora Core instance and install PersistentFS, automating all of the startup and shutdown process to ensure data integrity. Next, configure subversion to use that mount point for file storage. Last but not least, I’ll create a script I can run from my local machine (or a remote server) that starts and stops an instance and binds a known elastic IP to that instance at boot time. I think my overall costs will be greatly reduced.

Estimate of costs on EC2:

EC2 Small Instance Run Time (40hr week) $16.00
S3 Storage Cost (~10GB AMI) $1.50
S3 Storage Cost – Filesystem $0.15
S3 Bandwidth Cost (guess) $2.00
EC2 Bandwidth Cost (guess) $2.00
Total Cost (Monthly) $21.65

The bare minimum – if no development work was done at all would be the storage costs of $1.65 – certainly cheap enough! However, the time to build the initial environment, create the scripts and the time lost during the startup and shutdown of the server each time made me think there may be a better alternative. Read more on subversion hosting in the second part.

EC2 Instances Die and Other Lessons From The Cloud

Sunday, March 29th, 2009

Today I learned a few painful lessons: First, EC2 instances die and a simple reboot will not recover them. Second, unlike many web hosts – Amazon doesn’t offer any level of monitoring. Third, backups are only useful if they’re current. :(

Lesson 1:

When I originally built my EC2 instance to host this site (and a few other applications) I was learning about Amazon’s EC2, and so spent a good amount of time trying to be as thorough as possible in documenting my server setup. The result was an install script, “install_server”, that effectively repeats all the steps I did when I turned on the server. The script goes something like this:

#!/bin/sh
yum -y upgrade
yum -y install sendmail
yum -y install httpd
yum -y install php
yum -y install php-mysql
yum -y install php-pecl-memcache
yum -y install memcached
yum -y install subversion
pear install HTTP_Request
cp configs/freetds.conf /etc/freetds.conf
cp configs/httpd.conf /etc/httpd/conf/httpd.conf
cp configs/memcached /etc/sysconfig/memcached
cp configs/php.ini /etc/php.ini
cp configs/fstab /etc/fstab
tar -xzf webroot.01.tar.gz 
/etc/init.d/memcached restart
/etc/init.d/httpd restart
/etc/init.d/sendmail restart
echo configs/crontab.txt
echo NOW SETUP CRONTAB!!!

This effectively copies into place all of the settings I need for a server. This is nice, because I am able to bring a new server online within 10 minutes or so of it going down. I should probably just automate the creation of a server specific AMI once or twice a day, but I’m just not there yet. Also – I know there are some weak points in the startup script… I’ll be working those out soon now that I see it’s really a useful tool.

Lesson 2:

Monitoring of EC2 instances needs to be done by an external service. Fortunately, not too many people care what’s going on with this site over a Saturday night to Sunday morning. I’m looking at options in two realms: first, a basic alert that there’s a problem, and second, a more proactive approach that can do some instance killing and restarting on its own.

Right now I’m looking at these services for quick and dirty SMS alerts about the status of my instances:

And I’m looking at these for a more holistic approach to monitoring but am gun shy on relying on these to manage the instances until I learn more about them:

Any experiences anyone has had with any of these products is always appreciated.

Lesson 3:

Last but not least are backups. I have another script, aptly named “backup_server” that makes a snapshot of the settings and configurations every 24 hours during off peak times storing the data on an elastic block storage device that I have mounted to the application server. That goes something like this:

crontab -l > configs/crontab.txt
cp /etc/php.ini configs/php.ini
cp /etc/httpd/conf/httpd.conf configs/httpd.conf
cp /etc/sysconfig/memcached configs/memcached
cp /etc/fstab configs/fstab
tar -czvf backups/configs.01.tar.gz configs
tar -czvf backups/webroot.01.tar.gz /mnt

Where I was burned here is that my cron job only backed up my data every 24 hours at 4am. However, the application server crashed and burned at 2am EDT, so nearly a full day of changes was lost. Clearly I need to consider something like rsync to prevent this type of data loss. Rsync can grab the incremental changes hourly, thus reducing work lost between the full backups every 24 hours. As a stop gap, I’ve increased the frequency of the backups until I can get back to the system and set up rsync.
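What that hourly rsync pass might look like is sketched below. The function name sync_dir, the script name hourly_sync.sh and the destination path are all assumptions; the only detail taken from the backup script above is that the web root lives under /mnt.

```shell
#!/bin/sh
# Hypothetical hourly sync: copy new and changed files from one directory to
# another. rsync only transfers what changed since the last run.
# Usage: sync_dir SOURCE_DIR DEST_DIR
sync_dir() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # -a preserves permissions, times and symlinks. The trailing slash on the
    # source copies the directory's contents rather than the directory itself.
    rsync -a "${src%/}/" "$dest"
}

# Example crontab entry (runs at minute 0 of every hour), where the
# hypothetical hourly_sync.sh ends with: sync_dir /mnt /backups/webroot-current
# 0 * * * * /root/hourly_sync.sh
```

Files deleted from the source stay in the copy unless --delete is added, which seems like the safer default for a backup.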

2 Handy Utilities for OS X Developers

Wednesday, March 18th, 2009

Over the last few weeks I fought with getting my Mac OS X machine to mount an NFS share on Linux remotely. I’ve tried different port numbers, different security models, even SSH tunneling, to no avail. I’ve read mailing lists that say it can’t be done, and others that say certain kernel versions work and others don’t. I’ve been able to actually edit files for 2-3 minutes before the connection would drop for no apparent reason. In other tests, I was able to view files, but due to some permissions errors, I was unable to make changes. After parsing through the myriad of freely given advice online, I began seeking alternatives.

The key requirements for me were simple:

  1. Be able to browse the directory structure using the Mac OS X Finder.
  2. Be able to save/copy/move/delete files without being prompted for my password.
  3. Be able to edit files using any editor of my choosing.

ExpanDrive

I have found the tool I’ve been looking for! ExpanDrive allows me to use sftp (basically ssh) to manage my files remotely but leverage my super handy Mac tools to do the editing. This is a huge win for productivity. The publisher provides a 30 day free trial, but I knew instantly this was what I’ve been looking for. The package runs $39.95. I expect it will pay for itself in less than a week of development. There’s lots of great support through getsatisfaction.com for common questions like “how do I access ec2?”, which didn’t directly answer my question but gave me sufficient information to troubleshoot my own SSH connection issues.

I’m still a little disappointed I was unable to get NFS working as I had hoped, but this solution definitely makes my life easier.

Meerkat

Another great tool that I discovered today is Meerkat. As you’ll recall, I said I was trying to use SSH tunnels to access the NFS service. After trying 8 different tools (that were all junk) I found Meerkat. The software is dead simple and provides rich access to pretty much everything I could want. Using Meerkat you can leverage an SSH login into one machine to provide access to a port bound service somewhere else. I see myself using this to grab remote access to a MySQL server parked behind a firewall via an SSH tunnel to a server that can see it. Other tools may have more “options”, but Meerkat actually works. I haven’t bitten the bullet to buy this one yet, as I want to see how much I actually use it first, but at $19.95 it’s a bargain as well and may quickly pay for itself.

I also wanted to give credit to Alan Watson, whose blog post about using ExpanDrive put it on the radar for me. He also blogged on Meerkat, which is how I found him in the first place. Thank you Alan, you made my day!

Increasing Apache ServerLimit on Ubuntu

Tuesday, February 17th, 2009

I’ve been doing a fair amount of server tuning over the last few weeks and one of the issues I hit this morning was a ServerLimit directive that wasn’t being applied to the server upon a restart. Scouring Google I stumbled upon a post by Erik Ljungstrom that helped me fix my problem. You can see in my apache2.conf snippet, I’ve added a line to increase the ServerLimit to 384 and the MaxClients to 384.

<IfModule mpm_prefork_module>
    StartServers         50
    MinSpareServers      20
    MaxSpareServers      70
    MaxClients          384
    ServerLimit         384
    MaxRequestsPerChild   0
</IfModule>

When I would restart the server I would get the following message.

root@app:~# apache2ctl restart
WARNING: MaxClients of 384 exceeds ServerLimit value of 256 servers,
 lowering MaxClients to 256.  To increase, please see the ServerLimit
 directive.

Apache was running – but still capping out at 256 servers. Erik L. explains that to fully stop the server you need to run “apache2ctl graceful-stop”, wait for requests to complete and then run “apache2ctl start”. If you’re running mod_status.so (status module) you’ll be able to see the correct number of worker slots available to the server.
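The stop, wait, start sequence can be scripted so the wait isn't guesswork. Treat this as a sketch under a few assumptions: the helper names are hypothetical, the Apache processes are assumed to be named apache2 (as on Ubuntu), pgrep is assumed to be installed, and the 60 second timeout is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: wait until no process with the given name remains,
# giving up after TIMEOUT seconds. Relies on pgrep being available.
# Usage: wait_until_gone PROCESS_NAME TIMEOUT_SECONDS
wait_until_gone() {
    name=$1
    timeout=$2
    waited=0
    while pgrep -x "$name" >/dev/null 2>&1; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Full stop followed by a fresh start, so the new ServerLimit is read.
full_restart() {
    apache2ctl graceful-stop || return 1
    wait_until_gone apache2 60 || return 1
    apache2ctl start
}
```

Calling full_restart as root performs the same steps as running the two apache2ctl commands by hand.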

root@app:~# apache2ctl status
Apache Server Status for localhost
 
Server Version: Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.5 with Suhosin-Patch
Server Built: Jun 25 2008 13:54:43
 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 
Current Time: Tuesday, 17-Feb-2009 01:23:16 PST
Restart Time: Tuesday, 17-Feb-2009 01:21:45 PST
Parent Server Generation: 0
Server uptime: 1 minute 30 seconds
280 requests currently being processed, 64 idle workers
 
W__K_K_C_KWW_KWKWW__W_K_KKW_WK__KK_WKKKWWKKWK_KK_K_K_WKWWW_KKKKK
W_WWKK__KK_WK_K_W_W_WWWKW_WKKK___KWKKK_KWW_W_WK_KW__KK_K_K_K__W_
K__KWKKW_WKCKW_KWKWWKCW____WKK__KKKWKKWW__KKK_KK__KWWWKWWW_WWWWW
WKW__KWWWWW__KWWW_WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
KKWWWWWWWWWWWWKWWWWWWWWW........................................
 
Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process

You’ll still get the error message, but as Grig Gheorghiu says, “know when to ignore warnings“. If anyone knows how to get rid of this error, please let me know.

© 1998-2008 AF-Design, All rights reserved.