Building Something Scalable: Caching
By admin • Oct 5th, 2008 • Category: Building Something Scalable, tech
I’ve been seriously getting my kicks with scalability for a number of months, so why not start an ongoing series where I talk about what I’ve learned and found?
So welcome to the first post in Building Something Scalable - An ongoing experiment. This post covers caching; I’ll cover delivery in the next post.
Keep in mind that language-wise I’m using PHP, but the general advice should be sound regardless of language. If you disagree with something or have a better way, feel free to comment.
So now that I’ve ranted off 4 topics, maybe I should expand on them a bit.
Use Caching
Caching is a good thing, but “use caching” is a pretty vague statement, so let’s expand.
Caching isn’t a one-ring-to-rule-them-all type of solution. It’s actually a mixture of different solutions that work together to boost your site’s or application’s overall performance.
Database caching
I consider database caching a two-part solution. You have the MySQL query cache, but I also like to have a server side query cache as well. Why? I tend to use OOP, and a server side query cache lets me cut overhead both in the application itself and by skipping the connection to and query against MySQL entirely.
The big issue with server side query caches is stale results. The MySQL query cache prevents stale results automatically (it invalidates cached queries when the underlying tables change), but with a server side query cache we’ll need to set a TTL (time to live). I tend to go with something really low, like 5-10 seconds.
5-10 seconds may seem pointless, but it allows higher traffic applications to handle a number of requests with fewer queries to MySQL. Every request answered from the server side cache within that window is one less query MySQL has to run, so your database works under less load than it would without the server side query cache.
There is plenty of information on the MySQL query cache online, so fire up Google and start researching. For your server side cache, here are a few things to keep in mind:
- keep your cache in a secure location. If you’re using a file cache, this means outside of your web directory.
- hashing is a quick and painless way to uniquely identify your queries. md5('select * from table1') will always return the same hash for the same query string (even a whitespace difference gives a different hash), so it makes a good cache key.
- prevent cache filename collisions.
- do a lightweight encoding on cache files. base64_encode / base64_decode are quick and easy to use. They’re not secure, but it’s a good idea to add some basic obfuscation.
- keep your TTL low. Your query cache should try to stay as fresh as possible.
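Here’s a minimal sketch of a file-based server side query cache that follows the tips above: an md5 key, a short TTL, base64 encoding, and a cache directory you pass in (which should live outside your web root). The function name cachedQuery() and the $runQuery callable are hypothetical; $runQuery stands in for whatever code actually sends the SQL to MySQL.

```php
<?php
// Minimal file-based server side query cache (sketch).
// Assumptions: $cacheDir lives outside the web root, and $runQuery is a
// hypothetical callable that actually sends the SQL to MySQL.
function cachedQuery($sql, callable $runQuery, $cacheDir, $ttl = 5)
{
    // md5() of the SQL text is a stable key: identical query strings map to
    // the same file, and the "query_" prefix keeps these files from colliding
    // with other caches that share the directory.
    $file = $cacheDir . '/query_' . md5($sql) . '.cache';

    // Hit: the file exists and is younger than the TTL.
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        $data = file_get_contents($file);
        if ($data !== false) {
            // base64 is light obfuscation only, not security.
            return unserialize(base64_decode($data));
        }
    }

    // Miss or stale entry: run the real query and cache the result.
    $result = $runQuery($sql);
    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0700, true);
    }
    // Write to a temp file, then rename, so readers never see a half-written file.
    $tmp = $file . '.' . uniqid('', true);
    file_put_contents($tmp, base64_encode(serialize($result)));
    rename($tmp, $file);
    return $result;
}
```

Usage would look something like cachedQuery('SELECT * FROM table1', $runQuery, '/var/cache/myapp', 5), where /var/cache/myapp is just an example path. Within each 5 second window, repeated calls with the same SQL are answered from the file instead of MySQL.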
Opcode Cache
PHP is compiled and run at runtime (when you request a page / script). Opcode caches store the compiled code so that your code doesn’t have to be compiled for every request. Opcode caches can increase your code’s performance by up to 90%, but then again, any increase helps the overall performance of your site / application.
There are a number of opcode caches available for PHP; I prefer XCache.
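Before counting on that speedup, it’s worth confirming an opcode cache is actually loaded. This is a small sketch that checks a handful of extension names; the list (XCache, APC, eAccelerator, and the later Zend OPcache) is an assumption, so add whatever your server runs. Note that “loaded” isn’t the same as “enabled”, since each cache has its own ini switch.

```php
<?php
// Report which known opcode cache extensions are loaded (sketch).
// The candidate names are an assumption covering common caches.
// "Loaded" is not the same as "enabled": each cache has its own ini switch.
function loadedOpcodeCaches()
{
    $candidates = array('XCache', 'apc', 'eAccelerator', 'Zend OPcache');
    $loaded = array();
    foreach ($candidates as $name) {
        if (extension_loaded($name)) { // the lookup is case-insensitive
            $loaded[] = $name;
        }
    }
    return $loaded;
}

$caches = loadedOpcodeCaches();
echo $caches
    ? 'Opcode cache(s) loaded: ' . implode(', ', $caches) . "\n"
    : "No known opcode cache loaded.\n";
```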
Static content cache
Static content, unlike dynamic content, is, well, static. You’re probably wondering: why cache something that’s already static? Simple: performance. Static content, though, is cached and served differently than dynamic content. I know this is touching more on delivery, but it’s still worth mentioning.
Static content is often served through a CDN (content delivery network) or a web cache. A CDN and a web cache act similarly, except that a CDN has a number of servers set up in various locations.
Content Delivery Network
A CDN acts just as its name says: it delivers your content via its network of servers. The CDN selects the server closest to the user and serves your content from that location. What’s the benefit? Faster delivery of your content. Is it worth it? That’s a question only you can answer. Do some research, compare the solutions, check your budget, and you’ll have your answer.
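One common way to hand static files to a CDN is to point your asset URLs at the hostname your CDN provider gives you, so browsers fetch images, CSS, and JavaScript from the CDN instead of your own server. A minimal sketch, where CDN_BASE_URL and cdnUrl() are hypothetical names and cdn.example.com is a placeholder:

```php
<?php
// Point static asset URLs at a CDN hostname (sketch).
// CDN_BASE_URL is hypothetical: use the hostname your CDN provider assigns.
define('CDN_BASE_URL', 'http://cdn.example.com');

function cdnUrl($path)
{
    // Guarantee exactly one slash between the host and the asset path.
    return CDN_BASE_URL . '/' . ltrim($path, '/');
}
```

In a template you’d write something like echo '<img src="' . htmlspecialchars(cdnUrl('/images/logo.png')) . '">';. Keeping the hostname in one place makes it easy to switch CDNs, or turn the CDN off, later.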
Web Cache
A web cache, or reverse proxy, simply put, delivers your content faster. I’m not too well versed in the science of it all, but here’s a basic breakdown of what I do know:
Web caching software like Varnish (in the past, Squid was the standard) handles serving static content better than Apache, and with a smaller footprint. The web cache stores a copy of your content the first time it’s requested and then delivers it from its memory / disk cache on later requests.
The most obvious benefit of all this? Reduced server load. Apache is a resource hog (there, I said it), but that will be covered in a future post in this series. By moving static content delivery to software built just for this task, you’re freeing resources and, of course, getting content to users more quickly.
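Whether a web cache keeps a response, and for how long, is driven largely by the HTTP caching headers on that response. Static files usually get these from the web server’s own config, but for anything PHP generates you can send them yourself. Here’s a minimal sketch; cacheHeaders() is a hypothetical helper, and how strictly Varnish or Squid honor these headers depends on how they’re configured.

```php
<?php
// Build HTTP caching headers that tell browsers and shared caches such as a
// reverse proxy how long a response may be reused (sketch).
function cacheHeaders($maxAge)
{
    $maxAge = (int) $maxAge;
    return array(
        // "public" allows shared caches (proxies, CDNs) to store the response.
        'Cache-Control: public, max-age=' . $maxAge,
        // Expires is the older HTTP/1.0 way to say the same thing, as a GMT date.
        'Expires: ' . gmdate('D, d M Y H:i:s', time() + $maxAge) . ' GMT',
    );
}

// In a real PHP response, send them before any output:
// foreach (cacheHeaders(3600) as $h) { header($h); }
```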
Output Cache
So far we’ve looked at a number of ways to increase the speed of dynamic and static content, but there’s still one major item left out: Output Caching.
As your scripts / application generate pages, you can cache them to be served for future requests. Output caches can be as basic or as complex as you need them to be. A few things to keep in mind:
- stale content: your cache should have a TTL (time to live) that prevents it from serving stale content
- filename collisions: your naming scheme should prevent filename collisions
- store your cache outside of your web folder
- logged in vs. non logged in users: come up with a solution that deals with this, since a page built for a logged in user usually shouldn’t be served to anyone else
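Here’s a minimal sketch of a full-page output cache that covers the points above: a TTL, md5-based filenames, a directory you’d move outside your web folder, and a bypass for logged in users. The constant names, the functions, and the temp-directory location are all hypothetical; the caller decides whether the visitor is logged in.

```php
<?php
// Minimal full-page output cache (sketch). Assumptions: OUTPUT_CACHE_DIR is
// a placeholder (point it outside your web root), and the caller decides
// whether the visitor is logged in.
define('OUTPUT_CACHE_DIR', sys_get_temp_dir() . '/output_cache');
define('OUTPUT_CACHE_TTL', 60); // seconds before a cached page is stale

// Capture everything $render echoes and return it as a string.
function renderToString(callable $render)
{
    ob_start();
    $render();
    return ob_get_clean();
}

function cachedPage($uri, callable $render, $loggedIn)
{
    // Logged in users may see personalized pages: never cache or serve those.
    if ($loggedIn) {
        return renderToString($render);
    }

    // md5 of the full URI (path + query string) gives one file per page.
    $file = OUTPUT_CACHE_DIR . '/page_' . md5($uri) . '.html';
    if (is_file($file) && (time() - filemtime($file)) < OUTPUT_CACHE_TTL) {
        $html = file_get_contents($file);
        if ($html !== false) {
            return $html;
        }
    }

    $html = renderToString($render);
    if (!is_dir(OUTPUT_CACHE_DIR)) {
        mkdir(OUTPUT_CACHE_DIR, 0700, true);
    }
    // Write to a temp file and rename so readers never get a partial page.
    $tmp = $file . '.' . uniqid('', true);
    file_put_contents($tmp, $html);
    rename($tmp, $file);
    return $html;
}
```

A front controller might call echo cachedPage($_SERVER['REQUEST_URI'], 'renderHomePage', isset($_SESSION['user_id']));, where renderHomePage and the user_id session key are hypothetical stand-ins for your own code.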
Variable / Object Caches
Your code has objects and variables, and often some of these objects are database intensive. An object / variable cache is a way to store those objects and variables between requests. The thing to keep in mind with these types of caches is speed.
It makes no sense to cache something like $var=1+1;. You can run that statement faster than you could fetch it from the cache. A good example of something to cache would be a class object that runs a number of queries on the database but accesses content that doesn’t change often. By caching that object you can skip a few database queries (or query cache lookups). Another would be a class object that generates a number of child class objects.
I could go on and on about this subject, but let’s get to the point. If your application runs on only one server, use a file cache. If your application uses more than one server, look into memcached / memcachedb.
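For the single-server case, here’s a minimal sketch of a file-backed object cache with a “get it, or build it and store it” method. The class name, the remember() method, and the key names are hypothetical. Cached objects come back through unserialize(), so their class must be defined (or autoloadable) before you read them.

```php
<?php
// A small "get or build" object cache backed by files (sketch), for the
// single-server case. Class, method, and key names are hypothetical.
class FileObjectCache
{
    private $dir;

    public function __construct($dir)
    {
        $this->dir = $dir;
        if (!is_dir($dir)) {
            mkdir($dir, 0700, true);
        }
    }

    // Return the cached value for $key if it is younger than $ttl seconds;
    // otherwise call $build, cache its result, and return it.
    public function remember($key, $ttl, callable $build)
    {
        $file = $this->dir . '/obj_' . md5($key) . '.cache';
        if (is_file($file) && (time() - filemtime($file)) < $ttl) {
            $data = file_get_contents($file);
            if ($data !== false) {
                // The object's class must be loaded for unserialize() to rebuild it.
                return unserialize($data);
            }
        }
        $value = $build(); // the expensive, query-heavy work happens only on a miss
        $tmp = $file . '.' . uniqid('', true);
        file_put_contents($tmp, serialize($value));
        rename($tmp, $file);
        return $value;
    }
}
```

For more than one server, the same remember() shape maps onto memcached: with the PECL Memcached extension you’d call $m->get($key), treat a Memcached::RES_NOTFOUND result code as a miss, build the object, and store it with $m->set($key, $value, $ttl).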
That’s it.
Hopefully that was short and sweet. The next post will cover delivery.
Greg - Out






