Storing Websites in Memory Using PHP

The proliferation of content management systems has allowed many more people to get a site online, which is a great thing. These content management systems tend to be quite abstract and “one size fits all”, so they often suffer from code bloat (and from a security standpoint, popular software is always a bigger prize for attackers…). The bare bones of the CMSs themselves are so abstract, with tiny functions and hooks, that serving a web request becomes a complicated matter. That’s not to say that these content management systems are slow, though they certainly are more resource intensive than serving static files.

For files in general (be it an image file, PHP script or static HTML file), operating systems are good at keeping regularly accessed files from disk cached in memory. The popular content management systems also tend to keep their own in-memory cache of the most regularly accessed data, to save reading it from a much slower disk. Memcached is an oft-mentioned service used for this. Some applications like MySQL’s InnoDB engine take care of their own file and memory caching, while MyISAM defers file caching decisions to the operating system. A file based content management system will typically be very quick when all the regularly used files it accesses are in the disk cache.

For any request that misses the disk cache, a disk seek is required, which is many, many times slower than reading from a cache or memory. See this short conversation about disk seeks and why they are the bottleneck in today’s computer world. Apparently, ‘disks are the new tape’… though SSDs are a very nice intermediate solution.

With that in mind, how about making a site that is simple and runs at (or very close to) your hardware limits? Loading a page of static content should be very quick, regardless of what content management system is used to generate the content. The following code is an example of storing a small website in memory, rather than on disk. If we wanted more speed, then we’d likely want to code this in something like C, removing the need for PHP and Apache. PHP has a range of semaphore and shared memory functions that allow you to store data in memory that persists between web requests, until the segment is removed or the machine is rebooted. Shared memory also allows you to share memory across applications, so your Java, Perl, C or whatever other language is able to access the same shared memory segment. So how can shared memory be used?
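As a minimal sketch of those shared memory functions, the snippet below writes a value into a System V segment and reads it straight back. It assumes PHP was built with the sysvshm extension; the function name shm_round_trip, the 64 Kilobyte size and whatever keys you pass in are illustrative choices, not part of the original script.

```php
<?php
// Minimal round trip through a System V shared memory segment.
// Assumption: the sysvshm extension is available. Returns null when it is not.

function shm_round_trip(int $segmentKey, int $variableKey, string $value): ?string
{
    if (!function_exists('shm_attach')) {
        return null; // sysvshm is not compiled into this PHP build
    }

    // Creates the segment if it does not exist yet, otherwise attaches to it.
    $segment = shm_attach($segmentKey, 64 * 1024, 0600);
    if ($segment === false) {
        return null;
    }

    // Variables inside a segment are addressed by integer keys.
    shm_put_var($segment, $variableKey, $value);

    $readBack = shm_has_var($segment, $variableKey)
        ? shm_get_var($segment, $variableKey)
        : null;

    // Detaching does not delete the data; only shm_remove() does that.
    shm_detach($segment);

    return $readBack;
}
```

Because detaching leaves the segment in place, a later request (or another process attaching with the same key) can still read the value, which is the property the rest of this article relies on.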

Storing HTML Templates in Memory

This example takes a 12 Megabyte Bootstrap template and compresses it into 3.3 Megabytes of shared memory. It assumes a reasonable knowledge of PHP in order to tweak it to your liking.

1. Download the template and extract the contents of the file into a web accessible folder. For the purposes of this example the folder is called test and resides in /var/www/test/, which is accessible in the browser via http://localhost/test

2. Create an .htaccess file in /var/www with the following contents. If you are not using Apache, then use the URL rewriting engine available on your preferred web server.

3. Save this PHP file as server.php in the test folder

4. Run the script once, preferably from the command line, with an exit(); placed after @shm::destroy();shm::create();. All the files in the folder test matching the content types we’re interested in will now be held in the shared memory segment, in a compressed format. Assuming that went well for you, comment out the @shm::destroy();shm::create(); line altogether, leaving only the call to shm::get();. If you have problems, ensure that everything is located in the right place and that you are allowed at least 4 Megabytes of shared memory. 8 Megabytes is a fairly common default so you should be OK there.
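If the create step fails, it can help to check how much room the compressed files actually need. The sketch below walks a folder and sums the gzip-compressed size of the matching files. It assumes the files are compressed with gzencode() at level 9, which may not be what server.php does, and the function name and extension list are made up for illustration.

```php
<?php
// Estimate the bytes needed to hold a folder's files once gzip-compressed.
// Assumptions: gzencode() level 9, and a caller-supplied list of extensions.

function compressed_size(string $root, array $extensions): int
{
    $total = 0;
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        if (!$file->isFile()) {
            continue;
        }
        // Only count the content types we intend to serve.
        if (!in_array(strtolower($file->getExtension()), $extensions, true)) {
            continue;
        }
        $total += strlen(gzencode(file_get_contents($file->getPathname()), 9));
    }

    return $total;
}
```

Something like compressed_size('/var/www/test', ['html', 'css', 'js', 'png']) gives a lower bound only, since each variable stored in the segment carries some serialization overhead on top. On Linux, the largest segment you are allowed can be read from /proc/sys/kernel/shmmax.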

A Quick Rundown of How it Works

1. Apache receives a request to your test folder and sees that it should be internally rewritten to /test/server.php. This populates the variable $_SERVER['REDIRECT_URL'] with the originally requested URL.

2. A call is made to shm::get() from our script

3. $_SERVER['REDIRECT_URL'] has the trailing directory stripped. It is checked to ensure there is no directory traversal, which could otherwise allow requests like http://localhost/test/../../../secrets.txt to reach files containing sensitive data accessible to the web server.

4. fileinode() is called to get the inode number of the file requested. This touches the hard disk with 1 disk seek when it is not already cached, but it’s quite likely it will be cached. The shared memory segment is then opened and checked to ensure the contents of the file exist in our shared memory, otherwise it’ll return a 404 response to the client. Inodes are used because they are a simple way to convert a pathname to an integer, which shm_get_var() and shm_put_var() require as the unique identification of a variable. You’re perfectly able to use a quick hashing scheme like murmurhash() in order to get the integer you need, though you’d have to consider possible collisions (unlikely as they are). To ensure there are no collisions, run through the files an extra time and check that each hash is generated only once across all filenames.

5. The extension of the URL is examined to determine which content type to return.

6. The client’s request headers are evaluated to see whether it will accept HTTP compression, and if so, it is served the compressed content as stored. Otherwise, the contents of shared memory are uncompressed and served. Most clients are able to deal with compression, and storing the content in compressed format saves memory (3.3 Megabytes versus 12 Megabytes).

7. The content is served to the client with the appropriate Content-Type and Content-Encoding.
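The request-handling steps above can be sketched roughly as follows. This is not the original server.php: the function names, the '/test/' prefix default, the content type list and the use of gzencode()/gzdecode() for the stored bytes are all assumptions made for illustration, and the Accept-Encoding check is deliberately simplified (it ignores the "*" wildcard).

```php
<?php
// Hypothetical helpers for steps 3, 5, 6 and 7. All names are illustrative.

const CONTENT_TYPES = [
    'html' => 'text/html; charset=utf-8',
    'css'  => 'text/css',
    'js'   => 'text/javascript',
    'png'  => 'image/png',
    'jpg'  => 'image/jpeg',
];

// Step 3: strip the site prefix and refuse anything with a ".." segment.
function relative_site_path(string $requestPath, string $prefix = '/test/'): ?string
{
    if (strncmp($requestPath, $prefix, strlen($prefix)) !== 0) {
        return null; // not under the site folder at all
    }
    $relative = (string) substr($requestPath, strlen($prefix));
    if (strpos($relative, "\0") !== false) {
        return null;
    }
    foreach (explode('/', str_replace('\\', '/', $relative)) as $segment) {
        if ($segment === '..') {
            return null; // directory traversal attempt
        }
    }
    return $relative;
}

// Step 5: map the file extension to a Content-Type, or null if unsupported.
function content_type_for(string $path): ?string
{
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return CONTENT_TYPES[$extension] ?? null;
}

// Step 6: does the Accept-Encoding header list gzip without "q=0"?
function client_accepts_gzip(string $acceptEncoding): bool
{
    foreach (explode(',', strtolower($acceptEncoding)) as $item) {
        $parts = array_map('trim', explode(';', $item));
        if ($parts[0] !== 'gzip') {
            continue;
        }
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/', $param, $m) && (float) $m[1] == 0.0) {
                return false; // "gzip;q=0" explicitly refuses gzip
            }
        }
        return true;
    }
    return false;
}

// Steps 6 and 7: pick headers and body from the gzip-compressed bytes in memory.
function build_response(string $gzippedBody, string $contentType, bool $acceptsGzip): array
{
    $headers = ['Content-Type: ' . $contentType, 'Vary: Accept-Encoding'];
    if ($acceptsGzip) {
        $headers[] = 'Content-Encoding: gzip';
        return [$headers, $gzippedBody]; // serve the stored bytes untouched
    }
    return [$headers, gzdecode($gzippedBody)];
}
```

In a real script, each returned header line would be passed to header() and the body echoed. The common case (a client that accepts gzip) involves no decompression at all, which is part of why storing the compressed form pays off.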

Some Possible Improvements

  • You may want the ability to dynamically add new files into the shared memory segment. Bear in mind the security considerations: you do not want to allow anyone to simply add their own content, so you will want some kind of authentication, or to separate the operations that create and edit the segment from those that read values out of it.
  • Consider an alternative to using fileinode(), so that you can avoid touching the disk entirely. If you use a hashing method, you could use some of the shared memory as a linked list to deal with collisions.
  • The content types are listed in 3 separate places in this example. You may want to at least dynamically create the .htaccess file to reduce that to 2.
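As a rough sketch of the hashing alternative, the helpers below group paths by an integer key and report any key shared by more than one path. crc32() stands in for MurmurHash here simply because it is built in and returns an integer; the function names are hypothetical, and the hash is passed in as a callable so you can swap in your own.

```php
<?php
// Group paths by integer key so collisions can be detected before storing.
// Assumption: crc32() is used as the default hash in place of MurmurHash.

function bucket_paths(array $paths, ?callable $hash = null): array
{
    $hash = $hash ?? 'crc32';
    $buckets = [];
    foreach ($paths as $path) {
        // Every path that hashes to the same key lands in the same bucket.
        $buckets[$hash($path)][] = $path;
    }
    return $buckets;
}

// Return only the buckets that hold more than one distinct path.
function find_collisions(array $buckets): array
{
    return array_filter($buckets, function (array $paths): bool {
        return count(array_unique($paths)) > 1;
    });
}
```

If find_collisions() comes back empty, each key can be used directly with shm_put_var(). Otherwise, one option in the spirit of the linked list idea is to store the whole bucket under its key as an array of path => contents, and look the path up inside it after fetching.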
