a

Murmurhash2 in PHP without the extension

Murmurhash is a nice and speedy hashing algorithm that is handy for creating hash values based on strings. I use it often as benchmarks suggest it is one of the speedier implementations out there. Murmurhash can create 32-bit or 128-bit outputs.

In PHP, if you are able to install extensions, then you can simply install the murmurhash extension * (see bottom of page for instructions) and be done with it. If you’re on shared hosting, here is an extensionless alternative to produce 32-bit outputs based on the 2nd version of the murmurhash algorithm.

Do note, it is many times slower than the extension implementation, simply because it’s a user-created function. The code itself is relatively efficient and mostly bitshifting anyway. I had to knock this together because I needed murmurhash in a shared hosting environment where installing extensions is not an option.

* More recently that particular link for instructions on how to install the murmurhash extension is no available. Here is the general gist of how to install the extension:

Creating CSS Sprites with PHP

When you have a number of small images that appear on a page, or a number of pages that you know one particular client is going to visit that include these small images, it makes sense to use CSS sprites to speed-up the rendering of your web page.

This has two advantages, namely:

  • It reduces the number of HTTP requests a client has to make to download the images on your page
  • Speeds up the rendering of the images and the web-page due to limits on the number of HTTP connections between a server and a client.

The slight downside is that you may make a CSS sprite that contains images that don’t appear on a particular page, and therefore the client doesn’t need. Some consideration has to be put into these situations to weigh the pro’s and con’s of using a sprite.

A fairly comprehensive analysis of using sprites can be found at css-sprites.com.

Using PHP to Create Sprites

It’s very easy to create a sprite using PHP, which can generate a single image from a collection of images, and also generate the CSS for you. This is a huge timesaver if you intend to make a number of sprites.

The following simple class will perform the following upon initiating the class:

A summary of what’s going on…

  • Accept 4 arguments to perform the sprite and CSS creation:
    1. $folder , the folder to read images from
    2. $output, the filename given to the output, $output.css and $output.png
    3. $x,$y, the dimensions of the images you want to consider, all other images are ignored. If you wish to have images of variable size you will also want to do some mathematical optimization, to fit the images into the smallest sprite dimension possible.
  • The $folder you provide will then be scanned for matching files
  • The sprite image is then created, with a size according to the number of images that will be put into it. A CSS file is also created.
  • For each image in the folder, the image is appended to the sprite image in its relevant position, and the position is logged in the CSS file. A simple counter is used to differentiate the classes declared in the CSS file.

Some Simple PHP Password Generation Functions

Passwords may well become a thing of the past in the not-so-distant future, as processing power doubles up year on year and makes the cracking of passwords in a brute force manner almost trivial.

This calculator of password complexity against brute force attacks illustrates how easy it is to compromise a password in an offline environment.

Fortunately for us web developers, things take an awful lot longer in an online scenario- due to the fact that the network latency and distance limits the number of password hacking attempts can occur in a reasonable timeframe. Various other layers of protection can be added too, blacklistings IP’s, locking accounts after a number of failed login attempts and generally paying attention nefarious requests.

There’s been numerous instances where a public facing vulnerability has resulted in password crackers gaining access to an entire database of usernames and passwords. The worst cases involve passwords that are not one-way encrypted at all, no computation is necessary to exploit them and unfortunately, can expose users who use the same passwords across different sites.

A not-so-bad cases involve one-way encryption of passwords using what’s considered to be weaker algorithms like MD5 that do not use salted passwords.

Summarily, when storing passwords, you need to use salting, preferably on a per-password basis and use what’s considered to be a secure algorithm. Have a look at this article (and comments) for a more involved look at attack scenarios and how to prevent them. From the user standpoint, using any kind of dictionary word or concatenation of words is asking for trouble, particularly if the database that stores them is not salting passwords.

However…

If you are generating passwords for your users, you will want to have a level of complexity that won’t be subject to brute force attacks. The following code is a simple PHP password generator that will include alphanumeric characters and optionally other printable characters (you can add more if you like). This is good for instances where you need to generate a random password for a user.

This fairly rudimentary function generates passwords of reasonable complexity, depending on length.

For my own personal passwords, it is a pain to be signed up to so many websites and having to memorise or maintain a list. Software like KeePass have sprung up in order to save the hassle of you remembering the long and complex passwords that are desirfed to maintain a good degree of confidence your account(s) are secure.

The following function is similar in notion, passwords do not have to be remembered here, simply the domain name that you use them on:

There is a caveat. Due to some temporal lapse in concentration of some web developers, some password fields insist on restricting what characters are in a password, or insist on certain groups of characters being present. This is fine for preventing people enter overly simple passwords, but can cause issues in my above implementation. For those occasions you may want to just use a part of the hex representation of the hash and calculate that some of the characters should become uppercase.

Storing Websites in Memory Using PHP

The proliferation of content management systems has allowed many more people to get a site online, which is a great thing. These content management systems tend to be quite abstract and “one size fits all”, so they often suffer from code-bloat (and from the security aspect, popular software is always a bigger prize target for hackers…). The bare-bones of the CMS’s themself are so abstract with tiny functions and hooks that turns serving a web request into a complicated matter. That’s not to say that these content management systems are slow, though they certainly are more resource intensive than serving static files.

For files in general (be it an image file, PHP script or static HTML file), Operating Systems are good at caching regularly accessed files on disk. The popular content management systems also tend to have a cache in memory of the most regularly accessed files, to save reading them from a much slower disk. MemCached is an oft mentioned service that is used. Some applications like MySQL’s InnoDB engine take care of their own file and memory caching, while MyISAM defers file caching decisions to the operating system. A file based content management system will typically be very quick when all the regularly used files it accesses are in the disk cache.

For all other requests out of the disk cache, disk seeking is required, which is many, many times slower than using a cache or memory. See this short conversation about disk seeks and why they are the bottleneck in today’s computer world. Apparently, ‘disks are the new tape’… though SSD’s are a very nice intermediate solution.

With that in hand, how about making a site that is fast and simple, and is as fast (or very close) to your hardware limits? Loading a page of static content should be very quick, regardless of what content management system is used to generate the content. The following code is an example of storing a small website in memory, rather than on disk. If we wanted more speed, then we’d likely want to code this in something like C, removing the need for PHP and Apache. PHP has a range of semaphore and shared memory functions that allows you to store data permanently in memory that is persistent between web requests. Shared memory also allows you to share memory across applications, so your Java, Perl, C or whatever other language is able to access the same shared memory segment. So how can shared memory be used?

Storing HTML Templates in Memory

This example takes a 12 Megabyte Bootstrap template and compresses it into 3.3 Megabytes of shared memory. It assumes a reasonable knowledge of PHP in order to tweak it to your liking. 1. Download the template and extract the contents of the file into a web accessible folder, for the purposes of this example the folder is called test and resides in /var/www/test/, which is accessible in the browser via http://localhost/test 2. Create an .htaccess file in /var/www with the following contents. If you are not using Apache, then use the URL rewriting engine available on your preferred web server.

3. Save this PHP file as server.php in the test folder

4. Run the script once, preferably from the command line and putting an exit(); after @shm::destroy();shm::create();. All the files in the folder test matching the content types we’re interested in will now be held in the shared memory segment, in a compressed format. Assuming that went well for you,  comment out the @shm::destroy();shm::create(); line altogether, simply leaving the call to shm::get();. If you have problems, ensure that everything is located in the right place and that you are allowed to have < 4 Megabytes of shared memory. 8 Megabytes is a fairly common default so you should be OK there.

A Quick Rundown of How it Works

1. Apache receives a request to your test folder and sees that it should be internally rewritten to /test/server.php. This populates the variable $_SERVER['REDIRECT_URL'] with the originally requested URL.

2. A call is made to shm::get() from our script

3. $_SERVER['REDIRECT_URL'] has the trailing directory stripped. It is checked to ensure there is no directory traversal which could lead to requests like http://localhost/test/../../../secrets.txt which may contain sensitive accessible to the web server.

4. fileinode() is called to get the inode number of the file requested. This touches the hard disk with 1 disk seek when it is not already cached, but it’s quite likely it will be cached. The shared memory segment is then opened and checked to ensure the contents of the file exist in our shared memory, otherwise it’ll return a 404 response to the client. Inodes were used as it is a simple way to convert a pathname to an integer, which shm_get_var() and shm_put_var() require as unique identificaiton of a variable. You’re perfectly able to use a quick hashing scheme like murmurhash() in order to get the integer you need, though you’d have to consider possible collisions (though they are unlikely). To ensure no collisions, run through the files an extra time and check that each hash is generated only once for all filenames.

5. The extension of the URL is examined to determine which content type to return.

6. The clients request headers are evaluated to see whether they’ll accept HTTP compression, and if so, they will get served compressed content. Otherwise, the contents of shared memory are uncompressed and served. Most clients are able to deal with compression and it saves memory by storing it in compressed format (3.3 Megabytes versus 12 Megabytes)

7. The content is served to the client with the appropriate Content-Type and Content-Encoding.

Some Possible Improvements

  • You may want the ability to dynamically add new files into the shared memory segment. Bear in mind security considerations, i.e. you do not want to allow anyone to simply add their own content, you will want some kind of authentication or separate the creation/editing operations of the segment with the selecting of values from it.
  • Consider an alternative to using fileinode() , and you can avoid touching the disk entirely. If you use a hashing method, you could use some of the shared memory as a linked list to deal with collisions.
  • The content types are listed in 3 separate places in this example, you may want to at least dynamically create the .htaccess file to reduce that to 2.

Simple PHP & MySQL Pagination

When looking at MySQL output, it is sometimes more convenient to split up the number of records returned into separate pages and include hyperlinks to further pages in the result set, a layout often referred to as pagination.

The following is an example of such pagination. Change the MySQL query in the example at the foot of the code to see it working for yourself, remembering to connect to your MySQL database beforehand. This code is designed for simplicity rather than considering the finer details of pagination (mentioned below).

First off, create a test table if you wish to test the code:

Add some test data

This is the simple PHP class to illustrate basic pagination:

Produces something like…

In most cases and in particular for small tables, this method of pagination is fine as it’s a relatively inexpensive computation and allows you to jump to any page you like.

For larger tables you will find that an alternative method is preferred, this post goes into detail why. The post is useful in understanding the general concepts regarding performance and pagination from the MySQL point of view.