Forking with PHP from the command line

Forking new processes is an extremely handy technique that lets you run tasks in parallel from a single invocation of a program.

You may be interested in forking if:

  • You have a multi-processor/threaded CPU and want to utilise it more effectively
  • You want something to run in the background while your main thread of execution continues
  • You have a set of tasks that take an appreciable time to complete, but do not rely on one another's results

As ever, an introduction to the concept is available in the PHP manual.

It is worth noting early on that forking is slightly different to threading, which is described in more detail in this StackOverflow question. Historically, threading has not been available in PHP, though there have been developments towards remedying that.

One popular example usage is HTTP fetching. Fetching is a relatively slow process because of all the latency involved in talking to servers across the world. If you have a queue of 1000 URLs to fetch one after another and each URL takes 3 seconds, it will take 3000 seconds to fetch them all. Slow or unresponsive servers push your average higher, and URLs later in the queue have to wait for all the slower URLs in front of them to be fetched.

With forking (or threading), you can split the workload between instances of the script. In the URL fetching example, you could create 10 forks of the fetching script that fetch 100 URLs each. In the ideal case that cuts the total from 3000 seconds to roughly 300, and if one particular URL is slow, it only holds up its own fork: your 9 other forked scripts carry on fetching the URLs in their queues.

I have provided skeleton code below to give you an idea of how it can work for you.
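As a rough illustration of the fork-and-wait pattern, here is a minimal sketch that assumes the PHP CLI with the pcntl extension enabled (so not Windows). The names `run_in_forks` and `process_job` are hypothetical, and `process_job` is only a placeholder for real work such as fetching a URL.

```php
<?php
// Sketch only: needs the PHP CLI with the pcntl extension.

/** Hypothetical stand-in for real work, e.g. fetching one URL. */
function process_job($job): void
{
    usleep(10000); // pretend the job takes 10 ms
}

/**
 * Split $jobs into roughly equal chunks, fork one child per chunk,
 * then wait for every child. Returns a map of child PID => exit code
 * (-1 if the child did not exit normally).
 */
function run_in_forks(array $jobs, int $workers): array
{
    $chunkSize = max(1, (int) ceil(count($jobs) / $workers));
    $chunks = array_chunk($jobs, $chunkSize);
    $children = [];

    foreach ($chunks as $chunk) {
        $pid = pcntl_fork();

        if ($pid === -1) {
            // Fork failed: stop forking, but still wait for children already started.
            fwrite(STDERR, "Could not fork\n");
            break;
        }

        if ($pid === 0) {
            // Child process: work through its own chunk only.
            foreach ($chunk as $job) {
                process_job($job);
            }
            // The child must exit here, otherwise it would carry on
            // round the loop and start forking children of its own.
            exit(0);
        }

        // Parent process: remember the child and keep forking.
        $children[] = $pid;
    }

    $exitCodes = [];
    foreach ($children as $pid) {
        pcntl_waitpid($pid, $status); // blocks until this child finishes
        $exitCodes[$pid] = pcntl_wifexited($status) ? pcntl_wexitstatus($status) : -1;
    }

    return $exitCodes;
}
```

Calling `run_in_forks($urls, 10)` with a hypothetical array of 1000 URLs would correspond to the ten-fork example described earlier. Waiting on every child PID in the parent also stops finished children from lingering as zombie processes.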

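One important thing to consider when forking scripts is to avoid the nastiness of a fork bomb (forks that keep forking until the machine runs out of resources) or the unpredictability of a race condition (a result that depends on which fork happens to touch shared data first). Bear these concepts in mind as you delve into the usefulness of multi-tasking with forks or threads.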
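A race condition typically appears when several forks read and write the same file at once. One common way to guard against that is an advisory lock with `flock()`. The sketch below assumes a hypothetical shared counter file and a hypothetical helper name, `claim_next_index`; each caller gets a number that no other caller receives, provided every fork opens the file itself after forking.

```php
<?php
/**
 * Atomically read and increment a counter stored in $counterFile.
 * Returns the value claimed by this caller (0, 1, 2, ...).
 */
function claim_next_index(string $counterFile): int
{
    // 'c+' opens for read/write, creates the file if missing, never truncates.
    $fh = fopen($counterFile, 'c+');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $counterFile");
    }

    try {
        // Block until this process holds the exclusive lock.
        if (!flock($fh, LOCK_EX)) {
            throw new RuntimeException("Cannot lock $counterFile");
        }

        // An empty file casts to 0, so the first caller claims index 0.
        $current = (int) stream_get_contents($fh);

        // Overwrite the file with the next value while still holding the lock.
        ftruncate($fh, 0);
        rewind($fh);
        fwrite($fh, (string) ($current + 1));
        fflush($fh);

        flock($fh, LOCK_UN);
        return $current;
    } finally {
        fclose($fh);
    }
}
```

Note that `flock()` locks are advisory: they only protect the file from other code that also asks for the lock.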

Workarounds for the problem of two forks picking up the same job are quite easy. In a text file for instance, with 10 forks you would want each script instance to grab every 10th line, so the 1st fork would grab the 1st line, the 11th line, the 21st line etc. Alternatively, you can have one fork that “serves” lines to the other forks (like in the example above), so that each line is only issued once. If you’re using a database as input and it has an auto-increment field, simply use a modulus of the auto-increment as a quick’n’dirty way to delegate an equal number of rows to each fork. Essentially, you’re looking to keep each fork busy and avoid allocating the same job twice.
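The every-Nth-line and modulus ideas can be sketched as two small helpers. The function names are hypothetical, and forks are numbered from zero here, so fork 0 is the "1st fork" in the description above.

```php
<?php
/**
 * Return the lines one fork should handle: every $forkCount-th line,
 * starting at zero-based position $forkIndex.
 */
function jobs_for_fork(array $lines, int $forkIndex, int $forkCount): array
{
    $mine = [];
    // array_values() guarantees positions 0, 1, 2, ... whatever the original keys were.
    foreach (array_values($lines) as $position => $line) {
        if ($position % $forkCount === $forkIndex) {
            $mine[] = $line;
        }
    }
    return $mine;
}

/**
 * Same idea for database rows: a fork owns a row when the row's
 * auto-increment id leaves the fork's index as its remainder.
 */
function fork_owns_row(int $id, int $forkIndex, int $forkCount): bool
{
    return $id % $forkCount === $forkIndex;
}
```

With 10 forks, `jobs_for_fork($lines, 0, 10)` picks the 1st, 11th and 21st lines, and every line lands in exactly one fork's share. The row split is only as even as the ids are: gaps left by deleted rows will skew it slightly.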
