
Saturday, February 26, 2011

How to Minify HTML using PHP and Minify (including final HTML output of PHP files)

Full disclaimer: I set this method up but did not end up using it on the project I built it for. I explain why below.

Before Doing Anything Else, Gzip Your Pages!

The first line of defense in reducing the payload of an HTML page is to gzip (compress) it. If you haven't done that yet, do it before anything else mentioned here: it will provide the biggest payoff.
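If your server isn't already compressing pages, PHP can gzip its own output. This is a minimal sketch, assuming compression is not already switched on elsewhere (Apache's mod_deflate or PHP's zlib.output_compression setting). PHP's built-in ob_gzhandler callback compresses the buffered page only when the browser's Accept-Encoding header says it can handle gzip, and otherwise sends the page unchanged. The echoed markup is a placeholder.

```php
<?php
// Sketch: let PHP gzip its own output. Assumes neither mod_deflate nor
// zlib.output_compression is already compressing this site's pages.
// ob_gzhandler (from PHP's zlib extension) compresses the buffer only when
// the browser sends "Accept-Encoding: gzip"; otherwise the page goes out as is.
ob_start('ob_gzhandler');

echo "<!DOCTYPE html>\n";
echo "<html><body><p>Page content goes here.</p></body></html>\n";
```

If you later add the minifying callback described below, start ob_gzhandler first so it is the outer buffer. The inner minifier then runs first, and its output is compressed on the way out.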

Benchmark Your Speed, Before and After

As I said, I didn't end up using this method on the project I was working on. Once your pages are gzipped, the benefit of further compression is often negligible. Because the minification itself runs on the server for every request, it can even slow your load time, the very thing you're hoping to improve.

So test the speed of your pages before and after minification. Several free online tools are available. I happened to use Webwait, which I liked because it is simple, does not require registration, and does not place restrictions on its use.

In my case, results revealed that I was actually suffering a slight penalty in load time when stripping whitespace. That said, I have a writing site that has massive HTML pages, and when I have time, I will likely try it again there: I suspect that for very large files, minification by PHP before page load would provide a benefit.
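Load time is the number that matters. Still, a quick byte count can hint at whether stripping whitespace buys much once gzip is on. Below is a rough sketch, assuming PHP's zlib extension (for gzencode) is available. The repeated table rows and the naive strip closure are stand-ins for your real page and minifier, and the payload_sizes helper is hypothetical.

```php
<?php
// Rough payload comparison: bytes before/after whitespace stripping,
// each with and without gzip. Byte counts say nothing about load time.
function payload_sizes(string $html, callable $minify): array
{
    $min = $minify($html);
    return [
        'raw'           => strlen($html),
        'raw_gzip'      => strlen(gzencode($html)),
        'minified'      => strlen($min),
        'minified_gzip' => strlen(gzencode($min)),
    ];
}

// Stand-in page: 500 indented table rows, 28 bytes each.
$html = str_repeat("<tr>\n\t\t<td>cell</td>\n\t</tr>\n", 500);

// Stand-in minifier: delete every CR, LF and tab.
$strip = function (string $s): string {
    return preg_replace('/[\r\n\t]+/', '', $s);
};

print_r(payload_sizes($html, $strip));
```

Compare the two gzipped figures. Their difference, not the difference between the raw figures, is what minification saves on the wire once gzip is on.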

Enough preamble.

Whitespace Removal: Basic Steps

There are several steps you need to take to set it up and get it working.

PHP Check:

First, is PHP already being used to render your page? If so, you're good to go as is; skip this paragraph. If not, does the page have an "htm", "html", or "shtml" file extension, and does that extension matter? If it doesn't, just change the file extension to "php". As long as your web host supports PHP, you should be good. If the extension does matter and you're hosted on an Apache server, take a look at Step 4 here on how to give PHP control of rendering your pages.

Note: If your extension is "shtml" and you currently make full use of server-side includes (SSI), you will need to reconfigure your pages to use PHP for the same task or else find a different solution. The beauty is, PHP can do everything Apache's support of SSI can and more.

Basic PHP Function with Associated "Gotchas" Explained:

Now we need to write a PHP function. Let's create a new file called compress_html.php that we'll include on every page we want to minify. Open the IDE or text editor of your choice and create this new file.

Now let's type a simple function.
<?php
// This function is probably too simple to use: It can bite.
function replace_tabs_newlines($content) {
    // the parentheses are the regex delimiters: match any CR, LF or tab
    return preg_replace('(\r|\n|\t)', '', $content);
}
A very simple function like this, which strips newline characters and tabs, might be all you need. The danger comes when your site includes inline JavaScript (or JavaScript inside CDATA sections) that uses single-line comments (double slashes), or pre-filled textareas whose text needs its newlines to stay readable. For example...
<script type="text/javascript">
// click event
myFormElement.onclick = function() {
    this.submit();
};
</script>
...our simple PHP function would cause all the JavaScript below the "click event" comment to fail, because it's just stripping newlines and tabs from the entire HTML page indiscriminately. Once the document gets parsed by the browser, as far as the interpreter is concerned, without the newlines the script tag contains nothing but a big, long comment: // click eventmyFormElement.onclick = function() {this.submit();};. That means, of course, that your click event just died a horrid death.
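To see the failure for yourself, here is a small standalone harness. It is not part of the setup. It runs the same too-simple function over the click-event snippet, rebuilt as a PHP string, and the indentation inside it is assumed:

```php
<?php
// Demo of the gotcha: the naive stripper applied to the click-event snippet.
function replace_tabs_newlines($content) {
    // Parentheses are the regex delimiters: match any CR, LF or tab.
    return preg_replace('(\r|\n|\t)', '', $content);
}

$html = "<script type=\"text/javascript\">\n"
      . "// click event\n"
      . "myFormElement.onclick = function() {\n"
      . "\tthis.submit();\n"
      . "};\n"
      . "</script>";

// Everything after "//" now sits on one line, so the whole script is a comment:
// <script type="text/javascript">// click eventmyFormElement.onclick = function() {this.submit();};</script>
echo replace_tabs_newlines($html), "\n";
```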

And if you have any textareas on your page with text already included in them, you'll also see just one big, long string. Probably not what you want.
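If you'd rather not bring in a library, one way to dodge both problems is to set the fragile blocks aside before touching the whitespace and put them back afterwards. This is a minimal sketch with stated assumptions. The helper's name and its list of protected tags (script, style, pre, textarea) are illustrative. It is regex-based, not a real HTML parser, so unusual markup can trip it up. It also deliberately collapses each whitespace run to a single space instead of deleting it. A newline between two words in a paragraph is all that keeps them apart, so deleting it outright would glue the words together.

```php
<?php
// Hypothetical helper (name and protected-tag list are illustrative):
// collapse whitespace everywhere EXCEPT inside <script>, <style>, <pre>
// and <textarea>, which are set aside first and restored untouched.
// Regex-based sketch, not an HTML parser.
function collapse_whitespace_outside_blocks(string $html): string
{
    $reserved = [];

    // 1. Swap each protected block for a placeholder token.
    $html = preg_replace_callback(
        '#<(script|style|pre|textarea)\b[^>]*>.*?</\1>#is',
        function (array $m) use (&$reserved): string {
            $token = "\x1A" . count($reserved) . "\x1A";
            $reserved[$token] = $m[0];
            return $token;
        },
        $html
    );

    // 2. Collapse each run of spaces, tabs and newlines to one space,
    //    which keeps neighboring words apart.
    $html = preg_replace('/[ \t\r\n]+/', ' ', $html);

    // 3. Restore the protected blocks exactly as they were.
    return strtr($html, $reserved);
}
```

To try it, pass its name to ob_start in place of replace_tabs_newlines.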

Wrapper Function Using Minify:

My approach was to use Minify's little-documented and still-experimental HTML minifier instead. Simply download Minify, and check out this page on getting it set up and running.

Once Minify is in place, here's the new and improved function, again assumed to be saved in a file named compress_html.php. There's still nothing exotic about what is now a wrapper function. If you know PHP, you could probably write a better one (if you do, please post it here and share it with the rest of us), but this one will at least get your foot in the door.
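<?php
function replace_tabs_newlines($content) {
    // load Minify's HTML, CSS and JS minifiers (assumes a min/ directory alongside)
    require_once 'min/lib/Minify/HTML.php';
    require_once 'min/lib/Minify/CSS.php';
    require_once 'min/lib/JSMin.php';
    // minify the page, handing inline CSS and JS to their own minifiers
    $content = Minify_HTML::minify($content, array(
        'cssMinifier' => array('Minify_CSS', 'minify'),
        'jsMinifier' => array('JSMin', 'minify')
    ));
    return $content;
}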
With a callback like the one above in place, inline JavaScript and CSS should now be handled correctly. Minify hands them to their own minifiers (JSMin and Minify_CSS) instead of blindly stripping their newlines.

Serving Minified HTML on a Platter:

Now add the following code at the very top of the PHP page (or PHP-handled HTML page) you want to minify. It includes the file that contains our replace_tabs_newlines function and calls ob_start:
<?php
// relative includes resolve from this page's own directory here:
// this example assumes compress_html.php lives at the same directory
// level as the file in which you'll include these lines
include 'compress_html.php';
ob_start('replace_tabs_newlines');
?>
ob_start tells PHP to hold everything the page outputs in an internal memory buffer instead of sending it straight to the browser (think "output buffering start": you are turning output buffering on). Passing "replace_tabs_newlines" tells ob_start to run the buffered output through the function we just wrote and included from a separate file. Notice the quotation marks (single or double doesn't matter): the callback is given as a string naming the function. That is the simplest form, though ob_start will accept any PHP callable.
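Here is the mechanism in isolation, as a standalone script you can run from the command line. The callback is a stand-in (the too-simple stripper from earlier), and the list markup is made up:

```php
<?php
// Output buffering with a callback: everything echoed after ob_start()
// is held in memory, then passed to the callback as one string.
function replace_tabs_newlines($content) {
    // Stand-in callback: the naive tab/newline stripper from earlier.
    return preg_replace('(\r|\n|\t)', '', $content);
}

ob_start('replace_tabs_newlines');   // callback named by a string

echo "<ul>\n";
echo "\t<li>One</li>\n";
echo "\t<li>Two</li>\n";
echo "</ul>\n";

ob_end_flush();   // buffer -> replace_tabs_newlines() -> output
// Prints: <ul><li>One</li><li>Two</li></ul>
```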

Then, at the very bottom of the file, after all other code and markup, add the following:
<?php ob_end_flush(); ?>
This does just what its name says. It ends output buffering and flushes what was held in the buffer, sending it through "replace_tabs_newlines" (or whatever callback you passed to ob_start) on the way out. If ob_start was given no callback, the contents go to the browser unchanged.

Final Check:

View your page's source in the browser, and you should see that whitespace and newlines* were stripped before the page was sent.

That's it! Make sure to test your page speed before and after, with Webwait or whatever tool you prefer. If you see a performance increase, go for it! :)

Update on Getting a Totally Blank Output Page

I host through 1&1, and ran into a caveat when testing this on my writing site (which incidentally did improve speed performance for the large pages): I had to edit the file min/lib/Minify/CSS.php, taking out the three references to Minify in the require_once statements:
// edits made to "min/lib/Minify/CSS.php"

public static function minify($css, $options = array())
{
    require_once 'CSS/Compressor.php'; // removed "Minify/" from the front
    if (isset($options['preserveComments'])
        && !$options['preserveComments']) {
        $css = Minify_CSS_Compressor::process($css, $options);
    } else {
        require_once 'CommentPreserver.php'; // removed "Minify/" from the front
        $css = Minify_CommentPreserver::process(
            $css
            ,array('Minify_CSS_Compressor', 'process')
            ,array($options)
        );
    }
    if (! isset($options['currentDir']) && ! isset($options['prependRelativePath'])) {
        return $css;
    }
    require_once 'CSS/UriRewriter.php'; // removed "Minify/" from the front
    if (isset($options['currentDir'])) {
        return Minify_CSS_UriRewriter::rewrite(
            $css
            ,$options['currentDir']
            ,isset($options['docRoot']) ? $options['docRoot'] : $_SERVER['DOCUMENT_ROOT']
            ,isset($options['symlinks']) ? $options['symlinks'] : array()
        );
    } else {
        return Minify_CSS_UriRewriter::prepend(
            $css
            ,$options['prependRelativePath']
        );
    }
}
Note that (a) I am using the Minify library a directory level deeper (/some_dir/min), and (b) I only ran into problems when including it in files that lived in directories deeper than root, not at the root level itself.

However, I suspect that even if (a) were not an issue, (b) would still apply. And since CSS.php physically lives in the Minify directory, there's no need to name that directory at the front of the require_once path anyway. When PHP can't find a relative path like 'CSS/Compressor.php' on its include_path, it falls back to the directory of the file doing the including.

I can also verify that modifying this file in this way works equally well in root as it does any number of directory levels deeper.
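An alternative that leaves Minify's files untouched is to put Minify's lib directory on PHP's include_path. That is where the original require_once 'Minify/...' lines look first, so the stock CSS.php should then resolve without edits. This is a minimal sketch for the top of compress_html.php. It assumes Minify was unpacked to a min directory next to that file, so adjust $minifyLib to match your layout.

```php
<?php
// compress_html.php (sketch). Assumption: Minify lives in ./min next to
// this file, so its library classes are in ./min/lib.
$minifyLib = __DIR__ . '/min/lib';

// Prepend Minify's lib dir to the include_path so that Minify's own
// require_once 'Minify/...' statements resolve from any directory depth.
set_include_path($minifyLib . PATH_SEPARATOR . get_include_path());

function replace_tabs_newlines($content) {
    // With min/lib on the include_path, these need no "min/lib/" prefix.
    require_once 'Minify/HTML.php';
    require_once 'Minify/CSS.php';
    require_once 'JSMin.php';
    return Minify_HTML::minify($content, array(
        'cssMinifier' => array('Minify_CSS', 'minify'),
        'jsMinifier'  => array('JSMin', 'minify'),
    ));
}
```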

* Depending on its heuristics and your particular page, Minify does not necessarily strip every newline. If you can be sure that stripping all newlines and tabs won't break anything in your setup, you can replace "return $content;" with the return line from the otherwise too-simple first example. Minify will already have stripped the comments from the inline CSS and JS, and you'll be stripping whatever whitespace it "missed." You still have any pre-populated textareas to consider, though: those are inputs Minify knows to leave well enough alone.