Compressing only html/css/js/etc files for use with nginx’s “gzip static”

This was something I’ve wanted to do for a long time… since I was messing around with everything else on the server, I figured now would be a good time.

The rationale behind it is that if you have pre-compressed versions of a file available and use the “gzip_static” directive in nginx, it’ll save nginx from having to compress it on-the-fly. So less server load, and visitors hopefully get served the page just a tiny bit faster. This also makes the max compression in gzip (-9) a little more palatable, since the compression cost is paid once up front rather than on every request.
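
For reference, the nginx side of this is a single directive (a minimal sketch - the server_name/root values here are placeholders, and nginx has to have been built with the gzip_static module for it to work):

server {
    listen 80;
    server_name testing.com;
    root /var/www/testing.com;

    # If a pre-compressed file.css.gz exists alongside file.css, serve it
    # directly instead of compressing file.css on-the-fly.
    gzip_static on;
}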

The problems I needed to solve were as follows:

  1. I needed to be able to specify which files to compress based on the extension. Only compress CSS, HTML, JS, etc.
  2. I wanted it set up in a way such that it could be automated. In other words, so that I could either manually run a script that would look-at-and-take-care-of-everything, or handle it in a cron job that would look to see if any of the files changed, and update the .gz version if so.
  3. This was being done on an Ubuntu 14.04 server, so of the zillion ways you might do something in a bash script, whatever I used had to work on Ubuntu 14.04.

Solving #1 was a fairly simple one-liner, though note that you should BACK UP first and make sure you have a recent version of gzip before trying this, since the “-k” (keep) option is pretty new and it would be unfortunate if you suffered the old behavior (without -k, gzip deletes the original and leaves only the .gz):

find /var/www/testing.com -type f -regextype posix-extended -regex '.*\.(htm|css|html|js)' -exec gzip -k -9 {} \;

That’s a quick and dirty (not so great) way of manually gzipping it all and getting prompts asking if you want to overwrite any previous .gz versions. Kinda terrible, but one line.
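
If you’d rather not deal with the prompts, gzip’s -f flag will force it to overwrite any existing .gz versions (again: back up and test first):

find /var/www/testing.com -type f -regextype posix-extended -regex '.*\.(htm|css|html|js)' -exec gzip -f -k -9 {} \;

The catch is that it recompresses every single file on every run, which is the main thing the script below avoids.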

A better way that addressed #2 is via a shell script. This one’s a little messy, but hopefully easy enough to follow:

#!/bin/bash

LOCATION="/var/www/testing.com"
FILES="htm|css|html|js"

process() {
        DEBUG=1
        SLEEP_DELAY=0.1

        FILE="$1"

        if [ -f "$FILE".gz ]
        then
                # A .gz already exists - compare modification times and only
                # recompress if the original is newer.
                FILE_ORIG=$(stat -c %Y "$FILE")
                FILE_GZIP=$(stat -c %Y "$FILE".gz)
                if [ "$FILE_ORIG" -gt "$FILE_GZIP" ]
                then
                        rm "$FILE".gz
                        gzip -k -9 "$FILE"
                        if [ "$DEBUG" == 1 ]
                        then
                                echo "Deleted old .gz and created new one at: $FILE.gz"
                                sleep $SLEEP_DELAY
                        fi
                else
                        if [ "$DEBUG" == 1 ]
                        then
                                echo "Skipping - Already up to date: $FILE.gz"
                        fi
                fi
        else
                # No .gz yet - create one.
                gzip -k -9 "$FILE"
                if [ "$DEBUG" == 1 ]
                then
                        echo "Created new: $FILE.gz"
                        sleep $SLEEP_DELAY
                fi
        fi
}
export -f process

# Pass each file name to process() as an argument ($1) rather than splicing
# it into the command string, so odd characters aren't interpreted by the shell.
find "$LOCATION" -type f -regextype posix-extended -regex '.*\.('"$FILES"')' -exec /bin/bash -c 'process "$1"' _ {} \;

The stuff meant to be easily tweakable (LOCATION, FILES, DEBUG, and SLEEP_DELAY) is listed at the top of the script and of the process() function.

What it essentially does:

  1. Does a find for everything with an html/htm/css/js extension.
  2. Checks to see if a gzipped version already exists.
  3. If a gzipped version DOES already exist, it compares the timestamp against the original. If the original isn’t newer, it does nothing. If the original is newer, it deletes the old gzip and creates a new one.
  4. If a gzipped version DOESN’T already exist, it creates a gzip.
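
Since it only touches .gz files that are missing or stale, it’s safe to run repeatedly, which takes care of the cron half of #2. Assuming you saved the script to a (hypothetical) /usr/local/bin/gzipsite.sh and made it executable, a crontab entry like this would run it nightly at 3am:

0 3 * * * /usr/local/bin/gzipsite.sh

With DEBUG=0 it runs silently; with DEBUG=1 cron will try to mail you the output if your system’s set up for that.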

A few things to note:

  • Again, this works on this particular linux distro with the latest version of gzip. Do a test run on some non-important data before trying to use this.
  • DEBUG=1 spits out data about what happened for each file – whether it was gzipped, skipped, or an old gzip was replaced. After you’ve run it successfully once and are sure that nothing-crazy-happened, you can probably set DEBUG=0.
  • SLEEP_DELAY creates a small delay after each compression (it won’t add a delay on skipped files), with the intent being that it doesn’t cause the server load to skyrocket if you have a lot of files and the server’s already heavily loaded. Since compressing 1 million files would take about 28 hours with this setting, you may want to tweak it.
  • You could tweak this in a number of ways to make it better suited to your uses – a couple of examples (a rough sketch combining both follows this list):
    • adding another check so that you don’t compress files under a certain size.
    • saving everything in the process() function to its own script and then calling it with something like find /var/www/testing.com -type f -regextype posix-extended -regex '.*\.(htm|css|html)' -exec ./mynewscript.sh {} \; to make it a little more portable if you want to use the same script to affect different file extensions or directories.
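
Here’s a rough sketch of those two tweaks combined into a stand-alone script (untested - the “mynewscript.sh” name and the 1024-byte cutoff are just placeholders):

#!/bin/bash
# mynewscript.sh - expects a single file name as its argument (from find).

FILE="$1"
MIN_SIZE=1024

# Skip files below MIN_SIZE bytes - there's barely anything to gain there.
if [ "$(stat -c %s "$FILE")" -lt "$MIN_SIZE" ]
then
        exit 0
fi

if [ -f "$FILE".gz ]
then
        # Recompress only if the original is newer than the existing .gz.
        if [ "$(stat -c %Y "$FILE")" -gt "$(stat -c %Y "$FILE".gz)" ]
        then
                rm "$FILE".gz
                gzip -k -9 "$FILE"
        fi
else
        gzip -k -9 "$FILE"
fi

Make it executable with chmod +x mynewscript.sh before pointing find at it.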

Warnings:

  • Don’t use it on php files unless you absolutely 100% know what you’re doing. A gzipped version wouldn’t work anyway (PHP has to be executed, not served as-is), and php files often contain sensitive data - database credentials and the like - that will magically become available to anyone who downloads the new .gz file directly.
  • Back up before testing, and test it on a duplicate. I can’t stress this enough. I could have made an error above, or there might be something funky that happens on your system.
  • If it’s possible for someone else (visitors) to create files on your system somehow, you may want to carefully comb through the code and make sure there’s no room for injection. The find line above passes each file name to the function as an argument rather than splicing it into the command string, which should close the obvious hole for nefariously crafted file/directory names (stuff with spaces and/or special chars in it), but use at your own risk!

6 Comments

  1. Leonardo on March 22, 2015
    Hello, very nice script. But if, for some reason, I wish to remove all the .gz files, how can I do that?
    • Leonardo: You could make a copy of the script, but replace the process() block with this:
      process() {
              FILE="$1"

              # Just remove the .gz if it exists
              if [ -f "$FILE".gz ]
              then
                      rm "$FILE".gz
              fi
      }
      ...obviously back up first in case something goes awry (or in case I wrote something incorrect - it's fairly late here). That should remove all the .gz files that were based on the stuff listed under FILES at the top.
  2. Ev-pa on November 4, 2016
    What if the original file was deleted? How can its .gz be deleted automatically?
    • Matt Gadient on November 4, 2016
      Hey Ev-pa,

      A lot of ways to potentially do this. I'd be inclined to use a separate script (or a separate loop within the first script), triggered before the main script/loop, that does a "find" for all .gz files in the desired location and then pulls out the basename. For example, if you completely gutted/modified the script so that "FILEZ" was pointing to files like "my.opinion.on.cats.vs.dogs.html.gz" (pointing to .gz extensions instead of the html/js/etc extensions), then:

      basename $FILEZ .gz

      (note the space between $FILEZ and .gz)
      ...would output "my.opinion.on.cats.vs.dogs.html".

      So a line like:

      CHECKFILE=$(basename "$FILEZ" .gz)

      (note the space between "$FILEZ" and .gz)
      ...would give you the file name without the gz extension and save it to a new CHECKFILE variable. Then you can do a check to see if $CHECKFILE exists (with the "if [ -f ... ]" bit). If it doesn't exist, delete the .gz version.

      Once any extraneous .gz files are removed, let the normal script run.
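
      Putting those pieces together in the same style as the main script, a rough sketch might look like this (untested, and LOCATION would be set the same way it is in the main script):

      cleanup() {
              FILEZ="$1"
              # Rebuild the path of the would-be original: directory + name minus .gz
              CHECKFILE=$(dirname "$FILEZ")/$(basename "$FILEZ" .gz)

              # Original is gone, so remove the orphaned .gz
              if [ ! -f "$CHECKFILE" ]
              then
                      rm "$FILEZ"
              fi
      }
      export -f cleanup
      find "$LOCATION" -type f -name '*.gz' -exec /bin/bash -c 'cleanup "$1"' _ {} \;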

      ---

      A few notes:
      1) I use basename because it handles file names that have multiple periods in them pretty easily. There are other ways to do it, and someone can certainly chime in if they've got a better method.
      2) This will obviously cause a problem if you intentionally uploaded a .tar.gz file and it happens to be in the path you're checking (since you probably don't have an original .tar in there, it'll delete your original). Thus, I tend to just delete .gz variants manually if need be.
      3) I've tested the behavior of BASENAME, but didn't actually test my syntax above. So it may need tweaking.
      4) Obviously TEST CAREFULLY on a copy of your main site just in case something goes wrong. Make sure you have backups!

      Hopefully something in there helps. Good luck!
  3. Eugene Varnavsky on May 30, 2017
    Simply replace gzip with zopfli and the compression ratio will increase.
  4. Alex on August 15, 2020
    Hi, thank you for the script. I would recommend using Google's zopfli instead of gzip. Zopfli is compatible with gzip but delivers better compression. I have modified the script to include minification, because I noticed that the built-in minification from Magento 2 is not very good. So I first minify the files and then compress with zopfli.
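
    In the script above, that's just a matter of swapping the gzip lines for something like:

    zopfli "$FILE"

    ...since zopfli writes $FILE.gz and keeps the original by default, no -k is needed. Just be aware it's considerably slower than gzip -9.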
