HandBrake 0.10.5 nightly and ARM (ARMv7) – short benchmark and how-to

While messing with Scaleway’s dedicated 4-core ARM server for one of my previous posts, I thought it would be interesting to see how HandBrake performs on it.

After compiling HandBrake for armhf (instructions at the end if you came here just for that), I ran a couple of quick tests at various settings and compared the results against two other systems:

[Chart: HandBrake on ARM comparison]

Notes:

  • The blue and green results are from x264 (fast) and x265 (fast) on a 1080p encode (a full-screen Blu-ray).
  • The yellow result is from an x264 encode (veryslow) at 480p (DVD resolution).
  • The framerates above (fps) aren’t exact. To keep encode times within the realm of sanity on ARM, I encoded a short clip, but the processing at the end of each encode inflated the total time, which skewed ARM’s result badly. So these are average fps values read off partway through each encode.
  • These were all 4-core machines.

I won’t go into the detailed encode settings: I really just wanted to know “how much slower is ARM than Intel/x86/x64 when encoding in HandBrake?”, and for that, the minutiae didn’t really matter. A general ballpark was OK.

The answer: roughly 10-20x slower than the other two machines I tested. To get an idea of how ARM compares with *your* machine, work out the performance ratio of your CPU to the i5-2500K (AnandTech’s CPU Bench will give you a ballpark), then apply that ratio to the numbers in the chart.
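The extrapolation above boils down to a single ratio. Here is a minimal sketch using awk, assuming a CPU benchmark where higher scores mean faster, and assuming HandBrake’s speed scales roughly linearly with that score (a big simplification). The helper name scale_fps and the example numbers are hypothetical, not values from the chart.

```shell
#!/bin/sh
# scale_fps (hypothetical helper): estimate your CPU's fps from a figure
# measured on a reference CPU such as the i5-2500K.
#   $1 = fps measured on the reference CPU (read off the chart)
#   $2 = your CPU's benchmark score
#   $3 = the reference CPU's score in the same benchmark (higher = faster)
# Assumes encode speed scales linearly with the benchmark score.
scale_fps() {
  awk -v fps="$1" -v mine="$2" -v ref="$3" \
    'BEGIN { printf "%.1f\n", fps * mine / ref }'
}

# Example with made-up numbers: your CPU scores 150, the reference scores 100.
scale_fps 30 150 100
```

Dividing that estimate by the ARM figure for the same test gives a rough ARM slowdown factor for your own machine. If your benchmark reports a time (lower is better), swap the last two arguments.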

Obviously, in the tests above ARM wasn’t producing very usable results: a 2-hour movie might take 1-2 days to encode…

 

Next question: Can ARMv7 give usable speeds under any circumstance?

To find out, I limited the remaining tests to x264 at 480p and used faster presets: basically DVD-resolution material at quicker settings. A few examples, measured partway through an encode:

Ultrafast: 41fps
Faster: 14fps
Medium: 12fps
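To turn these speeds into wall-clock time, divide the total frame count by the encode fps. A minimal sketch, assuming a 120-minute film with a 24 fps source and a constant encode speed for the whole run (real sources and encodes vary); encode_hours is a hypothetical helper name. The 1 and 2 fps rows show what the “1-2 days” estimate above corresponds to.

```shell
#!/bin/sh
# encode_hours (hypothetical helper): estimated encode time in hours.
#   $1 = running time in minutes, $2 = source fps, $3 = encode fps
# Assumes the encode speed stays constant from start to finish.
encode_hours() {
  awk -v m="$1" -v s="$2" -v e="$3" \
    'BEGIN { printf "%.1f\n", (m * 60 * s) / e / 3600 }'
}

# 1 and 2 fps: the "1-2 days" range; 12, 14 and 41 fps: the 480p presets above.
for fps in 1 2 12 14 41; do
  echo "$fps fps: $(encode_hours 120 24 "$fps") hours"
done
```

At 41 fps (Ultrafast), a 2-hour film works out to a little over an hour, versus four hours at 12 fps (Medium).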

So yes, HandBrake on ARM can give reasonable speeds. But you’re probably looking at using x264, 480p, and fairly quick presets, particularly if your video is complex. If you want a little more performance, you can avoid “tune grain”, use a higher (lower-quality) RF value, and perhaps try some mild denoising via HQDN3D. Those often give a slight speed benefit on x86/x64 machines, so it’s quite possible the benefits will carry over to ARM as well.
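Combined, those suggestions translate into a HandBrakeCLI invocation along these lines. This is a sketch, not a tested recipe. The flag names (-e, --encoder-preset, -q, --maxHeight, --hqdn3d) follow the HandBrakeCLI 1.x help, and the 0.10-era build discussed here may spell some of them differently, so check ./HandBrakeCLI --help first. The file names, the RF value of 24 and the “light” denoise strength are illustrative choices. hb_cmd is a hypothetical helper that only prints the command.

```shell
#!/bin/sh
# hb_cmd (hypothetical helper): print, rather than run, a HandBrakeCLI
# command for a quick 480p x264 encode so it can be reviewed first.
#   $1 = input file, $2 = output file
hb_cmd() {
  # "faster" preset, RF 24, height capped at 480, light hqdn3d denoise.
  # No "--encoder-tune grain", per the suggestion above.
  echo ./HandBrakeCLI -i "$1" -o "$2" \
    -e x264 \
    --encoder-preset faster \
    -q 24 \
    --maxHeight 480 \
    --hqdn3d=light
}

hb_cmd input.mkv output.mkv
```

To actually encode, drop the echo and run the command from the build directory.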

 

So why is ARM so slow? Some thoughts…

  1. Intel/AMD CPUs have a bunch of instruction-set extensions (SSE, AVX and friends) that x264 uses, backed by a lot of hand-optimized assembly. Over the years, x264 has been tuned like crazy to perform as well as it does on those chips, and much of that hand tweaking won’t carry over to ARM (the build below disables assembly entirely anyway).
  2. ARMv7 isn’t exactly the newest tech out there. We’re talking somewhere between iPhone 3GS and iPhone 4S levels of tech here, and ARM has made pretty big strides since then (unlike Intel, which has made fairly minor strides since Sandy Bridge in 2011). ARMv8 chips are starting to trickle into the server market, so they may fare somewhat better.

Enough about performance…

Compiling HandBrake on ARMv7 (armhf)

There are 2 tricks to getting HandBrake to compile on ARM under Ubuntu Linux (well… 2 more tricks than usual, anyway…). HandBrake itself compiles fine on Ubuntu 16.04. However:

  1. x264 and x265 won’t work with assembly enabled (x264 *might*, depending on the architecture).
  2. x264 and x265 need a few other build options changed too (for example, disabling OpenCL for x264, and enabling PIC and skipping RPATH for x265).

Fortunately, it’s pretty easy to edit the x264 and x265 module.defs files before building. You essentially need to add:

  • --disable-asm --disable-opencl to the X264.CONFIGURE.extra field in contrib/x264/module.defs
  • -DENABLE_ASSEMBLY=OFF -DENABLE_PIC=ON -DENABLE_AGGRESSIVE_CHECKS=ON -DENABLE_TESTS=ON -DCMAKE_SKIP_RPATH=ON to the X265.CONFIGURE.extra field in contrib/x265/module.defs

If you’re already used to dabbling in this kind of thing, you can probably make these edits by hand easily. If you’re using the ARMv7 cores on Scaleway, you can optionally omit --disable-asm for x264 and instead use --extra-cflags="-mfpu=vfpv3 -mcpu=marvell-pj4". The marvell-pj4 value should hopefully be correct for the Marvell Armada 370 those machines appear to use. I’m not sure which -mfpu value is optimal, but vfpv3 didn’t cause any issues for me. The downside: even after building with these targeted settings, a side-by-side comparison showed no measurable difference in encode time (if there was one, it was under 1%).
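If you’re unsure which -mfpu value fits your board, the kernel’s CPU feature flags are a reasonable place to start. A minimal sketch, assuming a 32-bit ARM Linux kernel, which lists flags on a line beginning with “Features” in /proc/cpuinfo. x86 kernels use a “flags” line instead, so every entry will report “no” there. The helper name check_fpu is hypothetical, and the flags checked are just a sample.

```shell
#!/bin/sh
# check_fpu (hypothetical helper): report which FPU/SIMD flags the kernel
# advertises. Reads /proc/cpuinfo unless another file is given as $1.
check_fpu() {
  file="${1:-/proc/cpuinfo}"
  # 32-bit ARM kernels list CPU flags on a line starting with "Features".
  features=$(grep '^Features' "$file" 2>/dev/null | head -n 1 | cut -d: -f2)
  for flag in vfpv3 vfpv3d16 vfpv4 neon; do
    # Pad with spaces so "vfpv3" does not match inside "vfpv3d16".
    case " $features " in
      *" $flag "*) echo "$flag: yes" ;;
      *)           echo "$flag: no" ;;
    esac
  done
}
```

Running check_fpu on the server prints one yes/no line per flag. Keep in mind that, as noted above, the targeted build made no measurable difference in encode time.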

If not, here is a chunk of code you can try copy/pasting into the shell on a FRESH UBUNTU 16.04 INSTALL while logged in as root… (disclaimer: it may be broken or may break at some point, so make sure it looks good and back up anything important):

apt-get update &&
apt-get install -y git cmake yasm build-essential autoconf libtool \
zlib1g-dev libbz2-dev libogg-dev libtheora-dev libvorbis-dev \
libsamplerate-dev libxml2-dev libfribidi-dev libfreetype6-dev \
libfontconfig1-dev libass-dev libmp3lame-dev libx264-dev libjansson-dev \
intltool libglib2.0-dev libdbus-glib-1-dev libgtk-3-dev libgudev-1.0-dev \
libwebkitgtk-3.0-dev libnotify-dev libgstreamer1.0-dev \
libgstreamer-plugins-base1.0-dev libappindicator-dev libtool-bin &&
apt-get install -y libmp3lame-dev libass-dev libsamplerate-dev &&
ldconfig &&
export PKG_CONFIG=/usr/bin/pkg-config &&
export PKG_CONFIG_PATH=/usr/lib/pkgconfig &&
git clone https://github.com/HandBrake/HandBrake.git hb-master &&
cp -n ./hb-master/contrib/m4/module.defs ./hb-master/contrib/m4/module.defs.bak &&
echo '$(eval $(call import.MODULE.defs,M4,m4))' > ./hb-master/contrib/m4/module.defs &&
echo '$(eval $(call import.CONTRIB.defs,M4))' >> ./hb-master/contrib/m4/module.defs &&
echo 'M4.FETCH.url = http://ftp.gnu.org/gnu/m4/m4-1.4.17.tar.bz2' >> ./hb-master/contrib/m4/module.defs &&
echo 'M4.FETCH.md5 = 8a1787edcba75ae5cd1dc40d7d8ed03a' >> ./hb-master/contrib/m4/module.defs &&
cp -n ./hb-master/contrib/x264/module.defs ./hb-master/contrib/x264/module.defs.bak &&
sed '/format=420/a X264.CONFIGURE.extra += --disable-asm --disable-opencl' \
./hb-master/contrib/x264/module.defs.bak > ./hb-master/contrib/x264/module.defs &&
cp -n ./hb-master/contrib/x265/module.defs ./hb-master/contrib/x265/module.defs.bak &&
sed '/LIBNUMA=OFF/a X265.CONFIGURE.extra += -DENABLE_ASSEMBLY=OFF -DENABLE_PIC=ON -DENABLE_AGGRESSIVE_CHECKS=ON -DENABLE_TESTS=ON -DCMAKE_SKIP_RPATH=ON' \
./hb-master/contrib/x265/module.defs.bak > ./hb-master/contrib/x265/module.defs &&
cd hb-master &&
./configure --disable-gtk &&
cd build &&
make

This grabs all the dependencies, makes a git clone of HandBrake (in whatever directory you’re in), makes the necessary changes, and compiles. Once it’s done, you’ll find the HandBrakeCLI binary in hb-master/build.

If you saw my previous post about working around hiccups when compiling HandBrake on Ubuntu, you’ll notice that this is almost identical. This one simply adds the steps that edit the x264 and x265 module.defs files so everything compiles on ARMv7/armhf.
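It’s worth confirming the module.defs edits actually landed, because each sed command only appends its line after an anchor (“format=420” or “LIBNUMA=OFF”). If a newer HandBrake checkout lacks that anchor, sed copies the file unchanged without complaint. A minimal sketch, run from the directory that contains hb-master; check_edit is a hypothetical helper name.

```shell
#!/bin/sh
# check_edit (hypothetical helper): confirm a fixed string is present in a file.
#   $1 = file to inspect, $2 = string the sed edit should have added
check_edit() {
  # -F: literal match; --: stop option parsing so "--disable-asm" is a pattern.
  if grep -qF -- "$2" "$1" 2>/dev/null; then
    echo "OK: $1"
  else
    echo "MISSING: '$2' not found in $1"
    return 1
  fi
}

# Only run the checks if the HandBrake checkout is in the current directory.
if [ -d hb-master ]; then
  check_edit hb-master/contrib/x264/module.defs "--disable-asm"
  check_edit hb-master/contrib/x265/module.defs "-DENABLE_ASSEMBLY=OFF"
fi
```

After the build finishes, running file on hb-master/build/HandBrakeCLI should describe an ARM ELF executable.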

2 Comments

  1. Daniel on May 5, 2018
    It would be interesting to have this post updated using the armv8 machines from scaleway. They perform a lot better.
  2. James Carroll on February 10, 2020
    I recently installed HandBrake on a Raspberry Pi 4 running Raspbian, a 32-bit ARM custom distribution for the Pi. I converted an 81.8MB 1080p MP4 file downloaded from YouTube. I used HandBrake to convert it to 480p on several computers. The Pi 4 took about 8 minutes and my i7 E6530 laptop from 2012 took only 1.5 minutes with the same settings. My 2012 MacBook Pro with i7 took a few seconds longer than the Dell laptop since its CPU was slightly older. My dual-core Atom 1.6GHz equipped Acer netbook took almost 16 minutes. The best part is that I simply installed HandBrake from the Raspbian repositories. I wanted to compare the Pi 4 running 64-bit Manjaro but unfortunately they do not have a package for it, so I now find myself looking to compile it, which is what brought me here. I'm running at the stock clock on the Pi 4 at the moment but will soon move to a 2.1GHz OC. HandBrake fully utilizes all 4 cores on the Pi and will cause it to overheat, and throttling will occur unless a fan is installed. Thanks for the useful info you provide here.
