While messing with Scaleway’s dedicated 4-core ARM server for one of my previous posts, I thought it might be interesting to see how HandBrake might do.
After compiling HandBrake for armhf (instructions at the end if you came here just for that), I did a couple quick tests at various settings, and compared against a couple other systems:
- The blue and green results are from x264 (fast) and x265 (fast) on a 1080p encode (a full-screen Blu-ray).
- The yellow result is from an x264 encode (veryslow) at 480p (DVD resolution).
- The framerates above (fps) aren’t exact. I had to encode a short clip to keep encode times within the realm of sanity on ARM, but the processing at the end of each encode inflated the total time, which hurt ARM’s result fairly badly. So these are eyeballed average fps numbers from partway through each encode.
- These were all 4-core machines.
I won’t go into the detailed encode settings: I really just wanted to know “how much slower is ARM than Intel/x86/x64 when encoding in HandBrake?”, and for that, the detailed minutiae didn’t really matter. A general ballpark was OK.
The answer is roughly 10-20x slower against the other 2 machines I tested. If you want to get an idea compared to *your* machine, you’ll probably have to compare the ratio of your-cpu : i5-2500k (Anandtech’s CPU bench will get you a ballpark), and then extrapolate/apply it to what you see in the chart.
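If you want to do that extrapolation without a calculator, here is a minimal sketch. The function name scale_fps and every number in it are placeholders of mine, not values from the chart, and it assumes encode speed scales linearly with whatever benchmark score you pick, which is only a ballpark.

```shell
#!/bin/sh
# Rough extrapolation: scale an fps figure measured on the i5-2500K by the
# ratio of two benchmark scores. Assumes linear scaling (ballpark only).
# usage: scale_fps <i5_2500k_fps> <your_cpu_score> <i5_2500k_score>
scale_fps() {
    awk -v fps="$1" -v mine="$2" -v ref="$3" \
        'BEGIN { printf "%.1f\n", fps * mine / ref }'
}

# Hypothetical example: the i5-2500K managed 30 fps and your CPU scores
# 150 against the i5-2500K's 100 in a comparable benchmark.
scale_fps 30 150 100   # prints 45.0
```

The same ratio applies in the other direction to the ARM numbers: divide instead of multiply if your machine is slower than the reference.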
Obviously in the tests above, ARM wasn’t showing very usable results: a 2 hour movie might take 1-2 days to encode….
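To put numbers on that, here is a small sketch of the arithmetic. The 24 fps source rate and the 1 and 2 fps encode rates are illustrative assumptions on my part rather than readings from the chart, and encode_hours is just a name I made up.

```shell
#!/bin/sh
# Estimate wall-clock encode time from movie length and encode speed.
# usage: encode_hours <movie_seconds> <source_fps> <encode_fps>
encode_hours() {
    awk -v secs="$1" -v src="$2" -v enc="$3" \
        'BEGIN { printf "%.1f\n", secs * src / enc / 3600 }'
}

# A 2 hour (7200 s) movie at an assumed 24 fps is 172800 frames.
encode_hours 7200 24 1   # prints 48.0 (hours, so 2 days)
encode_hours 7200 24 2   # prints 24.0 (hours, so 1 day)
```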
Next question: Can ARMv7 give usable speeds under any circumstance?
For that, I limited my other tests to x264 at 480p and toned down the speed settings some. So basically DVD-resolution stuff at quicker settings. A few examples from partway through an encode:
So sure, HandBrake on ARM can give reasonable speeds. But you’re probably looking at using x264, 480p, and fairly quick settings, particularly if your video is complex. If you want a little more performance, you can avoid “tune grain”, use a lower-quality (higher) RF value, and perhaps try some mild denoising via HQDN3D. Those often give a slight speed benefit on 32/64-bit x86 machines, so it’s very possible those benefits will translate to ARM as well.
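As a rough idea of what such a “quick settings” command line could look like, here is a sketch. The preset, RF value, and denoise strength are my own illustrative picks, not the settings used for the tests above, and the option names are the ones I’d expect from a recent HandBrakeCLI, so check them against HandBrakeCLI --help on your build first.

```shell
#!/bin/sh
# Options for a quick DVD-resolution x264 encode.
# Preset, RF value and denoise strength are illustrative assumptions.
hb_quick_opts() {
    echo "--encoder x264 --encoder-preset veryfast --quality 23 --maxHeight 480 --hqdn3d=light"
}

# usage: hb_quick_encode <input> <output>
hb_quick_encode() {
    # Word splitting of the option string is intentional here.
    # shellcheck disable=SC2046
    HandBrakeCLI -i "$1" -o "$2" $(hb_quick_opts)
}

# Example (file names are placeholders):
#   hb_quick_encode input.mkv output.mp4
```

Leaving out any tune option altogether is the simplest way to avoid “tune grain”.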
So why is ARM so slow? Some thoughts…
- Intel/AMD have a bunch of extensions used by x264. There’s also a bunch of hand-optimized assembly. Over the years, x264 has been tuned like crazy to perform as well as it does. Much of that hand tweaking won’t extend to ARM.
- ARMv7 isn’t exactly the newest stuff out there. We’re talking somewhere between iPhone 3GS and iPhone 4S levels of tech here, and ARM has made pretty big strides since then (unlike Intel, which has made fairly minor strides since Sandy Bridge in 2011). ARMv8 stuff seems to be trickling out across the server market, so it’s possible it will fare a little better.
Enough about performance…
Compiling HandBrake on ARMv7 (armhf)
There are 2 tricks to getting HandBrake to compile on ARM via Ubuntu Linux (well… 2 more tricks than usual anyway…). HandBrake itself will compile fine on Ubuntu 16.04. However:
- x264 and x265 won’t work with assembly enabled (x264 *might* depending on architecture).
- x264 and x265 have some other wonky things that have to be configured too.
Fortunately, it’s pretty easy to edit the x264 and x265 module.defs files before building. You essentially need to add:
- --disable-asm --disable-opencl for the X264.CONFIGURE.extra fields in contrib/x264/module.defs
- -DENABLE_ASSEMBLY=OFF -DENABLE_PIC=ON -DENABLE_AGGRESSIVE_CHECKS=ON -DENABLE_TESTS=ON -DCMAKE_SKIP_RPATH=ON for the X265.CONFIGURE.extra fields in contrib/x265/module.defs
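If you edit the files by hand, a quick sanity check before kicking off a long build doesn’t hurt. This is a minimal sketch: defs_has_flag is a helper name I made up, and it only checks that some line mentioning the variable also contains the flag, not that the Makefile syntax around it is right.

```shell
#!/bin/sh
# usage: defs_has_flag <module.defs path> <VARIABLE> <flag>
# Succeeds if any line mentioning VARIABLE also contains the literal flag.
defs_has_flag() {
    grep -F -- "$2" "$1" | grep -qF -- "$3"
}

# Example, run from the directory that holds the hb-master clone:
#   defs_has_flag hb-master/contrib/x264/module.defs X264.CONFIGURE.extra --disable-asm \
#       && echo "x264: asm disabled"
#   defs_has_flag hb-master/contrib/x265/module.defs X265.CONFIGURE.extra -DENABLE_ASSEMBLY=OFF \
#       && echo "x265: asm disabled"
```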
If you’re already accustomed to dabbling in that stuff, you can probably figure it out pretty easily. If you’re using the ARMv7 cores on Scaleway, for x264 you can optionally omit --disable-asm and instead use --extra-cflags="-mfpu=vfpv3 -mcpu=marvell-pj4". The marvell-pj4 bit should hopefully be correct for the Marvell 370 that the machines seem to use. I’m not sure what the optimal fpu entry is, but vfpv3 didn’t cause issues for me. The downside is that even after building with these targeted settings and doing a side-by-side comparison, there didn’t seem to be any difference in encode time (if there was, it was < 1%).
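If you’re unsure which fpu entry your cores support, the kernel will tell you. The sketch below assumes a Linux system where /proc/cpuinfo has a “Features” line (as ARM kernels do); cpu_has_feature is a helper name of my own, and the optional second argument only exists so it can be pointed at a saved copy of cpuinfo.

```shell
#!/bin/sh
# usage: cpu_has_feature <feature> [cpuinfo file]
# Linux on ARM lists FPU/SIMD capabilities on a "Features" line.
cpu_has_feature() {
    grep -m1 -i '^Features' "${2:-/proc/cpuinfo}" 2>/dev/null | grep -qw -- "$1"
}

# Report the FPU-related features relevant to GCC's -mfpu option.
for f in vfpv3 vfpv3d16 neon; do
    if cpu_has_feature "$f"; then echo "$f: yes"; else echo "$f: no"; fi
done
```

Roughly speaking, vfpv3 and vfpv3d16 in that list correspond to -mfpu=vfpv3 and -mfpu=vfpv3-d16, and neon to -mfpu=neon. On a non-ARM machine all three will simply report “no”.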
If not, here is a chunk of code you can try copy/pasting into the shell on a FRESH UBUNTU 16.04 INSTALL when you’re logged in as root… (disclaimer: may be broken or may break at some point. make sure it looks good and back up anything important):
apt-get update && apt-get install -y git cmake yasm build-essential autoconf libtool \
zlib1g-dev libbz2-dev libogg-dev libtheora-dev libvorbis-dev \
libsamplerate-dev libxml2-dev libfribidi-dev libfreetype6-dev \
libfontconfig1-dev libass-dev libmp3lame-dev libx264-dev libjansson-dev \
intltool libglib2.0-dev libdbus-glib-1-dev libgtk-3-dev libgudev-1.0-dev \
libwebkitgtk-3.0-dev libnotify-dev libgstreamer1.0-dev \
libgstreamer-plugins-base1.0-dev libappindicator-dev libtool-bin &&
apt-get install -y libmp3lame-dev libass-dev libsamplerate-dev &&
ldconfig &&
export PKG_CONFIG=/usr/bin/pkg-config &&
export PKG_CONFIG_PATH=/usr/lib/pkgconfig &&
git clone https://github.com/HandBrake/HandBrake.git hb-master &&
cp -n ./hb-master/contrib/m4/module.defs ./hb-master/contrib/m4/module.defs.bak &&
echo '$(eval $(call import.MODULE.defs,M4,m4))' > ./hb-master/contrib/m4/module.defs &&
echo '$(eval $(call import.CONTRIB.defs,M4))' >> ./hb-master/contrib/m4/module.defs &&
echo 'M4.FETCH.url = http://ftp.gnu.org/gnu/m4/m4-1.4.17.tar.bz2' >> ./hb-master/contrib/m4/module.defs &&
echo 'M4.FETCH.md5 = 8a1787edcba75ae5cd1dc40d7d8ed03a' >> ./hb-master/contrib/m4/module.defs &&
cp -n ./hb-master/contrib/x264/module.defs ./hb-master/contrib/x264/module.defs.bak &&
sed '/format\=420/a X264.CONFIGURE.extra \+\= \-\-disable\-asm \-\-disable-opencl' \
./hb-master/contrib/x264/module.defs.bak > ./hb-master/contrib/x264/module.defs &&
cp -n ./hb-master/contrib/x265/module.defs ./hb-master/contrib/x265/module.defs.bak &&
sed '/LIBNUMA\=OFF/a X265.CONFIGURE.extra \+\= \-DENABLE\_ASSEMBLY\=OFF \-DENABLE\_PIC\=ON \-DENABLE\_AGGRESSIVE\_CHECKS\=ON \-DENABLE\_TESTS\=ON \-DCMAKE\_SKIP\_RPATH\=ON' \
./hb-master/contrib/x265/module.defs.bak > ./hb-master/contrib/x265/module.defs &&
cd hb-master && ./configure --disable-gtk && cd build && make
This grabs all the dependencies, makes a git clone of HandBrake (in whatever directory you’re in), makes the necessary changes, and compiles. Once it’s done, you’ll find the ./HandBrakeCLI file in the hb-master/build directory, which is where the command leaves you.
If you saw my previous post re: working around hiccups when compiling HandBrake on Ubuntu, you’ll notice that it’s almost identical. This one simply adds the stuff to edit the x264 and x265 module.defs files so it’ll compile on ARMv7/armhf.