mattgadient.com

ARMv7 for a website? A quick and dirty (and horribly unfair) test!

While looking through various web hosts, I came across some pretty cheap dedicated (“bare metal”) ARM offerings.

  • SYS (OVH’s SoYouStart offering) had some Cortex A9 ARMv7 servers for 12€/month (2 core, 1GHz/2GB-ram/2TB-hd/250Mbps)
  • Scaleway (Online.net’s offering) had some “Dedicated ARM Cores” servers for 2.99€/month (4 core, ?GHz/2GB-ram/50GB-ssd/200Mbps)

A dedicated server in that price range seemed pretty awesome. Ubuntu has nginx packages that’ll run on those ARM devices, and a web server should certainly be doable. But would an ARM dedicated server beat out a VPS I was using?

 

The Test Candidates

I fired up a Scaleway server (instead of OVH) for a few reasons that I’ll mention at the end. In case you’re curious, the clock speed turned out to be 1333MHz as seen below:

lscpu of the ARM

It uses the Marvell Armada 370/XP, by the way.

I figured a “fair” comparison was going to be a similar-cost KVM VPS: a Xeon E3 (3400MHz) I had access to that happened to have 512MB RAM, an SSD, and 2 virtual cores:

lscpu of the E3 3.4GHz

Okay, so the moment you glance at those specs, you’ll realize it’s not super fair. Here’s how I justified it though:

  1. Price-wise, they’re close. And if we’re looking at a ~$5 USD dedicated server, I think it’s safe to say price is a big factor.
  2. The VPS has the extra overhead of virtualization.
  3. The VPS is obviously shared.
  4. The VPS is a good baseline simply because it’s readily available. If you’re wondering “should I switch from my decent VPS to a dedicated ARM server?”, it’s good to see how the 2 might compare. Note: This obviously won’t help as much if you’re considering a move from a crappy VPS to a dedicated ARM.

Main Concern – encryption

Nginx on its own wasn’t a huge worry because nginx is generally considered to be pretty quick. However, I was a little concerned about the HTTPS/SSL aspect for websites (openssl). The processing power needed for SSL isn’t normally a huge concern for sites, but I wasn’t sure how the little ARM server would handle it. Intel/AMD server processors (and most of their consumer processors now) generally have AES-NI support to speed up encryption. I had no idea whether ARM had anything similar and, if so, how much of a bearing it might have.
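One quick way to check is to look at the CPU feature flags the kernel reports. Here’s a minimal sketch, assuming a Linux box with a readable /proc/cpuinfo; the crypto_flags helper and the sample strings are made up for illustration. x86 kernels report AES-NI as “aes” on the “flags” line, while ARM kernels list “aes”, “pmull”, “sha1” and “sha2” on the “Features” line when the ARMv8 crypto extensions are present:

```python
def crypto_flags(cpuinfo_text):
    """Return the hardware crypto flags found in /proc/cpuinfo-style text."""
    wanted = {"aes", "pmull", "sha1", "sha2"}
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # x86 kernels label the line "flags"; ARM kernels label it "Features".
        if key.strip().lower() in ("flags", "features"):
            found |= wanted & set(value.split())
    return found


# Hypothetical, shortened sample lines (not taken from the servers tested here).
SAMPLE_ARMV7 = "Features\t: half thumb fastmult vfp edsp vfpv3 tls"
SAMPLE_ARMV8 = "Features\t: fp asimd evtstrm aes pmull sha1 sha2 crc32"
SAMPLE_X86 = "flags\t\t: fpu vme de pse sse2 aes avx"

for name, text in [("armv7", SAMPLE_ARMV7), ("armv8", SAMPLE_ARMV8), ("x86", SAMPLE_X86)]:
    print(name, sorted(crypto_flags(text)) or "no crypto flags reported")
```

On a real machine you would pass in the contents of /proc/cpuinfo instead of the sample strings. An empty result only means the kernel reports no such flags; it says nothing about how fast the software fallback is.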

I started with the common openssl speed rsa benchmark on the ARM machine:

openssl speed rsa - ARM

…and compared it to the VPS:

openssl speed rsa - E3 3.4GHz

This looked… awful. Compare the last line (rsa 4096). Ouch.

The VPS was 19-36x faster. If this translated into page load times, that might not be so good… time to see whether those huge differences actually carried over!

Performance – Testing page loads with Apache Bench

I decided to run a small series of tests using AB to see what the implications were. This was pretty tough (and won’t be incredibly accurate) for the following reasons:

  1. If run on the server itself, ApacheBench uses more processing power than nginx and becomes the bottleneck (saturates the core) on the ARM server in the 6-9 concurrent request range even without HTTPS. I ran some AB’s against Google to verify that this has a huge impact on the ARM machine except at really low concurrency values (near 1).
  2. On machines in the same region, latency has a big impact at lower concurrency values (at higher values, server oomph comes into play).

In any case, after dozens of tests, here’s a summary of the “gist”:

  • ARM server: minimum request time tends to be ~1.2ms. With SSL that increases to about 80-90ms!
  • VPS : minimum request time tends to be ~0.1ms. With SSL that increases to about 3ms!
  • Using SSL (https), the ARM server can reasonably handle ~25 concurrent connections before response times start nearing 400ms. At 50 connections it doubles (about 800ms). At 100 it doubles again. Note that the ARM server *is* using all 4 cores with nginx.
  • Using standard HTTP, the ARM server can handle ~100 concurrent connections before response times actually start to increase (and we’re talking a measly 2-3ms at this point).
  • When AB was run on the server itself (non-SSL results), at low concurrency levels of 1-10, the response time of the ARM server was generally 10x longer than the VPS. However, these were small durations to begin with (0.13ms-0.4ms vs 1.3ms-5ms). The gap actually narrowed a little for some reason once concurrency levels of 100 were reached (about 5x longer and we’re talking 5-6ms vs 26-37ms now). Remember that this result has the impact of ApacheBench running at the same time, so don’t compare it with the above.

SSL obviously takes a toll on the ARM server, with a minimum response time of 80-90ms for 1 request. By contrast, the VPS didn’t hit 80-90ms until it had around 40-50 concurrent requests coming in.
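As a rough sanity check on the HTTPS numbers, here’s a small sketch using Little’s law (throughput = requests in flight / mean response time). It assumes AB keeps the requested number of connections in flight the whole time, and it uses the approximate figures from the list above (400ms at 25, 800ms at 50, and double that again at 100), so treat the output as ballpark only:

```python
def throughput(concurrency, response_time_s):
    """Little's law for a closed-loop benchmark: req/s = in-flight requests / mean response time."""
    return concurrency / response_time_s


# Approximate (concurrency, response time in seconds) pairs for the ARM server over HTTPS.
observations = [(25, 0.400), (50, 0.800), (100, 1.600)]

for concurrency, seconds in observations:
    rate = throughput(concurrency, seconds)
    print(f"{concurrency:>3} concurrent at {seconds * 1000:.0f} ms -> {rate:.1f} req/s")
```

All three rows work out to the same requests-per-second figure. That’s what you’d expect from a saturated server: extra connections just wait in line, so response time grows in step with concurrency.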

Conclusion – decent for HTTP, not fantastic with SSL (in my opinion)

The ARM server’s request time via standard http was about 1.2ms. More than the VPS, but still close enough to “nothing” to be nothing. Even if it hits 100 connections, we’re looking at under 5ms. Still decent. Of course, that may all change if PHP/MySQL is thrown into the mix, but it’s not my biggest concern, and I’ll have to look at that some other time.

SSL on the other hand…. oh boy. I did a few browser tests to be sure the results were in-line, and they were.

That said, depending on your circumstance, 80ms might not be a big hit for you. If your website already takes a couple seconds before the TTFB, then hey, what’s another 80ms? But for those pushing the < 200ms TTFB, that 80ms will kill ya.

 

Looking Forward

While I’m not a fan of *these* particular ARM chips as SSL web servers at this point in time, the notion of a cheap dedi running on a low-cost ARM chip with decent RAM and HD space certainly appeals to me. They’d also be great as a standard http server, backup server, or for running pretty much *anything* that’s not CPU intensive. Hopefully in another year or 2, we’ll have better/faster models to play with and I can try this all again!

SoYouStart vs Scaleway – Finishing Up

I mentioned at the beginning that I’d give reasons for choosing Scaleway instead of SoYouStart for this, and didn’t get a chance to sneak it in elsewhere, so here goes….

OVH/SoYouStart has the benefits of OVH’s renowned network, locations in Europe and Canada, IPv6, and the potential for additional IPv4 addresses. All good stuff. A 2TB drive is nothing to sneeze at (though since I’ve never had a website > 20GB, I certainly don’t mind the smaller SSD). Sadly, the price is just way out-of-whack and it’s only a 2-core chip.

Scaleway was 1/4 of the price, and was a 4-core (which is a plus since nginx will use them all if you’ve got 4+ concurrent connections!). It’s also one of those “hourly” billing setups that simply caps out at the monthly rate (similar to DigitalOcean). While they don’t have any North America locations and IPv6 seems to be non-existent for the ARM boxes, they make for perfectly good little test-boxes.


I’ll leave things there. If you’ve tried out one of the ARM boxes and have some input or thoughts to share (or want to correct me on something!), feel free to leave a comment below!

3 Comments

  1. very useful

  2. armv8 changes the game.. there are the equivalent aes instructions

  3. Unfortunately this review of the capabilities is slightly incorrect. The AES capability depends not just on the ArmV7 platform but on the implementation the manufacturers chose.

    Compare this:

    https://systemausfall.org/wikis/howto/AES-Performance

    CuBox-i ARMv7 Processor rev 10 (v7l) 1200 4 23083.07k
    ODroid C1 Amlogic S805 1500 4 25216.9k
    ODROID XU4 Exynos 5422 (ARMv7 Processor rev 3 (v7l)) 1400/2000 8 335500.73k

    All are ArmV7, yet there is a massive 14x difference between the slowest and the fastest.

    ——————————–

    > openssl speed rsa ( on a single core )

    * Raspberry Pi 3 ( 4 * 1.2GHz )

    sign verify sign/s verify/s
    rsa 512 bits 0.000770s 0.000067s 1299.4 15026.7
    rsa 1024 bits 0.003929s 0.000198s 254.5 5042.0
    rsa 2048 bits 0.024988s 0.000705s 40.0 1419.2
    rsa 4096 bits 0.172034s 0.002683s 5.8 372.7

    * Intel(R) Xeon(R) CPU E5-2620 v4 ( 16 * 2.10GHz. 8/8 )

    rsa 512 bits 0.000069s 0.000005s 14551.4 196274.7
    rsa 1024 bits 0.000194s 0.000014s 5147.6 73258.6
    rsa 2048 bits 0.001497s 0.000045s 668.2 22294.2
    rsa 4096 bits 0.011324s 0.000166s 88.3 6022.6

    Results that are more similar to yours.
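To put a number on “more similar”, here’s a minimal sketch that parses the two single-core result blocks above and prints how many times faster the Xeon signs at each key size. The regex and helper names are my own; the figures are copied from the tables above.

```python
import re

PI3 = """\
rsa 512 bits 0.000770s 0.000067s 1299.4 15026.7
rsa 1024 bits 0.003929s 0.000198s 254.5 5042.0
rsa 2048 bits 0.024988s 0.000705s 40.0 1419.2
rsa 4096 bits 0.172034s 0.002683s 5.8 372.7
"""

XEON = """\
rsa 512 bits 0.000069s 0.000005s 14551.4 196274.7
rsa 1024 bits 0.000194s 0.000014s 5147.6 73258.6
rsa 2048 bits 0.001497s 0.000045s 668.2 22294.2
rsa 4096 bits 0.011324s 0.000166s 88.3 6022.6
"""

# Columns: key size, sign time, verify time, signs per second, verifies per second.
LINE = re.compile(r"rsa\s+(\d+)\s+bits\s+\S+s\s+\S+s\s+([\d.]+)\s+([\d.]+)")


def parse_rsa(text):
    """Map key size in bits -> (signs per second, verifies per second)."""
    return {int(m.group(1)): (float(m.group(2)), float(m.group(3)))
            for m in LINE.finditer(text)}


def sign_ratios(slow_text, fast_text):
    """Return {key size: fast signs/s divided by slow signs/s}."""
    slow, fast = parse_rsa(slow_text), parse_rsa(fast_text)
    return {bits: fast[bits][0] / slow[bits][0] for bits in sorted(slow) if bits in fast}


for bits, ratio in sign_ratios(PI3, XEON).items():
    print(f"rsa {bits}: Xeon signs {ratio:.1f}x faster than the Pi 3")
```

The same helpers should work on other `openssl speed rsa` result lines, as long as they keep this column layout.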

    ——————————–

    But let’s start digging deeper.

    > openssl speed -multi 4 aes-128-cbc sha256

    * Raspberry Pi 3 ( 4 * 1.2GHz )

    aes-128 cbc 165296.89k 186582.34k 193733.03k 195543.38k 195802.45k
    sha256 42471.01k 108561.15k 195634.60k 246154.24k 265977.86k

    * Intel(R) Xeon(R) CPU E5-2620 v4 ( 16 * 2.10GHz. 8/8 )

    aes-128 cbc 364023.11k 403488.41k 412570.88k 415567.87k 417092.95k
    sha256 144559.57k 325549.82k 566663.00k 694440.62k 734221.65k

    Yet these numbers are closer… with the same number of processes, the Xeon manages only about double the Pi’s AES results.

    ——————————–

    * Raspberry Pi 3 ( single core 1.2GHz )

    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
    sha256 10697.70k 27193.90k 48921.69k 61672.11k 66633.73k

    * Intel(R) Xeon(R) CPU E5-2620 v4 ( single core 2.10GHz. )

    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
    sha256 36154.70k 81391.89k 141687.38k 173399.38k 184119.99k

    Yet here the difference is only about 3 times. If we match CPU speed, it’s down to only about 2 times.

    And this is on the ArmV7 A7 in the Raspberry Pi 3… Given the results on the site I linked (which match mine), the issue is not purely the Arm device. Some devices can be a LOT slower, but the implementations matter too.
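The clock-matched comparison can be worked out from the 8192-byte sha256 column above. This is a rough sketch that assumes both chips ran at the nominal clocks listed (1.2GHz and 2.10GHz), so any turbo boost on the Xeon is ignored, and that throughput scales linearly with clock speed:

```python
# sha256 throughput for 8192-byte blocks (the "k" figures from the single-core results above).
PI3_K, PI3_GHZ = 66633.73, 1.2
XEON_K, XEON_GHZ = 184119.99, 2.10


def per_ghz(throughput_k, ghz):
    """Throughput per GHz of nominal clock (assumes linear scaling with clock speed)."""
    return throughput_k / ghz


raw_ratio = XEON_K / PI3_K
normalized_ratio = per_ghz(XEON_K, XEON_GHZ) / per_ghz(PI3_K, PI3_GHZ)

print(f"raw: Xeon is {raw_ratio:.2f}x faster")
print(f"per GHz: Xeon is {normalized_ratio:.2f}x faster")
```

Both ratios land a little under the rounded “3 times” and “2 times” figures.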

    Conclusions:

    In short, the device that Scaleway is running looks to be an older SoC implementation or a very standard one, as we can see massive 10 to 15 times differences in AES/SHA performance between different Arm platforms.

    Even the Raspberry Pi 3 gives 50% better results than the one used by Scaleway. Still horrible for RSA, but that brings us to the next point.

    The issue is not only the implementations. It seems that RSA has gotten better treatment in the Intel CPUs (possibly the difference in cache sizes? Specific instructions?).

    And at times it is simply the compiler + new instructions. A not-so-old article for Go showed the differences:

    https://blog.minio.io/accelerating-sha256-by-100x-in-golang-on-arm-1517225f5ff4

    Yes, a 100 times free performance boost the moment somebody finally implemented the instruction in the compiler.

    So people using Arm-based mini-servers can see massive differences based upon the SoC, compiler and Arm version. Because of Intel’s near-monopoly position for years, people no longer even question that different CPUs can differ greatly depending on what “extras” manufacturers add or leave out, and on the software support around them.

    The most interesting thing I noticed is the massive growth in the Arm sector, where more and more features get added to the CPU setups and they can slowly start to creep up on Intel. As we see with SHA256, the difference against a cheap ArmV7 at the same clock speed is down to only 2 times.

    It’s optimization and picking the right CPU for the task that people these days have simply forgotten…
