This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Combine XOR and AES instructions on ARM/ARM64
Closed, Public

Authored by mbrase on May 22 2018, 6:12 PM.

Details

Summary

The ARM/ARM64 AESE and AESD instructions perform a built-in XOR of their two operands as the first step of the instruction. Therefore, if the AES key operand is zero and the AES data operand is the result of an earlier XOR, that XOR can be folded into the AES instruction.
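
To make the identity behind this fold concrete, here is a minimal software sketch of AESE in portable C. It assumes only the documented behaviour of the instruction (XOR the two operands, then ShiftRows and SubBytes). The helper names (gf_mul, aes_sbox, model_aese, fold_is_equivalent) are hypothetical and are not part of the patch; the S-box is computed on the fly only to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
        b >>= 1;
    }
    return r;
}

static uint8_t rotl8(uint8_t v, int n) {
    return (uint8_t)((v << n) | (v >> (8 - n)));
}

/* AES S-box entry: multiplicative inverse (0 maps to 0), then the affine step. */
static uint8_t aes_sbox(uint8_t x) {
    uint8_t inv = 0;
    for (int y = 1; y < 256 && x != 0; y++) {
        if (gf_mul(x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
    }
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                     rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

/* Model of AESE: XOR data with key first, then ShiftRows and SubBytes.
   The state is column-major: byte index = row + 4 * column. */
static void model_aese(uint8_t out[16], const uint8_t data[16],
                       const uint8_t key[16]) {
    uint8_t t[16];
    for (int i = 0; i < 16; i++) t[i] = data[i] ^ key[i];
    for (int col = 0; col < 4; col++)
        for (int row = 0; row < 4; row++)
            out[row + 4 * col] = aes_sbox(t[row + 4 * ((col + row) % 4)]);
}

/* Returns 1 if AESE(data ^ key, 0) equals AESE(data, key) in this model. */
static int fold_is_equivalent(const uint8_t data[16], const uint8_t key[16]) {
    uint8_t zero[16] = {0}, xored[16], unfused[16], fused[16];
    for (int i = 0; i < 16; i++) xored[i] = data[i] ^ key[i];
    model_aese(unfused, xored, zero);  /* separate XOR, zero key */
    model_aese(fused, data, key);      /* XOR folded into AESE   */
    return memcmp(unfused, fused, 16) == 0;
}
```

Calling fold_is_equivalent with any two 16-byte blocks should return 1, since the only difference between the two forms is where the XOR happens.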

Diff Detail

Repository
rL LLVM

Event Timeline

mbrase created this revision. May 22 2018, 6:12 PM

I am not an expert in encryption, but how likely is it for 'key' to be zero?
Other than that, LGTM

sdesmalen added inline comments. May 23 2018, 1:09 PM
test/Transforms/InstCombine/AArch64/aes-intrinsics.ll
21 ↗(On Diff #148138)

Would it be useful to add some negative tests in case the second operand is not all zero?

Hi, I can try to give a little more background on the motivation behind this patch, and why the key might be zero.

I’m trying to cross-compile some code which uses x86 AES intrinsics on AArch64 with LLVM. However, the x86 AES instructions have slightly different semantics from the ARM instructions. One important difference is that x86 performs an XOR at the end of its AESENC/AESENCLAST instructions, whereas ARM does the XOR at the beginning of its AESE instruction.

To emulate the x86 AES intrinsics on AArch64, I defined some inline functions as a drop-in replacement, to avoid rewriting the algorithm by hand. Here is what they look like:

#include <stdint.h>
#include <arm_neon.h>

typedef uint8x16_t __m128i;  // __m128i is an x86 type, map it to ARM neon type

// https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_aesenc_si128&expand=221
static inline __m128i _mm_aesenc_si128 (__m128i a, __m128i RoundKey)
{
    return vaesmcq_u8(vaeseq_u8(a, (__m128i){})) ^ RoundKey;
}

// https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_aesenclast_si128&expand=222
static inline __m128i _mm_aesenclast_si128 (__m128i a, __m128i RoundKey)
{
    return vaeseq_u8(a, (__m128i){}) ^ RoundKey;
}

To skip the XOR from the ARM AESE instruction, I pass in a key value of zero, and then manually XOR the real key at the end of the function. The only downside to this approach is that when the inline functions are substituted into the algorithm, each one leaves behind an unnecessary EOR instruction when multiple rounds of AES encryption are done back to back. Consider the following test code:
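
As a rough check of this emulation scheme, the sketch below models AESE and AESMC in portable C and runs the first round of the FIPS-197 Appendix B example both ways: with an explicit XOR followed by a zero-key AESE, and with the key passed directly to AESE. All helper names (gf_mul, aes_sbox, model_aese, model_aesmc, emu_aesenc, first_round_matches) are hypothetical, and this is a software model under those assumptions, not the NEON intrinsics themselves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Multiply in GF(2^8) modulo the AES polynomial. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
        b >>= 1;
    }
    return r;
}

static uint8_t rotl8(uint8_t v, int n) {
    return (uint8_t)((v << n) | (v >> (8 - n)));
}

/* AES S-box entry, computed instead of tabulated to keep the sketch short. */
static uint8_t aes_sbox(uint8_t x) {
    uint8_t inv = 0;
    for (int y = 1; y < 256 && x != 0; y++)
        if (gf_mul(x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                     rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

/* Model of AESE: XOR with key, then ShiftRows and SubBytes (column-major). */
static void model_aese(uint8_t out[16], const uint8_t data[16],
                       const uint8_t key[16]) {
    uint8_t t[16];
    for (int i = 0; i < 16; i++) t[i] = data[i] ^ key[i];
    for (int col = 0; col < 4; col++)
        for (int row = 0; row < 4; row++)
            out[row + 4 * col] = aes_sbox(t[row + 4 * ((col + row) % 4)]);
}

/* Model of AESMC: MixColumns applied to each 4-byte column. */
static void model_aesmc(uint8_t out[16], const uint8_t in[16]) {
    for (int c = 0; c < 4; c++) {
        const uint8_t *a = in + 4 * c;
        uint8_t b[4];
        for (int r = 0; r < 4; r++)
            b[r] = (uint8_t)(gf_mul(a[r], 2) ^ gf_mul(a[(r + 1) % 4], 3) ^
                             a[(r + 2) % 4] ^ a[(r + 3) % 4]);
        memcpy(out + 4 * c, b, 4);
    }
}

/* Emulated x86 AESENC as in the thread: AESMC(AESE(a, 0)) ^ round_key. */
static void emu_aesenc(uint8_t out[16], const uint8_t a[16],
                       const uint8_t round_key[16]) {
    const uint8_t zero[16] = {0};
    uint8_t t[16];
    model_aese(t, a, zero);
    model_aesmc(t, t);
    for (int i = 0; i < 16; i++) out[i] = t[i] ^ round_key[i];
}

/* Returns 1 if both forms reproduce the FIPS-197 Appendix B round-1 output. */
static int first_round_matches(void) {
    static const uint8_t pt[16] = {0x32,0x43,0xf6,0xa8,0x88,0x5a,0x30,0x8d,
                                   0x31,0x31,0x98,0xa2,0xe0,0x37,0x07,0x34};
    static const uint8_t k0[16] = {0x2b,0x7e,0x15,0x16,0x28,0xae,0xd2,0xa6,
                                   0xab,0xf7,0x15,0x88,0x09,0xcf,0x4f,0x3c};
    static const uint8_t k1[16] = {0xa0,0xfa,0xfe,0x17,0x88,0x54,0x2c,0xb1,
                                   0x23,0xa3,0x39,0x39,0x2a,0x6c,0x76,0x05};
    static const uint8_t expect[16] = {0xa4,0x9c,0x7f,0xf2,0x68,0x9f,0x35,0x2b,
                                       0x6b,0x5b,0xea,0x43,0x02,0x6a,0x50,0x49};
    uint8_t x[16], unfused[16], fused[16];
    /* Unfused: explicit XOR with k0, then the zero-key emulation. */
    for (int i = 0; i < 16; i++) x[i] = pt[i] ^ k0[i];
    emu_aesenc(unfused, x, k1);
    /* Fused: k0 is passed directly as the AESE key operand. */
    model_aese(fused, pt, k0);
    model_aesmc(fused, fused);
    for (int i = 0; i < 16; i++) fused[i] ^= k1[i];
    return memcmp(unfused, expect, 16) == 0 && memcmp(fused, expect, 16) == 0;
}
```

The fused path uses one fewer XOR per round, which is the saving the extra EOR instructions represent in the generated code.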

// Performs 3 rounds of AES encryption
// Note: In proper 128-bit AES encryption, 10 rounds are used
__m128i aes_block(__m128i data, __m128i k0, __m128i k1, __m128i k2, __m128i k3)
{
    data = data ^ k0;
    data = _mm_aesenc_si128(data, k1);
    data = _mm_aesenc_si128(data, k2);
    data = _mm_aesenclast_si128(data, k3);
    return data;
}

Compiles down to this:

0000000000000024 <aes_block>:
  24:   6e201c20        eor     v0.16b, v1.16b, v0.16b  ; EOR can be combined with AESE
  28:   6f00e401        movi    v1.2d, #0x0
  2c:   4e284820        aese    v0.16b, v1.16b
  30:   4e286800        aesmc   v0.16b, v0.16b
  34:   6e221c00        eor     v0.16b, v0.16b, v2.16b  ; EOR can be combined with AESE
  38:   4e284820        aese    v0.16b, v1.16b
  3c:   4e286800        aesmc   v0.16b, v0.16b
  40:   6e231c00        eor     v0.16b, v0.16b, v3.16b  ; EOR can be combined with AESE
  44:   4e284820        aese    v0.16b, v1.16b
  48:   6e241c00        eor     v0.16b, v0.16b, v4.16b
  4c:   d65f03c0        ret

My goal with this patch is to teach LLVM how to eliminate some of these extra EOR instructions. With my patch, LLVM produces this:

0000000000000024 <aes_block>:
  24:   4e284820        aese    v0.16b, v1.16b
  28:   4e286800        aesmc   v0.16b, v0.16b
  2c:   4e284840        aese    v0.16b, v2.16b
  30:   4e286800        aesmc   v0.16b, v0.16b
  34:   4e284860        aese    v0.16b, v3.16b
  38:   6e241c00        eor     v0.16b, v0.16b, v4.16b
  3c:   d65f03c0        ret

mbrase updated this revision to Diff 148304. May 23 2018, 3:29 PM

I updated the test cases to include negative tests (for when the AES key is non-zero).

sdesmalen accepted this revision. May 23 2018, 11:13 PM

LGTM! Thanks for adding the negative tests and rationale.

This revision is now accepted and ready to land. May 23 2018, 11:13 PM
This revision was automatically updated to reflect the committed changes.