Rewrite ARM NEON intrinsic emission completely.


Rewrite ARM NEON intrinsic emission completely.

There comes a time in the life of any amateur code generator when dumb string
concatenation just won't cut it any more. For NeonEmitter.cpp, that time has

There were a bunch of magic type codes which meant different things depending on
the context. There were a bunch of special cases that really had no reason to be
there but the whole thing was so creaky that removing them would cause something
weird to fall over. There was a 1000 line switch statement for code generation
involving string concatenation, which actually did lexical scoping to an extent
(!!) with a bunch of semi-repeated cases.

I tried to refactor this three times in three different ways without
success. The only way forward was to rewrite the entire thing. Luckily the
testing coverage on this stuff is absolutely massive, both with regression tests
and the "emperor" random test case generator.

The main change is that previously, in arm_neon.td a bunch of "Operation"s were
defined with special names. NeonEmitter.cpp knew about these Operations and
would emit code based on a huge switch. Actually this doesn't make much sense -
the type information was held as strings, so type checking was impossible. Also
TableGen's DAG type actually suits this sort of code generation very well
(surprising that...)

So now every operation is defined in terms of TableGen DAGs. There are a bunch
of operators to use, including "op" (a generic unary or binary operator), "call"
(to call other intrinsics) and "shuffle" (take a guess...). One of the main
advantages of this apart from making it more obvious what is going on, is that
we have proper type inference. This has two obvious advantages:

  1. TableGen can error on bad intrinsic definitions easier, instead of just generating wrong code.
  2. Calls to other intrinsics are typechecked too. So we no longer need to work out whether the thing we call needs to be the Q-lane version or the D-lane version - TableGen knows that itself!

Here's an example: before:

case OpAbdl: {
  std::string abd = MangleName("vabd", typestr, ClassS) + "(__a, __b)";
  if (typestr[0] != 'U') {
    // vabd results are always unsigned and must be zero-extended.
    std::string utype = "U" + typestr.str();
    s += "(" + TypeString(proto[0], typestr) + ")";
    abd = "(" + TypeString('d', utype) + ")" + abd;
    s += Extend(utype, abd) + ";";
  } else {
    s += Extend(typestr, abd) + ";";


def OP_ABDL     : Op<(cast "R", (call "vmovl", (cast $p0, "U",
                                                     (call "vabd", $p0, $p1))))>;

As an example of what happens if you do something wrong now, here's what happens
if you make $p0 unsigned before the call to "vabd" - that is, $p0 -> (cast "U",

arm_neon.td:574:1: error: No compatible intrinsic found - looking up intrinsic 'vabd(uint8x8_t, int8x8_t)'
Available overloads:

  • float64x2_t vabdq_v(float64x2_t, float64x2_t)
  • float64x1_t vabd_v(float64x1_t, float64x1_t)
  • float64_t vabdd_f64(float64_t, float64_t)
  • float32_t vabds_f32(float32_t, float32_t)

... snip ...

This makes it seriously easy to work out what you've done wrong in fairly nasty

As part of this I've massively beefed up the documentation in arm_neon.td too.

Things still to do / on the radar:

  • Testcase generation. This was implemented in the previous version and not in the new one, because
    • Autogenerated tests are not being run. The testcase in test/ differs from the autogenerated version.
    • There were a whole slew of special cases in the testcase generation that just felt (and looked) like hacks. If someone really feels strongly about this, I can try and reimplement it too.
  • Big endian. That's coming soon and should be a very small diff on top of this one.


jamesmJun 17 2014, 6:11 AM
rL211100: Move SetTheory from utils/TableGen into lib/TableGen so Clang can use it.

Event Timeline