Summary
How can we make InstSimplify work with intrinsics?
The idea here is to pretend that the intrinsic is the actual instruction (e.g. an fadd) and run the existing InstSimplify code. To keep InstSimplify from breaking code, we abort a match as soon as we can no longer be sure that the rewrite is compatible with the semantics of the intrinsic.
The tests in this patch show this for constrained fp and vp intrinsics.
This is a work-in-progress re-implementation of the generalized pattern match mechanism of D57504. That patch also implements InstCombine (and not just InstSimplify).
Background
InstSimplify only works for regular LLVM instructions. Yet, there are more and more intrinsics that mimic regular instructions with a twist.
For example:
- @llvm.experimental.constrained.fadd allows custom rounding and fp exceptions - however, for default fp settings, it is just an fadd.
- @llvm.vp.add is a vector-add with a mask and explicit vector length - however, the operation applied to each active lane is just an add.
InstSimplify and InstCombine specify a ton of peephole rewrites to optimize patterns with regular IR instructions.
We'd like to make those pattern-based rewrites work on intrinsics as well.
How?
InstSimplify always works the same: if a pattern matches, it replaces the match root with a pattern leaf (or a constant).
We do two things to make this work:
1.) We add a layer of helper classes that let intrinsics pretend to be instructions.
2.) We add a MatcherContext that verifies that a specific pattern match is legal with intrinsics.
However, we want this not just for one kind of intrinsic but for different classes of intrinsics (as shown above). To do that, we introduce the notion of a Trait: a Trait represents the extra property that distinguishes an intrinsic that pretends to be an instruction from the instruction itself.
We define three different traits in this patch:
- The CFPTrait works on constrained fp intrinsics. The MatcherContext<CFPTrait> verifies all pattern matches that use constrained fp intrinsics with default fp semantics (tonearest, no exceptions).
- The VPTrait works on VP intrinsics and regular instructions. E.g., a first-class %x = add <8 x i32> %y, %z passes as an add, as does a llvm.vp.add.v8i32(%x, %y, %mask, %evl). Since the masked-out lanes of VP intrinsics deliver undefined results, all matching patterns are automatically legal.
- The EmptyTrait does not pretend anything. Only a first-class FAdd is an FAdd. There are no helper classes; an Instruction is really just an Instruction.
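To make the three traits concrete, here is a minimal toy sketch. It does not use the LLVM API or the classes from this patch: ToyInst, Kind, Opcode, opcodeOf, isLegal and matchesAs are all hypothetical names, and the choice to let the toy CFPTrait also accept a first-class fadd is an assumption. It only illustrates how a trait could decide what a value "passes as" and whether a match is legal.

```cpp
#include <cassert>

// Toy stand-ins for IR values; none of these are LLVM types.
enum class Kind { Add, FAdd, ConstrainedFAdd, VPAdd };
enum class Opcode { None, Add, FAdd };

struct ToyInst {
  Kind K;
  bool DefaultFPEnv; // only meaningful for ConstrainedFAdd
  explicit ToyInst(Kind K, bool DefaultFPEnv = true)
      : K(K), DefaultFPEnv(DefaultFPEnv) {}
};

// EmptyTrait: nothing pretends; only first-class instructions count.
struct EmptyTrait {
  static Opcode opcodeOf(const ToyInst &I) {
    if (I.K == Kind::Add) return Opcode::Add;
    if (I.K == Kind::FAdd) return Opcode::FAdd;
    return Opcode::None;
  }
  static bool isLegal(const ToyInst &) { return true; }
};

// CFPTrait: a constrained fadd passes as an FAdd, but a match is only
// legal under default fp semantics (tonearest, no exceptions).
struct CFPTrait {
  static Opcode opcodeOf(const ToyInst &I) {
    if (I.K == Kind::ConstrainedFAdd) return Opcode::FAdd;
    return EmptyTrait::opcodeOf(I); // assumption: plain fadd still matches
  }
  static bool isLegal(const ToyInst &I) {
    return I.K != Kind::ConstrainedFAdd || I.DefaultFPEnv;
  }
};

// VPTrait: a vp.add passes as an Add; every match is legal because
// masked-out lanes are undefined anyway.
struct VPTrait {
  static Opcode opcodeOf(const ToyInst &I) {
    if (I.K == Kind::VPAdd) return Opcode::Add;
    return EmptyTrait::opcodeOf(I);
  }
  static bool isLegal(const ToyInst &) { return true; }
};

// Does I pass as opcode Wanted under the given trait?
template <typename Trait>
bool matchesAs(const ToyInst &I, Opcode Wanted) {
  assert(Wanted != Opcode::None && "must ask for a real opcode");
  return Trait::opcodeOf(I) == Wanted && Trait::isLegal(I);
}
```

In this toy model, matchesAs<CFPTrait>(ToyInst(Kind::ConstrainedFAdd, false), Opcode::FAdd) is rejected because the fp environment is not the default one, while the same query with EmptyTrait is rejected because the intrinsic is not a first-class fadd at all.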
Remarks
- We get constant-folding for VP intrinsics for free.
- The constrained fp trait could be extended to non-strict fp exceptions (simplify only).
- We will also build on this framework for InstCombine - this was also implemented in D57504.
- I am not a floating-point expert - I'd be thrilled to learn under which circumstances pattern rewrites that assume default fp semantics apply to other rounding modes.
Implementation Details
The MatcherContext<Trait> starts in an uninitialized state. When a PatternMatch.h pattern is in the process of being matched against a specific instruction, it calls the check(V)/accept(V) methods of the context on all operators in the pattern. As soon as the context returns false, the entire match fails.
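The check/accept protocol can be sketched with a toy model. This is not the code from this patch: ToyOp, ToyCFPContext and matchChain are hypothetical, and the split between check (a side-effect-free query) and accept (a query that also commits state) is one plausible reading of the description, not a confirmed detail. The sketch only shows the "first rejection fails the whole match" behavior for a constrained-fp style context.

```cpp
#include <cassert>

// Toy operator node; NOT an LLVM class. Lhs is nullptr for a leaf.
struct ToyOp {
  bool IsConstrained;     // pretends to be an instruction
  bool RoundToNearest;    // rounding mode is the default
  bool IgnoresExceptions; // fp exceptions are ignored
  const ToyOp *Lhs;
};

// Hypothetical context: starts uninitialized, vets every operator.
class ToyCFPContext {
  bool Initialized = false;

public:
  // Query only: would this operator be legal in the current match?
  bool check(const ToyOp &Op) const {
    if (!Op.IsConstrained)
      return true; // regular instruction, nothing to verify
    return Op.RoundToNearest && Op.IgnoresExceptions;
  }

  // Query and commit: remember that the match has started.
  bool accept(const ToyOp &Op) {
    if (!check(Op))
      return false;
    Initialized = true;
    return true;
  }

  bool isInitialized() const { return Initialized; }
};

// Match a pattern made of Depth nested operators along the Lhs chain.
// The whole match fails as soon as the context rejects one operator.
bool matchChain(const ToyOp &Root, unsigned Depth, ToyCFPContext &Ctx) {
  assert(Depth > 0 && "pattern must contain at least one operator");
  const ToyOp *Cur = &Root;
  for (unsigned I = 0; I < Depth; ++I) {
    if (!Cur)
      return false; // pattern is deeper than the expression
    if (!Ctx.accept(*Cur))
      return false; // context vetoed this operator
    Cur = Cur->Lhs;
  }
  return true;
}
```

With a root operator in the default fp environment whose operand uses a non-default rounding mode, a one-operator pattern still matches in this toy, but a two-operator pattern is rejected at the inner operator.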
The ExtInstruction<Trait>, ExtBinaryOperator<Trait> classes make up the intermediate layer of pretend-classes. The default implementation of those classes assumes that there is an underlying intrinsic class (Trait::Intrinsic).
I'm confused: How does this work? Shouldn't there be an isa<typename Trait::Intrinsic>(V) check? Actually, how does this even compile? It seems like a (V) is missing on the cast.