Page MenuHomePhabricator

[GlobalISel][X86][ARM] Relaxing type constraints on G_SHL and friends
AcceptedPublic

Authored by rtereshin on Mar 20 2018, 2:40 PM.

Details

Summary

This is WIP.

G_SHL, G_ASHR, and G_LSHR opcodes have a single type index at the moment, forcing all the operands to have the same LLT.
This is consistent with the corresponding LLVM IR opcodes, but not so much with the apparent target requirements and pre-existing *.td-defined selection rules / instruction patterns.

For instance, most (if not all) AArch64 patterns fall into one of the following categories:

  1. scalar shifts, having the second source operand ("number of bits to shift") of type i64 (regardless of the type of the main operands, being i32 or i64 most of the time)
  2. vector shifts, having the second source operand of type i32 (immediate or variable both), regardless of the type of the main operands (number of vector lanes and size of the vector elements both)

In other words, the type of the second source operand is not fixed (it could be i64 or i32), it could be larger or smaller than the type of the main operands, and it could be scalar even if the rest of the instruction operates on vectors.

The existing instruction selection artifacts include (but not limited to):
a) patterns, like the following:

multiclass SIMDVectorLShiftLongBySizeBHSPats<SDPatternOperator ext> {
  def : Pat<(AArch64vshl (v8i16 (ext (v8i8 V64:$Rn))), (i32 8)),
            (SHLLv8i8 V64:$Rn)>;

b) immediate predicates, like GIPFP_I64_Predicate_imm0_31

X86 patterns, on the other hand, fall into one of the following categories:

  1. scalar shifts with the second source typed as i8 (as before, regardless of the types of the rest of the operands)
  2. vector shifts following LLVM IR scheme with all the operands having the same type

A typical pattern from (1) looks like this:

// x << (32 - y) >> (32 - y)
def : Pat<(srl (shl GR32:$src, (i8 (trunc (sub 32, GR32:$lz)))),
               (i8 (trunc (sub 32, GR32:$lz)))),
          (BZHI32rr GR32:$src, GR32:$lz)>;
def : Pat<(srl (shl (loadi32 addr:$src), (i8 (trunc (sub 32, GR32:$lz)))),
               (i8 (trunc (sub 32, GR32:$lz)))),
          (BZHI32rm addr:$src, GR32:$lz)>;

Mips appears to have the second source operand typed as i32 or i64, independent of the rest of the types, as usual.

This discrepancy creates the following issues:

  1. Non-optimized and mildly-optimized (pre- https://reviews.llvm.org/D44700 patch) Tablegen'erated InstructionSelect's MatchTable contains rules that could not possibly match, forcing targets to implement shifts' selection by hand in C++
  2. Aggressively optimized MatchTable (post- https://reviews.llvm.org/D44700 patch) contains rules that can and will actually match, but then execute renderers that expect the values having different types (from original SelectionDAG ISel patterns), resulting in miscopmiles.
  3. Testgen (https://reviews.llvm.org/D43962) generates test-cases that don't represent the actual contents of the MatchTable

(2) and (3) due to the fact that aggressive optimizations and test-generation exploits type constrains as defined by Tablegen'erated MCInstrDesc to reduce the number of type checks performed during selection and properly handle partially optimized match tables respectively.

Issue (1) for x86 is mentioned in the following commit message: https://github.com/llvm-mirror/llvm/commit/5b113a2c3b054e1d894ab9e44a6a08e1d0cd7ff3 (git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327499 91177308-0d34-0410-b5e6-96231b3b80d8, https://reviews.llvm.org/D44395), adding as much hand-written C++ selecting shifts as could be seen here: https://github.com/llvm-mirror/llvm/blob/5b113a2c3b054e1d894ab9e44a6a08e1d0cd7ff3/lib/Target/X86/X86InstructionSelector.cpp#L1405-L1482

AArch64 ended up having manually written C++ for selecting all the shifts and G_GEP for GPR RegBank, and the majority of binary ops for FPR RegBank.

I see 2 ways of solving the problem:

  1. Change GlobalISel Emitter the Tablegen backend so it would intelligently adapt the patterns being imported and re-write them so they would work with existing G_* shifts
  2. Relax the type constraints for G_* shifts and allow the second source operand to have an independent type

Given the diversity between targets (1) will have to be target-specific, and in any way it will most likely end up being quite complicated and fragile. Also, vector shifts with a scalar shift amount (AArch64) won't be possible, therefore selecting an efficient opcode will only be possible for vector shifts if the vector operand for shift amounts have the same vreg for every vector element, which is again, fragile and will probably require an additional combine to happen more often. It makes more sense, IMO, to allow such mixed shifts on MIR level explicitly.

Also, shifts aren't regular arithmetic/logical binary ops anyway, they aren't commutative nor associative, their second source operand is always an unsigned integer type regardless of the rest of the operands being signed or unsigned, and the corresponding LLVM IR opcodes have special rules regarding poison values WRT that operand. Therefore they require special handling across the selector anyway.

So this patch is to track progress on implementing the solution (2) at the moment, and get it reviewed as soon as it's done.

Diff Detail

Repository
rL LLVM

Event Timeline

rtereshin created this revision.Mar 20 2018, 2:40 PM

The GenericOpcodes.td change makes sense to me but I'm surprised that only AMGGPU's LegalizerInfo is changing and only for SHL. I'd have expected most (probably all) GlobalISel targets to need to change all three shifts.

Also, could you add a test case?

nhaehnle removed a subscriber: nhaehnle.Apr 2 2018, 12:18 AM

The GenericOpcodes.td change makes sense to me but I'm surprised that only AMGGPU's LegalizerInfo is changing and only for SHL. I'd have expected most (probably all) GlobalISel targets to need to change all three shifts.

Also, could you add a test case?

AMDGPU only legalizes G_SHL, non of the others. And w/o the setAction({G_SHL, 1, S32}, Legal); already existing tests fail (CodeGen/AMDGPU/GlobalISel/legalize-shl.mir and CodeGen/AMDGPU/GlobalISel/regbankselect-shl.mir), so no need to add another one. As for the selector, this is entire AMDGPU's selector:

case TargetOpcode::G_ADD:
  return selectG_ADD(I);
case TargetOpcode::G_CONSTANT:
  return selectG_CONSTANT(I);
case TargetOpcode::G_GEP:
  return selectG_GEP(I);
case TargetOpcode::G_LOAD:
  return selectG_LOAD(I);
case TargetOpcode::G_STORE:
  return selectG_STORE(I);

Every other target legalizes shifts by using the new-style legalization action builders, and those have a couple of rather dangerous properties:

  1. If actions are built for multiple opcodes simultaneously, they are actually directly defined for a representative only, making the rest aliases, which is fine, but nothing checks that all the aliases agree in the number of type indices.
  2. while actions are built for the representative, nothing checks that they are properly defined for every type index, implicitly making any type legal for any type index not mentioned with no action required.

This is why none of the other already existing tests fail, though there are plenty that test legalization of shifts.

Here's a fine example of the issue mentioned earlier:

This is how we legalize G_INTTOPTR for AArch64 right now:

getActionDefinitionsBuilder(G_INTTOPTR)
    .unsupportedIf([&](const LegalityQuery &Query) {
      return Query.Types[0].getSizeInBits() != Query.Types[1].getSizeInBits();
    })
    .legalFor({s64, p0});

(https://github.com/llvm-mirror/llvm/blob/27c28519031032a11a917fa4dbf05417fbe78740/lib/Target/AArch64/AArch64LegalizerInfo.cpp#L215-L219)

Who can spot a problem?

In fact, 2 problems at once. I'm working on a simple validation for LegalizerInfo that would catch this.

SPOILERS below:

a tip, the following MIR:

--- |
  target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
  target triple = "aarch64--"
  
  define i64 @broken(<4 x i16> %v) {
    %res = bitcast <4 x i16> %v to i64
    ret i64 %res
  }

...
---
name:            broken
alignment:       2
tracksRegLiveness: true
registers:       
  - { id: 0, class: _ }
  - { id: 1, class: _ }
body:             |
  bb.1 (%ir-block.0):
    liveins: $d0
  
    %0:_(<4 x s16>) = COPY $d0
    %1:_(s64) = G_INTTOPTR %0(<4 x s16>)
    $x0 = COPY %1(s64)
    RET_ReallyLR implicit $x0

...

will be successfully legalized by

./bin/llc -run-pass=legalizer -verify-machineinstrs -simplify-mir <INPUT>.mir -o -

with the following output:

--- |
  ; ModuleID = 'out.mir'
  source_filename = "out.mir"
  target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
  target triple = "aarch64--"
  
  define i64 @broken(<4 x i16> %v) {
    %res = bitcast <4 x i16> %v to i64
    ret i64 %res
  }

...
---
name:            broken
alignment:       2
legalized:       true
tracksRegLiveness: true
registers:       
  - { id: 0, class: _ }
  - { id: 1, class: _ }
frameInfo:       
  maxCallFrameSize: 0
body:             |
  bb.0 (%ir-block.0):
    liveins: $d0
  
    %0:_(<4 x s16>) = COPY $d0
    %1:_(s64) = G_INTTOPTR %0(<4 x s16>)
    $x0 = COPY %1(s64)
    RET_ReallyLR implicit $x0

...

It's worth mentioning that:

%1:_(s64) = G_INTTOPTR %0(<4 x s16>)

isn't a valid G_INTTOPTR. The result should be a pointer type and the input should be a scalar. However, your point stands even with that fixed.

The first problem is that it should be using the pair version of legalFor() like so:

.legalFor({{s64, p0}});

I see your point that it's too easy to mix these two overloads up at the moment:

LegalizeRuleSet &legalFor(std::initializer_list<LLT> Types);
LegalizeRuleSet &legalFor(std::initializer_list<std::pair<LLT, LLT>> Types);

A partial fix would be to rename the latter legalForPair() or similar which would make it easier to spot but even then we'd still want something to make sure we don't mix up legalFor() and legalForPair()

The next problem is that the types are backwards. It should be:

.legalFor({{p0, s64}});

Nailed it.

isn't a valid G_INTTOPTR. The result should be a pointer type and the input should be a scalar.

Well, it should be because we like it to be, MachineVerifier doesn't check that.

This is what I eventually came up with to reduce the number of such mistakes in the future:
first: https://reviews.llvm.org/D46338
second: https://reviews.llvm.org/D46339

PTAL

rtereshin edited the summary of this revision. (Show Details)May 7 2018, 11:13 AM
rtereshin added reviewers: aemerson, ab, t.p.northover.
rtereshin edited the summary of this revision. (Show Details)May 7 2018, 11:16 AM
rtereshin updated this revision to Diff 146275.May 10 2018, 8:19 PM

This is pretty much finished, PTAL

aemerson accepted this revision.Aug 28 2018, 4:56 PM

I'd like to get this in. LGTM but needs one issue addressed.

lib/Target/AArch64/AArch64InstructionSelector.cpp
1092

Even though this may be true, I'd rather we not use unreachable here. If for some reason Tablegen fails to select the user we want to have SDAG try it.

This revision is now accepted and ready to land.Aug 28 2018, 4:56 PM
aemerson added inline comments.Aug 28 2018, 4:58 PM
lib/Target/AArch64/AArch64InstructionSelector.cpp
1092

s/select the user/select, for the user

This is already done in another patch. Abandon?

Herald added a project: Restricted Project. · View Herald TranscriptJul 24 2019, 3:24 PM