Page MenuHomePhabricator

[DAGCombine][X86][AMDGPU][AArch64] (srl (shl x, c1), c2) with c1 != c2 handling
Needs ReviewPublic

Authored by lebedev.ri on May 18 2019, 6:24 AM.

Details

Summary

https://rise4fun.com/Alive/6bVL

AArch64 is clearly a regression, will need a separate fix
AMDGPU change looks bad either way - bfe is not used
X86 changes look neutral-positive, except vector cases, those are clearly regressions

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.May 18 2019, 6:24 AM
lebedev.ri marked 2 inline comments as done.May 18 2019, 3:18 PM
lebedev.ri added subscribers: arsenm, compnerd.

Looked at changes:

  • I'll leave x86 vector stuff for later. since i actually wanted to look into reverse trasnform, and looked into this only for consistency.
  • I don't know what to do with AArch64 regression. I can hide it with shouldFoldConstantShiftPairToMask(), but it is there regardless (tests added). Thoughts?
  • That leaves AMDGPU?
test/CodeGen/AArch64/arm64-bitfield-extract.ll
396

After actually looking at -debug output, this regression happens because of SimplifyDemandedBits(),
which ignores AArch64TargetLowering::isDesirableToCommuteWithShift() override.
So when we get to AArch64TargetLowering::isBitfieldExtractOp(), we have

t18: i32 = srl t15, Constant:i64<2>
  t15: i32 = or t26, t24
    t26: i32 = and t7, Constant:i32<1073741816>
      t7: i32,ch = load<(load 4 from %ir.y, align 8)> t0, t2, undef:i64
        t2: i64,ch = CopyFromReg t0, Register:i64 %0
          t1: i64 = Register %0
        t6: i64 = undef
      t25: i32 = Constant<1073741816>
    t24: i32 = and t12, Constant:i32<4>
      t12: i32 = srl t4, Constant:i64<16>
        t4: i32,ch = CopyFromReg t0, Register:i32 %1
          t3: i32 = Register %1
        t11: i64 = Constant<16>
      t23: i32 = Constant<4>
  t17: i64 = Constant<2>

I'm not sure how to turn this pattern on it's head to produce ubfx again.

test/CodeGen/AMDGPU/llvm.amdgcn.ubfe.ll
686–687

@arsenm will AMDGPU prefer 2 shifts or shift+mask here?

arsenm added inline comments.May 18 2019, 3:28 PM
test/CodeGen/AMDGPU/llvm.amdgcn.ubfe.ll
686–687

In this particular case, they're the same. In general 2 shifts is probably better. The mask value is less likely to be an inline immediate. It seems like we have a BFE matching problem though

686–687

For 64-bit, shift and mask would be better

Looks like AMDGPU changes are neutral too.
And now that i think about it, the AArch64 regression should be solvable (hidable) by an inverse transform.
Should i look into that before or after this patch?