This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask
ClosedPublic

Authored by gandhi21299 on Jun 10 2021, 11:01 AM.

Details

Summary
  • Implemented the following fold-negate transformation in the very beginning of AMDGPUCodegenPrepare.cpp:

xor (llvm.amdgcn.class x, mask), -1 --> llvm.amdgcn.class(x, ~mask)

  • Added regression tests

Diff Detail

Event Timeline

gandhi21299 created this revision.Jun 10 2021, 11:01 AM
gandhi21299 requested review of this revision.Jun 10 2021, 11:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 10 2021, 11:01 AM
arsenm added inline comments.Jun 10 2021, 2:27 PM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1393

This should not be a separate pass over the function. This should be a visitXor function

1444

You don't need to create a constant to check the value

1454

Should check hasOneUse

1458–1469

I don't know why you are looking at all of these extensions. The xor should directly consume the call

1473–1475

You can just dyn_cast<IntrinsicInst>

1480

Since you know it's a constant, you can also 0 the irrelevant high bits

1485

You don't need to collect dead instructions, the xor should always be dead

llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
19

Need a negative test with a variable mask. Also a negative test for multiple uses. Plus also could use tests for all of the FP types

27

You shouldn't be trying to look through this zext, this IR should have been optimized to xor on i1

37–39

This didn't do anything

arsenm requested changes to this revision.Jun 10 2021, 2:27 PM
This revision now requires changes to proceed.Jun 10 2021, 2:27 PM
gandhi21299 marked 6 inline comments as done.Jun 11 2021, 9:09 AM
gandhi21299 added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1458–1469

I am considering the case where the result from the class intrinsic is extended/truncated for some reason. Is it possible to get the IntrinsicInst directly from the xor operand?

gandhi21299 marked 3 inline comments as done.Jun 11 2021, 12:46 PM
gandhi21299 added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
1480

Do I use IRBuilder to do this or is there a simpler way?

gandhi21299 marked 2 inline comments as done.Jun 11 2021, 1:21 PM
  • check if one of the operands of the xor instruction is a ConstantInt, fixes two of the tests
arsenm added inline comments.Jun 15 2021, 10:32 AM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
21

Definitely should not include this here

30

Shouldn't need this

819–827

I believe PatternMatch has a nicer way to check for a not

835

I'd prefer to just use the dyn_cast below and return on that rather than checking isa first

842

You don't need this intermediate APInt (plus this is required to be an i32)

llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
55

It would be a better test to have a non-identical use (i.e. just add a store of the value)

arsenm added inline comments.Jun 15 2021, 10:46 AM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
819–827

Plus constants are canonicalized to the RHS, so you don't need to check both

828–829

Invert condition and exit early to reduce indentation

arsenm added inline comments.Jun 15 2021, 11:08 AM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
845

Arg->getType()

gandhi21299 marked 7 inline comments as done.
  • changes as requested by Matt
gandhi21299 marked an inline comment as done.Jun 15 2021, 1:09 PM
gandhi21299 added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
819–827

After removing the else-if condition, this seems clean enough to me. Perhaps, PatternMatch won't be required any more.

  • refreshing diff, only 2 tests should be failing instead of 4
arsenm added inline comments.Jun 15 2021, 5:41 PM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
817

No else after return

821–822

Using m_Not would still be clearer

821–823

Using dyn_cast and isa is redundant, just use dyn_cast and check the result

841

There are fewer bits in the test mask than this

llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
2

You didn't actually generate these checks. The operand tests are mostly missing

foad added a comment.Jun 16 2021, 2:52 AM

Please run all of check-llvm-codegen-amdgpu. I tried your patch and it looks like a couple more tests need updating.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
17–39

I don't think you need any of these changes to the #includes.

815

This check is wrong. It's OK for there to be multiple uses of the Xor, since you replace all of them. It is not OK for there to be other uses of the intrinsic call, since you modify it in-place.

gandhi21299 marked 7 inline comments as done.
  • requested changes
  • @foad two tests fail after running ninja check-llvm-codegen-amdgpu
gandhi21299 added inline comments.Jun 16 2021, 2:50 PM
llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
2

Can you elaborate on the kind of tests we need?

foad added inline comments.Jun 17 2021, 1:38 AM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
23

Don't need this because you don't use the IRBuilder for anything

25

Don't need this unless you actually do some pattern matching

28

Don't need this, it's already included by default

826

Don't need this, you don't use it for anything

830–832

This is wrong. If the value was 4 you will xor it with 7 giving 3, but you need to flip all the bits that amdgcn_class cares about, i.e. 10 low order bits. You should either xor with a fixed value of 0x3ff, or perhaps move this enum from AMDGPUInstCombineIntrinsic.cpp to a common header (maybe SIDefines.h?) and add an "ALL" value to it:

260:  case Intrinsic::amdgcn_class: {
261-    enum {
262-      S_NAN = 1 << 0,       // Signaling NaN
263-      Q_NAN = 1 << 1,       // Quiet NaN
264-      N_INFINITY = 1 << 2,  // Negative infinity
265-      N_NORMAL = 1 << 3,    // Negative normal
266-      N_SUBNORMAL = 1 << 4, // Negative subnormal
267-      N_ZERO = 1 << 5,      // Negative zero
268-      P_ZERO = 1 << 6,      // Positive zero
269-      P_SUBNORMAL = 1 << 7, // Positive subnormal
270-      P_NORMAL = 1 << 8,    // Positive normal
271-      P_INFINITY = 1 << 9   // Positive infinity
272-    };
gandhi21299 marked 4 inline comments as done.Jun 17 2021, 9:10 AM
gandhi21299 added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
23

Turns out, it is used in other places. This pass does not compile without including this header.

830–832

Flipping only the low 10 bits fails several other tests, thoughts? @arsenm @foad

foad added inline comments.Jun 17 2021, 9:13 AM
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
830–832

You need to look at why they fail. Probably your patch changes (improves) the generated code, so the tests need to be updated to expect the improved code sequence.

gandhi21299 marked 2 inline comments as done.
  • get lower 10 bits of the test mask by xor-ing with 0x3ff
  • updated tests accordingly
  • refactoring
  • 2 tests still fail from the AMDGPU codegen test suite

I am not too sure what is causing the test CodeGen/AMDGPU/amdgpu-codegenprepare-i16-to-i32.ll to fail. There is no amdgcn class intrinsic being used anywhere in this test case so there should not be any transformation happening. @arsenm

foad added a comment.Jun 18 2021, 3:03 AM

I am not too sure what is causing the test CodeGen/AMDGPU/amdgpu-codegenprepare-i16-to-i32.ll to fail. There is no amdgcn class intrinsic being used anywhere in this test case so there should not be any transformation happening.

Your visitXor function overrides the handling of 16-bit xors which were previously handled by visitBinaryOperator.

For the cases you can't handle, instead of return false you need something like return visitBinaryOperator(I).

  • now passes all tests, thanks @foad for the fix
foad accepted this revision.Jun 18 2021, 9:15 AM

Looks OK to me, but please wait a day in case other reviewers still have comments.

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
816

All ConstantInts have IntegerType so you don't need to check that.

gandhi21299 marked an inline comment as done.
  • eliminated RHS IntegerTy check since it is already declared as a ConstantInt
arsenm added inline comments.Jun 18 2021, 11:02 AM
llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-foldnegate.ll
47

Should check the operands

61

Should check the operands

gandhi21299 marked 4 inline comments as done.
  • added checks on operands for the class intrinsic
arsenm accepted this revision.Jun 18 2021, 11:49 AM
This revision is now accepted and ready to land.Jun 18 2021, 11:49 AM

Thanks for the review process, I will merge this patch in

This revision was landed with ongoing or failed builds.Jun 18 2021, 12:04 PM
This revision was automatically updated to reflect the committed changes.