- Implemented the following fold-negate transformation at the very beginning of AMDGPUCodegenPrepare.cpp:
xor (llvm.amdgcn.class x, mask), -1 --> llvm.amdgcn.class(x, ~mask)
- Added regression tests
Differential D104049
[AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask
Authored by gandhi21299 on Jun 10 2021, 11:01 AM.
Details
xor (llvm.amdgcn.class x, mask), -1 --> llvm.amdgcn.class(x, ~mask)
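The fold is sound because every floating-point value belongs to exactly one class, so negating a class test is the same as testing the complementary set of classes. The following is a minimal standalone sketch of that argument, not the code from this patch. It assumes the class mask is 10 bits wide with the bit order shown in the constants, and it clamps ~mask to those 10 bits; the names classifyBit, classTest and foldHoldsFor are hypothetical helpers introduced only for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Assumed layout of the 10-bit class mask, one bit per floating-point class.
constexpr uint32_t kSignalingNaN = 1u << 0;
constexpr uint32_t kQuietNaN     = 1u << 1;
constexpr uint32_t kNegInf       = 1u << 2;
constexpr uint32_t kNegNormal    = 1u << 3;
constexpr uint32_t kNegSubnormal = 1u << 4;
constexpr uint32_t kNegZero      = 1u << 5;
constexpr uint32_t kPosZero      = 1u << 6;
constexpr uint32_t kPosSubnormal = 1u << 7;
constexpr uint32_t kPosNormal    = 1u << 8;
constexpr uint32_t kPosInf       = 1u << 9;
constexpr uint32_t kAllClasses   = 0x3ff;

// Hypothetical model: return the single class bit that x belongs to.
uint32_t classifyBit(float x) {
  const bool neg = std::signbit(x);
  switch (std::fpclassify(x)) {
  case FP_NAN: {
    // The top mantissa bit distinguishes quiet from signaling NaNs.
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return (bits & 0x00400000u) ? kQuietNaN : kSignalingNaN;
  }
  case FP_INFINITE:  return neg ? kNegInf : kPosInf;
  case FP_ZERO:      return neg ? kNegZero : kPosZero;
  case FP_SUBNORMAL: return neg ? kNegSubnormal : kPosSubnormal;
  default:           return neg ? kNegNormal : kPosNormal;
  }
}

// Model of the class test: true if x falls in any class selected by mask.
bool classTest(float x, uint32_t mask) {
  return (classifyBit(x) & mask) != 0;
}

// Check, for every possible mask, that negating the test gives the same
// answer as testing against the complemented mask.
bool foldHoldsFor(float x) {
  for (uint32_t mask = 0; mask <= kAllClasses; ++mask) {
    const bool negated = !classTest(x, mask);
    const bool folded = classTest(x, ~mask & kAllClasses);
    if (negated != folded)
      return false;
  }
  return true;
}
```

Calling foldHoldsFor on any input exercises all 1024 masks; because classifyBit always returns exactly one bit, the two sides cannot disagree.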
Event Timeline
Please run all of check-llvm-codegen-amdgpu. I tried your patch and it looks like a couple more tests need updating.
I am not too sure what is causing the test CodeGen/AMDGPU/amdgpu-codegenprepare-i16-to-i32.ll to fail. There is no amdgcn class intrinsic being used anywhere in this test case, so there should not be any transformation happening. @arsenm
Your visitXor function overrides the handling of 16-bit xors, which were previously handled by visitBinaryOperator. For the cases you can't handle, instead of return false you need something like return visitBinaryOperator(I).
Looks OK to me, but please wait a day in case other reviewers still have comments.
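The fallback suggested in the review can be illustrated with a small standalone model. This is a sketch under stated assumptions, not LLVM code: BinOp and Visitor are hypothetical stand-ins, and the "promotion" of 16-bit operations is reduced to changing a width field. It shows why a specialized handler that shadows the generic one has to delegate the cases it does not handle.

```cpp
#include <string>

// Hypothetical stand-in for an IR binary operator; not an LLVM type.
struct BinOp {
  std::string opcode;   // e.g. "xor" or "add"
  unsigned bits;        // integer width of the result
  bool lhsIsClassCall;  // operand 0 is a call to the class intrinsic
  bool rhsIsAllOnes;    // operand 1 is the constant -1
  bool folded;          // set when the negate was folded into the mask
};

struct Visitor {
  // Generic handler: in this model it widens 16-bit operations to 32 bits.
  bool visitBinaryOperator(BinOp &I) {
    if (I.bits != 16)
      return false;
    I.bits = 32;
    return true;
  }

  // Specialized handler: for xor it is chosen instead of the generic one,
  // so it must hand unmatched cases back rather than return false.
  bool visitXor(BinOp &I) {
    if (!(I.lhsIsClassCall && I.rhsIsAllOnes))
      return visitBinaryOperator(I);
    I.folded = true;  // stands in for rewriting the mask operand
    return true;
  }

  // Dispatch by opcode, most specific handler first.
  bool visit(BinOp &I) {
    return I.opcode == "xor" ? visitXor(I) : visitBinaryOperator(I);
  }
};
```

In this model, a plain 16-bit xor that does not match the fold still reaches visitBinaryOperator and is widened, which is the behaviour the failing i16-to-i32 test depends on.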
Definitely should not include this here