Page MenuHomePhabricator

[AMDGPU] Add a new Clamp Pattern to the GlobalISel Path.
ClosedPublic

Authored by tsymalla on Dec 22 2020, 7:39 AM.

Details

Summary

This patch should find a pattern in GlobalISel / AMDGPUPostLegalizerCombiner which can appear when trying to clamp 64-bit values to short values without truncation, for example by using the 64-bit / 16-bit integer extensions in shaders. This pattern can be reduced to two VALU instructions:

v_cvt_pk_i16_i32_e64 v0, v0, v1 ; where v[0:1] is a 64-bit number
v_med3_i32 v0, LowBoundary, v0, HighBoundary ; where LowBoundary >= SHORT_MII, HighBoundary <= SHORT_MAX

I am open for suggestions and reviews as this is my first change on the LLVM compiler.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
arsenm added inline comments.Jan 4 2021, 10:39 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/clampi64toi16.ll
18–21 ↗(On Diff #313345)

The numbered %0, %1. They should have fixed string names instead. opt -S -instnamer will do this for you or you can manually %name them

arsenm added inline comments.Jan 4 2021, 10:40 AM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
2

This has too many -s then

285

This should still be trying to match against the min/max opcodes, not looking for compare and selects

pvellien removed a subscriber: pvellien.Jan 4 2021, 8:31 PM
tsymalla updated this revision to Diff 314547.Jan 5 2021, 2:25 AM
tsymalla marked 2 inline comments as done.

Renamed the identifiers in the lit test.

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
2

I reverted this change.

285

So, if I understand this correctly:

  • The select/compare pattern I'm trying to match here will be matched to G_MIN/G_MAX
  • This seems to be not implemented yet or not working for my case (as far as I can see that)

I'm not sure what should be achieved here. Should there either be a "future-proof" implementation for matching min and max which is not working yet, an additional case for matching min and max which falls back to my current implementation or a combination of a) extension of the pattern matching that combines select and icmp to max / min and b) an implementation which only matches max / min and combines it accordingly?
Would you mind explaining this to me a little bit more?

llvm/test/CodeGen/AMDGPU/GlobalISel/clampi64toi16.ll
18–21 ↗(On Diff #313345)

Thanks. I manually named them.

tsymalla marked an inline comment as not done.Jan 5 2021, 2:26 AM
foad added inline comments.Jan 5 2021, 4:34 AM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
15–16

You can use the git-clang-format script to avoid reformatting the whole file. If you put it on your path then you can invoke it as e.g. git clang-format @^ to only reformat the parts of the file that were changed by your most recent commit.

311–319

Use // even for block comments please.

arsenm added inline comments.Jan 5 2021, 7:15 AM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
285

The compare+select pattern you are matching here should be matched separately into the min/max opcodes as a better, more canonical form. This is not implemented in GlobalISel yet.

Matching min/max should be implemented in GlobalISel, however I don't think this is a priority. Theoretically the IR optimizers should be matching these in the vast majority of cases with the new min/max intrinsics. A GlobalISel combine would theoretically be useful if the compare/select pattern were to be emitted by independent legalizations, but I'm currently not aware of any implemented case where this could happen

tsymalla added inline comments.Jan 5 2021, 9:07 AM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
285

So, what would be the right approach to handle this in your opinion? As far as I understand it doesn't make sense to have an implementation for this specific case where a generic implementation would be more suitable. Would you say it makes more sense to wait for the max / min matching in GlobalISel and implement my combine around it?

arsenm added inline comments.Jan 5 2021, 10:10 AM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
285

You don't need to wait. You can just do it with the min/max opcodes now. You can either write MIR tests or change the IR tests to use the intrinsics already

tsymalla marked an inline comment as done.Jan 11 2021, 4:33 AM
tsymalla added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
311–319

Thanks, Jay. I will change this.

tsymalla updated this revision to Diff 315783.Jan 11 2021, 6:13 AM
tsymalla marked an inline comment as done.

This moves the implementation to the PreLegalizer and is based on LLVM max/min intrinsics.

I would expect this in the post legalizer combiner, and going directly from i64 is a bit weird. What does the post-legalized MIR look like for this?

llvm/include/llvm/CodeGen/GlobalISel/MIPatternMatch.h
455 ↗(On Diff #315783)

Remove leftover whitespace change

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
191–192

Unrelated formatting changes

248

Extra whitespace change

llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
62 ↗(On Diff #315783)

Capitalize

74 ↗(On Diff #315783)

Shouldn't construct single use MachineIRBuilder, there should be one in the combiner state

104 ↗(On Diff #315783)

Capitalize

119 ↗(On Diff #315783)

This is just noise, remove it

121 ↗(On Diff #315783)

Don't construct fresh builder

129 ↗(On Diff #315783)

You can just do Unmerge.getReg(0)

134 ↗(On Diff #315783)

It would be better to use a generic operation pre-regbank select rather than forcing register classes here

148–149 ↗(On Diff #315783)

Bad shadowing name

151–153 ↗(On Diff #315783)

You can combine this as auto MinBoundaryDst = B.buildConstant(S32, min)

tsymalla marked 9 inline comments as done.Jan 12 2021, 12:39 AM

Previously this was implemented in the PostLegalizer. The SMAX / SMIN MIR gets lowered to a pattern of G_SELECT / G_ICMP, so the state after PostLegalizing is equivalent to what I've implemented in the tests before matching this to the min/max instructions. Disabling the optimization in the implementation here yields a result similar to the following:

%62:_(s1) = G_ICMP intpred(slt), %53:_(s32), %25:_
%63:_(s64) = G_CONSTANT i64 0
%28:_(s64) = G_SELECT %62:_(s1), %63:_, %61:_
%29:_(s64) = G_CONSTANT i64 32767
%37:_(s1) = G_ICMP intpred(slt), %28:_(s64), %29:_
%30:_(s64) = G_SELECT %37:_(s1), %28:_, %29:_
%31:_(s64) = G_CONSTANT i64 -32768
%36:_(s1) = G_ICMP intpred(sgt), %30:_(s64), %31:_
%32:_(s64) = G_SELECT %36:_(s1), %30:_, %31:_
%34:_(s32) = G_TRUNC %32:_(s64)

So this is why I moved the implementation to the PreLegalize step where the G_SMIN / G_SMAX IR still exists:

%30:_(s64) = G_SMIN %28:_, %29:_
%32:_(s64) = G_SMAX %30:_, %31:_
%33:_(s16) = G_TRUNC %32:_(s64)

I don't exactly understand why the min / max instructions get lowered to this pattern so for the moment, the only way to get this producing the code I wish seems to match the pattern in the PreLegalize step. By the way, the goal here is to check explicitly if an long should get clamped to short boundaries (or inbetween). Instead of producing some comparisons, this should yield these two instructions as I've figured out with Nicolai.

llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
134 ↗(On Diff #315783)

How would I go for this?

tsymalla updated this revision to Diff 316057.Jan 12 2021, 4:05 AM

Various changes according to the code review. I don't know why the formatting changes in the PostLegalizer keep showing up as they are already reverted to HEAD...

Do we do the equivalent combine in the DAG path? I don't recognize this one

llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
71 ↗(On Diff #316057)

Probably shouldn't print here especially when failure is likely. Ideally we would get a message from the generated combiner based on the defined combine name

132 ↗(On Diff #316057)

I'd rather just repeat the VGPR_32RegClass (also shuoldn't use macro-like naming convention)

134 ↗(On Diff #315783)

You can follow the example of the other G_AMDGPU_* pseudos (e.g. G_AMDGPU_CVT_UBYTE0). I'm a bit ambivalent on whether we should do this in this case, as there are still some unanswered questions about how regbankselect should work. However, so far we've been using pseudos like this

llvm/test/CodeGen/AMDGPU/GlobalISel/combine-short-clamp.ll
113 ↗(On Diff #316057)

Fix this whitespace error

llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
137–141 ↗(On Diff #316057)

You can replace this block with B.buildInstr(CvtOpcode, {CvtDst}, {Hi32, Lo32}, MI.getFlags());

201 ↗(On Diff #316057)

Why do you need LegalizerInfo? CombinerHelper::isLegalOrBeforeLegalizer relies on LI being nullptr before legalizer.

tsymalla marked 6 inline comments as done.Jan 13 2021, 6:41 AM

No, I doubt there is support for this in the other path.

llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
71 ↗(On Diff #316057)

I removed this message. There aren't any other debug outputs in the other Legalize combines, too.

137–141 ↗(On Diff #316057)

Thanks. I removed this for the other buildInstr, too.

201 ↗(On Diff #316057)

Thanks. This is not needed here, so I removed it.

134 ↗(On Diff #315783)

Thanks. With some help I was able to get this working and learned something new. This should now use the new pseudo and select it into the corresponding machine instruction.

tsymalla updated this revision to Diff 316380.Jan 13 2021, 6:45 AM
tsymalla marked 4 inline comments as done.

Implemented a GMIR instruction for the V_CVT opcode and did some changes to the implementation according to the code reviews.

arsenm added inline comments.Jan 13 2021, 9:34 AM
llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
149 ↗(On Diff #316380)

A pseudo here would also apply for the same reasons

tsymalla added inline comments.Jan 18 2021, 4:03 AM
llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
149 ↗(On Diff #316380)

I will implement this as soon as I'll get access back to my machine. Is there a reason why in some cases pseudos exist, in some cases not?

tsymalla updated this revision to Diff 317348.Jan 18 2021, 7:06 AM

This adds an additional G_MED3_S32 GIR opcode and uses it in the PreLegalizerCombiner.

tsymalla marked an inline comment as done.Jan 18 2021, 7:07 AM

Added the new MED3 pseudo.

arsenm added inline comments.Jan 20 2021, 1:39 PM
llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
146 ↗(On Diff #317348)

Don't need the register classes. You can also hide the createRegister calls with the builder as well

149 ↗(On Diff #316380)

If we need to change how the operation is selected based on the register banks, it definitely should use a regbankselectable pseudo with generic register operands. If there isn't a meaningful handling change depending on the register banks, there's less reason to use a pseudo.

tsymalla marked 2 inline comments as done.Jan 25 2021, 6:20 AM
tsymalla added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
146 ↗(On Diff #317348)

Thanks. I reworked this aspect of the implementation. Should be a bit more clean now.

149 ↗(On Diff #316380)

Thanks. :)

tsymalla updated this revision to Diff 318987.Jan 25 2021, 6:22 AM
tsymalla marked 2 inline comments as done.

Removes the generic createVirtualRegister invocations and the manual setting of the VGPR32 register class.

arsenm added inline comments.Jan 25 2021, 10:04 AM
llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp
142–143 ↗(On Diff #318987)

You can truncate directly into the destination register, you don't need this copy

llvm/lib/Target/AMDGPU/SIInstructions.td
2596 ↗(On Diff #318987)

No _S32, this is also available for s16 on some targets

tsymalla updated this revision to Diff 319250.Jan 26 2021, 2:29 AM
tsymalla marked 2 inline comments as done.

Renamed the G_MED3 opcode and removed a superfluous copy.

arsenm accepted this revision.Jan 26 2021, 3:21 PM
This revision is now accepted and ready to land.Jan 26 2021, 3:21 PM

Hi, it seems, I am not permitted to commit this patch. Could somebody please do the commit for me? Thanks!

Hi, it seems, I am not permitted to commit this patch. Could somebody please do the commit for me? Thanks!

You have to request commit access:
https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access

Requested access. Thanks.

I've tried applying this but getting some build errors. You also have some remaining trailing whitespace and extra whitespace changes

I've tried applying this but getting some build errors. You also have some remaining trailing whitespace and extra whitespace changes

I will revert those whitespace changes, thanks.
Where did you apply this? Locally? For me, this worked with a recent LLVM version.

I investigated the problem and it seems, there were some includes missing in PostLegalizerCombiner as there were changes in terms of how the instructions are made available to the TU.
I will fix this and upload a new patch.

tsymalla updated this revision to Diff 320405.Mon, Feb 1, 12:44 AM

Reverted whitespace changes, added missing includes.

arsenm accepted this revision.Mon, Feb 1, 4:11 PM
This revision was automatically updated to reflect the committed changes.
eugenis added a subscriber: eugenis.Tue, Feb 2, 2:59 PM

Hi,

This change, or one of the related change has broken the msan bot.
http://lab.llvm.org:8011/#/builders/5/builds/4134

==91167==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x35d38b6 in AMDGPUPreLegalizerCombinerHelper::matchClampI64ToI16(llvm::MachineInstr&, llvm::MachineRegisterInfo&, llvm::MachineFunction&, AMDGPUPreLegalizerCombinerHelper::ClampI64ToI16MatchInfo&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:92:17
    #1 0x35e7a91 in operator() /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/lib/Target/AMDGPU/AMDGPUGenPreLegalizeGICombiner.inc:1227:38
    #2 0x35e7a91 in (anonymous namespace)::AMDGPUGenPreLegalizerCombinerHelper::tryCombineAll(llvm::GISelChangeObserver&, llvm::MachineInstr&, llvm::MachineIRBuilder&, llvm::CombinerHelper&) const /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/lib/Target/AMDGPU/AMDGPUGenPreLegalizeGICombiner.inc:1226:14
    #3 0x35d6aed in (anonymous namespace)::AMDGPUPreLegalizerCombinerInfo::combine(llvm::GISelChangeObserver&, llvm::MachineInstr&, llvm::MachineIRBuilder&) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:197:17
    #4 0xae8a3c4 in llvm::Combiner::combineMachineInstrs(llvm::MachineFunction&, llvm::GISelCSEInfo*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/GlobalISel/Combiner.cpp:150:24
    #5 0x35d629f in (anonymous namespace)::AMDGPUPreLegalizerCombiner::runOnMachineFunction(llvm::MachineFunction&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:268:12
    #6 0x71117ca in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:72:13
    #7 0x7f07777 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1435:27
    #8 0x63360b8 in RunPassOnSCC /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:178:25
    #9 0x63360b8 in RunAllPassesOnSCC /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:476:9
    #10 0x63360b8 in (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:541:18
    #11 0x7f08ee4 in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1550:27
    #12 0x7f08ee4 in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:541:44
    #13 0x27ee02a in compileModule(char**, llvm::LLVMContext&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:684:8
    #14 0x27e7c41 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:385:22
    #15 0x7f1bd530d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #16 0x2767739 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llc+0x2767739)

Also, there is something very wrong with the diff here. It just shows a bunch of merge conflict markers. Please make sure that your commits include the relevant headers like "Differential Revision", etc - it's very hard to track things otherwise.

arsenm added a comment.Tue, Feb 2, 4:02 PM

Hi,

This change, or one of the related change has broken the msan bot.
http://lab.llvm.org:8011/#/builders/5/builds/4134

==91167==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x35d38b6 in AMDGPUPreLegalizerCombinerHelper::matchClampI64ToI16(llvm::MachineInstr&, llvm::MachineRegisterInfo&, llvm::MachineFunction&, AMDGPUPreLegalizerCombinerHelper::ClampI64ToI16MatchInfo&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:92:17
    #1 0x35e7a91 in operator() /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/lib/Target/AMDGPU/AMDGPUGenPreLegalizeGICombiner.inc:1227:38
    #2 0x35e7a91 in (anonymous namespace)::AMDGPUGenPreLegalizerCombinerHelper::tryCombineAll(llvm::GISelChangeObserver&, llvm::MachineInstr&, llvm::MachineIRBuilder&, llvm::CombinerHelper&) const /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/lib/Target/AMDGPU/AMDGPUGenPreLegalizeGICombiner.inc:1226:14
    #3 0x35d6aed in (anonymous namespace)::AMDGPUPreLegalizerCombinerInfo::combine(llvm::GISelChangeObserver&, llvm::MachineInstr&, llvm::MachineIRBuilder&) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:197:17
    #4 0xae8a3c4 in llvm::Combiner::combineMachineInstrs(llvm::MachineFunction&, llvm::GISelCSEInfo*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/GlobalISel/Combiner.cpp:150:24
    #5 0x35d629f in (anonymous namespace)::AMDGPUPreLegalizerCombiner::runOnMachineFunction(llvm::MachineFunction&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUPreLegalizerCombiner.cpp:268:12
    #6 0x71117ca in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:72:13
    #7 0x7f07777 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1435:27
    #8 0x63360b8 in RunPassOnSCC /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:178:25
    #9 0x63360b8 in RunAllPassesOnSCC /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:476:9
    #10 0x63360b8 in (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:541:18
    #11 0x7f08ee4 in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1550:27
    #12 0x7f08ee4 in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:541:44
    #13 0x27ee02a in compileModule(char**, llvm::LLVMContext&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:684:8
    #14 0x27e7c41 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:385:22
    #15 0x7f1bd530d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #16 0x2767739 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llc+0x2767739)

Also, there is something very wrong with the diff here. It just shows a bunch of merge conflict markers. Please make sure that your commits include the relevant headers like "Differential Revision", etc - it's very hard to track things otherwise.

I think that's what D95878 is for

It may take some time until @tsymalla can look at the fix in D95878 again, so I reverted this for the time being.