RegBankSelect creates zext and trunc when it selects banks for uniform i1.
Add zext_trunc_fold from generic combiner to post RegBankSelect combiner.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This should also apply to conditional branches, but the user doesn't actually matter. This is really a generic combine for (trunc (bool_ext_type bool_producer))
llvm/lib/Target/AMDGPU/AMDGPUCombine.td | ||
---|---|---|
44 | The select part here isn't essential, but the matcher here wants a specific opcode. I guess you could bypass the generated combine matcher and just call this combine in a switch, or at least add the relevant G_BRCOND user | |
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp | ||
90 ↗ | (On Diff #319258) | Changing the instruction without notifying the observer |
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-uniform-icmp-select.mir | ||
37–62 ↗ | (On Diff #319258) | Most of these instructions aren't relevant to the combine. You can also directly emit copies from 64-bit SGPRs even though we emit them normally as separate 32-bit copies |
116–142 ↗ | (On Diff #319258) | Ditto |
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp | ||
---|---|---|
86 ↗ | (On Diff #319258) | Can you call the variable ICmp instead of ICMP? |
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp | ||
---|---|---|
76–77 ↗ | (On Diff #320478) | I don't think this really needs a one use check. Consider the case where another use already exists without the intermediate casts: %x:s32 = G_ICMP |
I don't understand why this needs to be AMDGPU-specific, and why it only works for uniform values, and why it only works inside G_SELECT and G_BRCOND.
Can't you have a generic combine that simplifies (zext (trunc x)) -> x if the types match and the high bits of x are known to be zero?
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp | ||
---|---|---|
76–77 ↗ | (On Diff #320478) | I wanted to keep this as simple as possible and cover most common case (single use). %x:s32 = G_ICMP %y:s1 = G_TRUNC %x ... %z:s32 = G_ZEXT %y %select1 = G_SELECT %z ... %w:s32 = G_ZEXT %y %select0 = G_SELECT %w Maybe we could cover this case for select, are there more? ... %z:s32 = G_ZEXT %y //dead %x:s32 = G_ICMP %select1 = G_SELECT %x %y:s1 = G_TRUNC %x ... %w:s32 = G_ZEXT %y %select0 = G_SELECT %w |
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-move-uniform-icmp-with-one-use.mir | ||
34–35 ↗ | (On Diff #320478) | Combine targets specific case when there are instructions between uniform G_ICMP and G_SELECT/G_BRCOND. Zext and trunc are there but combining them has no effect on having to temporary save scc. We have to move icmp. Also since Zext and trunc have no effect on selected instruction we don't move them and just leave them to be removed as dead instructions. |
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp | ||
---|---|---|
76–77 ↗ | (On Diff #320478) | If the builder is CSEing, you could end up with the same trunc used multiple times |
Handle some cases with many uses. Adding icmp fold without move for the case when we can't move icmp because code looks nicer in the case with more than one use.
llvm/test/CodeGen/AMDGPU/GlobalISel/move-uniform-icmp.ll | ||
---|---|---|
24–26 ↗ | (On Diff #320792) | This looks pretty much same as before without uniform_icmp combine. |
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp | ||
---|---|---|
123–126 ↗ | (On Diff #320792) | I don't think this should be trying to find defs and move them. If we're CSEing, just creating the instruction you need would get the desired result |
Use zext_trunc_fold from generic combiner to separately fold all cases of zext(trunc x) -> x made by regbankselect.
icmp move before select/brcond has to be aware of current state of MF since we run combines top-down and instructions (trunc) can be left without uses (zext was deleted by zext_trunc_fold)
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp | ||
---|---|---|
72 ↗ | (On Diff #322084) | Opcode check first? |
92 ↗ | (On Diff #322084) | You shouldn't need to check dominance (and I don't see how this would ever not be the case) |
103 ↗ | (On Diff #322084) | This isn't changing the operands anymore? (I also think just creating the new instruction with the new operand is cleaner than modifying in place, doing it that way should fix the multiple use case too) |
103–107 ↗ | (On Diff #322084) | This really shouldn't be trying to move instructions. Do you even really need to erase the trunc? If it's dead it will be removed already |
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp | ||
---|---|---|
92 ↗ | (On Diff #322084) | Trunc could have a use before the MI so we cant move icmp and trunc past that use. |
103 ↗ | (On Diff #322084) | zext_trunc_fold changes operands. This now only moves icmp (and trunc for multiple use). |
103–107 ↗ | (On Diff #322084) |
ICmp move breaks ssa in mir, trunc uses icmp before it is defined. I meant to delete trunc because this is the place we broke ssa, and we are aware of it. Leaving it to be deleted by something that eliminates dead instructions should work fine (I don't expect anything else to check where uses of this trunc are defined).
What do you suggest, making new icmp (and trunc) or something else? |
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-move-uniform-icmp.mir | ||
203 ↗ | (On Diff #322084) | Trunc use before select. |
213 ↗ | (On Diff #322084) | After zext_trunc fold this select uses %17 G_ICMP instead of %22 G_ZEXT but we can't move icmp because %11:sgpr(s1) = G_TRUNC %17(s32) uses icmp and %18:sgpr(s32) = G_ANYEXT %11(s1) above uses trunc. |
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp | ||
---|---|---|
103–107 ↗ | (On Diff #322084) | Yes, if you recreate the desired instruction it should automatically CSE as you need by the builder |
Actually, why is this patch necessary? The ZEXT+TRUNC handling takes care of this already?
Oh right, the problem here is actually the intermediate SCC copies produced as a selection artifact.
I'm not sure treating this as a combine is the correct way to go about this. The DAG handles this with a scheduler to minimize physical register liveranges. We might be better treating this off as a scheduling issue for after selection, when we directly see the SCC defs.
For the case I looked at (test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.end.cf.i64.ll) just running zext_trunc_fold as a post-regbankselect combine was enough to get rid of the SCC copies. So perhaps we should commit that first, and then worry about how to handle the remaining cases?
Dropping icmp move for from this patch. Leaving zext_trunc_fold.
Zext is selected into AND with 1. zext_trunc_fold results in getting rid of the SCC copies when zext was the only instruction between icmp and select/branch.
Looks good. I'm not sure we actually need to introduce the helper state class in this patch, do we? But I'll guess we'll need it later.
You don't need AMDGPURegBankCombinerHelper::UniformICmpSelectMatchInfo at all. You can just use Register. See D95645 for an example.