Instead of transforming G_BUILD_VECTOR_TRUNC into G_OR + G_AND + G_SHL, transform it into G_BITCAST or just replace the registers,
if the operand of build_vector_trunc is undef or a direct product of the other operand.
Details
- Reviewers
foad arsenm mbrkusanin Pierre-vh
Diff Detail
Event Timeline
I think it would be better to do this by using a selection of existing combines if possible. For example binop_left_undef_to_zero + undef_to_int_zero + right_identity_zero should do most of this.
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp | ||
---|---|---|
334–335 ↗ | (On Diff #403615) | This combine feels way too targeted. We shouldn't have to have combines that look for undefs in the operands of other things. Those operations on undef should have folded out on their own |
345–346 ↗ | (On Diff #403615) | Shouldn't hardcode these specific values, should compute based on the type bitwidth |
351 ↗ | (On Diff #403615) | No else after return |
355 ↗ | (On Diff #403615) | No else after return |
362 ↗ | (On Diff #403615) | Don't need this check |
Instead of using a combiner in the legalizer, change the implementation of apply mapping for G_BUILD_VECTOR_TRUNC in regbankselect.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
2612 | I think we should have combined out any implicit def inputs into something else. Trying to optimize as part of a lowering expansion is generally a last resort strategy |
Instead of a combiner in regbankselect pass, add a combiner in amdgpu-postlegalizer-combiner pass.
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp | ||
---|---|---|
310–312 ↗ | (On Diff #410556) | What's the point of this change? I don't think this is really correct since a bitcast from <2 x s16> to s32 won't have a consistent canonicalized interpretation |
356 ↗ | (On Diff #410556) | No else after return |
405 ↗ | (On Diff #410556) | Missing return value |
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp | ||
---|---|---|
386–396 ↗ | (On Diff #411087) | I guess you did track that from before, so it doesn't really matter |
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp | ||
---|---|---|
322 ↗ | (On Diff #411087) | In case we have a copy instruction as a second operand of G_LSHR. mi_match would return false, this way we don't have to worry about that. |
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp | ||
---|---|---|
322 ↗ | (On Diff #411087) | Extra copies should be separately folded out by copy combines |
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp | ||
---|---|---|
322 ↗ | (On Diff #411087) | No, you were right, no need for getDefIgnoringCopies. |
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp | ||
---|---|---|
334 ↗ | (On Diff #411141) | It's a bit strange to define a lambda that is only used once. |
344–347 ↗ | (On Diff #411141) | What is this part for? It only seems to affect the specially constructed test cases in combine-or-and-shl.mir. Does it ever help with real code? |
Remove a lambda that is used only once and a part of the code that covers only a specially constructed test.
I think we should have combined out any implicit def inputs into something else. Trying to optimize as part of a lowering expansion is generally a last resort strategy