Artifact combiner is not able to access individual elements after using
LCMTy style merge/unmerge, extract and insert to change vector number of
elements (pad with undef or split to sub-vector instructions).
Use unmerge to individual elements instead and then merge elements into
requested types.
Change argument lowering for vectors and moreElementsVector to use
buildPadVectorWithUndefElements and buildDeleteTrailingVectorElements.
FewerElementsVector had a few helpers that had different behaviour,
introduce new helper for most of the opcodes.
FewerElementsVector helper is more flexible since it can create leftover
instruction smaller then requested type (useful in case target wants to
avoid pad with undef and use fewer registers). If target does not want
leftover of different type it should call more elements first.
Some helpers were performing more elements first to have split without
leftover. Opcodes that used this helper use clampMaxNumElementsStrict
(does more elements first) in LegalizerInfo to avoid test changes.
Fixes failures caused by failing to combine artifacts created during
more/fewer elements vector.
Details
Diff Detail
Event Timeline
Thanks, this mostly looks less ugly than before
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
994 | Why this change? I thought this previously handled arbitrary cases, and this doesn't? | |
3633–3639 | I think the prevailing style is to use static functions instead of anonymous namespaces | |
3651 | Not sure what the point of this check is | |
3826 | Might as well hardcode G_PHI | |
6593 | Use the default small size? | |
6870 | I think it's more appropriate to lower the vector operation in terms of G_EXTRACT_VECTOR_ELT and not direct to the register extracts, G_SHUFFLE_VECTOR is not a legalization artifact | |
llvm/lib/CodeGen/GlobalISel/MachineIRBuilder.cpp | ||
231 | 4 seems too small, use the default or 8? | |
253 | Ditto | |
llvm/lib/CodeGen/GlobalISel/Utils.cpp | ||
938 | I'm not sure what to do about scalable vectors, but no legalization has been added for those as far as I know | |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
2248–2251 | Can you split this into a separate patch? I know I have one that does the same thing somewhere | |
2282 | Can you also split this, as it's not dependent on the larger legalizer changes or new builder functions | |
2284 | 2 seems too small, most vectors are probably 4 or 8 | |
4696–4712 | This is a lot longer, is there a simpler way? |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
994 | reduceOperationWidth did number of elements splits and narrow scalar for freeze, here I extracted narrow scalar only. | |
3651 | This is for load/store with vector pointer address. the whole function is annoying check that we don't try to split operands of non-intended Opcode. | |
6870 | Sure, we do unmerge via custom legalize of G_EXTRACT_VECTOR_ELT | |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
4696–4712 | I don't see it. Whole function is complicated, handles many cases, and packs all output operands into ResultRegs. And we no longer pad with undef in all cases. |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
3658 | This line does not seem right? (flagged by coverity by the way) |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
3658 | Thanks, that line was pasted there by mistake. |
Why this change? I thought this previously handled arbitrary cases, and this doesn't?