Fold a two-element build_vector_trunc with implicit_def in element 1 into a
bitcast of element 0. The result must have the same size in bits as each source element.
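The reasoning behind the fold can be sketched at the bit level. This is a toy model, not LLVM code: `V2S16`, `lane`, `trunc16`, and `bitcastElt0` are hypothetical names, and the lane order (element 0 in the low 16 bits) is an assumption. It shows that reinterpreting a 32-bit element 0 as `<2 x s16>` leaves the truncated element 0 in lane 0, while lane 1 holds leftover bits, which is acceptable because element 1 was undef.

```cpp
#include <cassert>
#include <cstdint>

// Sizes used by the toy model: two s16 lanes must fill one s32 element,
// which is the "same size in bits" requirement from the summary.
constexpr unsigned EltBits = 32;
constexpr unsigned LaneBits = 16;
constexpr unsigned NumLanes = 2;
static_assert(NumLanes * LaneBits == EltBits,
              "result vector must be as wide as one source element");

// Hypothetical model of a <2 x s16> value; element 0 is assumed to live
// in the low 16 bits.
struct V2S16 {
  uint32_t Bits;
};

// Read one 16-bit lane out of the packed value.
inline uint16_t lane(V2S16 V, unsigned Idx) {
  assert(Idx < NumLanes && "v2s16 has two lanes");
  return static_cast<uint16_t>(V.Bits >> (LaneBits * Idx));
}

// What build_vector_trunc does to each defined element: keep the low 16 bits.
inline uint16_t trunc16(uint32_t X) { return static_cast<uint16_t>(X); }

// The fold: with element 1 undef, reuse element 0's 32 bits unchanged.
inline V2S16 bitcastElt0(uint32_t Elt0) { return V2S16{Elt0}; }
```

For any input, `lane(bitcastElt0(x), 0)` equals `trunc16(x)`; the model places no requirement on lane 1.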
Details
- Reviewers
foad arsenm mbrkusanin aemerson paquette
Using merge/unmerge for more elements together with build_vector_combines in this series of patches gives optimal code for add <N x s16> with various vector sizes as far as I can tell (look at tests in add.vNi16.ll.mir).
llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h | ||
---|---|---|
1198 ↗ | (On Diff #370605) | Typo Secold |
llvm/lib/CodeGen/GlobalISel/Legalizer.cpp | ||
111 ↗ | (On Diff #370605) | I'm not sure this is really an artifact. You never strictly need it to link 2 operations together, although you can use it to legalize G_BUILD_VECTOR. |
llvm/lib/CodeGen/GlobalISel/Legalizer.cpp | ||
---|---|---|
111 ↗ | (On Diff #370605) | There are already a few artifact combines with G_IMPLICIT_DEF, and build_vector_trunc sounds similar to build_vector, so I thought it would be best to deal with it as soon as possible. This way, when dealing with <N x s16>, the legalizer already produces pretty much what will be instruction-selected. If G_BUILD_VECTOR_TRUNC does not get combined before regbankselect, it is lowered there using bit shifts. Maybe this belongs in a combiner pass (amdgpu-postlegalizer)? |
llvm/lib/CodeGen/GlobalISel/Legalizer.cpp | ||
---|---|---|
111 ↗ | (On Diff #370605) | Yes, this looks just like an optimization to me. I'm not seeing how this improves legality |
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp | ||
---|---|---|
328 | getDefIgnoringCopies cannot fail, so there is no point in the null check. Also, you can fold this into the return of the bool expression. |
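The shape of the suggested simplification can be illustrated with a small stand-alone mock. This is not the patch's code and not the LLVM API: `Instr`, `Opcode`, `defIgnoringCopies`, and `isUndefThroughCopies` are hypothetical stand-ins, assuming a lookup that always yields a defining instruction. With that assumption there is nothing to null-check, and the match collapses to a single returned boolean expression.

```cpp
#include <cassert>

// Hypothetical, minimal instruction model for illustration only.
enum class Opcode { Copy, ImplicitDef, Other };

struct Instr {
  Opcode Op;
  const Instr *Src; // defining instruction of the copied value (Copy only)
};

// Mock stand-in for a "look through copies" helper. It returns a reference,
// so callers have no null case to handle.
inline const Instr &defIgnoringCopies(const Instr &I) {
  const Instr *Cur = &I;
  while (Cur->Op == Opcode::Copy) {
    assert(Cur->Src && "a copy must have a source in this mock");
    Cur = Cur->Src;
  }
  return *Cur;
}

// The match folded into one return statement, without a null check.
inline bool isUndefThroughCopies(const Instr &I) {
  return defIgnoringCopies(I).Op == Opcode::ImplicitDef;
}
```

A chain such as copy -> copy -> implicit_def matches, while any other defining opcode does not.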
I don't think this is the best place to perform this combine since there are some regressions in regbank-combiner.
Also I expect that it will be covered by D116441 during inst-selection.
As mentioned before, there are some regressions here. I think that it would be best not to lower vgpr v2s16 build_vector_trunc in regbankselect, and instead to deal with all cases in instruction selection:
- second element is undef -> first element (this patch)
- default case -> bit shift packing like in regbankselect
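The two cases listed above can be sketched as a single dispatch on plain integers. This is a hedged model rather than selector code: `selectBuildVectorTrunc` and the `Elt` alias are hypothetical, `std::nullopt` stands in for an undef second element, and placing element 0 in the low 16 bits is an assumption about lane order.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// std::nullopt models an undef (implicit_def) element in this sketch.
using Elt = std::optional<uint32_t>;

// Model of selecting a v2s16 build_vector_trunc from two 32-bit elements.
inline uint32_t selectBuildVectorTrunc(uint32_t Elt0, Elt Elt1) {
  // Case 1: second element is undef, so the first element is used as is.
  if (!Elt1)
    return Elt0;
  // Default case: truncate both elements to 16 bits and pack them with a
  // mask and a shift (element 1 goes to the high half).
  return (Elt0 & 0xFFFFu) | (*Elt1 << 16);
}
```

In the undef case the high half keeps whatever bits element 0 had there; in the default case the high bits of both inputs are discarded before packing.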
Should assert on the opcode