This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Add build_vector_trunc post-legalizer combine
Changes PlannedPublic

Authored by Petar.Avramovic on Sep 3 2021, 9:24 AM.

Details

Summary

Fold two element build_vector_trunc with implicit_def in element 1 into
bitcast of element 0. Result must have same size in bits as the elements.

Diff Detail

Event Timeline

Petar.Avramovic created this revision.Sep 3 2021, 9:24 AM
Petar.Avramovic requested review of this revision.Sep 3 2021, 9:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 3 2021, 9:24 AM

Using merge/unmerge for more elements together with build_vector_combines in this series of patches gives optimal code for add <N x s16> with various vector sizes as far as I can tell (look at tests in add.vNi16.ll.mir).

arsenm added inline comments.Sep 7 2021, 5:17 PM
llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
1198 ↗(On Diff #370605)

Typo Secold

llvm/lib/CodeGen/GlobalISel/Legalizer.cpp
111 ↗(On Diff #370605)

I'm not sure this is really an artifact. You never strictly need it to link 2 operations together, although you can use it to legalize G_BUILD_VECTOR

Petar.Avramovic added inline comments.Sep 8 2021, 4:02 AM
llvm/lib/CodeGen/GlobalISel/Legalizer.cpp
111 ↗(On Diff #370605)

There are already a few artifact combines with G_IMPLICIT_DEF, also build_vector_trunc sounds similar to build_vector so I thought it would be best to deal with it as soon as possible. This way when dealing with <vN x s16> we pretty much get what will be inst-selected during legalizer. G_BUILD_VECTOR_TRUNC will be lowered using bit shifts in regbankselect if it does not get combined before that point.

Maybe this belongs in a combiner pass (amdgpu-postlegalizer)?

arsenm added inline comments.Sep 8 2021, 12:24 PM
llvm/lib/CodeGen/GlobalISel/Legalizer.cpp
111 ↗(On Diff #370605)

Yes, this looks just like an optimization to me. I'm not seeing how this improves legality

Petar.Avramovic retitled this revision from [GlobalISel] Build_vector_trunc artifact combine into bitcast to AMDGPU/GlobalISel: Add build_vector_trunc post-legalizer combine.
Petar.Avramovic edited the summary of this revision. (Show Details)

Move to amdgpu post-legalizer combine.

arsenm added inline comments.Jan 16 2022, 5:48 PM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
328

getDefIgnoringCopies cannot fail, so no point in the null check. Also you can fold this into return of the bool expression

I don't think this is the best place to perform this combine since there are some regressions in regbank-combiner.
Also I expect that it will be covered by D116441 during inst-selection.

ormris removed a subscriber: ormris.Jan 18 2022, 9:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 29 2022, 8:07 AM
Herald added a subscriber: kosarev. · View Herald Transcript

We're currently leaning towards not using legal G_BUILD_VECTOR_TRUNC anymore but this could still be a generic combine

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
317

Should assert on the opcode

328

Still don't need this null check

Petar.Avramovic planned changes to this revision.Sep 30 2022, 2:49 AM

As mentioned before there are some regressions here. I think that it would be best to:
not lower vgpr v2s16 build_vector_trunc in regbank select, and deal with all cases in instruction select
second element is undef -> first elemenent (this patch)
default case -> bit shift packing like in regbankselect