Page MenuHomePhabricator

AMDGPU/GlobalISel: Add build_vector_trunc post-legalizer combine
Changes PlannedPublic

Authored by Petar.Avramovic on Sep 3 2021, 9:24 AM.



Fold two element build_vector_trunc with implicit_def in element 1 into
bitcast of element 0. Result must have same size in bits as the elements.

Diff Detail

Unit TestsFailed

1,960 msx64 debian > LLVM.CodeGen/AMDGPU::llvm.amdgcn.image.sample.g16.a16.dim.ll
Script: -- : 'RUN: at line 2'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.g16.a16.dim.ll | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck -check-prefixes=GFX10 /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.g16.a16.dim.ll
60,130 msx64 debian > MLIR.Examples/standalone::test.toy
Script: -- : 'RUN: at line 1'; /usr/bin/cmake /var/lib/buildkite-agent/builds/llvm-project/mlir/examples/standalone -G "Ninja" -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_C_COMPILER=/usr/bin/clang -DLLVM_ENABLE_LIBCXX=OFF -DMLIR_DIR=/var/lib/buildkite-agent/builds/llvm-project/build/lib/cmake/mlir -DLLVM_USE_LINKER=lld -DPython3_EXECUTABLE="/usr/bin/python3.9"

Event Timeline

Petar.Avramovic created this revision.Sep 3 2021, 9:24 AM
Petar.Avramovic requested review of this revision.Sep 3 2021, 9:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 3 2021, 9:24 AM

Using merge/unmerge for more elements together with build_vector_combines in this series of patches gives optimal code for add <N x s16> with various vector sizes as far as I can tell (look at tests in add.vNi16.ll.mir).

arsenm added inline comments.Sep 7 2021, 5:17 PM
1198 ↗(On Diff #370605)

Typo Secold

111 ↗(On Diff #370605)

I'm not sure this is really an artifact. You never strictly need it to link 2 operations together, although you can use it to legalize G_BUILD_VECTOR

Petar.Avramovic added inline comments.Sep 8 2021, 4:02 AM
111 ↗(On Diff #370605)

There are already a few artifact combines with G_IMPLICIT_DEF, also build_vector_trunc sounds similar to build_vector so I thought it would be best to deal with it as soon as possible. This way when dealing with <vN x s16> we pretty much get what will be inst-selected during legalizer. G_BUILD_VECTOR_TRUNC will be lowered using bit shifts in regbankselect if it does not get combined before that point.

Maybe this belongs in a combiner pass (amdgpu-postlegalizer)?

arsenm added inline comments.Sep 8 2021, 12:24 PM
111 ↗(On Diff #370605)

Yes, this looks just like an optimization to me. I'm not seeing how this improves legality

Petar.Avramovic retitled this revision from [GlobalISel] Build_vector_trunc artifact combine into bitcast to AMDGPU/GlobalISel: Add build_vector_trunc post-legalizer combine.
Petar.Avramovic edited the summary of this revision. (Show Details)

Move to amdgpu post-legalizer combine.

arsenm added inline comments.Jan 16 2022, 5:48 PM

getDefIgnoringCopies cannot fail, so no point in the null check. Also you can fold this into return of the bool expression

I don't think this is the best place to perform this combine since there are some regressions in regbank-combiner.
Also I expect that it will be covered by D116441 during inst-selection.

ormris removed a subscriber: ormris.Jan 18 2022, 9:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 29 2022, 8:07 AM
Herald added a subscriber: kosarev. · View Herald Transcript

We're currently leaning towards not using legal G_BUILD_VECTOR_TRUNC anymore but this could still be a generic combine


Should assert on the opcode


Still don't need this null check

Petar.Avramovic planned changes to this revision.Sep 30 2022, 2:49 AM

As mentioned before there are some regressions here. I think that it would be best to:
not lower vgpr v2s16 build_vector_trunc in regbank select, and deal with all cases in instruction select
second element is undef -> first elemenent (this patch)
default case -> bit shift packing like in regbankselect