This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel] Combine (build_vector_trunc x, undef) -> (bitcast x)
AcceptedPublic

Authored by foad on Oct 19 2021, 4:34 AM.

Details

Summary

This mostly benefits AMDGPU where arguments to IMAGE instructions are
sometimes packed into the two 16-bit halves of a 32-bit register.

Diff Detail

Event Timeline

foad created this revision.Oct 19 2021, 4:34 AM
foad requested review of this revision.Oct 19 2021, 4:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2021, 4:34 AM
foad added inline comments.
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-build-vector-trunc.v2s16.mir
157 ↗(On Diff #380646)

I could just remove all these "undef" test cases, since there is no longer any code in AMDGPUInstructionSelector that handles IMPLICIT_DEF specially.

Should add some MIR tests specifically for this in case we decide to stop using G_BUILD_VECTOR_TRUNC someday

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
630–637 ↗(On Diff #380646)

I think deleting this should be split into a separate patch

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-build-vector-trunc.v2s16.mir
157 ↗(On Diff #380646)

Yes

Ravi added a comment.Oct 25 2021, 4:58 AM

Good to see the unnecessary 'shifts' and 'movs' removed. LGTM.

foad updated this revision to Diff 388172.Nov 18 2021, 6:00 AM

Rebase. Add MIR tests.

foad updated this revision to Diff 388174.Nov 18 2021, 6:05 AM

Rebase.

foad marked 2 inline comments as done.Nov 19 2021, 3:25 AM
Ravi accepted this revision.Dec 13 2021, 3:30 AM

LGTM

This revision is now accepted and ready to land.Dec 13 2021, 3:30 AM
foad updated this revision to Diff 393866.Dec 13 2021, 6:02 AM

Rebase.

arsenm added inline comments.Dec 13 2021, 6:13 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/clamp-minmax-const-combine.ll
62–67 ↗(On Diff #393866)

What happened here?

foad added inline comments.Dec 13 2021, 6:16 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/clamp-minmax-const-combine.ll
62 ↗(On Diff #393866)

Regression here for the same reasons as in combine-fcanonicalize.mir.

llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fcanonicalize.mir
256 ↗(On Diff #393866)

Regression here because combining G_BUILD_VECTOR_TRUNC %zero_s32(s32), %undef(s32) into a G_BITCAST breaks detection of a constant splat with some elements undef.

Any thoughts on how to fix this? Perhaps it was not a good idea to do this as a combine after all, despite the comment in AMDGPUInstructionSelector::selectG_BUILD_VECTOR_TRUNC?

arsenm added inline comments.Dec 13 2021, 6:52 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fcanonicalize.mir
256 ↗(On Diff #393866)

I think the globalisel version of isCanonicalized needs to be implemented for G_BITCAST

arsenm added inline comments.Dec 13 2021, 6:54 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fcanonicalize.mir
256 ↗(On Diff #393866)

Or everything really. It's pretty thin right now

foad added inline comments.Dec 14 2021, 6:48 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fcanonicalize.mir
256 ↗(On Diff #393866)

I'm not sure having isCanonicalized look through a bitcast from s32 to <2 x s16> is a great idea. And it wouldn't help with the regression in clamp-minmax-const-combine.ll which is to do with recognizing vector splat constants where some elements may be undef.

I see the DAG version of isCanonicalized looks through ISD::BITCAST, but I don't see how that can be safe when it loses the information about whether you were looking at (say) an f32 or a vector of f16s.

Herald added a project: Restricted Project. · View Herald TranscriptSep 29 2022, 8:05 AM
Herald added a subscriber: kosarev. · View Herald Transcript