This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GlobalISel] Code quality: don't expand G_BUILD_VECTOR_TRUNC if not neccessary
AcceptedPublic

Authored by matejam on Jan 27 2022, 4:55 AM.

Details

Summary

Instead of transforming G_BUILD_VECTOR_TRUNC into G_OR + G_AND + G_SHL, transform it into G_BITCAST or just replace the registers,
if the operand of build_vector_trunc is undef or a direct product of the other operand.

Diff Detail

Event Timeline

matejam created this revision.Jan 27 2022, 4:55 AM
matejam requested review of this revision.Jan 27 2022, 4:55 AM
matejam updated this revision to Diff 403606.Jan 27 2022, 4:59 AM

Minor change in the patch format.

foad added a comment.Jan 27 2022, 5:03 AM

Can you add a combine-or-and-shl.mir to show exactly what transforms this is doing?

foad added a comment.Jan 27 2022, 5:09 AM

I think it would be better to do this by using a selection of existing combines if possible. For example binop_left_undef_to_zero + undef_to_int_zero + right_identity_zero should do most of this.

matejam updated this revision to Diff 403615.Jan 27 2022, 5:17 AM

Added .mir test.

arsenm added inline comments.Jan 27 2022, 7:04 AM
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
334–335 ↗(On Diff #403615)

This combine feels way too targeted. We shouldn't have to have combines that look for undefs in the operands of other things. Those operations on undef should have folded out on their own

345–346 ↗(On Diff #403615)

Shouldn't hardcode these specific values, should compute based on the type bitwidth

351 ↗(On Diff #403615)

No else after return

355 ↗(On Diff #403615)

No else after return

362 ↗(On Diff #403615)

Don't need this check

matejam updated this revision to Diff 405924.Feb 4 2022, 4:38 AM
matejam retitled this revision from [AMDGPU][GlobalISel] Code quality: remove unnecessary or, and, shift instructions to [AMDGPU][GlobalISel] Code quality: don't expand G_BUILD_VECTOR_TRUNC if not neccessary.
matejam edited the summary of this revision. (Show Details)

Instead of using a combiner in the legalizer, change the implementation of apply mapping for G_BUILD_VECTOR_TRUNC in regbankselect.

matejam updated this revision to Diff 405948.Feb 4 2022, 6:43 AM

Narrow capture lists in lambdas.

arsenm added inline comments.Feb 4 2022, 7:22 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
2612

I think we should have combined out any implicit def inputs into something else. Trying to optimize as part of a lowering expansion is generally a last resort strategy

matejam updated this revision to Diff 410553.Feb 22 2022, 8:57 AM
matejam edited the summary of this revision. (Show Details)

Instead of a combiner in regbankselect pass, add a combiner in amdgpu-postlegalizer-combiner pass.

matejam updated this revision to Diff 410556.Feb 22 2022, 9:00 AM

Code formatting.

arsenm added inline comments.Feb 22 2022, 2:09 PM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
310–312 ↗(On Diff #410556)

What's the point of this change? I don't think this is really correct since a bitcast from <2 x s16> to s32 won't have a consistent canonicalized interpretation

356 ↗(On Diff #410556)

No else after return

405 ↗(On Diff #410556)

Missing return value

matejam added inline comments.Feb 24 2022, 3:12 AM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
310–312 ↗(On Diff #410556)

You're right, I'll remove that.

356 ↗(On Diff #410556)

Thanks!

405 ↗(On Diff #410556)

Most of the apply functions are void and don't have the return statement.

matejam updated this revision to Diff 411087.Feb 24 2022, 5:06 AM

Minor changes.

arsenm added inline comments.Feb 24 2022, 7:47 AM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
322 ↗(On Diff #411087)

Why use getDefIgnoringCopies instead of directly using mi_match?

386–396 ↗(On Diff #411087)

Special casing the use doesn't feel right. If the type doesn't match, insert a new cast and rely on bitcast folding?

arsenm accepted this revision.Feb 24 2022, 7:47 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
386–396 ↗(On Diff #411087)

I guess you did track that from before, so it doesn't really matter

This revision is now accepted and ready to land.Feb 24 2022, 7:47 AM
matejam added inline comments.Feb 24 2022, 8:21 AM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
322 ↗(On Diff #411087)

In case we have a copy instruction as a second operand of G_LSHR. mi_match would return false, this way we don't have to worry about that.

arsenm added inline comments.Feb 24 2022, 8:24 AM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
322 ↗(On Diff #411087)

Extra copies should be separately folded out by copy combines

matejam marked an inline comment as not done.Feb 24 2022, 8:25 AM
matejam added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
322 ↗(On Diff #411087)

No, you were right, no need for getDefIgnoringCopies.

matejam updated this revision to Diff 411141.Feb 24 2022, 8:31 AM

Typos in tests and a few minor changes.

arsenm accepted this revision.Feb 24 2022, 8:35 AM
foad added inline comments.Feb 24 2022, 9:07 AM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
334 ↗(On Diff #411141)

It's a bit strange to define a lambda that is only used once.

344–347 ↗(On Diff #411141)

What is this part for? It only seems to affect the specially constructed test cases in combine-or-and-shl.mir. Does it ever help with real code?

matejam added inline comments.Feb 25 2022, 9:33 AM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
334 ↗(On Diff #411141)

Thanks, I'll remove the lambda.

344–347 ↗(On Diff #411141)

You're right, I'll remove that.

matejam updated this revision to Diff 411445.Feb 25 2022, 9:57 AM

Remove a lambda that is used only once and a part of the code that covers only a specially constructed test.

Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2022, 1:36 PM
Herald added a subscriber: kosarev. · View Herald Transcript