This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][2/4]Regard (shl val, N) as a potential bit-field-positioning op regardless of the number of uses.
Abandoned · Public

Authored by mingmingl on Oct 12 2022, 7:52 PM.

Details

Summary

Before this patch (and the D135843 refactor):

  • isBitfieldPositioningOp requires the SHL node to have exactly one use in the non-BiggerPattern case.

After this patch:

  • A DAG node of the form (shl val, N) no longer has the one-use requirement.

The rationale is that 'val' can be used as the bit-extraction source as long as N (the left-shift amount) meets the BiggerPattern requirement (that no extra shift nodes are created). If a BFI instruction is selected, this removes at least one use of the SHL.
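To make "bit-field-positioning op" concrete, here is a minimal C++ sketch assuming simplified 32-bit models of the AArch64 UBFIZ and BFI instructions. The helper names (lowMask, ubfiz, bfi, shlIsPositioningOp) are hypothetical and are not LLVM APIs. The sketch checks at compile time that (shl val, N) equals UBFIZ val, #N, #(32 - N), and that OR-ing (shl val, 8) into a value whose bits [8, 32) are zero is a BFI that reads val directly.

```cpp
#include <cstdint>

// Low `width` bits set, for width in [0, 32]. Computed in 64 bits so that
// width == 32 does not shift a 32-bit value by 32 (undefined behavior).
constexpr uint32_t lowMask(unsigned width) {
  return static_cast<uint32_t>((uint64_t{1} << width) - 1);
}

// Model of UBFIZ Wd, Wn, #lsb, #width: take the low `width` bits of `src`,
// zero-extend them, and place them starting at bit `lsb`.
constexpr uint32_t ubfiz(uint32_t src, unsigned lsb, unsigned width) {
  return (src & lowMask(width)) << lsb;
}

// Model of BFI Wd, Wn, #lsb, #width: overwrite bits [lsb, lsb + width) of
// `dst` with the low `width` bits of `src`; all other bits of `dst` survive.
constexpr uint32_t bfi(uint32_t dst, uint32_t src, unsigned lsb,
                       unsigned width) {
  return (dst & ~(lowMask(width) << lsb)) | ((src & lowMask(width)) << lsb);
}

// For 1 <= N <= 31, (shl val, N) positions the low 32 - N bits of `val` at
// bit N, i.e. it is the same value as UBFIZ val, #N, #(32 - N).
constexpr bool shlIsPositioningOp(uint32_t val) {
  for (unsigned n = 1; n < 32; ++n)
    if ((val << n) != ubfiz(val, n, 32 - n)) return false;
  return true;
}

static_assert(shlIsPositioningOp(0xDEADBEEFu), "shl == ubfiz");

// When bits [8, 32) of `dst` are known zero, (or dst, (shl val, 8)) is a BFI
// whose source operand is `val` itself, so this use does not need the SHL.
static_assert(bfi(0x000000FFu, 0xDEADBEEFu, 8, 24) ==
                  (0x000000FFu | (0xDEADBEEFu << 8)),
              "or-with-shl == bfi");
```

Because the BFI reads val rather than the shifted value, selecting it leaves the SHL node with one fewer user, which is the effect the summary describes.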

One existing test case improves and no others regress. There are no correctness issues either, since BiggerPattern did not check the number of uses even before this patch.


Event Timeline

mingmingl created this revision. · Oct 12 2022, 7:52 PM
Herald added a project: Restricted Project. · Oct 12 2022, 7:52 PM
mingmingl requested review of this revision. · Oct 12 2022, 7:52 PM
Herald added a project: Restricted Project. · Oct 12 2022, 7:52 PM
mingmingl retitled this revision from [AArch64]Regard (shl val, N) as a potential bit-field-positioning op regardless of the number of uses; the rationale is that, 'val' could be used as bit extraction source as long as N (the left shift amount) fits BiggerPattern requirement to [AArch64]Regard (shl val, N) as a potential bit-field-positioning op regardless of the number of uses. · Oct 13 2022, 12:10 AM
mingmingl edited the summary of this revision.
mingmingl added a reviewer: dmgreen.
mingmingl retitled this revision from [AArch64]Regard (shl val, N) as a potential bit-field-positioning op regardless of the number of uses. to [AArch64][2/4]Regard (shl val, N) as a potential bit-field-positioning op regardless of the number of uses.

Do you have a better test? One that doesn't get so heavily optimized by opt.

mingmingl planned changes to this revision. · Oct 18 2022, 11:08 AM

Do you have a better test? One that doesn't get so heavily optimized by opt.

Good question. While constructing a test case, it turned out that turning (shl val, N) into UBFIZ can be counter-productive (regardless of the number of uses), since (shl val, N) itself may be folded into an AArch64 shifted-register operand (Operand2) of the instruction that uses it.

Take @test_nouseful_bits as an example: bfxil w9, w0, #0, #8 is no better than orr w9, w0, w8, lsl #8 (with w8 = and w0, 0xff), since the ORR has higher throughput and shorter latency.
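The trade-off above can be checked with a small model. This is a minimal C++ sketch assuming simplified 32-bit semantics for BFXIL and for ORR with an LSL shifted-register operand; the helper names and the input pattern ((a & 0xff) | (b << 8)) are hypothetical and are not the exact code generated for @test_nouseful_bits. It shows that when the destination's low byte is known zero, BFXIL computes the same value as AND followed by ORR with LSL #8, and that in the ORR form the shift is part of the operand rather than a separate node.

```cpp
#include <cstdint>

// Low `width` bits set, for width in [0, 32].
constexpr uint32_t lowMask(unsigned width) {
  return static_cast<uint32_t>((uint64_t{1} << width) - 1);
}

// Model of BFXIL Wd, Wn, #lsb, #width: copy bits [lsb, lsb + width) of `src`
// into the low `width` bits of `dst`; the upper bits of `dst` are kept.
constexpr uint32_t bfxil(uint32_t dst, uint32_t src, unsigned lsb,
                         unsigned width) {
  return (dst & ~lowMask(width)) | ((src >> lsb) & lowMask(width));
}

// Model of ORR Wd, Wn, Wm, LSL #shift: the left shift belongs to the ORR's
// shifted-register operand, so no separate shift instruction is needed.
constexpr uint32_t orrLsl(uint32_t n, uint32_t m, unsigned shift) {
  return n | (m << shift);
}

// Hypothetical pattern: zero-extended low byte of `a` ORed with `b << 8`.
constexpr uint32_t viaBfxil(uint32_t a, uint32_t b) {
  // The BFXIL destination must already hold (shl b, 8) in a register.
  return bfxil(b << 8, a, 0, 8);
}
constexpr uint32_t viaAndOrr(uint32_t a, uint32_t b) {
  // AND, then ORR with the shift folded into the second operand.
  return orrLsl(a & 0xFFu, b, 8);
}

static_assert(viaBfxil(0x12345678u, 0xCAFEF00Du) ==
                  viaAndOrr(0x12345678u, 0xCAFEF00Du),
              "both sequences compute the same value");
```

Since both sequences produce the same value, the choice between them comes down to instruction cost, which is why folding the shift into ORR can beat forming a bitfield instruction.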

Together with the other motivating test case (https://godbolt.org/z/h96b1sGco for D135102), I plan to make changes so that ORR with a left shift is selected when it is better than BFI.

mingmingl abandoned this revision. · Nov 8 2022, 10:54 PM

For the affected test case, test_nouseful_bits, one BFM and one ORR are now generated with this commit (https://godbolt.org/z/a3c68f7dE). I am abandoning this patch rather than rebasing it for the other BFI improvements. Thanks for the discussions around BFM/ORR.