This is an archive of the discontinued LLVM Phabricator instance.

Differential D135844

[AArch64][2/4]Regard (shl val, N) as a potential bit-field-positioning op regardless of the number of uses.
Abandoned · Public

Authored by mingmingl on Oct 12 2022, 7:52 PM.

Details

Reviewers

dmgreen
efriedma
t.p.northover

Summary

Before this patch (and the refactor D135843)

isBitfieldPositioningOp requires the 'SHL' node to have exactly one use when BiggerPattern is false.

After this patch

A DAG node of the form (shl val, N) no longer has the one-use requirement.

The rationale is that 'val' can be used as the bit-extraction source as long as N (the left-shift amount) fits the BiggerPattern requirement (that no extra shift nodes are created around this line). This removes at least one use of the SHL if a BFI instruction is used.

One existing test case is improved without regressing others. There are no correctness issues, since the BiggerPattern path did not look at the number of uses before this patch either.
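To make the rationale concrete, here is a minimal sketch in plain C++ (not the code in AArch64ISelDAGToDAG.cpp) that models the 32-bit BFI semantics and checks that an OR with (shl val, N) can be expressed as a BFI reading 'val' directly. The names bfi32 and orOfShl are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of AArch64 `bfi wd, wn, #lsb, #width` on 32-bit values:
// copy the low `width` bits of src into dst at bit `lsb`, keep the rest of dst.
uint32_t bfi32(uint32_t dst, uint32_t src, unsigned lsb, unsigned width) {
  assert(width >= 1 && lsb + width <= 32 && "field must fit in 32 bits");
  uint32_t field = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
  uint32_t mask = field << lsb;
  return (dst & ~mask) | ((src << lsb) & mask);
}

// The DAG shape being matched, written out by hand:
// or (and dst, low-N-bits), (shl val, N), for N in [1, 31].
uint32_t orOfShl(uint32_t dst, uint32_t val, unsigned n) {
  assert(n >= 1 && n <= 31 && "shift amount out of the modeled range");
  uint32_t keepLow = (1u << n) - 1u;
  return (dst & keepLow) | (val << n);
}
```

In this model the BFI consumes 'val' itself rather than the shifted value, which is why the SHL node loses one use when the BFI form is selected.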

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mingmingl created this revision. · Oct 12 2022, 7:52 PM

Herald added a project: Restricted Project. · Oct 12 2022, 7:52 PM

Herald added subscribers: hiraditya, kristof.beyls.

mingmingl requested review of this revision. · Oct 12 2022, 7:52 PM

Herald added a project: Restricted Project. · Oct 12 2022, 7:52 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B191881: Diff 467347. · Oct 12 2022, 7:53 PM

mingmingl mentioned this in D135843: [AArch64][NFC][1/4]Refactor 'isBitfieldPositioningOp' so that DAG nodes with different Opcode are handled with separate helper functions. · Oct 13 2022, 12:00 AM

mingmingl retitled this revision from "[AArch64]Regard (shl val, N) as a potential bit-field-positioning op regardless of the number of uses; the rationale is that, 'val' could be used as bit extraction source as long as N (the left shift amount) fits BiggerPattern requirement" to "[AArch64]Regard (shl val, N) as a potential bit-field-positioning op regardless of the number of uses." · Oct 13 2022, 12:10 AM

mingmingl edited the summary of this revision.

mingmingl added a reviewer: dmgreen.

mingmingl retitled this revision from "[AArch64]Regard (shl val, N) as a potential bit-field-positioning op regardless of the number of uses." to "[AArch64][2/4]Regard (shl val, N) as a potential bit-field-positioning op regardless of the number of uses."

mingmingl mentioned this in D135850: [AArch64] Enhance 'isBitfieldPositioningOp' to find pattern (shl(and(val,mask), N). · Oct 13 2022, 12:19 AM

Do you have a better test? One that doesn't get so heavily optimized by opt.

mingmingl added a parent revision: D135843: [AArch64][NFC][1/4]Refactor 'isBitfieldPositioningOp' so that DAG nodes with different Opcode are handled with separate helper functions. · Oct 14 2022, 11:21 AM

mingmingl added a child revision: D135850: [AArch64] Enhance 'isBitfieldPositioningOp' to find pattern (shl(and(val,mask), N).

mingmingl removed a child revision: D135850: [AArch64] Enhance 'isBitfieldPositioningOp' to find pattern (shl(and(val,mask), N). · Oct 15 2022, 11:48 AM

mingmingl mentioned this in rG45cadb4bd36b: [AArch64][NFC]Refactor 'isBitfieldPositioningOp' so that DAG nodes with…. · Oct 17 2022, 8:09 AM

mingmingl mentioned this in rGdb0286a09626: [AArch64]Enhance 'isBitfieldPositioningOp' to find pattern (shl(and(val,mask)…. · Oct 17 2022, 9:02 AM

In D135844#3857874, @dmgreen wrote:

Do you have a better test? One that doesn't get so heavily optimized by opt.

Good question. While constructing a test case, it turned out that changing shl (val, N) to UBFIZ could be counter-productive (regardless of the number of uses), since shl (val, N) itself might be folded into an AArch64 operand2.

Take @test_nouseful_bits as an example: bfxil w9, w0, #0, #8 is not better than orr w9, w0, w8, lsl #8 (with w8 = and w0, 0xff), because the ORR has higher throughput and shorter latency.
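As a rough illustration of the operand2 point, the sketch below models the shifted-register form of ORR and a 32-bit UBFIZ in plain C++. The helper names orrLsl32 and ubfiz32 are hypothetical, and the model says nothing about throughput or latency; it only shows that the left shift rides along with the ORR, so a separate UBFIZ (or LSL) is not needed to position the value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of `orr wd, wn, wm, lsl #k`: the second source operand
// is shifted as part of the same instruction. Modeled for k in [0, 31].
uint32_t orrLsl32(uint32_t wn, uint32_t wm, unsigned k) {
  assert(k <= 31 && "shift amount out of the modeled range");
  return wn | (wm << k);
}

// Hypothetical model of `ubfiz wd, wn, #lsb, #width`: take the low `width`
// bits of wn, place them at bit `lsb`, and zero every other bit.
uint32_t ubfiz32(uint32_t wn, unsigned lsb, unsigned width) {
  assert(width >= 1 && lsb + width <= 32 && "field must fit in 32 bits");
  uint32_t field = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
  return (wn & field) << lsb;
}
```

With these models, orrLsl32(x, y, 8) gives the same value as x | ubfiz32(y, 8, 24), but the first form corresponds to one instruction and the second to two.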

Together with the other motivating test case (https://godbolt.org/z/h96b1sGco for D135102), I am planning to make changes for the cases where ORR with a left shift is better than BFI.

For the affected test case test_nouseful_bits, one BFM and one ORR are generated now (https://godbolt.org/z/a3c68f7dE) with this commit. I am going to abandon this patch (rather than rebase it for other BFI improvements). Thanks for the discussions around BFM/ORR.

Revision Contents

Path · Size

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp · 3 lines

llvm/test/CodeGen/AArch64/bitfield-insert.ll · 5 lines

Diff 467347

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp

@@ static bool isBitfieldPositioningOpFromShl(SelectionDAG *CurDAG, SDValue Op,
                                            int &Width) {
   assert(isShiftedMask_64(NonZeroBits) && "Caller guaranteed");
 
   SDNode *N = Op.getNode();
   uint64_t ShlImm;
   if (!isOpcWithIntImmediate(N, ISD::SHL, ShlImm))
     return false;
 
-  if (!BiggerPattern && !Op.hasOneUse())
-    return false;
-
   EVT VT = N->getValueType(0);
   assert((VT == MVT::i32 || VT == MVT::i64) &&
          "Caller guarantees that type is i32 or i64");
 
   SDValue ShlOp0 = N->getOperand(0);
 
   DstLSB = countTrailingZeros(NonZeroBits);
   Width = countTrailingOnes(NonZeroBits >> DstLSB);
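The last two lines of the hunk derive the destination field from NonZeroBits. A self-contained sketch of that arithmetic follows; trailingZeros, trailingOnes, and isShiftedMask are hand-written stand-ins assumed to behave like LLVM's countTrailingZeros, countTrailingOnes, and isShiftedMask_64 for these inputs, and the Field struct and fieldFromNonZeroBits are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for countTrailingZeros; only defined here for non-zero input.
unsigned trailingZeros(uint64_t v) {
  assert(v != 0 && "no set bit to find");
  unsigned n = 0;
  while ((v & 1u) == 0) { v >>= 1; ++n; }
  return n;
}

// Stand-in for countTrailingOnes.
unsigned trailingOnes(uint64_t v) {
  unsigned n = 0;
  while (v & 1u) { v >>= 1; ++n; }
  return n;
}

// Stand-in for isShiftedMask_64: true for one non-empty contiguous run of ones.
bool isShiftedMask(uint64_t v) {
  if (v == 0) return false;
  uint64_t low = v >> trailingZeros(v);
  return (low & (low + 1)) == 0;
}

struct Field { unsigned dstLSB; unsigned width; };  // hypothetical result type

// Mirrors the DstLSB/Width computation shown in the hunk.
Field fieldFromNonZeroBits(uint64_t nonZeroBits) {
  assert(isShiftedMask(nonZeroBits) && "Caller guaranteed");
  unsigned lsb = trailingZeros(nonZeroBits);
  unsigned width = trailingOnes(nonZeroBits >> lsb);
  return {lsb, width};
}
```

For example, NonZeroBits = 0xFF00 yields a field starting at bit 8 with width 8, which is the kind of (lsb, width) pair a BFI takes as immediates.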

llvm/test/CodeGen/AArch64/bitfield-insert.ll

@@ ; CHECK-NEXT: ret
   ret void
 }
 
 ; Tests when all the bits from one operand are not useful
 define i32 @test_nouseful_bits(i8 %a, i32 %b) {
 ; CHECK-LABEL: test_nouseful_bits:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: and w8, w0, #0xff
+; CHECK-NEXT: bfi w0, w8, #8, #8
 ; CHECK-NEXT: lsl w8, w8, #8
-; CHECK-NEXT: mov w9, w8
-; CHECK-NEXT: bfxil w9, w0, #0, #8
-; CHECK-NEXT: bfi w8, w9, #16, #16
+; CHECK-NEXT: bfi w8, w0, #16, #16
 ; CHECK-NEXT: mov w0, w8
 ; CHECK-NEXT: ret
   %conv = zext i8 %a to i32 ; 0 0 0 A
   %shl = shl i32 %b, 8 ; B2 B1 B0 0
   %or = or i32 %conv, %shl ; B2 B1 B0 A
   %shl.1 = shl i32 %or, 8 ; B1 B0 A 0
   %or.1 = or i32 %conv, %shl.1 ; B1 B0 A A
   %shl.2 = shl i32 %or.1, 8 ; B0 A A 0
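The byte-lane comments on the IR lines can be checked with a small C++ transcription of the six instructions visible in this hunk. This is only a sketch of the visible prefix of @test_nouseful_bits; the rest of the function is not shown here and is not reconstructed. The Steps struct and nousefulBitsPrefix are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// One field per IR value visible in the hunk.
struct Steps { uint32_t conv, shl, or0, shl1, or1, shl2; };

// Transcribes the visible IR; uint32_t shifts wrap like `shl i32`
// does for in-range shift amounts.
Steps nousefulBitsPrefix(uint8_t a, uint32_t b) {
  Steps s{};
  s.conv = a;                 // zext i8 %a to i32      ; 0  0  0  A
  s.shl  = b << 8;            // shl i32 %b, 8          ; B2 B1 B0 0
  s.or0  = s.conv | s.shl;    // or i32 %conv, %shl     ; B2 B1 B0 A
  s.shl1 = s.or0 << 8;        // shl i32 %or, 8         ; B1 B0 A  0
  s.or1  = s.conv | s.shl1;   // or i32 %conv, %shl.1   ; B1 B0 A  A
  s.shl2 = s.or1 << 8;        // shl i32 %or.1, 8       ; B0 A  A  0
  return s;
}
```

With a = 0xAA and b = 0x11223344 (so B2 = 0x22, B1 = 0x33, B0 = 0x44), the last visible value is 0x44AAAA00, matching the "B0 A A 0" comment.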