This is an archive of the discontinued LLVM Phabricator instance.

Differential D159198

[DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 << c2)
ClosedPublic

Authored by RKSimon on Aug 30 2023, 6:30 AM.

Details

Reviewers

dmgreen
goldstein.w.n
craig.topper

Commits

rGe4d0e1209934: [DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 <<…
rGb027ce0ab930: [DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 <<…

Summary

If the ADD is nsw, its operands can be sign-extended individually, which lets the ADD merge with a SHL op in a fold similar to the existing (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) fold.

This is most useful for helping to expose address math for X86, but it also touches several AArch64 test cases. I think those changes are benign but would like some confirmation.

@craig.topper RISCV uses isDesirableToCommuteWithShift to prevent creating shifted adds with larger offsets - should this be extended to peek through ext nodes as well? If so I'll need to add suitable test coverage. IIRC there's implicit extension on RISCV targets that I'm not very familiar with.

Alive2: https://alive2.llvm.org/ce/z/2UpSbJ
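
The identity behind the fold can also be sanity-checked by brute force at a small bit width. The sketch below is an illustration only, not part of the patch: it models an i8 add feeding a sext to i16 and a shl, and `addOverflowsI8`, `lhs`, `rhs` and `foldHolds` are hypothetical helper names. When `requireNsw` is set, inputs where the i8 add would overflow are skipped, since an overflowing `add nsw` is poison.

```cpp
#include <cstdint>

// True if x + c1 does not fit in i8, i.e. an `add nsw` would be poison.
bool addOverflowsI8(int8_t x, int8_t c1) {
  int sum = int(x) + int(c1);
  return sum < INT8_MIN || sum > INT8_MAX;
}

// (shl (sext (add x, c1)), c2): wrapping i8 add, sext to i16, shift in i16.
uint16_t lhs(int8_t x, int8_t c1, unsigned c2) {
  int8_t sum = static_cast<int8_t>(static_cast<uint8_t>(int(x) + int(c1)));
  uint32_t ext = static_cast<uint16_t>(static_cast<int16_t>(sum));
  return static_cast<uint16_t>(ext << c2);
}

// (add (shl (sext x), c2), (sext c1) << c2), all truncated to i16.
uint16_t rhs(int8_t x, int8_t c1, unsigned c2) {
  uint32_t extX = static_cast<uint16_t>(static_cast<int16_t>(x));
  uint32_t extC = static_cast<uint16_t>(static_cast<int16_t>(c1));
  return static_cast<uint16_t>((extX << c2) + (extC << c2));
}

// Compare both sides for every i8 pair at shift amount c2 (0..15).
bool foldHolds(unsigned c2, bool requireNsw) {
  for (int x = INT8_MIN; x <= INT8_MAX; ++x) {
    for (int c1 = INT8_MIN; c1 <= INT8_MAX; ++c1) {
      int8_t xv = static_cast<int8_t>(x), cv = static_cast<int8_t>(c1);
      if (requireNsw && addOverflowsI8(xv, cv))
        continue; // poison input: the fold may produce anything here
      if (lhs(xv, cv, c2) != rhs(xv, cv, c2))
        return false;
    }
  }
  return true;
}
```

Calling `foldHolds(c2, true)` for each `c2` from 0 to 15 exercises the nsw case, while `foldHolds(0, false)` shows why the flag matters: with x = 127 and c1 = 1 the wrapped i8 sum sign-extends to a different value than the sum of the extended operands.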

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

RKSimon created this revision. Aug 30 2023, 6:30 AM

Herald added a project: Restricted Project. Aug 30 2023, 6:30 AM

Herald added subscribers: luismarques, pengfei, s.egerton and 5 others.

RKSimon requested review of this revision. Aug 30 2023, 6:30 AM

Herald added a subscriber: wangpc.

alive2 proof?

RKSimon edited the summary of this revision. Aug 30 2023, 2:10 PM

The Arm parts seem OK to me

Missing a test with nuw + zext and negative tests (maybe missing flag or nsw + zext / nuw + sext).

pengfei added inline comments. Aug 30 2023, 7:02 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10012–10013	Do we need to consider the possibility of overflow when `c1 << c2`?

goldstein.w.n added inline comments. Aug 30 2023, 7:27 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10012–10013	Shouldn't need to, the proof stands even with

pengfei added inline comments. Aug 30 2023, 10:04 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10012–10013	I'm not sure about the `zext` one if we always put `c1 << c2` in the displacement. The displacement can only be signed int.

OK, I'll just go with sext/nsw for now - I'll add some explicit negative tests, although there are plenty of existing test cases that implicitly do this for us already.

goldstein.w.n added inline comments. Aug 31 2023, 3:09 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10012–10013	Oh that's a good point
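
The displacement concern can be made concrete. An x86-64 memory operand encodes its displacement as a sign-extended 32-bit value, so a folded constant `c1 << c2` only helps address matching if it fits in that range. The sketch below assumes an i32 add extended to i64; `fitsInSImm32`, `foldedConstantSExt` and `foldedConstantZExt` are hypothetical names for illustration, not LLVM APIs.

```cpp
#include <cstdint>

// True if v can be encoded as a sign-extended 32-bit displacement.
bool fitsInSImm32(int64_t v) {
  return v >= INT32_MIN && v <= INT32_MAX;
}

// Folded constant for the sext/nsw form: sext(c1) << c2, computed in i64.
int64_t foldedConstantSExt(int32_t c1, unsigned c2) {
  return static_cast<int64_t>(static_cast<uint64_t>(int64_t{c1}) << c2);
}

// Folded constant for a zext/nuw form: zext(c1) << c2, computed in i64.
int64_t foldedConstantZExt(uint32_t c1, unsigned c2) {
  return static_cast<int64_t>(uint64_t{c1} << c2);
}
```

For the i32 bit pattern 0xFFFFFFFF with c2 = 2, the sext form yields -4, which fits a displacement, while the zext form yields 0x3FFFFFFFC, which does not.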

RKSimon mentioned this in rGf33c64dd5629: [X86] addr-mode-matcher-2.ll - add more sext/zext nsw/nuw permutations. Aug 31 2023, 4:19 AM

rebase - limit to sext/add_nsw - added negative tests

Harbormaster completed remote builds in B255977: Diff 554976. Aug 31 2023, 5:07 AM

In D159198#4630801, @RKSimon wrote:

rebase - limit to sext/add_nsw - added negative tests

Can you make the tests related?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10013	Add TODO for `zext` if `nuw` + `c1 << c2` stays imm32? (or more conservatively same int width class).

In D159198#4632006, @goldstein.w.n wrote:

Can you make the tests related?

I'm sorry, please can you explain what you mean by this?

In D159198#4632035, @RKSimon wrote:

I'm sorry, please can you explain what you mean by this?

Make them related revisions? Or have you already pushed?

Already pushed at rGf33c64dd5629d71ecdf31e6dcd6afc9dbf92c562

Any further comments? Do any of the RISCV gurus have thoughts on whether isDesirableToCommuteWithShift needs adjusting?

In D159198#4633898, @RKSimon wrote:

Any further comments? Do any of the RISCV gurus have thoughts on whether isDesirableToCommuteWithShift needs adjusting?

X86 aspect LGTM.

This revision is now accepted and ready to land. Sep 1 2023, 11:45 AM

Cheers - I'll push this in a couple of days as long as there are no objections from RISCV @craig.topper?

This revision was landed with ongoing or failed builds. Sep 6 2023, 2:07 AM

Closed by commit rGb027ce0ab930: [DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 <<… (authored by RKSimon).

This revision was automatically updated to reflect the committed changes.

RKSimon added a commit: rGb027ce0ab930: [DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 <<….

Sorry, this change breaks llvm-project/llvm/test/Transforms/InferAddressSpaces/AMDGPU/flat_atomic.ll:

https://lab.llvm.org/buildbot/#/builders/249/builds/9078

I will revert this patch to unbreak the tests at HEAD.

gribozavr added a reverting change: rG97bf104d97d6: Revert "[DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2)…. Sep 6 2023, 2:32 AM

RKSimon added a commit: rGe4d0e1209934: [DAG] Fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 <<…. Sep 6 2023, 5:20 AM

Revision Contents

Path                                               Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp      21 lines
llvm/test/CodeGen/AArch64/arm64-shifted-sext.ll    5 lines
llvm/test/CodeGen/AArch64/arm64-trunc-store.ll     8 lines
llvm/test/CodeGen/X86/addr-mode-matcher-2.ll       4 lines

Diff 555989

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ if ((N0.getOpcode() == ISD::ADD || N0.getOpcode() == ISD::OR) &&
     if (SDValue Shl1 =
             DAG.FoldConstantArithmetic(ISD::SHL, SDLoc(N1), VT, {N01, N1})) {
       SDValue Shl0 = DAG.getNode(ISD::SHL, SDLoc(N0), VT, N0.getOperand(0), N1);
       AddToWorklist(Shl0.getNode());
       return DAG.getNode(N0.getOpcode(), SDLoc(N), VT, Shl0, Shl1);
     }
   }

+  // fold (shl (sext (add_nsw x, c1)), c2) -> (add (shl (sext x), c2), c1 << c2)
+  // TODO: Add zext/add_nuw variant with suitable test coverage
+  // TODO: Should we limit this with isLegalAddImmediate?
+  if (N0.getOpcode() == ISD::SIGN_EXTEND &&
+      N0.getOperand(0).getOpcode() == ISD::ADD &&
+      N0.getOperand(0)->getFlags().hasNoSignedWrap() && N0->hasOneUse() &&
+      N0.getOperand(0)->hasOneUse() &&
+      TLI.isDesirableToCommuteWithShift(N, Level)) {
+    SDValue Add = N0.getOperand(0);
+    SDLoc DL(N0);
+    if (SDValue ExtC = DAG.FoldConstantArithmetic(N0.getOpcode(), DL, VT,
+                                                  {Add.getOperand(1)})) {
+      if (SDValue ShlC =
+              DAG.FoldConstantArithmetic(ISD::SHL, DL, VT, {ExtC, N1})) {
+        SDValue ExtX = DAG.getNode(N0.getOpcode(), DL, VT, Add.getOperand(0));
+        SDValue ShlX = DAG.getNode(ISD::SHL, DL, VT, ExtX, N1);
+        return DAG.getNode(ISD::ADD, DL, VT, ShlX, ShlC);
+      }
+    }
+  }
+
   // fold (shl (mul x, c1), c2) -> (mul x, c1 << c2)
   if (N0.getOpcode() == ISD::MUL && N0->hasOneUse()) {
     SDValue N01 = N0.getOperand(1);
     if (SDValue Shl =
             DAG.FoldConstantArithmetic(ISD::SHL, SDLoc(N1), VT, {N01, N1}))
       return DAG.getNode(ISD::MUL, SDLoc(N), VT, N0.getOperand(0), Shl);
   }
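
The shape of the new pattern match can be sketched on a toy expression tree. This is a simplified stand-in, not SelectionDAG: `Node`, `makeNode` and `foldShlOfSExtAdd` are hypothetical, constants are stored already sign-extended to the wide type, and the one-use checks and the `isDesirableToCommuteWithShift` query from the patch are deliberately left out.

```cpp
#include <cstdint>
#include <memory>

enum class Op { Var, Const, Add, SExt, Shl };

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
  Op op;
  int64_t value = 0; // constant payload (Const only), already sign-extended
  bool nsw = false;  // no-signed-wrap flag (Add only)
  NodePtr lhs, rhs;  // operands; rhs is unused for SExt
};

NodePtr makeNode(Op op, NodePtr lhs = nullptr, NodePtr rhs = nullptr,
                 int64_t value = 0, bool nsw = false) {
  auto n = std::make_shared<Node>();
  n->op = op;
  n->lhs = lhs;
  n->rhs = rhs;
  n->value = value;
  n->nsw = nsw;
  return n;
}

// Rewrites (shl (sext (add_nsw x, c1)), c2) to
// (add (shl (sext x), c2), c1 << c2); returns nullptr if there is no match.
NodePtr foldShlOfSExtAdd(const NodePtr &shl) {
  if (!shl || shl->op != Op::Shl || !shl->rhs || shl->rhs->op != Op::Const)
    return nullptr;
  const NodePtr &ext = shl->lhs;
  if (!ext || ext->op != Op::SExt)
    return nullptr;
  const NodePtr &add = ext->lhs;
  if (!add || add->op != Op::Add || !add->nsw)
    return nullptr; // without nsw the sext cannot be distributed
  if (!add->rhs || add->rhs->op != Op::Const)
    return nullptr;

  int64_t c2 = shl->rhs->value;
  if (c2 < 0 || c2 >= 64)
    return nullptr;
  // c1 << c2, shifted as unsigned to avoid signed-shift pitfalls.
  int64_t shiftedC =
      static_cast<int64_t>(static_cast<uint64_t>(add->rhs->value) << c2);

  NodePtr extX = makeNode(Op::SExt, add->lhs);
  NodePtr shlX = makeNode(Op::Shl, extX, shl->rhs);
  return makeNode(Op::Add, shlX,
                  makeNode(Op::Const, nullptr, nullptr, shiftedC));
}
```

Feeding it the tree for `shl (sext (add nsw x, 1)), 4` returns an add whose constant operand is 16, matching the `add x0, x8, #16` seen in the AArch64 test update; the same tree without the nsw flag is left alone.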

llvm/test/CodeGen/AArch64/arm64-shifted-sext.ll

@@ entry:
   %conv = sext i16 %inc to i64
   %shr = ashr i64 %conv, 16
   ret i64 %shr
 }

 define i64 @extendedLeftShiftintToint64By4(i32 %a) nounwind readnone ssp {
 ; CHECK-LABEL: extendedLeftShiftintToint64By4:
 ; CHECK: ; %bb.0: ; %entry
-; CHECK-NEXT: add w8, w0, #1
-; CHECK-NEXT: sbfiz x0, x8, #4, #32
+; CHECK-NEXT: ; kill: def $w0 killed $w0 def $x0
+; CHECK-NEXT: sbfiz x8, x0, #4, #32
+; CHECK-NEXT: add x0, x8, #16
 ; CHECK-NEXT: ret
 entry:
   %inc = add nsw i32 %a, 1
   %conv = sext i32 %inc to i64
   %shl = shl nsw i64 %conv, 4
   ret i64 %shl
 }

llvm/test/CodeGen/AArch64/arm64-trunc-store.ll

 @zptr8 = common global ptr null, align 8
 @zptr16 = common global ptr null, align 8
 @zptr32 = common global ptr null, align 8

 define void @fct32(i32 %arg, i64 %var) {
 ; CHECK-LABEL: fct32:
 ; CHECK: // %bb.0: // %bb
 ; CHECK-NEXT: adrp x8, :got:zptr32
-; CHECK-NEXT: sub w9, w0, #1
 ; CHECK-NEXT: ldr x8, [x8, :got_lo12:zptr32]
 ; CHECK-NEXT: ldr x8, [x8]
-; CHECK-NEXT: str w1, [x8, w9, sxtw #2]
+; CHECK-NEXT: add x8, x8, w0, sxtw #2
+; CHECK-NEXT: stur w1, [x8, #-4]
 ; CHECK-NEXT: ret
 bb:
   %.pre37 = load ptr, ptr @zptr32, align 8
   %dec = add nsw i32 %arg, -1
   %idxprom8 = sext i32 %dec to i64
   %arrayidx9 = getelementptr inbounds i32, ptr %.pre37, i64 %idxprom8
   %tmp = trunc i64 %var to i32
   store i32 %tmp, ptr %arrayidx9, align 4
   ret void
 }

 define void @fct16(i32 %arg, i64 %var) {
 ; CHECK-LABEL: fct16:
 ; CHECK: // %bb.0: // %bb
 ; CHECK-NEXT: adrp x8, :got:zptr16
-; CHECK-NEXT: sub w9, w0, #1
 ; CHECK-NEXT: ldr x8, [x8, :got_lo12:zptr16]
 ; CHECK-NEXT: ldr x8, [x8]
-; CHECK-NEXT: strh w1, [x8, w9, sxtw #1]
+; CHECK-NEXT: add x8, x8, w0, sxtw #1
+; CHECK-NEXT: sturh w1, [x8, #-2]
 ; CHECK-NEXT: ret
 bb:
   %.pre37 = load ptr, ptr @zptr16, align 8
   %dec = add nsw i32 %arg, -1
   %idxprom8 = sext i32 %dec to i64
   %arrayidx9 = getelementptr inbounds i16, ptr %.pre37, i64 %idxprom8
   %tmp = trunc i64 %var to i16
   store i16 %tmp, ptr %arrayidx9, align 4

llvm/test/CodeGen/X86/addr-mode-matcher-2.ll

 ; X64-NEXT: # %bb.3:
 ; X64-NEXT: popq %rax
 ; X64-NEXT: retq
 ; X64-NEXT: .LBB0_1: # %.preheader
 ; X64-NEXT: movl %esi, %eax
 ; X64-NEXT: .p2align 4, 0x90
 ; X64-NEXT: .LBB0_2: # =>This Inner Loop Header: Depth=1
 ; X64-NEXT: cltq
-; X64-NEXT: leaq 4(,%rax,4), %rax
-; X64-NEXT: leaq (%rax,%rax,4), %rdi
+; X64-NEXT: shlq $2, %rax
+; X64-NEXT: leaq 20(%rax,%rax,4), %rdi
 ; X64-NEXT: callq bar@PLT
 ; X64-NEXT: jmp .LBB0_2
   br i1 %0, label %9, label %3

   %4 = phi i32 [ %8, %3 ], [ %1, %2 ]
   %5 = add nsw i32 %4, 1
   %6 = sext i32 %5 to i64
   %7 = getelementptr inbounds %struct.A, ptr null, i64 %6
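
The X86 change above can be checked by modelling the two instruction sequences directly. The sketch below is illustrative only; `lea`, `oldSequence` and `newSequence` are hypothetical helpers that mirror the CHECK lines, with all arithmetic done modulo 2^64 as the hardware does. Both sequences compute 20 * (x + 1), which suggests `%struct.A` is 20 bytes, although that is an inference from the assembly rather than something the excerpt states.

```cpp
#include <cstdint>

// lea disp(base, index, scale): base + index * scale + disp, modulo 2^64.
uint64_t lea(uint64_t base, uint64_t index, uint64_t scale, int64_t disp) {
  return base + index * scale + static_cast<uint64_t>(disp);
}

// Old output: cltq; leaq 4(,%rax,4), %rax; leaq (%rax,%rax,4), %rdi
uint64_t oldSequence(int32_t eax) {
  uint64_t rax = static_cast<uint64_t>(int64_t{eax}); // cltq: sign-extend eax
  rax = lea(0, rax, 4, 4);                            // 4 * x + 4
  return lea(rax, rax, 4, 0);                         // 5 * (4 * x + 4)
}

// New output: cltq; shlq $2, %rax; leaq 20(%rax,%rax,4), %rdi
uint64_t newSequence(int32_t eax) {
  uint64_t rax = static_cast<uint64_t>(int64_t{eax}); // cltq: sign-extend eax
  rax <<= 2;                                          // 4 * x
  return lea(rax, rax, 4, 20);                        // 5 * (4 * x) + 20
}
```

The instruction count is unchanged, but the constant now sits in the final LEA's displacement (20 = 5 * (1 << 2)) instead of being added before the multiply.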