
Details

Reviewers

spatel
RKSimon
niravd
efriedma
javed.absar
john.brawn
fpichet

Commits

rGdd8cd6d26b17: [DAGCombine] Fix ReduceLoadWidth for shifted offsets
rG597811e7a754: [DAGCombiner] Reduce load widths of shifted masks
rL351310: [DAGCombine] Fix ReduceLoadWidth for shifted offsets
rL340261: [DAGCombiner] Reduce load widths of shifted masks

Summary

During combining, ReduceLoadWidth is used to combine AND nodes that mask loads into narrow loads. This patch allows the mask to be a shifted constant. The result is a narrow load, which is then shifted left to compensate for the new offset.
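The equivalence the summary relies on can be checked outside the compiler. The following is a minimal sketch, not code from the patch: the function names are hypothetical, the mask 0x0000ff00 is taken from the example later in this review, and the byte offset is chosen from the host's byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reference form: load the whole 32-bit word, then apply the shifted mask.
std::uint32_t maskedWideLoad(const unsigned char *p) {
  assert(p != nullptr);
  std::uint32_t word;
  std::memcpy(&word, p, sizeof(word)); // full-width load in host byte order
  return word & 0x0000ff00u;
}

// Narrowed form: load only the byte the mask selects, zero-extend it,
// then shift it left by 8 to put it back at bits 8..15.
std::uint32_t narrowLoadThenShift(const unsigned char *p) {
  assert(p != nullptr);
  // Detect host byte order by looking at the first byte of the value 1.
  const std::uint32_t probe = 1;
  unsigned char first;
  std::memcpy(&first, &probe, 1);
  const bool littleEndian = (first == 1);
  // Bits 8..15 live at byte 1 on little-endian and at byte 2 on big-endian.
  const unsigned offset = littleEndian ? 1u : 2u;
  return static_cast<std::uint32_t>(p[offset]) << 8;
}
```

Both functions should return the same value for any four-byte buffer; the narrowed form simply reads fewer bytes from memory.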

Diff Detail

Event Timeline

samparker created this revision. Aug 8 2018, 3:43 AM

Herald added a reviewer: javed.absar. Aug 8 2018, 3:43 AM

RKSimon added inline comments. Aug 8 2018, 8:10 AM

test/CodeGen/ARM/and-load-combine.ll
44	I'd recommend reverting these changes from diff - you can avoid the problem by moving all the arguments to the same line

Moved arguments to occupy a single line.

samparker added a reviewer: john.brawn. Aug 14 2018, 1:20 AM

dnsampaio added a subscriber: dnsampaio. Aug 14 2018, 2:13 AM

dnsampaio added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9094	ActiveBits can be zero; this might throw an error, no? `and (load i32, 0x000FF000)` This is a shifted mask. You could use APInt.countPopulation to get the number of 1s.
9094	Ignore. You do an lshr.
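For readers following the mask arithmetic in this exchange, here is a small sketch on plain 32-bit integers. The helpers are hypothetical stand-ins written for illustration and are not LLVM's APInt methods; they show why shifting the mask right by its trailing zeros first leaves a non-zero count of active bits for a mask such as 0x000FF000.

```cpp
#include <cassert>
#include <cstdint>

// True if m is one contiguous run of ones, e.g. 0x000FF000.
bool isShiftedMask(std::uint32_t m) {
  if (m == 0)
    return false;
  std::uint32_t filled = m | (m - 1);  // fill the trailing zeros with ones
  return (filled & (filled + 1)) == 0; // filled must now look like 0..01..1
}

// Number of zero bits below the lowest set bit.
unsigned countTrailingZeros(std::uint32_t m) {
  assert(m != 0);
  unsigned n = 0;
  while ((m & 1u) == 0) {
    m >>= 1;
    ++n;
  }
  return n;
}

// Position of the highest set bit plus one (0 for m == 0).
unsigned activeBits(std::uint32_t m) {
  unsigned n = 0;
  while (m != 0) {
    m >>= 1;
    ++n;
  }
  return n;
}

// Width in bits of the field a shifted mask selects.
unsigned narrowWidthBits(std::uint32_t mask) {
  assert(isShiftedMask(mask));
  return activeBits(mask >> countTrailingZeros(mask));
}
```

For 0x000FF000 this gives 12 trailing zeros and a width of 8 bits, which is also the mask's population count.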

john.brawn added inline comments. Aug 14 2018, 4:36 AM

test/CodeGen/ARM/and-load-combine.ll
7	It's not clear to me what the purpose of these test changes is (or the previous version). If I revert these changes then the test still passes.
test/CodeGen/X86/fp128-i128.ll
44 ↗	(On Diff #159674)	r338821 made changes to this test which means the patch fails on this file.

Rebased and updated changes to the x86 codegen tests.

test/CodeGen/ARM/and-load-combine.ll
7	It's just because I've used the update_llc script and the format was different. Moving the arguments onto a single line removed the unnecessary diffs produced by the script.

john.brawn requested changes to this revision. Aug 15 2018, 7:42 AM

john.brawn added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9232	On big-endian targets ShAmt has been adjusted by the time we get here, in which case the shifts we do here are wrong, e.g. many of the tests you've added are checking that in big-endian the load is eliminated which is not what should be happening.

This revision now requires changes to proceed. Aug 15 2018, 7:42 AM

samparker added inline comments. Aug 16 2018, 3:09 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9232	Ah! Thanks, I'm sure BE always trips me up here.

Re-adjusted ShAmt for big endian targets.
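The big-endian issue can be sketched numerically. This is an illustration under stated assumptions, not the code in DAGCombiner.cpp: the helper names are hypothetical, and the adjustment is assumed to mirror a bit offset as wide width minus narrow width minus shift. With that form, applying the adjustment a second time returns the original shift, which is the value needed to move the loaded bits back into place.

```cpp
#include <cassert>

// Assumed form of the big-endian adjustment: mirror the bit offset of the
// narrow field inside the wide value.
unsigned adjustBigEndianShift(unsigned wideBits, unsigned narrowBits,
                              unsigned shAmt) {
  assert(narrowBits + shAmt <= wideBits);
  return wideBits - narrowBits - shAmt;
}

// Byte offset of the narrow load relative to the original wide load.
unsigned narrowLoadByteOffset(unsigned wideBits, unsigned narrowBits,
                              unsigned shAmt, bool bigEndian) {
  unsigned bitOffset =
      bigEndian ? adjustBigEndianShift(wideBits, narrowBits, shAmt) : shAmt;
  assert(bitOffset % 8 == 0);
  return bitOffset / 8;
}

// Shift that puts the loaded bits back at their position in the register.
// It does not depend on byte order, so the adjustment used for the offset
// has to be undone before the shift is emitted.
unsigned compensatingShift(unsigned wideBits, unsigned narrowBits,
                           unsigned shAmt, bool bigEndian) {
  unsigned usedForOffset =
      bigEndian ? adjustBigEndianShift(wideBits, narrowBits, shAmt) : shAmt;
  return bigEndian
             ? adjustBigEndianShift(wideBits, narrowBits, usedForOffset)
             : usedForOffset;
}
```

For a 32-bit load masked with 0x0000ff00 (narrow width 8, shift 8), the byte offset comes out as 1 on little-endian and 2 on big-endian, while the compensating shift is 8 in both cases.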

LGTM.

This revision is now accepted and ready to land. Aug 20 2018, 9:51 AM

Closed by commit rL340261: [DAGCombiner] Reduce load widths of shifted masks (authored by sam_parker). Aug 21 2018, 3:27 AM

This revision was automatically updated to reflect the committed changes.

I recently resynced an out-of-tree backend and got a miscompile because of this commit.
My target is big-endian.
The C code is:

void swap(unsigned *ptr) {
  *ptr = (*ptr & 0x0000ff00 ) << 8;		
}

The IR is

%0 = load i32, i32* %ptr, align 4
%and = and i32 %0, 65280
%shl = shl i32 %and, 8
store i32 %shl, i32* %ptr, align 4

The resulting DAG->dump() after the function ReduceLoadWidth returned will be:

t5: i32,ch = load<(load 4 from %ir.ptr)> t0, t2, undef:i32, main.c:8:4
t6: i32 = Constant<65280>
  t12: i32 = add nuw t2, Constant:i32<2>, main.c:8:4
t13: i32,ch = load<(load 1 from %ir.ptr + 2, align 2), zext from i8> t0, t12, undef:i32, main.c:8:4
    t7: i32 = and t13, Constant:i32<255>, main.c:8:4
  t9: i32 = shl t7, Constant:i32<8>, main.c:8:4
t10: ch = store<(store 4 into %ir.ptr)> t13:1, t9, t2, undef:i32, main.c:8:4

A shl by 8 is missing. (Yes, there should be two shl nodes, to be combined later.)

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9252 ↗	(On Diff #161681)	I think this line should be: SDValue Shifted = DAG.getNode(ISD::SHL, DL, VT, N->getOperand(0),

This revision is now accepted and ready to land. Jan 13 2019, 7:37 PM

Hi,

Could you please describe more accurately what you are expecting to see? I'm failing to see the issue.

For example:

given test.ll:
define dso_local void @swap(i32* %ptr) #0 {
entry:

%0 = load i32, i32* %ptr, align 4
%and = and i32 %0, 65280
%shl = shl i32 %and, 8
store i32 %shl, i32* %ptr, align 4
ret void

}

llc -mtriple=armv7-linux-gnu < test.ll
will give:

ldrb    r1, [r0, #1]
lsl     r1, r1, #8   // 8
str     r1, [r0]
bx      lr

it should be:

ldrb    r1, [r0, #1]
lsl     r1, r1, #16  // 16
str     r1, [r0]
bx      lr
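The difference between the two listings can be checked with plain integer arithmetic. This is only a sketch with hypothetical function names: it models the value that the narrow byte load yields and compares a single shift by 8 with the two shifts (8 to compensate for the narrowed load, 8 from the source program) that should be present.

```cpp
#include <cassert>
#include <cstdint>

// What the source program computes: (*ptr & 0x0000ff00) << 8.
std::uint32_t reference(std::uint32_t word) {
  return (word & 0x0000ff00u) << 8;
}

// Value the narrow byte load produces: bits 8..15 of the word, zero-extended.
std::uint32_t selectedByte(std::uint32_t word) {
  return (word >> 8) & 0xffu;
}

// Behaviour reported in this thread: only one shift by 8 survives.
std::uint32_t miscompiled(std::uint32_t byte) {
  assert(byte <= 0xffu);
  return (byte & 0xffu) << 8;
}

// Expected behaviour: the compensating shift plus the program's own shift,
// 16 bits in total.
std::uint32_t expected(std::uint32_t byte) {
  assert(byte <= 0xffu);
  return (byte << 8) << 8;
}
```

For the word 0x11223344 the selected byte is 0x33; the expected form gives 0x00330000, matching the reference, while the single-shift form gives 0x00003300.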

Thanks. Yes, I think this is to do with the way I try to insert the new shl node, the DAG is just returning the existing node hence we only have one shl node. Should have a patch up tomorrow.
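The effect described here, where asking for a node returns one that already exists, can be illustrated with a toy expression graph that uniques its nodes. This is a deliberately simplified sketch and not LLVM's SelectionDAG: the ToyDAG type, its getNode method, and the node layout are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

enum class Op { Load, And, Shl };

// Toy graph in which identical (opcode, operand, immediate) requests are
// uniqued, so a second request for the same node returns the first one.
struct ToyDAG {
  struct Node {
    Op op;
    int operand;  // id of the operand node, or -1 for none
    unsigned imm; // mask or shift amount
  };
  std::vector<Node> nodes;
  std::map<std::tuple<Op, int, unsigned>, int> unique;

  // Return the id of an existing identical node, or create a new one.
  int getNode(Op op, int operand, unsigned imm) {
    assert(operand < static_cast<int>(nodes.size()));
    auto key = std::make_tuple(op, operand, imm);
    auto it = unique.find(key);
    if (it != unique.end())
      return it->second;
    nodes.push_back({op, operand, imm});
    int id = static_cast<int>(nodes.size()) - 1;
    unique.emplace(key, id);
    return id;
  }
};

// Builds load -> and -> shl, then asks for a second shl of the and by the
// same amount. Returns true if no new node was created.
bool secondShlIsUniquedAway() {
  ToyDAG dag;
  int load = dag.getNode(Op::Load, -1, 0);
  int masked = dag.getNode(Op::And, load, 0xff00u);
  int userShl = dag.getNode(Op::Shl, masked, 8);      // shl from the program
  int compensating = dag.getNode(Op::Shl, masked, 8); // meant to be a new shl
  return userShl == compensating;
}
```

In this toy model the second request creates nothing, so the graph ends up with one shl where two were intended. Building the compensating shl on a different operand, as the updated patch does by shifting the narrowed load result, avoids the collision.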

Changed the method of creating the final shl.

I tried your patch and the bug is gone. Thanks.

Closed by commit rL351310: [DAGCombine] Fix ReduceLoadWidth for shifted offsets (authored by sam_parker). Jan 16 2019, 12:44 AM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D117104: [DAGCombine] Refactor DAGCombiner::ReduceLoadWidth. NFCI. Jan 13 2022, 12:13 PM

Diff 181768

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[4,718 lines hidden] SDValue DAGCombiner::visitAND(SDNode *N) {
   // fold (and (extload x, i16), 255) -> (zextload x, i8)
   // fold (and (any_ext (extload x, i16)), 255) -> (zextload x, i8)
   if (!VT.isVector() && N1C && (N0.getOpcode() == ISD::LOAD ||
                                 (N0.getOpcode() == ISD::ANY_EXTEND &&
                                  N0.getOperand(0).getOpcode() == ISD::LOAD))) {
     if (SDValue Res = ReduceLoadWidth(N)) {
       LoadSDNode *LN0 = N0->getOpcode() == ISD::ANY_EXTEND
         ? cast<LoadSDNode>(N0.getOperand(0)) : cast<LoadSDNode>(N0);

       AddToWorklist(N);
-      CombineTo(LN0, Res, Res.getValue(1));
+      DAG.ReplaceAllUsesOfValueWith(SDValue(LN0, 0), Res);
       return SDValue(N, 0);
     }
   }

   if (Level >= AfterLegalizeTypes) {
     // Attempt to propagate the AND back up to the leaves which, if they're
     // loads, can be combined to narrow loads and the AND node can be removed.
     // Perform after legalization so that extend nodes will already be
[4,349 lines hidden] SDValue DAGCombiner::visitZERO_EXTEND(SDNode *N) {
   if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N))
     return NewVSel;

   return SDValue();
 }

 SDValue DAGCombiner::visitANY_EXTEND(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   EVT VT = N->getValueType(0);

   if (SDValue Res = tryToFoldExtendOfConstant(N, TLI, DAG, LegalTypes))
     return Res;

   // fold (aext (aext x)) -> (aext x)
   // fold (aext (zext x)) -> (zext x)
   // fold (aext (sext x)) -> (sext x)
   if (N0.getOpcode() == ISD::ANY_EXTEND ||
[121 lines hidden] if (N0.getOpcode() == ISD::SETCC) {
     // aext(setcc x,y,cc) -> select_cc x, y, 1, 0, cc
     SDLoc DL(N);
     if (SDValue SCC = SimplifySelectCC(
             DL, N0.getOperand(0), N0.getOperand(1), DAG.getConstant(1, DL, VT),
             DAG.getConstant(0, DL, VT),
             cast<CondCodeSDNode>(N0.getOperand(2))->get(), true))
       return SCC;
   }

   return SDValue();
 }

 SDValue DAGCombiner::visitAssertExt(SDNode *N) {
   unsigned Opcode = N->getOpcode();
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
   EVT AssertVT = cast<VTSDNode>(N1)->getVT();
[239 lines hidden] SDValue DAGCombiner::ReduceLoadWidth(SDNode *N) {
   }

   if (HasShiftedOffset) {
     // Recalculate the shift amount after it has been altered to calculate
     // the offset.
     if (DAG.getDataLayout().isBigEndian())
       ShAmt = AdjustBigEndianShift(ShAmt);

-    // We're using a shifted mask, so the load now has an offset. This means we
-    // now need to shift right the mask to match the new load and then shift
-    // right the result of the AND.
-    const APInt &Mask = cast<ConstantSDNode>(N->getOperand(1))->getAPIntValue();
-    APInt ShiftedMask = Mask.lshr(ShAmt);
-    DAG.UpdateNodeOperands(N, Result, DAG.getConstant(ShiftedMask, DL, VT));
+    // We're using a shifted mask, so the load now has an offset. This means
+    // that data has been loaded into the lower bytes than it would have been
+    // before, so we need to shl the loaded data into the correct position in the
+    // register.
     SDValue ShiftC = DAG.getConstant(ShAmt, DL, VT);
-    SDValue Shifted = DAG.getNode(ISD::SHL, DL, VT, SDValue(N, 0),
-                                  ShiftC);
-    DAG.ReplaceAllUsesOfValueWith(SDValue(N, 0), Shifted);
-    DAG.UpdateNodeOperands(Shifted.getNode(), SDValue(N, 0), ShiftC);
+    Result = DAG.getNode(ISD::SHL, DL, VT, Result, ShiftC);
+    DAG.ReplaceAllUsesOfValueWith(SDValue(N, 0), Result);
   }

   // Return the new loaded value.
   return Result;
 }

 SDValue DAGCombiner::visitSIGN_EXTEND_INREG(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
   EVT VT = N->getValueType(0);
[4,992 lines hidden]

test/CodeGen/ARM/and-load-combine.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=armv7 %s -o - | FileCheck %s --check-prefix=ARM
 ; RUN: llc -mtriple=armv7eb %s -o - | FileCheck %s --check-prefix=ARMEB
 ; RUN: llc -mtriple=armv6m %s -o - | FileCheck %s --check-prefix=THUMB1
 ; RUN: llc -mtriple=thumbv8m.main %s -o - | FileCheck %s --check-prefix=THUMB2

 define arm_aapcscc zeroext i1 @cmp_xor8_short_short(i16* nocapture readonly %a, i16* nocapture readonly %b) {
 ; ARM-LABEL: cmp_xor8_short_short:
 ; ARM: @ %bb.0: @ %entry
 ; ARM-NEXT: ldrb r0, [r0]
 ; ARM-NEXT: ldrb r1, [r1]
 ; ARM-NEXT: eor r0, r1, r0
 ; ARM-NEXT: clz r0, r0
 ; ARM-NEXT: lsr r0, r0, #5
 ; ARM-NEXT: bx lr
[20 lines hidden]
 ; THUMB2: @ %bb.0: @ %entry
 ; THUMB2-NEXT: ldrb r0, [r0]
 ; THUMB2-NEXT: ldrb r1, [r1]
 ; THUMB2-NEXT: eors r0, r1
 ; THUMB2-NEXT: clz r0, r0
 ; THUMB2-NEXT: lsrs r0, r0, #5
 ; THUMB2-NEXT: bx lr
 entry:
   %0 = load i16, i16* %a, align 2
   %1 = load i16, i16* %b, align 2
   %xor2 = xor i16 %1, %0
   %2 = and i16 %xor2, 255
   %cmp = icmp eq i16 %2, 0
   ret i1 %cmp
 }

 define arm_aapcscc zeroext i1 @cmp_xor8_short_int(i16* nocapture readonly %a, i32* nocapture readonly %b) {
[1,491 lines hidden]
 ; THUMB2-NEXT: ldrh r0, [r0, #6]
 ; THUMB2-NEXT: lsls r1, r0, #16
 ; THUMB2-NEXT: movs r0, #0
 ; THUMB2-NEXT: bx lr
   %1 = load i64, i64* %p, align 8
   %and = and i64 %1, -281474976710656
   ret i64 %and
 }

+; ARM-LABEL: test27:
+; ARM: @ %bb.0:
+; ARM-NEXT: ldrb r1, [r0, #1]
+; ARM-NEXT: lsl r1, r1, #16
+; ARM-NEXT: str r1, [r0]
+; ARM-NEXT: bx lr
+;
+; ARMEB-LABEL: test27:
+; ARMEB: @ %bb.0:
+; ARMEB-NEXT: ldrb r1, [r0, #2]
+; ARMEB-NEXT: lsl r1, r1, #16
+; ARMEB-NEXT: str r1, [r0]
+; ARMEB-NEXT: bx lr
+;
+; THUMB1-LABEL: test27:
+; THUMB1: @ %bb.0:
+; THUMB1-NEXT: ldrb r1, [r0, #1]
+; THUMB1-NEXT: lsls r1, r1, #16
+; THUMB1-NEXT: str r1, [r0]
+; THUMB1-NEXT: bx lr
+;
+; THUMB2-LABEL: test27:
+; THUMB2: @ %bb.0:
+; THUMB2-NEXT: ldrb r1, [r0, #1]
+; THUMB2-NEXT: lsls r1, r1, #16
+; THUMB2-NEXT: str r1, [r0]
+; THUMB2-NEXT: bx lr
+define void @test27(i32* nocapture %ptr) {
+entry:
+  %0 = load i32, i32* %ptr, align 4
+  %and = and i32 %0, 65280
+  %shl = shl i32 %and, 8
+  store i32 %shl, i32* %ptr, align 4
+  ret void
+}

This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Reduce load widths of shifted masks
ClosedPublic
