Details

Reviewers

spatel
niravd
efriedma
hfinkel
resistor
javed.absar

Commits

rGa46518850067: [DAGCombine] Fix alignment for offset loads/stores
rL335210: [DAGCombine] Fix alignment for offset loads/stores

Summary

This test case (which I hope is free of UB) has two stores of 0 to offsets 20 and 24 in a chunk of memory:
store i32 0, i32* %helper.20.32
store i32 0, i32* %helper.24.32, align 8
It is followed by a 64-bit load, aligned to 4 bytes:
%load.helper.20.64 = load i64, i64* %helper.20.64, align 4

This is on AArch32, so during type legalisation the i64 load is split into two 32-bit loads. The second of them:
t35: i32,ch = load<(load 4 from %ir.helper.20.64 + 4)> t21, t37, undef:i32
gets marked as being align 8 (note: the base+offset is align 8, not the base). This is then deemed not to alias with the store to %helper.24.32, because the alignment just set is taken as the base alignment, not the base+offset alignment.
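
The difference between the alignment of the base and the alignment of base+offset can be checked with plain address arithmetic. The sketch below is a stand-alone C++ illustration, not LLVM code: the buffer `helper` mirrors the 8-byte-aligned alloca in the test case, and the function name `misalignment` is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for the test's `alloca i8, i32 64, align 8`.
alignas(8) static unsigned char helper[64];

// Hypothetical helper: how far (in bytes) helper+offset is from being
// aligned to `align` bytes. Zero means the address is aligned.
std::uintptr_t misalignment(std::size_t offset, std::uintptr_t align) {
  assert(offset < sizeof(helper) && align != 0);
  return reinterpret_cast<std::uintptr_t>(helper + offset) % align;
}
```

Here `misalignment(24, 8)` is 0 while `misalignment(20, 8)` is 4: the second half of the split load is 8-byte aligned, but the pointer it is expressed relative to (%helper.20.64, at offset 20) is only 4-byte aligned.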

The test case seems to need a <4 x i32> store, which on ARM is converted to a VLD1_UPD. I believe this pushes certain optimisations back until after legalisation. Originally it needed -combiner-global-alias-analysis, but this version shows the same error without it.

Here I've set the updated alignment only if the alignment holds true for the base.
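
As a rough illustration of that condition, here is a stand-alone C++ sketch of the guard this patch adds (`Align > getAlignment() && getSrcValueOffset() % Align == 0`). The function name `canRefineAlignment` and its parameters are hypothetical and stand in for the LLVM accessors; this is a model of the check, not the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-alone version of the condition added by this patch.
//   inferred : alignment inferred for the full address (base + offset)
//   current  : alignment the memory operand already records
//   offset   : byte offset of the access from its base pointer
bool canRefineAlignment(std::uint64_t inferred, std::uint64_t current,
                        std::int64_t offset) {
  assert(inferred != 0 && "alignment must be non-zero");
  // Only accept the better alignment if the offset is a multiple of it, so
  // that the same alignment also holds for the base pointer.
  return inferred > current &&
         offset % static_cast<std::int64_t>(inferred) == 0;
}
```

For the second half of the split load (inferred 8, current 4, offset 4) the guard rejects the update, because 4 % 8 != 0. With an offset of 0 or 16 the update would be accepted.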

Event Timeline

dmgreen created this revision. Jun 11 2018, 8:31 AM

Herald added a reviewer: javed.absar. Jun 11 2018, 8:31 AM

Herald added a subscriber: kristof.beyls.

arsenm added a subscriber: arsenm. Jun 11 2018, 8:38 AM

arsenm added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12185–12186	This should just use getAlignment() rather than looking at the two underlying alignments?

Hello.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12185–12186	I'm not sure what you mean by two underlying alignments. Do you mean use LD->getAlignment(), not LD->getOriginalAlignment()? The Align passed to getExtLoad seems to be treated as a base align, not a base+offset align.

arsenm added inline comments. Jun 11 2018, 9:10 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12185–12186	Yes. The original code doesn't make sense to me: if it's looking for an improved alignment, it should be looking at the alignment the load already has, rather than for some reason looking at the MMO. It's the alignment of the load piece itself. The "base alignment" refers to the alignment of the IR piece, and really nothing in the DAG should be particularly concerned with it.

dmgreen added inline comments. Jun 11 2018, 9:26 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12185–12186	I'm not sure if the load has an alignment itself, unless I'm reading this wrong. The MemSDNode::getAlignment() just calls MMO->getAlignment(), which is MinAlign(getBaseAlignment(), getOffset()). So I was going with getBaseAlignment as it's a simpler call and Align needs to be treated as a BaseAlign (which the mod check ensures is valid). Happy enough to change it.
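
The MinAlign computation mentioned above can be pictured as taking the largest power of two that divides both the base alignment and the offset. A minimal stand-alone sketch follows; `minAlign` and `MemOperandModel` are names invented here to model the behaviour described in the comment, not the LLVM declarations.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two dividing both a and b (0 counts as divisible by
// everything): isolate the lowest set bit of (a | b).
std::uint64_t minAlign(std::uint64_t a, std::uint64_t b) {
  std::uint64_t bits = a | b;
  return bits & (~bits + 1);
}

// Hypothetical model of a memory operand: an alignment for the base pointer
// plus a byte offset from that base.
struct MemOperandModel {
  std::uint64_t baseAlign;
  std::uint64_t offset;

  // Alignment of the accessed address, i.e. of base + offset.
  std::uint64_t alignment() const {
    assert(baseAlign != 0);
    return minAlign(baseAlign, offset);
  }
};
```

In this model an operand with `baseAlign` 4 and `offset` 4 has `alignment()` 4. Recording 8 as `baseAlign` with `offset` 4 still yields 4 from `alignment()`, but any code that reads `baseAlign` directly would believe the base pointer is 8-byte aligned.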

getOriginalAlignment() -> getAlignment()

dmgreen mentioned this in D48074: [ARM] Enable useAA() for the in-order Cortex-R52. Jun 12 2018, 5:00 AM

Ping. Does this look OK now?

The getExtLoad/getTruncStore API seems really confusing, but I guess there's no simple way to fix it.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12190	Can we get rid of this check?

I have changed these to asserts, which I believe will never fire. I'm not sure if it breaks the design of DAGCombiner to work like this, just refining the alignment on the MMO.

arsenm added inline comments. Jun 19 2018, 4:51 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12185–12186	Yes, the load alignment is ultimately stored in the MMO. I meant it shouldn't need to be looking at the underlying separate IR pieces and worrying about if it's at an offset from the base. Does this not work if you just change it to getAlignment() for some reason?

dmgreen added inline comments. Jun 19 2018, 6:26 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12185–12186	The chain of events in the attached test case is as follows. We start with these nodes:
t23: ch = store<(store 4 into %ir.helper.20.32)> t0, Constant:i32<0>, t3, undef:i32
t21: ch = store<(store 4 into %ir.helper.24.32, align 8)> t0, Constant:i32<0>, t29, undef:i32
t27: i64,ch = load<(load 8 from %ir.helper.20.64, align 4)> t26, t3, undef:i32
The load is a 64-bit unaligned load from %helper+20. The stores are 32-bit, to %helper+20 and %helper+24. %helper is align 8. The load is legalised to these 32-bit loads:
t30: i32,ch = load<(load 4 from %ir.helper.20.64)> t26, t3, undef:i32
t33: i32,ch = load<(load 4 from %ir.helper.20.64 + 4)> t26, t32, undef:i32
The second one, via it being a Frame Index in InferPtrAlignment, is deemed to have an alignment of 8 (which is true for the load, (20+4)%8==0, but not for the base). So if we use Align in this getExtLoad, it is passed to getLoad, which passes it to MachineFunction::getMachineMemOperand as base_align. So the base_align on the MMO gets set to 8, and we end up with this:
Combining: t35: i32,ch = load<(load 4 from %ir.helper.20.64 + 4)> t21, t37, undef:i32
Creating new node: t38: i32,ch = load<(load 4 from %ir.helper.20.64 + 4, align 8)> t0, t37, undef:i32
Creating new node: t39: ch = TokenFactor t21, t38:1
Replacing.1 t35: i32,ch = load<(load 4 from %ir.helper.20.64 + 4, align 8)> t21, t37, undef:i32
With: t38: i32,ch = load<(load 4 from %ir.helper.20.64 + 4, align 8)> t0, t37, undef:i32
The "align 8" that magically appeared on t35 is the base align on %ir.helper.20.64, not the align on %ir.helper.20.64+4. The actual combine it's reporting here is from the code in DAGCombiner::isAlias (see line 17955) using this base alignment (getOriginalAlignment()) to conclude that the load (t35) and a store (t21 to %ir.helper.24.32) don't alias. So it sets the chain to t0, and the load then gets scheduled before the store.
It seems to me that setting the alignment here, presuming it's a base_align, is the part that's wrong (as opposed to any other part of this chain of events). As Eli says though, the API here is really confusing, with its multiple types of alignment, etc.
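
To make the last step concrete, here is a simplified, stand-alone model of an alignment-based disjointness test of the kind described above. It is an assumption about the general shape of such a check, not a copy of DAGCombiner::isAlias; the struct `Access` and the function `provablyDisjoint` are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a memory access: alignment claimed for the
// base pointer, byte offset from that base, and access size in bytes.
struct Access {
  std::int64_t baseAlign;
  std::int64_t offset;
  std::int64_t size;
};

// Simplified model: if both bases claim the same alignment, larger than the
// access size, then same-sized accesses that occupy different slots within
// an aligned block are assumed not to overlap. This reasoning is only sound
// when baseAlign really is the alignment of the base pointer.
bool provablyDisjoint(const Access &a, const Access &b) {
  assert(a.baseAlign > 0 && b.baseAlign > 0);
  if (a.baseAlign != b.baseAlign || a.size != b.size ||
      a.baseAlign <= a.size || a.offset == b.offset)
    return false;
  std::int64_t slotA = a.offset % a.baseAlign;
  std::int64_t slotB = b.offset % b.baseAlign;
  return slotA + a.size <= slotB || slotB + b.size <= slotA;
}
```

With the load modelled as `{8, 4, 4}` (base alignment wrongly recorded as 8, offset 4) and the store to %helper.24.32 as `{8, 0, 4}`, the model reports the two as disjoint even though both touch %helper+24. With the load's true base alignment of 4, the precondition fails and no such conclusion is drawn.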

LGTM

This revision is now accepted and ready to land. Jun 19 2018, 12:31 PM

Very minor update as NewLoad is unused in release builds.

Thanks Eli.

Matt, if you have any further comments, let me know.

Closed by commit rL335210: [DAGCombine] Fix alignment for offset loads/stores (authored by dmgreen). Jun 21 2018, 1:34 AM

This revision was automatically updated to reflect the committed changes.

Diff 150775

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

(12,176 lines not shown; enclosing context: if (ISD::isNON_TRUNCStore(Chain.getNode())) {)
           PrevST->getValue().getValueType() == N->getValueType(0))
         return CombineTo(N, PrevST->getOperand(1), Chain);
     }
   }

   // Try to infer better alignment information than the load already has.
   if (OptLevel != CodeGenOpt::None && LD->isUnindexed()) {
     if (unsigned Align = DAG.InferPtrAlignment(Ptr)) {
-      if (Align > LD->getMemOperand()->getBaseAlignment()) {
+      if (Align > LD->getAlignment() && LD->getSrcValueOffset() % Align == 0) {
         SDValue NewLoad = DAG.getExtLoad(
             LD->getExtensionType(), SDLoc(N), LD->getValueType(0), Chain, Ptr,
             LD->getPointerInfo(), LD->getMemoryVT(), Align,
             LD->getMemOperand()->getFlags(), LD->getAAInfo());
         if (NewLoad.getNode() != N)
           return CombineTo(N, NewLoad, SDValue(NewLoad.getNode(), 1), true);
       }
     }
   }

   if (LD->isUnindexed()) {
     // Walk up chain skipping non-aliasing memory nodes.
     SDValue BetterChain = FindBetterChain(N, Chain);
(1,985 lines not shown; enclosing context: SDValue DAGCombiner::visitSTORE(SDNode *N) {)

   // Turn 'store undef, Ptr' -> nothing.
   if (Value.isUndef() && ST->isUnindexed())
     return Chain;

   // Try to infer better alignment information than the store already has.
   if (OptLevel != CodeGenOpt::None && ST->isUnindexed()) {
     if (unsigned Align = DAG.InferPtrAlignment(Ptr)) {
-      if (Align > ST->getAlignment()) {
+      if (Align > ST->getAlignment() && ST->getSrcValueOffset() % Align == 0) {
         SDValue NewStore =
             DAG.getTruncStore(Chain, SDLoc(N), Value, Ptr, ST->getPointerInfo(),
                               ST->getMemoryVT(), Align,
                               ST->getMemOperand()->getFlags(), ST->getAAInfo());
         if (NewStore.getNode() != N)
           return CombineTo(ST, NewStore, true);
       }
     }
(3,978 lines not shown)

test/CodeGen/ARM/alias_align.ll

This file was added.

; RUN: llc < %s | FileCheck %s

target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "armv8-arm-none-eabi"

; Check the loads happen after the stores (note: directly returning 0 is also valid)
; CHECK-LABEL: somesortofhash
; CHECK-NOT: ldr
; CHECK: str

define i64 @somesortofhash() {
entry:
  %helper = alloca i8, i32 64, align 8
  %helper.0.4x32 = bitcast i8* %helper to <4 x i32>*
  %helper.20 = getelementptr inbounds i8, i8* %helper, i32 20
  %helper.24 = getelementptr inbounds i8, i8* %helper, i32 24
  store <4 x i32> zeroinitializer, <4 x i32>* %helper.0.4x32, align 8
  %helper.20.32 = bitcast i8* %helper.20 to i32*
  %helper.24.32 = bitcast i8* %helper.24 to i32*
  store i32 0, i32* %helper.20.32
  store i32 0, i32* %helper.24.32, align 8
  %helper.20.64 = bitcast i8* %helper.20 to i64*
  %load.helper.20.64 = load i64, i64* %helper.20.64, align 4
  ret i64 %load.helper.20.64
}

This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Fix alignment for offset loads/stores
ClosedPublic
