This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Fix alignment for offset loads/stores
ClosedPublic

Authored by dmgreen on Jun 11 2018, 8:31 AM.

Details

Summary

This test case (which I hope is free on UB), has two stores of 0 to offsets 20 and 24 in a chunk of memory:
store i32 0, i32* %helper.20.32
store i32 0, i32* %helper.24.32, align 8
A 64 bit load, aligned to 4 bytes:
%load.helper.20.64 = load i64, i64* %helper.20.64, align 4

This is on AArch32, so during type legalisation the i64 load is split into two 32bit loads. The second of them:
t35: i32,ch = load<(load 4 from %ir.helper.20.64 + 4)> t21, t37, undef:i32
gets marked as being align 8 (note: the base+offset is align 8, not the base). This is then deemed to not alias with the load to %helper.24.32 as the alignment just set is taken as the base alignment, not the base+offset alignment.

The test case seems to need a <4 x i32> which on ARM is converted to a VLD1_UPD. I believe this pushes certain optimisation back later after legalisation. Originally it needed -combiner-global-alias-analysis, but this version shows the same error without.

Here I've set the updated alignment only if the alignment hold true for the base.

Diff Detail

Event Timeline

dmgreen created this revision.Jun 11 2018, 8:31 AM
arsenm added a subscriber: arsenm.Jun 11 2018, 8:38 AM
arsenm added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12185–12186

This should just use getAlignment() rather than looking at the two underlying alignments?

Hello.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12185–12186

I'm not sure what you mean by two underlying alignments. Do you mean use LD->getAlignment(), not LD->getOriginalAlignment()?

The Align passed to getExtLoad seems to be treated as a base align, not a base+offset align.

arsenm added inline comments.Jun 11 2018, 9:10 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12185–12186

Yes. The original code doesn't make sense to me, if it's looking for an improved alignment it should be looking at alignment the load already has, rather than for some reason looking at the MMO.

It's the alignment of the load piece itself. The "base alignment" refers to the alignment of the IR piece, and really nothing in the DAG should be particularly concerned with it.

dmgreen added inline comments.Jun 11 2018, 9:26 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12185–12186

I'm not sure if the load has a alignment itself, unless I'm reading this wrong. The MemSDNode::getAlignment() just calls MMO->getAlignment(), which is MinAlign(getBaseAlignment(), getOffset()).

So I was going with getBaseAlignment as it's a simpler call and Align needs to treated as a BaseAlign (which the mod check ensures is valid). Happy enough to change it.

dmgreen updated this revision to Diff 150775.Jun 11 2018, 9:30 AM

getOriginalAlignment() -> getAlignment()

Ping. This look Ok now?

The getExtLoad/getTruncStore API seems really confusing, but I guess there's no simple way to fix it.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12190

Can we get rid of this check?

dmgreen updated this revision to Diff 151894.Jun 19 2018, 4:44 AM

I have changed these to asserts, which I believe will never fire. I'm not sure if it breaks the design on DAGCombiner to work like this, just refining the alignment on the MMO.

arsenm added inline comments.Jun 19 2018, 4:51 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12185–12186

Yes, the load alignment is ultimately stored in the MMO. I meant it shouldn't need to be looking at the underlying separate IR pieces and worrying about if it's at an offset from the base. Does this not work if you just change it to getAlignment() for some reason?

dmgreen added inline comments.Jun 19 2018, 6:26 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12185–12186

The chain of events that happen in the attached test case is we have these:

t23: ch = store<(store 4 into %ir.helper.20.32)> t0, Constant:i32<0>, t3, undef:i32
t21: ch = store<(store 4 into %ir.helper.24.32, align 8)> t0, Constant:i32<0>, t29, undef:i32
t27: i64,ch = load<(load 8 from %ir.helper.20.64, align 4)> t26, t3, undef:i32

The load is a 64bit unaligned load from %helper + 20. The stores are 32bit to %helper+20 and %helper+24. %helper is align 8.

The load is legalised to these 32bit loads:

t30: i32,ch = load<(load 4 from %ir.helper.20.64)> t26, t3, undef:i32
t33: i32,ch = load<(load 4 from %ir.helper.20.64 + 4)> t26, t32, undef:i32

The second one, via it being a Frame Index in InferPtrAlignment, is deemed to have an alignment of 8 (which is true for the load, (20+4)%8==0, but not for the base).

So if we use Align in this getExtLoad, it is passes to getLoad, which passes it to MachineFunction::getMachineMemOperand as base_align. So the base_align on the MMO gets set to 8, and we end up with this:

Combining: t35: i32,ch = load<(load 4 from %ir.helper.20.64 + 4)> t21, t37, undef:i32
Creating new node: t38: i32,ch = load<(load 4 from %ir.helper.20.64 + 4, align 8)> t0, t37, undef:i32
Creating new node: t39: ch = TokenFactor t21, t38:1
Replacing.1 t35: i32,ch = load<(load 4 from %ir.helper.20.64 + 4, align 8)> t21, t37, undef:i32
With: t38: i32,ch = load<(load 4 from %ir.helper.20.64 + 4, align 8)> t0, t37, undef:i32

The "align 8" that magically appeared on t35 is the base align on %ir.helper.20.64, not the align on %ir.helper.20.64+4.

The actual combine it's reporting here is from the code in DAGCombiner::isAlias (see line 17955) using this base alignment (getOriginalAlignment()) to conclude that the load (t35) and a store (t21 to %ir.helper.24.32) don't alias. So it sets the chain to t0, and the load then gets scheduled before the store.

It seems to me that setting the alignment here, presuming it's a base_align, is the part that's wrong (as opposed to any other part of this chain of events.) As Eli says though, the API here is really confusing, with it's multiple types of alignment, etc.

This revision is now accepted and ready to land.Jun 19 2018, 12:31 PM
dmgreen updated this revision to Diff 152042.Jun 20 2018, 2:41 AM

Very minor update as NewLoad is unused in release builds.

Thanks Eli.

Matt if you have any further comments, let me know.

This revision was automatically updated to reflect the committed changes.