This is an archive of the discontinued LLVM Phabricator instance.

[x86] fix allowsMisalignedMemoryAccess() implementation
ClosedPublic

Authored by spatel on Jun 23 2015, 10:33 AM.

Details

Summary

The ultimate motivation for this patch is to fix the part of PR21711 ( https://llvm.org/bugs/show_bug.cgi?id=21711#c12 ) that is still not working. To get there, I'd like to use TLI.allowsMemoryAccess() in DAGCombiner's MergeConsecutiveStores(). This will require fixing bugs in x86, AArch64 (see post-commit thread for r227242) and possibly other targets.

This patch fixes the x86 implementation of allowsMisalignedMemoryAccess() to correctly return the 'Fast' output parameter for 32-byte accesses. To test that, an existing load merging optimization is changed to use the TLI hook. This exposes a shortcoming in the current logic and results in the regression test update. Changing other direct users of the isUnalignedMem32Slow() x86 CPU attribute would be a follow-on patch.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 28252.Jun 23 2015, 10:33 AM
spatel retitled this revision from to [x86] fix allowsMisalignedMemoryAccess() implementation.
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: qcolombet, jyknight, hfinkel.
spatel added a subscriber: Unknown Object (MLST).
qcolombet added inline comments.Jun 30 2015, 5:07 PM
lib/Target/X86/X86ISelLowering.cpp
1802 ↗(On Diff #28252)

Shouldn't we check the alignment from the data layout?

jyknight added inline comments.Jul 1 2015, 8:58 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
4131 ↗(On Diff #28252)

It still might actually have an alignment greater than 1 though, right? (I guess it probably doesn't really matter much, though.)

lib/Target/X86/X86ISelLowering.cpp
1802 ↗(On Diff #28252)

Why does this even have a check for correct alignment at all? The function is "allowsMisalignedMemoryAccesses" -- the assumption being you already know your data isn't aligned.

I think it's the caller's responsibility to call DataLayout::getPrefTypeAlignment, isn't it?

Perhaps the static "allowableAlignment" helper function in DAGCombiner.cpp should be made more generally available, to make doing so easier.

qcolombet added inline comments.Jul 1 2015, 10:07 AM
lib/Target/X86/X86ISelLowering.cpp
1802 ↗(On Diff #28252)

My bad, you're right.

spatel added inline comments.Jul 1 2015, 11:56 AM
lib/Target/X86/X86ISelLowering.cpp
1802 ↗(On Diff #28252)

An audit of the trunk overrides of this function shows that only the SI lowering in the AMDGPU backend actually makes use of the Align param. Based on that implementation, it looks like the intended usage of the param is to specify how bad the misalignment can be (an 8-byte access with only 4-byte alignment is ok on that target).

There are a few spots in LegalizeDAG and SelectionDAG that check the DataLayout before calling allowsMisalignedMemoryAccesses(), so making allowableAlignment() more accessible sounds like a good change to me. I'll work on that and then fix this patch up.

Thanks!

spatel updated this revision to Diff 31982.Aug 12 2015, 2:37 PM

We now have a decent (if not perfect) TLI.allowsMemoryAccess() after r243549 (D10905). That makes this patch considerably simpler: change a load merging optimization to use the new hook and fix the 'fast' reporting for x86 misaligned 32-byte accesses.

Without the fix in allowsMisalignedMemoryAccesses(), we will infinite loop when targeting SandyBridge because LowerINSERT_SUBVECTOR() creates 32-byte loads from two 16-byte loads while PerformLOADCombine() splits them back into 16-byte loads.

spatel updated this object.Aug 12 2015, 2:42 PM
spatel updated this object.
jyknight added inline comments.Aug 14 2015, 8:25 AM
lib/Target/X86/X86ISelLowering.cpp
1910 ↗(On Diff #31982)

This (pre-existing code!) seems really wrong. "isUnalignedMemAccessFast" is a very-poorly-named predicate, which only really is intended to indicate whether unaligned SSE 16-byte memory accesses are fast.

I believe unaligned access of all other sizes should always be treated as fast on x86. Does it break anything if you fix that too while you're in here?

spatel added inline comments.Aug 14 2015, 8:39 AM
lib/Target/X86/X86ISelLowering.cpp
1910 ↗(On Diff #31982)

I agree that the name is wrong; I added that FIXME note in x86.td. :)
And yes, the logic here looks quite wrong to me. I'm working on a possibly related bug in PR24449. If it's alright with you, I'd like to get this patch in since it's a small independent fix. Then, I'll make sure the 16-byte and under checks are working as intended and put a patch up for review for that.

jyknight accepted this revision.Aug 14 2015, 10:09 AM
jyknight edited edge metadata.
This revision is now accepted and ready to land.Aug 14 2015, 10:09 AM
This revision was automatically updated to reflect the committed changes.