Page MenuHomePhabricator

[ARM] Fix lowering of misaligned memcpy/memset
ClosedPublic

Authored by john.brawn on May 23 2017, 7:40 AM.

Details

Summary

Currently getOptimalMemOpType returns i32 for large enough sizes without checking for alignment, leading to poor code generation when misaligned accesses aren't permitted as we generate a word store then later split it up into byte stores. This means we inadvertantly go over the MaxStoresPerMemcpy limit and for memset we splat the memset value into a word then immediately split it up again.

Fix this by leaving it up to FindOptimalMemOpLowering to figure out which type to use, but also fix a bug there where it wasn't correctly checking if misaligned memory accesses are allowed.

Diff Detail

Repository
rL LLVM

Event Timeline

john.brawn created this revision.May 23 2017, 7:40 AM

Please regenerate the tests in a separate commit, to make it clear what this patch is actually changing. Please change the description to make it clear what this is actually fixing; as far as I can tell, we aren't generating "wrong code", just inlining memcpy and memset calls too aggressively.

john.brawn edited the summary of this revision. (Show Details)May 24 2017, 6:22 AM

Please regenerate the tests in a separate commit, to make it clear what this patch is actually changing.

I've moved the test changes that aren't directly due to this code change to D33495

Please change the description to make it clear what this is actually fixing; as far as I can tell, we aren't generating "wrong code", just inlining memcpy and memset calls too aggressively.

Done

efriedma added inline comments.May 24 2017, 11:06 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
4782

The use of getPointerTy() here seems dubious. On many architectures, you can use getPointerTy as a rough proxy for the largest legal integer type, but that's not universal, and the usage of DstAS doesn't make any sense.

Maybe just loop over {i64, i32, i16}.

4796

The use of isTypeLegal() here looks suspicious; i16 is not legal on ARM, but we definitely want to use i16 stores if we can. Do you have any testcases where the known alignment is two?

john.brawn added inline comments.May 25 2017, 6:42 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
4782

I'd thought about doing that but instead went with the simpler change. I'll give the loop approach a try.

4796

This loop is actually finding the largest legal type, i.e. i32 on ARM, so that if VT is larger than that it's used instead. Adding a test sounds like a good idea though.

Adjust FindOptimalMemOpLowering to use a loop instead of the pointer type, and add a 2-byte aligned test.

This revision is now accepted and ready to land.May 25 2017, 12:12 PM
This revision was automatically updated to reflect the committed changes.