Page MenuHomePhabricator

[ARM] Tighten up VLDRH.32 with low alignments

Authored by dmgreen on Aug 1 2019, 7:46 AM.



VLDRH needs to have an alignment of at least 2, including the widening/narrowing versions. This tightens up the ISel patterns for it and alters allowsMisalignedMemoryAccesses so that unaligned accesses are expanded through the stack. It also fixed some incorrect shift amounts, which seemed to be passing a multiple not a shift.

Diff Detail


Event Timeline

dmgreen created this revision.Aug 1 2019, 7:46 AM
samparker added inline comments.Aug 7 2019, 1:06 AM
756 ↗(On Diff #212793)

I am so confused by this, can you explain it for me please?

simon_tatham added inline comments.Aug 7 2019, 2:06 AM
756 ↗(On Diff #212793)

(Drive-by comment since this crossed my inbox)

I think what's going on here is:

VLDRH.S32 means: load 8 bytes of memory, regard them as 4 16-bit halfwords (H), and sign-extend each one into a 32-bit lane (S32) of the output vector register.

But it requires alignment of at least 2 on the memory it's loading from. So in order to apply it to 8 bytes starting at an odd address, the generated code is copying the 8 source bytes to an aligned 8-byte stack slot, and then pointing the VLDRH.S32 at that instead.

I assume this run of llc is in a mode where it assumes unaligned access support on the ordinary LDR instruction has been enabled in the hardware configuration. (If I remember, that's the default – to generate code compatible with a CPU that has that turned _off_ you have to say -mno-unaligned-access in clang, or whatever llc's equivalent option is.)

samparker added inline comments.Aug 7 2019, 2:28 AM
756 ↗(On Diff #212793)

Bah, thanks! For some reason I wasn't thinking about the need to widen, all the loads really threw me.

samparker accepted this revision.Aug 7 2019, 2:32 AM

With my confusion cleared, LGTM.

This revision is now accepted and ready to land.Aug 7 2019, 2:32 AM
dmgreen marked an inline comment as done.Aug 7 2019, 11:02 PM


756 ↗(On Diff #212793)

Yep, this is the default fallback of "align it via the stack and load it again". Its obviously not very efficient, but I don't believe it will often come up (it's only for unaligned 16bit loads). If it does we may be able to do something better, perhaps by splitting out the extend.

This revision was automatically updated to reflect the committed changes.