make fast unaligned memory accesses implicit with SSE4.2 or SSE4a

Authored by spatel on Aug 24 2015, 9:06 AM.



This is a follow-on from the discussion in

This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has generally fast unaligned memory ops.
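The implication described above can be sketched in C. This is a minimal illustration only, not the code from the patch: the struct, its field names, and `has_fast_unaligned_mem` are hypothetical, and the sketch assumes that AVX implies SSE4.2, as it does in the x86 feature hierarchy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature flags; the names are illustrative, not LLVM's. */
struct cpu_features {
    bool sse42;
    bool sse4a;
    bool avx;                     /* assumed to imply SSE4.2 */
    bool fast_unaligned_explicit; /* set directly by a CPU model */
};

/* Returns true when unaligned memory accesses can be treated as fast:
 * either the CPU model says so explicitly, or the feature set includes
 * SSE4.2 (directly or via AVX) or SSE4a. */
bool has_fast_unaligned_mem(const struct cpu_features *f) {
    assert(f != NULL);
    bool sse42 = f->sse42 || f->avx;
    return f->fast_unaligned_explicit || sse42 || f->sse4a;
}
```

Under these assumptions, a feature set containing only `avx` reports fast unaligned access, while a set with none of the flags does not.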

A motivating use case for this change is a clang invocation that doesn't explicitly set the CPU, but does target a feature that we know only exists on a CPU that supports fast unaligned memops. For example:
$ clang -O1 foo.c -mavx

This resolves a difference in lowering noted in PR24449:

Currently, we use different store types depending on whether the example can be lowered as a memset or not.
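To make the two lowering paths concrete, here is a hypothetical pair of C functions. This is not the test case from PR24449; the struct and function names are made up for illustration. Both functions zero the same 32 bytes, but a compiler may recognize the first as a memset and treat the second as individual stores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-byte buffer with no special alignment guarantee
 * beyond that of uint64_t. */
struct buf32 {
    uint64_t q[4];
};

/* Zeroing expressed as a memset call. */
void zero_with_memset(struct buf32 *b) {
    assert(b != NULL);
    memset(b, 0, sizeof *b);
}

/* The same zeroing expressed as four separate scalar stores. */
void zero_with_stores(struct buf32 *b) {
    assert(b != NULL);
    b->q[0] = 0;
    b->q[1] = 0;
    b->q[2] = 0;
    b->q[3] = 0;
}
```

The two functions are semantically identical, so any difference in the store instructions chosen for them comes from the lowering path rather than from the source.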

Event Timeline

spatel updated this revision to Diff 32957. Aug 24 2015, 9:06 AM
spatel retitled this revision to "make fast unaligned memory accesses implicit with SSE4.2 or SSE4a".
spatel updated this object.
spatel added a subscriber: llvm-commits.
RKSimon accepted this revision. Aug 24 2015, 12:55 PM
RKSimon edited edge metadata.

LGTM - My only concern was the VIA Nano, which Agner says has particularly bad unaligned memory access. But those chips appear to support only up to SSE4.1.

This revision is now accepted and ready to land. Aug 24 2015, 12:55 PM
zansari edited edge metadata. Aug 24 2015, 1:14 PM

Hi Sanjay,

Just one tiny comment, otherwise lgtm.


197 ↗(On Diff #32957)


spatel marked an inline comment as done. Aug 25 2015, 9:29 AM
spatel added inline comments.
197 ↗(On Diff #32957)

Thanks - updated.

This revision was automatically updated to reflect the committed changes.
spatel marked an inline comment as done.