
make fast unaligned memory accesses implicit with SSE4.2 or SSE4a
Closed, Public

Authored by spatel on Aug 24 2015, 9:06 AM.

Details

Summary

This is a follow-on from the discussion in http://reviews.llvm.org/D12154.

This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has generally fast unaligned memory ops.
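To illustrate the decision this change affects, here is a simplified C++ model of how a memset/memcpy expansion might choose its store width. Everything in it is hypothetical. `X86Features`, `isUnalignedMem16Fast`, and `pickStoreWidth` are illustrative names, not LLVM's actual `X86Subtarget` or lowering API. The sketch also assumes 64-bit mode and ignores source alignment, non-zero memset values, and `NoImplicitFloat`.

```cpp
#include <cstddef>

// Hypothetical, simplified model of an x86 subtarget's feature flags.
// Illustrative only; these are not LLVM's X86Subtarget members.
struct X86Features {
  bool sse1 = false;
  bool sse2 = false;
  bool sse41 = false;
  bool sse42 = false;
  bool sse4a = false;
  bool avx = false;
};

// The rule proposed in this review: SSE4.2 or SSE4a implies that
// 16-byte unaligned memory accesses are fast.
bool isUnalignedMem16Fast(const X86Features &f) {
  return f.sse42 || f.sse4a;
}

// Pick the widest store (in bytes) for a block of `size` bytes whose
// destination alignment is `dstAlign`. Simplified sketch, assuming 64-bit mode.
std::size_t pickStoreWidth(const X86Features &f, std::size_t size,
                           std::size_t dstAlign) {
  // Vector stores are allowed if unaligned access is fast, or if the
  // destination is already 16-byte aligned.
  bool vectorOk = size >= 16 && (isUnalignedMem16Fast(f) || dstAlign >= 16);
  if (vectorOk) {
    if (size >= 32 && f.avx) return 32;  // 256-bit AVX store
    if (f.sse1) return 16;               // 128-bit SSE store
  }
  return 8;  // fall back to 64-bit scalar stores
}
```

Under this model, a chip that reports only SSE4.1 (as with the VIA Nano parts mentioned in the review) still needs a 16-byte-aligned destination before vector stores are chosen. An SSE4.2 or SSE4a chip uses them for unaligned destinations as well.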

A motivating use case for this change is a clang invocation that doesn't explicitly set the CPU, but does target a feature that we know only exists on a CPU that supports fast unaligned memops. For example:
$ clang -O1 foo.c -mavx
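The `-mavx` case works because AVX implies SSE4.2 in the x86 feature hierarchy, so a feature-only invocation still reaches the SSE4.2 rule. Below is a minimal sketch of that transitive expansion. It assumes a hypothetical implication table that covers only the SSE/AVX chain plus SSE4a, and the function names are illustrative, not Clang's or LLVM's API.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical implication table: each feature lists the feature(s)
// it directly implies. Covers only the SSE/AVX chain and SSE4a.
const std::map<std::string, std::vector<std::string>> kImplies = {
    {"avx", {"sse4.2"}},   {"sse4.2", {"sse4.1"}}, {"sse4.1", {"ssse3"}},
    {"ssse3", {"sse3"}},   {"sse3", {"sse2"}},     {"sse2", {"sse"}},
    {"sse4a", {"sse3"}},
};

// Return the requested features plus everything they transitively imply.
std::set<std::string> expandFeatures(const std::vector<std::string> &requested) {
  std::set<std::string> out;
  std::vector<std::string> work(requested.begin(), requested.end());
  while (!work.empty()) {
    std::string f = work.back();
    work.pop_back();
    if (!out.insert(f).second) continue;  // already visited
    auto it = kImplies.find(f);
    if (it != kImplies.end())
      for (const auto &dep : it->second) work.push_back(dep);
  }
  return out;
}

// The rule from this review: SSE4.2 or SSE4a implies fast unaligned
// 16-byte accesses, even when no CPU is named explicitly.
bool impliesFastUnalignedMem16(const std::set<std::string> &features) {
  return features.count("sse4.2") > 0 || features.count("sse4a") > 0;
}
```

For example, `expandFeatures({"avx"})` yields a set containing `sse4.2` down through `sse`, so the fast-unaligned rule applies. `expandFeatures({"sse4.1"})` does not reach SSE4.2, so it does not.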

This resolves a difference in lowering noted in PR24449:
https://llvm.org/bugs/show_bug.cgi?id=24449

Currently, we use different store types depending on whether the example can be lowered as a memset or not.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 32957. Aug 24 2015, 9:06 AM
spatel retitled this revision to "make fast unaligned memory accesses implicit with SSE4.2 or SSE4a".
spatel updated this object.
spatel added a subscriber: llvm-commits.
RKSimon accepted this revision. Aug 24 2015, 12:55 PM
RKSimon edited edge metadata.

LGTM. My only concern was the VIA Nano, which Agner Fog says has particularly bad unaligned memory access, but those chips appear to support only SSE4.1.

This revision is now accepted and ready to land. Aug 24 2015, 12:55 PM
zansari edited edge metadata. Aug 24 2015, 1:14 PM

Hi Sanjay,

Just one tiny comment, otherwise lgtm.

Thanks,
Zia.

lib/Target/X86/X86Subtarget.cpp
197 (On Diff #32957)

Nehalem/Silvermont

spatel marked an inline comment as done. Aug 25 2015, 9:29 AM
spatel added inline comments.
lib/Target/X86/X86Subtarget.cpp
197 (On Diff #32957)

Thanks - updated.

This revision was automatically updated to reflect the committed changes.
spatel marked an inline comment as done.