This is an archive of the discontinued LLVM Phabricator instance.

[x86] avoid intermediate splat for non-zero memsets (PR27100)
ClosedPublic

Authored by spatel on Mar 31 2016, 2:22 PM.

Details

Summary

Follow-up to D18566 - where we noticed that an intermediate splat was being generated for memsets of non-zero chars.

That was because we told getMemsetStores() to use a 32-bit vector element type, and it happily obliged by producing that constant using an integer multiply.

The tests that were added in the last patch are now equivalent for AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to track that splat difference. In the new tests, the splat via shuffling looks ok to me, but there might be some room for improvement depending on uarch there.

Note that I didn't change the SSE1/2 paths in this patch. I will follow-up with that patch next. This patch should resolve PR27100.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 52287.Mar 31 2016, 2:22 PM
spatel retitled this revision from to [x86] avoid intermediate splat for non-zero memsets (PR27100).
spatel updated this object.
spatel added reviewers: zansari, RKSimon, hjl.tools.
spatel added a subscriber: llvm-commits.
RKSimon accepted this revision.Apr 1 2016, 4:12 AM
RKSimon edited edge metadata.

LGTM - one comment

lib/Target/X86/X86ISelLowering.cpp
2035 ↗(On Diff #52287)

This /is/ a legal type for AVX1 - the trouble is we can't do much with it.

This revision is now accepted and ready to land.Apr 1 2016, 4:12 AM

Nice patch Sanjay.

test/CodeGen/X86/memset-nonzero.ll
94 ↗(On Diff #52287)

I noticed that on AVX we now always generate a vmovaps to load a vector of constants.
That's obviously fine. However, I wonder if a vbroadcastss would be more appropriate in this case as it would use a smaller constant (for code size only - in this example we would save 28 bytes).

RKSimon added inline comments.Apr 1 2016, 4:56 AM
test/CodeGen/X86/memset-nonzero.ll
94 ↗(On Diff #52287)

This is what is being discussed on PR27141 - its proving tricky to determine when the broadcast is worth it and when it will cause register pressure issues.

spatel added inline comments.Apr 1 2016, 7:25 AM
lib/Target/X86/X86ISelLowering.cpp
2035 ↗(On Diff #52287)

Thanks - I'll fix that comment before committing.

test/CodeGen/X86/memset-nonzero.ll
94 ↗(On Diff #52287)

The difference is actually on the AVX2 side; AVX1 was loading a vector already just with a different format (v4f32).

I know we've discussed the splat load vs. vector load trade-off before. Let's follow up in PR27141, or I'll open another report where we can see the problem that Simon mentioned directly.

This revision was automatically updated to reflect the committed changes.