This is an archive of the discontinued LLVM Phabricator instance.

[X86] memset should use REPSTOS on recent CPUs (those that have ERMS).
Needs Review
Public

Authored by courbet on Apr 25 2017, 2:02 AM.

Details

Reviewers
zvi
andreadb
Summary

This dramatically improves memset for aligned buffers (no changes for
unaligned buffers).

For example, on Haswell, throughput roughly doubles and comes close to the
maximum bandwidth (30 B/cycle instead of 15 B/cycle before this change, against
a maximum of 32 B/cycle).

See the graph here:
https://docs.google.com/spreadsheets/d/1bbT5Oqj3e5SFNh_5oKpwghEQuLazHI95E0-htGrADZ4/pubchart?oid=1858075526&format=interactive
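
To illustrate the idea outside the compiler, here is a minimal runtime sketch, not the code from this patch. It checks for ERMS via CPUID (leaf 7, subleaf 0, EBX bit 9) and uses `rep stosb` only for aligned destinations, falling back to `std::memset` otherwise. The helper names `cpu_has_erms` and `memset_sketch`, the 16-byte alignment threshold, and the GCC/Clang inline-asm syntax are all assumptions made for illustration; the patch itself changes memset lowering in X86SelectionDAGInfo.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define SKETCH_HAS_REP_STOS 1
#else
#define SKETCH_HAS_REP_STOS 0
#endif

// Hypothetical helper: true if CPUID reports ERMS
// (leaf 7, subleaf 0, EBX bit 9). Always false on non-x86-64 builds.
inline bool cpu_has_erms() {
#if SKETCH_HAS_REP_STOS
  unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
    return false;  // leaf 7 not supported
  return ((ebx >> 9) & 1u) != 0;
#else
  return false;
#endif
}

// Hypothetical helper: memset that uses `rep stosb` when the destination is
// 16-byte aligned (assumed threshold) and ERMS is available; otherwise it
// defers to std::memset, so unaligned buffers are left unchanged.
inline void *memset_sketch(void *dst, int value, std::size_t n) {
#if SKETCH_HAS_REP_STOS
  const bool aligned = (reinterpret_cast<std::uintptr_t>(dst) % 16) == 0;
  if (aligned && cpu_has_erms()) {
    void *p = dst;          // RDI: destination, advanced by the instruction
    std::size_t count = n;  // RCX: byte count, decremented to zero
    asm volatile("rep stosb"
                 : "+D"(p), "+c"(count)
                 : "a"(value)  // AL holds the fill byte
                 : "memory");
    return dst;
  }
#endif
  return std::memset(dst, value, n);
}
```

A real implementation would cache the CPUID result rather than query it on every call, and a compiler makes the equivalent decision at compile time from the target's feature flags.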

Event Timeline

courbet created this revision. Apr 25 2017, 2:02 AM
courbet added subscribers: craig.topper, RKSimon.
zvi added inline comments. Apr 26 2017, 1:03 PM
lib/Target/X86/X86SelectionDAGInfo.cpp
Line 66:

Can this be done in a parent commit?

Line 201:

Can you please move the memcpy-related refactoring to a separate patch? It would be best if this patch was minimized to the memset improvement alone.

courbet updated this revision to Diff 96881. Apr 27 2017, 12:41 AM
courbet marked 2 inline comments as done.

Rebase.

reames added a subscriber: reames. Apr 27 2017, 11:22 AM

Looking at your plots and cross-checking the Intel docs, I don't see that this is obviously a good idea for unaligned copies. The aligned case seems clear, but the unaligned case does show drops in performance for smaller sizes.