This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Implement getOptimalMemOpType for memcpy/memset lowering
ClosedPublic

Authored by reames on Jul 25 2023, 9:58 AM.

Details

Summary

This patch implements the getOptimalMemOpType callback which is used by the generic mem* lowering in SelectionDAG to pick the widest type used. This patch only changes the behavior when vector instructions are available, as the default is reasonable for scalar.

Without this change, we were emitting either XLEN sized stores (for aligned operations) or byte sized stores (for unaligned operations.) Interestingly, the final codegen was nowhere near as bad as that would seem to imply. Generic load combining and store merging kicked in, and frequently (but not always) produced pretty reasonable vector code.

The primary effects of this change are:

  1. Enable the use of vector operations for memset of non-constant. Our generic store merging logic doesn't know how to merge a broadcast store, and thus we were seeing the generic (and awful) byte expansion lowering for unaligned memset.
  2. Enable the generic misaligned overlap trick where we write to some of the same bytes twice. The alternative is to either a) use an increasing small sequence of stores for the tail or b) use VL to restrict the vector store. The later is not implemented at this time, so the former is what previously happened. Interestingly, I'm not sure that changing VL (as opposed to the overlap trick) is even obviously profitable here.

One thing I intentionally left out of this was lowering for operations with size less than min-VLENB. I've got some thoughts there, but I'm not sure exactly where we're going to settle yet, and my first attempt seems to require some changes to generic code which seemed worth separating.

Diff Detail

Event Timeline

reames created this revision.Jul 25 2023, 9:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2023, 9:58 AM
reames requested review of this revision.Jul 25 2023, 9:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2023, 9:58 AM
reames planned changes to this revision.Jul 25 2023, 6:32 PM

After offline discussion with @craig.topper, we decided that the overlap at LMUL8 is a bit too aggressive here. Going to rework this patch series.

kito-cheng added inline comments.Jul 25 2023, 7:16 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
17129–17136

Just a minor optimization thought here, we might use i16 or i32 (on RV64) if alignment is OK?

reames updated this revision to Diff 544411.Jul 26 2023, 9:56 AM

Changes:

  • Restrict to LMUL1 - This causes a minor regression in bzero lowering due to a store merge limitation I will address separately.
  • Handle NoImplicitFloat like other targets.
  • Use the largest type allowed by alignment.
  • Use ELEN not XLEN for deciding preferred element size. Most of our rv32+vector configurations support e64 and we can exploit that.
craig.topper accepted this revision.Jul 31 2023, 1:46 PM

LGTM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
17145

Did you mean to repeat this comment block from earlier?

This revision is now accepted and ready to land.Jul 31 2023, 1:46 PM
This revision was landed with ongoing or failed builds.Aug 1 2023, 12:15 PM
This revision was automatically updated to reflect the committed changes.