This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Patchset for expanding memcpy/memset using at most 2 stores.
ClosedPublic

Authored by jonpa on Mar 20 2022, 12:02 PM.

Details

Summary

This is my proposal for a combined patch that expands memcpys/memsets of up to 32 bytes in length (but greater than 16 bytes), handles the conversion to VREP in combineSTORE(), and rejects big displacements for vector type in isLegalAddrssingMode() which helps the DAGCombiner to handle the multiple uses of the addresses used by the load/store sequences. In addition, the GEP offsets splitting is also enabled as it seems sensible to do this when the big displacements are rejected, even thought this is not really related to the memcpys/memsets. It would probably be possible to do only memop expansions and handlings, but for a performance measurement I think we could do the whole set and expect no regressions, and it would make sense generally...

This is based on https://reviews.llvm.org/D120277 "[SystemZ] Expand some memcpys/memsets into Load/Store sequences." and https://reviews.llvm.org/D120531 "[SystemZ] Use VREP for storing replicated regs/immediates." with a limit of 2 stores per memcpy/memset and handling of replication in combineSTORE().

Tests:

  • combineSTORE() general handling of replicated values: store-replicated-vals.ll
  • Memcpy expansions: memcpy-03.ll
  • Memset expansions: memset-08.ll
  • Splitting big GEP offsets in CGP: codegenprepare-gepoffs-split.ll
  • Rejecting big displacments for vector type in isLegalAddressingMode(): dag-combine-06.ll

combineSTORE():

  • isOnlyUsedByStores() extended to handle also a BuildVectorSDNode. This is needed to handle tests memset-08.ll/reg21()-reg24() optimally, but it is NFC on benchmarks. I guess these cases might as well be handled properly.
  • The handling in combineSTORE() is somewhat involved, as it handles not just one case but scalar and vector replication of both immediates and registers. It makes sense to me to have this as it eliminates scalar multiplications also in other cases than memcpy/memset expansions, but as mentioned before, it would be possible to change SelectionDAGBuilder to emit the splat directly instead, which would then handle just the memcpy/memset expansions in a simpler way.

Common code changes:

  • Check for MVT::Untyped in findOptimalMemOpLowering(), in which case the memop is not expanded to loads/stores. For SystemZ, this would be in cases where a single MVC of length 16 or less could be used.

Diff Detail

Event Timeline

jonpa created this revision.Mar 20 2022, 12:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2022, 12:02 PM
jonpa requested review of this revision.Mar 20 2022, 12:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2022, 12:02 PM

Looks good to me in principle, just a couple suggestions for refactoring inline. Before committing this, we should confirm the performance impact.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
201 ↗(On Diff #416795)

I'm wondering if it would be better to override findOptimalMemOpLowering and handle that check there.

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
829

Maybe it would be simpler to have the APFloat constructor only set isFP128 and then call into the APInt constructor with FPImm.bitcastToAPInt(). The we also wouldn't need a separate findSplat routine.

1010

So all the above checks would be in an overloaded findOptimalMemOpLowering, and the overloaded getOptimalMemOpType would consist solely of the line below.

jonpa marked 3 inline comments as done.May 1 2022, 5:08 AM
jonpa added inline comments.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
201 ↗(On Diff #416795)

Yes, probably better...

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
829

Yes, that is better

jonpa updated this revision to Diff 426293.May 1 2022, 5:12 AM
jonpa marked 2 inline comments as done.

Patch updated per review.

uweigand accepted this revision.May 2 2022, 2:57 AM

This version LGTM, assuming the performance results are good. Thanks!

This revision is now accepted and ready to land.May 2 2022, 2:57 AM
This revision was landed with ongoing or failed builds.May 13 2022, 6:31 AM
This revision was automatically updated to reflect the committed changes.