This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] Merge succeeding undefs while attempting a `memset`
Needs ReviewPublic

Authored by clubby789 on Dec 27 2022, 9:00 AM.

Details

Reviewers
nikic
Summary

MemCpyOpt current folds preceeding undef stores when creating MemSetRange, i.e. [undef, undef, undef, 0]. However, succeeding undefs end the range, meaning that cases like [0, undef, 0, undef, 0, undef, 0, undef, 0, undef] don't get folded. This patch addresses this case.

Diff Detail

Event Timeline

clubby789 created this revision.Dec 27 2022, 9:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 27 2022, 9:00 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
clubby789 requested review of this revision.Dec 27 2022, 9:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 27 2022, 9:00 AM
clubby789 updated this revision to Diff 485383.Dec 27 2022, 9:02 AM

Add unit test

clubby789 edited the summary of this revision. (Show Details)Dec 27 2022, 9:03 AM
nikic added a reviewer: nikic.EditedDec 27 2022, 11:59 AM
nikic added a subscriber: nikic.

I think there are two problems here:

  1. Profitability: As implemented, I believe this will also transform some non-profitable cases, e.g. imagine a store 0, followed by one hundred store undef, followed by store 0. We'd rather make those two stores than one long memset. This needs some kind of profitability heuristic.
  1. Phase ordering: InstCombine will remove store undef, and it runs before MemCpyOpt. So MemCpyOpt will actually never see this kind of IR (except in degenerate cases) and this change will not make any difference in end-to-end compilation. What one could do here is to allow "holes" in the memset range if the memory is known to be uninitialized beforehand, though that wouldn't apply in your example.
llvm/test/Transforms/MemCpyOpt/merge-undef-memset.ll
3

Please:

  • Use update_test_checks.py.
  • Fully convert the test to opaque pointers and remove the -opaque-pointers flag.
  • Drop the unnecessary datalayout (unless I'm missing something?)
  • Name instructions in test (run through opt -S -passes=instnamer).
clubby789 updated this revision to Diff 485420.Dec 27 2022, 2:07 PM
  • Update test format
  • Add heuristic for series of undef stores
clubby789 updated this revision to Diff 485421.Dec 27 2022, 2:08 PM

Update test format and add profitability heuristic

  1. Profitability: As implemented, I believe this will also transform some non-profitable cases, e.g. imagine a store 0, followed by one hundred store undef, followed by store 0. We'd rather make those two stores than one long memset. This needs some kind of profitability heuristic.

I've updated the isProfitable function to take into account the number of *non undef* stores rather than stores overall.

  1. Phase ordering: InstCombine will remove store undef, and it runs before MemCpyOpt. So MemCpyOpt will actually never see this kind of IR (except in degenerate cases) and this change will not make any difference in end-to-end compilation. What one could do here is to allow "holes" in the memset range if the memory is known to be uninitialized beforehand, though that wouldn't apply in your example.

The motivation for this is https://github.com/rust-lang/rust/issues/104290 - experimentally adding MemCpyOpt between an early SROA and InstCombine pass does indeed fix this case, but causes a regression in instcombine-sroa-inttoptr.ll. I'm also not sure if it's appropriate to run this pass so early on.

clubby789 updated this revision to Diff 485422.Dec 27 2022, 2:14 PM

clang-format

clubby789 updated this revision to Diff 485423.Dec 27 2022, 2:16 PM

clang-format

Sorry for the spam :/ I'm unfamiliar with Arcanist

clubby789 updated this revision to Diff 485424.Dec 27 2022, 2:20 PM
khei4 added a subscriber: khei4.May 20 2023, 8:57 PM