MemCpyOpt currently folds preceding undef stores when creating a MemSetRange, e.g. [undef, undef, undef, 0]. However, trailing undef stores end the range, so cases like [0, undef, 0, undef, 0, undef, 0, undef, 0, undef] do not get folded. This patch addresses that case.
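For illustration, here is a minimal IR sketch of that second shape (function and value names are made up, not taken from the patch's tests):

```llvm
define void @alternating(ptr %p) {
  ; [0, undef, 0, undef, 0] over five consecutive bytes
  store i8 0, ptr %p
  %p1 = getelementptr inbounds i8, ptr %p, i64 1
  store i8 undef, ptr %p1
  %p2 = getelementptr inbounds i8, ptr %p, i64 2
  store i8 0, ptr %p2
  %p3 = getelementptr inbounds i8, ptr %p, i64 3
  store i8 undef, ptr %p3
  %p4 = getelementptr inbounds i8, ptr %p, i64 4
  store i8 0, ptr %p4
  ret void
}
```

With trailing undef stores allowed inside the range, all five stores can fold into a single `call void @llvm.memset.p0.i64(ptr %p, i8 0, i64 5, i1 false)`; overwriting the undef bytes with 0 merely refines them, so the transform is sound.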
I think there are two problems here:
- Profitability: As implemented, I believe this will also transform some non-profitable cases. For example, imagine a store 0, followed by one hundred store undefs, followed by another store 0: we'd rather emit those two stores than one long memset (see the sketch after this list). This needs some kind of profitability heuristic.
- Phase ordering: InstCombine will remove store undef, and it runs before MemCpyOpt. So MemCpyOpt will actually never see this kind of IR (except in degenerate cases) and this change will not make any difference in end-to-end compilation. What one could do here is to allow "holes" in the memset range if the memory is known to be uninitialized beforehand, though that wouldn't apply in your example.
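To make the unprofitable shape from the first point concrete, a sketch (offsets and the store count are illustrative):

```llvm
define void @sparse(ptr %p) {
  store i8 0, ptr %p
  ; imagine ~100 consecutive undef stores at offsets 1..100 (only two shown)
  %p1 = getelementptr inbounds i8, ptr %p, i64 1
  store i8 undef, ptr %p1
  %p2 = getelementptr inbounds i8, ptr %p, i64 2
  store i8 undef, ptr %p2
  %p101 = getelementptr inbounds i8, ptr %p, i64 101
  store i8 0, ptr %p101
  ret void
}
```

Folding everything here would produce a 102-byte memset, while the program only needs two scalar i8 stores, which is why some profitability check is needed.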
I've updated the isProfitable function to take into account the number of *non-undef* stores rather than the total number of stores.
> Phase ordering: InstCombine will remove store undef, and it runs before MemCpyOpt. So MemCpyOpt will actually never see this kind of IR (except in degenerate cases) and this change will not make any difference in end-to-end compilation. What one could do here is to allow "holes" in the memset range if the memory is known to be uninitialized beforehand, though that wouldn't apply in your example.
The motivation for this is https://github.com/rust-lang/rust/issues/104290. Experimentally adding a MemCpyOpt run between an early SROA pass and InstCombine does fix that case, but it causes a regression in instcombine-sroa-inttoptr.ll. I'm also not sure whether it's appropriate to run this pass that early.
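For reference, the experiment described above corresponds roughly to a pipeline like the following (an approximation via opt; the actual insertion point in the default pipeline would differ):

```
opt -passes='sroa,memcpyopt,instcombine' -S input.ll
```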