This is an archive of the discontinued LLVM Phabricator instance.

[GVN] MemorySSA for GVN: use MemorySSA for redundant loads elimination
Needs ReviewPublic

Authored by chill on Jan 7 2022, 10:39 AM.

Diff Detail

Event Timeline

chill created this revision.Jan 7 2022, 10:39 AM
chill requested review of this revision.Jan 7 2022, 10:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 7 2022, 10:39 AM
chill planned changes to this revision.Jan 7 2022, 10:40 AM

Sorry for the spam, it's definitely not ready for review, it's entirely possibly I'll throw away the whole thing, but anyway posting here for the odd chance of an early feedback ...

chill updated this revision to Diff 401906.Jan 21 2022, 2:29 AM
chill edited the summary of this revision. (Show Details)

Passes llvm regression tests and llvm-test-suite on amd64 and aarch64, 1.05% slowdown on sqlite3 (with the child patch)

chill planned changes to this revision.Jan 21 2022, 2:30 AM
chill updated this revision to Diff 403278.Jan 26 2022, 8:12 AM
chill retitled this revision from [GVN] MemorySSA for GVN (WIP) to [GVN] MemorySSA for GVN: use MemorySSA for redundant loads elimination.
chill updated this revision to Diff 407207.Feb 9 2022, 10:37 AM
chill added inline comments.Feb 16 2022, 10:21 AM
llvm/lib/Transforms/Scalar/GVN.cpp
1020

Aha, segfault here, I guess.

chill updated this revision to Diff 409333.Feb 16 2022, 10:54 AM
chill added inline comments.Feb 21 2022, 8:40 AM
llvm/lib/Transforms/Scalar/GVN.cpp
2164

FIXME: This is way too conservative.

chill updated this revision to Diff 410335.Feb 21 2022, 10:42 AM
chill edited the summary of this revision. (Show Details)
chill updated this revision to Diff 410861.Feb 23 2022, 10:31 AM

Added limits to the amount of work we're willing to spend.

anna added a subscriber: anna.Feb 24 2022, 7:29 AM

Do we have any compile time measurements for this change with O3?

I have couple of high level comments/clarifications:

  • Are all mini-transforms within GVN moving to MSSA in this case?
  • Is the move to MemorySSA for performance of cases not caught through MDA?

@asbirlea, a question to clarify my understanding around enabling MSSA for a pass:

  • We optimistically build memorySSA for passes that unconditionally require it. This building and preservation can be costly from compile time perspective. Also, we may have possible invalidation by later function passes which do not preserve GVN, thereby having different implications for pipelines depending on where GVN is present. For example, if we end up with something like:
LPM {
 ...
 LICM <-- requires MSSA.

}
instcombine <-- invalidates MSSA
GVNPass() <-- build MSSA again
simplifyCFG <-- invalidates MSSA

is arguably much worse in compile time than:

LPM {
 ...
 LICM <-- requires MSSA.

}
GVNPass() <-- use preserved MSSA
instcombine <-- invalidates MSSA
simplifyCFG

Do we have any compile time measurements for this change with O3?

@chill mentioned he did some early testing on that, so he might have some numbers.
From my side, it's too early. I'm seeing a correctness issue in the early testing that needs resolving. I'm looking into getting @chill something actionable there.

I have couple of high level comments/clarifications:

  • Are all mini-transforms within GVN moving to MSSA in this case?

Is they needed MemDep before, yes.

  • Is the move to MemorySSA for performance of cases not caught through MDA?

It should address at least one or both. Primary goal is remove last user of MDA and remove the analysis from trunk. The implementation for the transition should be such that it either catches more cases or provides improved performance (or both). From previous experience with switching DSE, this is possible if implemented right.

@asbirlea, a question to clarify my understanding around enabling MSSA for a pass:

  • We optimistically build memorySSA for passes that unconditionally require it. This building and preservation can be costly from compile time perspective. Also, we may have possible invalidation by later function passes which do not preserve GVN, thereby having different implications for pipelines depending on where GVN is present. For example, if we end up with something like:
LPM {
 ...
 LICM <-- requires MSSA.

}
instcombine <-- invalidates MSSA
GVNPass() <-- build MSSA again
simplifyCFG <-- invalidates MSSA

is arguably much worse in compile time than:

LPM {
 ...
 LICM <-- requires MSSA.

}
GVNPass() <-- use preserved MSSA
instcombine <-- invalidates MSSA
simplifyCFG

Building and preserving MSSA should be cheaper than building MDA, hence the goal to replace rather than add. The original discussion that started this work was around adding GVN Hoist with MSSA (https://groups.google.com/g/llvm-dev/c/NmWIQscrDf8). IMO, it's not an option to keep both analyses due to your point precisely, expected increases in compile time. However replacing MDA should not have this issue.

mkazantsev resigned from this revision.Mar 4 2022, 12:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2022, 12:14 AM
chill updated this revision to Diff 417612.Mar 23 2022, 7:46 AM
chill updated this revision to Diff 428112.May 9 2022, 9:43 AM
chill updated this revision to Diff 440218.Jun 27 2022, 7:53 AM

Revert to a previous version, which scanned all the memory
accesses in a block. Add a testcase showing why it's necessary.

I've looked at Eigen benchmarks (matrix multiplications and additions), there were some differences
(relative to GVN using MemDep), but so far I've only seen GVN/MemorySSA removing more loads (e.g. MemDep hits
the block scan limit of 100 instructions, whereas GVN/MemorySSA is within the limit of 100 memory accesses). Increasing the limit
(to 300 for example), eliminated the differences at LLVM IR level, but looks like there's still some at asm level.

There some improvements, some regressions with Eigen, but nothing anywhere near 50-80%.