Details
- Reviewers
asbirlea, reames, fhahn, mkazantsev
Sorry for the spam; it's definitely not ready for review, and it's entirely possible I'll throw away the whole thing, but I'm posting it here on the off chance of some early feedback ...
Passes the LLVM regression tests and the llvm-test-suite on amd64 and aarch64; 1.05% slowdown on sqlite3 (with the child patch).
llvm/lib/Transforms/Scalar/GVN.cpp, line 1020: Aha, segfault here, I guess.
llvm/lib/Transforms/Scalar/GVN.cpp, line 2164: FIXME: This is way too conservative.
Do we have any compile-time measurements for this change with -O3?
I have a couple of high-level comments/clarifications:
- Are all mini-transforms within GVN moving to MSSA in this case?
- Is the move to MemorySSA for performance of cases not caught through MDA?
@asbirlea, a question to clarify my understanding around enabling MSSA for a pass:
- We optimistically build MemorySSA for passes that unconditionally require it. This building and preservation can be costly from a compile-time perspective. Also, MemorySSA may be invalidated by later function passes that do not preserve it, so the compile-time implications differ between pipelines depending on where GVN is placed. For example, if we end up with something like:
  LPM { ... LICM <-- requires MSSA. }
  instcombine <-- invalidates MSSA
  GVNPass() <-- builds MSSA again
  simplifyCFG <-- invalidates MSSA

which is arguably much worse in compile time than:

  LPM { ... LICM <-- requires MSSA. }
  GVNPass() <-- uses preserved MSSA
  instcombine <-- invalidates MSSA
  simplifyCFG
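For concreteness, here is a minimal, hypothetical sketch (not part of this patch; the pass name is invented) of what "builds MSSA again" means on the new-pass-manager side: the pass asks the analysis manager for MemorySSA, and whether that is a cache hit or a fresh construction depends entirely on whether the passes that ran in between preserved the analysis.

```cpp
#include "llvm/Analysis/MemorySSA.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

// Hypothetical MemorySSA consumer, for illustration only.
struct ExampleMSSAConsumer : PassInfoMixin<ExampleMSSAConsumer> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
    // Cache hit if an earlier pass computed MemorySSA and every pass since
    // then preserved it; otherwise MemorySSA is rebuilt from scratch here.
    MemorySSA &MSSA = AM.getResult<MemorySSAAnalysis>(F).getMSSA();
    MSSA.verifyMemorySSA(); // stand-in for the actual work
    return PreservedAnalyses::all();
  }
};
```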
@chill mentioned he did some early testing on that, so he might have some numbers.
From my side, it's too early. I'm seeing a correctness issue in the early testing that needs resolving. I'm looking into getting @chill something actionable there.
> I have a couple of high-level comments/clarifications:
>
> - Are all mini-transforms within GVN moving to MSSA in this case?

If they needed MemDep before, yes.

> - Is the move to MemorySSA for performance of cases not caught through MDA?

It should address at least one of those, and ideally both. The primary goal is to remove the last user of MDA and then remove that analysis from trunk. The implementation of the transition should be such that it either catches more cases or provides improved performance (or both). From previous experience with switching DSE, this is possible if implemented right.
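To illustrate the general shape of such a conversion (a hedged sketch, not the code in this patch; the helper function is hypothetical): a local MemDep query for a load scans instructions backwards in the block under an instruction-count limit, while MemorySSA asks its walker for the clobbering memory access.

```cpp
#include "llvm/Analysis/MemoryDependenceAnalysis.h"
#include "llvm/Analysis/MemorySSA.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Hypothetical helper contrasting the two query styles; illustration only.
static void compareQueries(LoadInst &Load, MemoryDependenceResults &MD,
                           MemorySSA &MSSA) {
  // MemDep: walks instructions backwards from the load, bounded by a limit
  // counted in instructions.
  MemDepResult Dep = MD.getDependency(&Load);
  (void)Dep;

  // MemorySSA: the walker steps between memory accesses only and returns the
  // access (a MemoryDef, or liveOnEntry) that may clobber the load.
  MemoryAccess *Clobber = MSSA.getWalker()->getClobberingMemoryAccess(&Load);
  if (auto *Def = dyn_cast<MemoryDef>(Clobber)) {
    // Def->getMemoryInst() is the store/call that may clobber the load.
    (void)Def;
  }
}
```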
> @asbirlea, a question to clarify my understanding around enabling MSSA for a pass:
>
> - We optimistically build MemorySSA for passes that unconditionally require it. This building and preservation can be costly from a compile-time perspective. Also, MemorySSA may be invalidated by later function passes that do not preserve it, so the compile-time implications differ between pipelines depending on where GVN is placed. For example, if we end up with something like:
>
>   LPM { ... LICM <-- requires MSSA. }
>   instcombine <-- invalidates MSSA
>   GVNPass() <-- builds MSSA again
>   simplifyCFG <-- invalidates MSSA
>
> which is arguably much worse in compile time than:
>
>   LPM { ... LICM <-- requires MSSA. }
>   GVNPass() <-- uses preserved MSSA
>   instcombine <-- invalidates MSSA
>   simplifyCFG
Building and preserving MemorySSA should be cheaper than building MDA, hence the goal to replace rather than add. The original discussion that started this work was around adding GVNHoist with MSSA (https://groups.google.com/g/llvm-dev/c/NmWIQscrDf8). IMO, keeping both analyses is not an option, precisely because of your point: the expected increase in compile time. Replacing MDA, however, should not have this issue.
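The preservation side looks roughly like the following hedged sketch (again hypothetical, not this patch): a transform that keeps MemorySSA up to date marks it preserved, so the next MSSA user in the pipeline gets a cache hit rather than a rebuild, which is what makes the second pipeline ordering above cheap.

```cpp
#include "llvm/Analysis/MemorySSA.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

// Hypothetical transform that maintains MemorySSA; illustration only.
struct ExampleMSSATransform : PassInfoMixin<ExampleMSSATransform> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
    MemorySSA &MSSA = AM.getResult<MemorySSAAnalysis>(F).getMSSA();
    bool Changed = false;
    // ... do the rewrites, updating MSSA (e.g. through MemorySSAUpdater)
    //     as instructions are inserted and erased ...
    (void)MSSA;
    if (!Changed)
      return PreservedAnalyses::all();
    PreservedAnalyses PA;
    PA.preserveSet<CFGAnalyses>();
    PA.preserve<MemorySSAAnalysis>(); // the next MSSA user reuses it
    return PA;
  }
};
```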
Revert to a previous version, which scanned all the memory
accesses in a block. Add a testcase showing why it's necessary.
I've looked at Eigen benchmarks (matrix multiplications and additions). There were some differences relative to GVN using MemDep, but so far I've only seen GVN/MemorySSA removing more loads (e.g. MemDep hits its block scan limit of 100 instructions, whereas GVN/MemorySSA stays within its limit of 100 memory accesses). Increasing the limit (to 300, for example) eliminated the differences at the LLVM IR level, but it looks like there are still some at the asm level. There are some improvements and some regressions with Eigen, but nothing anywhere near 50-80%.