It looks MachineLICM pass handles only outmost loops even though there are loop invariant codes in inner loops.
As an example, I have pre-committed a test llvm/test/CodeGen/AArch64/machine-licm-sub-loop.ll.
In the test, after isel, there is DUPv8i16gpr in vectorized loop and it is loop invariant. However, MachineLICM does not hoist it to the preheader of vectorized loop because it does not consider inner loops.
I think MachineLICM pass could handle the inner loops.
Details
Diff Detail
Event Timeline
llvm/lib/CodeGen/MachineLICM.cpp | ||
---|---|---|
788 | I wanted to check that this approach is acceptable first. |
Fixed a bug.
With this patch, we can hoist MIs to inner loop's preheader and the preheader can not dominate the CSE candidate MI's block. In this case, avoid CSE.
@nikic I have updated this patch following your comment.
If you need something more, please let me know.
The perf of this version looks OK. The previous one seemed to be more aggressive, causing more changes both positive and negative, but from the perf I ran this looks OK.
llvm/lib/CodeGen/MachineLICM.cpp | ||
---|---|---|
783–784 | Do you know what this refers to? I'm not sure I understand what it means. It might be worth just removing it. | |
784 | outmost -> outermost | |
llvm/test/CodeGen/AMDGPU/optimize-negated-cond.ll | ||
0–1 | This doesn't really look autogenerated to me. | |
41 | Should all these lines be removed, or should they be updated for the new codegen? | |
79 | This shouldn't be needed. |
llvm/lib/CodeGen/MachineLICM.cpp | ||
---|---|---|
800 | Uff, this looks like a pretty big hack. It is viable to pass the loop as a parameter instead of temporarily changing "global" state? | |
1345 | This seems to work around a larger issue. The problem is that CSEMap will get initialized for whichever preheader we happen to hoist into first. Thanks to this check, we will at least not make invalid replacements, but it means we will miss CSE opportunities if we are hoisting into any other preheader. Probably the CSE map should be by preheader, instead of having only the one. |
llvm/lib/CodeGen/MachineLICM.cpp | ||
---|---|---|
783–784 | Let me check it. | |
784 | Yep, let me update them. | |
llvm/test/CodeGen/AMDGPU/optimize-negated-cond.ll | ||
0–1 | Sorry... it looks it did not use the script first time... | |
41 | It looks these test lines are correct. | |
79 | Yep, let me remove it. |
Following @nikic's comment, updated code.
- Pass CurLoop and CurPreheader as function parameters
- Keep CSEMap per preheader
llvm/lib/CodeGen/MachineLICM.cpp | ||
---|---|---|
134–135 | Will ExitBlocks be incorrect now? |
llvm/lib/CodeGen/MachineLICM.cpp | ||
---|---|---|
134–135 | Ah, that is good point! |
llvm/lib/CodeGen/MachineLICM.cpp | ||
---|---|---|
134–135 | Could this use CurLoop->isLoopExiting(ExitBlocks) instead? It might be quicker for larger loops. |
llvm/lib/CodeGen/MachineLICM.cpp | ||
---|---|---|
134–135 | This function checks exit blocks which are outside loop and have predecessor inside loop. |
llvm/lib/CodeGen/MachineLICM.cpp | ||
---|---|---|
134–135 | Oh I see. A different type of Exit Block. |
llvm/lib/CodeGen/MachineLICM.cpp | ||
---|---|---|
134–135 | CurLoop->getExitBlocks collects blocks which are outside CurLoop and has predecessor in CurLoop. |
Thanks. LGTM
llvm/lib/CodeGen/MachineLICM.cpp | ||
---|---|---|
134–135 | Yeah - That was what I was aiming to capture. The ExitBlocks is outside the loop, but one of it's predecessors is inside. |
Thanks for review.
Let me push this patch.
If @nikic or other people want to change this patch more, please let me know.
llvm/lib/CodeGen/MachineLICM.cpp | ||
---|---|---|
783–784 | Ah, I am so sorry... Let me check it again. |
Checked MI erased ahead of update register pressure because CSE can be removed after hoisting it.
I think it would still be invalid to access MI.getParent if MI has been erased. It would be a use after delete. From looking at the old logic it appeared to run UpdateRegPressure only if Hoist was false. Can we do the same thing here?
You are right. I was confused with remove which sets its parent nullptr... I am not sure how we can check the erased MI...
From looking at the old logic it appeared to run UpdateRegPressure only if Hoist was false. Can we do the same thing here?
I am not sure why the original code did not update the register pressure with the hoisted loop invariant code... I think this pass needs to update the register pressure with the loop invariant code, which is hoisted to preheader, because preheader dominates the blocks in the loop and the invariant code makes a live out of preheader. I could miss something...
Leaving a comment here as well. The commit caused an ASAN issue downstream. Cherry picking the revert fixed the asan issue., https://github.com/llvm/llvm-project/commit/5ec9699c4d1f165364586d825baef434e2c110b4#commitcomment-127784939 for more details. Please account for this during the resubmit.
Thanks for the asan output.
I have updated the patch in this review to fix the asan error.
If possible, can you check the updated patch fixes the asan error in your side please?
I am also checking it and I have not seen asan error yet.
llvm/lib/CodeGen/MachineLICM.cpp | ||
---|---|---|
134–135 | Any reason not to name this enum and then use it instead of unsigned? |
llvm/lib/CodeGen/MachineLICM.cpp | ||
---|---|---|
134–135 | Ah sorry. |
Sorry... with latest update, I can see asan errors from sanitizer bot locally.
Let me fix them again.
Fixed a bug.
- ExtractHoistableLoad can erase MI. It creates new MI for unfolding load and assigns it to MI but the MI is not updated with new MI. In this case, the MI is not valid.
- ExtractHoistableLoad updates the register pressure for the new MI so we do not need to update the register pressure for it outside the function.
I have run sanitizer bot locally and there is no failed tests from sanitizer bot with this patch.
If there is no objection, let me push this updated patch again.