Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

[MachineLICM] Handle subloops
ClosedPublic

Authored by jaykang10 on Jun 30 2023, 5:09 AM.

Details

Summary

It looks MachineLICM pass handles only outmost loops even though there are loop invariant codes in inner loops.
As an example, I have pre-committed a test llvm/test/CodeGen/AArch64/machine-licm-sub-loop.ll.
In the test, after isel, there is DUPv8i16gpr in vectorized loop and it is loop invariant. However, MachineLICM does not hoist it to the preheader of vectorized loop because it does not consider inner loops.
I think MachineLICM pass could handle the inner loops.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jaykang10 added inline comments.Jul 20 2023, 9:22 AM
llvm/lib/CodeGen/MachineLICM.cpp
788

I wanted to check that this approach is acceptable first.
We can go up the loop chain. Let me update the code.

jaykang10 updated this revision to Diff 542862.EditedJul 21 2023, 4:35 AM

Fixed a bug.
With this patch, we can hoist MIs to inner loop's preheader and the preheader can not dominate the CSE candidate MI's block. In this case, avoid CSE.

jaykang10 updated this revision to Diff 542892.Jul 21 2023, 6:51 AM

Following @nikic's comment, visit loop chain from outer one to inner one.

jaykang10 updated this revision to Diff 543575.Jul 24 2023, 9:01 AM

Removed LoopIsOuterMostWithPredecessor.

jaykang10 updated this revision to Diff 543576.Jul 24 2023, 9:03 AM

@nikic I have updated this patch following your comment.
If you need something more, please let me know.

@nikic Can I push this change please?

@nikic or anyone can review this updated patch please?

The perf of this version looks OK. The previous one seemed to be more aggressive, causing more changes both positive and negative, but from the perf I ran this looks OK.

llvm/lib/CodeGen/MachineLICM.cpp
783–784

Do you know what this refers to? I'm not sure I understand what it means. It might be worth just removing it.

784

outmost -> outermost
is in subloop -> is in a subloop

llvm/test/CodeGen/AMDGPU/optimize-negated-cond.ll
0–1

This doesn't really look autogenerated to me.

41

Should all these lines be removed, or should they be updated for the new codegen?

79

This shouldn't be needed.

nikic added inline comments.Aug 1 2023, 6:14 AM
llvm/lib/CodeGen/MachineLICM.cpp
800

Uff, this looks like a pretty big hack. It is viable to pass the loop as a parameter instead of temporarily changing "global" state?

1345

This seems to work around a larger issue. The problem is that CSEMap will get initialized for whichever preheader we happen to hoist into first. Thanks to this check, we will at least not make invalid replacements, but it means we will miss CSE opportunities if we are hoisting into any other preheader.

Probably the CSE map should be by preheader, instead of having only the one.

jaykang10 added inline comments.Aug 1 2023, 7:16 AM
llvm/lib/CodeGen/MachineLICM.cpp
783–784

Let me check it.

784

Yep, let me update them.

llvm/test/CodeGen/AMDGPU/optimize-negated-cond.ll
0–1

Sorry... it looks it did not use the script first time...
For negated_cond function, SIOptimizeExecMaskingPreRA pass fails to fold mask operations V_CNDMASK_B32_e64 and V_CMP_NE_U32 because they are hoisted.
Let me update it manually.

41

It looks these test lines are correct.
Let me keep these lines in this patch.

79

Yep, let me remove it.

jaykang10 added inline comments.Aug 1 2023, 7:30 AM
llvm/lib/CodeGen/MachineLICM.cpp
800

um... if possible, I did not want to change a lot in this pass... but I agree it is big hack.
Let me try to pass the global ones as parameters.

1345

It is good point.
Let me try to keep multiple CSE maps for multiple preheaders.

jaykang10 updated this revision to Diff 548173.Aug 8 2023, 5:45 AM

Following @nikic's comment, updated code.

  • Pass CurLoop and CurPreheader as function parameters
  • Keep CSEMap per preheader

@nikic If possible, can you check the update please?

dmgreen added inline comments.Thu, Sep 14, 2:50 AM
llvm/lib/CodeGen/MachineLICM.cpp
134–135

Will ExitBlocks be incorrect now?

jaykang10 added inline comments.Thu, Sep 14, 4:00 AM
llvm/lib/CodeGen/MachineLICM.cpp
134–135

Ah, that is good point!
They are out-most loop's ExitBlocks.
Let me fix it.
Thanks for checking it.

jaykang10 updated this revision to Diff 556767.Thu, Sep 14, 4:05 AM

Following @dmgreen's comment, checked ExitBlocks per each loop.

dmgreen added inline comments.Thu, Sep 14, 5:59 AM
llvm/lib/CodeGen/MachineLICM.cpp
134–135

Could this use CurLoop->isLoopExiting(ExitBlocks) instead? It might be quicker for larger loops.

jaykang10 added inline comments.Thu, Sep 14, 6:56 AM
llvm/lib/CodeGen/MachineLICM.cpp
134–135

This function checks exit blocks which are outside loop and have predecessor inside loop.
isLoopExiting checks exiting blocks which are inside loop and have successor outside loop.
I think we need exit blocks here.
Let me try to keep the exit blocks for each loop in order to avoid re-calculation.

dmgreen added inline comments.Thu, Sep 14, 7:10 AM
llvm/lib/CodeGen/MachineLICM.cpp
134–135

Oh I see. A different type of Exit Block.
Could it check !CurLoop->contains(ExitBlocks) && any_of(ExitBlocks->predecessors, is in CurLoop)?

jaykang10 added inline comments.Thu, Sep 14, 7:41 AM
llvm/lib/CodeGen/MachineLICM.cpp
134–135

CurLoop->getExitBlocks collects blocks which are outside CurLoop and has predecessor in CurLoop.
This function checks the MBB parameter is in the blocks.

jaykang10 updated this revision to Diff 556789.Thu, Sep 14, 8:51 AM

Created a map to keep exit blocks for each loop.

dmgreen reopened this revision.Thu, Sep 14, 9:00 AM

Thanks. LGTM

llvm/lib/CodeGen/MachineLICM.cpp
134–135

Yeah - That was what I was aiming to capture. The ExitBlocks is outside the loop, but one of it's predecessors is inside.

This revision is now accepted and ready to land.Thu, Sep 14, 9:00 AM
dmgreen accepted this revision.Thu, Sep 14, 9:00 AM

Thanks. LGTM

Thanks for review.
Let me push this patch.
If @nikic or other people want to change this patch more, please let me know.

This revision was automatically updated to reflect the committed changes.
bkramer added inline comments.
llvm/lib/CodeGen/MachineLICM.cpp
783–784

When MI is hoisted the pointer is no longer valid. I'm seeing use after frees with asan after this change, so reverted in 3454cf6

jaykang10 added inline comments.Fri, Sep 15, 5:59 AM
llvm/lib/CodeGen/MachineLICM.cpp
783–784

Ah, I am so sorry... Let me check it again.
Thanks for reverting the commit.

jaykang10 added inline comments.Fri, Sep 15, 9:15 AM
llvm/lib/CodeGen/MachineLICM.cpp
783–784

It seems I need to check the MI is erased because CSE can be erased.
@bkramer If possible, can you let me know how I can reproduce the case you saw with asan please?

jaykang10 updated this revision to Diff 556931.Mon, Sep 18, 1:16 AM

Checked MI erased ahead of update register pressure because CSE can be removed after hoisting it.

I think it would still be invalid to access MI.getParent if MI has been erased. It would be a use after delete. From looking at the old logic it appeared to run UpdateRegPressure only if Hoist was false. Can we do the same thing here?

I think it would still be invalid to access MI.getParent if MI has been erased. It would be a use after delete.

You are right. I was confused with remove which sets its parent nullptr... I am not sure how we can check the erased MI...

From looking at the old logic it appeared to run UpdateRegPressure only if Hoist was false. Can we do the same thing here?

I am not sure why the original code did not update the register pressure with the hoisted loop invariant code... I think this pass needs to update the register pressure with the loop invariant code, which is hoisted to preheader, because preheader dominates the blocks in the loop and the invariant code makes a live out of preheader. I could miss something...

nikic added a comment.Mon, Sep 18, 2:44 AM

Maybe change Hoist() to return an enum that represents not hoisted / hoisted / CSEd?

Maybe change Hoist() to return an enum that represents not hoisted / hoisted / CSEd?

Thanks for good suggestion.
Let me try it.

jaykang10 updated this revision to Diff 556948.Mon, Sep 18, 7:10 AM

Following @nikic's comment, updated return type of Hoist function.

Leaving a comment here as well. The commit caused an ASAN issue downstream. Cherry picking the revert fixed the asan issue., https://github.com/llvm/llvm-project/commit/5ec9699c4d1f165364586d825baef434e2c110b4#commitcomment-127784939 for more details. Please account for this during the resubmit.

Leaving a comment here as well. The commit caused an ASAN issue downstream. Cherry picking the revert fixed the asan issue., https://github.com/llvm/llvm-project/commit/5ec9699c4d1f165364586d825baef434e2c110b4#commitcomment-127784939 for more details. Please account for this during the resubmit.

Thanks for the asan output.
I have updated the patch in this review to fix the asan error.
If possible, can you check the updated patch fixes the asan error in your side please?
I am also checking it and I have not seen asan error yet.

nikic added inline comments.Tue, Sep 19, 11:59 AM
llvm/lib/CodeGen/MachineLICM.cpp
134–135

Any reason not to name this enum and then use it instead of unsigned?

jaykang10 added inline comments.Tue, Sep 19, 12:25 PM
llvm/lib/CodeGen/MachineLICM.cpp
134–135

Ah sorry.
Let me update the code with named enum tomorrow.

Following @nikic's comment, used named enum.

Sorry... with latest update, I can see asan errors from sanitizer bot locally.
Let me fix them again.

jaykang10 updated this revision to Diff 557289.EditedMon, Sep 25, 12:40 AM

Fixed a bug.

  • ExtractHoistableLoad can erase MI. It creates new MI for unfolding load and assigns it to MI but the MI is not updated with new MI. In this case, the MI is not valid.
  • ExtractHoistableLoad updates the register pressure for the new MI so we do not need to update the register pressure for it outside the function.

I have run sanitizer bot locally and there is no failed tests from sanitizer bot with this patch.
If there is no objection, let me push this updated patch again.

Thanks. I think the changes still look OK.