This patch adds a new pass called LNICM which is a LoopNest version of LICM and a test case to show how LNICM works.
Basically, LNICM only hoists invariants out of loop nest (not a loop) to keep/make perfect loop nest. This enables later optimizations that require perfect loop nest.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Following patches will utilize the LoopNest structure for more efficient optimization.
Could you elaborate on what kind of improvements you plan on adding using LoopNest? I think it would also be good to share a patch that makes implements such an additional optimization so there's a clear path towards concrete improvements and it would also show why using LoopNest is needed/beneficial.
As the patch is written I am concerned that this appears to mostly duplicate the existing code of LICM (which adds a maintenance burden) and just changes the type of the pass.
As the patch is written I am concerned that this appears to mostly duplicate the existing code of LICM (which adds a maintenance burden) and just changes the type of the pass.
I'll update the code to contain less duplication. I'm sorry to trouble you.
Could you elaborate on what kind of improvements you plan on adding using LoopNest?
With LoopNest pass, it'll be easy to efficiently hoist invariants in nested loop out of the nested loop at once.
for (i = 0 to 10) for (k = 0 to 10) j = invariant * 2;
will transform to
x = invariant * 2; for (i = 0 to 10) for (k = 0 to 10) j = x;
I will second @fhahn's comments. Please include follow up patches with the improvements you mention, and tests that motivate those.
Actually, LNICM is to enable other optimizations in loop pipeline that require perfect loop nest by hoisting innermost invariants out of loop nest at once.
I said that LNICM can efficiently perform LICM, but after tweaking the code and running some tests, I realized that it might not get faster.
However, even if it gets no faster, LNICM is worth adding as a new pass.
If we get the code duplication and no faster code, why is it worth adding? Could you please clarify?
LNICM itself is no faster, but we can expect other optimizations to run thanks to LNICM and may be able to get more optimized code.
Got it! Can you get some tests where some of these differences appear?
It would be a good motivator to see a concrete example; here are a couple of scenarios I can think of, but perhaps your use case is different.
- LNICM can hoist additionally where LICM doesn't. If that can happen, we should show case with a test.
- another loop nest pass, in the same LPM with LNICM, and how optimizations would differ if LICM was separated.
I'm sorry to be late to reply.
I'm making small tests and some code changes to demonstrate the following:
- LNICM only hoists invariants out of the outermost loop, to keep/make perfect loop nest
- x[k][j] below could be hoisted out of i-loop (by LICM), but LNICM wouldn't do that because it would break the perfect nest loop.
void test(int arr[10][10][10], int x[10][10]) { for (int k = 0; k < 10; k++) for (int j = 0; j < 10; j++) for (int i = 0; i < 10; i++) arr[i][k][j] += x[k][j]; }
- Since LNICM keeps/makes perfect loop nest, passes like loop-interchange (that is loop nest pass) can perform optimizations
Add a test case to demonstrate that LNICM hoists invariants only out of loop nest and keeps perfect loop nest.
I'm personally is still having a hard time understanding the final picture. (same for loop-idiom pass patch)
Is LoopNest Invariant Code Motion Pass only about not doing the movement that breaks perfect loop nest?
What about all the code that is no longer moved?
What is the envisioned final pipeline structure?
Do we end up having to run both LNICM and LICM?
These two patches are proposing pretty significant changes, but are very light on the motivation/details.
Is LoopNest Invariant Code Motion Pass only about not doing the movement that breaks perfect loop nest?
Yes.
What about all the code that is no longer moved?
What is the envisioned final pipeline structure?
Do we end up having to run both LNICM and LICM?
Yes, we need to run both LNICM and LICM. Sometimes we should run either one.
Update a test case to show loop-interchange doesn't run after running LICM but LNICM.
I don't introduce a lot of changes for LNICM, and reuse as much code in LICM as possible.
Some transformations really benefit from perfect loop nest. One of the examples is loop-interchange as shown in a lit test.
LICM cannot maintain perfect loop nest, so we need to add LNICM.
LGTM, the change to LICM is pretty small, and there is a motivating reason. Will approve tomorrow if there is no other concerns.
I have not been convinced so far.
This patch's description, and replies to the review comments are being prescriptive,
as-if everyone should be aware of some greater good before even commenting on it.
I am sorry to make you have any unintentional negative feelings. English is not my first language.
I appreciate all review comments I have gotten. I will try to have the motivation more up front next time.
I am in a GSoC project, where we try to utilize the LoopNest pass. When we investigate what would be benefited from LoopNest, we found that LICM could be a good candidate.
Unimodular transformations can be hugely simplified by only operating on perfect loop nest.
Loop interchange is one of the unimodular transformations which is implemented in LLVM, and it is implemented as no intervening code is allowed between loops in a loop nest.
I have included a lit test to illustrate that the current LICM could prevent code to be interchanged, where the proposed LNICM could prevent this problem.
The idea of LICM could still benefit for loop interchange, where a loop nest is imperfect to start with, but could be hoisted to make perfect. I have just modified the lit test to show this.
We have considered not to add a new pass, but add a new parameter to the constructor of the existing LICM, to make it operate like the LNICM when the given argument is true.
However, we thought adding a new pass to hoist invariance in the scope of loop nest is clearer.
What do you think @lebedev.ri ?
I think the main thing that I don't get here is how this fits into our optimization pipeline, and whether there is some viable path to default enablement here. In abstract, I can see that the LNICM + LoopInterchange combination makes sense. But the way our pipeline is currently structured, LICM and LoopInterchange are run as part of two separate LPMs. This means that by the time LoopInterchange runs, LICM will have been run on all loops already, and hoisted operations as far as possible, including out of loop nests.
This would only make sense to me if we wanted to add a LNICM run into LPM2, but why would we want to do that? Also, LoopInterchange doesn't preserve MSSA, so it can't be part of the same LPM as LICM/LNICM anyway. Or technically it can, because we still support AST-based LICM, but that's only because we haven't gotten around to removing it yet. It's not an option that exists longer term.
We're thinking of replacing LICM in LPM1 with LNICM and adding LPM3 containing LICM after LPM2.
In long term, LPM3 should be removed though.
When you say "In long term, LPM3 should be removed though.", do you mean when LoopInterchange can preserve MSSA, then it could be move to LPM1, so LICM can be move to LPM2?
I think LICM is always needed in long term, correct?
This change LGTM.
- changes to the existing LICM pass is minimal
- there is an motivating example
More work need to be done before modifying the pipeline.
@lebedev.ri, @nikic, have @uint256_t successfully clarified your concerns?
I will attempt to approve it in two days if there are no future comments.
I'm okay with landing this, as the change to LICM is trivial and won't be an undue maintenance burden.
I remain somewhat skeptical about the proposed pipeline changes though. I believe it is pretty important that LICM runs early in the loop pipeline, e.g. in D99249 we've seen that we sometimes need it already before LoopRotate (but also after LoopRotate). Replacing early LICM passes with LNICM means we don't perform (early) LICM anymore in some cases, which will presumably affect other passes in the pipeline adversely. At the same time, your proposal to add an additional LICM pass after LPM2 will have a significant negative compile-time impact because it requires recomputing MemorySSA. Though it's worth mentioning that the simplification pipeline already contains another late LICM pass, so possibly adding another one wouldn't actually be necessary.
But anyway, we can land this and then evaluate any pipeline changes separately.