This is an archive of the discontinued LLVM Phabricator instance.

Differential D102234

[SimpleLoopBoundSplit] Split Bound of Loop which has conditional branch with IV
ClosedPublic

Authored by jaykang10 on May 11 2021, 5:46 AM.

Details

Reviewers

reames
mkazantsev
sanwou01

Commits

rGa2a0ac42abcb: [SimpleLoopBoundSplit] Split Bound of Loop which has conditional branch with IV

Summary

This pass transforms loops that contain a conditional branch on the induction variable. For example, it transforms the code on the left into the code on the right:

                            newbound = min(n, c)
while (iv < n) {            while(iv < newbound) {
  A                           A
  if (iv < c)                 B
    B                         C
  C                         }
}                           if (iv != n) {
                              while (iv < n) {
                                A
                                C
                              }
                            }
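
As a minimal sketch of the transformation in the summary, the plain C++ below models both forms and records which of A, B and C run. The function names `runOriginal` and `runSplit`, the start value `iv = 0` and the unit step are assumptions made for illustration; the pass itself works on LLVM IR, not on source code.

```cpp
#include <algorithm>
#include <string>

// Original form: the branch on the induction variable is tested on every
// iteration. (Hypothetical model: iv starts at 0 and steps by 1.)
std::string runOriginal(int n, int c) {
  std::string trace;
  int iv = 0;
  while (iv < n) {
    trace += 'A';
    if (iv < c)
      trace += 'B';
    trace += 'C';
    ++iv;
  }
  return trace;
}

// Split form: the first loop runs up to min(n, c) with the branch known to be
// taken; the guarded second loop runs the rest with the branch known not taken.
std::string runSplit(int n, int c) {
  std::string trace;
  int iv = 0;
  const int newbound = std::min(n, c);
  while (iv < newbound) {
    trace += 'A';
    trace += 'B';
    trace += 'C';
    ++iv;
  }
  if (iv != n) {
    while (iv < n) {
      trace += 'A';
      trace += 'C';
      ++iv;
    }
  }
  return trace;
}
```

Under these assumptions both functions should produce the same trace for any `n` and `c`, which is the property the split has to preserve.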

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jaykang10 created this revision. May 11 2021, 5:46 AM

Herald added subscribers: hiraditya, mgorny. May 11 2021, 5:46 AM

jaykang10 requested review of this revision. May 11 2021, 5:46 AM

Herald added a project: Restricted Project. May 11 2021, 5:46 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B103721: Diff 344375. May 11 2021, 6:22 AM

SjoerdMeijer added a subscriber: SjoerdMeijer. May 11 2021, 6:24 AM

This comment has been deleted.

llvm/lib/Transforms/Scalar/SimpleLoopBoundSplit.cpp
46 ↗	(On Diff #344375)	Why signed only? TODO unsigned?
62 ↗	(On Diff #344375)	I think this is a natural place where you could apply PatternMatch.
72 ↗	(On Diff #344375)	Assert that one of them is an addrec and another is available at loop entry?
75 ↗	(On Diff #344375)	It looks natural to encapsulate Pred, AddRec, Bound and nowrap flags into a structure (e.g. some `BoundInfo`) and pass `const BoundInfo &` here. WDYT?
92 ↗	(On Diff #344375)	Why do you prefer signed predicates over unsigned? Other parts of LLVM (e.g. indvars) tend to canonicalize everything as unsigned if they can prove it. Isn't `AddRecSCEV->hasNoSignedWrap()` the same thing as `HasNoSignedWrap`? If so, remove one of them.
102 ↗	(On Diff #344375)	This is just wrong, it should be `AddRec < Bound + 1`. It is only correct if you can prove that `Bound + 1` does not overflow. A simple example: `X <= SINT_MAX` --> `X < SINT_MIN` is wrong.
120 ↗	(On Diff #344375)	Check `isAffine` before it to save compile time in case of really complex `AddRecs`.
127 ↗	(On Diff #344375)	I understand that a zero step is impossible, but still, being negative and being non-positive isn't the same thing in general. :)
131 ↗	(On Diff #344375)	This check makes the check above meaningless. Reading the code, I think the only place where it matters is replacing the `Eq` condition with `Lt`. Do we need this for something else? Please add a TODO to add negative step support. We've struggled a lot with it in IndVars, and I see this pain coming again.
139 ↗	(On Diff #344375)	Check this before you construct SCEVs to save compile time.
160 ↗	(On Diff #344375)	It's the innermost loop, so I think a "recursive" check is overkill. Or do we plan to support outer loops for it?
168 ↗	(On Diff #344375)	This is a very restrictive thing. Why is it required?
174 ↗	(On Diff #344375)	Use pattern match.
185 ↗	(On Diff #344375)	Please add `/*IsExitCond*/`
220 ↗	(On Diff #344375)	Logically, this check should go last. You may discard the candidate by the conditions below. Anyway, why is that a problem at all? If we have at least one candidate we can get rid of it and then proceed with the other ones.
229 ↗	(On Diff #344375)	Why not `continue`?
232 ↗	(On Diff #344375)	Break?
239 ↗	(On Diff #344375)	Do we ever check they are against the same AddRec? If it's supposed to be same, I think we can just assert on it. If not, I'm not catching how the rest of logic is going to work.
244 ↗	(On Diff #344375)	I think this is too strong a statement. Getting rid of a branch in a critical loop may be useful even without vectorization. If you want to be that restrictive, please add an option to ignore this check.
263 ↗	(On Diff #344375)	`any_of`? I believe we don't vectorize volatile load/stores (or do we?) Maybe check at least this. It looks like the transform is also profitable if one of these instructions is side exiting. We don't vectorize them so far, but if you get rid of it, the loop may become vectorizable.
273 ↗	(On Diff #344375)	It looks like it can be an utility function usable by other parts of LLVM. Consider factoring it out.
337 ↗	(On Diff #344375)	Same question about preference of signed over unsigned. LLVM does otherwise in other parts. Let's not create more self-contradictions.
411 ↗	(On Diff #344375)	This can be expensive. Can't we do a more surgical update? I'm fine if it's a TODO with follow-up fix.
435 ↗	(On Diff #344375)	Please verify loop info too.
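
The overflow concern raised on line 102 above can be sketched in plain C++. The helper name `exclusiveBoundForSLE` and the choice of a 32-bit signed type are assumptions for illustration; the pass reasons about SCEV expressions, so this only shows the arithmetic condition under which `X <= Bound` may be rewritten as `X < Bound + 1`.

```cpp
#include <cstdint>
#include <limits>

// Tries to rewrite the signed predicate `x <= bound` as `x < newBound`.
// Returns false when `bound + 1` would overflow, in which case the rewrite
// must not be applied. (Hypothetical helper, not part of LLVM.)
bool exclusiveBoundForSLE(int32_t bound, int32_t &newBound) {
  if (bound == std::numeric_limits<int32_t>::max())
    return false; // bound + 1 would wrap around to the minimum value
  newBound = bound + 1;
  return true;
}
```

When the helper succeeds, `x <= bound` and `x < newBound` agree for every `x`; when it fails, the caller has to give up on the candidate or keep the inclusive predicate.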

This revision now requires changes to proceed. May 13 2021, 10:30 PM

General question: does this pass do something that IRCE doesn't? From what I've read, it looks like a very limited version of InductiveRangeCheckElimination, with the only difference being that it works for non-loop-exiting conditions.

Maybe what you are looking for is to add a flag to IRCE so that it works with such conditions.

In D102234#2758839, @mkazantsev wrote:

General question: does this pass do something that IRCE doesn't? From what I've read, it looks like a very limited version of InductiveRangeCheckElimination, with the only difference being that it works for non-loop-exiting conditions.

Maybe what you are looking for is to add a flag to IRCE so that it works with such conditions.

@mkazantsev I appreciate your comments. I will update code following them.

Yep, you are right. It is a limited version of the IRCE pass. At first, I tried to extend the IRCE pass, either by extending SCEV or by changing the population order of loops in the new pass manager, in order to handle my motivating examples. You can see the discussion in the patches below.
https://reviews.llvm.org/D101409
https://reviews.llvm.org/D100566
https://reviews.llvm.org/D99774

During the discussion, @reames recommended writing a limited pass to handle my motivating example rather than extending the IRCE pass, because he feels the IRCE pass has some problems and is overkill for my motivating example. You can see the discussion in the email thread below.
https://lists.llvm.org/pipermail/llvm-dev/2021-April/150281.html

I am aiming to enable a transformation like the IRCE pass, or something similar, in the pipeline of the new pass manager. If it is possible to enable the IRCE pass in that pipeline, I am OK with adding the new flag to the IRCE pass. If it is not possible, I will try to write a transformation pass like this patch, or an alternative that can be accepted upstream.

Got it, thanks. The biggest problem IRCE has is how it's written. So indeed, if we can make a simpler version of this, and then expand it to be as powerful as IRCE, that might be a better approach.

Some drive by comments:

Bike shedding names: SimpleLoopBoundSplit. How about just LoopBoundSplit? I understand the current implementation has restrictions/limitations, but personally I think all these SimpleLoopXYZ passes/names look a bit silly.
You probably want to add this to the optimisation pipeline somewhere, off by default for now.
I guess we want to skip this transformation with OptForSize.
I haven't looked into the details much, but I guess you look only at one if-statement and thus we don't support things like if ( i > c1 && i < c2)?

In D102234#2762711, @SjoerdMeijer wrote:

Some drive by comments:

@SjoerdMeijer Thanks for comments.

Bike shedding names: SimpleLoopBoundSplit. How about just LoopBoundSplit? I understand the current implementation has restrictions/limitations, but personally I think it looks a bit silly all these SimpleLoopXYZ passes/names.

Yep, I am ok with the name. I just imitated the LoopUnswitch and SimpleLoopUnswitchPass because this pass is a limited version of IRCE.

You probably want to add this to the optimisation pipeline somewhere, off by default for now,

Yep, I have enabled this pass after SimpleLoopUnswitch experimentally and I am checking the impact from benchmarks. It seems the isProfitableToTransform needs to be updated.

I guess we want to skip this transformation with OptForSize,

Yep, it is already being checked with the code below.

// Skip function with optsize.
if (L.getHeader()->getParent()->hasOptSize())
  return false;

I haven't looked into the details much, but I guess you look only at one if-statement and thus we don't support things like if ( i > c1 && i < c2)?

At this stage, I am aiming to keep this pass as simple as possible. As @reames mentioned in the email discussion, it is very hard to generalize the cases. Later, we could create new transformation passes or extend this pass to handle more cases, as @mkazantsev mentioned.

jaykang10 added inline comments. May 18 2021, 7:38 AM

llvm/lib/Transforms/Scalar/SimpleLoopBoundSplit.cpp
46 ↗	(On Diff #344375)	You are right! I will add unsigned one too.
62 ↗	(On Diff #344375)	Sorry... I am not sure how I can use PatternMatch to extract information from ICmp instruction...
72 ↗	(On Diff #344375)	The AddRec is checked on `HasProcessableCondition`. I will add code to check whether the bound is available at loop entry on `HasProcessableCondition`.
75 ↗	(On Diff #344375)	Yep, you are right. I could pass the `ConditionInfo` directly. I will update it.
92 ↗	(On Diff #344375)	"Why do you prefer signed predicates over unsigned? Other parts of LLVM (e.g. indvars) tend to canonicalize everything as unsigned if they can prove it." I just tried to follow the AddRec's flag. I will update the code with the unsigned one. "Isn't `AddRecSCEV->hasNoSignedWrap()` the same thing as `HasNoSignedWrap`? If so, remove one of them." Yep, I will remove the `AddRecSCEV->hasNoSignedWrap()`.
102 ↗	(On Diff #344375)	"This is just wrong, it should be `AddRec < Bound + 1`." Oops, sorry. I will update it. "It is only correct if you can prove that `Bound + 1` does not overflow. A simple example: `X <= SINT_MAX` --> `X < SINT_MIN` is wrong." You are right!!! I will add code to check the overflow.
120 ↗	(On Diff #344375)	Yep, I will check it.
127 ↗	(On Diff #344375)	Yep, I will check zero step too.
131 ↗	(On Diff #344375)	"This check makes the check above meaningless." I agree. "Reading the code, I think the only place where it matters is replacing the `Eq` condition with `Lt`. Do we need this for something else?" No, it is for the `EQ` condition. I will update the code to check for a step of one with the `EQ` condition. "Please add a TODO to add negative step support. We've struggled a lot with it in IndVars, and I see this pain coming again." Yep, I will add the TODO for negative step.
139 ↗	(On Diff #344375)	um... I think we need to check it after `CalculateUpperBoundWithLT` in order to check only the `LT` condition... We could add more conversions of conditions like `EQ` --> `LT`, `LE` --> `LT` later. If I missed something, please let me know.
160 ↗	(On Diff #344375)	You are right!!! I will update it with `isLCSSAForm`.
168 ↗	(On Diff #344375)	I wanted to make this pass as simple as possible at this stage to figure out basic problems. If we support multiple exiting blocks, we would have to consider the following: (1) multiple exit conditions; (2) we could need more min/max operations to create the new bounds; (3) updating the first loop's exit blocks to the preheader of the second loop. There could be more to consider. If possible, I would like to consider multiple exits after finishing support for a single exit.
174 ↗	(On Diff #344375)	Yep, I will update it with pattern match.
185 ↗	(On Diff #344375)	Yep, I will update it.
220 ↗	(On Diff #344375)	It is not a problem. As I mentioned before, I wanted to make this pass as simple as possible at this stage to figure out basic problems. I will remove this check.
229 ↗	(On Diff #344375)	Oops, You are right!!! I will update it.
232 ↗	(On Diff #344375)	It was to avoid multiple split candidates. I will add break.
239 ↗	(On Diff #344375)	You are right. It is supposed to be the same. However, we cannot compare the AddRecs directly, because the AddRec of the exit condition differs from the split candidate's: the `inc++` comes before the exit condition, so the start values differ as below. ExitCond AddRec: `{1,+,1}<nuw><nsw><%loop>`. SplitCandidateCond AddRec: `{0,+,1}<nuw><nsw><%loop>`. I think we have checked the step of the AddRec, and we also need to check its start value and signedness to say they are the same AddRec. I forgot to check the start value. I will add the check. If I missed something or you feel something is wrong, please let me know.
244 ↗	(On Diff #344375)	Yep, you are right. I will remove this check. I am checking performance impact with this pass. Maybe, I could add something more here.
263 ↗	(On Diff #344375)	"`any_of`? I believe we don't vectorize volatile load/stores (or do we?) Maybe check at least this." Yep, I will update it. "It looks like the transform is also profitable if one of these instructions is side exiting. We don't vectorize them so far, but if you get rid of it, the loop may become vectorizable." um... if the condition of the side exit is related to the induction variable, we could handle it.
273 ↗	(On Diff #344375)	Once the basic version of this pass is stabilized, I would consider it.
337 ↗	(On Diff #344375)	Yep, I will update it.
411 ↗	(On Diff #344375)	Yep, I will add TODO and try to fix it later.
435 ↗	(On Diff #344375)	Yep, I will update it.

Following the comments of @mkazantsev, updated code.

Harbormaster completed remote builds in B105044: Diff 346174. May 18 2021, 11:21 AM

I want to make a very high level suggestion on this. This isn't really about the code per se, and more about the approach to writing the code.

I'd start with a really trivial transform for the loop:
for (i = 0; i < N; i++) { body }

Build a mechanism to produce a form which looks like so:
for (i = 0; i < N; i++) { body }
if (i != N) {
  for (; i < N; i++) { body }
}

This should (rightfully) look fairly odd as the second loop is dead. However, once we have that, iteration splitting becomes much more straightforward. A few observations:

The "if (i != N)" is a loop guard and can be identified by getLoopGuardBranch.
N is the exit value of the i addrec, and can be gotten from SCEV for any arbitrary AddRec for a known exit count.

Once we have this form, we can restrict the iteration space of the pre-loop without modifying the post loop at all. (Provided we haven't run any optimization in between. A slightly safer form would be to have the guard condition be an unknown value to prevent accidental optimization.)
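
A rough source-level model of this idea, in plain C++, is sketched below. The names `deadLoopForm` and `restrictedPreLoop`, the `preBound` parameter and the use of `std::min` as the clamp are assumptions for illustration; the suggestion above is about IR and SCEV, which this does not attempt to reproduce.

```cpp
#include <algorithm>
#include <vector>

// "Dead loop" form: the pre-loop still covers every iteration, so the body of
// the guarded post-loop never executes.
std::vector<int> deadLoopForm(int N) {
  std::vector<int> visited;
  int i = 0;
  for (; i < N; i++)
    visited.push_back(i); // body
  if (i != N) {
    for (; i < N; i++)
      visited.push_back(i); // body
  }
  return visited;
}

// Same shape, but the pre-loop is clamped to min(N, preBound). The post-loop
// is left untouched and picks up whatever iterations remain.
std::vector<int> restrictedPreLoop(int N, int preBound, int &postIterations) {
  std::vector<int> visited;
  postIterations = 0;
  const int clamped = std::min(N, preBound);
  int i = 0;
  for (; i < clamped; i++)
    visited.push_back(i); // body
  if (i != N) {
    for (; i < N; i++) {
      visited.push_back(i); // body
      ++postIterations;
    }
  }
  return visited;
}
```

In this model the set of visited iterations is the same in both functions; only the division of work between the two loops changes.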

The next core primitive is a routine which uses an Exact Exit Count (as defined in SCEV today) to *reduce* the number of iterations in a loop. A key thing to note is that mutating existing IR is an optimization, but the routine is always allowed to introduce a new IV and clamp if needed. That helps a lot in making the code robust. Being able to use SCEVAddRecExpr::evaluateAtIteration also helps to simplify things a lot.

The final primitive is to generalize the existing exit count logic to work for when an arbitrary monotonic condition toggles. (There's a bunch of ways of computing an "exit" count for the branch of interest. This is merely one.)

Once we have both of those, we'd

Determine if splitting is worthwhile. Pick a set of branches to eliminate. (Must be able to compute "exit" counts.)
Produce dead loop form
For each branch we want to remove from the pre-loop, compute an "exit" count.
Then constrain the preloop by the umin of all the desired exit counts.
Simplify branches out of pre-loop. Leave all generality of control flow in preloop.
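
The "umin of all the desired exit counts" step can be sketched as below, assuming a loop of the form `for (i = 0; i < n; ++i)` and candidate branches of the form `if (i < c)`. The helpers `toggleCount` and `preLoopTripCount` are hypothetical; in the approach described above the counts would come from SCEV rather than from a closed-form `min`.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// For `for (i = 0; i < n; ++i)` with a branch `if (i < c)`: the number of
// leading iterations on which the branch condition holds.
uint64_t toggleCount(uint64_t n, uint64_t c) { return std::min(n, c); }

// The pre-loop runs until the first of the candidate conditions toggles, so
// its trip count is the unsigned minimum over all candidates.
uint64_t preLoopTripCount(uint64_t n, const std::vector<uint64_t> &bounds) {
  uint64_t count = n;
  for (uint64_t c : bounds)
    count = std::min(count, toggleCount(n, c));
  return count;
}
```

With bounds 7 and 4 on a 10-iteration loop, for instance, the pre-loop would cover the first 4 iterations and the post-loop the remaining 6.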

Note carefully what this approach *doesn't* do. It doesn't require the pass (as opposed to scev) to reason about conditions, signed vs unsigned, overflow, or whether two IVs are congruent. It heavily reuses the existing logic in SCEV which (mostly) gets all those cases right already.

This works for any condition which is "monotonic" (e.g. transitions from true to false (or vice versa) at most once in the original iteration space). It does not work for branch conditions which can transition two or more times without a bit more generalization.

Where this approach fails a bit is in handling multiple branches to be pruned. As described above, it runs the preloop until *any* of the conditions are hit, and then the post-loop for the remainder. That may or may not have been what was desired.

I think this approach can be used to formulate IRCE, but it requires a bit of care for the case where you don't know an index is in bounds or not on the first iteration.

For a profitability check, I'd start specifically with the case where the condition precisely splits a loop into two halves, e.g. for ... { if (C) { body1 } else { body2 } }. This is the easiest to believe is generally profitable, and we can generalize the heuristic selection later.

Thanks for the update, I will try to find some time to wrap my head around it within the next few days.

llvm/lib/Transforms/Scalar/SimpleLoopBoundSplit.cpp
62 ↗	(On Diff #344375)	It will be something like `ICmpInst::Predicate Pred; Value *LHS, *RHS; match(condition, m_ICmp(Pred, m_Value(LHS), m_Value(RHS)))`. You can find examples of this in many places, just grep for "m_ICmp". It simplifies the code a lot.
139 ↗	(On Diff #344375)	Ah ok, you're right.
239 ↗	(On Diff #344375)	Maybe we should assert the preconditions here to make sure we are doing the right thing. :)
263 ↗	(On Diff #344375)	It's never possible to say. For example, it may be some call to a method that has its own iteration counter (not related to the IV) and still has a very subtle dependence on it that is impossible to figure out.
273 ↗	(On Diff #344375)	Fair enough!

Following comments of @mkazantsev, updated code.
Fixed some bugs from previous diff.

@reames I appreciate your suggestion.

It looks like your suggestion, which relies on existing SCEV logic, is more general. Once I update this patch with your suggestion, I will let you and @mkazantsev know.

P.S. I am feeling unwell after getting the covid vaccination jab, so it could take some time to implement your suggestion. Thank you for your understanding.

llvm/lib/Transforms/Scalar/SimpleLoopBoundSplit.cpp
62 ↗	(On Diff #344375)	Ah, Thanks for letting me know. I will update it.
239 ↗	(On Diff #344375)	Yep, I will update it.
263 ↗	(On Diff #344375)	Ah, Thanks for sharing the info.

Harbormaster completed remote builds in B105405: Diff 346698. May 20 2021, 5:06 AM

@jaykang10 hope you get well, I have the same aftermath after the 2nd dose. :) Could you please mark the patch as "planned changes" so that it isn't highlighted on my review list?

jaykang10 planned changes to this revision. May 24 2021, 12:46 AM

In D102234#2776326, @mkazantsev wrote:

@jaykang10 hope you get well, I have the same aftermath after the 2nd dose. :) Could you please mark the patch as "planned changes" so that it isn't highlighted on my review list?

Thanks @mkazantsev I am ok now. I hope you also get well soon. Let me mark this patch as "planned changes".

Following comments of @reames, updated code.

@reames @mkazantsev I have tried to update this patch following the new suggestion.

I have kept the following things from the previous patch.

Checking the ICmp of the conditional branch for an AddRec and its step.
Choosing one split candidate whose conditional branch is removed.
If we do not choose one split candidate, we need logic to determine which conditional branch has the min bound.
The branch condition that is removed is changed to true or false, to avoid repeating the same transformation on the same conditional branch during the next run of this pass.

This update may not be enough for the new suggestion. If you feel something is wrong with this update, please let me know.

Harbormaster completed remote builds in B106286: Diff 347948. May 26 2021, 7:24 AM

Invalidate cached SE information.

Fixed typo

Harbormaster completed remote builds in B106691: Diff 348500. May 28 2021, 6:32 AM

mkazantsev added inline comments. May 31 2021, 2:06 AM

llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
48	Instead of this, consider flipping the predicate.
77	General notion: all this code is very hard to read because of all the inlined lambdas. Can they be separate functions?
98	You can sink this computation under the condition `bound < max`.
158	How about checking `isSCEVableType` for comparison arguments right here?
239	It is already checked by `IsProcessableCondBI` , you can use `cast` instead of `dyn_cast`. `isa<IntegerType>` --> `SE.isSCEVableType` and move it inside of `IsProcessableCondBI`.
253	It would be more natural to just return `BI` or `nullptr` from it (might also require function renaming).
279	What if both of them are nullptr?

mkazantsev added inline comments. May 31 2021, 2:09 AM

llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
279	Ah I see, they will fail the next check.

Following comments of @mkazantsev, updated code.

llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
48	Yep, let me change it.
77	Sorry for the inconvenience. I will move them to functions.
98	Yep, I will move it.
158	Yep, I will add checks with `isSCEVable(Type)` for the comparison arguments.
239	Yep, I will update it.
253	Yep, I will update it.

Harbormaster completed remote builds in B107004: Diff 348925. Jun 1 2021, 4:33 AM

jaykang10 edited the summary of this revision. Jun 1 2021, 7:51 AM

Looking much better now. Few more nits & a question regarding ne, which is a potential correctness concern.

llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
29	namespace llvm {
89	nit: I'd suggest restructuring it as `if (pred != ULE && pred != SLE) return false;`. This will reduce the nesting for the biggest piece of code. Just an idea.
109	Please add a TODO to handle ICMP_NE/EQ (I guess it was in the earlier versions of this patch and still good to support, but not necessarily in this revision).
132	ConstantInt *StepCI = dyn_cast<SCEVConstant>(StepRecSCEV)->getValue(); if (!StepCI || !StepCI->isPositive()) return false; ...
153	Could declare as `Value *LHS, *RHS;` Just an idea.
158	It's enough to check LHS type and assert on RHS type.
167	canSplit...
168	Consider making params `const` where suitable.
218	Why is that needed?
222	This is still a very strict limitation I think. I can always split critical edges if you need it. I'm fine if you just add a TODO to consider it in the future.
245	Consider marking params as `const` where suitable.

mkazantsev added inline comments. Jun 2 2021, 3:33 AM

llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
356	Two points here: (1) Functional concern: will NE work OK for a step other than 1? (2) `lt` generally gives more info to the optimizer than `ne` (at least because `lt` implies `ne`). Any reason for `ne` here?

jaykang10 added inline comments. Jun 2 2021, 7:06 AM

llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
29	Yep, I will update it.
89	Yep, I will update it.
109	Yep, I will add ToDo for it.
132	It looks like there is no `isPositive` in ConstantInt. I will update it with `isNegative()` and `isZero()`.
153	Yep, I will update it.
158	Yep, I will update it.
167	Sorry for the inconvenience. I will update it.
168	Yep, let me try to add it.
218	Ah, I have tried to follow the comment below from @reames: "For a profitability check, I'd start specifically with the case where the condition precisely splits a loop into two halves, e.g. for ... { if (C) { body1 } else { body2 } }. This is the easiest to believe is generally profitable, and we can generalize the heuristic selection later." To detect a condition that precisely splits a loop into two halves, I checked that the conditional branch is in the header and that the join point is the latch. Let me remove these checks.
222	Yep, I will remove the checks with single predecessor.
245	Yep, I will update it.
356	I have tried to follow the comment below from @reames: "The `if (i != N)` is a loop guard and can be identified by getLoopGuardBranch." I was also not sure whether `ne` is better than `lt` here...

Following comments from @mkazantsev, updated code.

The ne code will be updated after more discussion.

jaykang10 added inline comments. Jun 2 2021, 8:14 AM

llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
132	Sorry, I have found a test in which StepRecSCEV is not a SCEVConstant. In this case, `dyn_cast` returns nullptr and calling `getValue()` on it causes a segmentation fault. I will re-update it.
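
The failure mode described on line 132 is the usual "checked cast used unchecked" pattern. The sketch below reproduces the safe shape using standard `dynamic_cast` on hypothetical classes (`Expr`, `ConstantExpr`, `UnknownExpr`) as stand-ins; it is not LLVM's `dyn_cast` or its SCEV class hierarchy, and the helper name `hasPositiveConstantStep` is made up for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-ins for expression classes; not LLVM's real types.
struct Expr {
  virtual ~Expr() = default;
};
struct ConstantExpr : Expr {
  int64_t value;
  explicit ConstantExpr(int64_t v) : value(v) {}
};
struct UnknownExpr : Expr {};

// Returns true only when the step is a known constant that is strictly
// positive. The result of the cast is tested before it is dereferenced.
bool hasPositiveConstantStep(const Expr *step) {
  const auto *constant = dynamic_cast<const ConstantExpr *>(step);
  if (!constant)
    return false; // not a constant: bail out instead of dereferencing null
  return constant->value > 0;
}
```

The point is only the ordering: the null check on the cast result has to come before any member access, and a non-constant step is treated as "cannot transform".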

Harbormaster completed remote builds in B107243: Diff 349266. Jun 2 2021, 8:34 AM

Fixed a bug

Harbormaster completed remote builds in B107257: Diff 349291. Jun 2 2021, 9:27 AM

mkazantsev added inline comments. Jun 6 2021, 8:47 PM

llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
356	I heard (and this info may be imprecise, out-of-date etc) that some parts of vectorization treat `ne` as its canonical form and don't recognize `lt`. As for me, it should never be a problem (`lt` always implies `ne`), but in practice it could be, just because how things are written now. I was asking if you are trying to deal with one of such cases, or it doesn't really matter. If it doesn't, `lt` is definitely better because it gives more info.
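
One way the step concern behind the `ne` versus `lt` question could show up is sketched below in plain C++. It assumes, purely for illustration, that the post-loop is in rotated (do-while) form so that the guard is the only test before its first iteration; whether that matches the IR this pass produces is not established by this thread. The function names are hypothetical.

```cpp
// Value of iv after a pre-loop `for (iv = 0; iv < newbound; iv += step)`.
// Assumes step > 0, otherwise the loop would not terminate.
int ivAfterPreLoop(int newbound, int step) {
  int iv = 0;
  while (iv < newbound)
    iv += step;
  return iv;
}

// Number of post-loop iterations when the post-loop is a do-while entered
// only if the guard passes. useNe selects `iv != n`, otherwise `iv < n`.
int postIterationsRotated(int iv, int n, int step, bool useNe) {
  const bool guard = useNe ? (iv != n) : (iv < n);
  int count = 0;
  if (guard) {
    do {
      ++count;
      iv += step;
    } while (iv < n);
  }
  return count;
}
```

With a step of 2 and `n = 5`, the pre-loop leaves `iv` at 6: the `ne` guard passes and one extra iteration runs in this model, while the `lt` guard rejects it. With a step of 1 the two guards agree.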

I'm inclined towards accepting this, but the fact that you keep finding bugs worries me. Please give me some time, I will run a corpus of fuzzer tests with this pass locally to see if something breaks. If yes, I will try to give you a reproducer.

My fuzzer didn't reveal any new failures, so there is a chance it's working as expected. :) LGTM with some nits.

llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
33	Please put it in an anonymous namespace (see how it's done e.g. in InstSimplifyPass.cpp), https://llvm.org/docs/CodingStandards.html#anonymous-namespaces
424	Please delete this.

This revision is now accepted and ready to land. Jun 6 2021, 11:54 PM

mkazantsev added inline comments. Jun 6 2021, 11:57 PM

llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
424	UPD: never mind, I didn't notice it was used for a debug print.

In D102234#2801815, @mkazantsev wrote:

My fuzzer didn't reveal any new failures, so there is a chance it's working as expected. :) LGTM with some nits.

Thanks for your kind help! @mkazantsev :)

I have tried to enable this pass in the pipeline of new pass manager after loop unroll experimentally.

if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink || !PGOOpt ||
    PGOOpt->Action != PGOOptions::SampleUse)
  LPM2.addPass(LoopFullUnrollPass(Level.getSpeedupLevel(),
                                  /* OnlyWhenForced= */ !PTO.LoopUnrolling,
                                  PTO.ForgetAllSCEVInLoopUnroll));
LPM2.addPass(LoopBoundSplitPass());

I found some failures from it and I fixed them. At this moment, there are no failures from llvm-test-suite and the SPEC benchmarks with this pass enabled as above.

Maybe I will try to enable this pass in the pipeline of the new pass manager later. If you have an idea about where to place this pass, please let me know. It will be very helpful.

After updating code following your comments, I will push this patch.

llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
33	Yep, I will update it.

Following comments of @mkazantsev, updated code.

This revision was landed with ongoing or failed builds. Jun 7 2021, 2:56 AM

Closed by commit rGa2a0ac42abcb: [SimpleLoopBoundSplit] Split Bound of Loop which has conditional branch with IV (authored by jaykang10).

This revision was automatically updated to reflect the committed changes.

jaykang10 added a commit: rGa2a0ac42abcb: [SimpleLoopBoundSplit] Split Bound of Loop which has conditional branch with IV.

Harbormaster completed remote builds in B107933: Diff 350208. Jun 7 2021, 3:12 AM

I'm seeing crashes when trying to build 471.omnetpp with -O3 -flto on X86 when running the pass just before the vectorizer (as below). Please take a look.

diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index b07f966e3b7e..a3e3ed093ecb 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -1195,6 +1195,7 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
 /// TODO: Should LTO cause any differences to this set of passes?
 void PassBuilder::addVectorPasses(OptimizationLevel Level,
                                   FunctionPassManager &FPM, bool IsLTO) {
+  FPM.addPass(createFunctionToLoopPassAdaptor(LoopBoundSplitPass()));
   FPM.addPass(LoopVectorizePass(
       LoopVectorizeOptions(!PTO.LoopInterleaving, !PTO.LoopVectorization)));

In D102234#2802193, @fhahn wrote:

I'm seeing crashes when trying to build 471.omnetpp with -O3 -flto on X86 when running the pass just before the vectorizer (as below). Please take a look.

diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index b07f966e3b7e..a3e3ed093ecb 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -1195,6 +1195,7 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
 /// TODO: Should LTO cause any differences to this set of passes?
 void PassBuilder::addVectorPasses(OptimizationLevel Level,
                                   FunctionPassManager &FPM, bool IsLTO) {
+  FPM.addPass(createFunctionToLoopPassAdaptor(LoopBoundSplitPass()));
   FPM.addPass(LoopVectorizePass(
       LoopVectorizeOptions(!PTO.LoopInterleaving, !PTO.LoopVectorization)));

Ah, Thanks @fhahn! I will have a look.

Allen added a subscriber: Allen. Feb 6 2023, 6:23 PM

Allen added inline comments.

llvm/include/llvm/Transforms/Scalar/LoopBoundSplit.h
28	Hi @jaykang10: excuse me, since `while (iv < n)` already guards the second loop so that its body doesn't execute, is the inserted condition `if (iv != n)` good for performance?

Herald added a project: Restricted Project. Feb 6 2023, 6:23 PM

Herald added a subscriber: StephenFan.

I added a patch that tries to remove the loop guard: D143705.

Revision Contents

Path	Size
llvm/include/llvm/Transforms/Scalar/LoopBoundSplit.h	42 lines
llvm/lib/Passes/PassBuilder.cpp	1 line
llvm/lib/Passes/PassRegistry.def	1 line
llvm/lib/Transforms/Scalar/CMakeLists.txt	1 line
llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp	439 lines
llvm/test/Transforms/LoopBoundSplit/loop-bound-split.ll	453 lines
llvm/utils/gn/secondary/llvm/lib/Transforms/Scalar/BUILD.gn	1 line

Diff 350215

llvm/include/llvm/Transforms/Scalar/LoopBoundSplit.h

This file was added.

				//===------- LoopBoundSplit.h - Split Loop Bound ----------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_SCALAR_LOOPBOUNDSPLIT_H
				#define LLVM_TRANSFORMS_SCALAR_LOOPBOUNDSPLIT_H

				#include "llvm/Analysis/LoopAnalysisManager.h"
				#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/IR/PassManager.h"
				#include "llvm/Transforms/Scalar/LoopPassManager.h"

				namespace llvm {

				/// This pass transforms loops that contain a conditional branch with induction
				/// variable. For example, it transforms left code to right code:
				///
				///                             newbound = min(n, c)
				/// while (iv < n) {            while(iv < newbound) {
				///   A                           A
				///   if (iv < c)                 B
				///     B                         C
				///   C                         }
				/// }                           if (iv != n) {
				Allen: hi @jaykang10: Excuse me, since `while (iv < n)` already guards the split loop so that its body doesn't execute, is the inserted condition `if (iv != n)` good for performance?
				///                               while (iv < n) {
				///                                 A
				///                                 C
				///                               }
				///                             }
				class LoopBoundSplitPass : public PassInfoMixin<LoopBoundSplitPass> {
				public:
				PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,
				LoopStandardAnalysisResults &AR, LPMUpdater &U);
				};

				} // end namespace llvm

				#endif // LLVM_TRANSFORMS_SCALAR_LOOPBOUNDSPLIT_H
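
The transformation described in the header comment can be checked with a small standalone program. The sketch below is plain C++, not LLVM code: `originalLoop`, `splitLoop`, and the trace strings are hypothetical names introduced for illustration, and it assumes a unit step and signed `iv < n` / `iv < c` comparisons as in the comment.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Original form: the body tests `iv < c` on every iteration.
std::string originalLoop(long iv, long n, long c) {
  std::string trace;
  while (iv < n) {
    trace += 'A';
    if (iv < c)
      trace += 'B';
    trace += 'C';
    ++iv;
  }
  return trace;
}

// Split form: the pre-loop runs up to min(n, c) with the branch always taken,
// and the post-loop runs the remaining iterations with the branch never taken.
std::string splitLoop(long iv, long n, long c) {
  std::string trace;
  long newbound = std::min(n, c);
  while (iv < newbound) {
    trace += "ABC";
    ++iv;
  }
  if (iv != n) {
    while (iv < n) {
      trace += "AC";
      ++iv;
    }
  }
  return trace;
}
```

For example, `originalLoop(0, 5, 3)` and `splitLoop(0, 5, 3)` both produce `ABCABCABCACAC`: three iterations with `B`, then two without.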

llvm/lib/Passes/PassBuilder.cpp

	#include "llvm/Transforms/Scalar/IVUsersPrinter.h"			#include "llvm/Transforms/Scalar/IVUsersPrinter.h"
	#include "llvm/Transforms/Scalar/IndVarSimplify.h"			#include "llvm/Transforms/Scalar/IndVarSimplify.h"
	#include "llvm/Transforms/Scalar/InductiveRangeCheckElimination.h"			#include "llvm/Transforms/Scalar/InductiveRangeCheckElimination.h"
	#include "llvm/Transforms/Scalar/InferAddressSpaces.h"			#include "llvm/Transforms/Scalar/InferAddressSpaces.h"
	#include "llvm/Transforms/Scalar/InstSimplifyPass.h"			#include "llvm/Transforms/Scalar/InstSimplifyPass.h"
	#include "llvm/Transforms/Scalar/JumpThreading.h"			#include "llvm/Transforms/Scalar/JumpThreading.h"
	#include "llvm/Transforms/Scalar/LICM.h"			#include "llvm/Transforms/Scalar/LICM.h"
	#include "llvm/Transforms/Scalar/LoopAccessAnalysisPrinter.h"			#include "llvm/Transforms/Scalar/LoopAccessAnalysisPrinter.h"
				#include "llvm/Transforms/Scalar/LoopBoundSplit.h"
	#include "llvm/Transforms/Scalar/LoopDataPrefetch.h"			#include "llvm/Transforms/Scalar/LoopDataPrefetch.h"
	#include "llvm/Transforms/Scalar/LoopDeletion.h"			#include "llvm/Transforms/Scalar/LoopDeletion.h"
	#include "llvm/Transforms/Scalar/LoopDistribute.h"			#include "llvm/Transforms/Scalar/LoopDistribute.h"
	#include "llvm/Transforms/Scalar/LoopFlatten.h"			#include "llvm/Transforms/Scalar/LoopFlatten.h"
	#include "llvm/Transforms/Scalar/LoopFuse.h"			#include "llvm/Transforms/Scalar/LoopFuse.h"
	#include "llvm/Transforms/Scalar/LoopIdiomRecognize.h"			#include "llvm/Transforms/Scalar/LoopIdiomRecognize.h"
	#include "llvm/Transforms/Scalar/LoopInstSimplify.h"			#include "llvm/Transforms/Scalar/LoopInstSimplify.h"
	#include "llvm/Transforms/Scalar/LoopInterchange.h"			#include "llvm/Transforms/Scalar/LoopInterchange.h"

llvm/lib/Passes/PassRegistry.def

	LOOP_PASS("loop-unroll-full", LoopFullUnrollPass())			LOOP_PASS("loop-unroll-full", LoopFullUnrollPass())
	LOOP_PASS("print-access-info", LoopAccessInfoPrinterPass(dbgs()))			LOOP_PASS("print-access-info", LoopAccessInfoPrinterPass(dbgs()))
	LOOP_PASS("print<ddg>", DDGAnalysisPrinterPass(dbgs()))			LOOP_PASS("print<ddg>", DDGAnalysisPrinterPass(dbgs()))
	LOOP_PASS("print<iv-users>", IVUsersPrinterPass(dbgs()))			LOOP_PASS("print<iv-users>", IVUsersPrinterPass(dbgs()))
	LOOP_PASS("print<loopnest>", LoopNestPrinterPass(dbgs()))			LOOP_PASS("print<loopnest>", LoopNestPrinterPass(dbgs()))
	LOOP_PASS("print<loop-cache-cost>", LoopCachePrinterPass(dbgs()))			LOOP_PASS("print<loop-cache-cost>", LoopCachePrinterPass(dbgs()))
	LOOP_PASS("loop-predication", LoopPredicationPass())			LOOP_PASS("loop-predication", LoopPredicationPass())
	LOOP_PASS("guard-widening", GuardWideningPass())			LOOP_PASS("guard-widening", GuardWideningPass())
				LOOP_PASS("loop-bound-split", LoopBoundSplitPass())
	LOOP_PASS("simple-loop-unswitch", SimpleLoopUnswitchPass())			LOOP_PASS("simple-loop-unswitch", SimpleLoopUnswitchPass())
	LOOP_PASS("loop-reroll", LoopRerollPass())			LOOP_PASS("loop-reroll", LoopRerollPass())
	LOOP_PASS("loop-versioning-licm", LoopVersioningLICMPass())			LOOP_PASS("loop-versioning-licm", LoopVersioningLICMPass())
	#undef LOOP_PASS			#undef LOOP_PASS

	#ifndef LOOP_PASS_WITH_PARAMS			#ifndef LOOP_PASS_WITH_PARAMS
	#define LOOP_PASS_WITH_PARAMS(NAME, CREATE_PASS, PARSER)			#define LOOP_PASS_WITH_PARAMS(NAME, CREATE_PASS, PARSER)
	#endif			#endif
	LOOP_PASS_WITH_PARAMS("unswitch",			LOOP_PASS_WITH_PARAMS("unswitch",
	[](bool NonTrivial) {			[](bool NonTrivial) {
	return SimpleLoopUnswitchPass(NonTrivial);			return SimpleLoopUnswitchPass(NonTrivial);
	},			},
	parseLoopUnswitchOptions)			parseLoopUnswitchOptions)
	#undef LOOP_PASS_WITH_PARAMS			#undef LOOP_PASS_WITH_PARAMS
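
The added `LOOP_PASS("loop-bound-split", LoopBoundSplitPass())` line takes effect because `PassRegistry.def` is written as an X-macro list, as the `#ifndef` / `#undef` lines around it suggest: a consumer defines `LOOP_PASS`, pulls in the list, and each entry expands into code. The sketch below shows that pattern in standalone C++. `DEMO_LOOP_PASSES`, `lookupLoopPass`, and the integer ids are hypothetical, and a list macro stands in for the included `.def` file so that the example fits in one translation unit.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a .def file: each entry pairs a pass name with an id.
#define DEMO_LOOP_PASSES(LOOP_PASS)                                            \
  LOOP_PASS("guard-widening", 1)                                               \
  LOOP_PASS("loop-bound-split", 2)                                             \
  LOOP_PASS("simple-loop-unswitch", 3)

// Returns the id registered for Name, or 0 if no entry matches.
int lookupLoopPass(const char *Name) {
  // Each list entry expands into one name comparison.
#define LOOP_PASS(NAME, ID)                                                    \
  if (std::strcmp(Name, NAME) == 0)                                            \
    return ID;
  DEMO_LOOP_PASSES(LOOP_PASS)
#undef LOOP_PASS
  return 0;
}
```

Adding a pass then only requires one new list entry; every consumer of the list picks it up without further changes.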

llvm/lib/Transforms/Scalar/CMakeLists.txt

add_llvm_component_library(LLVMScalarOpts
IVUsersPrinter.cpp		IVUsersPrinter.cpp
InductiveRangeCheckElimination.cpp		InductiveRangeCheckElimination.cpp
IndVarSimplify.cpp		IndVarSimplify.cpp
InferAddressSpaces.cpp		InferAddressSpaces.cpp
InstSimplifyPass.cpp		InstSimplifyPass.cpp
JumpThreading.cpp		JumpThreading.cpp
LICM.cpp		LICM.cpp
LoopAccessAnalysisPrinter.cpp		LoopAccessAnalysisPrinter.cpp
		LoopBoundSplit.cpp
LoopSink.cpp		LoopSink.cpp
LoopDeletion.cpp		LoopDeletion.cpp
LoopDataPrefetch.cpp		LoopDataPrefetch.cpp
LoopDistribute.cpp		LoopDistribute.cpp
LoopFuse.cpp		LoopFuse.cpp
LoopIdiomRecognize.cpp		LoopIdiomRecognize.cpp
LoopInstSimplify.cpp		LoopInstSimplify.cpp
LoopInterchange.cpp		LoopInterchange.cpp

llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp

This file was added.

				//===------- LoopBoundSplit.cpp - Split Loop Bound --------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/Scalar/LoopBoundSplit.h"
				#include "llvm/Analysis/LoopAccessAnalysis.h"
				#include "llvm/Analysis/LoopAnalysisManager.h"
				#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/Analysis/LoopIterator.h"
				#include "llvm/Analysis/LoopPass.h"
				#include "llvm/Analysis/MemorySSA.h"
				#include "llvm/Analysis/MemorySSAUpdater.h"
				#include "llvm/Analysis/ScalarEvolution.h"
				#include "llvm/Analysis/ScalarEvolutionExpressions.h"
				#include "llvm/IR/PatternMatch.h"
				#include "llvm/Transforms/Utils/BasicBlockUtils.h"
				#include "llvm/Transforms/Utils/Cloning.h"
				#include "llvm/Transforms/Utils/LoopSimplify.h"
				#include "llvm/Transforms/Utils/LoopUtils.h"
				#include "llvm/Transforms/Utils/ScalarEvolutionExpander.h"

				#define DEBUG_TYPE "loop-bound-split"

				namespace llvm {

				mkazantsev: namespace llvm {
				jaykang10 (Author): Yep, I will update it.
				using namespace PatternMatch;

				namespace {
				struct ConditionInfo {
				mkazantsev: Please put it in an anonymous namespace (see how it's done e.g. in InstSimplifyPass.cpp), https://llvm.org/docs/CodingStandards.html#anonymous-namespaces
				jaykang10 (Author): Yep, I will update it.
				/// Branch instruction with this condition
				BranchInst *BI;
				/// ICmp instruction with this condition
				ICmpInst *ICmp;
				/// Predicate info
				ICmpInst::Predicate Pred;
				/// AddRec llvm value
				Value *AddRecValue;
				/// Bound llvm value
				Value *BoundValue;
				/// AddRec SCEV
				const SCEV *AddRecSCEV;
				/// Bound SCEV
				const SCEV *BoundSCEV;

				mkazantsev: Instead of this, consider flipping the predicate.
				jaykang10 (Author): Yep, let me change it.
				ConditionInfo()
				: BI(nullptr), ICmp(nullptr), Pred(ICmpInst::BAD_ICMP_PREDICATE),
				AddRecValue(nullptr), BoundValue(nullptr), AddRecSCEV(nullptr),
				BoundSCEV(nullptr) {}
				};
				} // namespace

				static void analyzeICmp(ScalarEvolution &SE, ICmpInst *ICmp,
				ConditionInfo &Cond) {
				Cond.ICmp = ICmp;
				if (match(ICmp, m_ICmp(Cond.Pred, m_Value(Cond.AddRecValue),
				m_Value(Cond.BoundValue)))) {
				Cond.AddRecSCEV = SE.getSCEV(Cond.AddRecValue);
				Cond.BoundSCEV = SE.getSCEV(Cond.BoundValue);
				// Locate AddRec in LHSSCEV and Bound in RHSSCEV.
				if (isa<SCEVAddRecExpr>(Cond.BoundSCEV) &&
				!isa<SCEVAddRecExpr>(Cond.AddRecSCEV)) {
				std::swap(Cond.AddRecValue, Cond.BoundValue);
				std::swap(Cond.AddRecSCEV, Cond.BoundSCEV);
				Cond.Pred = ICmpInst::getSwappedPredicate(Cond.Pred);
				}
				}
				}

				static bool calculateUpperBound(const Loop &L, ScalarEvolution &SE,
				ConditionInfo &Cond, bool IsExitCond) {
				if (IsExitCond) {
				const SCEV *ExitCount = SE.getExitCount(&L, Cond.ICmp->getParent());
				if (isa<SCEVCouldNotCompute>(ExitCount))
				mkazantsev: General note: all this code is very hard to read because of all the inlined lambdas. Can they be separate functions?
				jaykang10 (Author): Sorry for the inconvenience. I will move them to functions.
				return false;

				Cond.BoundSCEV = ExitCount;
				return true;
				}

				// For non-exit condition, if pred is LT, keep existing bound.
				if (Cond.Pred == ICmpInst::ICMP_SLT || Cond.Pred == ICmpInst::ICMP_ULT)
				return true;

				// For non-exit condition, if pred is LE, try to convert it to LT.
				//      Range                 Range
				mkazantsev: nit: I'd suggest restructuring it as `if (pred != ULE && pred != SLE) return false;`, which will reduce the nesting for the biggest piece of code. Just an idea.
				jaykang10 (Author): Yep, I will update it.
				// AddRec <= Bound  -->  AddRec < Bound + 1
				if (Cond.Pred != ICmpInst::ICMP_ULE && Cond.Pred != ICmpInst::ICMP_SLE)
				return false;

				if (IntegerType *BoundSCEVIntType =
				dyn_cast<IntegerType>(Cond.BoundSCEV->getType())) {
				unsigned BitWidth = BoundSCEVIntType->getBitWidth();
				APInt Max = ICmpInst::isSigned(Cond.Pred)
				? APInt::getSignedMaxValue(BitWidth)
				mkazantsev: You can sink this computation under the condition `bound < max`.
				jaykang10 (Author): Yep, I will move it.
				: APInt::getMaxValue(BitWidth);
				const SCEV *MaxSCEV = SE.getConstant(Max);
				// Check Bound < INT_MAX
				ICmpInst::Predicate Pred =
				ICmpInst::isSigned(Cond.Pred) ? ICmpInst::ICMP_SLT : ICmpInst::ICMP_ULT;
				if (SE.isKnownPredicate(Pred, Cond.BoundSCEV, MaxSCEV)) {
				const SCEV *BoundPlusOneSCEV =
				SE.getAddExpr(Cond.BoundSCEV, SE.getOne(BoundSCEVIntType));
				Cond.BoundSCEV = BoundPlusOneSCEV;
				Cond.Pred = Pred;
				return true;
				mkazantsev: Please add a TODO to handle ICMP_NE/EQ (I guess it was in the earlier versions of this patch and is still good to support, but not necessarily in this revision).
				jaykang10 (Author): Yep, I will add a TODO for it.
				}
				}

				// ToDo: Support ICMP_NE/EQ.

				return false;
				}

				static bool hasProcessableCondition(const Loop &L, ScalarEvolution &SE,
				ICmpInst *ICmp, ConditionInfo &Cond,
				bool IsExitCond) {
				analyzeICmp(SE, ICmp, Cond);

				// The BoundSCEV should be evaluated at loop entry.
				if (!SE.isAvailableAtLoopEntry(Cond.BoundSCEV, &L))
				return false;

				const SCEVAddRecExpr *AddRecSCEV = dyn_cast<SCEVAddRecExpr>(Cond.AddRecSCEV);
				// Allowed AddRec as induction variable.
				if (!AddRecSCEV)
				return false;

				if (!AddRecSCEV->isAffine())
				mkazantsev: `ConstantInt *StepCI = dyn_cast<SCEVConstant>(StepRecSCEV)->getValue(); if (!StepCI || !StepCI->isPositive()) return false; ...`
				jaykang10 (Author): It looks like there is no `isPositive` in ConstantInt. I will update it with `isNegative()` and `isZero()`.
				jaykang10 (Author): Sorry, I have found a test in which StepRecSCEV is not a SCEVConstant. In this case, `nullptr->getValue()` causes a segmentation fault. I will re-update it.
				return false;

				const SCEV *StepRecSCEV = AddRecSCEV->getStepRecurrence(SE);
				// Allowed constant step.
				if (!isa<SCEVConstant>(StepRecSCEV))
				return false;

				ConstantInt *StepCI = cast<SCEVConstant>(StepRecSCEV)->getValue();
				// Allowed positive step for now.
				// TODO: Support negative step.
				if (StepCI->isNegative() || StepCI->isZero())
				return false;

				// Calculate upper bound.
				if (!calculateUpperBound(L, SE, Cond, IsExitCond))
				return false;

				return true;
				}

				static bool isProcessableCondBI(const ScalarEvolution &SE,
				mkazantsev: Could declare as `Value *LHS, *RHS;` Just an idea.
				jaykang10 (Author): Yep, I will update it.
				const BranchInst *BI) {
				BasicBlock *TrueSucc = nullptr;
				BasicBlock *FalseSucc = nullptr;
				ICmpInst::Predicate Pred;
				Value *LHS, *RHS;
				mkazantsev: How about checking `isSCEVableType` for comparison arguments right here?
				jaykang10 (Author): Yep, I will add checks with `isSCEVable(Type)` for the comparison arguments.
				mkazantsev: It's enough to check LHS type and assert on RHS type.
				jaykang10 (Author): Yep, I will update it.
				if (!match(BI, m_Br(m_ICmp(Pred, m_Value(LHS), m_Value(RHS)),
				m_BasicBlock(TrueSucc), m_BasicBlock(FalseSucc))))
				return false;

				if (!SE.isSCEVable(LHS->getType()))
				return false;
				assert(SE.isSCEVable(RHS->getType()) && "Expected RHS's type is SCEVable");

				if (TrueSucc == FalseSucc)
				mkazantsev: canSplit...
				jaykang10 (Author): Sorry for the inconvenience. I will update it.
				return false;
				mkazantsev: Consider making params `const` where suitable.
				jaykang10 (Author): Yep, let me try to add it.

				return true;
				}

				static bool canSplitLoopBound(const Loop &L, const DominatorTree &DT,
				ScalarEvolution &SE, ConditionInfo &Cond) {
				// Skip function with optsize.
				if (L.getHeader()->getParent()->hasOptSize())
				return false;

				// Split only innermost loop.
				if (!L.isInnermost())
				return false;

				// Check loop is in simplified form.
				if (!L.isLoopSimplifyForm())
				return false;

				// Check loop is in LCSSA form.
				if (!L.isLCSSAForm(DT))
				return false;

				// Skip loop that cannot be cloned.
				if (!L.isSafeToClone())
				return false;

				BasicBlock *ExitingBB = L.getExitingBlock();
				// Assumed only one exiting block.
				if (!ExitingBB)
				return false;

				BranchInst *ExitingBI = dyn_cast<BranchInst>(ExitingBB->getTerminator());
				if (!ExitingBI)
				return false;

				// Allowed only conditional branch with ICmp.
				if (!isProcessableCondBI(SE, ExitingBI))
				return false;

				// Check the condition is processable.
				ICmpInst *ICmp = cast<ICmpInst>(ExitingBI->getCondition());
				if (!hasProcessableCondition(L, SE, ICmp, Cond, /*IsExitCond*/ true))
				return false;

				Cond.BI = ExitingBI;
				return true;
				}

				static bool isProfitableToTransform(const Loop &L, const BranchInst *BI) {
				// If the conditional branch splits a loop into two halves, we could
				mkazantsev: Why is that needed?
				jaykang10 (Author): Ah, I have tried to follow the comment below from @reames.
				> For a profitability check, I'd start specifically with the case where the condition precisely splits a loop into two halves. e.g. for ... { if (C) { body1 } else { body2 }. This is the easiest to believe is generally profitable, and we can generalize the heuristic selection later.
				To check that the condition precisely splits a loop into two halves, I have checked that the conditional branch is in the header and the join point is the latch. Let me remove them.
				// generally say it is profitable.
				//
				// ToDo: Add more profitable cases here.

				mkazantsev: This is still a very strict limitation I think. I can always split critical edges if you need it. I'm fine if you just add a TODO to consider it in the future.
				jaykang10 (Author): Yep, I will remove the checks with single predecessor.
				// Check this branch causes diamond CFG.
				BasicBlock *Succ0 = BI->getSuccessor(0);
				BasicBlock *Succ1 = BI->getSuccessor(1);

				BasicBlock *Succ0Succ = Succ0->getSingleSuccessor();
				BasicBlock *Succ1Succ = Succ1->getSingleSuccessor();
				if (!Succ0Succ || !Succ1Succ || Succ0Succ != Succ1Succ)
				return false;

				// ToDo: Calculate each successor's instruction cost.

				return true;
				}

				static BranchInst *findSplitCandidate(const Loop &L, ScalarEvolution &SE,
				ConditionInfo &ExitingCond,
				ConditionInfo &SplitCandidateCond) {
				mkazantsev: 1. It is already checked by `IsProcessableCondBI`, you can use `cast` instead of `dyn_cast`. 2. `isa<IntegerType>` --> `SE.isSCEVableType` and move it inside of `IsProcessableCondBI`.
				jaykang10 (Author): Yep, I will update it.
				for (auto *BB : L.blocks()) {
				// Skip condition of backedge.
				if (L.getLoopLatch() == BB)
				continue;

				auto *BI = dyn_cast<BranchInst>(BB->getTerminator());
				mkazantsev: Consider marking params as `const` where suitable.
				jaykang10 (Author): Yep, I will update it.
				if (!BI)
				continue;

				// Check conditional branch with ICmp.
				if (!isProcessableCondBI(SE, BI))
				continue;

				// Skip loop invariant condition.
				mkazantsev: It would be more natural to just return `BI` or `nullptr` from it (might also require function renaming).
				jaykang10 (Author): Yep, I will update it.
				if (L.isLoopInvariant(BI->getCondition()))
				continue;

				// Check the condition is processable.
				ICmpInst *ICmp = cast<ICmpInst>(BI->getCondition());
				if (!hasProcessableCondition(L, SE, ICmp, SplitCandidateCond,
				/*IsExitCond*/ false))
				continue;

				if (ExitingCond.BoundSCEV->getType() !=
				SplitCandidateCond.BoundSCEV->getType())
				continue;

				SplitCandidateCond.BI = BI;
				return BI;
				}

				return nullptr;
				}

				static bool splitLoopBound(Loop &L, DominatorTree &DT, LoopInfo &LI,
				ScalarEvolution &SE, LPMUpdater &U) {
				ConditionInfo SplitCandidateCond;
				ConditionInfo ExitingCond;

				// Check we can split this loop's bound.
				mkazantsev: What if both of them are nullptr?
				mkazantsev: Ah I see, they will fail the next check.
				if (!canSplitLoopBound(L, DT, SE, ExitingCond))
				return false;

				if (!findSplitCandidate(L, SE, ExitingCond, SplitCandidateCond))
				return false;

				if (!isProfitableToTransform(L, SplitCandidateCond.BI))
				return false;

				// Now, we have a split candidate. Let's build a form as below.
				//    +--------------------+
				//    |     preheader      |
				//    |  set up newbound   |
				//    +--------------------+
				//             |     /----------------\
				//    +--------v----v------+          |
				//    |      header        |---\      |
				//    | with true condition|   |      |
				//    +--------------------+   |      |
				//             |               |      |
				//    +--------v-----------+   |      |
				//    |     if.then.BB     |   |      |
				//    +--------------------+   |      |
				//             |               |      |
				//    +--------v-----------<---/      |
				//    |       latch        >----------/
				//    |   with newbound    |
				//    +--------------------+
				//             |
				//    +--------v-----------+
				//    |     preheader2     |--------------\
				//    | if (AddRec i !=    |              |
				//    |     org bound)     |              |
				//    +--------------------+              |
				//             |     /----------------\   |
				//    +--------v----v------+          |   |
				//    |      header2       |---\      |   |
				//    | conditional branch |   |      |   |
				//    |with false condition|   |      |   |
				//    +--------------------+   |      |   |
				//             |               |      |   |
				//    +--------v-----------+   |      |   |
				//    |    if.then.BB2     |   |      |   |
				//    +--------------------+   |      |   |
				//             |               |      |   |
				//    +--------v-----------<---/      |   |
				//    |       latch2       >----------/   |
				//    |   with org bound   |              |
				//    +--------v-----------+              |
				//             |                          |
				//             |  +---------------+       |
				//             +-->     exit      <-------/
				//                +---------------+

				// Let's create post loop.
				SmallVector<BasicBlock *, 8> PostLoopBlocks;
				Loop *PostLoop;
				ValueToValueMapTy VMap;
				BasicBlock *PreHeader = L.getLoopPreheader();
				BasicBlock *SplitLoopPH = SplitEdge(PreHeader, L.getHeader(), &DT, &LI);
				PostLoop = cloneLoopWithPreheader(L.getExitBlock(), SplitLoopPH, &L, VMap,
				".split", &LI, &DT, PostLoopBlocks);
				remapInstructionsInBlocks(PostLoopBlocks, VMap);

				// Add conditional branch to check we can skip post-loop in its preheader.
				BasicBlock *PostLoopPreHeader = PostLoop->getLoopPreheader();
				IRBuilder<> Builder(PostLoopPreHeader);
				Instruction *OrigBI = PostLoopPreHeader->getTerminator();
				ICmpInst::Predicate Pred = ICmpInst::ICMP_NE;
				Value *Cond =
				Builder.CreateICmp(Pred, ExitingCond.AddRecValue, ExitingCond.BoundValue);
				Builder.CreateCondBr(Cond, PostLoop->getHeader(), PostLoop->getExitBlock());
				OrigBI->eraseFromParent();

				// Create new loop bound and add it into preheader of pre-loop.
				const SCEV *NewBoundSCEV = ExitingCond.BoundSCEV;
				const SCEV *SplitBoundSCEV = SplitCandidateCond.BoundSCEV;
				mkazantsev: Two points here: 1. Functional concern. Will NE work ok for step other than 1? 2. `lt` generally gives more info to the opt than `ne` (at least because `lt` implies `ne`). Any reason for `ne` here?
				jaykang10 (Author): I have tried to follow the comment below from @reames.
				> The "if (i != N)" is a loop guard and can be identified by getLoopGuardBranch.
				I was also not sure whether `ne` is better than `lt` here...
				mkazantsev: I heard (and this info may be imprecise, out-of-date etc.) that some parts of vectorization treat `ne` as its canonical form and don't recognize `lt`. As for me, it should never be a problem (`lt` always implies `ne`), but in practice it could be, just because of how things are written now. I was asking whether you are trying to deal with one of such cases, or whether it doesn't really matter. If it doesn't, `lt` is definitely better because it gives more info.
				NewBoundSCEV = ICmpInst::isSigned(ExitingCond.Pred)
				? SE.getSMinExpr(NewBoundSCEV, SplitBoundSCEV)
				: SE.getUMinExpr(NewBoundSCEV, SplitBoundSCEV);

				SCEVExpander Expander(
				SE, L.getHeader()->getParent()->getParent()->getDataLayout(), "split");
				Instruction *InsertPt = SplitLoopPH->getTerminator();
				Value *NewBoundValue =
				Expander.expandCodeFor(NewBoundSCEV, NewBoundSCEV->getType(), InsertPt);
				NewBoundValue->setName("new.bound");

				// Replace exiting bound value of pre-loop NewBound.
				ExitingCond.ICmp->setOperand(1, NewBoundValue);

				// Replace IV's start value of post-loop by NewBound.
				for (PHINode &PN : L.getHeader()->phis()) {
				// Find PHI with exiting condition from pre-loop.
				if (isa<SCEVAddRecExpr>(SE.getSCEV(&PN))) {
				for (Value *Op : PN.incoming_values()) {
				if (Op == ExitingCond.AddRecValue) {
				// Find cloned PHI for post-loop.
				PHINode *PostLoopPN = cast<PHINode>(VMap[&PN]);
				PostLoopPN->setIncomingValueForBlock(PostLoopPreHeader,
				NewBoundValue);
				}
				}
				}
				}

				// Replace SplitCandidateCond.BI's condition of pre-loop by True.
				LLVMContext &Context = PreHeader->getContext();
				SplitCandidateCond.BI->setCondition(ConstantInt::getTrue(Context));

				// Replace cloned SplitCandidateCond.BI's condition in post-loop by False.
				BranchInst *ClonedSplitCandidateBI =
				cast<BranchInst>(VMap[SplitCandidateCond.BI]);
				ClonedSplitCandidateBI->setCondition(ConstantInt::getFalse(Context));

				// Replace exit branch target of pre-loop by post-loop's preheader.
				if (L.getExitBlock() == ExitingCond.BI->getSuccessor(0))
				ExitingCond.BI->setSuccessor(0, PostLoopPreHeader);
				else
				ExitingCond.BI->setSuccessor(1, PostLoopPreHeader);

				// Update dominator tree.
				DT.changeImmediateDominator(PostLoopPreHeader, L.getExitingBlock());
				DT.changeImmediateDominator(PostLoop->getExitBlock(), PostLoopPreHeader);

				// Invalidate cached SE information.
				SE.forgetLoop(&L);

				// Canonicalize loops.
				// TODO: Try to update LCSSA information according to above change.
				formLCSSA(L, DT, &LI, &SE);
				simplifyLoop(&L, &DT, &LI, &SE, nullptr, nullptr, true);
				formLCSSA(*PostLoop, DT, &LI, &SE);
				simplifyLoop(PostLoop, &DT, &LI, &SE, nullptr, nullptr, true);

				// Add new post-loop to loop pass manager.
				U.addSiblingLoops(PostLoop);

				return true;
				}

				PreservedAnalyses LoopBoundSplitPass::run(Loop &L, LoopAnalysisManager &AM,
				LoopStandardAnalysisResults &AR,
				LPMUpdater &U) {
				Function &F = *L.getHeader()->getParent();
				mkazantsev: Please delete this.
				mkazantsev: UPD: never mind, I didn't notice it was used for the debug print.
				(void)F;

				LLVM_DEBUG(dbgs() << "Spliting bound of loop in " << F.getName() << ": " << L
				<< "\n");

				if (!splitLoopBound(L, AR.DT, AR.LI, AR.SE, U))
				return PreservedAnalyses::all();

				assert(AR.DT.verify(DominatorTree::VerificationLevel::Fast));
				AR.LI.verify(AR.DT);

				return getLoopPassPreservedAnalyses();
				}

				} // end namespace llvm
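
The `<=` to `<` rewrite in `calculateUpperBound` and the `min` used for the new bound can be illustrated outside LLVM. The following is a minimal standalone sketch, not the pass's code: `SignedCond`, `normalizeToLessThan`, and `newBound` are hypothetical names, only signed 64-bit values are handled, and the overflow condition is checked at run time, whereas the pass asks ScalarEvolution to prove `Bound < max` at compile time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the predicate/bound pair kept in ConditionInfo.
struct SignedCond {
  bool IsLessEqual; // true: `iv <= Bound`; false: `iv < Bound`
  int64_t Bound;
};

// Rewrites `iv <= Bound` to `iv < Bound + 1` when Bound + 1 cannot overflow.
// Returns false if the condition cannot be expressed as a strict less-than.
bool normalizeToLessThan(SignedCond &Cond) {
  if (!Cond.IsLessEqual)
    return true; // Already `iv < Bound`; keep the existing bound.
  if (Cond.Bound == std::numeric_limits<int64_t>::max())
    return false; // Bound + 1 would overflow.
  Cond.Bound += 1;
  Cond.IsLessEqual = false;
  return true;
}

// The pre-loop runs up to the smaller of the exit bound and the split bound.
int64_t newBound(int64_t ExitBound, int64_t SplitBound) {
  return std::min(ExitBound, SplitBound);
}
```

For example, `iv <= 9` becomes `iv < 10`, while a bound equal to the maximum `int64_t` value is rejected because adding one would wrap.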

llvm/test/Transforms/LoopBoundSplit/loop-bound-split.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -passes=loop-bound-split -S < %s | FileCheck %s

				define void @split_loop_bound_inc_with_sgt(i64 %a, i64* noalias %src, i64* noalias %dst, i64 %n) {
				; CHECK-LABEL: @split_loop_bound_inc_with_sgt(
				; CHECK-NEXT: loop.ph:
				; CHECK-NEXT: br label [[LOOP_PH_SPLIT:%.*]]
				; CHECK: loop.ph.split:
				; CHECK-NEXT: [[SMAX:%.*]] = call i64 @llvm.smax.i64(i64 [[N:%.*]], i64 0)
				; CHECK-NEXT: [[NEW_BOUND:%.*]] = call i64 @llvm.smin.i64(i64 [[A:%.*]], i64 [[SMAX]])
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_INC:%.*]] ], [ 0, [[LOOP_PH_SPLIT]] ]
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[IV]], [[A]]
				; CHECK-NEXT: br i1 true, label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[SRC_ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[SRC:%.*]], i64 [[IV]]
				; CHECK-NEXT: [[VAL:%.*]] = load i64, i64* [[SRC_ARRAYIDX]], align 4
				; CHECK-NEXT: [[DST_ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[DST:%.*]], i64 [[IV]]
				; CHECK-NEXT: store i64 [[VAL]], i64* [[DST_ARRAYIDX]], align 4
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: if.else:
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: for.inc:
				; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: [[COND:%.*]] = icmp sgt i64 [[INC]], [[NEW_BOUND]]
				; CHECK-NEXT: br i1 [[COND]], label [[LOOP_PH_SPLIT_SPLIT:%.*]], label [[LOOP]]
				; CHECK: loop.ph.split.split:
				; CHECK-NEXT: [[INC_LCSSA:%.*]] = phi i64 [ [[INC]], [[FOR_INC]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = icmp ne i64 [[INC_LCSSA]], [[N]]
				; CHECK-NEXT: br i1 [[TMP0]], label [[LOOP_SPLIT_PREHEADER:%.*]], label [[EXIT:%.*]]
				; CHECK: loop.split.preheader:
				; CHECK-NEXT: br label [[LOOP_SPLIT:%.*]]
				; CHECK: loop.split:
				; CHECK-NEXT: [[IV_SPLIT:%.*]] = phi i64 [ [[INC_SPLIT:%.*]], [[FOR_INC_SPLIT:%.*]] ], [ [[NEW_BOUND]], [[LOOP_SPLIT_PREHEADER]] ]
				; CHECK-NEXT: [[CMP_SPLIT:%.*]] = icmp slt i64 [[IV_SPLIT]], [[A]]
				; CHECK-NEXT: br i1 false, label [[IF_THEN_SPLIT:%.*]], label [[IF_ELSE_SPLIT:%.*]]
				; CHECK: if.else.split:
				; CHECK-NEXT: br label [[FOR_INC_SPLIT]]
				; CHECK: if.then.split:
				; CHECK-NEXT: [[SRC_ARRAYIDX_SPLIT:%.*]] = getelementptr inbounds i64, i64* [[SRC]], i64 [[IV_SPLIT]]
				; CHECK-NEXT: [[VAL_SPLIT:%.*]] = load i64, i64* [[SRC_ARRAYIDX_SPLIT]], align 4
				; CHECK-NEXT: [[DST_ARRAYIDX_SPLIT:%.*]] = getelementptr inbounds i64, i64* [[DST]], i64 [[IV_SPLIT]]
				; CHECK-NEXT: store i64 [[VAL_SPLIT]], i64* [[DST_ARRAYIDX_SPLIT]], align 4
				; CHECK-NEXT: br label [[FOR_INC_SPLIT]]
				; CHECK: for.inc.split:
				; CHECK-NEXT: [[INC_SPLIT]] = add nuw nsw i64 [[IV_SPLIT]], 1
				; CHECK-NEXT: [[COND_SPLIT:%.*]] = icmp sgt i64 [[INC_SPLIT]], [[N]]
				; CHECK-NEXT: br i1 [[COND_SPLIT]], label [[EXIT_LOOPEXIT:%.*]], label [[LOOP_SPLIT]]
				; CHECK: exit.loopexit:
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
				loop.ph:
				br label %loop

				loop:
				%iv = phi i64 [ %inc, %for.inc ], [ 0, %loop.ph ]
				%cmp = icmp slt i64 %iv, %a
				br i1 %cmp, label %if.then, label %if.else

				if.then:
				%src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
				%val = load i64, i64* %src.arrayidx
				%dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
				store i64 %val, i64* %dst.arrayidx
				br label %for.inc

				if.else:
				br label %for.inc

				for.inc:
				%inc = add nuw nsw i64 %iv, 1
				%cond = icmp sgt i64 %inc, %n
				br i1 %cond, label %exit, label %loop

				exit:
				ret void
				}

				define void @split_loop_bound_inc_with_eq(i64 %a, i64* noalias %src, i64* noalias %dst, i64 %n) {
				; CHECK-LABEL: @split_loop_bound_inc_with_eq(
				; CHECK-NEXT: loop.ph:
				; CHECK-NEXT: br label [[LOOP_PH_SPLIT:%.*]]
				; CHECK: loop.ph.split:
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[N:%.*]], -1
				; CHECK-NEXT: [[NEW_BOUND:%.*]] = call i64 @llvm.umin.i64(i64 [[A:%.*]], i64 [[TMP0]])
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_INC:%.*]] ], [ 0, [[LOOP_PH_SPLIT]] ]
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[IV]], [[A]]
				; CHECK-NEXT: br i1 true, label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[SRC_ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[SRC:%.*]], i64 [[IV]]
				; CHECK-NEXT: [[VAL:%.*]] = load i64, i64* [[SRC_ARRAYIDX]], align 4
				; CHECK-NEXT: [[DST_ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[DST:%.*]], i64 [[IV]]
				; CHECK-NEXT: store i64 [[VAL]], i64* [[DST_ARRAYIDX]], align 4
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: if.else:
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: for.inc:
				; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: [[COND:%.*]] = icmp eq i64 [[INC]], [[NEW_BOUND]]
				; CHECK-NEXT: br i1 [[COND]], label [[LOOP_PH_SPLIT_SPLIT:%.*]], label [[LOOP]]
				; CHECK: loop.ph.split.split:
				; CHECK-NEXT: [[INC_LCSSA:%.*]] = phi i64 [ [[INC]], [[FOR_INC]] ]
				; CHECK-NEXT: [[TMP1:%.*]] = icmp ne i64 [[INC_LCSSA]], [[N]]
				; CHECK-NEXT: br i1 [[TMP1]], label [[LOOP_SPLIT_PREHEADER:%.*]], label [[EXIT:%.*]]
				; CHECK: loop.split.preheader:
				; CHECK-NEXT: br label [[LOOP_SPLIT:%.*]]
				; CHECK: loop.split:
				; CHECK-NEXT: [[IV_SPLIT:%.*]] = phi i64 [ [[INC_SPLIT:%.*]], [[FOR_INC_SPLIT:%.*]] ], [ [[NEW_BOUND]], [[LOOP_SPLIT_PREHEADER]] ]
				; CHECK-NEXT: [[CMP_SPLIT:%.*]] = icmp slt i64 [[IV_SPLIT]], [[A]]
				; CHECK-NEXT: br i1 false, label [[IF_THEN_SPLIT:%.*]], label [[IF_ELSE_SPLIT:%.*]]
				; CHECK: if.else.split:
				; CHECK-NEXT: br label [[FOR_INC_SPLIT]]
				; CHECK: if.then.split:
				; CHECK-NEXT: [[SRC_ARRAYIDX_SPLIT:%.*]] = getelementptr inbounds i64, i64* [[SRC]], i64 [[IV_SPLIT]]
				; CHECK-NEXT: [[VAL_SPLIT:%.*]] = load i64, i64* [[SRC_ARRAYIDX_SPLIT]], align 4
				; CHECK-NEXT: [[DST_ARRAYIDX_SPLIT:%.*]] = getelementptr inbounds i64, i64* [[DST]], i64 [[IV_SPLIT]]
				; CHECK-NEXT: store i64 [[VAL_SPLIT]], i64* [[DST_ARRAYIDX_SPLIT]], align 4
				; CHECK-NEXT: br label [[FOR_INC_SPLIT]]
				; CHECK: for.inc.split:
				; CHECK-NEXT: [[INC_SPLIT]] = add nuw nsw i64 [[IV_SPLIT]], 1
				; CHECK-NEXT: [[COND_SPLIT:%.*]] = icmp eq i64 [[INC_SPLIT]], [[N]]
				; CHECK-NEXT: br i1 [[COND_SPLIT]], label [[EXIT_LOOPEXIT:%.*]], label [[LOOP_SPLIT]]
				; CHECK: exit.loopexit:
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
				loop.ph:
				br label %loop

				loop:
				%iv = phi i64 [ %inc, %for.inc ], [ 0, %loop.ph ]
				%cmp = icmp slt i64 %iv, %a
				br i1 %cmp, label %if.then, label %if.else

				if.then:
				%src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
				%val = load i64, i64* %src.arrayidx
				%dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
				store i64 %val, i64* %dst.arrayidx
				br label %for.inc

				if.else:
				br label %for.inc

				for.inc:
				%inc = add nuw nsw i64 %iv, 1
				%cond = icmp eq i64 %inc, %n
				br i1 %cond, label %exit, label %loop

				exit:
				ret void
				}

				define void @split_loop_bound_inc_with_sge(i64 %a, i64* noalias %src, i64* noalias %dst, i64 %n) {
				; CHECK-LABEL: @split_loop_bound_inc_with_sge(
				; CHECK-NEXT: loop.ph:
				; CHECK-NEXT: br label [[LOOP_PH_SPLIT:%.*]]
				; CHECK: loop.ph.split:
				; CHECK-NEXT: [[SMAX:%.*]] = call i64 @llvm.smax.i64(i64 [[N:%.*]], i64 1)
				; CHECK-NEXT: [[TMP0:%.*]] = add nsw i64 [[SMAX]], -1
				; CHECK-NEXT: [[NEW_BOUND:%.*]] = call i64 @llvm.smin.i64(i64 [[A:%.*]], i64 [[TMP0]])
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_INC:%.*]] ], [ 0, [[LOOP_PH_SPLIT]] ]
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[IV]], [[A]]
				; CHECK-NEXT: br i1 true, label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[SRC_ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[SRC:%.*]], i64 [[IV]]
				; CHECK-NEXT: [[VAL:%.*]] = load i64, i64* [[SRC_ARRAYIDX]], align 4
				; CHECK-NEXT: [[DST_ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[DST:%.*]], i64 [[IV]]
				; CHECK-NEXT: store i64 [[VAL]], i64* [[DST_ARRAYIDX]], align 4
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: if.else:
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: for.inc:
				; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: [[COND:%.*]] = icmp sge i64 [[INC]], [[NEW_BOUND]]
				; CHECK-NEXT: br i1 [[COND]], label [[LOOP_PH_SPLIT_SPLIT:%.*]], label [[LOOP]]
				; CHECK: loop.ph.split.split:
				; CHECK-NEXT: [[INC_LCSSA:%.*]] = phi i64 [ [[INC]], [[FOR_INC]] ]
				; CHECK-NEXT: [[TMP1:%.*]] = icmp ne i64 [[INC_LCSSA]], [[N]]
				; CHECK-NEXT: br i1 [[TMP1]], label [[LOOP_SPLIT_PREHEADER:%.*]], label [[EXIT:%.*]]
				; CHECK: loop.split.preheader:
				; CHECK-NEXT: br label [[LOOP_SPLIT:%.*]]
				; CHECK: loop.split:
				; CHECK-NEXT: [[IV_SPLIT:%.*]] = phi i64 [ [[INC_SPLIT:%.*]], [[FOR_INC_SPLIT:%.*]] ], [ [[NEW_BOUND]], [[LOOP_SPLIT_PREHEADER]] ]
				; CHECK-NEXT: [[CMP_SPLIT:%.*]] = icmp slt i64 [[IV_SPLIT]], [[A]]
				; CHECK-NEXT: br i1 false, label [[IF_THEN_SPLIT:%.*]], label [[IF_ELSE_SPLIT:%.*]]
				; CHECK: if.else.split:
				; CHECK-NEXT: br label [[FOR_INC_SPLIT]]
				; CHECK: if.then.split:
				; CHECK-NEXT: [[SRC_ARRAYIDX_SPLIT:%.*]] = getelementptr inbounds i64, i64* [[SRC]], i64 [[IV_SPLIT]]
				; CHECK-NEXT: [[VAL_SPLIT:%.*]] = load i64, i64* [[SRC_ARRAYIDX_SPLIT]], align 4
				; CHECK-NEXT: [[DST_ARRAYIDX_SPLIT:%.*]] = getelementptr inbounds i64, i64* [[DST]], i64 [[IV_SPLIT]]
				; CHECK-NEXT: store i64 [[VAL_SPLIT]], i64* [[DST_ARRAYIDX_SPLIT]], align 4
				; CHECK-NEXT: br label [[FOR_INC_SPLIT]]
				; CHECK: for.inc.split:
				; CHECK-NEXT: [[INC_SPLIT]] = add nuw nsw i64 [[IV_SPLIT]], 1
				; CHECK-NEXT: [[COND_SPLIT:%.*]] = icmp sge i64 [[INC_SPLIT]], [[N]]
				; CHECK-NEXT: br i1 [[COND_SPLIT]], label [[EXIT_LOOPEXIT:%.*]], label [[LOOP_SPLIT]]
				; CHECK: exit.loopexit:
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
				loop.ph:
				br label %loop

				loop:
				%iv = phi i64 [ %inc, %for.inc ], [ 0, %loop.ph ]
				%cmp = icmp slt i64 %iv, %a
				br i1 %cmp, label %if.then, label %if.else

				if.then:
				%src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
				%val = load i64, i64* %src.arrayidx
				%dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
				store i64 %val, i64* %dst.arrayidx
				br label %for.inc

				if.else:
				br label %for.inc

				for.inc:
				%inc = add nuw nsw i64 %iv, 1
				%cond = icmp sge i64 %inc, %n
				br i1 %cond, label %exit, label %loop

				exit:
				ret void
				}

				define void @split_loop_bound_inc_with_step_is_not_one(i64 %a, i64* noalias %src, i64* noalias %dst, i64 %n) {
				; CHECK-LABEL: @split_loop_bound_inc_with_step_is_not_one(
				; CHECK-NEXT: loop.ph:
				; CHECK-NEXT: br label [[LOOP_PH_SPLIT:%.*]]
				; CHECK: loop.ph.split:
				; CHECK-NEXT: [[SMAX:%.*]] = call i64 @llvm.smax.i64(i64 [[N:%.*]], i64 1)
				; CHECK-NEXT: [[TMP0:%.*]] = lshr i64 [[SMAX]], 1
				; CHECK-NEXT: [[NEW_BOUND:%.*]] = call i64 @llvm.smin.i64(i64 [[A:%.*]], i64 [[TMP0]])
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_INC:%.*]] ], [ 0, [[LOOP_PH_SPLIT]] ]
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[IV]], [[A]]
				; CHECK-NEXT: br i1 true, label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[SRC_ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[SRC:%.*]], i64 [[IV]]
				; CHECK-NEXT: [[VAL:%.*]] = load i64, i64* [[SRC_ARRAYIDX]], align 4
				; CHECK-NEXT: [[DST_ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[DST:%.*]], i64 [[IV]]
				; CHECK-NEXT: store i64 [[VAL]], i64* [[DST_ARRAYIDX]], align 4
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: if.else:
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: for.inc:
				; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 2
				; CHECK-NEXT: [[COND:%.*]] = icmp sgt i64 [[INC]], [[NEW_BOUND]]
				; CHECK-NEXT: br i1 [[COND]], label [[LOOP_PH_SPLIT_SPLIT:%.*]], label [[LOOP]]
				; CHECK: loop.ph.split.split:
				; CHECK-NEXT: [[INC_LCSSA:%.*]] = phi i64 [ [[INC]], [[FOR_INC]] ]
				; CHECK-NEXT: [[TMP1:%.*]] = icmp ne i64 [[INC_LCSSA]], [[N]]
				; CHECK-NEXT: br i1 [[TMP1]], label [[LOOP_SPLIT_PREHEADER:%.*]], label [[EXIT:%.*]]
				; CHECK: loop.split.preheader:
				; CHECK-NEXT: br label [[LOOP_SPLIT:%.*]]
				; CHECK: loop.split:
				; CHECK-NEXT: [[IV_SPLIT:%.*]] = phi i64 [ [[INC_SPLIT:%.*]], [[FOR_INC_SPLIT:%.*]] ], [ [[NEW_BOUND]], [[LOOP_SPLIT_PREHEADER]] ]
				; CHECK-NEXT: [[CMP_SPLIT:%.*]] = icmp slt i64 [[IV_SPLIT]], [[A]]
				; CHECK-NEXT: br i1 false, label [[IF_THEN_SPLIT:%.*]], label [[IF_ELSE_SPLIT:%.*]]
				; CHECK: if.else.split:
				; CHECK-NEXT: br label [[FOR_INC_SPLIT]]
				; CHECK: if.then.split:
				; CHECK-NEXT: [[SRC_ARRAYIDX_SPLIT:%.*]] = getelementptr inbounds i64, i64* [[SRC]], i64 [[IV_SPLIT]]
				; CHECK-NEXT: [[VAL_SPLIT:%.*]] = load i64, i64* [[SRC_ARRAYIDX_SPLIT]], align 4
				; CHECK-NEXT: [[DST_ARRAYIDX_SPLIT:%.*]] = getelementptr inbounds i64, i64* [[DST]], i64 [[IV_SPLIT]]
				; CHECK-NEXT: store i64 [[VAL_SPLIT]], i64* [[DST_ARRAYIDX_SPLIT]], align 4
				; CHECK-NEXT: br label [[FOR_INC_SPLIT]]
				; CHECK: for.inc.split:
				; CHECK-NEXT: [[INC_SPLIT]] = add nuw nsw i64 [[IV_SPLIT]], 2
				; CHECK-NEXT: [[COND_SPLIT:%.*]] = icmp sgt i64 [[INC_SPLIT]], [[N]]
				; CHECK-NEXT: br i1 [[COND_SPLIT]], label [[EXIT_LOOPEXIT:%.*]], label [[LOOP_SPLIT]]
				; CHECK: exit.loopexit:
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
				loop.ph:
				br label %loop

				loop:
				%iv = phi i64 [ %inc, %for.inc ], [ 0, %loop.ph ]
				%cmp = icmp slt i64 %iv, %a
				br i1 %cmp, label %if.then, label %if.else

				if.then:
				%src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
				%val = load i64, i64* %src.arrayidx
				%dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
				store i64 %val, i64* %dst.arrayidx
				br label %for.inc

				if.else:
				br label %for.inc

				for.inc:
				%inc = add nuw nsw i64 %iv, 2
				%cond = icmp sgt i64 %inc, %n
				br i1 %cond, label %exit, label %loop

				exit:
				ret void
				}

				define void @split_loop_bound_inc_with_ne(i64 %a, i64* noalias %src, i64* noalias %dst, i64 %n) {
				; CHECK-LABEL: @split_loop_bound_inc_with_ne(
				; CHECK-NEXT: loop.ph:
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_INC:%.*]] ], [ 0, [[LOOP_PH:%.*]] ]
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[IV]], [[A:%.*]]
				; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.*]], label [[FOR_INC]]
				; CHECK: if.then:
				; CHECK-NEXT: [[SRC_ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[SRC:%.*]], i64 [[IV]]
				; CHECK-NEXT: [[VAL:%.*]] = load i64, i64* [[SRC_ARRAYIDX]], align 4
				; CHECK-NEXT: [[DST_ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[DST:%.*]], i64 [[IV]]
				; CHECK-NEXT: store i64 [[VAL]], i64* [[DST_ARRAYIDX]], align 4
				; CHECK-NEXT: br label [[FOR_INC]]
				; CHECK: for.inc:
				; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: [[COND:%.*]] = icmp ne i64 [[INC]], [[N:%.*]]
				; CHECK-NEXT: br i1 [[COND]], label [[EXIT:%.*]], label [[LOOP]]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
				loop.ph:
				br label %loop

				loop:
				%iv = phi i64 [ %inc, %for.inc ], [ 0, %loop.ph ]
				%cmp = icmp slt i64 %iv, %a
				br i1 %cmp, label %if.then, label %for.inc

				if.then:
				%src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
				%val = load i64, i64* %src.arrayidx
				%dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
				store i64 %val, i64* %dst.arrayidx
				br label %for.inc

				for.inc:
				%inc = add nuw nsw i64 %iv, 1
				%cond = icmp ne i64 %inc, %n
				br i1 %cond, label %exit, label %loop

				exit:
				ret void
				}

				define void @split_loop_bound_dec_with_slt(i64 %a, i64* noalias %src, i64* noalias %dst, i64 %n) {
				; CHECK-LABEL: @split_loop_bound_dec_with_slt(
				; CHECK-NEXT: loop.ph:
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[DEC:%.*]], [[FOR_DEC:%.*]] ], [ 0, [[LOOP_PH:%.*]] ]
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[IV]], [[A:%.*]]
				; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.*]], label [[FOR_DEC]]
				; CHECK: if.then:
				; CHECK-NEXT: [[SRC_ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[SRC:%.*]], i64 [[IV]]
				; CHECK-NEXT: [[VAL:%.*]] = load i64, i64* [[SRC_ARRAYIDX]], align 4
				; CHECK-NEXT: [[DST_ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[DST:%.*]], i64 [[IV]]
				; CHECK-NEXT: store i64 [[VAL]], i64* [[DST_ARRAYIDX]], align 4
				; CHECK-NEXT: br label [[FOR_DEC]]
				; CHECK: for.dec:
				; CHECK-NEXT: [[DEC]] = sub nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[DEC]], [[N:%.*]]
				; CHECK-NEXT: br i1 [[COND]], label [[EXIT:%.*]], label [[LOOP]]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
				loop.ph:
				br label %loop

				loop:
				%iv = phi i64 [ %dec, %for.dec ], [ 0, %loop.ph ]
				%cmp = icmp slt i64 %iv, %a
				br i1 %cmp, label %if.then, label %for.dec

				if.then:
				%src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
				%val = load i64, i64* %src.arrayidx
				%dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
				store i64 %val, i64* %dst.arrayidx
				br label %for.dec

				for.dec:
				%dec = sub nuw nsw i64 %iv, 1
				%cond = icmp slt i64 %dec, %n
				br i1 %cond, label %exit, label %loop

				exit:
				ret void
				}

				define void @split_loop_bound_dec_with_sle(i64 %a, i64* noalias %src, i64* noalias %dst, i64 %n) {
				; CHECK-LABEL: @split_loop_bound_dec_with_sle(
				; CHECK-NEXT: loop.ph:
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[DEC:%.*]], [[FOR_DEC:%.*]] ], [ 0, [[LOOP_PH:%.*]] ]
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[IV]], [[A:%.*]]
				; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.*]], label [[FOR_DEC]]
				; CHECK: if.then:
				; CHECK-NEXT: [[SRC_ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[SRC:%.*]], i64 [[IV]]
				; CHECK-NEXT: [[VAL:%.*]] = load i64, i64* [[SRC_ARRAYIDX]], align 4
				; CHECK-NEXT: [[DST_ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[DST:%.*]], i64 [[IV]]
				; CHECK-NEXT: store i64 [[VAL]], i64* [[DST_ARRAYIDX]], align 4
				; CHECK-NEXT: br label [[FOR_DEC]]
				; CHECK: for.dec:
				; CHECK-NEXT: [[DEC]] = sub nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: [[COND:%.*]] = icmp sle i64 [[DEC]], [[N:%.*]]
				; CHECK-NEXT: br i1 [[COND]], label [[EXIT:%.*]], label [[LOOP]]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
				loop.ph:
				br label %loop

				loop:
				%iv = phi i64 [ %dec, %for.dec ], [ 0, %loop.ph ]
				%cmp = icmp slt i64 %iv, %a
				br i1 %cmp, label %if.then, label %for.dec

				if.then:
				%src.arrayidx = getelementptr inbounds i64, i64* %src, i64 %iv
				%val = load i64, i64* %src.arrayidx
				%dst.arrayidx = getelementptr inbounds i64, i64* %dst, i64 %iv
				store i64 %val, i64* %dst.arrayidx
				br label %for.dec

				for.dec:
				%dec = sub nuw nsw i64 %iv, 1
				%cond = icmp sle i64 %dec, %n
				br i1 %cond, label %exit, label %loop

				exit:
				ret void
				}

llvm/utils/gn/secondary/llvm/lib/Transforms/Scalar/BUILD.gn

  sources = [
  "IVUsersPrinter.cpp",
  "IndVarSimplify.cpp",
  "InductiveRangeCheckElimination.cpp",
  "InferAddressSpaces.cpp",
  "InstSimplifyPass.cpp",
  "JumpThreading.cpp",
  "LICM.cpp",
  "LoopAccessAnalysisPrinter.cpp",
+ "LoopBoundSplit.cpp",
  "LoopDataPrefetch.cpp",
  "LoopDeletion.cpp",
  "LoopDistribute.cpp",
  "LoopFlatten.cpp",
  "LoopFuse.cpp",
  "LoopIdiomRecognize.cpp",
  "LoopInstSimplify.cpp",
  "LoopInterchange.cpp",

Revision Contents

Diff 350215
