This is an archive of the discontinued LLVM Phabricator instance.

[SimpleLoopBoundSplit] Split Bound of Loop which has conditional branch with IV
ClosedPublic

Authored by jaykang10 on May 11 2021, 5:46 AM.

Details

Summary

This pass transforms loops that contain a conditional branch with induction variable. For example, it transforms left code to right code:

                            newbound = min(n, c)
while (iv < n) {            while(iv < newbound) {
  A                           A
  if (iv < c)                 B
    B                         C
  C                         }
}                           if (iv != n) {
                              while (iv < n) {
                                A
                                C
                              }
                            }

Diff Detail

Event Timeline

jaykang10 created this revision.May 11 2021, 5:46 AM
jaykang10 requested review of this revision.May 11 2021, 5:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2021, 5:46 AM
mkazantsev requested changes to this revision.EditedMay 13 2021, 10:30 PM
This comment has been deleted.
llvm/lib/Transforms/Scalar/SimpleLoopBoundSplit.cpp
46 ↗(On Diff #344375)

Why signed only? TODO unsigned?

62 ↗(On Diff #344375)

I think this is a natural place where you could apply PatternMatch.

72 ↗(On Diff #344375)

Assert that one of them is an addrec and another is available at loop entry?

75 ↗(On Diff #344375)

It looks natural to encapsulate Pred, AddRec, Bound and nowrap flags into a structure (e.g. some BoundInfo) and pass const BoundInfo & here. WDYT?

92 ↗(On Diff #344375)
  1. Why do you prefer signed predicates over unsigned? Other parts of LLVM (e.g. indvars) tend to canonicalize everything as unsigned if they can prove it.
  2. Isn't AddRecSCEV->hasNoSignedWrap() a same thing as HasNoSignedWrap? If so, remove one of them.
102 ↗(On Diff #344375)
  1. This is just wrong, it should be AddRec < Bound + 1.
  2. It is only correct if you can prove that Bound + 1 does not overflow. Simple exapmle is X <= SINT_MAX --> X < SINT_MIN is wrong.
120 ↗(On Diff #344375)

Check isAffine before it to save compile time in case of really complex AddRecs.

127 ↗(On Diff #344375)

I understand that zero step is impossible, but still, being negative and being non-positive is't the same thing is general. :)

131 ↗(On Diff #344375)
  1. This check makes the check above meaningless.
  2. Reading the code, I think the only place where it matters is replacing Eq condition with Lt. Do we beed this for something else?
  3. Please add a TODO to add negative step support. We've stuggled a lot with it in IndVars, and I see this pain coming again.
139 ↗(On Diff #344375)

Check this before you construct SCEVs to save compile time.

160 ↗(On Diff #344375)

It's innermost loop, so I thing "recursive" check is an overkill. Or do we plan to support outer loops for it?

168 ↗(On Diff #344375)

This is a very restrictive thing. Why is it required?

174 ↗(On Diff #344375)

Use pattern match.

185 ↗(On Diff #344375)

Pls add /*IsExitCond*/

220 ↗(On Diff #344375)

Logicaly, this check should go last. You may discard the candidate by conditions below.
Anyways, why is that a problem at all? If we have at least one candidate we can get rid of it and then proceed with the other ones.

229 ↗(On Diff #344375)

Why not continue?

232 ↗(On Diff #344375)

Break?

239 ↗(On Diff #344375)

Do we ever check they are against the same AddRec?
If it's supposed to be same, I think we can just assert on it.
If not, I'm not catching how the rest of logic is going to work.

244 ↗(On Diff #344375)

I think it's a too strong statement. Getting rid of a branch in a critical loop may be useful even without vectorization. If you want to be that restrictive, place add an option to ignore this check.

263 ↗(On Diff #344375)

any_of?
I believe we don't vectorize volatile load/stores (or do we?) Maybe check at least this.
It looks like the transform is also profitable if one of these instructions is side exiting. We don't vectorize them so far, but if you get rid of it, the loop may become vectorizable.

273 ↗(On Diff #344375)

It looks like it can be an utility function usable by other parts of LLVM. Consider factoring it out.

337 ↗(On Diff #344375)

Same question about preference of signed over unsigned. LLVM does otherwise in other parts. Let's not create more self-contradictions.

411 ↗(On Diff #344375)

This can be expensive. Can't we do a more surgical update?

I'm fine if it's a TODO with follow-up fix.

435 ↗(On Diff #344375)

Please verify loop info too.

This revision now requires changes to proceed.May 13 2021, 10:30 PM
mkazantsev added a comment.EditedMay 13 2021, 10:33 PM

General question: does this pass do something that IRCE doesn't? From what I've read, it looks a very limited version of InductiveRangeCheckElimination, with only difference that it works for non-loop-existing conditions.

Maybe what you are looking for is to add a flag into IRCE that it works with such conditions.

General question: does this pass do something that IRCE doesn't? From what I've read, it looks a very limited version of InductiveRangeCheckElimination, with only difference that it works for non-loop-existing conditions.

Maybe what you are looking for is to add a flag into IRCE that it works with such conditions.

@mkazantsev I appreciate your comments. I will update code following them.

Yep, you are right. It is a limited version of IRCE pass. For the first time, I tried to extend IRCE pass with extending SCEV or changing the population order of loop in new pass manager in order to handle my motivational examples. You can see the discussion with below patches.
https://reviews.llvm.org/D101409
https://reviews.llvm.org/D100566
https://reviews.llvm.org/D99774

During discussion, @reames recommended to write a limited pass to handle my motivational example rather than extending IRCE pass because he feels IRCE pass has some problems and it is overkill to handle my motivational example. You can see the discussion with below email thread.
https://lists.llvm.org/pipermail/llvm-dev/2021-April/150281.html

I am aiming to enable transformation like IRCE pass or something like that in the pipeline of new pass manager. If it is possible to enable IRCE pass in the pipeline of new pass manager, I am OK to add the new flag to IRCE pass. If it is not possible, I would try to write transformation pass like this patch or alternative which can be accepted into upstream.

Got it, thanks. The biggest problem IRCE has is how it's written. So indeed, if we can make a simpler version of this, and then expand it to be as powerful as IRCE, that might be a better approach.

Some drive by comments:

  • Bike shedding names: SimpleLoopBoundSplit. How about just LoopBoundSplit? I understand the current implementation has restrictions/limitations, but personally I think it looks a bit silly all these SimpleLoopXYZ passes/names.
  • You probably want to add this to the optimisation pipeline somewhere, off by default for now,
  • I guess we want to skip with this transformation with OptForSize,
  • I haven't looked into much details, but I guess you look only at one if-statement and thus we don't support things like if ( i > c1 && i < c2)?

Some drive by comments:

@SjoerdMeijer Thanks for comments.

  • Bike shedding names: SimpleLoopBoundSplit. How about just LoopBoundSplit? I understand the current implementation has restrictions/limitations, but personally I think it looks a bit silly all these SimpleLoopXYZ passes/names.

Yep, I am ok with the name. I just imitated the LoopUnswitch and SimpleLoopUnswitchPass because this pass is a limited version of IRCE.

  • You probably want to add this to the optimisation pipeline somewhere, off by default for now,

Yep, I have enabled this pass after SimpleLoopUnswitch experimentally and I am checking the impact from benchmarks. It seems the isProfitableToTransform needs to be updated.

  • I guess we want to skip with this transformation with OptForSize,

Yep, it is already being checked with below code.

// Skip function with optsize.
if (L.getHeader()->getParent()->hasOptSize())
  return false;
  • I haven't looked into much details, but I guess you look only at one if-statement and thus we don't support things like if ( i > c1 && i < c2)?

At this stage, I am aiming this pass as simple as possible. As @reames mentioned on the email discussion, it is very hard to generalize the cases. Later, we could create new transformation passes or extend this pass to handle more cases as @mkazantsev mentioned.

jaykang10 added inline comments.May 18 2021, 7:38 AM
llvm/lib/Transforms/Scalar/SimpleLoopBoundSplit.cpp
46 ↗(On Diff #344375)

You are right! I will add unsigned one too.

62 ↗(On Diff #344375)

Sorry... I am not sure how I can use PatternMatch to extract information from ICmp instruction...

72 ↗(On Diff #344375)

The AddRec is checked on HasProcessableCondition. I will add code to check whether the bound is available at loop entry on HasProcessableCondition.

75 ↗(On Diff #344375)

Yep, you are right. I could pass the ConditionInfo directly. I will update it.

92 ↗(On Diff #344375)
  1. Why do you prefer signed predicates over unsigned? Other parts of LLVM (e.g. indvars) tend to canonicalize everything as unsigned if they can prove it.

I just tried to follow the AddRec's flag. I will update the code with unsigned one.

  1. Isn't AddRecSCEV->hasNoSignedWrap() a same thing as HasNoSignedWrap? If so, remove one of them.

Yep, I will remove the AddRecSCEV->hasNoSignedWrap().

102 ↗(On Diff #344375)
  1. This is just wrong, it should be AddRec < Bound + 1.

Oops, sorry. I will update it.

  1. It is only correct if you can prove that Bound + 1 does not overflow. Simple exapmle is X <= SINT_MAX --> X < SINT_MIN is wrong.

You are right!!! I will add code to check the overflow.

120 ↗(On Diff #344375)

Yep, I will check it.

127 ↗(On Diff #344375)

Yep, I will check zero step too.

131 ↗(On Diff #344375)
  1. This check makes the check above meaningless.

I agree.

  1. Reading the code, I think the only place where it matters is replacing Eq condition with Lt. Do we beed this for something else?

No, it is for the EQ condition. I will update code to check the one step with EQ condition.

  1. Please add a TODO to add negative step support. We've stuggled a lot with it in IndVars, and I see this pain coming again.

Yep, I will add the TODO for negative step.

139 ↗(On Diff #344375)

um... I think we need to check it after CalculateUpperBoundWithLT in order to check only LT condition... We could add more conversions of conditions like 'EQ` --> LT', LE' --> LT later. If I missed something, please let me know.

160 ↗(On Diff #344375)

You are right!!! I will update it with isLCSSAForm.

168 ↗(On Diff #344375)

I wanted to make this pass as simple as possible at this stage to figure out basic problems. If we support multiple exiting blocks, we could have to consider below things.

  1. Multiple exit conditions
  2. We could need more min/max operations to create new bounds.
  3. Update first loop's exit blocks to preheader of second loop.

There could be something more to be considered.
If possible, I would like to consider multiple exits after finishing to support single exit.

174 ↗(On Diff #344375)

Yep, I will update it with pattern match.

185 ↗(On Diff #344375)

Yep, I will update it.

220 ↗(On Diff #344375)

It is not problem. As I mentioned before, I wanted to make this pass as simple as possible at this stage to figure out basic problems. I will remove this check.

229 ↗(On Diff #344375)

Oops, You are right!!! I will update it.

232 ↗(On Diff #344375)

It was to avoid multiple split candidates. I will add break.

239 ↗(On Diff #344375)

You are right. It is supposed to be same. However, we can not compare the AddRec directly because the AddRec of exit cond is different with split candidate's one because inc++ is followed by exit cond. The start value of AddRec is different as below.

ExitCond AddRec
{1,+,1}<nuw><nsw><%loop>

SplitCandidateCond AddRec
{0,+,1}<nuw><nsw><%loop>

I think we have checked the step of AddRec and need to check its start value and signedness more to say they are same AddRec. I forgot to check the start value. I will add the check. If I missed something or you feel something wrong, please let me know.

244 ↗(On Diff #344375)

Yep, you are right. I will remove this check. I am checking performance impact with this pass. Maybe, I could add something more here.

263 ↗(On Diff #344375)

any_of?
I believe we don't vectorize volatile load/stores (or do we?) Maybe check at least this.

Yep, I will update it.

It looks like the transform is also profitable if one of these instructions is side exiting. We don't vectorize them so far, but if you get rid of it, the loop may become vectorizable.

um... if the condition of the side exit is related to induction variable, we could handle it.

273 ↗(On Diff #344375)

Once the basic version of this pass is stabilized, I would consider it.

337 ↗(On Diff #344375)

Yep, I will update it.

411 ↗(On Diff #344375)

Yep, I will add TODO and try to fix it later.

435 ↗(On Diff #344375)

Yep, I will update it.

jaykang10 updated this revision to Diff 346174.May 18 2021, 7:44 AM

Following the comments of @mkazantsev, updated code.

I want to make a very high level suggestion on this. This isn't really about the code per se, and more about the approach to writing the code.

I'd start with a really trivially transform for the loop:
for (i = 0; i < N; i++) { body }

Build a mechanism to produce a form which looks like so:
for (i = 0; i < N; i++) { body }
if (i != N) {

for (; i < N; i++) { body }

}

This should (rightfully) look fairly odd as the second loop is dead. However, once we have that, iteration splitting becomes much more straight forward. A few observations:

  • The "if (i != N)" is a loop guard and can be identified by getLoopGuardBranch.
  • N is the exit value of the i addrec, and can be gotten from SCEV for any arbitrary AddRec for a known exit count.

Once we have this form, we can restrict the iteration space of the pre-loop without modifying the post loop at all. (Provided we haven't run any optimization in between. A slightly safer form would be to have the guard condition be an unknown value to prevent accidental optimization.)

The next core primitive is a routine which uses an Exact Exit Count (as defined in SCEV today) to *reduce* the number of iterations in a loop. A key thing to note is that mutating existing IR is an optimization, but the routine is always allowed to introduce a new IV and clamp if needed. That helps a lot in making the code robust. Being able to use SCEVAddRecExpr::evaluateAtIteration also helps to simplify things a lot.

The final primitive is to generalize the existing exit count logic to work for when an arbitrary monotonic condition toggles. (There's a bunch of ways of computing an "exit" count for the branch of interest. This is merely one.)

Once we have both of those, we'd

  1. Determine if splitting is worthwhile. Pick a set of branches to eliminate. (Must be able to compute "exit" counts.)
  2. Produce dead loop form
  3. For each branch we want to remove from the pre-loop, compute an "exit" count.
  4. Then constrain the preloop by the umin of all the desired exit counts.
  5. Simplify branches out of pre-loop. Leave all generality of control flow in preloop.

Note carefully what this approach *doesn't* do. It doesn't require the pass (as opposed to scev) to reason about conditions, signed vs unsigned, overflow, or whether two IVs are congruent. It heavily reuses the existing logic in SCEV which (mostly) gets all those cases right already.

This works for any condition which is "monotonic" (e.g. transitions from true to false (or vice versa) at most once in the original iteration space). It does not work for branch conditions which can transition or more times without a bit more generalization.

Where this approach fails a bit is in handling multiple branches to be pruned. As described above, it runs the preloop until *any* of the conditions are hit, and then the post-loop for the remainder. That may or may not have been what was desired.

I think this approach can be used to formulate IRCE, but it requires a bit of care for the case where you don't know an index is in bounds or not on the first iteration.

For a profitability check, I'd start specifically with the case where the condition precisely splits a loop into two halves. e.g. for ... { if (C) { body1 } else { body2 }. This is the easiest to believe is generally profitable, and we can generalize the heuristic selection later.

Thanks for the update, I will try to find some time to wrap my head about it within next few days.

llvm/lib/Transforms/Scalar/SimpleLoopBoundSplit.cpp
62 ↗(On Diff #344375)

It will be something like

ICmpInst::Predicate Pred;
Value *LHS, RHS;
match(condition, m_ICmp(Pred, m_Value(LHS), m_Value(RHS)))

You can grep examples of this in many places, just grep by "m_ICmp". It just simplifies code a lot.

139 ↗(On Diff #344375)

Ah ok, you're right.

239 ↗(On Diff #344375)

Maybe we should assert the preconditions here to make sure we are doing the right thing. :)

263 ↗(On Diff #344375)

It's never possible to say. For example, it may be some call to a method that has its own iteration counter (not related to the IV) and still have very subtle dependence on it that is impossible to fugire out.

273 ↗(On Diff #344375)

Fair enough!

jaykang10 updated this revision to Diff 346698.May 20 2021, 4:27 AM

Following comments of @mkazantsev, updated code.
Fixed some bugs from previous diff.

@reames I appreciate your suggestion.

It looks your suggestion with existing SCEV logic is more general. Once I update this patch with your suggestion, I will let you and @mkazantsev know.

P.S I am feeling unwell after getting covid vaccination jab so it could take some time to implement your suggestion. Please understand it.

llvm/lib/Transforms/Scalar/SimpleLoopBoundSplit.cpp
62 ↗(On Diff #344375)

Ah, Thanks for letting me know. I will update it.

239 ↗(On Diff #344375)

Yep, I will update it.

263 ↗(On Diff #344375)

Ah, Thanks for sharing the info.

@jaykang10 hope you get well, I have same aftermath after 2nd dose. :) Could you please mark patch as "planned changes" so that it doesn't hightlight on my review list?

jaykang10 planned changes to this revision.May 24 2021, 12:46 AM

@jaykang10 hope you get well, I have same aftermath after 2nd dose. :) Could you please mark patch as "planned changes" so that it doesn't hightlight on my review list?

Thanks @mkazantsev I am ok now. I hope you also get well soon. Let me mark this patch as "planned changes".

jaykang10 updated this revision to Diff 347948.May 26 2021, 6:47 AM

Following comments of @reames, updated code.

@reames @mkazantsev I have tried to update this patch following the new suggestion.

I have kept below things from previous patch.

  1. ICmp of conditional branch for AddRec and its step.
  2. Chosen one split candidate to remove conditional branch
  3. If we do not choose one split candidate, we need a logic which conditional branch has the min bound.
  4. The branch condition, which is removed, is changed to true or false to avoid same transformation with same conditional branch during next run of this pass.

This update could not be enough for the new suggestion. If you feel something wrong from this update, please let me know.

jaykang10 updated this revision to Diff 348499.May 28 2021, 5:13 AM

Invalidate cached SE information.

jaykang10 updated this revision to Diff 348500.May 28 2021, 5:33 AM

Fixed typo

mkazantsev added inline comments.May 31 2021, 2:06 AM
llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
48

Instead of this, consider flipping the predicate.

77

General notion: all this code is very hard of read because of all lambdae inlined. Can they be separate functions?

98

You can sink this computation under the condition bound < max.

158

How about checking isSCEVableType for comparison arguments right here?

239
  1. It is already checked by IsProcessableCondBI , you can use cast instead of dyn_cast.
  2. isa<IntegerType> --> SE.isSCEVableType and move it inside of IsProcessableCondBI.
253

That would be more natural to just return BI or nullptr from it (might also require function renaming).

279

What if both of them are nullptr?

mkazantsev added inline comments.May 31 2021, 2:09 AM
llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
279

Ah I see, they will fail the next check.

jaykang10 updated this revision to Diff 348925.Jun 1 2021, 3:47 AM

Following comments of @mkazantsev, updated code.

llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
48

Yep, let me change it.

77

Sorry for inconvenient. I will move them to functions.

98

Yep, I will move it.

158

Yep, I will add checks with isSCEVable(Type) for the comparison arguments.

239

Yep, I will update it.

253

Yep, I will update it.

jaykang10 edited the summary of this revision. (Show Details)Jun 1 2021, 7:51 AM

Looking much better now. Few more nits & a question regarding ne, which is a potential correctness concern.

llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
29

namespace llvm {

89

nit: I'd suggest resturure it as

if (pred != ULE && pred != SLE)
  return false;

this will reduce the nest for the biggest code piece. Just an idea.

109

Please add a TODO to handle ICMP_NE/EQ (I guess it was in the earlier versions of this patch and still good to support, but not necessarily in this revision).

132
ConstantInt *StepCI = dyn_cast<SCEVConstant>(StepRecSCEV)->getValue();
if (!StepCI || !StepCI->isPositive())
  return false;
...
153

Could declare as Value *LHS, *RHS; Just an idea.

158

It's enough to check LHS type and assert on RHS type.

167

canSplit...

168

Consider making params const where suitable.

218

Why is that needed?

222

This is still a very strict limitation I think. I can always split critical edges if you need it. I'm fine if you just add a TODO to consider it in the future.

245

Consider marking params as const where suitable.

mkazantsev added inline comments.Jun 2 2021, 3:33 AM
llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
356

Two points here:

  1. Functional concern. Will NE work ok for step other than 1?
  2. lt generally gives more info to the opt than ne (at least because lt implies ne). Any reason for ne here?
jaykang10 added inline comments.Jun 2 2021, 7:06 AM
llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
29

Yep, I will update it.

89

Yep, I will update it.

109

Yep, I will add ToDo for it.

132

It looked there is no isPositive in ConstantInt. I will update it with isNegative() and isZero(),

153

Yep, I will update it.

158

Yep, I will update it.

167

sorry for inconvenient. I will update it.

168

Yep, let me try to add it.

218

Ah, I have tried to follow the below comment from @reames.

For a profitability check, I'd start specifically with the case where the condition precisely splits a loop into two halves. e.g. for ... { if (C) { body1 } else { body2 }. This is the easiest to believe is generally profitable, and we can generalize the heuristic selection later.

For the condition precisely splits a loop into two halves, I have checked that the conditional branch is in header and the join point is latch. Let me remove them.

222

Yep, I will remove the checks with single predecessor.

245

Yep, I will update it.

356

I have tried to follow below comment from @reames.

The "if (i != N)" is a loop guard and can be identified by getLoopGuardBranch.

I was also not sure whether ne is better than lt here...

jaykang10 updated this revision to Diff 349266.Jun 2 2021, 7:38 AM

Following comments from @mkazantsev, updated code.

  • The ne code will be updated after more discussion.
jaykang10 added inline comments.Jun 2 2021, 8:14 AM
llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
132

Sorry, I have found a test which has StepRecSCEV is not SCEVConstant. In this case, the nullptr->getValue() causes segment fault. I will re-update it.

jaykang10 updated this revision to Diff 349291.Jun 2 2021, 8:54 AM

Fixed a bug

mkazantsev added inline comments.Jun 6 2021, 8:47 PM
llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
356

I heard (and this info may be imprecise, out-of-date etc) that some parts of vectorization treat ne as its canonical form and don't recognize lt. As for me, it should never be a problem (lt always implies ne), but in practice it could be, just because how things are written now. I was asking if you are trying to deal with one of such cases, or it doesn't really matter. If it doesn't, lt is definitely better because it gives more info.

I'm getting incline towards accepting this, but the fact that you keep finding bugs worries me. Please give me some time, I will run a corpus of Fuzzer tests with this pass locally to see if something breaks. If yes, I will try to give you a reproducer.

mkazantsev accepted this revision.Jun 6 2021, 11:54 PM

My fuzzer didn't reveal any new failures, so there is a chance it's working as expected. :) LGTM with some nits.

llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
33

Please lock it in anonymous namespace (see how it's done e.g. in InstSimplifypass.cpp), https://llvm.org/docs/CodingStandards.html#anonymous-namespaces

424

Please delete this.

This revision is now accepted and ready to land.Jun 6 2021, 11:54 PM
mkazantsev added inline comments.Jun 6 2021, 11:57 PM
llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
424

UPD: never mind, I didn't notice it was used for begug print.

My fuzzer didn't reveal any new failures, so there is a chance it's working as expected. :) LGTM with some nits.

Thanks for your kind help! @mkazantsev :)

I have tried to enable this pass in the pipeline of new pass manager after loop unroll experimentally.

if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink || !PGOOpt ||
    PGOOpt->Action != PGOOptions::SampleUse)
  LPM2.addPass(LoopFullUnrollPass(Level.getSpeedupLevel(),
                                  /* OnlyWhenForced= */ !PTO.LoopUnrolling,
                                  PTO.ForgetAllSCEVInLoopUnroll));
LPM2.addPass(LoopBoundSplitPass());

I found some failures from it and I fixed them. At this moment, there is no failure from llvm-test-suite and spec bencharmarks with enabling this pass as above.

Maybe, I will try to enable this pass in the pipeline of new pass manager later. If you have a idea about the location of this pass, please let me know. It will be very helpful.

After updating code following your comments, I will push this patch.

llvm/lib/Transforms/Scalar/LoopBoundSplit.cpp
33

Yep, I will update it.

jaykang10 updated this revision to Diff 350208.Jun 7 2021, 2:40 AM

Following comments of @mkazantsev, updated code.

This revision was landed with ongoing or failed builds.Jun 7 2021, 2:56 AM
This revision was automatically updated to reflect the committed changes.
fhahn added a subscriber: fhahn.Jun 7 2021, 3:25 AM

I'm seeing crashes when trying to build 471.omnetpp with -O3 -flto on X86 when running the pass just before the vectorizer (as below). Please take a look.

diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index b07f966e3b7e..a3e3ed093ecb 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -1195,6 +1195,7 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
 /// TODO: Should LTO cause any differences to this set of passes?
 void PassBuilder::addVectorPasses(OptimizationLevel Level,
                                   FunctionPassManager &FPM, bool IsLTO) {
+  FPM.addPass(createFunctionToLoopPassAdaptor(LoopBoundSplitPass()));
   FPM.addPass(LoopVectorizePass(
       LoopVectorizeOptions(!PTO.LoopInterleaving, !PTO.LoopVectorization)));

I'm seeing crashes when trying to build 471.omnetpp with -O3 -flto on X86 when running the pass just before the vectorizer (as below). Please take a look.

diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index b07f966e3b7e..a3e3ed093ecb 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -1195,6 +1195,7 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
 /// TODO: Should LTO cause any differences to this set of passes?
 void PassBuilder::addVectorPasses(OptimizationLevel Level,
                                   FunctionPassManager &FPM, bool IsLTO) {
+  FPM.addPass(createFunctionToLoopPassAdaptor(LoopBoundSplitPass()));
   FPM.addPass(LoopVectorizePass(
       LoopVectorizeOptions(!PTO.LoopInterleaving, !PTO.LoopVectorization)));

Ah, Thanks @fhahn! I will have a look.

Allen added a subscriber: Allen.Feb 6 2023, 6:23 PM
Allen added inline comments.
llvm/include/llvm/Transforms/Scalar/LoopBoundSplit.h
28

hi @jaykang10:

Excuse me, as *while (iv < n)* will guard the split loop body doesn't execute, so the insert condition *if (iv != n)* is good for performance?
Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2023, 6:23 PM
Herald added a subscriber: StephenFan. · View Herald Transcript

I add a MR, try to remove the loop guard in loop guard D143705.