This is an archive of the discontinued LLVM Phabricator instance.

[IRCE] Do not transform is loop has small number of iterations
ClosedPublic

Authored by skatkov on Oct 16 2020, 4:23 AM.

Details

Summary

IRCE has some overhead for runtime checks and in case number of iteration is small
the overhead can kill the benefit from optimizations.

This CL bases on BlockFrequencyInfo of pre-header and header to estimate the
number of loop iterations. If it is less than irce-min-estimated-iters we do not transform the loop.

Probably it is better to make more complex cost model but for simplicity it seems the be enough.

The usage of BFI is added only for new pass manager and tries to use it efficiently.

Diff Detail

Event Timeline

skatkov created this revision.Oct 16 2020, 4:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 16 2020, 4:23 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
skatkov requested review of this revision.Oct 16 2020, 4:23 AM
mkazantsev added inline comments.Oct 18 2020, 9:15 PM
llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
120

I think the name should hint that this is profile-based, e.g. "MinRuntimeLoopIterations" or so. We also might want to have another boolean flag to turn this check off, because some users of IRCE might not have a profile.

244

Why not Optional<BlockFrequencyInfo>?

1799

If you stop updating this thing after pre-opt preparations, does it make sense to limit its scope? There is a temptation to use CFGChanged variable in the end to not drop some analysis if it did not happen.

1911

profitability

1912

"the estimated number of iterations basing on frequency info"?

llvm/test/Transforms/IRCE/low-iterations.ll
2

Pass the frequency parameter value directly rather than rely on the default value.

skatkov added inline comments.Oct 18 2020, 9:36 PM
llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
120

The value 0 switches off this check.

244

I'd like to avoid generation of BFI result if I will not use it, So it is a lazy accessor to BFI.

mkazantsev added inline comments.Oct 18 2020, 10:06 PM
llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
120

Okay.

ebrevnov added a comment.EditedOct 18 2020, 10:31 PM

I think it makes sense to introduce the proposed heuristic since there is no easy way to estimate cost of the code generated by SCEVExpander at the moment.

FYI, there is on going work (https://reviews.llvm.org/D75980) which enables vectorizer to evaluate cost of the code SCEVExpander "would" generate. Looks like there is an additional use case for that. Also this may be useful for LoopPredication since it does similar thing to IRCE. Taking that we might need to make this functionality publicly available. @fhahn what do you think?

Thanks
Evgeniy

llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
1905

There is an existing utility function inside the vectorizer (getSmallBestKnownTC) which estimates loop's trip count. Can this be used (if made public) instead of BFI?

skatkov added inline comments.Oct 19 2020, 2:43 AM
llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
1905

The utility you mentioned uses SE and BPI on latch only to estimate the trip count.

IRCE has already estimation basing on BPI of latch. It is not enough if the main exit from the loop is not a latch like in the test I've added to this CL.

May be for vectorizer it is ok to be based on latch but not for IRCE.

skatkov updated this revision to Diff 298969.Oct 19 2020, 2:44 AM

Handled Max's comments.

mkazantsev accepted this revision.Oct 19 2020, 4:23 AM

LGTM with a nit.

llvm/test/Transforms/IRCE/low-iterations.ll
6

CHECK-NOT-NO?

This revision is now accepted and ready to land.Oct 19 2020, 4:23 AM
ebrevnov added inline comments.Oct 19 2020, 6:56 AM
llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
1905

The utility you mentioned uses SE

That's right, currently it does use of SE. Since it aims at returning best known TC I don't see any problems to make SE optional.

and BPI on latch only to estimate the trip count.

In fact, it doesn't use BPI. It checks for profile info directly. I take this another reason to have one source of estimation for loop's TC.
If BPI available it would be preferable to relay on it instead of profile info. If BFI available, there is no need to use BPI since it is completely based on BPI. If we know exact trip count from SE know need to use BFI.

IRCE has already estimation basing on BPI of latch. It is not enough if the main exit from the loop is not a latch like in the test I've added to this CL.

If the above suggestion looks too complex I would still want us to merge the logic at least inside IRCE. If BFI is available I don't see why we should even try to do an estimation based on BPI for the latch inside parseLoopStructure.

May be for vectorizer it is ok to be based on latch but not for IRCE.

skatkov added inline comments.Oct 19 2020, 8:02 PM
llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
1905

I'll do a follow-up patch for this.