This is an archive of the discontinued LLVM Phabricator instance.

[LoopIdiom] Introduce LoopNestIdiomRecognize as an alternative
Needs Review · Public

Authored by eopXD on Jun 12 2021, 5:43 AM.

Details

Reviewers
Whitney
qianzhen
Summary

This LoopNestPass version of LoopIdiomRecognize (LIR) is currently identical
to the LoopPass version. The newly added pass is registered in PassRegistry as
"loop-nest-idiom".

The LoopNest structure makes optimizations that insert runtime checks for a
nested loop easier to apply. D104636 is an example of a use case for this
newly created pass.

Diff Detail

Event Timeline

eopXD created this revision. Jun 12 2021, 5:43 AM
eopXD requested review of this revision. Jun 12 2021, 5:43 AM
Herald added a project: Restricted Project. Jun 12 2021, 5:43 AM
eopXD edited the summary of this revision. Jun 12 2021, 5:48 AM
fhahn added a subscriber: fhahn. Jun 12 2021, 6:05 AM

Following patches will utilize the LoopNest structure for more efficient optimization.

Would it be possible to provide some data on the efficiency improvements?

eopXD added a comment. Jun 12 2021, 6:12 AM

Hi @fhahn,
For example, suppose we have a perfectly nested pair of loops and have matched a store in the inner loop that can be hoisted to the inner loop's header as a strided store.
We can then check whether the outer loop's SCEV keeps the memory access contiguous and, if so, hoist the operation one more loop level outward.
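
As a source-level illustration of the idea in this comment (a sketch, not code from the patch): the function names, the `rowStride` parameter, and the explicit runtime check below are assumptions made for the example. The inner loop's stores amount to a memset of one row; if a runtime check shows that consecutive rows are adjacent in memory, the outer loop can collapse into a single call as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Original perfect loop nest: zero `rows` rows of `cols` ints each, where
// consecutive rows start `rowStride` ints apart (hypothetical layout).
void zeroNest(int *a, std::size_t rows, std::size_t cols,
              std::size_t rowStride) {
  assert(rowStride >= cols && "rows must not overlap");
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      a[i * rowStride + j] = 0;
}

// Hand-written equivalent of the two promotion steps described above.
void zeroNestPromoted(int *a, std::size_t rows, std::size_t cols,
                      std::size_t rowStride) {
  assert(rowStride >= cols && "rows must not overlap");
  if (rowStride == cols) {
    // Runtime check passed: rows are contiguous, so one memset covers the
    // whole nest. (Overflow of the size computation is ignored here.)
    std::memset(a, 0, rows * cols * sizeof(int));
    return;
  }
  // Fallback: only the inner loop is replaced, one memset per row.
  for (std::size_t i = 0; i < rows; ++i)
    std::memset(a + i * rowStride, 0, cols * sizeof(int));
}
```

Both functions leave the buffer in the same state; the promoted form only changes how many memset calls are made, which is where a LoopNest-level view of both loops would help.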

fhahn added a comment. Jun 12 2021, 7:48 AM

Ah I see, so you are planning to add additional optimizations based on LoopNest? If so, I think it would also be good to share a patch that implements such an additional optimization, so that there is a clear path towards concrete improvements. It would also show why using LoopNest is needed/beneficial.

eopXD updated this revision to Diff 351666. Jun 12 2021, 9:14 AM

fix some clang-format

eopXD updated this revision to Diff 351704. Jun 13 2021, 2:38 AM

fix some clang-format

eopXD added a comment. Jun 13 2021, 5:55 AM

Sure, I will follow up with the optimization that utilizes LoopNest and describe its benefits and improvements.
Thank you for the comments.

When the loop idiom transformation processes a memset in a loop, it currently handles only memsets with a compile-time constant size. The motivation of this work is to relax this limitation, so that a memset with a variable size in a loop may still be processed and promoted to a larger memset if it passes all the eligibility checks. Performance-wise, promoting the memset in a loop to a larger memset reduces the number of calls to memset, thereby reducing the overall call overhead.
A similar technique may also apply to a memcpy with a variable size in a loop.
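
To make the promotion concrete, here is a minimal source-level sketch, assuming the simplest eligible case: each memset starts exactly where the previous one ended. The function names and the parameters `m` and `n` are hypothetical, and the sketch ignores checks a real transformation would need (for example, overflow of `m * n`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Before: one memset call per iteration; `n` is known only at run time.
void clearLoop(unsigned char *buf, std::size_t m, std::size_t n) {
  assert(buf != nullptr);
  for (std::size_t i = 0; i < m; ++i)
    std::memset(buf + i * n, 0, n);
}

// After: the stride between calls equals the size of each call, so the
// m small calls can be replaced by a single call of m * n bytes.
void clearPromoted(unsigned char *buf, std::size_t m, std::size_t n) {
  assert(buf != nullptr);
  std::memset(buf, 0, m * n);
}
```

The two functions write the same bytes; the second makes one library call instead of `m`.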

Whitney added inline comments. Jun 25 2021, 7:05 AM
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
347

Function *F = LN.getParent();
const auto *DL = &F->getModule()->getDataLayout();
OptimizationRemarkEmitter ORE(F);

eopXD updated this revision to Diff 354660. Jun 26 2021, 2:17 AM

Address comments.

eopXD marked an inline comment as done. Jun 26 2021, 2:19 AM
Whitney added a project: Restricted Project. Dec 1 2021, 8:27 AM
eopXD retitled this revision from [NFC] [LoopIdiom] [LoopNest] Create LoopIdiomRecognize as a LoopNestPass to [NFC][LoopIdiom][LoopNest] Create LoopIdiomRecognize as a LoopNestPass. Dec 14 2021, 3:39 AM
eopXD updated this revision to Diff 394255. Dec 14 2021, 8:01 AM

Rebase to main.

eopXD updated this revision to Diff 394762. Dec 15 2021, 11:56 PM

Rebase and update test case.

eopXD added inline comments. Dec 15 2021, 11:58 PM
llvm/test/Transforms/LoopIdiom/memset-runtime-64bit.ll
304 ↗(On Diff #394762)

The IR changed because the test's pass pipeline was adjusted. This patch is still NFC.

eopXD edited the summary of this revision. Dec 16 2021, 12:05 AM
eopXD retitled this revision from [NFC][LoopIdiom][LoopNest] Create LoopIdiomRecognize as a LoopNestPass to [LoopIdiom] Introduce LoopNestIdiomRecognize as an alternative. Dec 16 2021, 12:07 AM