I like to propose a new loop multi versioning optimization for LICM.
Cases where memory aliasing wasn't sure compiler became conservative and
didn't proceeds with invariant code motion. This results in possible
missed optimization opportunities.
Our main motivation is to exploit such cases and allow LICM optimization.
Most of the time when alias analysis is unsure about memory access and it
assumes may-alias. This un surety from alias analysis restrict LICM to proceed
further. Cases where alias analysis is unsure we like to use loop versioning
as an alternative.
Loop Versioning will creates version of the loop with aggressive alias and
the other with conservative (default) alias. Aggressive alias version of loop
will have all the memory access marked as no-alias. These two version of loop
will be preceded by a memory runtime check. This runtime check consists of bound
checks for all unique memory accessed in loop, and it ensures aliasing of memory.
Based on this check result at runtime any of the loops gets executed, if memory
is non aliased then aggressive aliasing loop gets executed, else when memory is
aliased then non aggressive aliased version gets executed.
Following are the top level steps:
- Perform loop versioning feasibility check.
- If loop is a candidate for versioning then create a memory bound check, by considering all the memory access in loop body.
- Clone original loop and set all memory access as no-alias in new loop.
- Set original loop & versioned loop as a branch target of runtime check result.
- Call LICM::promoteLoopAccessesToScalars on aggressive alias versioned of loop.
Consider following test:
1 int foo(int * var1, int * var2, int * var3, unsigned itr) {
2 unsigned i = 0, j = 0;
3 for(; i < itr; i++) {
4 for(; j < itr; j++) {
5 var1[j] = itr + i;
6 var3[i] = var1[j] + var3[i];
7 }
8 }
9 }
At line #6 store to var3 can be moved out by LICM(promoteLoopAccessesToScalars)
but because of alias analysis un surety about memory access it unable to move it out.
After Loop versioning IR:
<Versioned Loop>
for.body3.loopVersion: ; preds = %for.body3.loopVersion.preheader, %for.body3.loopVersion
%indvars.iv.loopVersion = phi i64 [ %indvars.iv.next.loopVersion, %for.body3.loopVersion ], [ %2, %for.body3.loopVersion.preheader ]
%arrayidx.loopVersion = getelementptr inbounds i32* %var1, i64 %indvars.iv.loopVersion
store i32 %add, i32* %arrayidx.loopVersion, align 4, !tbaa !1, !alias.scope !11, !noalias !11
%indvars.iv.next.loopVersion = add nuw nsw i64 %indvars.iv.loopVersion, 1
%lftr.wideiv.loopVersion = trunc i64 %indvars.iv.loopVersion to i32
%exitcond.loopVersion = icmp eq i32 %lftr.wideiv.loopVersion, %0
br i1 %exitcond.loopVersion, label %for.inc11.loopexit38, label %for.body3.loopVersion
<Original Loop>
for.body3: ; preds = %for.body3.lr.ph, %for.body3
%indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ %2, %for.body3.lr.ph ]
%arrayidx = getelementptr inbounds i32* %var1, i64 %indvars.iv
store i32 %add, i32* %arrayidx, align 4, !tbaa !1
%8 = load i32* %arrayidx7, align 4, !tbaa !1
%add8 = add nsw i32 %8, %add
store i32 %add8, i32* %arrayidx7, align 4, !tbaa !1
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%lftr.wideiv = trunc i64 %indvars.iv to i32
%exitcond = icmp eq i32 %lftr.wideiv, %0
br i1 %exitcond, label %for.inc11, label %for.body3
In versioned loop difference is visible, 1 store has moved out.
Following are some high level details about current implementation:
LoopVersioning
LoopVersioning is main class which holds multi versioning functionality.
LoopVersioning :: isLegalForVersioning
Its member to ‘LoopVersioning’
Does feasibility check for loop versioning.
a) Checks layout of loop.
b) Instruction level check.
c) memory checks.
LoopVersioning :: versionizeLoop
a) Clone original loop
b) Create a runtime memory check.
c) Add both loops under runtime check results target.
In this patch used maximum loop nest threshold as 2, and maximum number
of pointers in runtime memory check as 5, also these threshold are controlled
by command line.
During feasibility check, we compare possible invariant code motion with total
instruction of loop. If the comparison goes beyond certain threshold limit and
found profitable then only we do loop versioning.
Requesting to go through patch for detailed approach.
Suggestions are comments are welcome.