Vectorizer has to make the decision for every mem inst either to
interleave, GatherScatter or Scalarize.
getMemInstScalarizationCost calls 'getAddressAccessSCEV' for access SCEV, which
will be sent to the target to estimate the address calculation cost.
getAddressAccessSCEV: Looks for a 'GEP' with all loop invariant indices except
for one which should be an induction variable(PHI node).
There are cases where GEP has non-phinode(like scalarization-fix.ll), in those
cases it is given a higher cost.
This is the motivating test case(scalarization-fix.ll) for this change.
; void fun(int* restrict a, int *b, int N) {
; for (int i = 1; i < N; i += 2)
; a[i] = a[i - 1] + b[i];
; }
The orginal scalarization costs for
a[i-1]: 29
b[i] : 9
a[i] : 9
Though the accesses are simple the cost for a[i-1] is 3x higher.
The address GEP corresponding to
a[i-1]:
%indvars.iv = phi i64 [ 1, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
%1 = add nsw i64 %indvars.iv, -1
%arrayidx = getelementptr inbounds i32, i32* %a, i64 %1
a[i]:
%indvars.iv = phi i64 [ 1, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
%arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
b[i]:
%indvars.iv = phi i64 [ 1, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
To compute the "address calculation cost" SCEVs of the access are obtained and passed on to the target(TTI). Target assigns higher cost if the access is not strided(not AddRec, refer: X86TargetTransformInfo.cpp:getAddressComputationCost) a[i-1] is certainly a strided access, it should get lower cost.
'LoopVectorize.cpp:getMemInstScalarizationCost' call 'LoopVectorize.cpp:getAddressAccessSCEV' for the access SCEV. While 'getAddressAccessSCEV' expects the GEPs to always have loop-invariants and index variables.
Example:
b[i]:
%indvars.iv = phi i64 [ 1, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
Because a[i-1]'s GEP has '%add' which is not an induction variable it returns a 'nullptr' for SCEV, further it is not considered a strided-access and given a higher cost.
The proposed change:
Instead of looking for invariants and induction vars in GEP, we get the SCEV of GEP and look for the variant operands. If all the operands are invariant the SCEV is good enough to pass on to the target.
New scalarization costs:
a[i-1]: 9
b[i] : 9
a[i] : 9
I would like your opinions on the correctness of this patch because "gather-cost.ll" is getting vectorized with this change originally, it is not expected to get vectorized.