This is an archive of the discontinued LLVM Phabricator instance.

[IndVarS] Keep the nsw/nuw flags after simplifyAndExtend
Needs ReviewPublic

Authored by guopeilin on Dec 24 2021, 5:31 PM.

Diff Detail

Event Timeline

guopeilin created this revision.Dec 24 2021, 5:31 PM
guopeilin requested review of this revision.Dec 24 2021, 5:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 24 2021, 5:31 PM

With the help of nsw flag in keep-nsw-nuw-flag.ll, this loop which cannot be vectorized due to CouldNotCompute BackedgeTakenCount now can be vectorized.
We can use opt -indvars keep-nsw-nuw-flag.ll -S | opt -loop-vectorize -S to verify this.
Also, if we just use -indvars -loop-vectorize back to back, this loop can be vectorized without this patch. And I found out that the reason is that nsw flag still remains when we try to vectorize this loop. That is some cached value is reused during scalar-evolution analysis for loop-vectorize pass.
During the pipeline, we may call forgetLoop in many places, thus the cached value may be cleared. So I guess it is better to keep the nsw or nuw flags explicitly after widen IV.

nikic added a comment.Dec 25 2021, 8:29 AM

Can you explain why SCEVExpander does not preserve the nowrap flag in this case? Assuming it is present on the wide IV, I would have expect it to also get expanded as such.

Also, it looks like you did not regenerate all affected tests.

guopeilin updated this revision to Diff 396501.Dec 29 2021, 1:05 AM

fix the affected tests

Can you explain why SCEVExpander does not preserve the nowrap flag in this case? Assuming it is present on the wide IV, I would have expect it to also get expanded as such.

Sorry for the late reply. The SCEVExpander(Rewriter) is used to create the widen IV and its widenIVUse(). As for the IV increment, we just use the following code:

WideInc =
      cast<Instruction>(WidePhi->getIncomingValueForBlock(LatchBlock));

Of course in this way, we cannot preserve the nowrap flag cause both the WidePhi and the IncomingValueForBlock do not contain the flag.
I guess the reason why we don`t need to preserve this flag previously is that the AddRec is computed from the OrigPhi, which is like the following:

{((sext i32 %arg1 to i64) + (sext i32 %arg2 to i64)),+,(sext i32 %arg2 to i64)}<nsw><%body>

So, at this moment, it does not matter whether the increased instruction contains the flag because the SCEV is right.
However, during the optimization pipeline, we may call the SE->forgetLoop() to drop the cache value and recompute from scratch. At that moment, since we have lost the NSW flag, then the BackedgeTakenCount would be CouldNotCompute, which will prevent vectorizing.
With this patch, function s122 and function s172 in TSVC now can be vectorized. Following is the source code of s122:

real_t s122(struct args_t * func_args)
{
//    induction variable recognition
//    variable lower and upper bound, and stride
//    reverse data access and jump in data access
    struct{int a;int b;} * x = func_args->arg_info;
    int n1 = x->a;
    int n3 = x->b;
    initialise_arrays(__func__);
    int j, k;
#pragma clang loop vectorize(assume_safety)
    for (int nl = 0; nl < iterations; nl++) {
        j = 1;
        k = 0;
        for (int i = n1-1; i < LEN_1D; i += n3) {
            k += j;
            a[i] += b[LEN_1D - k];
        }
    }
}

Also, I guess it is ok that if an i32-IV has nowrap flag, its corresponding widen-IV has the same nowrap flag.

nikic added a comment.Dec 29 2021, 9:55 AM

@guopeilin This doesn't really answer my question. You say that the SCEV for the wide IV at this point is {((sext i32 %arg1 to i64) + (sext i32 %arg2 to i64)),+,(sext i32 %arg2 to i64)}<nsw><%body>, which has the <nsw> flag. My baseline expectation would be that SCEVExpander will also add the nsw flag to the expanded IR (see the code around https://github.com/llvm/llvm-project/blob/015ff729cb90317e4e75cf48b1e5dd7850f0cbd0/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L1300-L1304). My question is why that doesn't happen, because making sure SCEVExpander propagates the flag from SCEV to IR seems like the more principled way to address this problem.

@guopeilin This doesn't really answer my question. You say that the SCEV for the wide IV at this point is {((sext i32 %arg1 to i64) + (sext i32 %arg2 to i64)),+,(sext i32 %arg2 to i64)}<nsw><%body>, which has the <nsw> flag. My baseline expectation would be that SCEVExpander will also add the nsw flag to the expanded IR (see the code around https://github.com/llvm/llvm-project/blob/015ff729cb90317e4e75cf48b1e5dd7850f0cbd0/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L1300-L1304). My question is why that doesn't happen because making sure SCEVExpander propagates the flag from SCEV to IR seems like the more principled way to address this problem.

Sorry that I misunderstand your question before. And I guess the code you show here may be the exact root reason for the omission of nowarp flags. I have dump the SCEV value of OpAfterExtend and ExtendAfterOp within the function IsIncrementNSW(https://github.com/llvm/llvm-project/blob/015ff729cb90317e4e75cf48b1e5dd7850f0cbd0/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L1142-L1144), which is like the following:

OpAfterExtend = {((sext i32 %arg1 to i128) + (sext i32 %arg2 to i128)),+,(sext i32 %arg2 to i128)}<nw><%body>
ExtendAfterOp = {(sext i64 ((sext i32 %arg1 to i64) + (sext i32 %arg2 to i64)) to i128),+,(sext i32 %arg2 to i128)}<nsw><%body>

They are different from each other, which makes function IsIncrementNSW return false, and thus SCEVExpander stops propagating the flag.
I will keep debugging it and to see the reason for the difference and try to upstream a new patch. Please wait.