Since there is no sdiv in SCEV, an 'udiv' is a better canonical form than an 'sdiv' as the user of induction variable
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Transforms/Utils/SimplifyIndVar.cpp | ||
---|---|---|
289–291 | maybe I should also copy the 'exact' flag? |
Hi Hongbin,
I am not sure what you are trying to achieve here. This kernel is already optimized by -instcombine to
define void @test(i32* %a) { entry: br label %for.body for.body: ; preds = %for.body, %entry %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.body ] %div = shl nsw i32 %i.01, 5 %idxprom = sext i32 %div to i64 %arrayidx = getelementptr inbounds i32, i32* %a, i64 %idxprom store i32 %i.01, i32* %arrayidx, align 4 %inc = add nsw i32 %i.01, 1 %cmp = icmp slt i32 %i.01, 63 br i1 %cmp, label %for.body, label %for.end for.end: ; preds = %for.body ret void }
So the sdiv is replaced by a lsh, taking into account the information that sdiv is working on unsigned inputs only.
If you divide by a non-power-of-two, instcombine does not change the code and leaves the sdiv in. However, if you would like to move to an sdiv even in this case this should probably be added to instcombine.
I would also be interested to learn why an 'udiv' is a better canonical form than an 'sdiv'. Is it faster to execute (non-power-of-two) udivs on FPGAs than sdivs?
Hi Tobias
This is based on the discussion in https://groups.google.com/d/topic/llvm-dev/HKK9ffG-dIU/discussion
define void @test(i32* %a) { entry: br label %for.body for.body: ; preds = %for.body, %entry %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.body ] %div = shl nsw i32 %i.01, 5 %idxprom = sext i32 %div to i64 %arrayidx = getelementptr inbounds i32, i32* %a, i64 %idxprom store i32 %i.01, i32* %arrayidx, align 4 %inc = add nsw i32 %i.01, 1 %cmp = icmp slt i32 %i.01, 63 br i1 %cmp, label %for.body, label %for.end for.end: ; preds = %for.body ret void }So the sdiv is replaced by a lsh, taking into account the information that sdiv is working on unsigned inputs only.
If you divide by a non-power-of-two, instcombine does not change the code and leaves the sdiv in. However, if you would like to move to an sdiv even in this case this should probably be added to instcombine.
Yes, see below.
I would also be interested to learn why an 'udiv' is a better canonical form than an 'sdiv'. Is it faster to execute (non-power-of-two) udivs on FPGAs than sdivs?
I am focusing on the induction variable and its users here. I am trying to get an AddRec from arrayidx.
However, there is no sdiv in SCEV, and SCEV will give up when it see a sdiv. From this perspective udiv is better then sdiv. I think I should explicitly explain this in the summary
Let me also look at instcombine, the original motivation to put it here is mainly because SimplifyIndVar use SCEV, which provide better range knowledge about induction variables.
The relevant function is visitSDiv in InstCombineMulDivRem.cpp. Maybe we have indeed insufficient information in instcombine.
test/Transforms/IndVarSimplify/replace-sdiv-by-udiv.ll | ||
---|---|---|
11 | instcombine (which use computeKnownBits) is not able to prove the status of the signbit of %i.01 here |
Comments inline.
You can also do this same rewrite within SCEV. That is, try to have getSCEV(sdiv instruction) return a SCEVUDivExpr if legal. However, given that udiv is more canonical, this patch is good too if it solves your problem.
lib/Transforms/Utils/SimplifyIndVar.cpp | ||
---|---|---|
38 | s/eliminated/converted to unsigned division/ | |
275 | This seems unnecessary - why do you not care about optimizing %val = sdiv <non-iv-known-positive>, <iv> | |
279 | Use auto *. | |
284 | I'd use N and D instead of S and X. | |
289 | Use auto *. | |
297 | Why not push_back? |
This also sounds reasonable and I can also look at that after this.
Just a quick question: are we suppose to use getSCEVAtScope in getSCEV?
However, given that udiv is more canonical, this patch is good too if it solves your problem.
Not in general. The SCEV expressions you return from getSCEV need to be context insensitive.
However, given that udiv is more canonical, this patch is good too if it solves your problem.
s/eliminated/converted to unsigned division/