This is an archive of the discontinued LLVM Phabricator instance.

[IndVars] Fold int->fp->int cast for small PHI node
AbandonedPublic

Authored by Allen on May 17 2022, 6:29 AM.

Details

Summary

rewriteNonIntergerIVs will attemps to transform the floating-point recurrences to integer recurrences.
so int->fp->int will generate when the original floating-point recurrences need cast type integer.
ie. Fold (fptosi (sitofp iN to float)) to i32 --> i32 iN

Fixes https://github.com/llvm/llvm-project/issues/55505.

Diff Detail

Event Timeline

Allen created this revision.May 17 2022, 6:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2022, 6:29 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
Allen requested review of this revision.May 17 2022, 6:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2022, 6:29 AM

Specifically detecting this exact case doesn't really seem very useful... we should be leveraging some sort of range analysis (known bits, CorrelatedValuePropagation, etc.)

Allen updated this revision to Diff 435835.Jun 10 2022, 1:38 AM
Allen added a comment.Jun 10 2022, 1:41 AM

Specifically detecting this exact case doesn't really seem very useful... we should be leveraging some sort of range analysis (known bits, CorrelatedValuePropagation, etc.)

Thanks @efriedma very much, I updated with computeKnownBits according your idea.

efriedma added inline comments.Jun 10 2022, 12:12 PM
llvm/lib/Analysis/ValueTracking.cpp
1532 ↗(On Diff #435835)

I'm very confused by this code.

You're taking the binary operator, checking if the first use in the use list is an icmp, then... assuming the result of the icmp somehow constraints the result of the add?

If you need some sort of reasoning about control flow, maybe look at LazyValueInfo?

llvm/test/Transforms/InstCombine/sitofp.ll
228 ↗(On Diff #435835)

We probably want a few more testcases like this to verify the exact boundary conditions.

Allen added inline comments.Jun 10 2022, 8:35 PM
llvm/lib/Analysis/ValueTracking.cpp
1532 ↗(On Diff #435835)

Thanks @efriedma very much. I'll take a look at LazyValueInfo. Since I'm not familiar with this module, more detailed guidance is welcome.

In current implementation, by using matchSimpleRecurrence, we can know that the initial value of the %indvar.phi is R, and the step is L. Then, according to the icmp operand, we know that the end value is KnownEnd, that is, the current value range is [KnownEnd. Known2]. Therefore, if we can know that the most significant n bits of both KnownEnd and Known2 are 0, the most significant n bits of the currently analyzed value %indvar.phi are also necessarily 0.

for.body:                                         ; preds = %for.body, %entry
  %indvar.phi = phi i32 [ 16777215, %entry ], [ %dec.int, %for.body ]
   ...
  %dec.int = add nsw i32 %indvar.phi, -1
  %cmp = icmp ugt i32 %dec.int, 0
  br i1 %cmp, label %for.body, label %cleanup
spatel added inline comments.Jun 13 2022, 1:55 PM
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
1757–1758 ↗(On Diff #435835)

This part is mentioned in the TODO right under here (update that...) and independent of any control-flow changes, so you could remove the changes to ValueTracking and just focus on this part.

efriedma added inline comments.Jun 13 2022, 3:52 PM
llvm/lib/Analysis/ValueTracking.cpp
1532 ↗(On Diff #435835)

You're missing a bunch of steps here. matchSimpleRecurrence only checks that the PHI has an invariant start, and increases by some "step" each iteration. It doesn't check that the step is loop-invariant, and it doesn't check that the loop's trip count is actually related to the PHI at all.

It might be possible to add the additional checks necessary to make this work here, directly, but it's a lot of complicated code.

Instcombine normally doesn't compute LazyValueInfo or ScalarEvolution; a solution involving those probably means moving the transform somewhere else. Maybe CorrelatedValuePropagation.

Allen added inline comments.Jun 13 2022, 8:28 PM
llvm/lib/Analysis/ValueTracking.cpp
1532 ↗(On Diff #435835)

Thanks @efriedma very much for your detail explanation.
I find https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp#L263 is addressing the PHI node, and if I move the transform here, does the above two check still need or we can use a more simple API to address that ? (make sure step is loop-invariant, loop's trip count is related to the PHI).

efriedma added inline comments.Jun 14 2022, 2:45 PM
llvm/lib/Analysis/ValueTracking.cpp
1532 ↗(On Diff #435835)

You'd want to compute the actual value range in LazyValueInfo, not CorrelatedValuePropagation itself. I think LazyValueInfoImpl::solveBlockValuePHINode should do that already for your testcase? It at least has the right kind of logic.

Allen updated this revision to Diff 437074.Jun 15 2022, 1:24 AM

Add checking the step is loop-invariant, and its loop's trip count is actually related to the PHI

Allen added inline comments.Jun 15 2022, 1:37 AM
llvm/lib/Analysis/ValueTracking.cpp
1532 ↗(On Diff #435835)

Thanks @efriedma for your patience.
when I look into the LazyValueInfoImpl::solveBlockValuePHINode (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/LazyValueInfo.cpp#L721), I find it also analyzes the PN->getIncomingBlock(i) one by one.
ie, the relationship between the current PHI node and the loop trip is not associated.

Therefore, I directly add the both check on the original basis. (Check the step is loop-invariant, and its loop's trip count is related to current PHI)

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
1757–1758 ↗(On Diff #435835)

Apply your comment, thanks @spatel

nikic added a subscriber: nikic.Jun 15 2022, 2:29 AM
nikic added inline comments.
llvm/lib/Analysis/ValueTracking.cpp
1536 ↗(On Diff #437074)

This doesn't seem to be taking the predicate of the icmp into account?

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
1757–1758 ↗(On Diff #435835)

It doesn't look like this comment was applied: You need to create a separate patch that includes only this change, without any changes to ValueTracking.

nikic requested changes to this revision.Jun 15 2022, 4:01 AM

I'm not strongly opposed to adding some basic IV reasoning to computeKnownBits(), but the implementation needs to be much more careful. I think there are at least three bugs here (missing check for predicate, no check whether this is a loop exit or continue condition, no reasoning about how the start value range relates to the end value). This requires much more extensive testing (and those tests should be simpler, not involving float operations).

If you want to add this, you'd probably want to start over with just handling a canonical IV (step 1 incrementing IV with start smaller and end and ult or ne condition), and expand from there.

Generally, the utility that handles this kind of simplification based on IV ranges is SimplifyIndVars -- that one is based on the full power of SCEV, and will be able to deal with non-trivial recurrences. Extending it to handle this case is probably the most correct approach. Maybe there are complications with looking through float operations there though.

This revision now requires changes to proceed.Jun 15 2022, 4:01 AM
Allen added a comment.EditedJun 15 2022, 4:58 AM

I'm not strongly opposed to adding some basic IV reasoning to computeKnownBits(), but the implementation needs to be much more careful. I think there are at least three bugs here (missing check for predicate, no check whether this is a loop exit or continue condition, no reasoning about how the start value range relates to the end value). This requires much more extensive testing (and those tests should be simpler, not involving float operations).

Probably ill-considered, I think a basic IV start with positive A, end with positive B,and whatever his predicate of the icmp, the IV is always in the range [A, B], so if we can know that the most significant n bits of both A and B are 0, the most significant n bits of the currently IV is also necessarily 0, that's why I associate the start value with the end value.

If you want to add this, you'd probably want to start over with just handling a canonical IV (step 1 incrementing IV with start smaller and end and ult or ne condition), and expand from there.

Generally, the utility that handles this kind of simplification based on IV ranges is SimplifyIndVars -- that one is based on the full power of SCEV, and will be able to deal with non-trivial recurrences. Extending it to handle this case is probably the most correct approach. Maybe there are complications with looking through float operations there though.

Thanks @nikic so much for pointing out the derection to resolve this issue, I have redo in the SCEV.

Allen added inline comments.Jun 15 2022, 6:33 PM
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
1757–1758 ↗(On Diff #435835)

sorry, misssd mention https://reviews.llvm.org/D127854

Allen updated this revision to Diff 438116.Jun 18 2022, 5:39 AM
Allen retitled this revision from [InstCombine] fold float trunc into exact itofp for small constants to [IndVars] Fold int->fp->int cast for small PHI node.
Allen edited the summary of this revision. (Show Details)

redo base in SCEV according comment

Allen updated this revision to Diff 438885.Jun 21 2022, 6:02 PM

rebase to top

Allen abandoned this revision.Jul 8 2022, 6:10 PM

As D129191 implements a more general solution,so ignore this one.