Page MenuHomePhabricator

[LV] Add undef incoming value to loop-exit phis for the middle-block.
AbandonedPublic

Authored by fhahn on Fri, Apr 16, 10:19 AM.

Details

Summary

LV temporarily creates invalid IR, which can trip over SCEV. In
particular, LV adds a new branch to the exit block of the scalar loop.
This means the PHIs in the loop exit block now are invalid. To avoid
issues with SCEV, add an undef incoming value for the middle-block. This
will later be replaced by the concrete value after vectorization.

Fixes PR49538, PR49900.

Diff Detail

Event Timeline

fhahn created this revision.Fri, Apr 16, 10:19 AM
fhahn requested review of this revision.Fri, Apr 16, 10:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptFri, Apr 16, 10:19 AM

This is incorrect and will result in subtle miscompiles.

Why? Because SCEV is queried over the IR before point of undef insertion and replacement. SCEV will - correctly - assume that it can substitute any valid concrete value for the undef. However, the actual value later inserted is more constrained than undef. As such, the intermediate SCEV results - both as used immediately and as cached - will be incorrect for the final IR.

reames requested changes to this revision.Fri, Apr 16, 10:51 AM
This revision now requires changes to proceed.Fri, Apr 16, 10:51 AM
fhahn updated this revision to Diff 338374.Sun, Apr 18, 6:26 AM

This is incorrect and will result in subtle miscompiles.

Why? Because SCEV is queried over the IR before point of undef insertion and replacement. SCEV will - correctly - assume that it can substitute any valid concrete value for the undef. However, the actual value later inserted is more constrained than undef. As such, the intermediate SCEV results - both as used immediately and as cached - will be incorrect for the final IR.

Thanks for taking a look and apologies for not doing a good job at conveying my thinking. My rational was that at the point we add the undef incoming value, it is yet undetermined which edge will be taken, so SCEV will constrain the undef value, but only to the incoming value of the scalar loop. In the final IR, both incoming values should result in the same value at runtime. Does that make sense or am I still missing something?

After writing this up, I went back and checked what we use as temporary condition and realized that we are using true, which means that the correct incoming value from the scalar loop is dead, until we update the branch condition. This means SCEV would indeed be able to assume an arbitrary value for the PHI, which indeed would be incorrect. But I think we may be able to avoid this problem, by making the condition false initially, to make the undef incoming value dead, until we set the final branch condition. In this patch, I moved the code for setting the condition to happen after we are done with vector-code generation. The main uses of SCEV (runtime check generation) should happen before the completeLoopSkeleton, so it may be OK to keep the code there as well, but it seems safer to wait until the phis are actually fixed.

Note that this patch also splits up the code to create and set the condition, so there are no changes in the generated IR (creating the condition at the point it is set would result in a few instructions getting reordered). I am more than happy to discuss the best place, if the direction looks good in general.

Thanks for taking a look and apologies for not doing a good job at conveying my thinking. My rational was that at the point we add the undef incoming value, it is yet undetermined which edge will be taken, so SCEV will constrain the undef value, but only to the incoming value of the scalar loop. In the final IR, both incoming values should result in the same value at runtime. Does that make sense or am I still missing something?

I see where you're going here (assuming the fix you mention after this in your long comment), and this could maybe be made to work, but I'm still nervous.

I see two classes of potential issues.

  1. SCEV could cache a result which was true before we rewrote the branch conditions, but not true afterwards. (e.g. treated the phi as-if it were single entry) We could invalidate to resolve this, but that means our placement of invalidates after *every* condition modification has to be perfect or we get nasty bugs.
  1. The IR might not actually allow us to assume the scalar path and the vector path always produce the same values. I don't know the existing code structure well enough, do we ever mutate the scalar loop based on the existence of the prechecks or vector bodies? In particular, maybe when tail folding? If we do, there is no safe value to use.

I think what we need here is an actual "unknown until later" value. I'd be tempted to add an intrinsic for that, except I'm not quite sure if we need such a thing in a constant expression anywhere. This was one of the things I'd been hoping to chat through with you offline when you had a moment. :)

I'll also note that at the end of the day, I'm willing to defer to you if you think the approach here is good enough after reading the above. I don't want to block "better" on "perfect" here. If you do want to move forward with this approach, let me know and I'll do a review of the code (as opposed to the idea).

fhahn added a comment.Mon, Apr 19, 3:15 PM

Thanks for taking a look and apologies for not doing a good job at conveying my thinking. My rational was that at the point we add the undef incoming value, it is yet undetermined which edge will be taken, so SCEV will constrain the undef value, but only to the incoming value of the scalar loop. In the final IR, both incoming values should result in the same value at runtime. Does that make sense or am I still missing something?

I see where you're going here (assuming the fix you mention after this in your long comment), and this could maybe be made to work, but I'm still nervous.

I see two classes of potential issues.

  1. SCEV could cache a result which was true before we rewrote the branch conditions, but not true afterwards. (e.g. treated the phi as-if it were single entry) We could invalidate to resolve this, but that means our placement of invalidates after *every* condition modification has to be perfect or we get nasty bugs.
  1. The IR might not actually allow us to assume the scalar path and the vector path always produce the same values. I don't know the existing code structure well enough, do we ever mutate the scalar loop based on the existence of the prechecks or vector bodies? In particular, maybe when tail folding? If we do, there is no safe value to use.

I think what we need here is an actual "unknown until later" value. I'd be tempted to add an intrinsic for that, except I'm not quite sure if we need such a thing in a constant expression anywhere. This was one of the things I'd been hoping to chat through with you offline when you had a moment. :)

I think having an easy way to generate a 'unknown' value would be ideal in this case and the current patch tries to work around the fact that I don't think we currently have any way to express such a value. Agreed that discussing this offline would probably be much easier!

I'll also note that at the end of the day, I'm willing to defer to you if you think the approach here is good enough after reading the above. I don't want to block "better" on "perfect" here. If you do want to move forward with this approach, let me know and I'll do a review of the code (as opposed to the idea).

I don't think there's an urgent need to rush in a fix. While I think we should be fairly safe with respect to the 2 potential issues you mentioned given the way the code is generated at the moment, IMO it would be worth it to explore the alternative first.

fhahn abandoned this revision.Tue, May 4, 5:49 AM

Let's go with the simpler workaround for now D101487