This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Remove side effect of replaced constrained intrinsics
ClosedPublic

Authored by sepavloff on Jan 27 2022, 7:48 PM.

Details

Summary

If a constrained intrinsic call was replaced by some value, it was not
removed in some cases. The dangling instruction resulted in useless
instructions executed in runtime. It happened because constrained
intrinsics usually have side effect, it is used to model the interaction
with floating point-environment. In some cases it is correct behavior
but often the side effect is actually absent or can be ignored.

This change adds specific treatment of constrained intrinsics so that
their side effect can be removed if it actually absents.

Diff Detail

Event Timeline

sepavloff created this revision.Jan 27 2022, 7:48 PM
sepavloff requested review of this revision.Jan 27 2022, 7:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2022, 7:48 PM
efriedma added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
3580

I'm not sure I understand the relevant invariant here. You're using SimplifyInstruction to assert that the simplified instruction has no side-effects? That isn't documented anywhere, but I guess it makes sense? Please confirm and add API documentation explaining this.

If it is safe, we should simplify all calls, not just for ConstrainedFPIntrinsic calls.

Also, you can just erase a call using eraseInstFromFunction; you don't need to mark it readnone and wait for another iteration of instcombine.

nikic added inline comments.Jan 28 2022, 12:17 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
3580

If it is safe, we should simplify all calls, not just for ConstrainedFPIntrinsic calls.

This is definitely not safe in general -- obvious example being math libcalls that fold but are not dead due to errno. This needs to be constrained FP intrinsic specific contract.

sepavloff added inline comments.Jan 28 2022, 2:49 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
3580

I admit the code is not obvious enough.

The key point here is that SimplifyInstruction does not return a replacement if the constrained intrinsic has side effect. So if control passed inside the if statement, we know that no actual side effect is associated with the intrinsic.

The next check if (CI.getNumUses() == 0) handles dangling instructions. They could not be removed due to the side effect but actually they have no such because SimplifyInstruction provided a replacement.

Also, you can just erase a call using eraseInstFromFunction;

InstCombiner will do it in the same iteration, if isInstructionTriviallyDead returns true for this instruction.

sepavloff updated this revision to Diff 403962.Jan 28 2022, 4:13 AM

Small changes to the patch

  • Add comments,
  • Add checks that replacement value is different from the original one.
sepavloff added inline comments.Jan 28 2022, 5:00 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
3580

If SimplifyInstruction returned a replacement even if side effect presents and the exception behavior is strict, the original instruction would not be deleted, because it is unknown if the side effect actually exists. SimplifyInstruction is an analysis, it cannot remove bogus side effect. To find out if the side effect actually absents the logic of SimplifyInstruction must be replayed, which is undesirable as it would increase compile time.

If SimplifyInstruction returns replacement only when it can replace the original expression without change of semantic, the useless code can be removed.

Does this patch require additional changes?

nikic added inline comments.Feb 11 2022, 12:14 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1194

You can't use RAUW in InstCombine. This should be return replaceInstUsesWith(II, V).

...actually, as we already called SimplifyCall above, this situation can only happen if you have a constrained intrinsic with use_empty(). I'd suggest to check for use_empty() + simplify succeeds and directly erase the call in that case. I think that will be more obvious that adding a readnone attribute to get it erased in a separate iteration of InstCombine.

3582

Can you please add a comment to InstructionSimplify that documents this additional requirement? That is, that constrained intrinsics are only allowed to be folded if they do not side effect (which is different from the usual InstSimplify rules).

Get rid of call attribute manipulations

sepavloff added inline comments.Feb 15 2022, 10:08 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1194

You are right. As now all things happen in the same pass, tweaking attributes is not needed. Instead constrained intrinsics can be handled specially in the InstCombine code that removes unused instructions.

3582

Added relevant documentation to the declaration of SimplifyCall.

kpn added a comment.Feb 22 2022, 5:18 AM

I'm still not clear on why, when a transform pass knows an instruction can be deleted, we are required to keep that instruction around anyway. It seems like these dangling, useless instructions would interfere with subsequent optimizations until finally they might or might not get removed here. And constant folding isn't sufficient to duplicate the logic of the transform pass that determined the instruction could be deleted.

For example, here's a small test that I'm working on for InstSimplify:

; NOTE: The fsub instruction is expected to remain, but the result isn't used.
define float @nonneg_ebstrict(i32 %a) #0 {

%fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
%sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
%sub = call nnan float @llvm.experimental.constrained.fsub.f32(float %sqra, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
ret float %sub

}

The input to the sqrt instruction will always be non-negative, so we know the output is valid, and we know the output will never be -0.0. This means the fsub can be safely eliminated. But the fsub will hang around forever, useless, because InstSimplify isn't allowed to tell anyone that it can be deleted, and nobody else can determine it again without duplicating InstSimplify.

I'm still not clear on why, when a transform pass knows an instruction can be deleted, we are required to keep that instruction around anyway.

Are we really required? Or the transform pass merely does not know that the instruction can be deleted? InstCombine is just this case, at least in constant folding.

It seems like these dangling, useless instructions would interfere with subsequent optimizations until finally they might or might not get removed here. And constant folding isn't sufficient to duplicate the logic of the transform pass that determined the instruction could be deleted.

InstCombine is called several times, it calls constant folding and instruction simplification functions. It looks like a good place to get rid of such useless instructions.

For example, here's a small test that I'm working on for InstSimplify:

; NOTE: The fsub instruction is expected to remain, but the result isn't used.
define float @nonneg_ebstrict(i32 %a) #0 {

%fpa = call float @llvm.experimental.constrained.uitofp.f32.i32(i32 %a, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
%sqra = call float @llvm.experimental.constrained.sqrt.f32(float %fpa, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
%sub = call nnan float @llvm.experimental.constrained.fsub.f32(float %sqra, float -0.0, metadata !"round.tonearest", metadata !"fpexcept.strict") #0
ret float %sub

}

The input to the sqrt instruction will always be non-negative, so we know the output is valid, and we know the output will never be -0.0. This means the fsub can be safely eliminated. But the fsub will hang around forever, useless, because InstSimplify isn't allowed to tell anyone that it can be deleted, and nobody else can determine it again without duplicating InstSimplify.

InstSimplify is not the only place where peephole transformations may occur. InstCombine can also do such transformations and it is less limited. The described transformation could be a part of it.

kpn added a comment.Feb 22 2022, 11:57 AM

It seems like these dangling, useless instructions would interfere with subsequent optimizations until finally they might or might not get removed here. And constant folding isn't sufficient to duplicate the logic of the transform pass that determined the instruction could be deleted.

InstCombine is called several times, it calls constant folding and instruction simplification functions. It looks like a good place to get rid of such useless instructions.

My point is that it shouldn't be needed for eliminating dead constrained intrinsics that are still around only because the exception behavior is "strict" when a transform pass has proved to itself that no exception can ever take place.

We do run InstCombiner often enough that it is a decent place to eliminate useless instructions. I don't think it is ideal, but it's fine.

It seems like these dangling, useless instructions would interfere with subsequent optimizations until finally they might or might not get removed here. And constant folding isn't sufficient to duplicate the logic of the transform pass that determined the instruction could be deleted.

InstCombine is called several times, it calls constant folding and instruction simplification functions. It looks like a good place to get rid of such useless instructions.

My point is that it shouldn't be needed for eliminating dead constrained intrinsics that are still around only because the exception behavior is "strict" when a transform pass has proved to itself that no exception can ever take place.

Absolutely. It requires that transformations that may remove intrinsic calls be aware about constrained intrinsics. Removing unneeded instruction in separate pass still might make sense in some cases, similar to separate DCE passes.

We do run InstCombiner often enough that it is a decent place to eliminate useless instructions. I don't think it is ideal, but it's fine.

It is not ideal, but InstCombiner would be a transformation pass that would know about constrained intrinsics. It calls SimplifyCall (and consequently calls ConstantFoldCall), so it already has all necessary information to make removal of constfolded intrinsics, no need to reevaluate them. At least removal of unneeded intrinsic call here wouldn't cost much.

Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2022, 8:56 PM

Does something prevent this change from accepting?

Now replaced instructions remain in the generated code, which makes their constant folding useless.

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1194

Relying on SimplifyCall() here to determine whether the call has side effects is still more restrictive than necessary. For example, if the exception behavior argument is not fp::ExceptionBehavior::ebStrict and the call has no users, it can be removed even if the arguments are non-constant or the operation would raise an exception.

sepavloff marked an inline comment as done.Apr 4 2022, 2:15 AM
sepavloff added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1194

Actually there is no additional restrictions. wouldInstructionBeTriviallyDead already recognizes that a constrained intrinsic calls with the exception behavior argument other than ebStrict may be removed. So if such call result is not used, it is removed in prepareICWorklistFromFunction and does not participate in instruction combining. If such call is replaced by SimplifyCall, it is removed due to the same reason. The test f_eval_maytrap demonstrates this case.

sepavloff marked an inline comment as done.Apr 11 2022, 4:06 AM

Ping.

efriedma added inline comments.May 2 2022, 10:25 AM
llvm/include/llvm/Analysis/InstructionSimplify.h
305

"is actually absent".

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
4134 ↗(On Diff #408942)

It's not clear to me this is the only reason a constrained FP intrinsic can be returned. And even if it were, it's a fragile invariant.

Would it make sense to just have isInstructionTriviallyDead() do the right thing here? We already do something similar with isMathLibCallNoop().

kpn added inline comments.May 2 2022, 10:44 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
4134 ↗(On Diff #408942)

With "strict" exception behavior we are not allowed to eliminate any run-time traps. I still do not know how isInstructionTriviallyDead() can trivially know that an instruction will never trap and thus can know that removing the instruction is trivially safe.

efriedma added inline comments.May 2 2022, 11:39 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
4134 ↗(On Diff #408942)

It can prove it exactly the same way this patch is trying to do: by constant-folding the operation.

sepavloff added inline comments.May 3 2022, 4:47 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
4134 ↗(On Diff #408942)

Would it make sense to just have isInstructionTriviallyDead() do the right thing here? We already do something similar with isMathLibCallNoop().

It can prove it exactly the same way this patch is trying to do: by constant-folding the operation.

It could but it seems to be impractical. isInstructionTriviallyDead is called very often, so it must work enough fast. Making constant evaluations in every invocation of it would substantially increase compile time. The evaluation is made using APFloat, so even for simple operations like llvm.experimental.constrained.fadd it involves enough heavy computations. And for functions like llvm.experimental.constrained.fptosi or llvm.experimental.constrained.cos execution time would be definitely large.

Of course, most users wouldn't suffer from this, because the do not use 'strict' exception behavior. However in strict mode compile time may be unexplainably large and it eventually be noticed and considered as a bug.

Also isInstructionTriviallyDead is already overloaded with various checks and attempt adding constant evaluation to it may be considered as undesirable.

On the other hand, the assumption described in this comment is actually obvious and natural. If the code that simplifies IR constructs decides that this constrained intrinsic call may be replaced by some other constructs, we may ignore the side effect of this call. In other words the simplifying code is responsible for taking into account the side effect of the intrinsic.

kpn added inline comments.May 3 2022, 5:34 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
4134 ↗(On Diff #408942)

Personally, I would like to see isInstructionTriviallyDead() do the right thing. But the current plan to call back in to see if we can constant fold doesn't scale. Look at my changes to InstSimplify to see simplification cases that aren't constant folding.

In other words the simplifying code is responsible for taking into account the side effect of the intrinsic.

Yes. Serge hit the nail on the head.

If we go the route of having isInstructionTriviallyDead() re-derive the fact that an instruction can be eliminated by re-running code to determine it is safe then we need to also re-determine that the non-constant folding simplifications are safe. We would also need to re-run any other transformation-handling code that comes up in the future. That seems terribly heavyweight, and bad layering as well.

In an earlier ticket we saw Serge's suggestion of having transformation passes annotate instructions so the existing isInstructionTriviallyDead() implementation will recognize they can be trivially deleted. This suggestion was rejected for reasons I never saw explained. I'd love to understand the issues with that proposed solution at least.

That leaves us with options like this ticket, where instructions to be deleted hang around for multiple passes until InstCombine gets run. This solution also has, to a lesser extent I think, some of the scalability issues outlined above on top of being fragile as @efriedma pointed out.

Does anyone have any other suggestions that would solve the problem without creating new ones? Anyone?

sepavloff updated this revision to Diff 426668.May 3 2022, 5:37 AM

Fixed comment and rebased

efriedma added inline comments.May 3 2022, 12:02 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
4134 ↗(On Diff #408942)

In an earlier ticket we saw Serge's suggestion of having transformation passes annotate instructions so the existing isInstructionTriviallyDead() implementation will recognize they can be trivially deleted

Which is this? Don't recall seeing this.


I think to avoid the issue with hacking at InstCombinerImpl::run, you can do something like the following in visitCallInst:

if (CI.use_empty() && isa<ConstrainedFPIntrinsic>(CI)) {
  if (SimplifyCall(&CI, SQ.getWithInstruction(&CI))) {
    eraseInstFromFunction(CI);
    return nullptr;
  }
}
sepavloff updated this revision to Diff 427071.May 4 2022, 11:09 AM

Simplify code

sepavloff added inline comments.May 4 2022, 11:23 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
4134 ↗(On Diff #408942)

In an earlier ticket we saw Serge's suggestion of having transformation passes annotate instructions so the existing isInstructionTriviallyDead() implementation will recognize they can be trivially deleted

Which is this? Don't recall seeing this.

Such idea was proposed some time ago but was never implemented. Actually it looks like a complex solution because it requires some mechanism to detect points when intrinsic must be reevaluated because its arguments are changed. It increases cost of using constant evaluator in isInstructionTriviallyDead.

I think to avoid the issue with hacking at InstCombinerImpl::run, you can do something like the following in visitCallInst:

Indeed, it make the patch even more compact. Thank you!

This revision is now accepted and ready to land.May 4 2022, 11:40 AM
This revision was landed with ongoing or failed builds.May 4 2022, 10:04 PM
This revision was automatically updated to reflect the committed changes.
sepavloff reopened this revision.May 6 2022, 3:34 AM
This revision is now accepted and ready to land.May 6 2022, 3:34 AM
sepavloff updated this revision to Diff 427589.May 6 2022, 3:37 AM

Updated the patch according to the discussion

  • function SimplifyConstrainedFPCall was added,
  • simplified code in InstCombineCalls.cpp
reames accepted this revision.May 6 2022, 7:08 AM

LGTM w/suggest comment change applied.

llvm/include/llvm/Analysis/InstructionSimplify.h
311

Suggested wording change:

Unlike other SimplifyX routines, this function provides an additional contract. Specifically, it guarantees that if simplification succeeds that the intrinsic is side effect free. As a result, successful simplification can be used to delete the intrinsic not just replace its result.

This revision was landed with ongoing or failed builds.May 7 2022, 5:06 AM
This revision was automatically updated to reflect the committed changes.