Page MenuHomePhabricator

If constrained intrinsic is replaced, remove its side effect
Needs RevisionPublic

Authored by sepavloff on Nov 29 2021, 9:21 PM.

Details

Summary

Constrained intrinsics are declared as having side effect. During
constant folding it can be found that actually the operation does not
raise floating-point exceptions and may be eliminated. With this change
such instruction is marked as not having side effect, so that
subsequent DCE pass may remove it.

Diff Detail

Event Timeline

sepavloff created this revision.Nov 29 2021, 9:21 PM
sepavloff requested review of this revision.Nov 29 2021, 9:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 29 2021, 9:21 PM
nikic added a comment.Nov 30 2021, 1:52 AM

This is likely not going to be particularly useful, as the InstSimplifyPass is not really used outside of testing. To be generally applicable, this logic would have to be in wouldInstructionBeTriviallyDead().

nikic requested changes to this revision.Nov 30 2021, 6:27 AM

Marking as changes requested per above comment -- really don't think this is the right place to address this, since it will just create an incorrect picture of what LLVM can optimize in practice. Usually, this fold will happen via InstCombine rather than InstSimplifyPass. But I don't think changing InstCombine is the right solution either, as any number of passes do this kind of "simplify and DCE" pattern.

This revision now requires changes to proceed.Nov 30 2021, 6:27 AM
kpn added a comment.Nov 30 2021, 7:45 AM

Well, the problem is more widespread than that. IPSCCP can create instructions where it knows the result, and it knows the instruction will never trap. Consider how it can propagate two constants into a compare instruction. It knows the values are constants, it knows they are not undef, poison, NaN, Inf, it knows the value of the compare, and it uses that knowledge to further propagate constants and simplify the code. But it can't remove the constrained compare instruction because we don't have a good way to tell it the instruction has no side effects. It does call Instruction::isSafeToRemove(), and it does _not_ call wouldInstructionBeTriviallyDead(). A more centralized approach would be helpful.

Perhaps a way of asking the constrained intrinsic if it has a side effect? Then we can even eliminate constrained intrinsics that are "fpexcept.strict" but are known to never trap. This isn't ideal, either, but it's my current line of thinking, anyway. I don't have a really good solution.

This is likely not going to be particularly useful, as the InstSimplifyPass is not really used outside of testing. To be generally applicable, this logic would have to be in wouldInstructionBeTriviallyDead().

Why? This pass is run when clang is executed. Due to it the source file:

double func() {
  return 1.0 + 2.0;
}

compiled with:

clang -target x86_64-linux -S -O2 -ftrapping-math const-10.c

produces code:

	movsd	.LCPI0_0(%rip), %xmm0           # xmm0 = mem[0],zero
	retq

If setting attribute ReadNone in mayFoldConstrained is commented out, the resulting code contains add instruction:

	movsd	.LCPI0_0(%rip), %xmm0           # xmm0 = mem[0],zero
	addsd	.LCPI0_1(%rip), %xmm0
	movsd	.LCPI0_2(%rip), %xmm0           # xmm0 = mem[0],zero
	retq

But, as you noted in D110322, ConstantFolding.cpp is not right place for changing attributes. This patch just moves this change to more appropriate place.

Moving this logic to wouldInstructionBeTriviallyDead() does not seem right, because the decision about removing side effect comes from constant evaluation and not from the state of call object.

Marking as changes requested per above comment -- really don't think this is the right place to address this, since it will just create an incorrect picture of what LLVM can optimize in practice. Usually, this fold will happen via InstCombine rather than InstSimplifyPass. But I don't think changing InstCombine is the right solution either, as any number of passes do this kind of "simplify and DCE" pattern.

Sorry, I don't understand why this is not the right place. This pass makes constant folding and it is natural to fix attributes here, isn't it? What is wrong?

kpn added a comment.Nov 30 2021, 7:54 AM

Moving this logic to wouldInstructionBeTriviallyDead() does not seem right, because the decision about removing side effect comes from constant evaluation and not from the state of call object.

I'm sympathetic to this argument. If a transformation pass determines that a constrained intrinsic can be removed then it would be helpful if that pass could note that fact. This doesn't help us with analysis passes, though. And I'm a little worried about having to explicitly mark constrained intrinsics as removable since I don't know that we do that for any other instruction. If constrained intrinsics are the only time we do this then it seems error prone. Maybe I'm wrong?

nikic added a comment.Nov 30 2021, 7:58 AM

This is likely not going to be particularly useful, as the InstSimplifyPass is not really used outside of testing. To be generally applicable, this logic would have to be in wouldInstructionBeTriviallyDead().

Why? This pass is run when clang is executed. [...]

Yes, because we do happen to have a single InstSimplifyPass run at the very end of the pipeline (in https://github.com/llvm/llvm-project/blob/98dbcff19cfedb4e27d267310a76d616cd435447/llvm/lib/Passes/PassBuilderPipelines.cpp#L1194). However, mostly InstSimplify is only used as an analysis, with the most important consumer being InstCombine, which uses InstSimplify (the analysis, not the pass) and runs many times during the optimization pipeline. So while your code is indeed getting optimized, it is optimized very late, and you would likely see phase-ordering issues in a more practical example.

Moving this logic to wouldInstructionBeTriviallyDead() does not seem right, because the decision about removing side effect comes from constant evaluation and not from the state of call object.

I'm sympathetic to this argument. If a transformation pass determines that a constrained intrinsic can be removed then it would be helpful if that pass could note that fact. This doesn't help us with analysis passes, though. And I'm a little worried about having to explicitly mark constrained intrinsics as removable since I don't know that we do that for any other instruction. If constrained intrinsics are the only time we do this then it seems error prone. Maybe I'm wrong?

The ability to remove side effect is vital for the implementation of floating point arithmetic. There are several possible optimization techniques, which depend on this possibility. For example, if FP operations in a function have attributes "round.dynamic" and "fpexcept.ignore" and we know that rounding mode is not changed in this function, we could remove side effect of all the operations, which can give performance of non-constrained operations. As we cannot mix ordinary and constrained operation in the same function, the only way is to control side effect.

Moving this logic to wouldInstructionBeTriviallyDead() does not seem right, because the decision about removing side effect comes from constant evaluation and not from the state of call object.

I'm sympathetic to this argument. If a transformation pass determines that a constrained intrinsic can be removed then it would be helpful if that pass could note that fact. This doesn't help us with analysis passes, though. And I'm a little worried about having to explicitly mark constrained intrinsics as removable since I don't know that we do that for any other instruction. If constrained intrinsics are the only time we do this then it seems error prone. Maybe I'm wrong?

The ability to remove side effect is vital for the implementation of floating point arithmetic. There are several possible optimization techniques, which depend on this possibility. For example, if FP operations in a function have attributes "round.dynamic" and "fpexcept.ignore" and we know that rounding mode is not changed in this function, we could remove side effect of all the operations, which can give performance of non-constrained operations. As we cannot mix ordinary and constrained operation in the same function, the only way is to control side effect.

We aren't saying that the ocean isn't full of water, we are only saying that you are looking at the pond while thinking it's an ocean.
We are talking about different things here. Fixing InstSimplify pass will not do any good, because it's effectively not used.

kpn added a comment.Nov 30 2021, 8:25 AM

Moving this logic to wouldInstructionBeTriviallyDead() does not seem right, because the decision about removing side effect comes from constant evaluation and not from the state of call object.

I'm sympathetic to this argument. If a transformation pass determines that a constrained intrinsic can be removed then it would be helpful if that pass could note that fact. This doesn't help us with analysis passes, though. And I'm a little worried about having to explicitly mark constrained intrinsics as removable since I don't know that we do that for any other instruction. If constrained intrinsics are the only time we do this then it seems error prone. Maybe I'm wrong?

The ability to remove side effect is vital for the implementation of floating point arithmetic. There are several possible optimization techniques, which depend on this possibility. For example, if FP operations in a function have attributes "round.dynamic" and "fpexcept.ignore" and we know that rounding mode is not changed in this function, we could remove side effect of all the operations, which can give performance of non-constrained operations. As we cannot mix ordinary and constrained operation in the same function, the only way is to control side effect.

We aren't saying that the ocean isn't full of water, we are only saying that you are looking at the pond while thinking it's an ocean.
We are talking about different things here. Fixing InstSimplify pass will not do any good, because it's effectively not used.

A solution that works for all transformation passes would be quite useful indeed.

This is likely not going to be particularly useful, as the InstSimplifyPass is not really used outside of testing. To be generally applicable, this logic would have to be in wouldInstructionBeTriviallyDead().

Why? This pass is run when clang is executed. [...]

Yes, because we do happen to have a single InstSimplifyPass run at the very end of the pipeline (in https://github.com/llvm/llvm-project/blob/98dbcff19cfedb4e27d267310a76d616cd435447/llvm/lib/Passes/PassBuilderPipelines.cpp#L1194). However, mostly InstSimplify is only used as an analysis, with the most important consumer being InstCombine, which uses InstSimplify (the analysis, not the pass) and runs many times during the optimization pipeline. So while your code is indeed getting optimized, it is optimized very late, and you would likely see phase-ordering issues in a more practical example.

Probably the pass should run earlier, it is hard to reason without analyzing practical cases. If it is indeed so, we could move the pass or run it several times. Tuning pipeline is not unique for constant folding only. But without ability to remove instructions many optimizations do not make sense.

Moving this logic to wouldInstructionBeTriviallyDead() does not seem right, because the decision about removing side effect comes from constant evaluation and not from the state of call object.

I'm sympathetic to this argument. If a transformation pass determines that a constrained intrinsic can be removed then it would be helpful if that pass could note that fact. This doesn't help us with analysis passes, though. And I'm a little worried about having to explicitly mark constrained intrinsics as removable since I don't know that we do that for any other instruction. If constrained intrinsics are the only time we do this then it seems error prone. Maybe I'm wrong?

The ability to remove side effect is vital for the implementation of floating point arithmetic. There are several possible optimization techniques, which depend on this possibility. For example, if FP operations in a function have attributes "round.dynamic" and "fpexcept.ignore" and we know that rounding mode is not changed in this function, we could remove side effect of all the operations, which can give performance of non-constrained operations. As we cannot mix ordinary and constrained operation in the same function, the only way is to control side effect.

We aren't saying that the ocean isn't full of water, we are only saying that you are looking at the pond while thinking it's an ocean.
We are talking about different things here. Fixing InstSimplify pass will not do any good, because it's effectively not used.

It is used and does its job, as experiments demonstrate. Probably it is not the right place, but until we didn't met its limitations or elaborate a better way we could use this imperfect solution. Moving with small steps toward the perfect implementation.

kpn added a comment.Nov 30 2021, 9:09 AM

Probably the pass should run earlier, it is hard to reason without analyzing practical cases. If it is indeed so, we could move the pass or run it several times. Tuning pipeline is not unique for constant folding only. But without ability to remove instructions many optimizations do not make sense.

But InstSimplify is used for analysis long before it is run for transformations. Other passes just call into InstSimplify and bypass InstSimplify's transformation pass machinery. Meaning, parts of InstSimplify do run earlier in the pipeline _already_. And many if not most transformation passes eliminate instructions without going through InstSimplify. So tuning the pipeline is not the answer.

Coming up with a more general solution that can be used by multiple transformation passes is what we need.

Probably the pass should run earlier, it is hard to reason without analyzing practical cases. If it is indeed so, we could move the pass or run it several times. Tuning pipeline is not unique for constant folding only. But without ability to remove instructions many optimizations do not make sense.

But InstSimplify is used for analysis long before it is run for transformations. Other passes just call into InstSimplify and bypass InstSimplify's transformation pass machinery. Meaning, parts of InstSimplify do run earlier in the pipeline _already_.

As @nikic pointer out (https://reviews.llvm.org/D114766#3161692):

However, mostly InstSimplify is only used as an analysis, with the most important consumer being InstCombine, which uses InstSimplify (the analysis, not the pass) and runs many times during the optimization pipeline.

Call attribute fixing occurs only in the pass, which runs once.

And many if not most transformation passes eliminate instructions without going through InstSimplify. So tuning the pipeline is not the answer.

We do not eliminate instructions in InstSimplify. We mark them available for elimination. The elimination itself occurs in dedicated passes designed for just this purpose. So the question is only whether we may remove side effect from some calls.

Coming up with a more general solution that can be used by multiple transformation passes is what we need.

The solution actually is the call CI->addFnAttr(Attribute::ReadNone). Hardly there is a simpler or more general solution. We apply it when we know that the instruction is not needed anymore, also quite reasonable, why postpone this? The only possible concern is when we do the actions like constant folding, which may result in the attribute change. Does it occur too late? Or maybe it takes place at right moment? We do not know now. So I would propose to use with current state and change it when we found that there is a better solution.

Let me summarize some points of the solution:

  1. Changing call site attributes is safe. It is equivalent to the creation of a new call site, with new set of attributes, and then replacing the old one with it, an ubiquitous operation in LLVM.
  2. The attribute adjustment can be made as soon as the possibility of constant folding is detected.
  3. The place where constant evaluation occurs is a subject of pipeline tuning. It is obvious that it should not run too early because the frontend already makes constant expression evaluation, so the constant evaluation in IR pipeline mainly evaluates constants formed by previous passes. Current point where InstSimplify is run probably is not the best, but it must be good enough.

This is starting to sound like the isnan threads all over again. I don't envision making much progress with this approach...

kpn added a comment.Dec 1 2021, 10:04 AM

Marking as changes requested per above comment -- really don't think this is the right place to address this, since it will just create an incorrect picture of what LLVM can optimize in practice. Usually, this fold will happen via InstCombine rather than InstSimplifyPass. But I don't think changing InstCombine is the right solution either, as any number of passes do this kind of "simplify and DCE" pattern.

Let's back up. If we try and move logic into wouldInstructionBeTriviallyDead() then what would that look like? Transform and analysis passes both have loads of information that is not available to wouldInstructionBeTriviallyDead(). Trying to gather that information again would be non-trivial and wouldn't scale since it would be a layering violation if wouldInstructionBeTriviallyDead() tried to gather it. So information about whether or not a constrained intrinsic can be eliminated would need to be passed into wouldInstructionBeTriviallyDead(). And we'll have to go pass by pass implementing this passing around of information. What would or should that look like?

Let me float this idea: Would a bool passed in as an argument be sufficient? As in "I, an arbitrary transform pass, think this instruction is dead, but what does wouldInstructionBeTriviallyDead() say about it?"

@nikic, @lebedev.ri: Is there another approach that makes more sense?

nikic added a comment.Dec 1 2021, 12:11 PM

Marking as changes requested per above comment -- really don't think this is the right place to address this, since it will just create an incorrect picture of what LLVM can optimize in practice. Usually, this fold will happen via InstCombine rather than InstSimplifyPass. But I don't think changing InstCombine is the right solution either, as any number of passes do this kind of "simplify and DCE" pattern.

Let's back up. If we try and move logic into wouldInstructionBeTriviallyDead() then what would that look like? Transform and analysis passes both have loads of information that is not available to wouldInstructionBeTriviallyDead(). Trying to gather that information again would be non-trivial and wouldn't scale since it would be a layering violation if wouldInstructionBeTriviallyDead() tried to gather it. So information about whether or not a constrained intrinsic can be eliminated would need to be passed into wouldInstructionBeTriviallyDead(). And we'll have to go pass by pass implementing this passing around of information. What would or should that look like?

Let me float this idea: Would a bool passed in as an argument be sufficient? As in "I, an arbitrary transform pass, think this instruction is dead, but what does wouldInstructionBeTriviallyDead() say about it?"

@nikic, @lebedev.ri: Is there another approach that makes more sense?

I think constrained FP intrinsics are basically in the same situation as math libcalls with errno. These get constant folded, and removal is handled via isMathLibCallNoop() in wouldInstructionBeTriviallyDead(). I would expected constrained FP intrinsics to basically work the same way. Of course, the implementation can be shared with the actual folding code, so for example wouldInstructionBeTriviallyDead() could check whether all intrinsic args are constant, if they are call the constrained FP folding function internally, and then say that it is trivially dead if it folds. (Or something more nuanced -- presumably there are cases where it's possible to determine the result but not drop, or maybe drop but not fold. Not familiar with the details here.)

Changing the infrastructure so that InstSimplify/ConstantFolding could report that the instruction is in fact dead (and not just the result known) is certainly also a possibility, but it's likely not the path of least resistance.

I think constrained FP intrinsics are basically in the same situation as math libcalls with errno. These get constant folded, and removal is handled via isMathLibCallNoop() in wouldInstructionBeTriviallyDead(). I would expected constrained FP intrinsics to basically work the same way. Of course, the implementation can be shared with the actual folding code, so for example wouldInstructionBeTriviallyDead() could check whether all intrinsic args are constant, if they are call the constrained FP folding function internally, and then say that it is trivially dead if it folds. (Or something more nuanced -- presumably there are cases where it's possible to determine the result but not drop, or maybe drop but not fold. Not familiar with the details here.)

Changing the infrastructure so that InstSimplify/ConstantFolding could report that the instruction is in fact dead (and not just the result known) is certainly also a possibility, but it's likely not the path of least resistance.

Removal of constrained intrinsic calls is not limited by constant folding. InstSimplify can make replacement of calls with non-const arguments, like the transformation 'x * 0 -> 0' in fast-math mode. Other similar transformations would be added in future. Recognition if all these cases in wouldInstructionBeTriviallyDead would make it too large.