This is an archive of the discontinued LLVM Phabricator instance.

[ConstantFolding] Respect denormal handling mode attributes when folding instructions
ClosedPublic

Authored by dcandler on Jan 10 2022, 9:15 AM.

Details

Summary

Depending on the environment, a floating point instruction should
treat denormal inputs as zero, and/or flush a denormal output to zero.
Denormals are not currently accounted for when an instruction gets
folded to a constant, which can lead to differences in output between
a folded and an unfolded instruction when running on the target. The
denormal handling mode can be set by the function-level attribute
denormal-fp-math; this patch uses it to determine whether any denormal
inputs to, or outputs from, folding should be treated as zero, and
whether the sign should be preserved.
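
As a sketch of the intended semantics (illustrative Python, not the patch's actual C++ — the helper names are hypothetical, and the real attribute carries separate output and input modes, collapsed to a single mode here for brevity):

```python
import math

MIN_NORMAL = 2.0 ** -1022  # smallest positive normal double

def flush_denormal(x: float, mode: str) -> float:
    """Apply one denormal-fp-math style mode to a single value."""
    if x != 0.0 and abs(x) < MIN_NORMAL:  # x is subnormal
        if mode == "preserve-sign":
            return math.copysign(0.0, x)  # flush, keeping the sign bit
        if mode == "positive-zero":
            return 0.0                    # flush to +0.0 regardless of sign
    return x  # "ieee" mode: keep subnormals as-is

def fold_fadd(a: float, b: float, mode: str) -> float:
    """Constant-fold an fadd: flush denormal inputs, add, flush the output."""
    a = flush_denormal(a, mode)
    b = flush_denormal(b, mode)
    return flush_denormal(a + b, mode)
```

Under "positive-zero", two subnormal operands are flushed before the add, so the fold yields +0.0 rather than the subnormal sum IEEE folding would produce.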

Diff Detail

Event Timeline

dcandler created this revision.Jan 10 2022, 9:15 AM
dcandler requested review of this revision.Jan 10 2022, 9:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 10 2022, 9:15 AM

Does "original operation" mean what hardware would do? Not all targets support flushing denormals to zero. Even within X86, SSE can flush denormals to zero, but X87 can't.

Thanks for highlighting that. Producing the appropriate result for the hardware was what I meant, so based on that I would have to rework this to handle different targets.

It doesn't look like constant folding currently has such target knowledge, and the only obvious solution I can see would be to use TargetTransformInfo to determine if the target should flush denormals from folding. However this would also mean passing TTI around much more in order to pass it down through to the constant folding functions wherever they get used.

lenary added a subscriber: lenary.Jan 13 2022, 4:31 AM
dcandler updated this revision to Diff 401607.Jan 20 2022, 5:15 AM
dcandler added a reviewer: nikic.

I've updated the patch to use TargetTransformInfo to determine whether the target supports flushing to zero. I'm slightly wary, as there is a warning discouraging the use of TargetTransformInfo, but I'm unsure of an alternative.

lenary added inline comments.Jan 20 2022, 5:22 AM
llvm/lib/Analysis/ConstantFolding.cpp
1007

You should probably also explain that returning nullptr means that, if you don't have TTI, you are explicitly preventing any constant folding of operations that produce denormals, rather than continuing to fold to the denormal value.

dcandler updated this revision to Diff 403669.Jan 27 2022, 8:30 AM

Ping for any other comments.

I've also added the suggested comment in the code. And yes, one result of this patch would indeed be that calls to constant folding from passes other than instruction combining would not be able to fold floating point instructions that result in a denormal - since they lack TTI. While it would be possible to add TTI to those passes, that becomes a significantly larger change and I was unsure how necessary that would be if instruction combining can already handle this case.

nikic requested changes to this revision.Jan 29 2022, 1:27 AM
nikic added a reviewer: lebedev.ri.

Definitely not a fan of this change, on multiple levels. I don't particularly look forward to having a TTI dependency in constant folding, but more immediately, I don't think I really understand the LangRef basis for this change.

Could you explain which constituent part of fast FMF specifically allows this optimization? I don't see anything related to denormals, and interpreting afn to apply to primitive operations would be a bit of a stretch.

How does this relate to the denormal-fp-math function attribute? I would have thought that this is the one that controls denormal flushing behavior.

This revision now requires changes to proceed.Jan 29 2022, 1:27 AM

Sorry, I didn't look at this review sooner, but I agree with @nikic. The TTI warning comment in instcombine was supposed to prevent this kind of proposal, and this seems to be mixing up seemingly unrelated pieces of FP behavior. It might help to see a source/motivating example to understand if there's any realistic solution to the problem.

Thanks for taking a look, it does appear I misunderstood a few things.

The original motivation was that downstream we found a case where the output of the compiled program differs between O0 and O1: at O0 a floating point instruction executes on the target and produces a zero, but at O1 the instruction gets constant folded to a denormal value. So I have been exploring whether it would be possible to handle denormals at the time of folding, since this occurs before other combining passes (e.g. DAGCombiner).

The denormal-fp-math attribute does contain the relevant information about the floating point environment, and should already be accessible during folding via the instruction pointer (assuming the instruction belongs to a function at the time). So I believe I could potentially rework this to avoid pulling in target info by checking denormal-fp-math instead, if that would be more acceptable. Reading the language reference, the attribute doesn't mandate flushing denormal outputs to zero, but does suggest inputs should be treated as zero, which constant folding also does not currently respect. In testing, this can lead to similar differences in output when folding a floating point instruction where one input is a denormal, so it may make sense to check the inputs as well as the output in ConstantFoldInstOperands.

If we can use the function attribute to get the desired result, I think that would be fine.
In an ideal world, we would have all of the FP settings in one place, but FMF became part of the bonus bits in an instruction, and there's not enough space there to represent variations like denorm or sqrt specializations.
If the attribute is not specified as needed, then we should clarify/enhance that in LangRef.

dcandler updated this revision to Diff 418604.Mar 28 2022, 8:52 AM
dcandler retitled this revision from [ConstantFolding] Flush folded denormals to zero when using fastmath to [ConstantFolding] Respect denormal handling mode attributes when folding instructions.
dcandler edited the summary of this revision. (Show Details)

I've updated the patch with a new version which takes the denormal handling mode from the function attribute, and adjusted the title/summary to reflect this. It supports separate settings for the instruction's inputs and outputs, as well as whether values are flushed to positive zero or have their sign preserved.
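
For reference, LLVM writes the attribute value as a pair with the output mode first; a rough sketch of that convention (hypothetical parser, not LLVM's actual one):

```python
VALID_MODES = {"ieee", "preserve-sign", "positive-zero"}

def parse_denormal_fp_math(value: str):
    """Parse a denormal-fp-math attribute string into (output, input) modes.

    A single entry applies to both; "out,in" sets them separately.
    Hypothetical helper mirroring the convention, not LLVM's parser.
    """
    parts = [p.strip() for p in value.split(",")]
    if len(parts) == 1:
        out = inp = parts[0]
    elif len(parts) == 2:
        out, inp = parts
    else:
        raise ValueError(f"malformed attribute value: {value!r}")
    if out not in VALID_MODES or inp not in VALID_MODES:
        raise ValueError(f"unknown denormal mode in: {value!r}")
    return out, inp
```

So "denormal-fp-math"="positive-zero" flushes both inputs and outputs to +0.0, while "preserve-sign,positive-zero" flushes outputs keeping the sign but flushes inputs to +0.0.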

While testing I found the denormal-fp-math-f32 attribute was being set unexpectedly, but I've created a separate patch to deal with that: https://reviews.llvm.org/D122589. It should not be an issue for this patch however, since it simply uses the attributes as given.

Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2022, 8:52 AM
Herald added a subscriber: StephenFan. · View Herald Transcript

ConstantFoldBinaryOpOperands has other callers; are you planning to fix each of them separately?

Assuming IEEE denormal handling if we can't find the parent Function seems a bit dubious, but maybe it's the best we can do for now. It's not clear to me how this interacts with floating-point constant expressions (e.g. ConstantExpr::getFAdd). I guess we could just kill off floating-point constant expressions, since they aren't really useful in practice, but that's a non-trivial effort.

When I originally checked, the other calls to ConstantFoldBinaryOpOperands did not look like they would be handling floating point instructions, although on second look, I missed InstructionSimplify::foldOrCommuteConstant. The same approach should work there too, so I can expand the patch to cover that usage.

If the instruction lacks a parent function, then the alternative to defaulting to IEEE would be not folding at all. The impact of that could be limited to instructions where a denormal is detected in the input or output; if there's no denormal, there's no need for a parent function, so folding can proceed.
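
The bail-out scheme described here could be sketched as follows (hypothetical Python mirroring the discussion, not the patch's code; the fold is shown for fadd only):

```python
MIN_NORMAL = 2.0 ** -1022  # smallest positive normal double

def is_denormal(x: float) -> bool:
    return x != 0.0 and abs(x) < MIN_NORMAL

def try_fold_fadd(a: float, b: float, has_parent_function: bool):
    """Fold a constant fadd, or return None to leave it unfolded.

    Without a parent function there is no denormal-fp-math attribute to
    consult, so folding proceeds only when no denormal is involved.
    """
    result = a + b
    if not has_parent_function:
        if is_denormal(a) or is_denormal(b) or is_denormal(result):
            return None  # bail out: we don't know the denormal mode
    # With a parent function, the mode from the attribute would be
    # applied to inputs and output here (omitted in this sketch).
    return result
```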

Constant expressions won't currently be affected by this, although potentially they could also follow the principle of only getting folded if denormals are not involved.

Maybe... refusing to fold only when we see a denormal might lead to bugs which only show up for specific constants, though. I think the current approach of just continuing to do IEEE folding is fine as an incremental step.

My concern with constant expressions here is mostly optimizations turning instructions into constant expressions, e.g. TargetFolder/ConstantFolder. If frontends create constant expressions, it's less important what happens.

spatel added inline comments.Mar 30 2022, 11:21 AM
llvm/lib/Analysis/ConstantFolding.cpp
1006

fneg is not a computational FP operation; it's a signbit operation. For example on x86 with SSE, it's implemented with a vector integer xor instruction, so it is not affected by denorm FP mode. I'm not sure what happens on targets that have a real fneg instruction.

Either way, we need at least one test to check the behavior.
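
To illustrate why a signbit operation is immune to denormal mode, here is a sketch of fneg implemented as a raw XOR of the sign bit, as described above for SSE (hypothetical helper, not LLVM code):

```python
import math
import struct

def fneg_bits(x: float) -> float:
    """Negate a double by XORing its sign bit; no FP arithmetic happens.

    Because no arithmetic unit is involved, a denormal flushing mode
    cannot apply: a subnormal input comes back subnormal, just negated.
    """
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (result,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << 63)))
    return result
```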

dcandler updated this revision to Diff 421535.Apr 8 2022, 8:23 AM

I've removed FNeg from the changes; indeed, it was not affected by denormal mode on any target I tried. That also allowed me to refactor slightly so the new handling wraps just ConstantFoldBinaryOpOperands, falling back to the plain fold when dealing with a constant expression or a functionless instruction, as before. So for the moment the denormal mode information is only applied in situations where it is available, which is at least one step forward.

Please add a testcase for opt -instsimplify.

llvm/lib/Analysis/InstructionSimplify.cpp
612 ↗(On Diff #421535)

Maybe we should just return early if CxtI is null, instead of falling back to ConstantFoldBinaryOpOperands?

Please add a testcase for opt -instsimplify.

Right - I don't think we want any -instcombine tests with this patch. It should be completely testable from -instsimplify. And we should vary the opcodes (not just fmul), so we have at least partial coverage for each one (plus a negative test for "fneg").

arsenm added a subscriber: arsenm.Apr 18 2022, 1:37 PM
arsenm added inline comments.
llvm/lib/Analysis/ConstantFolding.cpp
1345

Don't need llvm::

llvm/test/Transforms/InstCombine/AArch64/constant-fold-fp-denormal.ll
4

Don't need a specific target here

5

Would it be helpful to have some constantexpr cases in a global initializer?

dcandler updated this revision to Diff 426602.May 3 2022, 1:51 AM

I've moved the tests to instsimplify, and expanded them out to cover more cases and additional instructions. While there may be some overlap, this structures it a bit better and ensures cases don't get conflated: depending on the instruction some zero results can be obtained from either the input getting zeroed or the output getting zeroed, so it's better to test both separately.

dcandler marked 3 inline comments as done.May 3 2022, 1:56 AM
dcandler added inline comments.
llvm/lib/Analysis/InstructionSimplify.cpp
612 ↗(On Diff #421535)

Returning early here would change the result for existing cases where there is no instruction pointer, so constant expressions won't get folded where they would before.

llvm/test/Transforms/InstCombine/AArch64/constant-fold-fp-denormal.ll
5

I don't think it's necessary to test those if they aren't going to be affected by setting the function attribute.

spatel added a comment.May 4 2022, 7:33 AM

I don't understand why we have (duplicate?) tests with -instcombine. Also, why are there ARM and AArch64 test file variants? IIUC, the target makes no difference - the behavior is completely specified by the function attributes.
I like that we're testing each opcode with each attribute combination for thoroughness, but I'd prefer to have that all in one file rather than split by opcode. Wouldn't it be easier to see the progression if the tests were ordered based on the attributes rather than opcode? Ie, if we're testing that an input is flushed to zero, then that denorm constant could be repeated N times in a row independently of the opcode.

Once we have the right set of tests in place, you can pre-commit them with baseline CHECK lines, and then it will be easier to see how this patch changes functionality (and we can add test comments to explain more if needed).

Sorry, the instcombine tests are from the previous version and shouldn't have been included in the diff.

Originally I put the tests in ARM/AArch64 because those seemed the relevant targets where you'd expect to see the attributes with all the different modes, but you're right: having the tests in both is redundant, and no target is really needed at all when the test specifies the attributes. So I will combine the opcodes into one file, and can move them down a folder.

On ordering: while one set of inputs would work for multiple attributes/opcodes when testing that inputs are correctly flushed, testing that the output is flushed requires specific inputs for each opcode/attribute combination to produce a subnormal output. Where possible, I tried to pick input values relevant to the opcode, such that one set of inputs produced different results based on the attribute, and grouped based on that, since the effect of the attribute is then visible at a glance. For example with fadd, the same inputs and opcode can produce four different results depending on which attribute is used, and the result of the input getting flushed is distinct from the result when the output is flushed. Keeping those tests together felt more readable than continually changing the inputs to order by attribute first.

dcandler updated this revision to Diff 430100.May 17 2022, 9:49 AM

Tests moved out and pre-committed in https://reviews.llvm.org/D125807

spatel accepted this revision.Jun 16 2022, 1:35 PM

LGTM - thanks for the thorough tests!
See inline for some minor cleanups.

llvm/lib/Analysis/ConstantFolding.cpp
1028

typo: separately

Hopefully, we'll get rid of FP constant expressions, so there won't be a discrepancy in the future.

1029

"if a constant"

1336

typo: separately

1384

typo: instruction

This revision was not accepted when it landed; it landed in state Needs Review.Jun 20 2022, 8:43 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
shchenz added a subscriber: shchenz.EditedJun 21 2022, 7:40 PM
define zeroext i1 @foo() #0 {
  %_add = fadd fast double 1.264810e-321, 3.789480e-321
  %_res = fcmp fast une double %_add, 5.054290e-321
  ret i1 %_res
}

attributes #0 = { "denormal-fp-math"="positive-zero" }
llc 1.ll -mtriple=powerpc64le-unknown-linux-gnu

Hi, this patch causes a miscompile for the above case: now %_res is true, while before this patch it was false. Can we not handle the denormal ConstantFP in fcmp? Will the denormal ConstantFP appear in other opcodes as well?

Thanks.

nlopes added a subscriber: nlopes.Jun 22 2022, 7:02 AM

Alive says that LLVM is correct:

define i1 @foo() denormal-fp-math=positive-zero,positive-zero {
  %_add = fadd double 0.000000, 0.000000, exceptions=ignore
  %_res = fcmp une double %_add, 0.000000
  ret i1 %_res
}
=>
define i1 @foo() noread nowrite nofree willreturn denormal-fp-math=positive-zero,positive-zero {
  ret i1 1
}
Transformation seems to be correct!
shchenz added a comment.EditedJun 22 2022, 8:21 AM

Alive 2 says "ERROR: Couldn't prove the correctness of the transformation"...

https://alive2.llvm.org/ce/z/GPLj4Z

And from the semantics of the case, %_add being unequal to 5.054290e-321 should be wrong (1.264810e-321 + 3.789480e-321 == 5.054290e-321), so we should expect false here?

The online version is outdated, sorry.

%_add = #x00000000000003ff
Which is a subnormal, so per the function attribute it is changed to +0.0.
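
The encodings can be checked directly with a few lines of Python (a verification sketch; the constants are the ones from the test case above):

```python
import struct

def bits_of(x: float) -> int:
    """Raw IEEE-754 encoding of a double."""
    (b,) = struct.unpack("<Q", struct.pack("<d", x))
    return b

a = 1.264810e-321   # encodes as 0x100: exponent field zero, i.e. subnormal
b = 3.789480e-321   # encodes as 0x2ff: also subnormal
s = a + b           # exact subnormal sum, encoding 0x3ff

# All three values have a zero exponent field, so under
# "denormal-fp-math"="positive-zero" the folded %_add becomes +0.0,
# making the une comparison against 5.054290e-321 evaluate to true.
```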

Hi @nlopes, thanks for providing the useful info. However I am still not very clear about how to deal with our internal failure after this patch.

The alive result is very confusing. I don't understand why 0.000000 + 0.000000 != 0.000000 when denormal-fp-math=positive-zero; could you help to explain?
I know you said the online Alive2 is outdated, but the online Alive2 seems to get the opposite result for the above 0.000000 case: it verifies ret i1 0 as the valid transformation. https://alive2.llvm.org/ce/z/PjhR3U

There is a C case too:

int main(void)
{
  double a = 1.264810e-321;
  double b = 3.789480e-321;

  return (a + b != 5.054290e-321);
}

clang 1.c -Ofast -fdenormal-fp-math=positive-zero without this patch, it gets 0 and with this patch, it gets 1. Our internal test expects 0 here.

I also tested the above case with XLC (-Ofast -qnostrict) and GCC (-Ofast -funsafe-math-optimizations) on PowerPC; they both get 0.

Could you please tell me what's wrong with our internal failure? Thanks in advance. @nlopes @dcandler

The alive result is very confusing. I don't understand why 0.000000 + 0.000000 != 0x000000 when denormal-fp-math=positive-zero, could you help to explain?

True, the output isn't great (floats are truncated, hence the 0.000000, which is not what's underneath). But what matters is the final result.

I know you said online Alive2 is outdated, but seems the online Alive2 gets opposite result for the above 0.000000 case, it verifies ret i1 0 as the valid transformation. https://alive2.llvm.org/ce/z/PjhR3U

The online version of Alive2 doesn't implement the denormal-fp-math attribute.

Look at the generated assembly at -O0, with and without -fdenormal-fp-math: there's no difference. So it seems that this flag doesn't guarantee anything (it's best effort), or it's not fully implemented yet.
Nevertheless, your internal test is wrong. Check the math here: https://en.wikipedia.org/wiki/Double-precision_floating-point_format#Exponent_encoding

Thanks. I need some time to have a better understanding.

So GCC/XLC both returning 0 for the C case is because -fdenormal-fp-math=positive-zero is either not implemented or not used on the command line? I tested with clang: without -fdenormal-fp-math=positive-zero, it also returns 0 with this patch.

Hi, this patch causes a miscompile for the above case: now %_res is true, while before this patch it was false. Can we not handle the denormal ConstantFP in fcmp? Will the denormal ConstantFP appear in other opcodes as well?

Catching up after being out for a few days...
Yes, we'll need to make more of FP constant-folding aware of these function attributes to get consistent results.
D128647 looks like it will fix fcmp. We'll need something similar for constant-folded FP intrinsics and libcalls too. And there was a comment about updating LangRef to document the behavior (the flushing mode does not affect signbit ops like fneg/fabs/copysign).

Thanks for confirming.