This is an archive of the discontinued LLVM Phabricator instance.


Differential D37686

[DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management.
Needs ReviewPublic

Authored by jbhateja on Sep 11 2017, 6:45 AM.


Tokens

"Y So Serious" token, awarded by post.kadirselcuk.

Details

Reviewers

spatel
craig.topper
asbirlea
anemet
ahatanak
RKSimon
sanjoy

Diff Detail

Build Status

Buildable 10526
Build 10526: arc lint + arc unit

Event Timeline

jbhateja created this revision.Sep 11 2017, 6:45 AM

Harbormaster completed remote builds in B10076: Diff 114585.Sep 11 2017, 6:47 AM

jbhateja added reviewers: spatel, RKSimon.Sep 11 2017, 6:48 AM

jbhateja added a subscriber: llvm-commits.

RKSimon added inline comments.Sep 11 2017, 7:29 AM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
112	auto *OFBinOp
118	auto *ExactOp

@RKSimon Anything else or should I check this in as NFC.

In D37686#866495, @jbhateja wrote:

@RKSimon Anything else or should I check this in as NFC.

@spatel needs to review

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6622	remove newline diff
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
666	is this needed in this patch?

There is a functional difference from this improvement (several FP intrinsics should now correctly propagate flags), but I'm not sure if it's visible without fixing something else in DAGCombiner to recognize the flags.

How does this work if an instruction maps to multiple nodes? For example, the FMA intrinsic can map to 2 nodes?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
95	Formatting/spacing is non-standard here and below. Run clang-format?
99–100	shorten: if (SDNode *Node = SelDB->getDAGNode(Instr)) {
7956–7959	Don't need this anymore?
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
666	Formatting/spacing is non-standard here and below. Run clang-format?

Adding potential reviewers from D37616.

Adding Adam and Akira as reviewers since they helped review earlier changes to SDNodeFlags.

Here's a potential test case that would show a difference from having FMF on a sqrt intrinsic:

define float @fast_recip_sqrt(float %x) {
  %y = call fast float @llvm.sqrt.f32(float %x)
  %z = fdiv fast float 1.0,  %y
  ret float %z
}
declare float @llvm.sqrt.f32(float) nounwind readonly

...but as I said earlier, we need to fix the DAGCombiner code where this fold is implemented to recognize the flags on the individual nodes. Currently, it just checks the global state:

if (Options.UnsafeFPMath) {

On x86 currently, this will use the full-precision sqrtss+divss, but it should be using rsqrtss followed by mulss/addss to refine the estimate.
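Here is a minimal sketch of the gating difference being described. The structs and functions below are hypothetical stand-ins, not LLVM's TargetOptions, SDNodeFlags, or DAGCombiner code. Today the fold checks only the global option. A node-level check would also accept the case where the nodes themselves carry fast-math flags. Requiring 'fast' on both the fdiv and the sqrt is one conservative assumption, not a settled rule.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; these are NOT the real LLVM
// TargetOptions / SDNodeFlags classes.
struct GlobalOptions {
  bool UnsafeFPMath = false;  // the global switch the fold checks today
};

struct NodeFMF {
  bool UnsafeAlgebra = false;  // set when the IR instruction was 'fast'
};

// Current behavior described above: only the global option enables the
// reciprocal-sqrt estimate for 1.0 / sqrt(x).
bool canFoldGlobalOnly(const GlobalOptions &Opts) { return Opts.UnsafeFPMath; }

// Node-level variant: the global option still enables the fold, but it is
// also allowed when both the fdiv node and the sqrt node are 'fast'.
// Requiring the flag on both nodes is an assumption (the conservative choice).
bool canFoldWithNodeFlags(const GlobalOptions &Opts, const NodeFMF &Div,
                          const NodeFMF &Sqrt) {
  return Opts.UnsafeFPMath || (Div.UnsafeAlgebra && Sqrt.UnsafeAlgebra);
}
```

In the @fast_recip_sqrt test above, both the sqrt call and the fdiv carry 'fast'. Under this assumption, the node-level check would allow the rsqrtss sequence without the global option.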

In D37686#866521, @spatel wrote:

There is a functional difference from this improvement (several FP intrinsics should now correctly propagate flags), but I'm not sure if it's visible without fixing something else in DAGCombiner to recognize the flags.

How does this work if an instruction maps to multiple nodes? For example, the FMA intrinsic can map to 2 nodes?

This propagation happens during SelectionDAGBuilder::visit. I scanned through various instructions, and there is a 1:1 mapping between an instruction and the initial SDNode created for it. For example:
llvm.fma (intrinsic instruction) -> SDNode(ISD::FMA)
llvm.minnum (instruction) -> SDNode(ISD::FMINNUM/FMINNAN)

and so on. These initial SDNodes are later lowered/expanded during the following DAG phases.

In D37686#866584, @spatel wrote:
Here's a potential test case that would show a difference from having FMF on a sqrt intrinsic:
define float @fast_recip_sqrt(float %x) {
  %y = call fast float @llvm.sqrt.f32(float %x)
  %z = fdiv fast float 1.0,  %y
  ret float %z
}
declare float @llvm.sqrt.f32(float) nounwind readonly
...but as I said earlier, we need to fix the DAGCombiner code where this fold is implemented to recognize the flags on the individual nodes. Currently, it just checks the global state:
if (Options.UnsafeFPMath) {
On x86 currently, this will use the full-precision sqrtss+divss, but it should be using rsqrtss followed by mulss/addss to refine the estimate.

Ok, we also have another usage of fast-math flags in review D37616. Can you please file a bugzilla to track the suggested potential improvement?

In D37686#866641, @jbhateja wrote:

In D37686#866521, @spatel wrote:

How does this work if an instruction maps to multiple nodes? For example, the FMA intrinsic can map to 2 nodes?

This propagation happens during SelectionDAGBuilder::visit. I scanned through various instructions, and there is a 1:1 mapping between an instruction and the initial SDNode created for it.

case Intrinsic::fmuladd: {
  EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());
  if (TM.Options.AllowFPOpFusion != FPOpFusion::Strict &&
      TLI.isFMAFasterThanFMulAndFAdd(VT)) {
    setValue(&I, DAG.getNode(ISD::FMA, sdl,
                             getValue(I.getArgOperand(0)).getValueType(),
                             getValue(I.getArgOperand(0)),
                             getValue(I.getArgOperand(1)),
                             getValue(I.getArgOperand(2))));
  } else {
    // TODO: Intrinsic calls should have fast-math-flags.
    SDValue Mul = DAG.getNode(ISD::FMUL, sdl,
                              getValue(I.getArgOperand(0)).getValueType(),
                              getValue(I.getArgOperand(0)),
                              getValue(I.getArgOperand(1)));
    SDValue Add = DAG.getNode(ISD::FADD, sdl,
                              getValue(I.getArgOperand(0)).getValueType(),
                              Mul,
                              getValue(I.getArgOperand(2)));
    setValue(&I, Add);
  }
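To make the multi-node question concrete, here is a hypothetical, self-contained model of a propagation step. It is not SelectionDAGBuilder code, and MockNode, MockBuilder, and acquireFlags are invented names. The step runs after an instruction is visited and copies its IR flags onto the node recorded for that instruction. When fmuladd becomes a single FMA node, that node receives the flags. When fmuladd is expanded into FMUL + FADD, as in the code above, only the FADD root recorded for the instruction receives them, and the FMUL keeps default flags.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical model, not LLVM code.
struct Flags {
  bool Fast = false;
};

struct MockNode {
  std::string Opcode;
  std::vector<MockNode *> Operands;
  Flags NodeFlags;
};

struct MockBuilder {
  std::vector<std::unique_ptr<MockNode>> Nodes;  // owns every node
  std::unordered_map<int, MockNode *> NodeMap;   // instruction id -> node

  MockNode *getNode(const std::string &Opc, std::vector<MockNode *> Ops) {
    Nodes.push_back(std::unique_ptr<MockNode>(
        new MockNode{Opc, std::move(Ops), Flags()}));
    return Nodes.back().get();
  }

  // Mirrors the two paths of the fmuladd lowering quoted above: one FMA
  // node, or an FMUL feeding an FADD. Only the root is recorded in NodeMap.
  void visitFMulAdd(int InstId, bool UseFMA) {
    MockNode *Root = UseFMA ? getNode("FMA", {})
                            : getNode("FADD", {getNode("FMUL", {})});
    NodeMap[InstId] = Root;
  }

  // Post-visit propagation: copy the IR flags onto the mapped node only.
  void acquireFlags(int InstId, Flags IRFlags) {
    auto It = NodeMap.find(InstId);
    if (It != NodeMap.end())
      It->second->NodeFlags = IRFlags;
  }
};
```

In this model, a root-only propagation step would need extra handling, or explicit flags at each getNode call, to cover the intermediate FMUL.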

Ok, we also have another usage of fast-math flags in review D37616. Can you please file a bugzilla to track the suggested potential improvement?

https://bugs.llvm.org/show_bug.cgi?id=34558

Review comments resolution + flags propagation over operands.

Ping @reviewers

hfinkel added a subscriber: hfinkel.Sep 12 2017, 5:46 PM

hfinkel added inline comments.

include/llvm/CodeGen/SelectionDAGNodes.h
360	These flags need comments explaining what they are and how/when they're used.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
664	returns -> Returns

@reviewers, are there any more comments apart from the last ones? This is just to save iterations. Thanks for your time reviewing.

Review comments handling.

Harbormaster completed remote builds in B10180: Diff 115022.Sep 13 2017, 6:19 AM

hfinkel added inline comments.Sep 13 2017, 8:12 AM

include/llvm/CodeGen/SelectionDAGNodes.h
360	Add a comma after true.
364	Please explain here when these flags are set/unset and why.

Review comments resolution.

Harbormaster completed remote builds in B10188: Diff 115056.Sep 13 2017, 9:16 AM

I pointed out 2 more places where I think we can eliminate the existing transfer of flags. I think you should do a complete audit for those.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2765–2766	Delete?
2773–2774	Delete?
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
666	Same question as earlier (I don't think it was answered). Can we use the existing SelectionDAGBuilder::getValue() to get to the node's flags?

hfinkel added inline comments.Sep 13 2017, 3:57 PM

include/llvm/CodeGen/SelectionDAGNodes.h
367	Okay. Somewhere we need a description of the algorithm. Something like: When processing an instruction with X kind of flags, we set flag Y on something in order to ensure something. This maintains the invariant that whatever. Then, after doing whatever, we set /unset the flag Z.
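Below is a sketch of one possible reading of the algorithm. It is based only on the code comments for PropagateFlagsToOperands and AcquireFlagsFromUser in Diff 116305. It is a hypothetical model: ModelFlags, ModelNode, and propagateToOperands are invented names. The patch's actual SDNodeFlagsAcquirer::PropagateFlagsToOperands body is not part of this page, and copying (rather than merging) the fast-math bits is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the two control bits described in the patch comments.
struct ModelFlags {
  bool PropagateFlagsToOperands = false;  // set on the user node
  bool AcquireFlagsFromUser = false;      // set on operands that opt in
  bool NoNaNs = false;
  bool AllowReciprocal = false;
};

struct ModelNode {
  ModelFlags Flags;
  std::vector<ModelNode *> Operands;
};

// If the user node is marked for propagation, copy its fast-math bits to each
// operand that has AcquireFlagsFromUser set, then reset both control bits so
// the propagation happens only once.
void propagateToOperands(ModelNode &User) {
  if (!User.Flags.PropagateFlagsToOperands)
    return;
  for (ModelNode *Op : User.Operands) {
    if (!Op->Flags.AcquireFlagsFromUser)
      continue;
    Op->Flags.NoNaNs = User.Flags.NoNaNs;
    Op->Flags.AllowReciprocal = User.Flags.AllowReciprocal;
    Op->Flags.AcquireFlagsFromUser = false;
  }
  User.Flags.PropagateFlagsToOperands = false;
}
```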

Rebase from trunk.
More changes to cover review comments.
Test usage of fast-math flags on nodes in some places; this fixes PR34558.
More places where node-level flags need to be checked will be handled incrementally.

Herald added a subscriber: javed.absar.Sep 16 2017, 6:24 AM

Ping @reviewers.

Ping @reviewers.

Please see the section under "Sometimes code reviews will take longer than you would hope for" regarding ping time:
https://llvm.org/docs/DeveloperPolicy.html#code-reviews

Also, this patch has grown in functionality since the last rev, but there are still no tests. If you want to demonstrate the effect of propagating the flags, pick just one DAG combine where that happens (ideally, the simplest case) and add tests to show the functional difference.

include/llvm/CodeGen/SelectionDAGNodes.h
933	I don't think this is going to work as you're hoping for. If possible, please split this and any related changes into a separate follow-up patch.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
666	Asking for the 3rd time: is this necessary?

Review comments resolutions.

Harbormaster completed remote builds in B10491: Diff 116159.Sep 21 2017, 3:07 AM

jbhateja added inline comments.Sep 21 2017, 3:10 AM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
666	SelectionDAGBuilder::getValue() creates a new value and puts it into NodeMap if one does not already exist, whereas SelectionDAGBuilder::getDAGNode() only checks NodeMap and returns a DAG node if one already exists.
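The difference can be illustrated with a plain map. This is a simplified, hypothetical sketch rather than the real SelectionDAGBuilder code: an int key stands in for const Value *, and a string stands in for the SDValue. A getValue-style lookup creates and records an entry on a miss. A getDAGNode-style lookup only reports an entry that already exists, so a flags-propagation step built on it cannot create new nodes as a side effect.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical model of NodeMap; not the real SelectionDAGBuilder.
struct NodeMapModel {
  std::unordered_map<int, std::string> NodeMap;  // value id -> node name

  // getValue-style: look up, and create (and record) a node on a miss.
  const std::string &getValue(int V) {
    auto It = NodeMap.find(V);
    if (It == NodeMap.end())
      It = NodeMap.emplace(V, "node" + std::to_string(V)).first;
    return It->second;
  }

  // getDAGNode-style: return the node only if it already exists.
  const std::string *getDAGNode(int V) const {
    auto It = NodeMap.find(V);
    return It == NodeMap.end() ? nullptr : &It->second;
  }
};
```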

I've added some more FMF tests at rL313893 which I think this patch will miscompile. Please rebase/update.

As I suggested before, this patch shouldn't try to enable multiple DAG combines with node-level FMF. It's not as straightforward as you might think.

Pick exactly one combine if you want to show that this patch is working as intended. The llvm.fmuladd intrinsic test that you have here with a target that supports 'fma' (not plain x86) seems like a good choice to me. If we have a strict op in IR, it should produce an fma instruction. If we have a fast op in IR, it should produce the simpler fmul instruction?

In D37686#877864, @spatel wrote:

I've added some more FMF tests at rL313893 which I think this patch will miscompile. Please rebase/update.

As I suggested before, this patch shouldn't try to enable multiple DAG combines with node-level FMF. It's not as straightforward as you might think.

Pick exactly one combine if you want to show that this patch is working as intended. The llvm.fmuladd intrinsic test that you have here with a target that supports 'fma' (not plain x86) seems like a good choice to me. If we have a strict op in IR, it should produce an fma instruction. If we have a fast op in IR, it should produce the simpler fmul instruction?

My understanding and code changes are based on the LLVM LangRef's section about fast-math flags (http://llvm.org/docs/LangRef.html#fast-math-flags),

which says for the nnan flag: "Allow optimizations to assume the arguments and result are not NaN".

Now take the following case, which you added:

%y = call float @llvm.sqrt.f32(float %x)
%z = fdiv fast float 1.0, %y
ret float %z

We don't have a fast flag on the intrinsic, but DAG combining for the fdiv sees a fast flag, assumes that the result (%z) and the arguments (the constant and %y) are not NaN, and goes ahead and generates a reciprocal sqrt. If you remove fast from the fdiv and add it to the intrinsic instead, the FMF optimization at the fdiv will not kick in.

Can you please let me know what you expected here?

In D37686#877951, @jbhateja wrote:
My understanding and code changes are based on the LLVM LangRef's section about fast-math flags (http://llvm.org/docs/LangRef.html#fast-math-flags),

which says for the nnan flag: "Allow optimizations to assume the arguments and result are not NaN".

Now take the following case, which you added:
%y = call float @llvm.sqrt.f32(float %x)
%z = fdiv fast float 1.0, %y
ret float %z
We don't have a fast flag on the intrinsic, but DAG combining for the fdiv sees a fast flag, assumes that the result (%z) and the arguments (the constant and %y) are not NaN, and goes ahead and generates a reciprocal sqrt. If you remove fast from the fdiv and add it to the intrinsic instead, the FMF optimization at the fdiv will not kick in.

Can you please let me know what you expected here?

I expect that the sqrt result is strict. I.e., it should use sqrtss if this is x86-64. We're not allowed to use rsqrtss and lose precision on that op.

That said, my memory of exactly how op-level FMF should work is fuzzy. If anyone else remembers or can link to threads where we've discussed this, please feel free to jump in. :)

In D37686#877960, @spatel wrote:
I expect that the sqrt result is strict. I.e., it should use sqrtss if this is x86-64. We're not allowed to use rsqrtss and lose precision on that op.

That said, my memory of exactly how op-level FMF should work is fuzzy. If anyone else remembers or can link to threads where we've discussed this, please feel free to jump in. :)

Exactly. That is why, in the earlier version of this patch, I added a routine to get unified flags, which intersects the flags of a node with the flags of its operands. I think it would be right to bring that code back into this patch [it was removed from the current version of the patch as per your comments].
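For reference, intersecting a node's flags with those of its operands amounts to a bitwise AND over the flag set. The sketch below is hypothetical: FMFBit and unifiedFlags are invented names with a made-up bit encoding. It is not the routine from the earlier revision of this patch, which is not reproduced on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bitmask encoding of a few fast-math flags.
enum FMFBit : uint8_t {
  FMF_NoNaNs = 0x1,
  FMF_NoInfs = 0x2,
  FMF_AllowReciprocal = 0x4,
};

// A flag survives only if the node and every one of its operands carry it.
uint8_t unifiedFlags(uint8_t NodeFlags,
                     const std::vector<uint8_t> &OperandFlags) {
  uint8_t Result = NodeFlags;
  for (uint8_t F : OperandFlags)
    Result &= F;
  return Result;
}
```

With no operands, the node's own flags are returned unchanged. A single strict operand (no bits set) clears every flag.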

In D37686#877960, @spatel wrote:
I expect that the sqrt result is strict. I.e., it should use sqrtss if this is x86-64. We're not allowed to use rsqrtss and lose precision on that op.

I think there's a good argument either way here...

While fast does imply nnan, and nnan does propagate backward, fast also implies arcp, and arcp does not propagate backward: arcp applies only to the instruction to which it is attached. In this case, we're allowed to use a reciprocal approximation for the division, but not for the sqrt. However, we could argue that using the rsqrt is like doing the sqrt exactly and then just approximating the division. If there are no other users of the sqrt itself, there seems to be little semantic difference.

That said, my memory of exactly how op-level FMF should work is fuzzy. If anyone else remembers or can link to threads where we've discussed this, please feel free to jump in. :)
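The two readings in this exchange can be contrasted with a small hypothetical predicate. Conservative reading: arcp applies only to the fdiv, so the sqrt itself must be fast before it may be approximated. Permissive reading (the argument above): with arcp on the fdiv and no other users of the sqrt, rsqrt can be viewed as an exact sqrt followed by an approximate division. SqrtInfo, DivInfo, and both functions are invented for illustration. Neither is claimed to be what DAGCombiner does.

```cpp
#include <cassert>

// Hypothetical summaries of the two IR operations involved.
struct SqrtInfo {
  bool Fast = false;     // fast-math flags on the sqrt call itself
  unsigned NumUses = 1;  // number of users of the sqrt result
};

struct DivInfo {
  bool AllowReciprocal = false;  // 'arcp' (implied by 'fast') on the fdiv
};

// Conservative: arcp covers only the fdiv; the sqrt must be fast on its own.
bool rsqrtAllowedConservative(const DivInfo &Div, const SqrtInfo &Sqrt) {
  return Div.AllowReciprocal && Sqrt.Fast;
}

// Permissive: a single-use strict sqrt may be folded into the approximate
// division, since no other user can observe the sqrt result.
bool rsqrtAllowedPermissive(const DivInfo &Div, const SqrtInfo &Sqrt) {
  return Div.AllowReciprocal && (Sqrt.Fast || Sqrt.NumUses == 1);
}
```

The single-use condition is a simplification. For the multi-use example later in this thread, both lowerings are considered valid: a shared strict sqrtss, or rsqrt on one path with a separate strict sqrt on the other.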

In D37686#877972, @jbhateja wrote:
Exactly, that is why I added a routine to get Unified flags which intersects flags of a node with flags of its operands in the earlier version of this patch , i think it will be right to inject that code with this patch [it was removed from current version of patch as per your comments]

I think you're mixing up flag propagation as it applies to creation of nodes with flag combining when folding operations. These are 2 different things. This patch is about propagating from IR to nodes at creation time (at least that's what I think it should be limited to based on the title).

RKSimon resigned from this revision.Sep 21 2017, 12:53 PM

In D37686#877987, @hfinkel wrote:

While fast does imply nnan, and nnan does propagate backward, fast also implies arcp, and arcp does not propagate backward: arcp applies only to the instruction to which it is attached. In this case, we're allowed to use a reciprocal approximation for the division, but not for the sqrt. However, we could argue that using the rsqrt is like doing the sqrt exactly and then just approximating the division. If there are no other users of the sqrt itself, there seems to be little semantic difference.

I thought we leaned to the more conservative (less propagating) model in IR. Not sure if that would be different here in the DAG or if rsqrt is a special case. Either way, it doesn't affect the logic at node creation time?

Here's a sqrt multi-use case to think about:

define float @multiuse_strict_sqrt(float %x, i1 %cmp) {
  %y = call float @llvm.sqrt.f32(float %x)
  %z = fdiv fast float 1.0,  %y
  br i1 %cmp, label %t, label %f

t:
  ret float %z
f:
  %add = fadd float %y, 1.0
  ret float %add
}

Should this be:

## BB#0:
	sqrtss	%xmm0, %xmm0 <--- shared sqrt op because sqrt is treated as strict
	testb	$1, %dil
	je	LBB0_2
## BB#1:                                ## %t
	movss	LCPI0_0(%rip), %xmm1  
	divss	%xmm0, %xmm1
	movaps	%xmm1, %xmm0
	retq
LBB0_2:                                 ## %f
	addss	LCPI0_0(%rip), %xmm0
	retq

Or:

## BB#0:
	testb	$1, %dil
	je	LBB0_2
## BB#1:                                ## %t
	vrsqrtss	%xmm0, %xmm0, %xmm1   <--- fast sqrt is assumed part of fast div
	vmulss	%xmm1, %xmm0, %xmm0
	vmulss	%xmm1, %xmm0, %xmm0
	vaddss	LCPI0_0(%rip), %xmm0, %xmm0
	vmulss	LCPI0_1(%rip), %xmm1, %xmm1
	vmulss	%xmm0, %xmm1, %xmm0
	retq
LBB0_2:                                 ## %f
	vsqrtss	%xmm0, %xmm0, %xmm0  <--- strict sqrt only applies on this path
	vaddss	LCPI0_2(%rip), %xmm0, %xmm0
	retq

In D37686#878093, @spatel wrote:

I thought we leaned to the more conservative (less propagating) model in IR. Not sure if that would be different here in the DAG or if rsqrt is a special case.

It's a special case in that it's our only combined reciprocal operation.

Either way, it doesn't affect the logic at node creation time?

I don't think that it needs to do so.

Here's a sqrt multi-use case to think about:

define float @multiuse_strict_sqrt(float %x, i1 %cmp) {
  %y = call float @llvm.sqrt.f32(float %x)
  %z = fdiv fast float 1.0,  %y
  br i1 %cmp, label %t, label %f

t:
  ret float %z
f:
  %add = fadd float %y, 1.0
  ret float %add
}

Should this be:

## BB#0:
	sqrtss	%xmm0, %xmm0 <--- shared sqrt op because sqrt is treated as strict
	testb	$1, %dil
	je	LBB0_2
## BB#1:                                ## %t
	movss	LCPI0_0(%rip), %xmm1  
	divss	%xmm0, %xmm1
	movaps	%xmm1, %xmm0
	retq
LBB0_2:                                 ## %f
	addss	LCPI0_0(%rip), %xmm0
	retq

This is one valid option. The divss could also be a rcpss (+mul) if we'd like.

Or:

## BB#0:
	testb	$1, %dil
	je	LBB0_2
## BB#1:                                ## %t
	vrsqrtss	%xmm0, %xmm0, %xmm1   <--- fast sqrt is assumed part of fast div
	vmulss	%xmm1, %xmm0, %xmm0
	vmulss	%xmm1, %xmm0, %xmm0
	vaddss	LCPI0_0(%rip), %xmm0, %xmm0
	vmulss	LCPI0_1(%rip), %xmm1, %xmm1
	vmulss	%xmm0, %xmm1, %xmm0
	retq
LBB0_2:                                 ## %f
	vsqrtss	%xmm0, %xmm0, %xmm0  <--- strict sqrt only applies on this path
	vaddss	LCPI0_2(%rip), %xmm0, %xmm0
	retq

This is another valid option. Either of these seem allowable.

Updating test case with more than one use of sqrt / mul.

Harbormaster completed remote builds in B10526: Diff 116305.Sep 21 2017, 10:58 PM

@spatel, @reviewers, can this land in trunk now?

In D37686#879590, @jbhateja wrote:

@spatel, @reviewers, can this land in trunk now?

I haven't actually looked at the builder changes since you revised them, so I defer to @hfinkel if he's already approved that part.

I still don't understand why we should put the combiner changes into this patch. I'd like to see progress on using the flags too, but I think those should be separate patches with tests that cover all of the potential ambiguity that we've raised here.

In D37686#880234, @spatel wrote:

I haven't actually looked at the builder changes since you revised them, so I defer to @hfinkel if he's already approved that part.

I don't think that I approved anything yet. I can take a holistic look at the patch later today.

I still don't understand why we should put the combiner changes into this patch. I'd like to see progress on using the flags too, but I think those should be separate patches with tests that cover all of the potential ambiguity that we've raised here.

In D37686#880253, @hfinkel wrote:

I don't think that I approved anything yet. I can take a holistic look at the patch later today.

@hfinkel, just a reminder for review clearance.


@reviewers, please let me know if there are any more comments on this patch.

spatel mentioned this in D39304: [IR] redefine 'reassoc' fast-math-flag and add 'trans' fast-math-flag.Nov 2 2017, 7:34 AM

hfinkel added inline comments.Dec 16 2017, 10:23 AM

include/llvm/CodeGen/SelectionDAGNodes.h
396	We need a comment here explaining what the Commit parameter does (and how/when it is used).
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
95	shared b/w -> shared by
123	I'm a bit worried about propagating the integer flags automatically this way. Maybe this is fine in practice, but if we're adding some kind of implicit contract here, we should clearly document it. An operation that is exact, or does not overflow in some way, could be implemented in terms of operations that do (and, then, having the flags on those intermediate nodes wouldn't be correct).
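Here is a concrete instance of this concern, modeled with plain integer arithmetic rather than LLVM code. Consider an i64 'add nuw' expanded into two i32 halves: a low add that produces a carry, and a high add that consumes it. The full 64-bit add does not wrap, yet the low-half add does. Copying 'nuw' onto the low-half node would therefore state something false. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True if A + B overflows as an unsigned 64-bit addition.
bool wraps64(uint64_t A, uint64_t B) { return A + B < A; }

// True if adding the low 32-bit halves of A and B overflows, i.e. the
// low-half ADD of an expanded i64 add produces a carry into the high half.
bool lowHalfWraps(uint64_t A, uint64_t B) {
  uint32_t LoA = static_cast<uint32_t>(A);
  uint32_t LoB = static_cast<uint32_t>(B);
  uint32_t Lo = LoA + LoB;  // wraps modulo 2^32
  return Lo < LoA;
}
```

For A = 0xFFFFFFFF and B = 1, the 64-bit sum is 0x100000000 (no unsigned wrap), but the low-half sum wraps to 0 and carries out.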

spatel mentioned this in D44245: Propagate flags to SDValue in SplitVecOp_VECREDUCE.Mar 8 2018, 6:12 AM

spatel mentioned this in D45710: Fast Math Flag mapping into SDNode.Apr 17 2018, 7:56 AM

spatel mentioned this in D46483: Propagate fast math flags via IR optimizations and code generation.May 6 2018, 9:05 AM

spatel mentioned this in D46563: intrinsic management for fast math sub flags.May 14 2018, 1:35 PM

spatel mentioned this in D46854: [DAG] propagate FMF for all FPMathOperators.May 14 2018, 3:48 PM

spatel mentioned this in rL332358: [DAG] propagate FMF for all FPMathOperators.May 15 2018, 7:20 AM

mcberg2017 mentioned this in D47749: guard fsqrt with fmf sub flags.Jun 5 2018, 6:51 PM

post.kadirselcuk added a subscriber: post.kadirselcuk.Jul 10 2021, 8:41 PM

Herald added a subscriber: pengfei.Jul 10 2021, 8:41 PM

post.kadirselcuk awarded a token.Jul 10 2021, 8:41 PM

post.kadirselcuk added a subscriber: Restricted Project.Jul 10 2021, 8:43 PM

sanjoy resigned from this revision.Jan 29 2022, 5:33 PM

Revision Contents

Path                                                  Size
include/llvm/CodeGen/SelectionDAGNodes.h              82 lines
lib/CodeGen/SelectionDAG/DAGCombiner.cpp              37 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h         4 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp     172 lines
lib/CodeGen/SelectionDAG/TargetLowering.cpp            2 lines
lib/Target/AArch64/AArch64ISelLowering.cpp             6 lines
test/CodeGen/X86/fmf-flags.ll                         36 lines

Diff 116305

include/llvm/CodeGen/SelectionDAGNodes.h

template<> struct simplify_type<SDUse> {		template<> struct simplify_type<SDUse> {
static SimpleType getSimplifiedValue(SDUse &Val) {		static SimpleType getSimplifiedValue(SDUse &Val) {
return Val.getNode();		return Val.getNode();
}		}
};		};

/// These are IR-level optimization flags that may be propagated to SDNodes.		/// These are IR-level optimization flags that may be propagated to SDNodes.
/// TODO: This data structure should be shared by the IR optimizer and the		/// TODO: This data structure should be shared by the IR optimizer and the
/// the backend.		/// the backend.
		/// Propagation of Flags from Instruction to SDNode is done by
		/// SDNodeFlagsAcquirer after the DAG node is created. Any flags which are set
		/// during the Build DAG are eventually merged with flags which are present over
		/// Instruction (IR).
struct SDNodeFlags {		struct SDNodeFlags {
private:		private:
// This bit is used to determine if the flags are in a defined state.		// This bit is used to determine if the flags are in a defined state.
// Flag bits can only be masked out during intersection if the masking flags		// Flag bits can only be masked out during intersection if the masking flags
// are defined.		// are defined.
bool AnyDefined : 1;		bool AnyDefined : 1;

		hfinkel: These flags need comments explaining what they are and how/when they're used.
		hfinkel: Add a comma after true.
		// Following two bit are used for Flags propagation from
		// a DAG node to its operands. When Propagate bit is set then
		// Flags from DAG node are propagated to only those operands which
		// have their Acquire bit set.
		hfinkel: Please explain here when these flags are set/unset and why.
		// These bits are set by invocation of
		// SDNodeFlagsAcquirer::PropagateFlagsToOperands and reset once the
		// propagation is through.
		hfinkel: Okay. Somewhere we need a description of the algorithm. Something like: When processing an instruction with X kind of flags, we set flag Y on something in order to ensure something. This maintains the invariant that whatever. Then, after doing whatever, we set /unset the flag Z.
		bool PropagateFlagsToOperands : 1;
		bool AcquireFlagsFromUser : 1;

bool NoUnsignedWrap : 1;		bool NoUnsignedWrap : 1;
bool NoSignedWrap : 1;		bool NoSignedWrap : 1;
bool Exact : 1;		bool Exact : 1;
bool UnsafeAlgebra : 1;		bool UnsafeAlgebra : 1;
bool NoNaNs : 1;		bool NoNaNs : 1;
bool NoInfs : 1;		bool NoInfs : 1;
bool NoSignedZeros : 1;		bool NoSignedZeros : 1;
bool AllowReciprocal : 1;		bool AllowReciprocal : 1;
bool VectorReduction : 1;		bool VectorReduction : 1;
bool AllowContract : 1;		bool AllowContract : 1;

public:		public:
/// Default constructor turns off all optimization flags.		/// Default constructor turns off all optimization flags.
SDNodeFlags()		SDNodeFlags()
: AnyDefined(false), NoUnsignedWrap(false), NoSignedWrap(false),		: AnyDefined(false), PropagateFlagsToOperands(false),
		AcquireFlagsFromUser(false), NoUnsignedWrap(false), NoSignedWrap(false),
Exact(false), UnsafeAlgebra(false), NoNaNs(false), NoInfs(false),		Exact(false), UnsafeAlgebra(false), NoNaNs(false), NoInfs(false),
NoSignedZeros(false), AllowReciprocal(false), VectorReduction(false),		NoSignedZeros(false), AllowReciprocal(false), VectorReduction(false),
AllowContract(false) {}		AllowContract(false) {}

/// Sets the state of the flags to the defined state.		/// Sets the state of the flags to the defined state.
void setDefined() { AnyDefined = true; }		void setDefined(bool Val) { AnyDefined = Val; }
/// Returns true if the flags are in a defined state.		/// Returns true if the flags are in a defined state.
bool isDefined() const { return AnyDefined; }		bool isDefined() const { return AnyDefined; }

// These are mutators for each flag.		// These are mutators for each flag.
		hfinkel: We need a comment here explaining what the Commit parameter does (and how/when it is used).
void setNoUnsignedWrap(bool b) {		void setNoUnsignedWrap(bool b, bool Commit = true) {
setDefined();		setDefined(Commit);
NoUnsignedWrap = b;		NoUnsignedWrap = b;
}		}
void setNoSignedWrap(bool b) {		void setNoSignedWrap(bool b, bool Commit = true) {
setDefined();		setDefined(Commit);
NoSignedWrap = b;		NoSignedWrap = b;
}		}
void setExact(bool b) {		void setExact(bool b, bool Commit = true) {
setDefined();		setDefined(Commit);
Exact = b;		Exact = b;
}		}
void setUnsafeAlgebra(bool b) {		void setUnsafeAlgebra(bool b, bool Commit = true) {
setDefined();		setDefined(Commit);
UnsafeAlgebra = b;		UnsafeAlgebra = b;
}		}
void setNoNaNs(bool b) {		void setNoNaNs(bool b, bool Commit = true) {
setDefined();		setDefined(Commit);
NoNaNs = b;		NoNaNs = b;
}		}
void setNoInfs(bool b) {		void setNoInfs(bool b, bool Commit = true) {
setDefined();		setDefined(Commit);
NoInfs = b;		NoInfs = b;
}		}
void setNoSignedZeros(bool b) {		void setNoSignedZeros(bool b, bool Commit = true) {
setDefined();		setDefined(Commit);
NoSignedZeros = b;		NoSignedZeros = b;
}		}
void setAllowReciprocal(bool b) {		void setAllowReciprocal(bool b, bool Commit = true) {
setDefined();		setDefined(Commit);
AllowReciprocal = b;		AllowReciprocal = b;
}		}
void setVectorReduction(bool b) {		void setVectorReduction(bool b, bool Commit = true) {
setDefined();		setDefined(Commit);
VectorReduction = b;		VectorReduction = b;
}		}
void setAllowContract(bool b) {		void setAllowContract(bool b, bool Commit = true) {
setDefined();		setDefined(Commit);
AllowContract = b;		AllowContract = b;
}		}
		void setAcquireFlagsFromUser(bool b) { AcquireFlagsFromUser = b; }
		void setPropagateFlagsToOperands(bool b) { PropagateFlagsToOperands = b; }

// These are accessors for each flag.		// These are accessors for each flag.
bool hasNoUnsignedWrap() const { return NoUnsignedWrap; }		bool hasNoUnsignedWrap() const { return NoUnsignedWrap; }
bool hasNoSignedWrap() const { return NoSignedWrap; }		bool hasNoSignedWrap() const { return NoSignedWrap; }
bool hasExact() const { return Exact; }		bool hasExact() const { return Exact; }
bool hasUnsafeAlgebra() const { return UnsafeAlgebra; }		bool hasUnsafeAlgebra() const { return UnsafeAlgebra; }
bool hasNoNaNs() const { return NoNaNs; }		bool hasNoNaNs() const { return NoNaNs; }
bool hasNoInfs() const { return NoInfs; }		bool hasNoInfs() const { return NoInfs; }
bool hasNoSignedZeros() const { return NoSignedZeros; }		bool hasNoSignedZeros() const { return NoSignedZeros; }
bool hasAllowReciprocal() const { return AllowReciprocal; }		bool hasAllowReciprocal() const { return AllowReciprocal; }
bool hasVectorReduction() const { return VectorReduction; }		bool hasVectorReduction() const { return VectorReduction; }
bool hasAllowContract() const { return AllowContract; }		bool hasAllowContract() const { return AllowContract; }

		bool hasPropagateFlagsToOperands() const { return PropagateFlagsToOperands; }
		bool hasAcquireFlagsFromUser() const { return AcquireFlagsFromUser; }

/// Clear any flags in this flag set that aren't also set in Flags.		/// Clear any flags in this flag set that aren't also set in Flags.
/// If the given Flags are undefined then don't do anything.		/// If the given Flags are undefined then don't do anything.
void intersectWith(const SDNodeFlags Flags) {		void intersectWith(const SDNodeFlags Flags) {
if (!Flags.isDefined())		if (!Flags.isDefined())
return;		return;
NoUnsignedWrap &= Flags.NoUnsignedWrap;		NoUnsignedWrap &= Flags.NoUnsignedWrap;
NoSignedWrap &= Flags.NoSignedWrap;		NoSignedWrap &= Flags.NoSignedWrap;
Exact &= Flags.Exact;		Exact &= Flags.Exact;
UnsafeAlgebra &= Flags.UnsafeAlgebra;		UnsafeAlgebra &= Flags.UnsafeAlgebra;
NoNaNs &= Flags.NoNaNs;		NoNaNs &= Flags.NoNaNs;
NoInfs &= Flags.NoInfs;		NoInfs &= Flags.NoInfs;
NoSignedZeros &= Flags.NoSignedZeros;		NoSignedZeros &= Flags.NoSignedZeros;
AllowReciprocal &= Flags.AllowReciprocal;		AllowReciprocal &= Flags.AllowReciprocal;
VectorReduction &= Flags.VectorReduction;		VectorReduction &= Flags.VectorReduction;
AllowContract &= Flags.AllowContract;		AllowContract &= Flags.AllowContract;
		AnyDefined = true;
}		}

		void mergeWith(const SDNodeFlags Flags) {
		if (!Flags.isDefined())
		return;
		NoUnsignedWrap |= Flags.NoUnsignedWrap;
		NoSignedWrap |= Flags.NoSignedWrap;
		Exact |= Flags.Exact;
		UnsafeAlgebra |= Flags.UnsafeAlgebra;
		NoNaNs |= Flags.NoNaNs;
		NoInfs |= Flags.NoInfs;
		NoSignedZeros |= Flags.NoSignedZeros;
		AllowReciprocal |= Flags.AllowReciprocal;
		VectorReduction |= Flags.VectorReduction;
		AllowContract |= Flags.AllowContract;
		AnyDefined = true;
		}

};		};
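A minimal sketch of how the patched SDNodeFlags combines flag sets. MiniFlags is a hypothetical two-flag stand-in, not the LLVM class; it mirrors intersectWith, mergeWith, and the new Commit parameter, under which each setter assigns AnyDefined = Commit (via setDefined(Commit)).

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for SDNodeFlags (two flags only),
// mirroring the semantics in this revision. Not the real LLVM class.
struct MiniFlags {
  bool AnyDefined = false;
  bool NoNaNs = false;
  bool NoInfs = false;

  // Like the patched setters: Commit decides whether the set is "defined".
  // Note this assigns AnyDefined rather than OR-ing into it.
  void setNoNaNs(bool B, bool Commit = true) {
    AnyDefined = Commit;
    NoNaNs = B;
  }
  void setNoInfs(bool B, bool Commit = true) {
    AnyDefined = Commit;
    NoInfs = B;
  }

  // Keep only the flags present in both sets; no-op if F is undefined.
  void intersectWith(const MiniFlags &F) {
    if (!F.AnyDefined)
      return;
    NoNaNs = NoNaNs && F.NoNaNs;
    NoInfs = NoInfs && F.NoInfs;
    AnyDefined = true;
  }

  // Union of both sets; no-op if F is undefined.
  void mergeWith(const MiniFlags &F) {
    if (!F.AnyDefined)
      return;
    NoNaNs = NoNaNs || F.NoNaNs;
    NoInfs = NoInfs || F.NoInfs;
    AnyDefined = true;
  }
};
```

Because each setter assigns AnyDefined = Commit, a later call with Commit = false clears the defined state set by an earlier committing call.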

/// Represents one node in the SelectionDAG.		/// Represents one node in the SelectionDAG.
///		///
class SDNode : public FoldingSetNode, public ilist_node<SDNode> {		class SDNode : public FoldingSetNode, public ilist_node<SDNode> {
private:		private:
/// The operation that this node performs.		/// The operation that this node performs.
int16_t NodeType;		int16_t NodeType;
[... 428 lines not shown ...]	public:

/// Clear any flags in this node that aren't also set in Flags.		/// Clear any flags in this node that aren't also set in Flags.
/// If Flags is not in a defined state then this has no effect.		/// If Flags is not in a defined state then this has no effect.
void intersectFlagsWith(const SDNodeFlags Flags);		void intersectFlagsWith(const SDNodeFlags Flags);

/// Return the number of values defined/returned by this operator.		/// Return the number of values defined/returned by this operator.
unsigned getNumValues() const { return NumValues; }		unsigned getNumValues() const { return NumValues; }

/// Return the type of a specified result.		/// Return the type of a specified result.
		spatel: I don't think this is going to work as you're hoping for. If possible, please split this and any related changes into a separate follow-up patch.
EVT getValueType(unsigned ResNo) const {		EVT getValueType(unsigned ResNo) const {
assert(ResNo < NumValues && "Illegal result number!");		assert(ResNo < NumValues && "Illegal result number!");
return ValueList[ResNo];		return ValueList[ResNo];
}		}

/// Return the type of a specified result as a simple type.		/// Return the type of a specified result as a simple type.
MVT getSimpleValueType(unsigned ResNo) const {		MVT getSimpleValueType(unsigned ResNo) const {
return getValueType(ResNo).getSimpleVT();		return getValueType(ResNo).getSimpleVT();
[... 1,431 lines not shown ...]

lib/CodeGen/SelectionDAG/DAGCombiner.cpp


[... 8,139 lines not shown ...]	if (DAG.getDataLayout().isBigEndian()) {
unsigned EVTStoreBits = ExtVT.getStoreSizeInBits();		unsigned EVTStoreBits = ExtVT.getStoreSizeInBits();
ShAmt = LVTStoreBits - EVTStoreBits - ShAmt;		ShAmt = LVTStoreBits - EVTStoreBits - ShAmt;
}		}

uint64_t PtrOff = ShAmt / 8;		uint64_t PtrOff = ShAmt / 8;
unsigned NewAlign = MinAlign(LN0->getAlignment(), PtrOff);		unsigned NewAlign = MinAlign(LN0->getAlignment(), PtrOff);
SDLoc DL(LN0);		SDLoc DL(LN0);
// The original load itself didn't wrap, so an offset within it doesn't.		// The original load itself didn't wrap, so an offset within it doesn't.
SDNodeFlags Flags;		SDNodeFlags Flags = LN0->getFlags();
Flags.setNoUnsignedWrap(true);
SDValue NewPtr = DAG.getNode(ISD::ADD, DL,		SDValue NewPtr = DAG.getNode(ISD::ADD, DL,
PtrType, LN0->getBasePtr(),		PtrType, LN0->getBasePtr(),
DAG.getConstant(PtrOff, DL, PtrType),		DAG.getConstant(PtrOff, DL, PtrType),
Flags);		Flags);
AddToWorklist(NewPtr.getNode());		AddToWorklist(NewPtr.getNode());

SDValue Load;		SDValue Load;
if (ExtType == ISD::NON_EXTLOAD)		if (ExtType == ISD::NON_EXTLOAD)
[... 1,403 lines not shown ...]	SDValue DAGCombiner::visitFMULForFMADistributiveCombine(SDNode *N) {
// The transforms below are incorrect when x == 0 and y == inf, because the		// The transforms below are incorrect when x == 0 and y == inf, because the
// intermediate multiplication produces a nan.		// intermediate multiplication produces a nan.
if (!Options.NoInfsFPMath)		if (!Options.NoInfsFPMath)
return SDValue();		return SDValue();

// Floating-point multiply-add without intermediate rounding.		// Floating-point multiply-add without intermediate rounding.
bool HasFMA =		bool HasFMA =
(Options.AllowFPOpFusion == FPOpFusion::Fast || Options.UnsafeFPMath) &&		(Options.AllowFPOpFusion == FPOpFusion::Fast || Options.UnsafeFPMath) &&
TLI.isFMAFasterThanFMulAndFAdd(VT) &&		TLI.isFMAFasterThanFMulAndFAdd(VT) &&
(!LegalOperations || TLI.isOperationLegalOrCustom(ISD::FMA, VT));		(!LegalOperations || TLI.isOperationLegalOrCustom(ISD::FMA, VT));

// Floating-point multiply-add with intermediate rounding. This can result		// Floating-point multiply-add with intermediate rounding. This can result
// in a less precise result due to the changed rounding order.		// in a less precise result due to the changed rounding order.
bool HasFMAD = Options.UnsafeFPMath &&		bool HasFMAD = Options.UnsafeFPMath &&
(LegalOperations && TLI.isOperationLegal(ISD::FMAD, VT));		(LegalOperations && TLI.isOperationLegal(ISD::FMAD, VT));

// No valid opcode, do not combine.		// No valid opcode, do not combine.
[... 117 lines not shown ...]	SDValue DAGCombiner::visitFADD(SDNode *N) {
if (Options.NoSignedZerosFPMath || N->getFlags().hasNoSignedZeros()) {		if (Options.NoSignedZerosFPMath || N->getFlags().hasNoSignedZeros()) {
// fold (fadd A, 0) -> A		// fold (fadd A, 0) -> A
if (ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1))		if (ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1))
if (N1C->isZero())		if (N1C->isZero())
return N0;		return N0;
}		}

// If 'unsafe math' is enabled, fold lots of things.		// If 'unsafe math' is enabled, fold lots of things.
if (Options.UnsafeFPMath) {		if (Options.UnsafeFPMath || Flags.hasUnsafeAlgebra()) {
// No FP constant should be created after legalization as Instruction		// No FP constant should be created after legalization as Instruction
// Selection pass has a hard time dealing with FP constants.		// Selection pass has a hard time dealing with FP constants.
bool AllowNewConst = (Level < AfterLegalizeDAG);		bool AllowNewConst = (Level < AfterLegalizeDAG);

// fold (fadd (fadd x, c1), c2) -> (fadd x, (fadd c1, c2))		// fold (fadd (fadd x, c1), c2) -> (fadd x, (fadd c1, c2))
if (N1CFP && N0.getOpcode() == ISD::FADD && N0.getNode()->hasOneUse() &&		if (N1CFP && N0.getOpcode() == ISD::FADD && N0.getNode()->hasOneUse() &&
isConstantFPBuildVectorOrConstantFP(N0.getOperand(1)))		isConstantFPBuildVectorOrConstantFP(N0.getOperand(1)))
return DAG.getNode(ISD::FADD, DL, VT, N0.getOperand(0),		return DAG.getNode(ISD::FADD, DL, VT, N0.getOperand(0),
[... 129 lines not shown ...]	if (N0CFP && N0CFP->isZero()) {
if (isNegatibleForFree(N1, LegalOperations, TLI, &Options))		if (isNegatibleForFree(N1, LegalOperations, TLI, &Options))
return GetNegatedExpression(N1, DAG, LegalOperations);		return GetNegatedExpression(N1, DAG, LegalOperations);
if (!LegalOperations || TLI.isOperationLegal(ISD::FNEG, VT))		if (!LegalOperations || TLI.isOperationLegal(ISD::FNEG, VT))
return DAG.getNode(ISD::FNEG, DL, VT, N1, Flags);		return DAG.getNode(ISD::FNEG, DL, VT, N1, Flags);
}		}
}		}

// If 'unsafe math' is enabled, fold lots of things.		// If 'unsafe math' is enabled, fold lots of things.
if (Options.UnsafeFPMath) {		if (Options.UnsafeFPMath || Flags.hasUnsafeAlgebra()) {
// (fsub A, 0) -> A		// (fsub A, 0) -> A
if (N1CFP && N1CFP->isZero())		if (N1CFP && N1CFP->isZero())
return N0;		return N0;

// (fsub x, x) -> 0.0		// (fsub x, x) -> 0.0
if (N0 == N1)		if (N0 == N1)
return DAG.getConstantFP(0.0f, DL, VT);		return DAG.getConstantFP(0.0f, DL, VT);

[... 48 lines not shown ...]	SDValue DAGCombiner::visitFMUL(SDNode *N) {

// fold (fmul A, 1.0) -> A		// fold (fmul A, 1.0) -> A
if (N1CFP && N1CFP->isExactlyValue(1.0))		if (N1CFP && N1CFP->isExactlyValue(1.0))
return N0;		return N0;

if (SDValue NewSel = foldBinOpIntoSelect(N))		if (SDValue NewSel = foldBinOpIntoSelect(N))
return NewSel;		return NewSel;

if (Options.UnsafeFPMath) {		if (Options.UnsafeFPMath || Flags.hasUnsafeAlgebra()) {
// fold (fmul A, 0) -> 0		// fold (fmul A, 0) -> 0
if (N1CFP && N1CFP->isZero())		if (N1CFP && N1CFP->isZero())
return N1;		return N1;

// fold (fmul (fmul x, c1), c2) -> (fmul x, (fmul c1, c2))		// fold (fmul (fmul x, c1), c2) -> (fmul x, (fmul c1, c2))
if (N0.getOpcode() == ISD::FMUL) {		if (N0.getOpcode() == ISD::FMUL) {
// Fold scalars or any vector constants (not just splats).		// Fold scalars or any vector constants (not just splats).
// This fold is done in general by InstCombine, but extra fmul insts		// This fold is done in general by InstCombine, but extra fmul insts
[... 111 lines not shown ...]	SDValue DAGCombiner::visitFMA(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
SDValue N2 = N->getOperand(2);		SDValue N2 = N->getOperand(2);
ConstantFPSDNode *N0CFP = dyn_cast<ConstantFPSDNode>(N0);		ConstantFPSDNode *N0CFP = dyn_cast<ConstantFPSDNode>(N0);
ConstantFPSDNode *N1CFP = dyn_cast<ConstantFPSDNode>(N1);		ConstantFPSDNode *N1CFP = dyn_cast<ConstantFPSDNode>(N1);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDLoc DL(N);		SDLoc DL(N);
const TargetOptions &Options = DAG.getTarget().Options;		const TargetOptions &Options = DAG.getTarget().Options;

// Constant fold FMA.		// Constant fold FMA.
if (isa<ConstantFPSDNode>(N0) &&		if (isa<ConstantFPSDNode>(N0) &&
isa<ConstantFPSDNode>(N1) &&		isa<ConstantFPSDNode>(N1) &&
isa<ConstantFPSDNode>(N2)) {		isa<ConstantFPSDNode>(N2)) {
return DAG.getNode(ISD::FMA, DL, VT, N0, N1, N2);		return DAG.getNode(ISD::FMA, DL, VT, N0, N1, N2);
}		}

if (Options.UnsafeFPMath) {		SDNodeFlags Flags = N->getFlags();
		bool UnsafeFPMath = Options.UnsafeFPMath || Flags.hasUnsafeAlgebra();
		if (UnsafeFPMath) {
if (N0CFP && N0CFP->isZero())		if (N0CFP && N0CFP->isZero())
return N2;		return N2;
if (N1CFP && N1CFP->isZero())		if (N1CFP && N1CFP->isZero())
return N2;		return N2;
}		}
// TODO: The FMA node should have flags that propagate to these nodes.		// TODO: The FMA node should have flags that propagate to these nodes.
if (N0CFP && N0CFP->isExactlyValue(1.0))		if (N0CFP && N0CFP->isExactlyValue(1.0))
return DAG.getNode(ISD::FADD, SDLoc(N), VT, N1, N2);		return DAG.getNode(ISD::FADD, SDLoc(N), VT, N1, N2);
if (N1CFP && N1CFP->isExactlyValue(1.0))		if (N1CFP && N1CFP->isExactlyValue(1.0))
return DAG.getNode(ISD::FADD, SDLoc(N), VT, N0, N2);		return DAG.getNode(ISD::FADD, SDLoc(N), VT, N0, N2);

// Canonicalize (fma c, x, y) -> (fma x, c, y)		// Canonicalize (fma c, x, y) -> (fma x, c, y)
if (isConstantFPBuildVectorOrConstantFP(N0) &&		if (isConstantFPBuildVectorOrConstantFP(N0) &&
!isConstantFPBuildVectorOrConstantFP(N1))		!isConstantFPBuildVectorOrConstantFP(N1))
return DAG.getNode(ISD::FMA, SDLoc(N), VT, N1, N0, N2);		return DAG.getNode(ISD::FMA, SDLoc(N), VT, N1, N0, N2);

// TODO: FMA nodes should have flags that propagate to the created nodes.		if (UnsafeFPMath) {
// For now, create a Flags object for use with all unsafe math transforms.
SDNodeFlags Flags;
Flags.setUnsafeAlgebra(true);

if (Options.UnsafeFPMath) {
// (fma x, c1, (fmul x, c2)) -> (fmul x, c1+c2)		// (fma x, c1, (fmul x, c2)) -> (fmul x, c1+c2)
if (N2.getOpcode() == ISD::FMUL && N0 == N2.getOperand(0) &&		if (N2.getOpcode() == ISD::FMUL && N0 == N2.getOperand(0) &&
isConstantFPBuildVectorOrConstantFP(N1) &&		isConstantFPBuildVectorOrConstantFP(N1) &&
isConstantFPBuildVectorOrConstantFP(N2.getOperand(1))) {		isConstantFPBuildVectorOrConstantFP(N2.getOperand(1))) {
return DAG.getNode(ISD::FMUL, DL, VT, N0,		return DAG.getNode(ISD::FMUL, DL, VT, N0,
DAG.getNode(ISD::FADD, DL, VT, N1, N2.getOperand(1),		DAG.getNode(ISD::FADD, DL, VT, N1, N2.getOperand(1),
Flags), Flags);		Flags), Flags);
}		}
[... 21 lines not shown ...]	if (N1CFP->isExactlyValue(-1.0) &&
(!LegalOperations || TLI.isOperationLegal(ISD::FNEG, VT))) {		(!LegalOperations || TLI.isOperationLegal(ISD::FNEG, VT))) {
SDValue RHSNeg = DAG.getNode(ISD::FNEG, DL, VT, N0);		SDValue RHSNeg = DAG.getNode(ISD::FNEG, DL, VT, N0);
AddToWorklist(RHSNeg.getNode());		AddToWorklist(RHSNeg.getNode());
// TODO: The FMA node should have flags that propagate to this node.		// TODO: The FMA node should have flags that propagate to this node.
return DAG.getNode(ISD::FADD, DL, VT, N2, RHSNeg);		return DAG.getNode(ISD::FADD, DL, VT, N2, RHSNeg);
}		}
}		}

if (Options.UnsafeFPMath) {		if (UnsafeFPMath) {
// (fma x, c, x) -> (fmul x, (c+1))		// (fma x, c, x) -> (fmul x, (c+1))
if (N1CFP && N0 == N2) {		if (N1CFP && N0 == N2) {
return DAG.getNode(ISD::FMUL, DL, VT, N0,		return DAG.getNode(ISD::FMUL, DL, VT, N0,
DAG.getNode(ISD::FADD, DL, VT, N1,		DAG.getNode(ISD::FADD, DL, VT, N1,
DAG.getConstantFP(1.0, DL, VT), Flags),		DAG.getConstantFP(1.0, DL, VT), Flags),
Flags);		Flags);
}		}

[... 90 lines not shown ...]	SDValue DAGCombiner::visitFDIV(SDNode *N) {

// fold (fdiv c1, c2) -> c1/c2		// fold (fdiv c1, c2) -> c1/c2
if (N0CFP && N1CFP)		if (N0CFP && N1CFP)
return DAG.getNode(ISD::FDIV, SDLoc(N), VT, N0, N1, Flags);		return DAG.getNode(ISD::FDIV, SDLoc(N), VT, N0, N1, Flags);

if (SDValue NewSel = foldBinOpIntoSelect(N))		if (SDValue NewSel = foldBinOpIntoSelect(N))
return NewSel;		return NewSel;

if (Options.UnsafeFPMath) {		bool UnsafeFPMath = Options.UnsafeFPMath || Flags.hasUnsafeAlgebra();
		if (UnsafeFPMath) {
// fold (fdiv X, c2) -> fmul X, 1/c2 if losing precision is acceptable.		// fold (fdiv X, c2) -> fmul X, 1/c2 if losing precision is acceptable.
if (N1CFP) {		if (N1CFP) {
// Compute the reciprocal 1.0 / c2.		// Compute the reciprocal 1.0 / c2.
const APFloat &N1APF = N1CFP->getValueAPF();		const APFloat &N1APF = N1CFP->getValueAPF();
APFloat Recip(N1APF.getSemantics(), 1); // 1.0		APFloat Recip(N1APF.getSemantics(), 1); // 1.0
APFloat::opStatus st = Recip.divide(N1APF, APFloat::rmNearestTiesToEven);		APFloat::opStatus st = Recip.divide(N1APF, APFloat::rmNearestTiesToEven);
// Only do the transform if the reciprocal is a legal fp immediate that		// Only do the transform if the reciprocal is a legal fp immediate that
// isn't too nasty (eg NaN, denormal, ...).		// isn't too nasty (eg NaN, denormal, ...).
[... 99 lines not shown ...]
SDValue DAGCombiner::visitFSQRT(SDNode *N) {		SDValue DAGCombiner::visitFSQRT(SDNode *N) {
if (!DAG.getTarget().Options.UnsafeFPMath)		if (!DAG.getTarget().Options.UnsafeFPMath)
return SDValue();		return SDValue();

SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
if (TLI.isFsqrtCheap(N0, DAG))		if (TLI.isFsqrtCheap(N0, DAG))
return SDValue();		return SDValue();

// TODO: FSQRT nodes should have flags that propagate to the created nodes.		return buildSqrtEstimate(N0, N->getFlags());
// For now, create a Flags object for use with all unsafe math transforms.
SDNodeFlags Flags;
Flags.setUnsafeAlgebra(true);
return buildSqrtEstimate(N0, Flags);
}		}

/// copysign(x, fp_extend(y)) -> copysign(x, y)		/// copysign(x, fp_extend(y)) -> copysign(x, y)
/// copysign(x, fp_round(y)) -> copysign(x, y)		/// copysign(x, fp_round(y)) -> copysign(x, y)
static inline bool CanCombineFCOPYSIGN_EXTEND_ROUND(SDNode *N) {		static inline bool CanCombineFCOPYSIGN_EXTEND_ROUND(SDNode *N) {
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
if ((N1.getOpcode() == ISD::FP_EXTEND ||		if ((N1.getOpcode() == ISD::FP_EXTEND ||
N1.getOpcode() == ISD::FP_ROUND)) {		N1.getOpcode() == ISD::FP_ROUND)) {
[... 4,043 lines not shown ...]	for (unsigned i = 0; i != NumElems; ++i) {

unsigned ExtIndex = N->getOperand(i).getConstantOperandVal(1);		unsigned ExtIndex = N->getOperand(i).getConstantOperandVal(1);
if (VectorMask[i] == (int)LeftIdx) {		if (VectorMask[i] == (int)LeftIdx) {
Mask[i] = ExtIndex;		Mask[i] = ExtIndex;
} else if (VectorMask[i] == (int)LeftIdx + 1) {		} else if (VectorMask[i] == (int)LeftIdx + 1) {
Mask[i] = Vec2Offset + ExtIndex;		Mask[i] = Vec2Offset + ExtIndex;
}		}
}		}

// The type the input vectors may have changed above.		// The type the input vectors may have changed above.
InVT1 = VecIn1.getValueType();		InVT1 = VecIn1.getValueType();

// If we already have a VecIn2, it should have the same type as VecIn1.		// If we already have a VecIn2, it should have the same type as VecIn1.
// If we don't, get an undef/zero vector of the appropriate type.		// If we don't, get an undef/zero vector of the appropriate type.
VecIn2 = VecIn2.getNode() ? VecIn2 : DAG.getUNDEF(InVT1);		VecIn2 = VecIn2.getNode() ? VecIn2 : DAG.getUNDEF(InVT1);
assert(InVT1 == VecIn2.getValueType() && "Unexpected second input type.");		assert(InVT1 == VecIn2.getValueType() && "Unexpected second input type.");

[... 3,093 lines not shown ...]
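Several of the DAGCombiner hunks above (visitFADD, visitFSUB, visitFMUL, visitFMA, visitFDIV) follow one pattern: a fold that is only legal under unsafe math now fires when either the global TargetOptions::UnsafeFPMath is set or the node itself carries the unsafe-algebra flag. A minimal sketch of that gating, assuming hypothetical Options and NodeFlags structs and a toy (fsub x, x) -> 0.0 fold in which operands are identified by integer ids:

```cpp
#include <cassert>

// Hypothetical stand-ins for TargetOptions and SDNodeFlags; only the
// fields needed to show the gating logic are modeled.
struct Options { bool UnsafeFPMath = false; };
struct NodeFlags { bool UnsafeAlgebra = false; };

// Unsafe folds may fire if enabled globally OR on this particular node.
bool unsafeFoldsAllowed(const Options &Opts, const NodeFlags &Flags) {
  return Opts.UnsafeFPMath || Flags.UnsafeAlgebra;
}

// Toy (fsub x, x) -> 0.0. Unsafe in strict IEEE mode because x - x is NaN
// when x is inf or NaN. Returns true and sets Result if the fold applies.
bool foldFSubSelf(int LHSId, int RHSId, const Options &Opts,
                  const NodeFlags &Flags, double &Result) {
  if (!unsafeFoldsAllowed(Opts, Flags) || LHSId != RHSId)
    return false;
  Result = 0.0;
  return true;
}
```

With this shape, a module compiled without global fast-math can still get the fold on individual instructions that carry the flag, which is the functional difference spatel mentions.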

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

[... 655 lines not shown ...]	public:
SDValue getCopyFromRegs(const Value *V, Type *Ty);		SDValue getCopyFromRegs(const Value *V, Type *Ty);

// resolveDanglingDebugInfo - if we saw an earlier dbg_value referring to V,		// resolveDanglingDebugInfo - if we saw an earlier dbg_value referring to V,
// generate the debug data structures now that we've seen its definition.		// generate the debug data structures now that we've seen its definition.
void resolveDanglingDebugInfo(const Value *V, SDValue Val);		void resolveDanglingDebugInfo(const Value *V, SDValue Val);
SDValue getValue(const Value *V);		SDValue getValue(const Value *V);
bool findValue(const Value *V) const;		bool findValue(const Value *V) const;

		// Returns DAG node of SDValue present in NodeMap for
		hfinkel: returns -> Returns
		// a given Value.
		SDNode *getDAGNode(const Value *V);
		spatel: Formatting/spacing is non-standard here and below. Run clang-format?
		RKSimon: is this needed in this patch?
		spatel: Same question as earlier (I don't think it was answered). Can we use the existing SelectionDAGBuilder::getValue() to get to the node's flags?
		spatel: Asking for the 3rd time: is this necessary?
		jbhateja: SelectionDAGBuilder::getValue() creates a new value and puts it into NodeMap if one does not exist, whereas SelectionDAGBuilder::getDAGNode() only checks NodeMap and returns a DAG node if one exists.

SDValue getNonRegisterValue(const Value *V);		SDValue getNonRegisterValue(const Value *V);
SDValue getValueImpl(const Value *V);		SDValue getValueImpl(const Value *V);

void setValue(const Value *V, SDValue NewN) {		void setValue(const Value *V, SDValue NewN) {
SDValue &N = NodeMap[V];		SDValue &N = NodeMap[V];
assert(!N.getNode() && "Already set a value for this node!");		assert(!N.getNode() && "Already set a value for this node!");
N = NewN;		N = NewN;
}		}
[... 360 lines not shown ...]
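jbhateja's reply turns on a map-lookup distinction: getValue() binds a reference through NodeMap[V], which inserts an entry when V is absent (and then builds a node for it), whereas getDAGNode() only queries. A minimal sketch of that difference, using std::unordered_map as a stand-in for NodeMap (in LLVM it is a DenseMap, assumed here to behave the same way for this purpose); FakeNode, FakeNodeMap and both lookup functions are hypothetical names:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins: keys play the role of IR Values, and a non-null
// pointer plays the role of the SDNode held by an SDValue.
struct FakeNode { int Id; };
using FakeNodeMap = std::unordered_map<std::string, FakeNode *>;

// Like the NodeMap[V] access in getValue(): operator[] default-constructs
// (here: nullptr) and inserts an entry when the key is missing.
FakeNode *lookupInserting(FakeNodeMap &Map, const std::string &V) {
  return Map[V];
}

// Like getDAGNode(): find() never inserts, and one lookup both tests for
// presence and yields the mapped node.
FakeNode *lookupNonInserting(const FakeNodeMap &Map, const std::string &V) {
  auto It = Map.find(V);
  return It == Map.end() ? nullptr : It->second;
}
```

The single find() also avoids the second hash lookup that a find()-then-operator[] sequence performs.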

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp


[... 78 lines not shown ...]
static unsigned LimitFloatPrecision;		static unsigned LimitFloatPrecision;

static cl::opt<unsigned, true>		static cl::opt<unsigned, true>
LimitFPPrecision("limit-float-precision",		LimitFPPrecision("limit-float-precision",
cl::desc("Generate low-precision inline sequences "		cl::desc("Generate low-precision inline sequences "
"for some float libcalls"),		"for some float libcalls"),
cl::location(LimitFloatPrecision),		cl::location(LimitFloatPrecision),
cl::init(0));		cl::init(0));

		static bool isVectorReductionOp(const User *I);

		/// This class propagates flags from an IR Instruction to the SDNode
		/// created for it, so that later DAG phases can query them on the node.
		/// Propagation happens once the DAG node has been created. Flags already
		/// applied to the node during the DAG-building phase are combined with
		/// the Instruction's flags: if the node's flags are not yet defined, the
		/// Instruction's flags are merged in; otherwise the two sets are
		/// intersected. Because a DAG node may be shared by multiple
		/// Instructions, the flags it holds are the intersection of the flags
		/// contributed by each Instruction.
		spatel: Formatting/spacing is non-standard here and below. Run clang-format?
		hfinkel: shared b/w -> shared by
		class SDNodeFlagsAcquirer {
		public:
		SDNodeFlagsAcquirer(const Instruction *I, SelectionDAGBuilder *SDB)
		spatel: shorten: if (SDNode *Node = SelDB->getDAGNode(Instr)) {
		: Instr(I), SelDB(SDB) {}

		~SDNodeFlagsAcquirer() {
		SDNode *Node = SelDB->getDAGNode(Instr);
		if (Node) {
		SDNodeFlags InstrFlags;
		SDNodeFlags Flags = Node->getFlags();
		bool PropFlagsToOperands = Flags.hasPropagateFlagsToOperands();

		if (isa<FPMathOperator>(*Instr)) {
		InstrFlags.setNoNaNs(Instr->hasNoNaNs());
		InstrFlags.setNoInfs(Instr->hasNoInfs());
		RKSimon: auto *OFBinOp
		InstrFlags.setUnsafeAlgebra(Instr->hasUnsafeAlgebra());
		InstrFlags.setNoSignedZeros(Instr->hasNoSignedZeros());
		InstrFlags.setAllowContract(Instr->hasAllowContract());
		InstrFlags.setAllowReciprocal(Instr->hasAllowReciprocal());
		}

		RKSimon: auto *ExactOp
		if (auto *OFBinOp = dyn_cast<const OverflowingBinaryOperator>(Instr)) {
		InstrFlags.setNoSignedWrap(OFBinOp->hasNoSignedWrap());
		InstrFlags.setNoUnsignedWrap(OFBinOp->hasNoUnsignedWrap());
		}

		hfinkel: I'm a bit worried about propagating the integer flags automatically this way. Maybe this is fine in practice, but if we're adding some kind of implicit contract here, we should clearly document it. An operation that is exact, or does not overflow in some way, could be implemented in terms of operations that do (and, then, having the flags on those intermediate nodes wouldn't be correct).
		if (auto *ExactOp = dyn_cast<const PossiblyExactOperator>(Instr))
		InstrFlags.setExact(ExactOp->isExact());

		if (isVectorReductionOp(Instr))
		InstrFlags.setVectorReduction(true);

		Flags.setAcquireFlagsFromUser(false);
		Flags.setPropagateFlagsToOperands(false);

		if (!Flags.isDefined())
		Flags.mergeWith(InstrFlags);
		else
		Flags.intersectWith(InstrFlags);

		Node->setFlags(Flags);
		if (PropFlagsToOperands)
		std::for_each(Node->op_begin(), Node->op_end(),
		[&](const SDValue &Val) {
		if (Val.getNode()->getFlags().hasAcquireFlagsFromUser())
		Val.getNode()->setFlags(Node->getFlags());
		});
		}
		}

		// This function sets the Propagation bit over Parent DAG Node
		// and Acquire bit over Operand DAG node[s] which inherits the
		// flags from its parent.
		static void PropagateFlagsToOperands(SDValue &Parent,
		ArrayRef<SDValue> Operands) {
		SDNodeFlags PFlags = Parent.getNode()->getFlags();
		PFlags.setPropagateFlagsToOperands(true);
		Parent.getNode()->setFlags(PFlags);

		SDNodeFlags CFlags;
		CFlags.setAcquireFlagsFromUser(true);
		for (auto &Val : Operands)
		Val.getNode()->setFlags(CFlags);
		}

		private:
		const Instruction *Instr;
		SelectionDAGBuilder *SelDB;
		};
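A toy model of the SDNodeFlagsAcquirer scheme, which may help with the algorithm description hfinkel asks for. All names are hypothetical and a single Fast bit stands in for the full flag set. An RAII object is created at the start of visiting an instruction (as in SelectionDAGBuilder::visit()); its destructor looks up the node built for that instruction, combines flags (merge when the node's flags are undefined, intersect otherwise), clears the two bookkeeping bits, and, if the parent was marked PropagateToOps, copies the result to operands marked AcquireFromUser. Unlike the patch, markForPropagation here sets only the acquire bit instead of overwriting the operand's flags.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical simplified flags: Fast stands in for the fast-math bits.
struct ToyFlags {
  bool Defined = false;
  bool Fast = false;
  bool PropagateToOps = false;  // like PropagateFlagsToOperands
  bool AcquireFromUser = false; // like AcquireFlagsFromUser
};

struct ToyNode {
  ToyFlags Flags;
  std::vector<ToyNode *> Ops;
};

struct ToyInstr { bool Fast; };

// Stands in for SelectionDAGBuilder's NodeMap.
using ToyNodeMap = std::map<const ToyInstr *, ToyNode *>;

class ToyFlagsAcquirer {
public:
  ToyFlagsAcquirer(const ToyInstr *I, ToyNodeMap *M) : Instr(I), Map(M) {}

  // Runs after the node for Instr has (possibly) been built.
  ~ToyFlagsAcquirer() {
    auto It = Map->find(Instr);
    if (It == Map->end())
      return; // no node was built for this instruction
    ToyNode *Node = It->second;
    ToyFlags F = Node->Flags;
    bool Propagate = F.PropagateToOps;
    F.PropagateToOps = F.AcquireFromUser = false;
    // Undefined node flags: merge. Already defined (shared node): intersect.
    F.Fast = F.Defined ? (F.Fast && Instr->Fast) : (F.Fast || Instr->Fast);
    F.Defined = true;
    Node->Flags = F;
    if (Propagate)
      for (ToyNode *Op : Node->Ops)
        if (Op->Flags.AcquireFromUser)
          Op->Flags = Node->Flags;
  }

  // Mark Parent as propagating and each operand as acquiring.
  static void markForPropagation(ToyNode *Parent,
                                 const std::vector<ToyNode *> &Ops) {
    Parent->Flags.PropagateToOps = true;
    for (ToyNode *Op : Ops)
      Op->Flags.AcquireFromUser = true;
  }

private:
  const ToyInstr *Instr;
  ToyNodeMap *Map;
};
```

If a second instruction without the Fast bit maps to the same node, the intersection step clears Fast on that node, which is the sharing case the class comment describes.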

// Limit the width of DAG chains. This is important in general to prevent		// Limit the width of DAG chains. This is important in general to prevent
// DAG-based analysis from blowing up. For example, alias analysis and		// DAG-based analysis from blowing up. For example, alias analysis and
// load clustering may not complete in reasonable time. It is difficult to		// load clustering may not complete in reasonable time. It is difficult to
// recognize and avoid this situation within each individual analysis, and		// recognize and avoid this situation within each individual analysis, and
// future analyses are likely to have the same behavior. Limiting DAG width is		// future analyses are likely to have the same behavior. Limiting DAG width is
// the safe approach and will be especially important with global DAGs.		// the safe approach and will be especially important with global DAGs.
//		//
// MaxParallelChains default is arbitrarily high to avoid affecting		// MaxParallelChains default is arbitrarily high to avoid affecting
[... 877 lines not shown ...]	SDValue SelectionDAGBuilder::getControlRoot() {
Root = DAG.getNode(ISD::TokenFactor, getCurSDLoc(), MVT::Other,		Root = DAG.getNode(ISD::TokenFactor, getCurSDLoc(), MVT::Other,
PendingExports);		PendingExports);
PendingExports.clear();		PendingExports.clear();
DAG.setRoot(Root);		DAG.setRoot(Root);
return Root;		return Root;
}		}

void SelectionDAGBuilder::visit(const Instruction &I) {		void SelectionDAGBuilder::visit(const Instruction &I) {
		SDNodeFlagsAcquirer Flags(&I, this);

// Set up outgoing PHI node register values before emitting the terminator.		// Set up outgoing PHI node register values before emitting the terminator.
if (isa<TerminatorInst>(&I)) {		if (isa<TerminatorInst>(&I)) {
HandlePHINodesInSuccessorBlocks(I.getParent());		HandlePHINodesInSuccessorBlocks(I.getParent());
}		}

// Increase the SDNodeOrder if dealing with a non-debug instruction.		// Increase the SDNodeOrder if dealing with a non-debug instruction.
if (!isa<DbgInfoIntrinsic>(I))		if (!isa<DbgInfoIntrinsic>(I))
++SDNodeOrder;		++SDNodeOrder;
[... 65 lines not shown ...]	if (It != FuncInfo.ValueMap.end()) {
Result = RFV.getCopyFromRegs(DAG, FuncInfo, getCurSDLoc(), Chain, nullptr,		Result = RFV.getCopyFromRegs(DAG, FuncInfo, getCurSDLoc(), Chain, nullptr,
V);		V);
resolveDanglingDebugInfo(V, Result);		resolveDanglingDebugInfo(V, Result);
}		}

return Result;		return Result;
}		}

		SDNode *SelectionDAGBuilder::getDAGNode(const Value *V) {
		if (NodeMap.find(V) == NodeMap.end())
		return nullptr;
		return NodeMap[V].getNode();
		}

/// getValue - Return an SDValue for the given Value.		/// getValue - Return an SDValue for the given Value.
SDValue SelectionDAGBuilder::getValue(const Value *V) {		SDValue SelectionDAGBuilder::getValue(const Value *V) {
// If we already have an SDValue for this value, use it. It's important		// If we already have an SDValue for this value, use it. It's important
// to do this first, so that we don't create a CopyFromReg if we already		// to do this first, so that we don't create a CopyFromReg if we already
// have a regular SDValue.		// have a regular SDValue.
SDValue &N = NodeMap[V];		SDValue &N = NodeMap[V];
if (N.getNode()) return N;		if (N.getNode()) return N;

[349 unchanged lines not shown] if (!FuncInfo.CanLowerReturn) {
SmallVector<EVT, 4> ValueVTs;		SmallVector<EVT, 4> ValueVTs;
SmallVector<uint64_t, 4> Offsets;		SmallVector<uint64_t, 4> Offsets;
ComputeValueVTs(TLI, DL, I.getOperand(0)->getType(), ValueVTs, &Offsets);		ComputeValueVTs(TLI, DL, I.getOperand(0)->getType(), ValueVTs, &Offsets);
unsigned NumValues = ValueVTs.size();		unsigned NumValues = ValueVTs.size();

// An aggregate return value cannot wrap around the address space, so		// An aggregate return value cannot wrap around the address space, so
// offsets to its parts don't wrap either.		// offsets to its parts don't wrap either.
SDNodeFlags Flags;		SDNodeFlags Flags;
Flags.setNoUnsignedWrap(true);		Flags.setNoUnsignedWrap(true, false);

SmallVector<SDValue, 4> Chains(NumValues);		SmallVector<SDValue, 4> Chains(NumValues);
for (unsigned i = 0; i != NumValues; ++i) {		for (unsigned i = 0; i != NumValues; ++i) {
SDValue Add = DAG.getNode(ISD::ADD, getCurSDLoc(),		SDValue Add = DAG.getNode(ISD::ADD, getCurSDLoc(),
RetPtr.getValueType(), RetPtr,		RetPtr.getValueType(), RetPtr,
DAG.getIntPtrConstant(Offsets[i],		DAG.getIntPtrConstant(Offsets[i],
getCurSDLoc()),		getCurSDLoc()),
Flags);		Flags);
[1,197 unchanged lines not shown] static bool isVectorReductionOp(const User *I) {
}		}
return ReduxExtracted;		return ReduxExtracted;
}		}

void SelectionDAGBuilder::visitBinary(const User &I, unsigned OpCode) {		void SelectionDAGBuilder::visitBinary(const User &I, unsigned OpCode) {
SDValue Op1 = getValue(I.getOperand(0));		SDValue Op1 = getValue(I.getOperand(0));
SDValue Op2 = getValue(I.getOperand(1));		SDValue Op2 = getValue(I.getOperand(1));

bool nuw = false;		if (isVectorReductionOp(&I))
bool nsw = false;
bool exact = false;
bool vec_redux = false;
FastMathFlags FMF;

if (const OverflowingBinaryOperator *OFBinOp =
dyn_cast<const OverflowingBinaryOperator>(&I)) {
nuw = OFBinOp->hasNoUnsignedWrap();
nsw = OFBinOp->hasNoSignedWrap();
}
if (const PossiblyExactOperator *ExactOp =
dyn_cast<const PossiblyExactOperator>(&I))
exact = ExactOp->isExact();
if (const FPMathOperator *FPOp = dyn_cast<const FPMathOperator>(&I))
FMF = FPOp->getFastMathFlags();

if (isVectorReductionOp(&I)) {
vec_redux = true;
DEBUG(dbgs() << "Detected a reduction operation:" << I << "\n");		DEBUG(dbgs() << "Detected a reduction operation:" << I << "\n");
}

SDNodeFlags Flags;
Flags.setExact(exact);
Flags.setNoSignedWrap(nsw);
Flags.setNoUnsignedWrap(nuw);
Flags.setVectorReduction(vec_redux);
Flags.setAllowReciprocal(FMF.allowReciprocal());
Flags.setAllowContract(FMF.allowContract());
Flags.setNoInfs(FMF.noInfs());
Flags.setNoNaNs(FMF.noNaNs());
Flags.setNoSignedZeros(FMF.noSignedZeros());
Flags.setUnsafeAlgebra(FMF.unsafeAlgebra());

SDValue BinNodeValue = DAG.getNode(OpCode, getCurSDLoc(), Op1.getValueType(),		SDValue BinNodeValue = DAG.getNode(OpCode, getCurSDLoc(), Op1.getValueType(),
Op1, Op2, Flags);		Op1, Op2);
setValue(&I, BinNodeValue);		setValue(&I, BinNodeValue);
}		}

void SelectionDAGBuilder::visitShift(const User &I, unsigned Opcode) {		void SelectionDAGBuilder::visitShift(const User &I, unsigned Opcode) {
SDValue Op1 = getValue(I.getOperand(0));		SDValue Op1 = getValue(I.getOperand(0));
SDValue Op2 = getValue(I.getOperand(1));		SDValue Op2 = getValue(I.getOperand(1));

EVT ShiftTy = DAG.getTargetLoweringInfo().getShiftAmountTy(		EVT ShiftTy = DAG.getTargetLoweringInfo().getShiftAmountTy(
[15 unchanged lines not shown] if (!I.getType()->isVectorTy() && Op2.getValueType() != ShiftTy) {
// optimization early.		// optimization early.
else if (ShiftSize >= Log2_32_Ceil(Op2.getValueSizeInBits()))		else if (ShiftSize >= Log2_32_Ceil(Op2.getValueSizeInBits()))
Op2 = DAG.getNode(ISD::TRUNCATE, DL, ShiftTy, Op2);		Op2 = DAG.getNode(ISD::TRUNCATE, DL, ShiftTy, Op2);
// Otherwise we'll need to temporarily settle for some other convenient		// Otherwise we'll need to temporarily settle for some other convenient
// type. Type legalization will make adjustments once the shiftee is split.		// type. Type legalization will make adjustments once the shiftee is split.
else		else
Op2 = DAG.getZExtOrTrunc(Op2, DL, MVT::i32);		Op2 = DAG.getZExtOrTrunc(Op2, DL, MVT::i32);
}		}

bool nuw = false;		SDValue Res = DAG.getNode(Opcode, getCurSDLoc(), Op1.getValueType(), Op1, Op2);
		spatel (inline comment): Delete?
bool nsw = false;
bool exact = false;

if (Opcode == ISD::SRL || Opcode == ISD::SRA || Opcode == ISD::SHL) {

if (const OverflowingBinaryOperator *OFBinOp =
dyn_cast<const OverflowingBinaryOperator>(&I)) {
nuw = OFBinOp->hasNoUnsignedWrap();
nsw = OFBinOp->hasNoSignedWrap();
}
if (const PossiblyExactOperator *ExactOp =
dyn_cast<const PossiblyExactOperator>(&I))
exact = ExactOp->isExact();
}
SDNodeFlags Flags;
Flags.setExact(exact);
Flags.setNoSignedWrap(nsw);
Flags.setNoUnsignedWrap(nuw);
SDValue Res = DAG.getNode(Opcode, getCurSDLoc(), Op1.getValueType(), Op1, Op2,
Flags);
setValue(&I, Res);		setValue(&I, Res);
}		}

void SelectionDAGBuilder::visitSDiv(const User &I) {		void SelectionDAGBuilder::visitSDiv(const User &I) {
SDValue Op1 = getValue(I.getOperand(0));		SDValue Op1 = getValue(I.getOperand(0));
SDValue Op2 = getValue(I.getOperand(1));		SDValue Op2 = getValue(I.getOperand(1));

SDNodeFlags Flags;		setValue(&I, DAG.getNode(ISD::SDIV, getCurSDLoc(), Op1.getValueType(), Op1, Op2));
		spatel (inline comment): Delete?
Flags.setExact(isa<PossiblyExactOperator>(&I) &&
cast<PossiblyExactOperator>(&I)->isExact());
setValue(&I, DAG.getNode(ISD::SDIV, getCurSDLoc(), Op1.getValueType(), Op1,
Op2, Flags));
}		}

void SelectionDAGBuilder::visitICmp(const User &I) {		void SelectionDAGBuilder::visitICmp(const User &I) {
ICmpInst::Predicate predicate = ICmpInst::BAD_ICMP_PREDICATE;		ICmpInst::Predicate predicate = ICmpInst::BAD_ICMP_PREDICATE;
if (const ICmpInst *IC = dyn_cast<ICmpInst>(&I))		if (const ICmpInst *IC = dyn_cast<ICmpInst>(&I))
predicate = IC->getPredicate();		predicate = IC->getPredicate();
else if (const ConstantExpr *IC = dyn_cast<ConstantExpr>(&I))		else if (const ConstantExpr *IC = dyn_cast<ConstantExpr>(&I))
predicate = ICmpInst::Predicate(IC->getPredicate());		predicate = ICmpInst::Predicate(IC->getPredicate());
[604 unchanged lines not shown] if (StructType *StTy = GTI.getStructTypeOrNull()) {
if (Field) {		if (Field) {
// N = N + Offset		// N = N + Offset
uint64_t Offset = DL->getStructLayout(StTy)->getElementOffset(Field);		uint64_t Offset = DL->getStructLayout(StTy)->getElementOffset(Field);

// In an inbounds GEP with an offset that is nonnegative even when		// In an inbounds GEP with an offset that is nonnegative even when
// interpreted as signed, assume there is no unsigned overflow.		// interpreted as signed, assume there is no unsigned overflow.
SDNodeFlags Flags;		SDNodeFlags Flags;
if (int64_t(Offset) >= 0 && cast<GEPOperator>(I).isInBounds())		if (int64_t(Offset) >= 0 && cast<GEPOperator>(I).isInBounds())
Flags.setNoUnsignedWrap(true);		Flags.setNoUnsignedWrap(true, false);

N = DAG.getNode(ISD::ADD, dl, N.getValueType(), N,		N = DAG.getNode(ISD::ADD, dl, N.getValueType(), N,
DAG.getConstant(Offset, dl, N.getValueType()), Flags);		DAG.getConstant(Offset, dl, N.getValueType()), Flags);
}		}
} else {		} else {
MVT PtrTy =		MVT PtrTy =
DAG.getTargetLoweringInfo().getPointerTy(DAG.getDataLayout(), AS);		DAG.getTargetLoweringInfo().getPointerTy(DAG.getDataLayout(), AS);
unsigned PtrSize = PtrTy.getSizeInBits();		unsigned PtrSize = PtrTy.getSizeInBits();
[14 unchanged lines not shown] if (StructType *StTy = GTI.getStructTypeOrNull()) {
SDValue OffsVal = VectorWidth ?		SDValue OffsVal = VectorWidth ?
DAG.getConstant(Offs, dl, EVT::getVectorVT(Context, PtrTy, VectorWidth)) :		DAG.getConstant(Offs, dl, EVT::getVectorVT(Context, PtrTy, VectorWidth)) :
DAG.getConstant(Offs, dl, PtrTy);		DAG.getConstant(Offs, dl, PtrTy);

// In an inbouds GEP with an offset that is nonnegative even when		// In an inbouds GEP with an offset that is nonnegative even when
// interpreted as signed, assume there is no unsigned overflow.		// interpreted as signed, assume there is no unsigned overflow.
SDNodeFlags Flags;		SDNodeFlags Flags;
if (Offs.isNonNegative() && cast<GEPOperator>(I).isInBounds())		if (Offs.isNonNegative() && cast<GEPOperator>(I).isInBounds())
Flags.setNoUnsignedWrap(true);		Flags.setNoUnsignedWrap(true, false);

N = DAG.getNode(ISD::ADD, dl, N.getValueType(), N, OffsVal, Flags);		N = DAG.getNode(ISD::ADD, dl, N.getValueType(), N, OffsVal, Flags);
continue;		continue;
}		}

// N = N + Idx * ElementSize;		// N = N + Idx * ElementSize;
SDValue IdxN = getValue(Idx);		SDValue IdxN = getValue(Idx);

[60 unchanged lines not shown] unsigned StackAlign =
DAG.getSubtarget().getFrameLowering()->getStackAlignment();		DAG.getSubtarget().getFrameLowering()->getStackAlignment();
if (Align <= StackAlign)		if (Align <= StackAlign)
Align = 0;		Align = 0;

// Round the size of the allocation up to the stack alignment size		// Round the size of the allocation up to the stack alignment size
// by add SA-1 to the size. This doesn't overflow because we're computing		// by add SA-1 to the size. This doesn't overflow because we're computing
// an address inside an alloca.		// an address inside an alloca.
SDNodeFlags Flags;		SDNodeFlags Flags;
Flags.setNoUnsignedWrap(true);		Flags.setNoUnsignedWrap(true, false);
AllocSize = DAG.getNode(ISD::ADD, dl,		AllocSize = DAG.getNode(ISD::ADD, dl,
AllocSize.getValueType(), AllocSize,		AllocSize.getValueType(), AllocSize,
DAG.getIntPtrConstant(StackAlign - 1, dl), Flags);		DAG.getIntPtrConstant(StackAlign - 1, dl), Flags);

// Mask out the low bits for alignment purposes.		// Mask out the low bits for alignment purposes.
AllocSize = DAG.getNode(ISD::AND, dl,		AllocSize = DAG.getNode(ISD::AND, dl,
AllocSize.getValueType(), AllocSize,		AllocSize.getValueType(), AllocSize,
DAG.getIntPtrConstant(~(uint64_t)(StackAlign - 1),		DAG.getIntPtrConstant(~(uint64_t)(StackAlign - 1),
[67 unchanged lines not shown] void SelectionDAGBuilder::visitLoad(const LoadInst &I) {
SDLoc dl = getCurSDLoc();		SDLoc dl = getCurSDLoc();

if (isVolatile)		if (isVolatile)
Root = TLI.prepareVolatileOrAtomicLoad(Root, dl, DAG);		Root = TLI.prepareVolatileOrAtomicLoad(Root, dl, DAG);

// An aggregate load cannot wrap around the address space, so offsets to its		// An aggregate load cannot wrap around the address space, so offsets to its
// parts don't wrap either.		// parts don't wrap either.
SDNodeFlags Flags;		SDNodeFlags Flags;
Flags.setNoUnsignedWrap(true);		Flags.setNoUnsignedWrap(true, false);

SmallVector<SDValue, 4> Values(NumValues);		SmallVector<SDValue, 4> Values(NumValues);
SmallVector<SDValue, 4> Chains(std::min(MaxParallelChains, NumValues));		SmallVector<SDValue, 4> Chains(std::min(MaxParallelChains, NumValues));
EVT PtrVT = Ptr.getValueType();		EVT PtrVT = Ptr.getValueType();
unsigned ChainI = 0;		unsigned ChainI = 0;
for (unsigned i = 0; i != NumValues; ++i, ++ChainI) {		for (unsigned i = 0; i != NumValues; ++i, ++ChainI) {
// Serializing loads here may result in excessive register pressure, and		// Serializing loads here may result in excessive register pressure, and
// TokenFactor places arbitrary choke points on the scheduler. SD scheduling		// TokenFactor places arbitrary choke points on the scheduler. SD scheduling
[151 unchanged lines not shown] if (I.isVolatile())
MMOFlags |= MachineMemOperand::MOVolatile;		MMOFlags |= MachineMemOperand::MOVolatile;
if (I.getMetadata(LLVMContext::MD_nontemporal) != nullptr)		if (I.getMetadata(LLVMContext::MD_nontemporal) != nullptr)
MMOFlags |= MachineMemOperand::MONonTemporal;		MMOFlags |= MachineMemOperand::MONonTemporal;
MMOFlags |= TLI.getMMOFlags(I);		MMOFlags |= TLI.getMMOFlags(I);

// An aggregate load cannot wrap around the address space, so offsets to its		// An aggregate load cannot wrap around the address space, so offsets to its
// parts don't wrap either.		// parts don't wrap either.
SDNodeFlags Flags;		SDNodeFlags Flags;
Flags.setNoUnsignedWrap(true);		Flags.setNoUnsignedWrap(true, false);

unsigned ChainI = 0;		unsigned ChainI = 0;
for (unsigned i = 0; i != NumValues; ++i, ++ChainI) {		for (unsigned i = 0; i != NumValues; ++i, ++ChainI) {
// See visitLoad comments.		// See visitLoad comments.
if (ChainI == MaxParallelChains) {		if (ChainI == MaxParallelChains) {
SDValue Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other,		SDValue Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other,
makeArrayRef(Chains.data(), ChainI));		makeArrayRef(Chains.data(), ChainI));
Root = Chain;		Root = Chain;
[1,766 unchanged lines not shown] if (TM.Options.AllowFPOpFusion != FPOpFusion::Strict &&
SDValue Mul = DAG.getNode(ISD::FMUL, sdl,		SDValue Mul = DAG.getNode(ISD::FMUL, sdl,
getValue(I.getArgOperand(0)).getValueType(),		getValue(I.getArgOperand(0)).getValueType(),
getValue(I.getArgOperand(0)),		getValue(I.getArgOperand(0)),
getValue(I.getArgOperand(1)));		getValue(I.getArgOperand(1)));
SDValue Add = DAG.getNode(ISD::FADD, sdl,		SDValue Add = DAG.getNode(ISD::FADD, sdl,
getValue(I.getArgOperand(0)).getValueType(),		getValue(I.getArgOperand(0)).getValueType(),
Mul,		Mul,
getValue(I.getArgOperand(2)));		getValue(I.getArgOperand(2)));

		SDNodeFlagsAcquirer::PropagateFlagsToOperands(Add,{Mul});
setValue(&I, Add);		setValue(&I, Add);
}		}
return nullptr;		return nullptr;
}		}
case Intrinsic::convert_to_fp16:		case Intrinsic::convert_to_fp16:
setValue(&I, DAG.getNode(ISD::BITCAST, sdl, MVT::i16,		setValue(&I, DAG.getNode(ISD::BITCAST, sdl, MVT::i16,
DAG.getNode(ISD::FP_ROUND, sdl, MVT::f16,		DAG.getNode(ISD::FP_ROUND, sdl, MVT::f16,
getValue(I.getArgOperand(0)),		getValue(I.getArgOperand(0)),
[1,066 unchanged lines not shown] bool SelectionDAGBuilder::visitBinaryFloatCall(const CallInst &I,
SDValue Tmp0 = getValue(I.getArgOperand(0));		SDValue Tmp0 = getValue(I.getArgOperand(0));
SDValue Tmp1 = getValue(I.getArgOperand(1));		SDValue Tmp1 = getValue(I.getArgOperand(1));
EVT VT = Tmp0.getValueType();		EVT VT = Tmp0.getValueType();
setValue(&I, DAG.getNode(Opcode, getCurSDLoc(), VT, Tmp0, Tmp1));		setValue(&I, DAG.getNode(Opcode, getCurSDLoc(), VT, Tmp0, Tmp1));
return true;		return true;
}		}

void SelectionDAGBuilder::visitCall(const CallInst &I) {		void SelectionDAGBuilder::visitCall(const CallInst &I) {
// Handle inline assembly differently.		// Handle inline assembly differently.
		RKSimon (inline comment): remove newline diff
if (isa<InlineAsm>(I.getCalledValue())) {		if (isa<InlineAsm>(I.getCalledValue())) {
visitInlineAsm(&I);		visitInlineAsm(&I);
return;		return;
}		}

MachineModuleInfo &MMI = DAG.getMachineFunction().getMMI();		MachineModuleInfo &MMI = DAG.getMachineFunction().getMMI();
computeUsesVAFloatArgument(I, MMI);		computeUsesVAFloatArgument(I, MMI);

[1,317 unchanged lines not shown] void SelectionDAGBuilder::visitVectorReduce(const CallInst &I,
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
SDValue Op1 = getValue(I.getArgOperand(0));		SDValue Op1 = getValue(I.getArgOperand(0));
SDValue Op2;		SDValue Op2;
if (I.getNumArgOperands() > 1)		if (I.getNumArgOperands() > 1)
Op2 = getValue(I.getArgOperand(1));		Op2 = getValue(I.getArgOperand(1));
SDLoc dl = getCurSDLoc();		SDLoc dl = getCurSDLoc();
EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());		EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());
SDValue Res;		SDValue Res;
FastMathFlags FMF;		FastMathFlags FMF;
if (isa<FPMathOperator>(I))		if (isa<FPMathOperator>(I))
FMF = I.getFastMathFlags();		FMF = I.getFastMathFlags();
SDNodeFlags SDFlags;
SDFlags.setNoNaNs(FMF.noNaNs());

		spatel (inline comment): Don't need this anymore?
switch (Intrinsic) {		switch (Intrinsic) {
case Intrinsic::experimental_vector_reduce_fadd:		case Intrinsic::experimental_vector_reduce_fadd:
if (FMF.unsafeAlgebra())		if (FMF.unsafeAlgebra())
Res = DAG.getNode(ISD::VECREDUCE_FADD, dl, VT, Op2);		Res = DAG.getNode(ISD::VECREDUCE_FADD, dl, VT, Op2);
else		else
Res = DAG.getNode(ISD::VECREDUCE_STRICT_FADD, dl, VT, Op1, Op2);		Res = DAG.getNode(ISD::VECREDUCE_STRICT_FADD, dl, VT, Op1, Op2);
break;		break;
case Intrinsic::experimental_vector_reduce_fmul:		case Intrinsic::experimental_vector_reduce_fmul:
[25 unchanged lines not shown] case Intrinsic::experimental_vector_reduce_smin:
break;		break;
case Intrinsic::experimental_vector_reduce_umax:		case Intrinsic::experimental_vector_reduce_umax:
Res = DAG.getNode(ISD::VECREDUCE_UMAX, dl, VT, Op1);		Res = DAG.getNode(ISD::VECREDUCE_UMAX, dl, VT, Op1);
break;		break;
case Intrinsic::experimental_vector_reduce_umin:		case Intrinsic::experimental_vector_reduce_umin:
Res = DAG.getNode(ISD::VECREDUCE_UMIN, dl, VT, Op1);		Res = DAG.getNode(ISD::VECREDUCE_UMIN, dl, VT, Op1);
break;		break;
case Intrinsic::experimental_vector_reduce_fmax: {		case Intrinsic::experimental_vector_reduce_fmax: {
Res = DAG.getNode(ISD::VECREDUCE_FMAX, dl, VT, Op1, SDFlags);		Res = DAG.getNode(ISD::VECREDUCE_FMAX, dl, VT, Op1);
break;		break;
}		}
case Intrinsic::experimental_vector_reduce_fmin: {		case Intrinsic::experimental_vector_reduce_fmin: {
Res = DAG.getNode(ISD::VECREDUCE_FMIN, dl, VT, Op1, SDFlags);		Res = DAG.getNode(ISD::VECREDUCE_FMIN, dl, VT, Op1);
break;		break;
}		}
default:		default:
llvm_unreachable("Unhandled vector reduce intrinsic");		llvm_unreachable("Unhandled vector reduce intrinsic");
}		}
setValue(&I, Res);		setValue(&I, Res);
}		}

[1,925 unchanged lines not shown]
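The diff adds SDNodeFlagsAcquirer Flags(&I, this); at the top of visit() and deletes the hand-written SDNodeFlags setup from visitBinary, visitShift, visitSDiv and visitVectorReduce. The class is declared in SelectionDAGBuilder.h, which this section does not show, so the following is only a minimal, LLVM-free sketch of a scope-based (RAII) flag holder of that general kind. NodeFlags, IRInst, Node, Builder and FlagsScope are hypothetical names, and this variant stamps the flags onto every node created while the scope is alive, which is an assumption and not necessarily how the patch works.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for SDNodeFlags / Instruction / SDNode (not LLVM types).
struct NodeFlags {
  bool NoUnsignedWrap = false;
  bool NoSignedWrap = false;
  bool Exact = false;
  bool NoNaNs = false;
};

struct IRInst {
  NodeFlags Flags;  // flags already decoded from the IR instruction
};

struct Node {
  unsigned Opcode;
  NodeFlags Flags;
};

class Builder {
public:
  // Every node created while a FlagsScope is active inherits its flags;
  // outside any scope, nodes get default (all-false) flags.
  Node *getNode(unsigned Opcode) {
    Nodes.push_back(
        std::make_unique<Node>(Node{Opcode, Current ? *Current : NodeFlags{}}));
    return Nodes.back().get();
  }

private:
  friend class FlagsScope;
  const NodeFlags *Current = nullptr;  // flags of the instruction being visited
  std::vector<std::unique_ptr<Node>> Nodes;
};

// RAII guard: installs the instruction's flags for the lifetime of the scope
// (e.g. one visit(I) call) and restores the previous flags on every exit path.
class FlagsScope {
public:
  FlagsScope(const IRInst &I, Builder &TheBuilder)
      : B(TheBuilder), Saved(TheBuilder.Current) {
    B.Current = &I.Flags;
  }
  ~FlagsScope() { B.Current = Saved; }
  FlagsScope(const FlagsScope &) = delete;
  FlagsScope &operator=(const FlagsScope &) = delete;

private:
  Builder &B;
  const NodeFlags *Saved;
};
```

Because every node built inside the scope inherits the flags, a one-to-many lowering such as llvm.fmuladd into FMUL + FADD needs no separate propagation call (compare PropagateFlagsToOperands(Add,{Mul}) in the diff). The trade-off is that helper nodes created along the way, for example the TRUNCATE of a shift amount in visitShift, are tagged too, even where the instruction's flags do not describe them.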

lib/CodeGen/SelectionDAG/TargetLowering.cpp

[2,940 unchanged lines not shown] static SDValue BuildExactSDIV(const TargetLowering &TLI, SDValue Op1, APInt d,
// Shift the value upfront if it is even, so the LSB is one.		// Shift the value upfront if it is even, so the LSB is one.
unsigned ShAmt = d.countTrailingZeros();		unsigned ShAmt = d.countTrailingZeros();
if (ShAmt) {		if (ShAmt) {
// TODO: For UDIV use SRL instead of SRA.		// TODO: For UDIV use SRL instead of SRA.
SDValue Amt =		SDValue Amt =
DAG.getConstant(ShAmt, dl, TLI.getShiftAmountTy(Op1.getValueType(),		DAG.getConstant(ShAmt, dl, TLI.getShiftAmountTy(Op1.getValueType(),
DAG.getDataLayout()));		DAG.getDataLayout()));
SDNodeFlags Flags;		SDNodeFlags Flags;
Flags.setExact(true);		Flags.setExact(Op1.getNode()->getFlags().hasExact());
Op1 = DAG.getNode(ISD::SRA, dl, Op1.getValueType(), Op1, Amt, Flags);		Op1 = DAG.getNode(ISD::SRA, dl, Op1.getValueType(), Op1, Amt, Flags);
Created.push_back(Op1.getNode());		Created.push_back(Op1.getNode());
d.ashrInPlace(ShAmt);		d.ashrInPlace(ShAmt);
}		}

// Calculate the multiplicative inverse, using Newton's method.		// Calculate the multiplicative inverse, using Newton's method.
APInt t, xn = d;		APInt t, xn = d;
while ((t = d*xn) != 1)		while ((t = d*xn) != 1)
[924 unchanged lines not shown]
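BuildExactSDIV lowers an exact signed division by a constant in two steps. An arithmetic right shift removes the divisor's trailing zeros; because the dividend is a multiple of the divisor, only zero bits are shifted out of it. A multiply by the inverse of the remaining odd divisor modulo 2^N then replaces the division, and that inverse comes from the Newton iteration x <- x * (2 - d * x) that the loop above begins. A minimal scalar sketch for 32-bit values, assuming the division really is exact (the result is meaningless otherwise); inverseMod2_32 and exactSDiv are illustrative names, not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// Inverse of an odd 32-bit value modulo 2^32 via Newton's iteration
// x <- x * (2 - d * x). Each step doubles the number of correct low bits.
uint32_t inverseMod2_32(uint32_t d) {
  assert((d & 1u) == 1u && "only odd values are invertible mod 2^32");
  uint32_t x = d;  // d * d == 1 (mod 8) for odd d: 3 correct bits to start
  uint32_t t;
  while ((t = d * x) != 1u)
    x *= 2u - t;  // unsigned arithmetic wraps mod 2^32, as intended
  return x;
}

// Exact signed division: only valid when d != 0 and n % d == 0.
int32_t exactSDiv(int32_t n, int32_t d) {
  assert(d != 0);
  // Divide both operands by the power of two in d. Since the division is
  // exact, n has at least as many trailing zeros as d, so nothing is lost.
  // int64_t avoids overflow when d == INT32_MIN.
  int64_t n64 = n, d64 = d;
  while ((d64 & 1) == 0) {
    n64 /= 2;
    d64 /= 2;
  }
  // Multiply by the inverse of the odd part; the product wraps mod 2^32
  // and, read back as a signed value, is the exact quotient.
  uint32_t inv = inverseMod2_32(static_cast<uint32_t>(d64));
  uint32_t q = static_cast<uint32_t>(n64) * inv;
  return static_cast<int32_t>(q);
}
```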

lib/Target/AArch64/AArch64ISelLowering.cpp


[4,955 unchanged lines not shown] SDValue AArch64TargetLowering::getSqrtEstimate(SDValue Operand,
bool Reciprocal) const {		bool Reciprocal) const {
if (Enabled == ReciprocalEstimate::Enabled ||		if (Enabled == ReciprocalEstimate::Enabled ||
(Enabled == ReciprocalEstimate::Unspecified && Subtarget->useRSqrt()))		(Enabled == ReciprocalEstimate::Unspecified && Subtarget->useRSqrt()))
if (SDValue Estimate = getEstimate(Subtarget, AArch64ISD::FRSQRTE, Operand,		if (SDValue Estimate = getEstimate(Subtarget, AArch64ISD::FRSQRTE, Operand,
DAG, ExtraSteps)) {		DAG, ExtraSteps)) {
SDLoc DL(Operand);		SDLoc DL(Operand);
EVT VT = Operand.getValueType();		EVT VT = Operand.getValueType();

SDNodeFlags Flags;		SDNodeFlags Flags = Operand.getNode()->getFlags();
Flags.setUnsafeAlgebra(true);

// Newton reciprocal square root iteration: E * 0.5 * (3 - X * E^2)		// Newton reciprocal square root iteration: E * 0.5 * (3 - X * E^2)
// AArch64 reciprocal square root iteration instruction: 0.5 * (3 - M * N)		// AArch64 reciprocal square root iteration instruction: 0.5 * (3 - M * N)
for (int i = ExtraSteps; i > 0; --i) {		for (int i = ExtraSteps; i > 0; --i) {
SDValue Step = DAG.getNode(ISD::FMUL, DL, VT, Estimate, Estimate,		SDValue Step = DAG.getNode(ISD::FMUL, DL, VT, Estimate, Estimate,
Flags);		Flags);
Step = DAG.getNode(AArch64ISD::FRSQRTS, DL, VT, Operand, Step, Flags);		Step = DAG.getNode(AArch64ISD::FRSQRTS, DL, VT, Operand, Step, Flags);
Estimate = DAG.getNode(ISD::FMUL, DL, VT, Estimate, Step, Flags);		Estimate = DAG.getNode(ISD::FMUL, DL, VT, Estimate, Step, Flags);
[22 unchanged lines not shown] SDValue AArch64TargetLowering::getRecipEstimate(SDValue Operand,
SelectionDAG &DAG, int Enabled,		SelectionDAG &DAG, int Enabled,
int &ExtraSteps) const {		int &ExtraSteps) const {
if (Enabled == ReciprocalEstimate::Enabled)		if (Enabled == ReciprocalEstimate::Enabled)
if (SDValue Estimate = getEstimate(Subtarget, AArch64ISD::FRECPE, Operand,		if (SDValue Estimate = getEstimate(Subtarget, AArch64ISD::FRECPE, Operand,
DAG, ExtraSteps)) {		DAG, ExtraSteps)) {
SDLoc DL(Operand);		SDLoc DL(Operand);
EVT VT = Operand.getValueType();		EVT VT = Operand.getValueType();

SDNodeFlags Flags;		SDNodeFlags Flags = Operand.getNode()->getFlags();
Flags.setUnsafeAlgebra(true);

// Newton reciprocal iteration: E * (2 - X * E)		// Newton reciprocal iteration: E * (2 - X * E)
// AArch64 reciprocal iteration instruction: (2 - M * N)		// AArch64 reciprocal iteration instruction: (2 - M * N)
for (int i = ExtraSteps; i > 0; --i) {		for (int i = ExtraSteps; i > 0; --i) {
SDValue Step = DAG.getNode(AArch64ISD::FRECPS, DL, VT, Operand,		SDValue Step = DAG.getNode(AArch64ISD::FRECPS, DL, VT, Operand,
Estimate, Flags);		Estimate, Flags);
Estimate = DAG.getNode(ISD::FMUL, DL, VT, Estimate, Step, Flags);		Estimate = DAG.getNode(ISD::FMUL, DL, VT, Estimate, Step, Flags);
}		}
[5,933 unchanged lines not shown]
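The comments in getSqrtEstimate and getRecipEstimate describe Newton-Raphson refinement of a hardware estimate: E' = E * 0.5 * (3 - X * E^2) for 1/sqrt(X), and E' = E * (2 - X * E) for 1/X. A small scalar sketch of the same node sequence, with double standing in for the vector types. frsqrts and frecps are hypothetical helpers that model the instructions' results only (the real instructions do not round between the multiply and the subtract and treat special values separately), and the sketch says nothing about which SDNodeFlags the real nodes carry.

```cpp
#include <cassert>
#include <cmath>

// Approximate scalar models of the AArch64 step instructions:
// FRSQRTS(m, n) = (3 - m * n) / 2, FRECPS(m, n) = 2 - m * n.
double frsqrts(double m, double n) { return (3.0 - m * n) / 2.0; }
double frecps(double m, double n) { return 2.0 - m * n; }

// Refine an estimate e of 1/sqrt(x): e' = e * 0.5 * (3 - x * e^2).
double refineRSqrt(double x, double e, int extraSteps) {
  for (int i = extraSteps; i > 0; --i) {
    double step = e * e;      // FMUL Estimate, Estimate
    step = frsqrts(x, step);  // FRSQRTS Operand, Step
    e = e * step;             // FMUL Estimate, Step
  }
  return e;
}

// Refine an estimate e of 1/x: e' = e * (2 - x * e).
double refineRecip(double x, double e, int extraSteps) {
  for (int i = extraSteps; i > 0; --i) {
    double step = frecps(x, e);  // FRECPS Operand, Estimate
    e = e * step;                // FMUL Estimate, Step
  }
  return e;
}
```

Each step roughly doubles the number of correct bits, which is why the lowering only needs a small ExtraSteps count on top of the hardware estimate.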

test/CodeGen/X86/fmf-flags.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown | FileCheck %s -check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-unknown | FileCheck %s -check-prefix=X64
	; RUN: llc < %s -mtriple=i686-unknown | FileCheck %s -check-prefix=X86			; RUN: llc < %s -mtriple=i686-unknown | FileCheck %s -check-prefix=X86

	declare float @llvm.sqrt.f32(float %x);			declare float @llvm.sqrt.f32(float %x);

	define float @fast_recip_sqrt(float %x) {			define float @fast_recip_sqrt(float %x) {
	; X64-LABEL: fast_recip_sqrt:			; X64-LABEL: fast_recip_sqrt:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: sqrtss %xmm0, %xmm1			; X64-NEXT: rsqrtss %xmm0, %xmm1
	; X64-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; X64-NEXT: mulss %xmm1, %xmm0
	; X64-NEXT: divss %xmm1, %xmm0			; X64-NEXT: mulss %xmm1, %xmm0
				; X64-NEXT: addss {{.*}}(%rip), %xmm0
				; X64-NEXT: mulss {{.*}}(%rip), %xmm1
				; X64-NEXT: mulss %xmm1, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: fast_recip_sqrt:			; X86-LABEL: fast_recip_sqrt:
	; X86: # BB#0:			; X86: # BB#0:
	; X86-NEXT: flds {{[0-9]+}}(%esp)			; X86-NEXT: flds {{[0-9]+}}(%esp)
	; X86-NEXT: fsqrt			; X86-NEXT: fsqrt
	; X86-NEXT: fld1			; X86-NEXT: fld1
	; X86-NEXT: fdivp %st(1)			; X86-NEXT: fdivp %st(1)
	; X86-NEXT: retl			; X86-NEXT: retl
	%y = call fast float @llvm.sqrt.f32(float %x)			%y = call fast float @llvm.sqrt.f32(float %x)
	%z = fdiv fast float 1.0, %y			%z = fdiv fast float 1.0, %y
	ret float %z			ret float %z
	}			}

	declare float @llvm.fmuladd.f32(float %a, float %b, float %c);			declare float @llvm.fmuladd.f32(float %a, float %b, float %c);

	define float @fast_fmuladd_opts(float %a , float %b , float %c) {			define float @fast_fmuladd_opts(float %a , float %b , float %c) {
	; X64-LABEL: fast_fmuladd_opts:			; X64-LABEL: fast_fmuladd_opts:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: movaps %xmm0, %xmm1			; X64-NEXT: mulss {{.*}}(%rip), %xmm0
	; X64-NEXT: addss %xmm1, %xmm1
	; X64-NEXT: addss %xmm0, %xmm1
	; X64-NEXT: movaps %xmm1, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: fast_fmuladd_opts:			; X86-LABEL: fast_fmuladd_opts:
	; X86: # BB#0:			; X86: # BB#0:
	; X86-NEXT: flds {{[0-9]+}}(%esp)			; X86-NEXT: flds {{[0-9]+}}(%esp)
	; X86-NEXT: fld %st(0)			; X86-NEXT: fmuls {{\.LCPI.*}}
	; X86-NEXT: fadd %st(1)
	; X86-NEXT: faddp %st(1)
	; X86-NEXT: retl			; X86-NEXT: retl
	%res = call fast float @llvm.fmuladd.f32(float %a, float 2.0, float %a)			%res = call fast float @llvm.fmuladd.f32(float %a, float 2.0, float %a)
	ret float %res			ret float %res
	}			}

	; The multiply is strict.			; The multiply is strict.

	@mul1 = common global double 0.000000e+00, align 4			@mul1 = common global double 0.000000e+00, align 4

	define double @not_so_fast_mul_add(double %x) {			define double @not_so_fast_mul_add(double %x) {
	; X64-LABEL: not_so_fast_mul_add:			; X64-LABEL: not_so_fast_mul_add:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; X64-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
	; X64-NEXT: mulsd %xmm0, %xmm1			; X64-NEXT: mulsd %xmm0, %xmm1
	; X64-NEXT: addsd %xmm1, %xmm0			; X64-NEXT: mulsd {{.*}}(%rip), %xmm0
	; X64-NEXT: movsd %xmm1, {{.*}}(%rip)			; X64-NEXT: movsd %xmm1, {{.*}}(%rip)
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: not_so_fast_mul_add:			; X86-LABEL: not_so_fast_mul_add:
	; X86: # BB#0:			; X86: # BB#0:
	; X86-NEXT: fldl {{[0-9]+}}(%esp)			; X86-NEXT: fldl {{[0-9]+}}(%esp)
	; X86-NEXT: fld %st(0)			; X86-NEXT: fld %st(0)
	; X86-NEXT: fmull {{\.LCPI.*}}			; X86-NEXT: fmull {{\.LCPI.*}}
	; X86-NEXT: fadd %st(0), %st(1)			; X86-NEXT: fxch %st(1)
				; X86-NEXT: fmull {{\.LCPI.*}}
				; X86-NEXT: fxch %st(1)
	; X86-NEXT: fstpl mul1			; X86-NEXT: fstpl mul1
	; X86-NEXT: retl			; X86-NEXT: retl
	%m = fmul double %x, 4.2			%m = fmul double %x, 4.2
	%a = fadd fast double %m, %x			%a = fadd fast double %m, %x
	store double %m, double* @mul1, align 4			store double %m, double* @mul1, align 4
	ret double %a			ret double %a
	}			}

	; The sqrt is strict.			; The sqrt is strict.

	@sqrt1 = common global float 0.000000e+00, align 4			@sqrt1 = common global float 0.000000e+00, align 4

	define float @not_so_fast_recip_sqrt(float %x) {			define float @not_so_fast_recip_sqrt(float %x) {
	; X64-LABEL: not_so_fast_recip_sqrt:			; X64-LABEL: not_so_fast_recip_sqrt:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: sqrtss %xmm0, %xmm1			; X64-NEXT: rsqrtss %xmm0, %xmm1
	; X64-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; X64-NEXT: sqrtss %xmm0, %xmm2
	; X64-NEXT: divss %xmm1, %xmm0			; X64-NEXT: mulss %xmm1, %xmm0
	; X64-NEXT: movss %xmm1, {{.*}}(%rip)			; X64-NEXT: mulss %xmm1, %xmm0
				; X64-NEXT: addss {{.*}}(%rip), %xmm0
				; X64-NEXT: mulss {{.*}}(%rip), %xmm1
				; X64-NEXT: mulss %xmm1, %xmm0
				; X64-NEXT: movss %xmm2, {{.*}}(%rip)
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: not_so_fast_recip_sqrt:			; X86-LABEL: not_so_fast_recip_sqrt:
	; X86: # BB#0:			; X86: # BB#0:
	; X86-NEXT: flds {{[0-9]+}}(%esp)			; X86-NEXT: flds {{[0-9]+}}(%esp)
	; X86-NEXT: fsqrt			; X86-NEXT: fsqrt
	; X86-NEXT: fld1			; X86-NEXT: fld1
	; X86-NEXT: fdiv %st(1)			; X86-NEXT: fdiv %st(1)
[10 unchanged lines not shown]

Revision Contents

Diff 116305
