This is an archive of the discontinued LLVM Phabricator instance.

Differential D73978

[WIP][FPEnv] Don't transform FSUB(-0.0,X)->FNEG(X) when flushing denormals
Abandoned · Public

Authored by cameron.mcinally on Feb 4 2020, 10:07 AM.

Details

Reviewers

arsenm
spatel
craig.topper
andrew.w.kaylor
kpn
uweigand
pengfei
sepavloff

Summary

When in a mode that flushes denormals, we don't want to transform FSUB(-0.0,X) -> FNEG(X). The former is an arithmetic operation that will flush a denormal input to zero. The latter is a bitwise operation that only flips the sign bit.
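
As a rough illustration of the distinction above, the following sketch models a flushing FSUB in software and compares it with a sign-bit flip. The helper names (`flushDenormal`, `fsubNegZeroFlushed`, `fnegBitwise`) are hypothetical and are not LLVM APIs; real flushing behavior depends on the target's floating-point mode.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical software model of denormal flushing: a subnormal value is
// replaced by a zero of the same sign; every other value passes through.
inline float flushDenormal(float X) {
  return std::fpclassify(X) == FP_SUBNORMAL ? std::copysign(0.0f, X) : X;
}

// Models FSUB(-0.0, X) in a mode that flushes both inputs and outputs.
inline float fsubNegZeroFlushed(float X) {
  return flushDenormal(-0.0f - flushDenormal(X));
}

// Models FNEG(X): a purely bitwise operation that flips the sign bit.
inline float fnegBitwise(float X) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  Bits ^= 0x80000000u;
  std::memcpy(&X, &Bits, sizeof(Bits));
  return X;
}
```

In this model, a subnormal input such as `1.0e-40f` becomes `-0.0f` through the modeled FSUB but stays a (negated) subnormal through the bitwise negate; for normal inputs the two agree.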

Marked as [WIP] since the logic is a little weird. Hoping @arsenm and others can offer some guidance...

Notice that we still perform the transformation when in DenormalMode::IEEE. This is counter-intuitive. IEEE-754 is what specifies that these operations are distinct, but only with regard to side-effects, not denormal flushing. LLVM optimizations do not preserve side-effects, and both operation results will be bitwise identical when we're not flushing denormals, so I think this is the correct thing to do.

Although, there's also the problem of this transform changing the sign of a NaN in DenormalMode::IEEE. Do we want to take that into consideration? E.g. an FSUB(-0.0, NaN) should produce a canonical NaN with the same payload, while FNEG(NaN) produces -NaN. If I'm not mistaken, IEEE-754 doesn't specify the sign of a NaN result, besides being a canonical NaN.
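
The FNEG half of that concern can be shown with a small sketch, assuming the usual 32-bit IEEE-754 float layout. The helper names are hypothetical. The FSUB half is deliberately not asserted here, since the sign of an arithmetic NaN result is exactly what is in question.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Reinterpret a float as its raw 32-bit pattern.
inline std::uint32_t bitsOf(float X) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  return Bits;
}

// Models FNEG(X) as a sign-bit flip; exponent and payload are untouched.
inline float fnegBitwise(float X) {
  std::uint32_t Bits = bitsOf(X) ^ 0x80000000u;
  std::memcpy(&X, &Bits, sizeof(X));
  return X;
}

// True if A and B have identical bits apart from the sign bit.
inline bool differOnlyInSign(float A, float B) {
  return (bitsOf(A) ^ bitsOf(B)) == 0x80000000u;
}
```

Applied to a quiet NaN, `fnegBitwise` returns a NaN with the same payload and the opposite sign bit, which is the -NaN case mentioned above.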

Also notice that we still perform the transformation when in DenormalMode::Invalid. I believe that Invalid is actually a flush to zero mode. However, I think it makes sense to leave the default mode unchanged wrt disabling this transform. There could be a very small (and hard to measure) performance penalty for using a proper FSUB on some targets.

Thoughts about any of this?

Diff Detail

Event Timeline

cameron.mcinally created this revision. · Feb 4 2020, 10:07 AM

Herald added a project: Restricted Project. · Feb 4 2020, 10:07 AM

Herald added subscribers: llvm-commits, hiraditya, wdng.

arsenm added inline comments. · Feb 4 2020, 10:17 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12358–12360	This will need updating for the input/output splitting patch I just committed.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2997–3015	Why does SelectionDAGBuilder bother doing this fold at all? It should just directly translate the fsub?

arsenm added inline comments. · Feb 4 2020, 10:19 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2997–3015	I think this should be just ripped out in a separate patch. The same problem seems to have been copied to IRTranslator, which should also be removed

The DenormalMode::Invalid is a temporary state and should not really be a concern. It should be invalid and never seen after D69989

cameron.mcinally marked 2 inline comments as done. · Feb 4 2020, 11:14 AM

cameron.mcinally added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12358–12360	Ok, thanks. Will wait for the builds to go green and then update.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2997–3015	@sanjay, what do you think? Seems reasonable to me. I think it made sense to do this when FNEG(X) was the canonical form of FSUB(-0.0, X). Wouldn't want two forms floating around for even a small amount of time. But now that there are cases where the operations are distinct through llc, it seems ok to wait until DAGCombine.

spatel added inline comments. · Feb 4 2020, 12:31 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2997–3015	Yes, now that we can use DenormMode to distinguish target behavior, it seems better to do it later in DAGCombiner if that would make sense for the target.

cameron.mcinally planned changes to this revision. · Feb 5 2020, 7:21 AM

cameron.mcinally marked an inline comment as done.

cameron.mcinally added inline comments.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2997–3015	I'll have to put a pin in this for now. Removing this block is causing regressions in about 15 tests. The regressions appear to be subtle lowering differences, so I suspect it will take some time to straighten them out.

cameron.mcinally marked an inline comment as not done. · Feb 12 2020, 7:32 AM

cameron.mcinally added inline comments.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2997–3015	I looked into removing this and there are warts underneath. Some are surmountable (different lowerings), but one is worrisome, i.e. the case where the FNeg operand is undef:
FNEG(undef) -> undef
FSUB(-0.0, undef) -> NaN
That is, removing this transform propagates NaNs where we previously had undef values. Any thoughts on how to proceed? Do we want to minimize code differences by keeping this transform in place? Or are we okay moving forward with the undef->NaN change?

spatel added inline comments. · Feb 12 2020, 11:42 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2997–3015	Did that difference show up as real regressions or something benign? We could add a special-case fold for this here or getNode() if it helps: fsub C, undef --> undef (as long as C is not NaN or Inf?)

arsenm added inline comments. · Feb 12 2020, 12:58 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2997–3015	This sounds more correct to me. I don't see why this would be special cased

spatel added inline comments. · Feb 12 2020, 1:56 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2997–3015	It's a special-case in the sense that folding to NaN is correct in general. Just dealing with this particular pattern is also a special-case because we could do something similar for all FP ops, not just fsub with constant operand 0. But we'll need to work out if/how the corner cases differ per opcode.

cameron.mcinally marked an inline comment as not done. · Feb 13 2020, 7:18 AM

cameron.mcinally added inline comments.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2997–3015	Benign llc regression tests. The NaN and undef propagate differently, so the asm differences appear worse than they are.
Quoting @spatel: "We could add a special-case fold for this here or getNode() if it helps: fsub C, undef --> undef (as long as C is not NaN or Inf?)"
That's a good idea. I don't feel strongly about it, but the current transform might be more obvious than adding a special-case fold.

spatel mentioned this in D74713: [ConstantFold] fold fsub -0.0, undef to undef rather than NaN. · Feb 17 2020, 6:50 AM

spatel added inline comments. · Feb 20 2020, 6:43 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2997–3015	I tried to generalize this over in D74713, but there doesn't appear to be support for bending the theoretical definition of undef to the practical constraints of real-life floating-point. So our options are:
1. Add a constant fold for this exact case: fsub -0.0, undef --> undef
2. Ignore the diffs caused by removing this transform.
I'd lean toward #1 (I can limit D74713 to that case as the start of that effort).

Thanks, Sanjay. I'm okay with either approach.

I'll pick this up again in the near future. I've been distracted with another project...

spatel mentioned this in rGd799190851fd: [ConstantFold] fold fsub -0.0, undef to undef rather than NaN. · Feb 21 2020, 5:21 AM

spatel mentioned this in rGa253a2a793cd: [SDAG] fold fsub -0.0, undef to undef rather than NaN. · Feb 23 2020, 8:39 AM

In D73978#1884643, @cameron.mcinally wrote:

Thanks, Sanjay. I'm okay with either approach.

rGa253a2a793cd: [SDAG] fold fsub -0.0, undef to undef rather than NaN.

Thanks for that patch, Sanjay.

I have another issue which I hope you can help me sort out. There's a transform in narrowExtractedVectorBinOp(...) in DAGCombiner.cpp:

// extract (binop B0, B1), N --> binop (extract B0, N), (extract B1, N)

This transform only happens for binops, so we don't see it when SelectionDAGBuilder converts the FSUB->FNEG.

The IR is...

%rhs_neg = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %rhs
%splat = shufflevector <4 x float> %rhs_neg, <4 x float> undef, <2 x i32> <i32 3, i32 3>

and after DAGCombine we end up with DAGs like this...

FNEG:
<               t9: v4f32 = bitcast t8
<             t24: v4f32 = fneg t9
<           t15: v2f32 = extract_subvector t24, Constant:i64<2>
<         t17: v2f32 = vector_shuffle<1,1> t15, undef:v2f32

FSUB:
>               t29: v1i64 = extract_subvector t8, Constant:i64<1>
>             t30: v2f32 = bitcast t29
>           t32: v2f32 = fneg t30
>         t17: v2f32 = vector_shuffle<1,1> t32, undef:v2f32

Moving the extract to the operands (FSUB) is a problem on AArch64 since the extract could be rolled into the shuffle (FNEG). E.g.:

FNEG:
<             t9: v4f32 = bitcast t8
<           t24: v4f32 = fneg t9
<         t26: v2f32 = AArch64ISD::DUPLANE32 t24, Constant:i64<3>

FSUB:
>                 t29: v1i64 = extract_subvector t8, Constant:i64<1>
>               t30: v2f32 = bitcast t29
>             t32: v2f32 = fneg t30
>           t36: v4f32 = insert_subvector undef:v4f32, t32, Constant:i32<0>
>         t37: v2f32 = AArch64ISD::DUPLANE32 t36, Constant:i64<1>

Any insight on the best way to correct this difference? I suppose I could fix up the extract+insert at the MachineInstruction level, but that doesn't seem like the correct fix since other targets could have the same problem.

I'm also a little skeptical about moving the extracts to the operands, and if it's a win in the general case. Seems like it would be stronger after any extract+insert peeps have occurred, but I suppose that's why it's done in DAGCombine. :/
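
For reference, the narrowExtractedVectorBinOp(...) rewrite discussed in this comment can be modeled on plain arrays. This is only a sketch of the algebraic identity behind `extract (binop B0, B1), N --> binop (extract B0, N), (extract B1, N)`; the types and helper names are hypothetical, and it says nothing about which form is cheaper on a given target.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Vec2 = std::array<float, 2>;

// Elementwise binop (subtraction here) on N-wide vectors.
template <std::size_t N>
std::array<float, N> fsubVec(const std::array<float, N> &A,
                             const std::array<float, N> &B) {
  std::array<float, N> R{};
  for (std::size_t I = 0; I != N; ++I)
    R[I] = A[I] - B[I];
  return R;
}

// Extract the 2-element subvector starting at element Idx (0 or 2).
inline Vec2 extractSubvector(const Vec4 &V, std::size_t Idx) {
  return {V[Idx], V[Idx + 1]};
}

// Wide form: extract (binop B0, B1), Idx
inline Vec2 extractOfBinop(const Vec4 &B0, const Vec4 &B1, std::size_t Idx) {
  return extractSubvector(fsubVec(B0, B1), Idx);
}

// Narrow form: binop (extract B0, Idx), (extract B1, Idx)
inline Vec2 binopOfExtracts(const Vec4 &B0, const Vec4 &B1, std::size_t Idx) {
  return fsubVec(extractSubvector(B0, Idx), extractSubvector(B1, Idx));
}
```

Both forms compute the same lanes; the difference raised in the comment is purely about where the extract ends up in the DAG and what later combines can then match.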

spatel mentioned this in rG894ce940db59: [AArch64] add tests for fake fneg; NFC. · Feb 26 2020, 8:05 AM

spatel mentioned this in rGb3d0c798367d: [DAGCombiner] avoid narrowing fake fneg vector op. · Feb 26 2020, 8:33 AM

In D73978#1890311, @cameron.mcinally wrote:
I'm also a little skeptical about moving the extracts to the operands, and if it's a win in the general case. Seems like it would be stronger after any extract+insert peeps have occurred, but I suppose that's why it's done in DAGCombine. :/

The motivation for narrowExtractedVectorBinOp() was to shrink unnecessarily wide vector ops on x86 (256/512-bit vector code can run much slower than 128-bit vector code).
But we want to avoid moving fneg around too much because it can be folded into some other op for free in many cases. We can show there's an inconsistency in the handling in an independent example, so:
rGb3d0c798367d

Let me know if that works to remove the problem here.

Thanks again, Sanjay. That did help. I have other issues to work through on AMDGPU, but it's getting closer...

In D73978#1893862, @cameron.mcinally wrote:

Thanks again, Sanjay. That did help. I have other issues to work through on AMDGPU, but it's getting closer...

As a heads up, AMDGPU doesn’t respect the denormal attribute yet and still uses the custom subtarget features. The patch to switch is posted but held up by its dependencies

spatel mentioned this in D75576: [SDAG] simplify FP binops to undef. · Mar 3 2020, 2:59 PM

spatel mentioned this in rG29a2b20ab363: [SDAG] simplify FP binops to undef. · Mar 4 2020, 7:54 AM

Rebase and AMDGPU test changes to elucidate a problem with this Diff.

@arsenm, The problem in the AMDGPU tests is that FSUB(-0.0, X) is not folding into the following instruction, as it would if it was transformed into an FNEG(X).

It's probably okay to fold some of these. E.g.

-  %fneg.a = fsub float -0.000000e+00, %a
+  %fneg.a = fneg float %a
   %add = fadd float %fneg.a, %b

If we're flushing input to zero, it's probably okay to fold a FSUB(-0,X) into the FADD, since the FADD will flush denorms. Although, if we're flushing output to zero, that probably is NOT ok, since something like FADD(largest_denorm, largest_denorm) would return a normal.

I guess what I'm really asking is how important is this to AMDGPU? It seems to be the only target that is upset about the changes in this Diff.

Would it be enough to update the CHECK lines to not expect a FSUB(-0,X) to fold? Or does this need more peeps to fold the cases where it's safe? And if the latter, should we move ahead with this Diff and optimize later?
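
The input-flushing versus output-flushing distinction raised in this comment can be sketched in software as follows. The two FADD models and their names are hypothetical simplifications, not descriptions of any particular target.

```cpp
#include <cmath>
#include <limits>

// Largest subnormal float: the value just below the smallest normal.
inline float largestDenormal() {
  return std::nextafter(std::numeric_limits<float>::min(), 0.0f);
}

// Replace a subnormal by a zero of the same sign.
inline float flushDenormal(float X) {
  return std::fpclassify(X) == FP_SUBNORMAL ? std::copysign(0.0f, X) : X;
}

// Hypothetical FADD that flushes its inputs before adding.
inline float faddFlushInputs(float A, float B) {
  return flushDenormal(A) + flushDenormal(B);
}

// Hypothetical FADD that flushes only its result.
inline float faddFlushOutputOnly(float A, float B) {
  return flushDenormal(A + B);
}
```

With `D = largestDenormal()`, the input-flushing model gives `D + D == 0`, so an unflushed negate feeding it makes no difference. The output-only model adds the raw subnormals and produces a normal number, so a negate that skipped the flush would be observable.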

Herald added subscribers: kerbowa, nhaehnle, jvesely. · Mar 31 2020, 8:31 AM

In D73978#1952649, @cameron.mcinally wrote:
I guess what I'm really asking is how important is this to AMDGPU? It seems to be the only target that is upset about the changes in this Diff.

AMDGPU isn't respecting the new attributes yet. My patches to switch to it are still working their way through the review/commit process

arsenm added inline comments. · Mar 31 2020, 2:45 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12590	Shouldn't be considering invalid anymore

Remove DenormalMode::Invalid check as suggested by @arsenm.

cameron.mcinally marked 2 inline comments as done. · Apr 1 2020, 7:43 AM

AMDGPU should now be properly respecting the new attributes

Thanks, Matt. It looks like preventing the FSUB->FNEG transform is still causing trouble with folding the negate into instructions. E.g.

<scrubbed>/clang/llvm-project/llvm/test/CodeGen/AMDGPU/v_mac_f16.ll:125:7: error: SI: expected string not found in input
; SI: v_mad_f32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, -v{{[0-9]+}}

It seems that there are a handful of new test failures too.

Any suggestions on how to proceed?

Should we not expect the explicit FSUB(-0,X) to fold under denormal flushing modes? (Too big a hammer, but correct)

Or maybe fold the FSUB(-0,X) into the instruction in the backend where possible? (Might cause some slightly wrong answers, unless we're careful)

In D73978#1959879, @cameron.mcinally wrote:
Thanks, Matt. It looks like preventing the FSUB->FNEG transform is still causing trouble with folding the negate into instructions. E.g.
<scrubbed>/clang/llvm-project/llvm/test/CodeGen/AMDGPU/v_mac_f16.ll:125:7: error: SI: expected string not found in input
; SI: v_mad_f32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, -v{{[0-9]+}}
It seems that there are a handful of new test failures too.

Any suggestions on how to proceed?

Should we not expect the explicit FSUB(-0,X) to fold under denormal flushing modes? (Too big a hammer, but correct)

That's what I would expect. Additional context is needed to know the flush will be performed elsewhere

Or maybe fold the FSUB(-0,X) into the instruction in the backend where possible? (Might cause some slightly wrong answers, unless we're careful)

I don't think we need to fold this in the target, we should be able to fold based on another instruction we know will flush. In the sample you gave there, the f16 operation was promoted to f32 and the conversion should also flush

In D73978#1960092, @arsenm wrote:

I don't think we need to fold this in the target, we should be able to fold based on another instruction we know will flush. In the sample you gave there, the f16 operation was promoted to f32 and the conversion should also flush

Good point. I suppose we'd need a switch to check if the user's opcode is a flushing operation.

That's kind of ugly though. Anyone know of a better way to do it?

In D73978#1960683, @cameron.mcinally wrote:

Good point. I suppose we'd need a switch to check if the user's opcode is a flushing operation.

That's kind of ugly though. Anyone know of a better way to do it?

Also if the input is flushing

ychen added a subscriber: ychen. · Apr 3 2020, 6:52 PM

Sorry for the long wait time. I'm still working on this. The AMDGPU tests are proving hard to clean up. Update hopefully coming soon...

Made some more progress on sorting out the AMDGPU backend, but I'm running up against walls: some are optimization opportunities that will need further work; some are newly exposed bugs in existing code; and some stem from my lack of experience with the AMDGPU instruction set. I added FIXME comments with some details about the cases I'm not familiar with. @arsenm Any comments on these changes?

The general intent of this patch is to check if the FSUB(+-0, X)->FNEG(X) transform is safe while in a DAZ/FTZ mode. This is done by checking if all uses of a FSUB(+-0, X) will flush denormals. If so, the transform is safe to do. Most cases are caught okay, but some are trickier. I couldn't solve them all.
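
The "all uses will flush" check described above can be sketched on a toy graph. Only the name `willCanonicalize` comes from the patch; the `Node`, `Opcode`, and `DenormMode` types, the opcode classification, and `canFoldFSubToFNeg` are simplified, hypothetical stand-ins and not the SelectionDAG API.

```cpp
#include <vector>

// Hypothetical, heavily simplified stand-ins for SelectionDAG concepts.
enum class Opcode { FAdd, FMul, FSub, FNeg, ExtractElt, Store };
enum class DenormMode { IEEE, FlushToZero };

struct Node {
  Opcode Op;
  std::vector<const Node *> Users;
};

// Returns true if N is assumed to flush denormal inputs, either directly or
// because it only forwards its value to users that all do.
inline bool willCanonicalize(const Node &N) {
  switch (N.Op) {
  case Opcode::FAdd:
  case Opcode::FMul:
  case Opcode::FSub:
    return true; // Arithmetic: assumed to flush in this model.
  case Opcode::FNeg:
  case Opcode::ExtractElt:
    // Bitwise/pass-through ops: look through to their own users.
    for (const Node *U : N.Users)
      if (!willCanonicalize(*U))
        return false;
    return true;
  case Opcode::Store:
    return false; // The raw bits escape to memory unflushed.
  }
  return false;
}

// FSUB(+-0, X) -> FNEG(X) is allowed in IEEE mode, or when every user of the
// FSUB will flush the value anyway.
inline bool canFoldFSubToFNeg(const Node &FSub, DenormMode Mode) {
  if (Mode == DenormMode::IEEE)
    return true;
  for (const Node *U : FSub.Users)
    if (!willCanonicalize(*U))
      return false;
  return true;
}
```

In this model an FSUB whose only user is an FADD may be folded under flushing, while one that is also stored to memory may not. A per-opcode switch like this is the part the thread calls ugly, and it is only as good as its opcode list.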

(Digressing: I think we need a TableGen flag for instructions that could flush denormals.)

arsenm added inline comments. · Jun 16 2020, 10:15 AM

llvm/include/llvm/CodeGen/TargetLowering.h
464	AMDGPU basically already has this, but it requires a depth argument similar to computeKnownBits.
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
834	We're already doing this, but I'm made somewhat uncomfortable by how constant folding is done. We don't insert a canonicalize when constant folding, so if you check isCanonicalized(x), but x is constant folded away into something that should have flushed, this won't be quite right. I guess the way it's defined, this only matters when folding canonicalize inputs?
846	Weird to have FMAXNUM but not FMINNUM. I also think we have a defective implementation for subtargets where the instructions don't read the FP mode. We inspect the inputs of the generic node rather than introducing a target specific wrapper with the broken behavior
llvm/test/CodeGen/AMDGPU/fneg-combines.ll
11	Correct, most of these are for source modifier folding purposes only

cameron.mcinally marked an inline comment as done. · Jun 16 2020, 12:23 PM

cameron.mcinally added inline comments.

llvm/include/llvm/CodeGen/TargetLowering.h
464	I did see isCanonicalized(...), but it looks like it goes the other direction. I.e. isCanonicalized(...) checks to see if the predecessor is already canonicalized. willCanonicalize(...) checks to see if the successors will canonicalize the result of the operation. There are a number of existing tests that begin with an FSUB(-0, X), so that's why I chose this solution. I think we'd eventually want both directions, for completeness. But I noticed that isCanonicalized(...) only exists in SIISelLowering, and I didn't want to mess around with something I didn't fully understand.
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
846	Agreed. This switch is only opcodes that existed in current testing, so there are some gaps. I should probably add FMINNUM under a separate patch. That said, I could introduce a test case pre-commit and then fix it in this patch. That's probably the right way to go forward. This switch is also likely incorrect at the edges (e.g. FMED3, FMA). I don't fully understand all the intricacies of AMDGPU flushing [as seen in isCanonicalized(...)]. There's more work needed here.

Remove FMAXNUM and a couple other opcodes from the willCanonicalize(...) switch. They are not currently tested, but rather leftover junk from building out this code. I was mistaken.

Ping. @arsenm

I know there are some problems with the current implementation, but I think it's a good first step. Landing the DAGCombiner changes is probably worth the edge-case precision bugs, so that other backends don't regress. In particular, the current DAGCombiner::visitFSub(...) code is vulnerable now. Thoughts on any of this?

The thing I'm somewhat worried about is a subtlety with constant folding. Constant folding will blindly fold unaware of whatever canonicalization needed to happen. willCanonicalize may have lied if something happened later that caused the canonicalizing operation to constant fold away
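
A small software model of this hazard, under the assumption that the folder computes with plain IEEE arithmetic while the target would have flushed its inputs at run time. Both function names are hypothetical.

```cpp
#include <cmath>

// Replace a subnormal by a zero of the same sign.
inline float flushDenormal(float X) {
  return std::fpclassify(X) == FP_SUBNORMAL ? std::copysign(0.0f, X) : X;
}

// What the target would compute at run time in a mode that flushes inputs.
inline float runtimeFAddFlushing(float A, float B) {
  return flushDenormal(A) + flushDenormal(B);
}

// A fold that evaluates the same FADD with plain IEEE arithmetic, unaware
// that the operation was being relied upon to flush.
inline float constantFoldFAdd(float A, float B) {
  return A + B;
}
```

If a negated subnormal reaches such an FADD, the run-time model yields zero but the folded value keeps the subnormal. A willCanonicalize-style answer of "this user will flush" therefore stops being true once the user is folded away.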

In D73978#2151964, @arsenm wrote:

The thing I'm somewhat worried about is a subtlety with constant folding. Constant folding will blindly fold unaware of whatever canonicalization needed to happen. willCanonicalize may have lied if something happened later that caused the canonicalizing operation to constant fold away

Ah, good point. I remember you saying that before, but I didn't absorb it at the time.

That's a sticky problem. We could wait until the MachineInstr level to do the FSUB->FNEG transform, to ensure that constant folding completed. But I suspect (pretty certain) that we'll have missed other FNEG peeps we'd want by then. So that won't work.

In general, it would be good to go for functional correctness first, and then try to optimize. That's kind of a problem for this specific project though, since so many existing tests would need to be updated. I'm not sure what to do. Will need to think about it...

In D73978#2153300, @cameron.mcinally wrote:

In general, it would be good to go for functional correctness first, and then try to optimize. That's kind of a problem for this specific project though, since so many existing tests would need to be updated. I'm not sure what to do. Will need to think about it...

If we modeled everything correctly, the correct thing to do would be to insert canonicalizes whenever occurs (but that's a massive change). I'm not a huge fan of getNode doing constant folding, so maybe eliminating that at least would help?

cameron.mcinally mentioned this in D84056: [FPEnv] Don't transform FSUB(-0, X) -> FNEG(X) in SelectionDAGBuilder. · Jul 17 2020, 10:47 AM

Abandoning this Diff since most of it was covered in D84056. Will prepare a new patch to remove the problematic FSUB DAGCombine soon.

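Revision Contents

Path	Size
llvm/include/llvm/CodeGen/TargetLowering.h	5 lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	26 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h	2 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	14 lines
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h	1 line
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp	45 lines
llvm/lib/Target/AMDGPU/SIISelLowering.cpp	19 lines
llvm/test/CodeGen/AMDGPU/fdiv32-to-rcp-folding.ll	14 lines
llvm/test/CodeGen/AMDGPU/fmuladd.f16.ll	12 lines
llvm/test/CodeGen/AMDGPU/fmuladd.f32.ll	12 lines
llvm/test/CodeGen/AMDGPU/fneg-combines.ll	274 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fmed3.ll	2 lines
llvm/test/CodeGen/AMDGPU/selectcc-opt.ll	6 lines
llvm/test/CodeGen/AMDGPU/set-dx10.ll	15 lines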

Diff 271192

llvm/include/llvm/CodeGen/TargetLowering.h

Context not available.
     return true;
   }

+  /// Return true if denormals will be flushed to zero.
+  virtual bool willCanonicalize(SelectionDAG &DAG, SDNode *N) const {
+    return false;
+  }
+
   /// Return true if SQRT(X) shouldn't be replaced with X*RSQRT(X).
   virtual bool isFsqrtCheap(SDValue X, SelectionDAG &DAG) const {
     // Default behavior is to replace SQRT(X) with X*RSQRT(X).
Context not available.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Context not available.
   }

   // (fsub -0.0, N1) -> -N1
-  // NOTE: It is safe to transform an FSUB(-0.0,X) into an FNEG(X), since the
-  // FSUB does not specify the sign bit of a NaN. Also note that for
-  // the same reason, the inverse transform is not safe, unless fast math
-  // flags are in play.
   if (N0CFP && N0CFP->isZero()) {
     if (N0CFP->isNegative() ||
         (Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros())) {
-      if (SDValue NegN1 =
-              TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize))
-        return NegN1;
-      if (!LegalOperations || TLI.isOperationLegal(ISD::FNEG, VT))
-        return DAG.getNode(ISD::FNEG, DL, VT, N1, Flags);
+      // We cannot replace an FSUB(+-0.0,X) with FNEG(X) when denormals are
+      // flushed to zero, unless all users treat denorms as zero (DAZ).
+      DenormalMode DenormMode = DAG.getDenormalMode(VT);
+
+      // Check that all uses will flush denorms to zero.
+      bool Flushed = true;
+      for (auto UI = N->use_begin(), E = N->use_end(); UI != E; ++UI)
+        if (!TLI.willCanonicalize(DAG, *UI))
+          Flushed = false;
+
+      if (Flushed || (DenormMode == DenormalMode::getIEEE())) {
+        if (SDValue NegN1 =
+                TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize))
+          return NegN1;
+        if (!LegalOperations || TLI.isOperationLegal(ISD::FNEG, VT))
+          return DAG.getNode(ISD::FNEG, DL, VT, N1, Flags);
+      }
     }
   }

Context not available.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

Context not available.
   void visitAdd(const User &I) { visitBinary(I, ISD::ADD); }
   void visitFAdd(const User &I) { visitBinary(I, ISD::FADD); }
   void visitSub(const User &I) { visitBinary(I, ISD::SUB); }
-  void visitFSub(const User &I);
+  void visitFSub(const User &I) { visitBinary(I, ISD::FSUB); }
   void visitMul(const User &I) { visitBinary(I, ISD::MUL); }
   void visitFMul(const User &I) { visitBinary(I, ISD::FMUL); }
   void visitURem(const User &I) { visitBinary(I, ISD::UREM); }
Context not available.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Context not available.
   DAG.setRoot(DAG.getNode(ISD::TRAP, getCurSDLoc(), MVT::Other, DAG.getRoot()));
 }

-void SelectionDAGBuilder::visitFSub(const User &I) {
-  // -0.0 - X --> fneg
-  Type *Ty = I.getType();
-  if (isa<Constant>(I.getOperand(0)) &&
-      I.getOperand(0) == ConstantFP::getZeroValueForNegation(Ty)) {
-    SDValue Op2 = getValue(I.getOperand(1));
-    setValue(&I, DAG.getNode(ISD::FNEG, getCurSDLoc(),
-                             Op2.getValueType(), Op2));
-    return;
-  }
-
-  visitBinary(I, ISD::FSUB);
-}
-
 void SelectionDAGBuilder::visitUnary(const User &I, unsigned Opcode) {
   SDNodeFlags Flags;

Context not available.

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h

Context not available.
                                NegatibleCost &Cost,
                                unsigned Depth) const override;

+  bool willCanonicalize(SelectionDAG &DAG, SDNode *N) const override;
   bool isNarrowingProfitable(EVT VT1, EVT VT2) const override;

   EVT getTypeForExtReturn(LLVMContext &Context, EVT VT,
Context not available.

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

Context not available.
	ForCodeSize, Cost, Depth);	ForCodeSize, Cost, Depth);
	}	}

		// Return true if the Opcode will treat denormals as zero (DAZ).
		bool AMDGPUTargetLowering::willCanonicalize(SelectionDAG &DAG, SDNode *N) const {
		// FIXME: This is not a complete list. This only represents current
		// testing.
		switch (N->getOpcode()) {
		default: return false;
		case ISD::FCANONICALIZE:
		case ISD::FADD:
		case ISD::FSUB:
		case ISD::FMUL:
		case ISD::FMA:
		case ISD::FMAD:
		case ISD::FP_EXTEND:
		arsenm: Weird to have FMAXNUM but not FMINNUM. I also think we have a defective implementation for subtargets where the instructions don't read the FP mode. We inspect the inputs of the generic node rather than introducing a target specific wrapper with the broken behavior.
		cameron.mcinally (author): Agreed. This switch is only opcodes that existed in current testing, so there are some gaps. I should probably add FMINNUM under a separate patch. That said, I could introduce a test case pre-commit and then fix it in this patch. That's probably the right way to go forward. This switch is also likely incorrect at the edges (e.g. FMED3, FMA). I don't fully understand all the intricacies of AMDGPU flushing [as seen in isCanonicalized(...)]. There's more work needed here.
		case ISD::FP_ROUND:
		case ISD::FP_TO_SINT:
		case ISD::FP_TO_UINT:
		case ISD::FTRUNC:
		case ISD::FSQRT:
		case AMDGPUISD::CLAMP:
		case AMDGPUISD::FMAD_FTZ:
		case AMDGPUISD::FMED3:
		case AMDGPUISD::RCP:
		return true;
		case ISD::FNEG:
		case ISD::EXTRACT_VECTOR_ELT:
		case ISD::EXTRACT_SUBVECTOR: {
		for (auto UI = N->use_begin(), E = N->use_end(); UI != E; ++UI)
		if (!willCanonicalize(DAG, *UI))
		return false;
		return true;
		}
		case ISD::INTRINSIC_WO_CHAIN: {
		unsigned IntrinsicID
		= cast<ConstantSDNode>(N->getOperand(0))->getZExtValue();
		switch (IntrinsicID) {
		case Intrinsic::amdgcn_fdiv_fast:
		return true;
		}
		return false;
		}
		}

		llvm_unreachable("invalid operation");
		}

	//===---------------------------------------------------------------------===//	//===---------------------------------------------------------------------===//
	// Target Properties	// Target Properties
	//===---------------------------------------------------------------------===//	//===---------------------------------------------------------------------===//
Context not available.
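
The recursive case in willCanonicalize (FNEG and the extract opcodes defer to their users) can be sketched on a toy graph. This is only an illustration under assumed types: Op, Node, and the free function below are hypothetical stand-ins, not the SelectionDAG API, and the opcode list is deliberately tiny.

```cpp
#include <vector>

// Hypothetical stand-ins for the opcodes relevant to this sketch.
enum class Op { FAdd, FSub, FMul, FNeg, ExtractElt, Store };

// Toy node: an opcode plus the nodes that consume its result.
struct Node {
  Op Opcode;
  std::vector<const Node *> Users;
};

// Returns true if the node's own operation is assumed to flush denormals,
// or, for bitwise pass-through operations, if every user does.
bool willCanonicalize(const Node &N) {
  switch (N.Opcode) {
  case Op::FAdd:
  case Op::FSub:
  case Op::FMul:
    return true;
  case Op::FNeg:
  case Op::ExtractElt:
    for (const Node *U : N.Users)
      if (!willCanonicalize(*U))
        return false;
    return true; // also reached when the node has no users at all
  case Op::Store:
    return false;
  }
  return false;
}
```

One detail worth noting: as in the patch's loop over uses, a pass-through node with no users is reported as canonicalizing, because the loop body never runs. Whether that is the intended answer for a dead FNEG is an open question this sketch does not settle.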

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

Context not available.
	switch (Opcode) {	switch (Opcode) {
	// These will flush denorms if required.	// These will flush denorms if required.
	case ISD::FADD:	case ISD::FADD:
	case ISD::FSUB:
	case ISD::FMUL:	case ISD::FMUL:
	case ISD::FCEIL:	case ISD::FCEIL:
	case ISD::FFLOOR:	case ISD::FFLOOR:
Context not available.
	case AMDGPUISD::CVT_F32_UBYTE2:	case AMDGPUISD::CVT_F32_UBYTE2:
	case AMDGPUISD::CVT_F32_UBYTE3:	case AMDGPUISD::CVT_F32_UBYTE3:
	return true;	return true;
		case ISD::FSUB: {
		SDValue N0 = Op.getOperand(0);
		ConstantFPSDNode *N0CFP = isConstOrConstSplatFP(N0, true);
		const TargetOptions &Options = DAG.getTarget().Options;
		const SDNodeFlags Flags = Op->getFlags();

		// FIXME: This works around a bug with FCANONICALIZE. Legalize
		// will remove the FCANONICALIZE before the FSUB(-0,X)->FNEG(X)
		// transform is considered.
		// FSUB(+-0.0, X) will become FNEG(X)
		if (N0CFP && N0CFP->isZero()) {
		if (N0CFP->isNegative() ||
		(Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros())) {
		return false;
		}
		}

		return true;
		}
	// It can/will be lowered or combined as a bit operation.	// It can/will be lowered or combined as a bit operation.
	// Need to check their input recursively to handle.	// Need to check their input recursively to handle.
	case ISD::FNEG:	case ISD::FNEG:
Context not available.
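
The new FSUB case above reduces to a small predicate: an FSUB is treated as not canonicalized when its left operand is a constant zero that allows the later FSUB-to-FNEG fold, that is, -0.0 always, or +0.0 when signed zeros can be ignored. A minimal sketch of that decision follows; the function name and the pointer-for-optional-constant convention are assumptions made for illustration, not LLVM API.

```cpp
#include <cmath>

// ConstLHS points at the constant left operand of the FSUB, or is null when
// the left operand is not a constant. NoSignedZeros models the combination
// of the global option and the per-node flag checked in the patch.
bool fsubIsCanonicalized(const float *ConstLHS, bool NoSignedZeros) {
  if (ConstLHS && *ConstLHS == 0.0f) { // matches both +0.0 and -0.0
    if (std::signbit(*ConstLHS) || NoSignedZeros)
      return false; // expected to be rewritten into a bitwise FNEG
  }
  return true; // a real subtraction, assumed to flush if required
}
```

With these assumptions, only three inputs report false: a -0.0 constant with or without no-signed-zeros, and a +0.0 constant with no-signed-zeros set.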

llvm/test/CodeGen/AMDGPU/fdiv32-to-rcp-folding.ll

Context not available.

	; GCN-LABEL: {{^}}div_v4_c_by_x_25ulp:	; GCN-LABEL: {{^}}div_v4_c_by_x_25ulp:
	; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, 2.0{{$}}	; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, 2.0{{$}}
	; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, 2.0{{$}}
	; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}	; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}
		; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, 2.0{{$}}
	; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}	; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}
	; GCN-DENORM-DAG: v_rcp_f32_e32	; GCN-DENORM-DAG: v_rcp_f32_e32
	; GCN-DENORM-DAG: v_rcp_f32_e32	; GCN-DENORM-DAG: v_rcp_f32_e32
Context not available.
	}	}

	; GCN-LABEL: {{^}}div_v4_c_by_minus_x_25ulp:	; GCN-LABEL: {{^}}div_v4_c_by_minus_x_25ulp:
	; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}	; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, 2.0{{$}}
	; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}	; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, 2.0{{$}}
	; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}	; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, 2.0{{$}}
	; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}	; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, 2.0{{$}}
	; GCN-DENORM-DAG: v_rcp_f32_e32	; GCN-DENORM-DAG: v_rcp_f32_e32
	; GCN-DENORM-DAG: v_rcp_f32_e32	; GCN-DENORM-DAG: v_rcp_f32_e32

Context not available.

	; GCN-DENORM-DAG: v_div_fmas_f32	; GCN-DENORM-DAG: v_div_fmas_f32
	; GCN-DENORM-DAG: v_div_fmas_f32	; GCN-DENORM-DAG: v_div_fmas_f32
	; GCN-DENORM-DAG: v_div_fixup_f32 {{.*}}, -2.0{{$}}	; GCN-DENORM-DAG: v_div_fixup_f32 {{.*}}, 2.0{{$}}
	; GCN-DENORM-DAG: v_div_fixup_f32 {{.*}}, -2.0{{$}}	; GCN-DENORM-DAG: v_div_fixup_f32 {{.*}}, 2.0{{$}}

	; GCN-FLUSH-DAG: v_rcp_f32_e32	; GCN-FLUSH-DAG: v_rcp_f32_e32
	; GCN-FLUSH-DAG: v_rcp_f32_e64	; GCN-FLUSH-DAG: v_rcp_f32_e64
Context not available.

llvm/test/CodeGen/AMDGPU/fmuladd.f16.ll

Context not available.
	ret void	ret void
	}	}

		; FIXME: The MAD only folds the FSUB(-0,X) when the FNEG(X) transform
		; happens in SelectionDAGBuilder. DAGCombiner probably needs to
		; be updated to fold the FNEG after visitFSUB(...) runs.

	; GCN-LABEL: {{^}}fmuladd_neg_2.0_neg_a_b_f16	; GCN-LABEL: {{^}}fmuladd_neg_2.0_neg_a_b_f16
	; GCN: {{buffer|flat|global}}_load_ushort [[R1:v[0-9]+]],	; GCN: {{buffer|flat|global}}_load_ushort [[R1:v[0-9]+]],
	; GCN: {{buffer|flat|global}}_load_ushort [[R2:v[0-9]+]],	; GCN: {{buffer|flat|global}}_load_ushort [[R2:v[0-9]+]],
Context not available.
	%r1 = load volatile half, half addrspace(1)* %gep.0	%r1 = load volatile half, half addrspace(1)* %gep.0
	%r2 = load volatile half, half addrspace(1)* %gep.1	%r2 = load volatile half, half addrspace(1)* %gep.1

	%r1.fneg = fsub half -0.000000e+00, %r1	%r1.fneg = fneg half %r1

	%r3 = tail call half @llvm.fmuladd.f16(half -2.0, half %r1.fneg, half %r2)	%r3 = tail call half @llvm.fmuladd.f16(half -2.0, half %r1.fneg, half %r2)
	store half %r3, half addrspace(1)* %gep.out	store half %r3, half addrspace(1)* %gep.out
Context not available.
	%r1 = load volatile half, half addrspace(1)* %gep.0	%r1 = load volatile half, half addrspace(1)* %gep.0
	%r2 = load volatile half, half addrspace(1)* %gep.1	%r2 = load volatile half, half addrspace(1)* %gep.1

	%r1.fneg = fsub half -0.000000e+00, %r1	%r1.fneg = fneg half %r1

	%r3 = tail call half @llvm.fmuladd.f16(half 2.0, half %r1.fneg, half %r2)	%r3 = tail call half @llvm.fmuladd.f16(half 2.0, half %r1.fneg, half %r2)
	store half %r3, half addrspace(1)* %gep.out	store half %r3, half addrspace(1)* %gep.out
Context not available.
	; GFX10-DENORM-CONTRACT: v_fmac_f16_e32 [[REGC]], [[REGA]], [[REGB]]	; GFX10-DENORM-CONTRACT: v_fmac_f16_e32 [[REGC]], [[REGA]], [[REGB]]

	; GCN-DENORM-STRICT: v_mul_f16_e32 [[TMP:v[0-9]+]], [[REGA]], [[REGB]]	; GCN-DENORM-STRICT: v_mul_f16_e32 [[TMP:v[0-9]+]], [[REGA]], [[REGB]]
	; GCN-DENORM-STRICT: v_add_f16_e32 [[RESULT:v[0-9]+]], [[REGC]], [[TMP]]	; GCN-DENORM-STRICT: v_add_f16_e32 [[RESULT:v[0-9]+]], [[TMP]], [[REGC]]
	; VI-DENORM: flat_store_short v{{\[[0-9]+:[0-9]+\]}}, [[RESULT]]	; VI-DENORM: flat_store_short v{{\[[0-9]+:[0-9]+\]}}, [[RESULT]]

	; GFX10-FLUSH: v_mul_f16_e32 [[TMP:v[0-9]+]], [[REGA]], [[REGB]]	; GFX10-FLUSH: v_mul_f16_e32 [[TMP:v[0-9]+]], [[REGA]], [[REGB]]
	; GFX10-FLUSH: v_add_f16_e32 [[RESULT:v[0-9]+]], [[REGC]], [[TMP]]	; GFX10-FLUSH: v_add_f16_e32 [[RESULT:v[0-9]+]], [[TMP]], [[REGC]]
	; GFX10-FLUSH: global_store_short v{{\[[0-9]+:[0-9]+\]}}, [[RESULT]]	; GFX10-FLUSH: global_store_short v{{\[[0-9]+:[0-9]+\]}}, [[RESULT]]
	; GFX10-DENORM-STRICT: global_store_short v{{\[[0-9]+:[0-9]+\]}}, [[RESULT]]	; GFX10-DENORM-STRICT: global_store_short v{{\[[0-9]+:[0-9]+\]}}, [[RESULT]]
	; GFX10-DENORM-CONTRACT: global_store_short v{{\[[0-9]+:[0-9]+\]}}, [[REGC]]	; GFX10-DENORM-CONTRACT: global_store_short v{{\[[0-9]+:[0-9]+\]}}, [[REGC]]
Context not available.

llvm/test/CodeGen/AMDGPU/fmuladd.f32.ll

Context not available.
	ret void	ret void
	}	}

		; FIXME: The MAD only folds the FSUB(-0,X) when the FNEG(X) transform
		; happens in SelectionDAGBuilder. DAGCombiner probably needs to
		; be updated to fold the FNEG after visitFSUB(...) runs.

	; XXX	; XXX
	; GCN-LABEL: {{^}}fmuladd_neg_2.0_neg_a_b_f32	; GCN-LABEL: {{^}}fmuladd_neg_2.0_neg_a_b_f32
	; GCN: {{buffer|flat|global}}_load_dword [[R1:v[0-9]+]],	; GCN: {{buffer|flat|global}}_load_dword [[R1:v[0-9]+]],
Context not available.
	%r1 = load volatile float, float addrspace(1)* %gep.0	%r1 = load volatile float, float addrspace(1)* %gep.0
	%r2 = load volatile float, float addrspace(1)* %gep.1	%r2 = load volatile float, float addrspace(1)* %gep.1

	%r1.fneg = fsub float -0.000000e+00, %r1	%r1.fneg = fneg float %r1

	%r3 = tail call float @llvm.fmuladd.f32(float -2.0, float %r1.fneg, float %r2)	%r3 = tail call float @llvm.fmuladd.f32(float -2.0, float %r1.fneg, float %r2)
	store float %r3, float addrspace(1)* %gep.out	store float %r3, float addrspace(1)* %gep.out
Context not available.
	%r1 = load volatile float, float addrspace(1)* %gep.0	%r1 = load volatile float, float addrspace(1)* %gep.0
	%r2 = load volatile float, float addrspace(1)* %gep.1	%r2 = load volatile float, float addrspace(1)* %gep.1

	%r1.fneg = fsub float -0.000000e+00, %r1	%r1.fneg = fneg float %r1

	%r3 = tail call float @llvm.fmuladd.f32(float 2.0, float %r1.fneg, float %r2)	%r3 = tail call float @llvm.fmuladd.f32(float 2.0, float %r1.fneg, float %r2)
	store float %r3, float addrspace(1)* %gep.out	store float %r3, float addrspace(1)* %gep.out
Context not available.
	; GCN-DENORM-FASTFMA-CONTRACT: v_fma_f32 [[RESULT:v[0-9]+]], [[REGA]], [[REGB]], [[REGC]]	; GCN-DENORM-FASTFMA-CONTRACT: v_fma_f32 [[RESULT:v[0-9]+]], [[REGA]], [[REGB]], [[REGC]]

	; GCN-DENORM-SLOWFMA-CONTRACT: v_mul_f32_e32 [[TMP:v[0-9]+]], [[REGA]], [[REGB]]	; GCN-DENORM-SLOWFMA-CONTRACT: v_mul_f32_e32 [[TMP:v[0-9]+]], [[REGA]], [[REGB]]
	; GCN-DENORM-SLOWFMA-CONTRACT: v_add_f32_e32 [[RESULT:v[0-9]+]], [[REGC]], [[TMP]]	; GCN-DENORM-SLOWFMA-CONTRACT: v_add_f32_e32 [[RESULT:v[0-9]+]], [[TMP]], [[REGC]]

	; GCN-DENORM-STRICT: v_mul_f32_e32 [[TMP:v[0-9]+]], [[REGA]], [[REGB]]	; GCN-DENORM-STRICT: v_mul_f32_e32 [[TMP:v[0-9]+]], [[REGA]], [[REGB]]
	; GCN-DENORM-STRICT: v_add_f32_e32 [[RESULT:v[0-9]+]], [[REGC]], [[TMP]]	; GCN-DENORM-STRICT: v_add_f32_e32 [[RESULT:v[0-9]+]], [[TMP]], [[REGC]]

	; SI-DENORM: buffer_store_dword [[RESULT]]	; SI-DENORM: buffer_store_dword [[RESULT]]
	; VI-DENORM: {{global|flat}}_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RESULT]]	; VI-DENORM: {{global|flat}}_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RESULT]]
Context not available.

llvm/test/CodeGen/AMDGPU/fneg-combines.ll

Context not available.
	; fadd tests	; fadd tests
	; --------------------------------------------------------------------------------	; --------------------------------------------------------------------------------

		; FIXME: I think we want to test FNEG(X) folding here. The FSUB(-0,X) case is
		; uninteresting. Unless these tests should be split into
		; GCN-FLUSH/GCN-DENORM checks.
		arsenm: Correct, most of these are for source modifier folding purposes only.

	; GCN-LABEL: {{^}}v_fneg_add_f32:	; GCN-LABEL: {{^}}v_fneg_add_f32:
	; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]	; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]
	; GCN: {{buffer|flat}}_load_dword [[B:v[0-9]+]]	; GCN: {{buffer|flat}}_load_dword [[B:v[0-9]+]]
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%add = fadd float %a, %b	%add = fadd float %a, %b
	%fneg = fsub float -0.000000e+00, %add	%fneg = fneg float %add
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%add = fadd float %a, %b	%add = fadd float %a, %b
	%fneg = fsub float -0.000000e+00, %add	%fneg = fneg float %add
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %add, float addrspace(1)* %out	store volatile float %add, float addrspace(1)* %out
	ret void	ret void
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%add = fadd float %a, %b	%add = fadd float %a, %b
	%fneg = fsub float -0.000000e+00, %add	%fneg = fneg float %add
	%use1 = fmul float %add, 4.0	%use1 = fmul float %add, 4.0
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %use1, float addrspace(1)* %out	store volatile float %use1, float addrspace(1)* %out
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%add = fadd float %fneg.a, %b	%add = fadd float %fneg.a, %b
	%fneg = fsub float -0.000000e+00, %add	%fneg = fneg float %add
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%fneg.b = fsub float -0.000000e+00, %b	%fneg.b = fneg float %b
	%add = fadd float %a, %fneg.b	%add = fadd float %a, %fneg.b
	%fneg = fsub float -0.000000e+00, %add	%fneg = fneg float %add
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%fneg.b = fsub float -0.000000e+00, %b	%fneg.b = fneg float %b
	%add = fadd float %fneg.a, %fneg.b	%add = fadd float %fneg.a, %fneg.b
	%fneg = fsub float -0.000000e+00, %add	%fneg = fneg float %add
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%add = fadd float %fneg.a, %b	%add = fadd float %fneg.a, %b
	%fneg = fsub float -0.000000e+00, %add	%fneg = fneg float %add
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %fneg.a, float addrspace(1)* %out	store volatile float %fneg.a, float addrspace(1)* %out
	ret void	ret void
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%add = fadd float %fneg.a, %b	%add = fadd float %fneg.a, %b
	%fneg = fsub float -0.000000e+00, %add	%fneg = fneg float %add
	%use1 = fmul float %fneg.a, %c	%use1 = fmul float %fneg.a, %c
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %use1, float addrspace(1)* %out	store volatile float %use1, float addrspace(1)* %out
Context not available.
	%tmp9 = fmul reassoc nnan arcp contract float 0.000000e+00, %tmp8	%tmp9 = fmul reassoc nnan arcp contract float 0.000000e+00, %tmp8
	%.i188 = fadd float %tmp9, 0.000000e+00	%.i188 = fadd float %tmp9, 0.000000e+00
	%tmp10 = fcmp uge float %.i188, %tmp2	%tmp10 = fcmp uge float %.i188, %tmp2
	%tmp11 = fsub float -0.000000e+00, %.i188	%tmp11 = fneg float %.i188
	%.i092 = select i1 %tmp10, float %tmp2, float %tmp11	%.i092 = select i1 %tmp10, float %tmp2, float %tmp11
	%tmp12 = fcmp ule float %.i092, 0.000000e+00	%tmp12 = fcmp ule float %.i092, 0.000000e+00
	%.i198 = select i1 %tmp12, float 0.000000e+00, float 0x7FF8000000000000	%.i198 = select i1 %tmp12, float 0.000000e+00, float 0x7FF8000000000000
Context not available.
	%tmp9 = fmul reassoc nnan arcp contract float 0.000000e+00, %tmp8	%tmp9 = fmul reassoc nnan arcp contract float 0.000000e+00, %tmp8
	%.i188 = fadd float %tmp9, 0.000000e+00	%.i188 = fadd float %tmp9, 0.000000e+00
	%tmp10 = fcmp uge float %.i188, %tmp2	%tmp10 = fcmp uge float %.i188, %tmp2
	%tmp11 = fsub float -0.000000e+00, %.i188	%tmp11 = fneg float %.i188
	%.i092 = select i1 %tmp10, float %tmp2, float %tmp11	%.i092 = select i1 %tmp10, float %tmp2, float %tmp11
	%tmp12 = fcmp ule float %.i092, 0.000000e+00	%tmp12 = fcmp ule float %.i092, 0.000000e+00
	%.i198 = select i1 %tmp12, float 0.000000e+00, float 0x7FF8000000000000	%.i198 = select i1 %tmp12, float 0.000000e+00, float 0x7FF8000000000000
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%mul = fmul float %a, %b	%mul = fmul float %a, %b
	%fneg = fsub float -0.000000e+00, %mul	%fneg = fneg float %mul
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%mul = fmul float %a, %b	%mul = fmul float %a, %b
	%fneg = fsub float -0.000000e+00, %mul	%fneg = fneg float %mul
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %mul, float addrspace(1)* %out	store volatile float %mul, float addrspace(1)* %out
	ret void	ret void
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%mul = fmul float %a, %b	%mul = fmul float %a, %b
	%fneg = fsub float -0.000000e+00, %mul	%fneg = fneg float %mul
	%use1 = fmul float %mul, 4.0	%use1 = fmul float %mul, 4.0
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %use1, float addrspace(1)* %out	store volatile float %use1, float addrspace(1)* %out
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%mul = fmul float %fneg.a, %b	%mul = fmul float %fneg.a, %b
	%fneg = fsub float -0.000000e+00, %mul	%fneg = fneg float %mul
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%fneg.b = fsub float -0.000000e+00, %b	%fneg.b = fneg float %b
	%mul = fmul float %a, %fneg.b	%mul = fmul float %a, %fneg.b
	%fneg = fsub float -0.000000e+00, %mul	%fneg = fneg float %mul
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%fneg.b = fsub float -0.000000e+00, %b	%fneg.b = fneg float %b
	%mul = fmul float %fneg.a, %fneg.b	%mul = fmul float %fneg.a, %fneg.b
	%fneg = fsub float -0.000000e+00, %mul	%fneg = fneg float %mul
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%mul = fmul float %fneg.a, %b	%mul = fmul float %fneg.a, %b
	%fneg = fsub float -0.000000e+00, %mul	%fneg = fneg float %mul
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %fneg.a, float addrspace(1)* %out	store volatile float %fneg.a, float addrspace(1)* %out
	ret void	ret void
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%mul = fmul float %fneg.a, %b	%mul = fmul float %fneg.a, %b
	%fneg = fsub float -0.000000e+00, %mul	%fneg = fneg float %mul
	%use1 = fmul float %fneg.a, %c	%use1 = fmul float %fneg.a, %c
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %use1, float addrspace(1)* %out	store volatile float %use1, float addrspace(1)* %out
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%min = call float @llvm.minnum.f32(float %a, float %b)	%min = call float @llvm.minnum.f32(float %a, float %b)
	%fneg = fsub float -0.000000e+00, %min	%fneg = fneg float %min
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	; GCN-NEXT: ; return	; GCN-NEXT: ; return
	define amdgpu_ps float @v_fneg_minnum_f32_no_ieee(float %a, float %b) #0 {	define amdgpu_ps float @v_fneg_minnum_f32_no_ieee(float %a, float %b) #0 {
	%min = call float @llvm.minnum.f32(float %a, float %b)	%min = call float @llvm.minnum.f32(float %a, float %b)
	%fneg = fsub float -0.000000e+00, %min	%fneg = fneg float %min
	ret float %fneg	ret float %fneg
	}	}

Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%min = call float @llvm.minnum.f32(float %a, float %a)	%min = call float @llvm.minnum.f32(float %a, float %a)
	%min.fneg = fsub float -0.0, %min	%min.fneg = fneg float %min
	store float %min.fneg, float addrspace(1)* %out.gep	store float %min.fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	; GCN-NEXT: ; return	; GCN-NEXT: ; return
	define amdgpu_ps float @v_fneg_self_minnum_f32_no_ieee(float %a) #0 {	define amdgpu_ps float @v_fneg_self_minnum_f32_no_ieee(float %a) #0 {
	%min = call float @llvm.minnum.f32(float %a, float %a)	%min = call float @llvm.minnum.f32(float %a, float %a)
	%min.fneg = fsub float -0.0, %min	%min.fneg = fneg float %min
	ret float %min.fneg	ret float %min.fneg
	}	}

Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%min = call float @llvm.minnum.f32(float 4.0, float %a)	%min = call float @llvm.minnum.f32(float 4.0, float %a)
	%fneg = fsub float -0.000000e+00, %min	%fneg = fneg float %min
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	; GCN-NEXT: ; return	; GCN-NEXT: ; return
	define amdgpu_ps float @v_fneg_posk_minnum_f32_no_ieee(float %a) #0 {	define amdgpu_ps float @v_fneg_posk_minnum_f32_no_ieee(float %a) #0 {
	%min = call float @llvm.minnum.f32(float 4.0, float %a)	%min = call float @llvm.minnum.f32(float 4.0, float %a)
	%fneg = fsub float -0.000000e+00, %min	%fneg = fneg float %min
	ret float %fneg	ret float %fneg
	}	}

Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%min = call float @llvm.minnum.f32(float -4.0, float %a)	%min = call float @llvm.minnum.f32(float -4.0, float %a)
	%fneg = fsub float -0.000000e+00, %min	%fneg = fneg float %min
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	; GCN-NEXT: ; return	; GCN-NEXT: ; return
	define amdgpu_ps float @v_fneg_negk_minnum_f32_no_ieee(float %a) #0 {	define amdgpu_ps float @v_fneg_negk_minnum_f32_no_ieee(float %a) #0 {
	%min = call float @llvm.minnum.f32(float -4.0, float %a)	%min = call float @llvm.minnum.f32(float -4.0, float %a)
	%fneg = fsub float -0.000000e+00, %min	%fneg = fneg float %min
	ret float %fneg	ret float %fneg
	}	}

Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%min = call float @llvm.minnum.f32(float 0.0, float %a)	%min = call float @llvm.minnum.f32(float 0.0, float %a)
	%fneg = fsub float -0.000000e+00, %min	%fneg = fneg float %min
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%min = call float @llvm.minnum.f32(float -0.0, float %a)	%min = call float @llvm.minnum.f32(float -0.0, float %a)
	%fneg = fsub float -0.000000e+00, %min	%fneg = fneg float %min
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%min = call float @llvm.minnum.f32(float 0x3FC45F3060000000, float %a)	%min = call float @llvm.minnum.f32(float 0x3FC45F3060000000, float %a)
	%fneg = fsub float -0.000000e+00, %min	%fneg = fneg float %min
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%min = call float @llvm.minnum.f32(float 0xBFC45F3060000000, float %a)	%min = call float @llvm.minnum.f32(float 0xBFC45F3060000000, float %a)
	%fneg = fsub float -0.000000e+00, %min	%fneg = fneg float %min
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	; GCN-NEXT: ; return	; GCN-NEXT: ; return
	define amdgpu_ps float @v_fneg_neg0_minnum_f32_no_ieee(float %a) #0 {	define amdgpu_ps float @v_fneg_neg0_minnum_f32_no_ieee(float %a) #0 {
	%min = call float @llvm.minnum.f32(float -0.0, float %a)	%min = call float @llvm.minnum.f32(float -0.0, float %a)
	%fneg = fsub float -0.000000e+00, %min	%fneg = fneg float %min
	ret float %fneg	ret float %fneg
	}	}

Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%min = call float @llvm.minnum.f32(float 0.0, float %a)	%min = call float @llvm.minnum.f32(float 0.0, float %a)
	%fneg = fsub float -0.000000e+00, %min	%fneg = fneg float %min
	%mul = fmul float %fneg, %b	%mul = fmul float %fneg, %b
	store float %mul, float addrspace(1)* %out.gep	store float %mul, float addrspace(1)* %out.gep
	ret void	ret void
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%min = call float @llvm.minnum.f32(float 0x3FC45F3060000000, float %a)	%min = call float @llvm.minnum.f32(float 0x3FC45F3060000000, float %a)
	%fneg = fsub float -0.000000e+00, %min	%fneg = fneg float %min
	%mul = fmul float %fneg, %b	%mul = fmul float %fneg, %b
	store float %mul, float addrspace(1)* %out.gep	store float %mul, float addrspace(1)* %out.gep
	ret void	ret void
Context not available.
	; GCN-NEXT: ; return	; GCN-NEXT: ; return
	define amdgpu_ps float @v_fneg_0_minnum_foldable_use_f32_no_ieee(float %a, float %b) #0 {	define amdgpu_ps float @v_fneg_0_minnum_foldable_use_f32_no_ieee(float %a, float %b) #0 {
	%min = call float @llvm.minnum.f32(float 0.0, float %a)	%min = call float @llvm.minnum.f32(float 0.0, float %a)
	%fneg = fsub float -0.000000e+00, %min	%fneg = fneg float %min
	%mul = fmul float %fneg, %b	%mul = fmul float %fneg, %b
	ret float %mul	ret float %mul
	}	}
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%min = call float @llvm.minnum.f32(float %a, float %b)	%min = call float @llvm.minnum.f32(float %a, float %b)
	%fneg = fsub float -0.000000e+00, %min	%fneg = fneg float %min
	%use1 = fmul float %min, 4.0	%use1 = fmul float %min, 4.0
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %use1, float addrspace(1)* %out	store volatile float %use1, float addrspace(1)* %out
Context not available.
	; GCN-NEXT: ; return	; GCN-NEXT: ; return
	define amdgpu_ps <2 x float> @v_fneg_minnum_multi_use_minnum_f32_no_ieee(float %a, float %b) #0 {	define amdgpu_ps <2 x float> @v_fneg_minnum_multi_use_minnum_f32_no_ieee(float %a, float %b) #0 {
	%min = call float @llvm.minnum.f32(float %a, float %b)	%min = call float @llvm.minnum.f32(float %a, float %b)
	%fneg = fsub float -0.000000e+00, %min	%fneg = fneg float %min
	%use1 = fmul float %min, 4.0	%use1 = fmul float %min, 4.0
	%ins0 = insertelement <2 x float> undef, float %fneg, i32 0	%ins0 = insertelement <2 x float> undef, float %fneg, i32 0
	%ins1 = insertelement <2 x float> %ins0, float %use1, i32 1	%ins1 = insertelement <2 x float> %ins0, float %use1, i32 1
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%max = call float @llvm.maxnum.f32(float %a, float %b)	%max = call float @llvm.maxnum.f32(float %a, float %b)
	%fneg = fsub float -0.000000e+00, %max	%fneg = fneg float %max
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	; GCN-NEXT: ; return	; GCN-NEXT: ; return
	define amdgpu_ps float @v_fneg_maxnum_f32_no_ieee(float %a, float %b) #0 {	define amdgpu_ps float @v_fneg_maxnum_f32_no_ieee(float %a, float %b) #0 {
	%max = call float @llvm.maxnum.f32(float %a, float %b)	%max = call float @llvm.maxnum.f32(float %a, float %b)
	%fneg = fsub float -0.000000e+00, %max	%fneg = fneg float %max
	ret float %fneg	ret float %fneg
	}	}

Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%max = call float @llvm.maxnum.f32(float %a, float %a)	%max = call float @llvm.maxnum.f32(float %a, float %a)
	%max.fneg = fsub float -0.0, %max	%max.fneg = fneg float %max
	store float %max.fneg, float addrspace(1)* %out.gep	store float %max.fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	; GCN-NEXT: ; return	; GCN-NEXT: ; return
	define amdgpu_ps float @v_fneg_self_maxnum_f32_no_ieee(float %a) #0 {	define amdgpu_ps float @v_fneg_self_maxnum_f32_no_ieee(float %a) #0 {
	%max = call float @llvm.maxnum.f32(float %a, float %a)	%max = call float @llvm.maxnum.f32(float %a, float %a)
	%max.fneg = fsub float -0.0, %max	%max.fneg = fneg float %max
	ret float %max.fneg	ret float %max.fneg
	}	}

Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%max = call float @llvm.maxnum.f32(float 4.0, float %a)	%max = call float @llvm.maxnum.f32(float 4.0, float %a)
	%fneg = fsub float -0.000000e+00, %max	%fneg = fneg float %max
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	; GCN-NEXT: ; return	; GCN-NEXT: ; return
	define amdgpu_ps float @v_fneg_posk_maxnum_f32_no_ieee(float %a) #0 {	define amdgpu_ps float @v_fneg_posk_maxnum_f32_no_ieee(float %a) #0 {
	%max = call float @llvm.maxnum.f32(float 4.0, float %a)	%max = call float @llvm.maxnum.f32(float 4.0, float %a)
	%fneg = fsub float -0.000000e+00, %max	%fneg = fneg float %max
	ret float %fneg	ret float %fneg
	}	}

Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%max = call float @llvm.maxnum.f32(float -4.0, float %a)	%max = call float @llvm.maxnum.f32(float -4.0, float %a)
	%fneg = fsub float -0.000000e+00, %max	%fneg = fneg float %max
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	; GCN-NEXT: ; return	; GCN-NEXT: ; return
	define amdgpu_ps float @v_fneg_negk_maxnum_f32_no_ieee(float %a) #0 {	define amdgpu_ps float @v_fneg_negk_maxnum_f32_no_ieee(float %a) #0 {
	%max = call float @llvm.maxnum.f32(float -4.0, float %a)	%max = call float @llvm.maxnum.f32(float -4.0, float %a)
	%fneg = fsub float -0.000000e+00, %max	%fneg = fneg float %max
	ret float %fneg	ret float %fneg
	}	}

Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%max = call float @llvm.maxnum.f32(float 0.0, float %a)	%max = call float @llvm.maxnum.f32(float 0.0, float %a)
	%fneg = fsub float -0.000000e+00, %max	%fneg = fneg float %max
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%max = call float @llvm.maxnum.f32(float -0.0, float %a)	%max = call float @llvm.maxnum.f32(float -0.0, float %a)
	%fneg = fsub float -0.000000e+00, %max	%fneg = fneg float %max
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	; GCN-NEXT: ; return	; GCN-NEXT: ; return
	define amdgpu_ps float @v_fneg_neg0_maxnum_f32_no_ieee(float %a) #0 {	define amdgpu_ps float @v_fneg_neg0_maxnum_f32_no_ieee(float %a) #0 {
	%max = call float @llvm.maxnum.f32(float -0.0, float %a)	%max = call float @llvm.maxnum.f32(float -0.0, float %a)
	%fneg = fsub float -0.000000e+00, %max	%fneg = fneg float %max
	ret float %fneg	ret float %fneg
	}	}

Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%max = call float @llvm.maxnum.f32(float 0.0, float %a)	%max = call float @llvm.maxnum.f32(float 0.0, float %a)
	%fneg = fsub float -0.000000e+00, %max	%fneg = fneg float %max
	%mul = fmul float %fneg, %b	%mul = fmul float %fneg, %b
	store float %mul, float addrspace(1)* %out.gep	store float %mul, float addrspace(1)* %out.gep
	ret void	ret void
Context not available.
	; GCN-NEXT: ; return	; GCN-NEXT: ; return
	define amdgpu_ps float @v_fneg_0_maxnum_foldable_use_f32_no_ieee(float %a, float %b) #0 {	define amdgpu_ps float @v_fneg_0_maxnum_foldable_use_f32_no_ieee(float %a, float %b) #0 {
	%max = call float @llvm.maxnum.f32(float 0.0, float %a)	%max = call float @llvm.maxnum.f32(float 0.0, float %a)
	%fneg = fsub float -0.000000e+00, %max	%fneg = fneg float %max
	%mul = fmul float %fneg, %b	%mul = fmul float %fneg, %b
	ret float %mul	ret float %mul
	}	}
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%max = call float @llvm.maxnum.f32(float %a, float %b)	%max = call float @llvm.maxnum.f32(float %a, float %b)
	%fneg = fsub float -0.000000e+00, %max	%fneg = fneg float %max
	%use1 = fmul float %max, 4.0	%use1 = fmul float %max, 4.0
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %use1, float addrspace(1)* %out	store volatile float %use1, float addrspace(1)* %out
Context not available.
	; GCN-NEXT: ; return	; GCN-NEXT: ; return
	define amdgpu_ps <2 x float> @v_fneg_maxnum_multi_use_maxnum_f32_no_ieee(float %a, float %b) #0 {	define amdgpu_ps <2 x float> @v_fneg_maxnum_multi_use_maxnum_f32_no_ieee(float %a, float %b) #0 {
	%max = call float @llvm.maxnum.f32(float %a, float %b)	%max = call float @llvm.maxnum.f32(float %a, float %b)
	%fneg = fsub float -0.000000e+00, %max	%fneg = fneg float %max
	%use1 = fmul float %max, 4.0	%use1 = fmul float %max, 4.0
	%ins0 = insertelement <2 x float> undef, float %fneg, i32 0	%ins0 = insertelement <2 x float> undef, float %fneg, i32 0
	%ins1 = insertelement <2 x float> %ins0, float %use1, i32 1	%ins1 = insertelement <2 x float> %ins0, float %use1, i32 1
Context not available.
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep
	%fma = call float @llvm.fma.f32(float %a, float %b, float %c)	%fma = call float @llvm.fma.f32(float %a, float %b, float %c)
	%fneg = fsub float -0.000000e+00, %fma	%fneg = fneg float %fma
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep
	%fma = call float @llvm.fma.f32(float %a, float %b, float %c)	%fma = call float @llvm.fma.f32(float %a, float %b, float %c)
	%fneg = fsub float -0.000000e+00, %fma	%fneg = fneg float %fma
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %fma, float addrspace(1)* %out	store volatile float %fma, float addrspace(1)* %out
	ret void	ret void
Context not available.
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep
	%fma = call float @llvm.fma.f32(float %a, float %b, float %c)	%fma = call float @llvm.fma.f32(float %a, float %b, float %c)
	%fneg = fsub float -0.000000e+00, %fma	%fneg = fneg float %fma
	%use1 = fmul float %fma, 4.0	%use1 = fmul float %fma, 4.0
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %use1, float addrspace(1)* %out	store volatile float %use1, float addrspace(1)* %out
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%fma = call float @llvm.fma.f32(float %fneg.a, float %b, float %c)	%fma = call float @llvm.fma.f32(float %fneg.a, float %b, float %c)
	%fneg = fsub float -0.000000e+00, %fma	%fneg = fneg float %fma
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	ret void	ret void
	}	}
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep
	%fneg.b = fsub float -0.000000e+00, %b	%fneg.b = fneg float %b
	%fma = call float @llvm.fma.f32(float %a, float %fneg.b, float %c)	%fma = call float @llvm.fma.f32(float %a, float %fneg.b, float %c)
	%fneg = fsub float -0.000000e+00, %fma	%fneg = fneg float %fma
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	ret void	ret void
	}	}
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%fneg.b = fsub float -0.000000e+00, %b	%fneg.b = fneg float %b
	%fma = call float @llvm.fma.f32(float %fneg.a, float %fneg.b, float %c)	%fma = call float @llvm.fma.f32(float %fneg.a, float %fneg.b, float %c)
	%fneg = fsub float -0.000000e+00, %fma	%fneg = fneg float %fma
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	ret void	ret void
	}	}
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%fneg.c = fsub float -0.000000e+00, %c	%fneg.c = fneg float %c
	%fma = call float @llvm.fma.f32(float %fneg.a, float %b, float %fneg.c)	%fma = call float @llvm.fma.f32(float %fneg.a, float %b, float %fneg.c)
	%fneg = fsub float -0.000000e+00, %fma	%fneg = fneg float %fma
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	ret void	ret void
	}	}
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep
	%fneg.c = fsub float -0.000000e+00, %c	%fneg.c = fneg float %c
	%fma = call float @llvm.fma.f32(float %a, float %b, float %fneg.c)	%fma = call float @llvm.fma.f32(float %a, float %b, float %fneg.c)
	%fneg = fsub float -0.000000e+00, %fma	%fneg = fneg float %fma
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	ret void	ret void
	}	}
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%fma = call float @llvm.fma.f32(float %fneg.a, float %b, float %c)	%fma = call float @llvm.fma.f32(float %fneg.a, float %b, float %c)
	%fneg = fsub float -0.000000e+00, %fma	%fneg = fneg float %fma
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %fneg.a, float addrspace(1)* %out	store volatile float %fneg.a, float addrspace(1)* %out
	ret void	ret void
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%fma = call float @llvm.fma.f32(float %fneg.a, float %b, float %c)	%fma = call float @llvm.fma.f32(float %fneg.a, float %b, float %c)
	%fneg = fsub float -0.000000e+00, %fma	%fneg = fneg float %fma
	%use1 = fmul float %fneg.a, %d	%use1 = fmul float %fneg.a, %d
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %use1, float addrspace(1)* %out	store volatile float %use1, float addrspace(1)* %out
Context not available.
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep
	%fma = call float @llvm.fmuladd.f32(float %a, float %b, float %c)	%fma = call float @llvm.fmuladd.f32(float %a, float %b, float %c)
	%fneg = fsub float -0.000000e+00, %fma	%fneg = fneg float %fma
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep
	%fma = call float @llvm.fmuladd.f32(float %a, float %b, float %c)	%fma = call float @llvm.fmuladd.f32(float %a, float %b, float %c)
	%fneg = fsub float -0.000000e+00, %fma	%fneg = fneg float %fma
	%use1 = fmul float %fma, 4.0	%use1 = fmul float %fma, 4.0
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %use1, float addrspace(1)* %out	store volatile float %use1, float addrspace(1)* %out
Context not available.
	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
	%out.gep = getelementptr inbounds double, double addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds double, double addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%fpext = fpext float %fneg.a to double	%fpext = fpext float %fneg.a to double
	%fneg = fsub double -0.000000e+00, %fpext	%fneg = fsub double -0.000000e+00, %fpext
	store double %fneg, double addrspace(1)* %out.gep	store double %fneg, double addrspace(1)* %out.gep
Context not available.
	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
	%out.gep = getelementptr inbounds double, double addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds double, double addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%fpext = fpext float %fneg.a to double	%fpext = fpext float %fneg.a to double
	%fneg = fsub double -0.000000e+00, %fpext	%fneg = fsub double -0.000000e+00, %fpext
	store volatile double %fneg, double addrspace(1)* %out.gep	store volatile double %fneg, double addrspace(1)* %out.gep
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile half, half addrspace(1)* %a.gep	%a = load volatile half, half addrspace(1)* %a.gep
	%fpext = fpext half %a to float	%fpext = fpext half %a to float
	%fneg = fsub float -0.000000e+00, %fpext	%fneg = fneg float %fpext
	store volatile float %fneg, float addrspace(1)* %out.gep	store volatile float %fneg, float addrspace(1)* %out.gep
	store volatile float %fpext, float addrspace(1)* %out.gep	store volatile float %fpext, float addrspace(1)* %out.gep
	ret void	ret void
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile half, half addrspace(1)* %a.gep	%a = load volatile half, half addrspace(1)* %a.gep
	%fpext = fpext half %a to float	%fpext = fpext half %a to float
	%fneg = fsub float -0.000000e+00, %fpext	%fneg = fneg float %fpext
	%mul = fmul float %fpext, 4.0	%mul = fmul float %fpext, 4.0
	store volatile float %fneg, float addrspace(1)* %out.gep	store volatile float %fneg, float addrspace(1)* %out.gep
	store volatile float %mul, float addrspace(1)* %out.gep	store volatile float %mul, float addrspace(1)* %out.gep
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile double, double addrspace(1)* %a.gep	%a = load volatile double, double addrspace(1)* %a.gep
	%fpround = fptrunc double %a to float	%fpround = fptrunc double %a to float
	%fneg = fsub float -0.000000e+00, %fpround	%fneg = fneg float %fpround
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%a = load volatile double, double addrspace(1)* %a.gep	%a = load volatile double, double addrspace(1)* %a.gep
	%fneg.a = fsub double -0.000000e+00, %a	%fneg.a = fsub double -0.000000e+00, %a
	%fpround = fptrunc double %fneg.a to float	%fpround = fptrunc double %fneg.a to float
	%fneg = fsub float -0.000000e+00, %fpround	%fneg = fneg float %fpround
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%a = load volatile double, double addrspace(1)* %a.gep	%a = load volatile double, double addrspace(1)* %a.gep
	%fneg.a = fsub double -0.000000e+00, %a	%fneg.a = fsub double -0.000000e+00, %a
	%fpround = fptrunc double %fneg.a to float	%fpround = fptrunc double %fneg.a to float
	%fneg = fsub float -0.000000e+00, %fpround	%fneg = fneg float %fpround
	store volatile float %fneg, float addrspace(1)* %out.gep	store volatile float %fneg, float addrspace(1)* %out.gep
	store volatile double %fneg.a, double addrspace(1)* undef	store volatile double %fneg.a, double addrspace(1)* undef
	ret void	ret void
Context not available.
	%a = load volatile double, double addrspace(1)* %a.gep	%a = load volatile double, double addrspace(1)* %a.gep
	%fneg.a = fsub double -0.000000e+00, %a	%fneg.a = fsub double -0.000000e+00, %a
	%fpround = fptrunc double %fneg.a to float	%fpround = fptrunc double %fneg.a to float
	%fneg = fsub float -0.000000e+00, %fpround	%fneg = fneg float %fpround
	%use1 = fmul double %fneg.a, %c	%use1 = fmul double %fneg.a, %c
	store volatile float %fneg, float addrspace(1)* %out.gep	store volatile float %fneg, float addrspace(1)* %out.gep
	store volatile double %use1, double addrspace(1)* undef	store volatile double %use1, double addrspace(1)* undef
Context not available.
	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
	%out.gep = getelementptr inbounds half, half addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds half, half addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%fpround = fptrunc float %fneg.a to half	%fpround = fptrunc float %fneg.a to half
	%fneg = fsub half -0.000000e+00, %fpround	%fneg = fsub half -0.000000e+00, %fpround
	store half %fneg, half addrspace(1)* %out.gep	store half %fneg, half addrspace(1)* %out.gep
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile double, double addrspace(1)* %a.gep	%a = load volatile double, double addrspace(1)* %a.gep
	%fpround = fptrunc double %a to float	%fpround = fptrunc double %a to float
	%fneg = fsub float -0.000000e+00, %fpround	%fneg = fneg float %fpround
	store volatile float %fneg, float addrspace(1)* %out.gep	store volatile float %fneg, float addrspace(1)* %out.gep
	store volatile float %fpround, float addrspace(1)* %out.gep	store volatile float %fpround, float addrspace(1)* %out.gep
	ret void	ret void
Context not available.
	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
	%out.gep = getelementptr inbounds half, half addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds half, half addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%fpround = fptrunc float %fneg.a to half	%fpround = fptrunc float %fneg.a to half
	%fneg = fsub half -0.000000e+00, %fpround	%fneg = fsub half -0.000000e+00, %fpround
	store volatile half %fneg, half addrspace(1)* %out.gep	store volatile half %fneg, half addrspace(1)* %out.gep
Context not available.
	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
	%out.gep = getelementptr inbounds half, half addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds half, half addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%fpround = fptrunc float %fneg.a to half	%fpround = fptrunc float %fneg.a to half
	%fneg = fsub half -0.000000e+00, %fpround	%fneg = fsub half -0.000000e+00, %fpround
	%use1 = fmul float %fneg.a, %c	%use1 = fmul float %fneg.a, %c
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%rcp = call float @llvm.amdgcn.rcp.f32(float %a)	%rcp = call float @llvm.amdgcn.rcp.f32(float %a)
	%fneg = fsub float -0.000000e+00, %rcp	%fneg = fneg float %rcp
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%rcp = call float @llvm.amdgcn.rcp.f32(float %fneg.a)	%rcp = call float @llvm.amdgcn.rcp.f32(float %fneg.a)
	%fneg = fsub float -0.000000e+00, %rcp	%fneg = fneg float %rcp
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%rcp = call float @llvm.amdgcn.rcp.f32(float %fneg.a)	%rcp = call float @llvm.amdgcn.rcp.f32(float %fneg.a)
	%fneg = fsub float -0.000000e+00, %rcp	%fneg = fneg float %rcp
	store volatile float %fneg, float addrspace(1)* %out.gep	store volatile float %fneg, float addrspace(1)* %out.gep
	store volatile float %fneg.a, float addrspace(1)* undef	store volatile float %fneg.a, float addrspace(1)* undef
	ret void	ret void
Context not available.
	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%rcp = call float @llvm.amdgcn.rcp.f32(float %fneg.a)	%rcp = call float @llvm.amdgcn.rcp.f32(float %fneg.a)
	%fneg = fsub float -0.000000e+00, %rcp	%fneg = fneg float %rcp
	%use1 = fmul float %fneg.a, %c	%use1 = fmul float %fneg.a, %c
	store volatile float %fneg, float addrspace(1)* %out.gep	store volatile float %fneg, float addrspace(1)* %out.gep
	store volatile float %use1, float addrspace(1)* undef	store volatile float %use1, float addrspace(1)* undef
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%mul = call float @llvm.amdgcn.fmul.legacy(float %a, float %b)	%mul = call float @llvm.amdgcn.fmul.legacy(float %a, float %b)
	%fneg = fsub float -0.000000e+00, %mul	%fneg = fneg float %mul
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%mul = call float @llvm.amdgcn.fmul.legacy(float %a, float %b)	%mul = call float @llvm.amdgcn.fmul.legacy(float %a, float %b)
	%fneg = fsub float -0.000000e+00, %mul	%fneg = fneg float %mul
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %mul, float addrspace(1)* %out	store volatile float %mul, float addrspace(1)* %out
	ret void	ret void
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%mul = call float @llvm.amdgcn.fmul.legacy(float %a, float %b)	%mul = call float @llvm.amdgcn.fmul.legacy(float %a, float %b)
	%fneg = fsub float -0.000000e+00, %mul	%fneg = fneg float %mul
	%use1 = call float @llvm.amdgcn.fmul.legacy(float %mul, float 4.0)	%use1 = call float @llvm.amdgcn.fmul.legacy(float %mul, float 4.0)
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %use1, float addrspace(1)* %out	store volatile float %use1, float addrspace(1)* %out
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%mul = call float @llvm.amdgcn.fmul.legacy(float %fneg.a, float %b)	%mul = call float @llvm.amdgcn.fmul.legacy(float %fneg.a, float %b)
	%fneg = fsub float -0.000000e+00, %mul	%fneg = fneg float %mul
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%fneg.b = fsub float -0.000000e+00, %b	%fneg.b = fneg float %b
	%mul = call float @llvm.amdgcn.fmul.legacy(float %a, float %fneg.b)	%mul = call float @llvm.amdgcn.fmul.legacy(float %a, float %fneg.b)
	%fneg = fsub float -0.000000e+00, %mul	%fneg = fneg float %mul
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%fneg.b = fsub float -0.000000e+00, %b	%fneg.b = fneg float %b
	%mul = call float @llvm.amdgcn.fmul.legacy(float %fneg.a, float %fneg.b)	%mul = call float @llvm.amdgcn.fmul.legacy(float %fneg.a, float %fneg.b)
	%fneg = fsub float -0.000000e+00, %mul	%fneg = fneg float %mul
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%mul = call float @llvm.amdgcn.fmul.legacy(float %fneg.a, float %b)	%mul = call float @llvm.amdgcn.fmul.legacy(float %fneg.a, float %b)
	%fneg = fsub float -0.000000e+00, %mul	%fneg = fneg float %mul
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %fneg.a, float addrspace(1)* %out	store volatile float %fneg.a, float addrspace(1)* %out
	ret void	ret void
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%fneg.a = fsub float -0.000000e+00, %a	%fneg.a = fneg float %a
	%mul = call float @llvm.amdgcn.fmul.legacy(float %fneg.a, float %b)	%mul = call float @llvm.amdgcn.fmul.legacy(float %fneg.a, float %b)
	%fneg = fsub float -0.000000e+00, %mul	%fneg = fneg float %mul
	%use1 = call float @llvm.amdgcn.fmul.legacy(float %fneg.a, float %c)	%use1 = call float @llvm.amdgcn.fmul.legacy(float %fneg.a, float %c)
	store volatile float %fneg, float addrspace(1)* %out	store volatile float %fneg, float addrspace(1)* %out
	store volatile float %use1, float addrspace(1)* %out	store volatile float %use1, float addrspace(1)* %out
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%sin = call float @llvm.sin.f32(float %a)	%sin = call float @llvm.sin.f32(float %a)
	%fneg = fsub float -0.000000e+00, %sin	%fneg = fneg float %sin
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%sin = call float @llvm.amdgcn.sin.f32(float %a)	%sin = call float @llvm.amdgcn.sin.f32(float %a)
	%fneg = fsub float -0.0, %sin	%fneg = fneg float %sin
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%trunc = call float @llvm.trunc.f32(float %a)	%trunc = call float @llvm.trunc.f32(float %a)
	%fneg = fsub float -0.0, %trunc	%fneg = fneg float %trunc
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%round = call float @llvm.round.f32(float %a)	%round = call float @llvm.round.f32(float %a)
	%fneg = fsub float -0.0, %round	%fneg = fneg float %round
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%rint = call float @llvm.rint.f32(float %a)	%rint = call float @llvm.rint.f32(float %a)
	%fneg = fsub float -0.0, %rint	%fneg = fneg float %rint
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%nearbyint = call float @llvm.nearbyint.f32(float %a)	%nearbyint = call float @llvm.nearbyint.f32(float %a)
	%fneg = fsub float -0.0, %nearbyint	%fneg = fneg float %nearbyint
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext	%out.gep = getelementptr inbounds float, float addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%trunc = call float @llvm.canonicalize.f32(float %a)	%trunc = call float @llvm.canonicalize.f32(float %a)
	%fneg = fsub float -0.0, %trunc	%fneg = fneg float %trunc
	store float %fneg, float addrspace(1)* %out.gep	store float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
	}	}
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%mul = fmul float %a, %b	%mul = fmul float %a, %b
	%fneg = fsub float -0.0, %mul	%fneg = fneg float %mul
	%intrp0 = call float @llvm.amdgcn.interp.p1(float %fneg, i32 0, i32 0, i32 0)	%intrp0 = call float @llvm.amdgcn.interp.p1(float %fneg, i32 0, i32 0, i32 0)
	%intrp1 = call float @llvm.amdgcn.interp.p1(float %fneg, i32 1, i32 0, i32 0)	%intrp1 = call float @llvm.amdgcn.interp.p1(float %fneg, i32 1, i32 0, i32 0)
	store volatile float %intrp0, float addrspace(1)* %out.gep	store volatile float %intrp0, float addrspace(1)* %out.gep
Context not available.
	%a = load volatile float, float addrspace(1)* %a.gep	%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%mul = fmul float %a, %b	%mul = fmul float %a, %b
	%fneg = fsub float -0.0, %mul	%fneg = fneg float %mul
	%intrp0 = call float @llvm.amdgcn.interp.p2(float 4.0, float %fneg, i32 0, i32 0, i32 0)	%intrp0 = call float @llvm.amdgcn.interp.p2(float 4.0, float %fneg, i32 0, i32 0, i32 0)
	%intrp1 = call float @llvm.amdgcn.interp.p2(float 4.0, float %fneg, i32 1, i32 0, i32 0)	%intrp1 = call float @llvm.amdgcn.interp.p2(float 4.0, float %fneg, i32 1, i32 0, i32 0)
	store volatile float %intrp0, float addrspace(1)* %out.gep	store volatile float %intrp0, float addrspace(1)* %out.gep
Context not available.
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep
	%mul = fmul float %a, %b	%mul = fmul float %a, %b
	%fneg = fsub float -0.0, %mul	%fneg = fneg float %mul
	%cmp0 = icmp eq i32 %d, 0	%cmp0 = icmp eq i32 %d, 0
	br i1 %cmp0, label %if, label %endif	br i1 %cmp0, label %if, label %endif

Context not available.
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep
	%mul = fmul float %a, %b	%mul = fmul float %a, %b
	%fneg = fsub float -0.0, %mul	%fneg = fneg float %mul
	call void asm sideeffect "; use $0", "v"(float %fneg) #0	call void asm sideeffect "; use $0", "v"(float %fneg) #0
	store volatile float %fneg, float addrspace(1)* %out.gep	store volatile float %fneg, float addrspace(1)* %out.gep
	ret void	ret void
Context not available.
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep
	%mul = fmul float %a, %b	%mul = fmul float %a, %b
	%fneg = fsub float -0.0, %mul	%fneg = fneg float %mul
	call void asm sideeffect "; use $0", "v"(float %fneg) #0	call void asm sideeffect "; use $0", "v"(float %fneg) #0
	store volatile float %mul, float addrspace(1)* %out.gep	store volatile float %mul, float addrspace(1)* %out.gep
	ret void	ret void
Context not available.
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep

	%fneg.a = fsub float -0.0, %a	%fneg.a = fneg float %a
	%fma0 = call float @llvm.fma.f32(float %fneg.a, float %b, float %c)	%fma0 = call float @llvm.fma.f32(float %fneg.a, float %b, float %c)
	%fma1 = call float @llvm.fma.f32(float %fneg.a, float %c, float 2.0)	%fma1 = call float @llvm.fma.f32(float %fneg.a, float %c, float 2.0)

Context not available.
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep

	%fneg.a = fsub float -0.0, %a	%fneg.a = fneg float %a
	%mul0 = fmul float %fneg.a, %b	%mul0 = fmul float %fneg.a, %b
	%mul1 = fmul float %fneg.a, %c	%mul1 = fmul float %fneg.a, %c

Context not available.
	%b = load volatile float, float addrspace(1)* %b.gep	%b = load volatile float, float addrspace(1)* %b.gep
	%c = load volatile float, float addrspace(1)* %c.gep	%c = load volatile float, float addrspace(1)* %c.gep

	%fneg.a = fsub float -0.0, %a	%fneg.a = fneg float %a
	%fma0 = call float @llvm.fma.f32(float %fneg.a, float %b, float 2.0)	%fma0 = call float @llvm.fma.f32(float %fneg.a, float %b, float 2.0)
	%mul1 = fmul float %fneg.a, %c	%mul1 = fmul float %fneg.a, %c

Context not available.
	%d = load volatile float, float addrspace(1)* %d.gep	%d = load volatile float, float addrspace(1)* %d.gep

	%fma0 = call float @llvm.fma.f32(float %a, float %b, float 2.0)	%fma0 = call float @llvm.fma.f32(float %a, float %b, float 2.0)
	%fneg.fma0 = fsub float -0.0, %fma0	%fneg.fma0 = fneg float %fma0
	%mul1 = fmul float %fneg.fma0, %c	%mul1 = fmul float %fneg.fma0, %c
	%mul2 = fmul float %fneg.fma0, %d	%mul2 = fmul float %fneg.fma0, %d

Context not available.
	%d = load volatile float, float addrspace(1)* %d.gep	%d = load volatile float, float addrspace(1)* %d.gep

	%trunc.a = call float @llvm.trunc.f32(float %a)	%trunc.a = call float @llvm.trunc.f32(float %a)
	%trunc.fneg.a = fsub float -0.0, %trunc.a	%trunc.fneg.a = fneg float %trunc.a
	%fma0 = call float @llvm.fma.f32(float %trunc.fneg.a, float %b, float %c)	%fma0 = call float @llvm.fma.f32(float %trunc.fneg.a, float %b, float %c)
	store volatile float %fma0, float addrspace(1)* %out	store volatile float %fma0, float addrspace(1)* %out
	ret void	ret void
Context not available.
	%d = load volatile float, float addrspace(1)* %d.gep	%d = load volatile float, float addrspace(1)* %d.gep

	%trunc.a = call float @llvm.trunc.f32(float %a)	%trunc.a = call float @llvm.trunc.f32(float %a)
	%trunc.fneg.a = fsub float -0.0, %trunc.a	%trunc.fneg.a = fneg float %trunc.a
	%fma0 = call float @llvm.fma.f32(float %trunc.fneg.a, float %b, float %c)	%fma0 = call float @llvm.fma.f32(float %trunc.fneg.a, float %b, float %c)
	%mul1 = fmul float %trunc.a, %d	%mul1 = fmul float %trunc.a, %d
	store volatile float %fma0, float addrspace(1)* %out	store volatile float %fma0, float addrspace(1)* %out
Context not available.

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fmed3.ll

Context not available.

	; GCN-LABEL: {{^}}test_fneg_fmed3_multi_use:	; GCN-LABEL: {{^}}test_fneg_fmed3_multi_use:
	; GCN: v_med3_f32 [[MED3:v[0-9]+]], -s{{[0-9]+}}, -v{{[0-9]+}}, -v{{[0-9]+}}	; GCN: v_med3_f32 [[MED3:v[0-9]+]], -s{{[0-9]+}}, -v{{[0-9]+}}, -v{{[0-9]+}}
	; GCN: v_mul_f32_e32 v{{[0-9]+}}, -4.0, [[MED3]]	; GCN: v_mul_f32_e64 v{{[0-9]+}}, -[[MED3]], 4.0
	define amdgpu_kernel void @test_fneg_fmed3_multi_use(float addrspace(1)* %out, float %src0, float %src1, float %src2) #1 {	define amdgpu_kernel void @test_fneg_fmed3_multi_use(float addrspace(1)* %out, float %src0, float %src1, float %src2) #1 {
	%med3 = call float @llvm.amdgcn.fmed3.f32(float %src0, float %src1, float %src2)	%med3 = call float @llvm.amdgcn.fmed3.f32(float %src0, float %src1, float %src2)
	%neg.med3 = fsub float -0.0, %med3	%neg.med3 = fsub float -0.0, %med3
Context not available.

llvm/test/CodeGen/AMDGPU/selectcc-opt.ll

Context not available.
	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global < %s \| FileCheck -check-prefix=SI -check-prefix=FUNC %s	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global < %s \| FileCheck -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=r600 -mcpu=redwood < %s \| FileCheck -check-prefix=EG -check-prefix=FUNC %s	; RUN: llc -march=r600 -mcpu=redwood < %s \| FileCheck -check-prefix=EG -check-prefix=FUNC %s

		; FIXME: Not sure what to do about these tests. The FSUB(-0.0,X) is being
		; folded into the select, before there's a chance to convert to FNEG(X).

	; FUNC-LABEL: {{^}}test_a:	; FUNC-LABEL: {{^}}test_a:
	; EG-NOT: CND	; EG-NOT: CND
Context not available.
	entry:	entry:
	%0 = fcmp olt float %in, 0.000000e+00	%0 = fcmp olt float %in, 0.000000e+00
	%1 = select i1 %0, float 1.000000e+00, float 0.000000e+00	%1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
	%2 = fsub float -0.000000e+00, %1	%2 = fneg float %1
	%3 = fptosi float %2 to i32	%3 = fptosi float %2 to i32
	%4 = bitcast i32 %3 to float	%4 = bitcast i32 %3 to float
	%5 = bitcast float %4 to i32	%5 = bitcast float %4 to i32
Context not available.
	entry:	entry:
	%0 = fcmp olt float %in, 0.0	%0 = fcmp olt float %in, 0.0
	%1 = select i1 %0, float 1.000000e+00, float 0.000000e+00	%1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
	%2 = fsub float -0.000000e+00, %1	%2 = fneg float %1
	%3 = fptosi float %2 to i32	%3 = fptosi float %2 to i32
	%4 = bitcast i32 %3 to float	%4 = bitcast i32 %3 to float
	%5 = bitcast float %4 to i32	%5 = bitcast float %4 to i32
Context not available.

llvm/test/CodeGen/AMDGPU/set-dx10.ll

Context not available.
	; to store integer true (-1) and false (0) values are lowered to one of the	; to store integer true (-1) and false (0) values are lowered to one of the
	; SET*DX10 instructions.	; SET*DX10 instructions.

		; FIXME: Not sure what to do about these tests. The FSUB(-0.0,X) is being
		; folded into the select, before there's a chance to convert to FNEG(X).

	; CHECK: {{^}}fcmp_une_select_fptosi:	; CHECK: {{^}}fcmp_une_select_fptosi:
	; CHECK: LSHR	; CHECK: LSHR
	; CHECK-NEXT: SETNE_DX10 * {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].Z, literal.y,	; CHECK-NEXT: SETNE_DX10 * {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].Z, literal.y,
Context not available.
	entry:	entry:
	%0 = fcmp une float %in, 5.0	%0 = fcmp une float %in, 5.0
	%1 = select i1 %0, float 1.000000e+00, float 0.000000e+00	%1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
	%2 = fsub float -0.000000e+00, %1	%2 = fneg float %1
	%3 = fptosi float %2 to i32	%3 = fptosi float %2 to i32
	store i32 %3, i32 addrspace(1)* %out	store i32 %3, i32 addrspace(1)* %out
	ret void	ret void
Context not available.
	entry:	entry:
	%0 = fcmp oeq float %in, 5.0	%0 = fcmp oeq float %in, 5.0
	%1 = select i1 %0, float 1.000000e+00, float 0.000000e+00	%1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
	%2 = fsub float -0.000000e+00, %1	%2 = fneg float %1
	%3 = fptosi float %2 to i32	%3 = fptosi float %2 to i32
	store i32 %3, i32 addrspace(1)* %out	store i32 %3, i32 addrspace(1)* %out
	ret void	ret void
Context not available.
	entry:	entry:
	%0 = fcmp ogt float %in, 5.0	%0 = fcmp ogt float %in, 5.0
	%1 = select i1 %0, float 1.000000e+00, float 0.000000e+00	%1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
	%2 = fsub float -0.000000e+00, %1	%2 = fneg float %1
	%3 = fptosi float %2 to i32	%3 = fptosi float %2 to i32
	store i32 %3, i32 addrspace(1)* %out	store i32 %3, i32 addrspace(1)* %out
	ret void	ret void
Context not available.
	entry:	entry:
	%0 = fcmp oge float %in, 5.0	%0 = fcmp oge float %in, 5.0
	%1 = select i1 %0, float 1.000000e+00, float 0.000000e+00	%1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
	%2 = fsub float -0.000000e+00, %1	%2 = fneg float %1
	%3 = fptosi float %2 to i32	%3 = fptosi float %2 to i32
	store i32 %3, i32 addrspace(1)* %out	store i32 %3, i32 addrspace(1)* %out
	ret void	ret void
Context not available.
	entry:	entry:
	%0 = fcmp ole float %in, 5.0	%0 = fcmp ole float %in, 5.0
	%1 = select i1 %0, float 1.000000e+00, float 0.000000e+00	%1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
	%2 = fsub float -0.000000e+00, %1	%2 = fneg float %1
	%3 = fptosi float %2 to i32	%3 = fptosi float %2 to i32
	store i32 %3, i32 addrspace(1)* %out	store i32 %3, i32 addrspace(1)* %out
	ret void	ret void
Context not available.
	entry:	entry:
	%0 = fcmp olt float %in, 5.0	%0 = fcmp olt float %in, 5.0
	%1 = select i1 %0, float 1.000000e+00, float 0.000000e+00	%1 = select i1 %0, float 1.000000e+00, float 0.000000e+00
	%2 = fsub float -0.000000e+00, %1	%2 = fneg float %1
	%3 = fptosi float %2 to i32	%3 = fptosi float %2 to i32
	store i32 %3, i32 addrspace(1)* %out	store i32 %3, i32 addrspace(1)* %out
	ret void	ret void
Context not available.
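
For reference, the difference these test updates depend on can be sketched as a small software model. This is not code from the patch. The helper names (`flushDenormal`, `fsubNegZeroFlushing`, `fnegBitwise`) are hypothetical, and modelling an input flush as "replace a denormal with a zero of the same sign" is an assumption about the flushing mode, not a statement about any particular target.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: model a denormal input flush by replacing a
// denormal with a zero of the same sign (an assumption for illustration).
static float flushDenormal(float X) {
  return std::fpclassify(X) == FP_SUBNORMAL ? std::copysign(0.0f, X) : X;
}

// Model of FSUB(-0.0, X) in a mode that flushes denormal inputs:
// an arithmetic operation, so the input is flushed before subtracting.
static float fsubNegZeroFlushing(float X) {
  return -0.0f - flushDenormal(X);
}

// Model of FNEG(X): a bitwise operation that only flips the sign bit.
static float fnegBitwise(float X) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  Bits ^= 0x80000000u;
  std::memcpy(&X, &Bits, sizeof(X));
  return X;
}

// Walks through one normal and one denormal input.
static void demonstrate() {
  const float Denormal = std::numeric_limits<float>::denorm_min();
  // Normal input: both forms agree.
  assert(fsubNegZeroFlushing(1.0f) == fnegBitwise(1.0f));
  // Denormal input: the arithmetic form yields a zero, while the
  // bitwise form yields the negated denormal.
  assert(fsubNegZeroFlushing(Denormal) == 0.0f);
  assert(std::fpclassify(fnegBitwise(Denormal)) == FP_SUBNORMAL);
}
```

In this model the two forms agree for normal inputs and differ only for a denormal `X`, which is the case the summary describes.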


Revision Contents

Diff 271192
