This is an archive of the discontinued LLVM Phabricator instance.

[InstSimplify] fp_binop X, NaN --> NaN
ClosedPublic

Authored by spatel on Mar 15 2018, 8:41 AM.

Details

Summary

Building on the recent clarification of FP semantics in IR, and matching the current folding of undef operands, fold any FP binop with a NaN operand to NaN.

I don't know what to do with the AMDGPU tests that expected to use a NaN FP constant as an int value.

Diff Detail

Event Timeline

spatel created this revision. Mar 15 2018, 8:41 AM
mike.dvoretsky added inline comments.
lib/Analysis/InstructionSimplify.cpp
4168

Why are we emitting NaNs from an undef operand here?

4171

According to IEEE 754, we should preserve the payload bits of the NaN operand here (either one if both are NaN). This code creates a NaN with all payload bits unset instead.

spatel added a subscriber: scanon. Mar 16 2018, 9:33 AM
spatel added inline comments.
lib/Analysis/InstructionSimplify.cpp
4168
4171

As I understand it, preserving the payload is not required (cc @scanon), but we can do that if preferred. And maybe that will preserve the AMDGPU test behavior.

scanon added inline comments. Mar 16 2018, 9:37 AM
lib/Analysis/InstructionSimplify.cpp
4171

Preserving the payload is not required. It is *recommended* that the payload of the result be the payload of one of the operands, and if we can do that without additional complexity, let's do it. There is no defined rule for choosing which one, but ideally it should be commutative and associative.

Off the top of my head, when both operands are NaN, actual hardware implementations either return the signaling NaN's payload if one is signaling, return the "first" operand's payload ("first" in some machine-code assembly operand order), or return a default NaN (ARM with the DN bit set in FPCR, for example). Other behaviors are possible.

spatel updated this revision to Diff 138765. Mar 16 2018, 2:10 PM

Patch updated:
Propagate the existing NaN constant. We don't try to quiet a signaling NaN. The only case where we would not return the existing constant is a vector with undef elements (in that case just return a default NaN).

This patch doesn't deal with cases where both operands are NaN. That's handled by constant folding. Evidence of that behavior is provided by the fneg tests in the test file (nothing changing here).

The AMDGPU tests now show the expected constant values (but I'm still not sure exactly what was intended in those tests).

> This patch doesn't deal with cases where both operands are NaN. That's handled by constant folding. Evidence of that behavior is provided by the fneg tests in the test file (nothing changing here).

That wasn't clear: the fneg tests show evidence that a binop with 2 constant operands is already folded before we reach here. I'm not sure yet if we have test coverage for instructions with 2 NaN operands.

For the AMDGPU imm tests, could you please change the affected ones to operate on i32/i16 instead, so that the result uses v_add_u32? Thanks!

I don't expect this approach to work for the double cases, I think you can just leave those as-is.

> For the AMDGPU imm tests, could you please change the affected ones to operate on i32/i16 instead, so that the result uses v_add_u32? Thanks!
>
> I don't expect this approach to work for the double cases, I think you can just leave those as-is.

Do you mean bitcast back and forth? Example:

define amdgpu_kernel void @add_inline_imm_neg_1_f32(float addrspace(1)* %out, float %x) {
  %xbc = bitcast float %x to i32
  %y = add i32 %xbc, -1
  %ybc = bitcast i32 %y to float
  store float %ybc, float addrspace(1)* %out
  ret void
}
$ llc test/CodeGen/AMDGPU/imm.ll -o - -amdgpu-scalarize-global-loads=false  -march=amdgcn -mcpu=tonga -mattr=-flat-for-global
...
	s_load_dwordx2 s[4:5], s[0:1], 0x24
	s_load_dword s0, s[0:1], 0x2c
	s_mov_b32 s7, 0xf000
	s_mov_b32 s6, -1
	s_waitcnt lgkmcnt(0)
	v_add_f32_e64 v0, s0, -1.0
	buffer_store_dword v0, off, s[4:7], 0
	s_endpgm

> Do you mean bitcast back and forth? Example:

Yes, except I don't see how you end up with -1.0 as an inline constant in the v_add. Here's what I get, which looks more correct:

s_load_dwordx2 s[4:5], s[0:1], 0x24
s_load_dword s0, s[0:1], 0x2c
s_mov_b32 s7, 0xf000
s_mov_b32 s6, -1
s_waitcnt lgkmcnt(0)
s_add_i32 s0, s0, -1
v_mov_b32_e32 v0, s0
buffer_store_dword v0, off, s[4:7], 0
s_endpgm

> Do you mean bitcast back and forth? Example:
>
> Yes, except I don't see how you end up with -1.0 as an inline constant in the v_add. Here's what I get, which looks more correct:
>
> s_load_dwordx2 s[4:5], s[0:1], 0x24
> s_load_dword s0, s[0:1], 0x2c
> s_mov_b32 s7, 0xf000
> s_mov_b32 s6, -1
> s_waitcnt lgkmcnt(0)
> s_add_i32 s0, s0, -1
> v_mov_b32_e32 v0, s0
> buffer_store_dword v0, off, s[4:7], 0
> s_endpgm

Yes - I must've copy-pasted the wrong chunk here. I updated all of the AMDGPU tests other than the f64 ones to be independent of this patch:
rL327890
rL327891

spatel updated this revision to Diff 138985. Mar 19 2018, 12:43 PM

Patch updated:
Removed AMDGPU tests that were made NaN-free.

Thank you for making this change. The rest looks good to me, too.

This revision was not accepted when it landed; it landed in state Needs Review. Mar 21 2018, 12:34 PM
This revision was automatically updated to reflect the committed changes.