Building on the recent clarification of FP in IR and identical to current undef operand folding, fold any FP binop with a NaN operand to NaN.
I don't know what to do with the AMDGPU tests that expected to use a NaN FP constant as an int value.
Differential D44521
[InstSimplify] fp_binop X, NaN --> NaN spatel on Mar 15 2018, 8:41 AM. Authored by
Details Building on the recent clarification of FP in IR and identical to current undef operand folding, fold any FP binop with a NaN operand to NaN. I don't know what to do with the AMDGPU tests that expected to use a NaN FP constant as an int value.
Diff Detail
Event Timeline
Comment Actions Patch updated: This patch doesn't deal with cases where both operands are NaN. That's handled by constant folding. Evidence of that behavior is provided by the fneg tests in the test file (nothing changing here). The AMDGPU tests now show the expected constant values (but I'm still not sure exactly what was intended in those tests). Comment Actions That wasn't clear: the fneg tests show evidence that a binop with 2 constant operands is already folded before we reach here. I'm not sure yet if we have test coverage for instructions with 2 NaN operands. Comment Actions For the AMDGPU imm tests, you could please change the affected ones to operate on i32/i16 instead, so that the result uses v_add_u32? Thanks! I don't expect this approach to work for the double cases, I think you can just leave those as-is. Comment Actions Do you mean bitcast back and forth? Example: define amdgpu_kernel void @add_inline_imm_neg_1_f32(float addrspace(1)* %out, float %x) { %xbc = bitcast float %x to i32 %y = add i32 %xbc, -1 %ybc = bitcast i32 %y to float store float %ybc, float addrspace(1)* %out ret void } $ llc test/CodeGen/AMDGPU/imm.ll -o - -amdgpu-scalarize-global-loads=false -march=amdgcn -mcpu=tonga -mattr=-flat-for-global ... s_load_dwordx2 s[4:5], s[0:1], 0x24 s_load_dword s0, s[0:1], 0x2c s_mov_b32 s7, 0xf000 s_mov_b32 s6, -1 s_waitcnt lgkmcnt(0) v_add_f32_e64 v0, s0, -1.0 buffer_store_dword v0, off, s[4:7], 0 s_endpgm Comment Actions Yes, except I don't see how end up with -1.0 as inline constant in the v_add. Here's what I get, which looks more correct: s_load_dwordx2 s[4:5], s[0:1], 0x24 s_load_dword s0, s[0:1], 0x2c s_mov_b32 s7, 0xf000 s_mov_b32 s6, -1 s_waitcnt lgkmcnt(0) s_add_i32 s0, s0, -1 v_mov_b32_e32 v0, s0 buffer_store_dword v0, off, s[4:7], 0 s_endpgm |