Building on the recent clarification of FP semantics in IR, and matching the current undef-operand folding, fold any FP binop with a NaN operand to NaN.
I don't know what to do with the AMDGPU tests that expected to use a NaN FP constant as an int value.
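As an illustrative aside (plain Python, not part of the patch), the IEEE-754 behavior this fold models at compile time — any FP binop with a NaN operand produces NaN — can be checked directly:

```python
import math

nan = float("nan")

# Each basic FP binop with a NaN operand yields NaN under IEEE-754;
# InstSimplify's fold applies this rule without evaluating the op.
binops = [
    lambda a, b: a + b,   # fadd
    lambda a, b: a - b,   # fsub
    lambda a, b: a * b,   # fmul
    lambda a, b: a / b,   # fdiv
    math.fmod,            # frem (C fmod semantics)
]

for op in binops:
    assert math.isnan(op(nan, 42.0))
    assert math.isnan(op(42.0, nan))
```

Note this is only the default-FP-environment story; the review's open question about NaN payloads and undef operands is separate.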
Differential D44521
[InstSimplify] fp_binop X, NaN --> NaN
Closed, Public. Authored by spatel on Mar 15 2018, 8:41 AM.
spatel mentioned this in D44550: [InstCombine] canonicalize fcmp+select to fabs. (Mar 16 2018)
Patch updated: this patch doesn't deal with cases where both operands are NaN; that is handled by constant folding. The fneg tests in the test file (unchanged here) provide evidence of that behavior. The AMDGPU tests now show the expected constant values, but I'm still not sure exactly what was intended in those tests.
That wasn't clear: the fneg tests show that a binop with two constant operands is already folded before we reach this code. I'm not sure yet whether we have test coverage for instructions with two NaN operands.

For the AMDGPU imm tests, could you please change the affected ones to operate on i32/i16 instead, so that the result uses v_add_u32? Thanks! I don't expect this approach to work for the double cases; I think you can just leave those as-is.
Do you mean bitcast back and forth? Example:

    define amdgpu_kernel void @add_inline_imm_neg_1_f32(float addrspace(1)* %out, float %x) {
      %xbc = bitcast float %x to i32
      %y = add i32 %xbc, -1
      %ybc = bitcast i32 %y to float
      store float %ybc, float addrspace(1)* %out
      ret void
    }

    $ llc test/CodeGen/AMDGPU/imm.ll -o - -amdgpu-scalarize-global-loads=false -march=amdgcn -mcpu=tonga -mattr=-flat-for-global
    ...
    s_load_dwordx2 s[4:5], s[0:1], 0x24
    s_load_dword s0, s[0:1], 0x2c
    s_mov_b32 s7, 0xf000
    s_mov_b32 s6, -1
    s_waitcnt lgkmcnt(0)
    v_add_f32_e64 v0, s0, -1.0
    buffer_store_dword v0, off, s[4:7], 0
    s_endpgm
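As a sketch of what the bitcast-based test computes (plain Python, not LLVM — the helper names here are hypothetical), the kernel reinterprets the float's bits as an i32, adds -1 as an integer, and reinterprets the result as a float, so no FP inline constant is needed:

```python
import struct

def f32_to_bits(x: float) -> int:
    # bitcast float -> i32: reinterpret the IEEE-754 single bits as a u32
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(b: int) -> float:
    # bitcast i32 -> float: reinterpret a u32 bit pattern as a float
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def add_imm_neg_1_f32(x: float) -> float:
    # Mirrors the test kernel: %y = add i32 (bitcast float %x to i32), -1
    return bits_to_f32(f32_to_bits(x) - 1)
```

Because the arithmetic happens on integer bits, the backend can select an integer add with an integer inline constant (as in the s_add_i32 output below) rather than a v_add_f32 with -1.0. For example, 1.0 has bit pattern 0x3F800000, and decrementing it yields 0x3F7FFFFF, the largest float strictly below 1.0.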
Yes, except I don't see how you end up with -1.0 as the inline constant in the v_add. Here's what I get, which looks more correct:

    s_load_dwordx2 s[4:5], s[0:1], 0x24
    s_load_dword s0, s[0:1], 0x2c
    s_mov_b32 s7, 0xf000
    s_mov_b32 s6, -1
    s_waitcnt lgkmcnt(0)
    s_add_i32 s0, s0, -1
    v_mov_b32_e32 v0, s0
    buffer_store_dword v0, off, s[4:7], 0
    s_endpgm
Yes - I must've copy-pasted the wrong chunk here. I updated all of the AMDGPU tests other than the f64 ones to be independent of this patch.

This revision was not accepted when it landed; it landed in state Needs Review. Mar 21 2018, 12:34 PM. Closed by commit rL328140: [InstSimplify] fp_binop X, NaN --> NaN (authored by spatel). This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 138985
lib/Analysis/InstructionSimplify.cpp
test/CodeGen/AMDGPU/imm.ll
test/Transforms/InstSimplify/fp-nan.ll
Inline comment: Why are we emitting NaNs from an undef operand here?