This is an archive of the discontinued LLVM Phabricator instance.

Fix asserts in AMDGCN fmed3 folding by handling more cases of NaN
ClosedPublic

Authored by alan-baker on Jun 25 2018, 12:04 PM.

Details

Summary

Better NaN handling for AMDGCN fmed3.

All operands are checked for NaN now. The checks were moved before the canonicalization to provide a better mapping from fclamp. Changed the behaviour of fmed3(x,y,NaN) to return max(x,y) instead of min(x,y) in light of this. Updated tests as a result and added some new cases to cover the fix.

Diff Detail

Repository
rL LLVM

Event Timeline

alan-baker created this revision.Jun 25 2018, 12:04 PM
arsenm added inline comments.Jun 25 2018, 12:51 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
3429–3432

This comment is sort of confusing me. Is it saying there is still a problem?

Probably shouldn't mention fclamp, as no operation called fclamp is involved here.

3447–3452

I don't really like repeating all of these so many times, but it's probably clearer than the alternatives. Should this used FastMathFlagGuard to avoid repeating the copyFastMathFlags at least?

test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
1337–1357

Since the undef value check is repeated so many times, better add versions of these with undef instead of nan to make sure those work

alan-baker added inline comments.Jun 25 2018, 1:36 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
3429–3432

If fmed3(x,y,z) was used to model fclamp(x,y,z) (i.e. fmin(fmax(x,y), z)), and x or y are determined to be a constant prior to determining one of the operands is a NaN, then the canonicalization will swap operands around. If later an operand is determined to be a NaN then the evaluation of the fmed3 may no longer match the original fclamp operation. So there is a potential problem if not all the constants are determined prior to the invocation of inst combine, but are for subsequent invocations. This problem is specifically for modelling fclamp, which is why I mentioned it here. I can remove the comment if you prefer, but LLPC maps fclamp directly to fmed3 (and I'm not sure I see a performant alternative without introducing a fclamp intrinsic).

alan-baker added inline comments.Jun 25 2018, 1:48 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
3447–3452

Calls don't automatically inherit fast math flags from the builder so creating the guard doesn't work. If it's preferable, I could refactor to create the call instruction first and in a subsequent if take the name and copy the fast math flags.

Adding extra tests.

alan-baker marked an inline comment as done.Jun 25 2018, 1:57 PM
arsenm added inline comments.Jun 26 2018, 2:01 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
3429–3432

There should be a comment, it just needs to be reworded. As is I think it relies on knowing the history of you adding this code here before the existing code

3447–3452

I guess that's sort of a bug in the IRBuilder but I can sort of see why it is that way. I guess factoring that way may be better

Refactor NaN checks to reduce duplication. Reword comment.

alan-baker marked an inline comment as done.Jun 26 2018, 6:17 AM
arsenm accepted this revision.Jun 27 2018, 5:15 AM

LGTM

This revision is now accepted and ready to land.Jun 27 2018, 5:15 AM

I don't have commit access to LLVM, could you please land this patch on my behalf?

arsenm closed this revision.Jul 5 2018, 10:10 AM

r336375