This is an archive of the discontinued LLVM Phabricator instance.

[mlir][gpu] Lower arith::RemFOp to __nv_remainder on NVIDIA GPU.
AbandonedPublic

Authored by bchetioui on Dec 27 2022, 12:03 AM.

Details

Diff Detail

Event Timeline

bchetioui created this revision.Dec 27 2022, 12:03 AM
bchetioui requested review of this revision.Dec 27 2022, 12:03 AM

Fix clang format errors.

ThomasRaoux requested changes to this revision.Dec 27 2022, 1:38 AM
ThomasRaoux added inline comments.
mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
263

RemFOp already lowers to llvm instruction frem (https://llvm.org/docs/LangRef.html#frem-instruction). Trying it out this works fine through the NVPTX backend.
Therefore I don't think we need that. What problem is it trying to solve?

This revision now requires changes to proceed.Dec 27 2022, 1:38 AM
bchetioui requested review of this revision.Dec 28 2022, 11:51 PM

The fix above corrects an edge case in which the result of the remainder operation is NaN when the second argument is inf.
This was detected in TensorFlow (https://github.com/tensorflow/tensorflow/issues/58369). (Sorry for the delay, I had to fix a problem with my bug repro :-))

Requesting review again with this additional information.

The fix above corrects an edge case in which the result of the remainder operation is NaN when the second argument is inf.
This was detected in TensorFlow (https://github.com/tensorflow/tensorflow/issues/58369). (Sorry for the delay, I had to fix a problem with my bug repro :-))

Requesting review again with this additional information.

Thanks for explaining. I don't think this is the right fix as this would mean the other existing lowering of arith.remf would be incorrect and cause miscompile if applied before this pattern.
Looking at it a bit, llvm instruction frem should return the result you want for frem 1.f, inf : f32. The instruction is defined here https://llvm.org/docs/LangRef.html#frem-instruction and says the semantic should match libm fmod. fmod(1.f, inf) returns 1.f.
The bug seems to be in NVPTX backend: https://github.com/llvm/llvm-project/blob/2784b243e38f7d526a40838a854554b53ebeb41e/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td#L1156
The sequence emitted won't do the right thing for inf, it should be fixed there.

Thanks for pointing this out, I agree that this would be a better fix.
I will work on a fix there.

The proposed fix is at https://reviews.llvm.org/D140846. Thanks again for pointing me there!