[CUDA] Work around a bug in rint() caused by a broken implementation provided by CUDA.
ClosedPublic

Authored by tra on Aug 4 2020, 12:05 PM.

Details

Summary

Normally math functions are forwarded to the __nv_* counterparts provided by CUDA's
libdevice bitcode. However, the __nv_rint*() functions there have a bug: they use
round(), which rounds halfway cases away from zero instead of to the nearest even
integer as rint() does in the default rounding mode, so we end up with rint(2.5f)
producing 3.0 instead of the expected 2.0. The broken bitcode is not actually used
by NVCC itself, which has both a workaround in the CUDA headers and, in recent
versions, correct implementations in NVCC's built-ins.

This patch implements an equivalent workaround and directs rint/rintf to
__builtin_rint/__builtin_rintf, which produce correct results.
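
The difference is only visible on halfway cases. The following is a minimal host-side sketch of it, not the device code from this patch: rint_via_round and rint_via_builtin are hypothetical names introduced here for illustration, and the sketch assumes a GCC- or Clang-compatible compiler (for __builtin_rintf) running in the default round-to-nearest-even mode.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the buggy behaviour described in the summary:
// round() resolves halfway cases away from zero.
inline float rint_via_round(float x) { return std::round(x); }

// Hypothetical stand-in for the fixed path: the compiler builtin follows the
// current rounding mode, which is round-to-nearest-even by default.
inline float rint_via_builtin(float x) { return __builtin_rintf(x); }

// Self-check on halfway cases, the only inputs where the two can disagree.
inline void check_halfway_cases() {
  assert(rint_via_round(2.5f) == 3.0f);      // the reported wrong result
  assert(rint_via_builtin(2.5f) == 2.0f);    // the expected result
  assert(rint_via_round(-2.5f) == -3.0f);
  assert(rint_via_builtin(-2.5f) == -2.0f);
  // When the nearest even integer is also the one farther from zero,
  // both approaches agree, which is why the bug is easy to miss.
  assert(rint_via_round(3.5f) == 4.0f);
  assert(rint_via_builtin(3.5f) == 4.0f);
}
```

Calling check_halfway_cases() from a test driver exercises both paths. A device-side test would need to run the same inputs through a kernel, which this sketch does not attempt.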

Event Timeline

tra created this revision. Aug 4 2020, 12:05 PM
Herald added a project: Restricted Project. Aug 4 2020, 12:05 PM
tra requested review of this revision. Aug 4 2020, 12:05 PM
jlebar accepted this revision. Aug 4 2020, 12:25 PM

LGTM, and can we write a test in the test-suite?

This revision is now accepted and ready to land. Aug 4 2020, 12:25 PM
tra updated this revision to Diff 283359. Aug 5 2020, 12:49 PM
tra edited the summary of this revision.

Also fixed the same bug in nearbyint().

This revision was landed with ongoing or failed builds. Aug 5 2020, 1:14 PM
This revision was automatically updated to reflect the committed changes.