
Fix bugs in GPUToNVVM lowering
ClosedPublic

Authored by harsh on Jan 24 2022, 4:02 PM.

Details

Summary

The current lowering from GPU to NVVM does not
correctly handle the following cases when lowering
the gpu.shuffle op.

  1. When the active width is set to 32 (all lanes), the current approach
computes (1 << 32) - 1, which results in a poison value in the LLVM IR.
We fix this by computing the active mask as (-1) >> (32 - width) instead.

  2. In the case of shuffle up, the computation of the third operand c has to
differ from the other three modes due to the op definition in the PTX ISA
reference (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html).
Specifically, the predicate value is computed as j >= maxLane for up and
j <= maxLane for all other modes. We fix this by computing maskAndClamp as
32 - width for this mode. Both computations are sketched below.
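To make the two fixes concrete, here is a minimal standalone C++ sketch of the
scalar arithmetic involved. It is an illustration only: the helper
computeShflOperands, the ShflOperands struct, and the ShuffleMode enum are
hypothetical names introduced for this sketch, not part of the patch or of the
MLIR API, and the sketch models the values the lowering produces rather than
the rewrite-pattern code itself.

  #include <cassert>
  #include <cstdint>
  #include <cstdio>

  // Illustrative names, not from the patch.
  enum class ShuffleMode { Xor, Down, Up, Idx };

  struct ShflOperands {
    uint32_t activeMask;   // mask of active lanes
    uint32_t maskAndClamp; // third operand c of the shuffle
  };

  // Scalar model of the operand computation described in the summary.
  // `width` is the active width of the gpu.shuffle op, in the range 1..32.
  ShflOperands computeShflOperands(uint32_t width, ShuffleMode mode) {
    assert(width >= 1 && width <= 32);
    // Old approach: (1 << 32) - 1 is poison when width == 32.
    // New approach: (-1) >> (32 - width) is well defined for all widths,
    // since the shift amount stays in 0..31.
    uint32_t numLeadInactiveLane = 32 - width;
    uint32_t activeMask = ~0u >> numLeadInactiveLane;

    // For shuffle up the PTX predicate is j >= maxLane, so the clamp is the
    // number of leading inactive lanes (32 - width); for the other modes the
    // predicate is j <= maxLane and the clamp is width - 1.
    uint32_t maskAndClamp =
        mode == ShuffleMode::Up ? numLeadInactiveLane : width - 1;
    return {activeMask, maskAndClamp};
  }

  int main() {
    // width == 32 (all lanes): the old (1 << 32) - 1 would be poison; the
    // new formulation yields 0xffffffff as expected.
    ShflOperands full = computeShflOperands(32, ShuffleMode::Xor);
    std::printf("activeMask=0x%x maskAndClamp=%u\n", full.activeMask,
                full.maskAndClamp); // activeMask=0xffffffff maskAndClamp=31

    // Shuffle up with width 16: maskAndClamp becomes 32 - 16 = 16.
    ShflOperands up = computeShflOperands(16, ShuffleMode::Up);
    std::printf("activeMask=0x%x maskAndClamp=%u\n", up.activeMask,
                up.maskAndClamp); // activeMask=0xffff maskAndClamp=16
    return 0;
  }

The point of the reformulation is that every intermediate value stays within a
defined 32-bit shift range, so no poison is introduced for the full-width case.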

TEST: We modify the existing test and add more checks for the up mode.

Diff Detail

Event Timeline

harsh created this revision.Jan 24 2022, 4:02 PM
harsh requested review of this revision.Jan 24 2022, 4:02 PM
ThomasRaoux accepted this revision.Jan 24 2022, 5:25 PM
ThomasRaoux added a subscriber: ThomasRaoux.

LG, please address the variable naming before landing the patch.

mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
95

this is actually the number of inactive lanes? Should probably be named numLeadInactiveLane?

This revision is now accepted and ready to land.Jan 24 2022, 5:25 PM
harsh added inline comments.Jan 24 2022, 6:15 PM
mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
95

Yes, you are right. Will change.

harsh updated this revision to Diff 402728.Jan 24 2022, 6:23 PM

Updated patch based on Thomas' comments.

This revision was landed with ongoing or failed builds.Jan 24 2022, 7:25 PM
This revision was automatically updated to reflect the committed changes.