This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Fix nvvm.match.sync*.i64 intrinsics return type (i64 -> i32)
ClosedPublic

Authored by krisb on Feb 24 2022, 9:38 AM.

Details

Summary

NVVM IR specification defines them with i32 return type [0]:

The following intrinsics synchronize a subset of threads in a warp and then
broadcast and compare a value across threads in the subset.

declare i32 @llvm.nvvm.match.any.sync.i64(i32 %membermask, i64 %value)
declare {i32, i1} @llvm.nvvm.match.all.sync.i64(i32 %membermask, i64 %value)
...
The i32 return value is a 32-bit mask where bit position in mask corresponds
to thread’s laneid.

as well as PTX ISA spec [1]:

9.7.12.8. Parallel Synchronization and Communication Instructions: match.sync
...
Syntax
match.any.sync.type  d, a, membermask;
match.all.sync.type  d[|p], a, membermask;

Description
...
Destination d is a 32-bit mask where bit position in mask corresponds
to thread’s laneid.

So it doesn't make sense to define them with any other return type.

Additionally, ptxas doesn't accept intructions, produced by NVPTX
backend. Here is the ptxas output for llvm/test/CodeGen/NVPTX/match.ll:

ptxas match.ptx, line 44; error   : Arguments mismatch for instruction 'match'
ptxas match.ptx, line 45; error   : Arguments mismatch for instruction 'match'
ptxas match.ptx, line 46; error   : Arguments mismatch for instruction 'match'
ptxas match.ptx, line 47; error   : Arguments mismatch for instruction 'match'
ptxas match.ptx, line 98; error   : Arguments mismatch for instruction 'match'

After this patch, it compiles with no issues.

[0] https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#unique_341827171
[1] https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-match-sync

Diff Detail

Event Timeline

krisb created this revision.Feb 24 2022, 9:38 AM
krisb requested review of this revision.Feb 24 2022, 9:38 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptFeb 24 2022, 9:38 AM
tra added a comment.Feb 24 2022, 11:17 AM

Good catch. Thank you for the fix.

clang/include/clang/Basic/BuiltinsNVPTX.def
476–477

I've also noticed that the PTX spec also says Requires sm_70 or higher., so we may want to fix the constraint, too.

krisb updated this revision to Diff 411341.Feb 25 2022, 12:27 AM

Add SM_70 requirement for 'match' builtins.

krisb updated this revision to Diff 411416.Feb 25 2022, 7:31 AM

Fix a test.

tra accepted this revision.Feb 28 2022, 12:01 PM

LGTM.

This revision is now accepted and ready to land.Feb 28 2022, 12:01 PM
This revision was landed with ongoing or failed builds.Mar 1 2022, 2:27 AM
This revision was automatically updated to reflect the committed changes.
krisb added a comment.Mar 1 2022, 2:27 AM

@tra thank you for looking at this!