This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Provide integer SIMD functions for CUDA-9.2
ClosedPublic

Authored by tra on Jul 12 2018, 4:14 PM.

Details

Summary

CUDA-9.2 made all integer SIMD functions into compiler builtins,
so clang no longer has access to the implementation of these
functions in either the headers or libdevice and has to provide
its own implementation.

This is mostly a 1:1 mapping to the corresponding PTX instructions,
with the exception of vhadd2/vhadd4, which don't have an equivalent
instruction and had to be implemented with a bit hack.
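The vhadd bit hack relies on the identity x + y = 2(x & y) + (x ^ y), which lets a truncating average be computed without the intermediate sum overflowing the lane. Below is a host-side C++ sketch. It assumes that __vhaddu2/__vhaddu4 compute the truncating (not rounding) per-lane unsigned average. The function names are hypothetical, and this is a common formulation of the trick, not necessarily the exact expression in the committed header.

```cpp
#include <cassert>
#include <cstdint>

// Lane-by-lane reference: truncating unsigned average (x + y) >> 1 in each
// lane of width lane_bits (8 for __vhaddu4-like, 16 for __vhaddu2-like).
uint32_t vhaddu_ref(uint32_t a, uint32_t b, int lane_bits) {
  const uint32_t mask = (1u << lane_bits) - 1;
  uint32_t r = 0;
  for (int i = 0; i < 32; i += lane_bits) {
    uint32_t x = (a >> i) & mask, y = (b >> i) & mask;
    r |= ((x + y) >> 1) << i;  // x + y cannot overflow 32 bits here
  }
  return r;
}

// Bit hack: (x + y) >> 1 == (x & y) + ((x ^ y) >> 1).
// The 32-bit shift moves the low bit of each lane into the top bit of the
// lane below it, so that bit is masked off. Each lane's sum is at most the
// lane maximum, so the final add never carries into the next lane.
uint32_t vhaddu4_hack(uint32_t a, uint32_t b) {
  return (a & b) + (((a ^ b) >> 1) & 0x7f7f7f7fu);
}

uint32_t vhaddu2_hack(uint32_t a, uint32_t b) {
  return (a & b) + (((a ^ b) >> 1) & 0x7fff7fffu);
}

// Debug helper: both formulations must agree bit-for-bit.
void check_vhaddu(uint32_t a, uint32_t b) {
  assert(vhaddu4_hack(a, b) == vhaddu_ref(a, b, 8));
  assert(vhaddu2_hack(a, b) == vhaddu_ref(a, b, 16));
}
```

The mask is the only SIMD-specific part. Without it, the low bit of lane i+1 would land in the top bit of lane i.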

Performance of this implementation will be suboptimal for SM_50
and newer GPUs, where PTXAS generates noticeably worse code for
the SIMD instructions than for the inline assembly generated by
nvcc (or that used to come with the CUDA headers).

Diff Detail

Repository
rC Clang

Event Timeline

tra created this revision.Jul 12 2018, 4:14 PM
bkramer accepted this revision.Jul 18 2018, 8:02 AM
bkramer added inline comments.
clang/lib/Headers/__clang_cuda_device_functions.h
1080 (On Diff #155302)

Should this really saturate?

1095 (On Diff #155302)

vabsdiff4?

This revision is now accepted and ready to land.Jul 18 2018, 8:02 AM
bkramer requested changes to this revision.Jul 18 2018, 8:03 AM
This revision now requires changes to proceed.Jul 18 2018, 8:03 AM
tra added a comment.Jul 18 2018, 9:30 AM

I'm in the middle of writing the tests for these, as it's very easy to mess things up. I'll update the patch once I've run it through the tests.

Another problem with the patch in its current form is that these instructions apparently do not accept immediate arguments. PTX is a never-ending source of surprises...

tra updated this revision to Diff 156386.Jul 19 2018, 4:47 PM

Fixed inline asm syntax.
Added a workaround for the bug in __vmaxs2() discovered during testing.

I've got a set of tests for these functions that I'll add to the test-suite shortly. AFAICT this implementation matches NVIDIA's bit-for-bit.
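For the byte-wise functions, bit-for-bit confidence can come from checking every lane against a scalar reference over all 256 x 256 operand pairs. Random words can then be added to catch cross-lane carries and borrows. The C++ harness below is a hypothetical sketch of that idea, not the test-suite code mentioned above. The names `check_byte_simd` and `lanes_match` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// impl: the packed 4 x 8-bit function under test.
// lane_ref: scalar reference taking two byte values (0..255), returning a byte.
using SimdFn = std::function<uint32_t(uint32_t, uint32_t)>;
using LaneFn = std::function<uint32_t(uint32_t, uint32_t)>;

// True if every byte lane of impl(a, b) equals lane_ref on that lane's inputs.
bool lanes_match(const SimdFn& impl, const LaneFn& lane_ref, uint32_t a, uint32_t b) {
  const uint32_t got = impl(a, b);
  for (int i = 0; i < 32; i += 8) {
    const uint32_t want = lane_ref((a >> i) & 0xffu, (b >> i) & 0xffu) & 0xffu;
    if (((got >> i) & 0xffu) != want) return false;
  }
  return true;
}

// Every lane sees all 256 x 256 operand pairs (lanes hold x/y, y/x, ~x/~y,
// ~y/~x), so neighbouring lanes differ and cross-lane leaks show up.
// random_iters extra xorshift32 word pairs cover other lane combinations.
bool check_byte_simd(const SimdFn& impl, const LaneFn& lane_ref, int random_iters) {
  assert(random_iters >= 0);
  for (uint32_t x = 0; x < 256; ++x) {
    for (uint32_t y = 0; y < 256; ++y) {
      const uint32_t nx = ~x & 0xffu, ny = ~y & 0xffu;
      const uint32_t a = x | (y << 8) | (nx << 16) | (ny << 24);
      const uint32_t b = y | (x << 8) | (ny << 16) | (nx << 24);
      if (!lanes_match(impl, lane_ref, a, b)) return false;
    }
  }
  uint32_t s = 0x12345678u;  // xorshift32 state (any nonzero seed)
  auto next = [&s] { s ^= s << 13; s ^= s >> 17; s ^= s << 5; return s; };
  for (int i = 0; i < random_iters; ++i) {
    const uint32_t a = next();
    const uint32_t b = next();
    if (!lanes_match(impl, lane_ref, a, b)) return false;
  }
  return true;
}
```

Halfword variants have 2^32 operand pairs per lane. For those, a harness like this would more plausibly combine edge cases with random sampling.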

tra updated this revision to Diff 156397.Jul 19 2018, 5:09 PM

Fixed the issues pointed out by bkramer@.
Apparently, .sat does not matter for the vabsdiff instruction with unsigned operands.
My tests were also missing __vabsssN.
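The observation that .sat does not matter for unsigned vabsdiff follows from a range argument. The exact absolute difference of two unsigned lane values already fits in the lane, so clamping it to the lane range never changes it. The host-side C++ model below illustrates this. It models the arithmetic only, not PTX itself. The function names are hypothetical, and treating `.sat` as a clamp to the destination lane range is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Per-byte absolute difference of unsigned lanes. When saturate is true,
// each lane result is clamped to the unsigned byte range [0, 255]
// (assumed to be what .sat means for an unsigned destination lane).
uint32_t absdiffu4_model(uint32_t a, uint32_t b, bool saturate) {
  uint32_t r = 0;
  for (int i = 0; i < 32; i += 8) {
    int x = static_cast<int>((a >> i) & 0xffu);
    int y = static_cast<int>((b >> i) & 0xffu);
    int d = std::abs(x - y);                          // exact result, always 0..255
    if (saturate) d = std::min(std::max(d, 0), 255);  // therefore never changes d
    r |= static_cast<uint32_t>(d) << i;
  }
  return r;
}

// For signed lanes the argument fails: 127 - (-128) == 255 does not fit in an
// int8_t lane, so saturation could change the result there.
void check_sat_is_noop(uint32_t a, uint32_t b) {
  assert(absdiffu4_model(a, b, true) == absdiffu4_model(a, b, false));
}
```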

tra marked 2 inline comments as done.Jul 19 2018, 5:13 PM

Ben, PTAL.

clang/lib/Headers/__clang_cuda_device_functions.h
1080 (On Diff #155302)

Hmm. My tests didn't catch this. I wonder if ptxas just ignores .sat here.
Yup. I've confirmed that the tests do run on this function and do trigger if I intentionally introduce an error.
In any case, I've removed the .sat as it should not be there.

1095 (On Diff #155302)

Ah. I missed __vabsssN in my tests. Fixed both the header and the tests.
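CUDA documents __vabsss2/__vabsss4 as per-halfword/per-byte absolute value with signed saturation. This is the case where saturation does matter, because the most negative lane value has no positive counterpart. Below is a host-side C++ model, assuming saturation clamps to the signed lane maximum. `vabsss_model` is a hypothetical name, not the header's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Per-lane absolute value with signed saturation for lane_bits == 8
// (__vabsss4-like) or 16 (__vabsss2-like). The most negative lane value
// (-128 or -32768) has no positive counterpart, so it clamps to the lane max.
uint32_t vabsss_model(uint32_t a, int lane_bits) {
  assert(lane_bits == 8 || lane_bits == 16);
  const int64_t lane_mask = (int64_t{1} << lane_bits) - 1;
  const int64_t half = int64_t{1} << (lane_bits - 1);  // 128 or 32768
  uint32_t r = 0;
  for (int i = 0; i < 32; i += lane_bits) {
    int64_t x = static_cast<int64_t>(a >> i) & lane_mask;
    if (x >= half) x -= 2 * half;    // sign-extend the lane
    int64_t m = x < 0 ? -x : x;      // 0 .. half
    if (m > half - 1) m = half - 1;  // signed saturation
    r |= static_cast<uint32_t>(m) << i;
  }
  return r;
}
```

Without saturation, |-128| truncated back to a byte would be 0x80, which reads as -128 again. The saturated form returns 0x7f instead.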

This revision is now accepted and ready to land.Jul 20 2018, 2:58 AM
This revision was automatically updated to reflect the committed changes.
tra marked 2 inline comments as done.