This is an archive of the discontinued LLVM Phabricator instance.

[GPGPU] Add support for NVIDIA libdevice
ClosedPublic

Authored by grosser on Jul 20 2017, 3:44 PM.

Details

Summary

This allows us to map functions such as exp, expf, expl, for which no
LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides
high-performance implementations of a wide range of (math) functions. We
currently link only a small subset, the exp* and cos functions. Other functions
will be enabled as needed.

Event Timeline

grosser created this revision.Jul 20 2017, 3:44 PM
bollu added inline comments.Jul 20 2017, 3:54 PM
lib/CodeGen/PPCGCodeGeneration.cpp
1314

Consider changing to SmallSet? That feels correct in terms of semantics (a set of functions.)

1418

const bool? :)

2131

consider moving computing RequiresLibDevice to a pure function?

2138

I believe we can assert at this point, since this is not a "mis-compile" in the strictest sense of the word?.

2148

nit: trible -> triple.

tra added a subscriber: tra.Jul 20 2017, 3:59 PM
tra added inline comments.
lib/CodeGen/PPCGCodeGeneration.cpp
107–111

This is something that is useful for all NVPTX users and should probably live there and it should not have any hardcoded path -- it's too easy to end up silently picking wrong library otherwise.

Hardcoded compute_20 is also problematic because it should depend on particular GPU arch we're compiling for.

Considering that LLVM has no idea about CUDA SDL location, this is sommething that should always be explicitly specified. Either base path + libdevice name derived from GPU arch, or complete path to specific libdevice variant (i.e. it's completely up to the user to provide correct libdevice).

3020

Bllock -> Block

test/GPGPU/libdevice-functions-copied-into-kernel.ll
38–69

Can the test be reduced to just expf() call?

test/GPGPU/libdevice-functions-copied-into-kernel_libdevice.bc
1–6

This file should be under Inputs/ directory (see NVPTX tests for example) and have .ll extension.

bollu edited edge metadata.Jul 21 2017, 7:08 AM

Is there some way to test this without having libdevice? The tests break on my mac.

bollu added inline comments.Jul 24 2017, 3:18 AM
lib/CodeGen/PPCGCodeGeneration.cpp
1314

could you also please add sqrt to the list? this is present in the COSMO kernel.

singam-sanjay added inline comments.Jul 24 2017, 8:09 AM
lib/CodeGen/PPCGCodeGeneration.cpp
107

Would it be better to call this CUDALibDevice or CULibDevice instead ? since this applies only to NVPTX

109

Consider changing this to "/usr/local/cuda/nvvm/libdevice/libdevice.compute_20_10.bc". That would work on most Linux platforms by default.

I'm not sure if PTX code for a compute capability 2 device would run on any newer device. Is it possible to initialize this after figuring out the CC of device 0 ?

Also, I heard that CUDA SDK 8 would be the last to support CC 2.x. CUDA 9 supports all CCs from 3.0.

609

Would addCULibDevice be a better name ?

grosser updated this revision to Diff 107971.Jul 24 2017, 2:48 PM
grosser marked 8 inline comments as done.

Address review comments

grosser marked an inline comment as done.Jul 24 2017, 2:49 PM

Thank you for all the good reviews. I tried to address all of them.

Best,
Tobias

lib/CodeGen/PPCGCodeGeneration.cpp
107

I use CUDALibDevice

107–111

@tra: Thank you for your comment. This is the very first commit to introduce this feature. We currently are in early beta tests. The library location is supposed to be provided by the user by setting the path with polly-acc-libdevice. I set the option to a very basic default. For now I expect the user to adjust this default. In the future we can add some generic infrastructure to LLVM to derive this path automatically. If a specific fixed path is too confusing I can also use an empty default and always prompt for a path.

109

Changed. using /usr/local/cuda is indeed a good idea.

I would like to start with the oldest library. We can later add support for different library versions. I don't think we can query device 0, as we might compile on different platform as where we run the final code. However, we can make this depend on polly-acc-cuda-version.

609

Changed.

1314

Done.!

1314

I use a std::set. That should be good enough for now.

2131

Done.

2138

I use report_fatal_error as suggested by Michael.

2148

Done!

3020

Fixed in r308715.

test/GPGPU/libdevice-functions-copied-into-kernel.ll
38–69

Unlikely. This is a test for Polly-ACC, where we auto-offload to CUDA. For this we need at least some parallelism, which means some loop.

test/GPGPU/libdevice-functions-copied-into-kernel_libdevice.bc
1–6

Very good idea. I adopted it.

How is this different from passing libdevice to either of the -mlink-cuda-bitcode or -mlink-bticode-file options ?

lib/CodeGen/PPCGCodeGeneration.cpp
2154

Could this be avoided by implementing Triple::isCompatibleWith() for nvptx?

tra added inline comments.Jul 24 2017, 3:51 PM
lib/CodeGen/PPCGCodeGeneration.cpp
107–111

I believe no default would be a better option in this case as it minimizes possibility for things to go wrong silently.

113

The variable appears to be unused.

On a side note, please consider that there's already a way to specify GPU variant for NVPTX back-end. This option is either going to be redundant or you'll need a good explanation for what's supposed to happen when its value conflicts with whatever GPU variant NVPTX back-end thinks it's supposed to generate the code for.

118

This also appears to be unused.
Please remove them from this patch and re-introduce them in subsequent patches that really need them.

2122

!empty()

2133

What's supposed to happen in this case? Consider adding a diagnostic message.

2138

But why are you still printing the libdevice name on stderr?

2161

I'm curious -- when would it be OK to proceed if verifier has failed?

Should this option be other way around -- and make LLVM fail by default on verifier failure? That would be my expectation of normal behavior. One could conceivably use this option for debugging purposes to force LLVM to proceed even when verifier has failed, but in general I believe that by default the errors should be reported as early as possible.

test/GPGPU/libdevice-functions-copied-into-kernel.ll
38–69

It may be worth reconsidering your approach and making this functionality generic to NVPTX so it can benefit all users of NVPTX back-end. The functionality is generic enough to benefit CUDA (and possibly OpenCL) which currently fail miserably if any of standard library functions sneak into IR.

bollu accepted this revision.Jul 25 2017, 12:50 AM

Other than nits, LGTM

lib/CodeGen/PPCGCodeGeneration.cpp
1314

this can be const std::set<std::string> ... ? I do not see us mutating this, and it would be nice to communicate this fact.

test/lit.site.cfg.in
37 ↗(On Diff #107971)

Can we be more explicit and mention that these are for the libdevice tests?

This revision is now accepted and ready to land.Jul 25 2017, 12:50 AM
bollu added inline comments.Jul 28 2017, 5:13 AM
lib/CodeGen/PPCGCodeGeneration.cpp
1314

Could you also add copysign, please?

This revision was automatically updated to reflect the committed changes.
grosser marked an inline comment as done.