This is an archive of the discontinued LLVM Phabricator instance.

[GPGPU] Add support for NVIDIA libdevice
ClosedPublic

Authored by grosser on Jul 20 2017, 3:44 PM.

Details

Summary

This allows us to map functions such as exp, expf, expl, for which no
LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides
high-performance implementations of a wide range of (math) functions. We
currently link only a small subset, the exp* and cos functions. Other functions
will be enabled as needed.

Diff Detail

Repository
rL LLVM

Event Timeline

grosser created this revision.Jul 20 2017, 3:44 PM
bollu added inline comments.Jul 20 2017, 3:54 PM
lib/CodeGen/PPCGCodeGeneration.cpp
1314 ↗(On Diff #107597)

Consider changing to SmallSet? That feels correct in terms of semantics (a set of functions.)

1418 ↗(On Diff #107597)

const bool? :)

2131 ↗(On Diff #107597)

consider moving computing RequiresLibDevice to a pure function?

2138 ↗(On Diff #107597)

I believe we can assert at this point, since this is not a "mis-compile" in the strictest sense of the word?.

2148 ↗(On Diff #107597)

nit: trible -> triple.

tra added a subscriber: tra.Jul 20 2017, 3:59 PM
tra added inline comments.
lib/CodeGen/PPCGCodeGeneration.cpp
107–111 ↗(On Diff #107597)

This is something that is useful for all NVPTX users and should probably live there and it should not have any hardcoded path -- it's too easy to end up silently picking wrong library otherwise.

Hardcoded compute_20 is also problematic because it should depend on particular GPU arch we're compiling for.

Considering that LLVM has no idea about CUDA SDL location, this is sommething that should always be explicitly specified. Either base path + libdevice name derived from GPU arch, or complete path to specific libdevice variant (i.e. it's completely up to the user to provide correct libdevice).

3020 ↗(On Diff #107597)

Bllock -> Block

test/GPGPU/libdevice-functions-copied-into-kernel.ll
38–69 ↗(On Diff #107597)

Can the test be reduced to just expf() call?

test/GPGPU/libdevice-functions-copied-into-kernel_libdevice.bc
1–6 ↗(On Diff #107597)

This file should be under Inputs/ directory (see NVPTX tests for example) and have .ll extension.

bollu edited edge metadata.Jul 21 2017, 7:08 AM

Is there some way to test this without having libdevice? The tests break on my mac.

bollu added inline comments.Jul 24 2017, 3:18 AM
lib/CodeGen/PPCGCodeGeneration.cpp
1314 ↗(On Diff #107597)

could you also please add sqrt to the list? this is present in the COSMO kernel.

singam-sanjay added inline comments.Jul 24 2017, 8:09 AM
lib/CodeGen/PPCGCodeGeneration.cpp
107 ↗(On Diff #107597)

Would it be better to call this CUDALibDevice or CULibDevice instead ? since this applies only to NVPTX

109 ↗(On Diff #107597)

Consider changing this to "/usr/local/cuda/nvvm/libdevice/libdevice.compute_20_10.bc". That would work on most Linux platforms by default.

I'm not sure if PTX code for a compute capability 2 device would run on any newer device. Is it possible to initialize this after figuring out the CC of device 0 ?

Also, I heard that CUDA SDK 8 would be the last to support CC 2.x. CUDA 9 supports all CCs from 3.0.

609 ↗(On Diff #107597)

Would addCULibDevice be a better name ?

grosser updated this revision to Diff 107971.Jul 24 2017, 2:48 PM
grosser marked 8 inline comments as done.

Address review comments

grosser marked an inline comment as done.Jul 24 2017, 2:49 PM

Thank you for all the good reviews. I tried to address all of them.

Best,
Tobias

lib/CodeGen/PPCGCodeGeneration.cpp
107 ↗(On Diff #107597)

I use CUDALibDevice

107–111 ↗(On Diff #107597)

@tra: Thank you for your comment. This is the very first commit to introduce this feature. We currently are in early beta tests. The library location is supposed to be provided by the user by setting the path with polly-acc-libdevice. I set the option to a very basic default. For now I expect the user to adjust this default. In the future we can add some generic infrastructure to LLVM to derive this path automatically. If a specific fixed path is too confusing I can also use an empty default and always prompt for a path.

109 ↗(On Diff #107597)

Changed. using /usr/local/cuda is indeed a good idea.

I would like to start with the oldest library. We can later add support for different library versions. I don't think we can query device 0, as we might compile on different platform as where we run the final code. However, we can make this depend on polly-acc-cuda-version.

609 ↗(On Diff #107597)

Changed.

1314 ↗(On Diff #107597)

Done.!

1314 ↗(On Diff #107597)

I use a std::set. That should be good enough for now.

2131 ↗(On Diff #107597)

Done.

2138 ↗(On Diff #107597)

I use report_fatal_error as suggested by Michael.

2148 ↗(On Diff #107597)

Done!

3020 ↗(On Diff #107597)

Fixed in r308715.

test/GPGPU/libdevice-functions-copied-into-kernel.ll
38–69 ↗(On Diff #107597)

Unlikely. This is a test for Polly-ACC, where we auto-offload to CUDA. For this we need at least some parallelism, which means some loop.

test/GPGPU/libdevice-functions-copied-into-kernel_libdevice.bc
1–6 ↗(On Diff #107597)

Very good idea. I adopted it.

How is this different from passing libdevice to either of the -mlink-cuda-bitcode or -mlink-bticode-file options ?

lib/CodeGen/PPCGCodeGeneration.cpp
2309 ↗(On Diff #107971)

Could this be avoided by implementing Triple::isCompatibleWith() for nvptx?

tra added inline comments.Jul 24 2017, 3:51 PM
lib/CodeGen/PPCGCodeGeneration.cpp
113 ↗(On Diff #107971)

The variable appears to be unused.

On a side note, please consider that there's already a way to specify GPU variant for NVPTX back-end. This option is either going to be redundant or you'll need a good explanation for what's supposed to happen when its value conflicts with whatever GPU variant NVPTX back-end thinks it's supposed to generate the code for.

118 ↗(On Diff #107971)

This also appears to be unused.
Please remove them from this patch and re-introduce them in subsequent patches that really need them.

2277 ↗(On Diff #107971)

!empty()

2288 ↗(On Diff #107971)

What's supposed to happen in this case? Consider adding a diagnostic message.

2293 ↗(On Diff #107971)

But why are you still printing the libdevice name on stderr?

2323 ↗(On Diff #107971)

I'm curious -- when would it be OK to proceed if verifier has failed?

Should this option be other way around -- and make LLVM fail by default on verifier failure? That would be my expectation of normal behavior. One could conceivably use this option for debugging purposes to force LLVM to proceed even when verifier has failed, but in general I believe that by default the errors should be reported as early as possible.

107–111 ↗(On Diff #107597)

I believe no default would be a better option in this case as it minimizes possibility for things to go wrong silently.

test/GPGPU/libdevice-functions-copied-into-kernel.ll
38–69 ↗(On Diff #107597)

It may be worth reconsidering your approach and making this functionality generic to NVPTX so it can benefit all users of NVPTX back-end. The functionality is generic enough to benefit CUDA (and possibly OpenCL) which currently fail miserably if any of standard library functions sneak into IR.

bollu accepted this revision.Jul 25 2017, 12:50 AM

Other than nits, LGTM

lib/CodeGen/PPCGCodeGeneration.cpp
1341 ↗(On Diff #107971)

this can be const std::set<std::string> ... ? I do not see us mutating this, and it would be nice to communicate this fact.

test/lit.site.cfg.in
37 ↗(On Diff #107971)

Can we be more explicit and mention that these are for the libdevice tests?

This revision is now accepted and ready to land.Jul 25 2017, 12:50 AM
bollu added inline comments.Jul 28 2017, 5:13 AM
lib/CodeGen/PPCGCodeGeneration.cpp
1341 ↗(On Diff #107971)

Could you also add copysign, please?

This revision was automatically updated to reflect the committed changes.
grosser marked an inline comment as done.