This allows us to map functions such as exp, expf, expl, for which no
LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides
high-performance implementations of a wide range of (math) functions. We
currently link only a small subset, the exp* and cos functions. Other functions
will be enabled as needed.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
1314 ↗ | (On Diff #107597) | Consider changing to SmallSet? That feels correct in terms of semantics (a set of functions.) |
1418 ↗ | (On Diff #107597) | const bool? :) |
2131 ↗ | (On Diff #107597) | consider moving computing RequiresLibDevice to a pure function? |
2138 ↗ | (On Diff #107597) | I believe we can assert at this point, since this is not a "mis-compile" in the strictest sense of the word?. |
2148 ↗ | (On Diff #107597) | nit: trible -> triple. |
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
107–111 ↗ | (On Diff #107597) | This is something that is useful for all NVPTX users and should probably live there and it should not have any hardcoded path -- it's too easy to end up silently picking wrong library otherwise. Hardcoded compute_20 is also problematic because it should depend on particular GPU arch we're compiling for. Considering that LLVM has no idea about CUDA SDL location, this is sommething that should always be explicitly specified. Either base path + libdevice name derived from GPU arch, or complete path to specific libdevice variant (i.e. it's completely up to the user to provide correct libdevice). |
3020 ↗ | (On Diff #107597) | Bllock -> Block |
test/GPGPU/libdevice-functions-copied-into-kernel.ll | ||
38–69 ↗ | (On Diff #107597) | Can the test be reduced to just expf() call? |
test/GPGPU/libdevice-functions-copied-into-kernel_libdevice.bc | ||
1–6 ↗ | (On Diff #107597) | This file should be under Inputs/ directory (see NVPTX tests for example) and have .ll extension. |
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
1314 ↗ | (On Diff #107597) | could you also please add sqrt to the list? this is present in the COSMO kernel. |
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
107 ↗ | (On Diff #107597) | Would it be better to call this CUDALibDevice or CULibDevice instead ? since this applies only to NVPTX |
109 ↗ | (On Diff #107597) | Consider changing this to "/usr/local/cuda/nvvm/libdevice/libdevice.compute_20_10.bc". That would work on most Linux platforms by default. I'm not sure if PTX code for a compute capability 2 device would run on any newer device. Is it possible to initialize this after figuring out the CC of device 0 ? Also, I heard that CUDA SDK 8 would be the last to support CC 2.x. CUDA 9 supports all CCs from 3.0. |
609 ↗ | (On Diff #107597) | Would addCULibDevice be a better name ? |
Thank you for all the good reviews. I tried to address all of them.
Best,
Tobias
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
107 ↗ | (On Diff #107597) | I use CUDALibDevice |
107–111 ↗ | (On Diff #107597) | @tra: Thank you for your comment. This is the very first commit to introduce this feature. We currently are in early beta tests. The library location is supposed to be provided by the user by setting the path with polly-acc-libdevice. I set the option to a very basic default. For now I expect the user to adjust this default. In the future we can add some generic infrastructure to LLVM to derive this path automatically. If a specific fixed path is too confusing I can also use an empty default and always prompt for a path. |
109 ↗ | (On Diff #107597) | Changed. using /usr/local/cuda is indeed a good idea. I would like to start with the oldest library. We can later add support for different library versions. I don't think we can query device 0, as we might compile on different platform as where we run the final code. However, we can make this depend on polly-acc-cuda-version. |
609 ↗ | (On Diff #107597) | Changed. |
1314 ↗ | (On Diff #107597) | Done.! |
1314 ↗ | (On Diff #107597) | I use a std::set. That should be good enough for now. |
2131 ↗ | (On Diff #107597) | Done. |
2138 ↗ | (On Diff #107597) | I use report_fatal_error as suggested by Michael. |
2148 ↗ | (On Diff #107597) | Done! |
3020 ↗ | (On Diff #107597) | Fixed in r308715. |
test/GPGPU/libdevice-functions-copied-into-kernel.ll | ||
38–69 ↗ | (On Diff #107597) | Unlikely. This is a test for Polly-ACC, where we auto-offload to CUDA. For this we need at least some parallelism, which means some loop. |
test/GPGPU/libdevice-functions-copied-into-kernel_libdevice.bc | ||
1–6 ↗ | (On Diff #107597) | Very good idea. I adopted it. |
How is this different from passing libdevice to either of the -mlink-cuda-bitcode or -mlink-bticode-file options ?
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
2309 ↗ | (On Diff #107971) | Could this be avoided by implementing Triple::isCompatibleWith() for nvptx? |
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
113 ↗ | (On Diff #107971) | The variable appears to be unused. On a side note, please consider that there's already a way to specify GPU variant for NVPTX back-end. This option is either going to be redundant or you'll need a good explanation for what's supposed to happen when its value conflicts with whatever GPU variant NVPTX back-end thinks it's supposed to generate the code for. |
118 ↗ | (On Diff #107971) | This also appears to be unused. |
2277 ↗ | (On Diff #107971) | !empty() |
2288 ↗ | (On Diff #107971) | What's supposed to happen in this case? Consider adding a diagnostic message. |
2293 ↗ | (On Diff #107971) | But why are you still printing the libdevice name on stderr? |
2323 ↗ | (On Diff #107971) | I'm curious -- when would it be OK to proceed if verifier has failed? Should this option be other way around -- and make LLVM fail by default on verifier failure? That would be my expectation of normal behavior. One could conceivably use this option for debugging purposes to force LLVM to proceed even when verifier has failed, but in general I believe that by default the errors should be reported as early as possible. |
107–111 ↗ | (On Diff #107597) | I believe no default would be a better option in this case as it minimizes possibility for things to go wrong silently. |
test/GPGPU/libdevice-functions-copied-into-kernel.ll | ||
38–69 ↗ | (On Diff #107597) | It may be worth reconsidering your approach and making this functionality generic to NVPTX so it can benefit all users of NVPTX back-end. The functionality is generic enough to benefit CUDA (and possibly OpenCL) which currently fail miserably if any of standard library functions sneak into IR. |
Other than nits, LGTM
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
1341 ↗ | (On Diff #107971) | this can be const std::set<std::string> ... ? I do not see us mutating this, and it would be nice to communicate this fact. |
test/lit.site.cfg.in | ||
37 ↗ | (On Diff #107971) | Can we be more explicit and mention that these are for the libdevice tests? |
lib/CodeGen/PPCGCodeGeneration.cpp | ||
---|---|---|
1341 ↗ | (On Diff #107971) | Could you also add copysign, please? |