Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/lib/Dialect/GPU/Targets/NVPTXTarget.cpp | ||
---|---|---|
214 | We should have options to stop after emitting LLVMIR and PTX, seems like we have to go to cubin right now? I would also think that going to PTX can be possible without requiring any dependency on the Cuda toolkit? |
Not really? I added some on the final patches were you have all the infra ready. The best test I can think of at this point, is an unit test for this serializer?
mlir/include/mlir/Dialect/GPU/IR/GPUCompilationAttr.td | ||
---|---|---|
24 | I initially added these attributes (NVPTX & AMDGPU) to nvvm & rocdl, however I decided against that in the final patch because it added GPU dependencies to those dialects (includes & libs) and created more libs instead of a single (GPUTargets). I'm open to change it. | |
83 | I'll change it. | |
mlir/lib/Dialect/GPU/Targets/NVPTXTarget.cpp | ||
214 | These series of patches intend to be a full replacement of the current pipeline, upon approval we should intermediately deprecate the current pipeline and then remove it after a deprecation period. The idea is that in future patches we change this behavior and stop at LLVM, but that requires having the LLVM Offload work done. The benefit of this approach is that those future changes would be invisible to users and at the same time it makes them adopt this new mechanism now, that's why I go straight to cubin. |
mlir/include/mlir/Dialect/GPU/IR/GPUCompilationAttr.td | ||
---|---|---|
24 | Where will the dependency come from? The GPUTargetAttrInterface? Maybe we should revisit the way the interface is specified. Worse case we're just one level of indirection away :) Here if the issue is just the gpu::TargetOptions, maybe it can be declared with the interface definition, which makes it an isolated library? | |
mlir/lib/Dialect/GPU/Targets/NVPTXTarget.cpp | ||
214 | OK, thanks for explaining, can you add a TODO with this info in the code? |
Move the target attribute to NVVM and added it as an external model that it's promised.
mlir/lib/Target/LLVM/NVVM/Target.cpp | ||
---|---|---|
55 ↗ | (On Diff #545486) | Do we need to the init in the registration? Can we do the init when the serialization is called instead? |
mlir/unittests/Target/LLVM/SerializeNVVMTarget.cpp | ||
33 ↗ | (On Diff #545486) | Not clear to me where we need the native target? |
46 ↗ | (On Diff #545486) | Can we tone this down? There is no reason to link all of MLIR into this unit-test. |
Swtiched to ptxas, added options for stopping compilation at LLVM IR & PTX, as well as unit tests testing the added functionalty.
Bring @tra here to get feedback on pros/cons of invoking ptxas through a temp file vs using the library APIs exposed by nvptxcompile.
One immediate benefit is that we don't longer have a dependency on the toolkit to build this. You can detect ptxas by setting the env variable CUDA_ROOT to the location of the toolkit or by adding it to PATH.
mlir/lib/Target/LLVM/NVVM/Target.cpp | ||
---|---|---|
55 ↗ | (On Diff #545486) | Yes, it's possible I'll change it. |
mlir/unittests/Target/LLVM/SerializeNVVMTarget.cpp | ||
33 ↗ | (On Diff #545486) | Not needed, removed. |
46 ↗ | (On Diff #545486) | Yes, I've removed it. |
I think compiling by ptxas has merits. One can different version of ptxas in case of performance regression.
mlir/lib/Target/LLVM/NVVM/Target.cpp | ||
---|---|---|
290 ↗ | (On Diff #546432) | It would be nice to pass parameters to ptxas from MLIR. |
325 ↗ | (On Diff #546432) | What do you think about passsing -v always to ptxas and print the output (perhaps stdout file) using llvm::errs. |
mlir/lib/Target/LLVM/NVVM/Target.cpp | ||
---|---|---|
286 ↗ | (On Diff #546432) | Can we keep the PTX file? After codegen, it is very natural to look at the PTX file, if we keep the file via a flag, I think it would be great. I implemented the dump-ptx flag to the gpu-to-cubin Pass earlier. I guess we cannot use that flag here. |
mlir/lib/Target/LLVM/NVVM/Target.cpp | ||
---|---|---|
286 ↗ | (On Diff #546432) | If you pass -debug-only=serialize-to-ptx (I'm changing it to serialize-to-isa) to mlir-opt then the PTX file will be printed to stdout. This mechanism also allows emitting PTX instead of binary, so you have multiple ways of obtaining the PTX. |
290 ↗ | (On Diff #546432) | You can, the args are passed through the targetOptions variable. Below you'll find that I add them to the PTX invocation. |
325 ↗ | (On Diff #546432) | We could consider adding a dedicated variable, but as it stands you can pass that variable through the cmdline and use --debug-only=serialize-to-binary to dump that output into stdout. |
I think compiling by ptxas has merits. One can different version of ptxas in case of performance regression.
nvptxcompile provides the same facility I believe, doesn't it?
Though I haven't personally worked with nvptxcompile, it appears to be bundled with the toolkit. If one could use different toolkit version for PTX compilation, then yes it gives the same facility.
Recently, I used the driver (current MLIR compilation) and ptxas for PTX compilation. I noticed that even if I feed the same PTX code, the driver occasionally generates different SASS with ptxas. Maybe I hit a corner case but it was really hard to find the reason.
I believe it's crucial for us to know the potential disparities between nvptxcompile and ptxas if there is any.
My 2 cents, as far as I know the nvptxcompiler and ptxas should produce the same code for a given release. However unlike ptxas the only way to change the version of nvptxcompiler is to recompile the NVVMTarget library with a different version.
Adding support for nvptxcompiler is like extra 20 lines of code in this patch, and a couple on CMake, how do we feel about having both?
If nvptxcompiler is detected during build then nvptxcompiler is used, if not, then ptxas is used, and we also give the user the ability to choose on CMake?
If nvptxcompiler is detected during build then nvptxcompiler is used, if not, then ptxas is used, and we also give the user the ability to choose on CMake?
Adding a CMake option looks good to me, I would pick a default and stick to it (that is fail the cmake configuration with a helpful message).
I don't like "magic" fallback because detection is fragile and it makes the behavior dependent on the environment of the user. It's harder to know what you build, and getting bug reports it makes it also harder to know what's happening.
Added NVPTX compiler support & removed unnecessary fwd declarations. The CMake variable MLIR_ENABLE_NVPTXCOMPILER controls the usage of the NVPTXCompiler library and is disabled by default, as CMake 3.20 doesn't support CUDA::nvptxcompiler_static to find the library.
mlir/lib/Target/LLVM/NVVM/Target.cpp | ||
---|---|---|
123 ↗ | (On Diff #547733) | A word of caution -- some linux distributions scatter CUDA SDK across the 'standard' linux filsystem paths, so a single getToolkitPath() would not be able to find all the necessary bits, as libdevice and the binaries will be in different places. You may need additional heuristics along the lines of what clang driver does. |
mlir/lib/Target/LLVM/NVVM/Target.cpp | ||
---|---|---|
123 ↗ | (On Diff #547733) | Thank you, that's good to know. I'll see how to rework it or add docs indicating how to make it work. For ptxas this mechanism searches several places in the following order:
So it would work on Debian & Ubuntu, but the mechanism could be over burdening the user. |
Reapplied clang-format.
Adds a couple of syntax tests & fixes a bug not displaying the link option in the attribute.
Thanks Fabian!
mlir/CMakeLists.txt | ||
---|---|---|
122 ↗ | (On Diff #548030) | The description isn't super clear, because it conflicts with what "the NVPTX backend" is in LLVM. |
Changed the description of the CMake variable MLIR_ENABLE_NVPTXCOMPILER. Applied clang-format again.
Wondering why is this in the gpu dialect and not a target specific one? (like nvvm)