This is an archive of the discontinued LLVM Phabricator instance.

[mlir][NVVM] Adds the NVVM target attribute.
ClosedPublic

Authored by fmorac on Jun 29 2023, 10:40 AM.

Details

Summary

For an explanation of these patches see D154153.

Commit message:
This patch adds the NVVM target attribute for serializing GPU modules into
strings containing cubin.

Depends on D154113 and D154100 and D154097

Diff Detail

Event Timeline

fmorac created this revision.Jun 29 2023, 10:40 AM
fmorac updated this revision to Diff 535897.Jun 29 2023, 10:45 AM

Update the commit message to include D154100

fmorac edited the summary of this revision. (Show Details)Jun 29 2023, 10:51 AM
fmorac updated this revision to Diff 535906.Jun 29 2023, 10:53 AM

Update commit message to include D154097

fmorac updated this revision to Diff 536066.Jun 29 2023, 5:33 PM

Rebasing.

fmorac edited the summary of this revision. (Show Details)Jun 29 2023, 6:18 PM
fmorac added a reviewer: mehdi_amini.
fmorac published this revision for review.Jun 30 2023, 6:29 AM
fmorac updated this revision to Diff 543179.Jul 22 2023, 6:51 AM

Updated location of ModuleToObject.h

mehdi_amini added inline comments.Jul 23 2023, 10:53 PM
mlir/include/mlir/Dialect/GPU/IR/GPUCompilationAttr.td
82 ↗(On Diff #536066)

Nit: hasFastMath since it returns a bool. I would think that getFastMath would return a FastMathFlag attribute.

23 ↗(On Diff #543179)

Wondering why is this in the gpu dialect and not a target specific one? (like nvvm)

mehdi_amini added inline comments.Jul 24 2023, 12:38 AM
mlir/lib/Dialect/GPU/Targets/NVPTXTarget.cpp
213 ↗(On Diff #543179)

We should have options to stop after emitting LLVMIR and PTX, seems like we have to go to cubin right now?

I would also think that going to PTX can be possible without requiring any dependency on the Cuda toolkit?

Do we have a way to test this without too much of the other code by the way?

Do we have a way to test this without too much of the other code by the way?

Not really? I added some on the final patches were you have all the infra ready. The best test I can think of at this point, is an unit test for this serializer?

mlir/include/mlir/Dialect/GPU/IR/GPUCompilationAttr.td
82 ↗(On Diff #536066)

I'll change it.

23 ↗(On Diff #543179)

I initially added these attributes (NVPTX & AMDGPU) to nvvm & rocdl, however I decided against that in the final patch because it added GPU dependencies to those dialects (includes & libs) and created more libs instead of a single (GPUTargets).

I'm open to change it.

mlir/lib/Dialect/GPU/Targets/NVPTXTarget.cpp
213 ↗(On Diff #543179)

These series of patches intend to be a full replacement of the current pipeline, upon approval we should intermediately deprecate the current pipeline and then remove it after a deprecation period.

The idea is that in future patches we change this behavior and stop at LLVM, but that requires having the LLVM Offload work done.

The benefit of this approach is that those future changes would be invisible to users and at the same time it makes them adopt this new mechanism now, that's why I go straight to cubin.

mehdi_amini added inline comments.Jul 24 2023, 10:47 PM
mlir/include/mlir/Dialect/GPU/IR/GPUCompilationAttr.td
23 ↗(On Diff #543179)

Where will the dependency come from? The GPUTargetAttrInterface?

Maybe we should revisit the way the interface is specified. Worse case we're just one level of indirection away :)

Here if the issue is just the gpu::TargetOptions, maybe it can be declared with the interface definition, which makes it an isolated library?

mlir/lib/Dialect/GPU/Targets/NVPTXTarget.cpp
213 ↗(On Diff #543179)

OK, thanks for explaining, can you add a TODO with this info in the code?

fmorac updated this revision to Diff 545485.Jul 30 2023, 5:07 PM
fmorac marked 4 inline comments as done.

Move the target attribute to NVVM and added it as an external model that it's promised.

fmorac updated this revision to Diff 545486.Jul 30 2023, 5:11 PM

Fix typo in unit test.

fmorac retitled this revision from [mlir][gpu] Adds the NVPTX target attribute. to [mlir][NVVM] Adds the NVVM target attribute..Jul 30 2023, 5:13 PM
fmorac edited the summary of this revision. (Show Details)
mehdi_amini added inline comments.Jul 30 2023, 5:53 PM
mlir/lib/Target/LLVM/NVVM/Target.cpp
56

Do we need to the init in the registration? Can we do the init when the serialization is called instead?

mlir/unittests/Target/LLVM/SerializeNVVMTarget.cpp
34

Not clear to me where we need the native target?

47

Can we tone this down? There is no reason to link all of MLIR into this unit-test.

fmorac updated this revision to Diff 545495.Jul 30 2023, 6:04 PM
fmorac marked 3 inline comments as done.

Addressed comments.

mehdi_amini accepted this revision.Aug 1 2023, 3:06 PM
This revision is now accepted and ready to land.Aug 1 2023, 3:06 PM
mehdi_amini requested changes to this revision.Aug 1 2023, 7:12 PM

(Waiting on update to use ptxas)

This revision now requires changes to proceed.Aug 1 2023, 7:12 PM
fmorac updated this revision to Diff 546432.Aug 2 2023, 6:16 AM

Swtiched to ptxas, added options for stopping compilation at LLVM IR & PTX, as well as unit tests testing the added functionalty.

mehdi_amini added a subscriber: guraypp.

Bring @tra here to get feedback on pros/cons of invoking ptxas through a temp file vs using the library APIs exposed by nvptxcompile.

Bring @tra here to get feedback on pros/cons of invoking ptxas through a temp file vs using the library APIs exposed by nvptxcompile.

One immediate benefit is that we don't longer have a dependency on the toolkit to build this. You can detect ptxas by setting the env variable CUDA_ROOT to the location of the toolkit or by adding it to PATH.

mlir/lib/Target/LLVM/NVVM/Target.cpp
56

Yes, it's possible I'll change it.

mlir/unittests/Target/LLVM/SerializeNVVMTarget.cpp
34

Not needed, removed.

47

Yes, I've removed it.

Bring @tra here to get feedback on pros/cons of invoking ptxas through a temp file vs using the library APIs exposed by nvptxcompile.

One immediate benefit is that we don't longer have a dependency on the toolkit to build this. You can detect ptxas by setting the env variable CUDA_ROOT to the location of the toolkit or by adding it to PATH.

I think compiling by ptxas has merits. One can different version of ptxas in case of performance regression.

guraypp added inline comments.Aug 3 2023, 7:35 AM
mlir/lib/Target/LLVM/NVVM/Target.cpp
291

It would be nice to pass parameters to ptxas from MLIR.

326

What do you think about passsing -v always to ptxas and print the output (perhaps stdout file) using llvm::errs.
I found -v very useful to see compile-time register usage and local memory.

guraypp added inline comments.Aug 3 2023, 7:43 AM
mlir/lib/Target/LLVM/NVVM/Target.cpp
287

Can we keep the PTX file? After codegen, it is very natural to look at the PTX file, if we keep the file via a flag, I think it would be great.

I implemented the dump-ptx flag to the gpu-to-cubin Pass earlier. I guess we cannot use that flag here.

fmorac marked 3 inline comments as done.Aug 3 2023, 9:47 AM
fmorac added inline comments.
mlir/lib/Target/LLVM/NVVM/Target.cpp
287

If you pass -debug-only=serialize-to-ptx (I'm changing it to serialize-to-isa) to mlir-opt then the PTX file will be printed to stdout. This mechanism also allows emitting PTX instead of binary, so you have multiple ways of obtaining the PTX.

291

You can, the args are passed through the targetOptions variable. Below you'll find that I add them to the PTX invocation.

326

We could consider adding a dedicated variable, but as it stands you can pass that variable through the cmdline and use --debug-only=serialize-to-binary to dump that output into stdout.

I think compiling by ptxas has merits. One can different version of ptxas in case of performance regression.

nvptxcompile provides the same facility I believe, doesn't it?

I think compiling by ptxas has merits. One can different version of ptxas in case of performance regression.

nvptxcompile provides the same facility I believe, doesn't it?

Though I haven't personally worked with nvptxcompile, it appears to be bundled with the toolkit. If one could use different toolkit version for PTX compilation, then yes it gives the same facility.

Recently, I used the driver (current MLIR compilation) and ptxas for PTX compilation. I noticed that even if I feed the same PTX code, the driver occasionally generates different SASS with ptxas. Maybe I hit a corner case but it was really hard to find the reason.

I believe it's crucial for us to know the potential disparities between nvptxcompile and ptxas if there is any.

fmorac marked 3 inline comments as done.EditedAug 4 2023, 9:51 AM

My 2 cents, as far as I know the nvptxcompiler and ptxas should produce the same code for a given release. However unlike ptxas the only way to change the version of nvptxcompiler is to recompile the NVVMTarget library with a different version.

Adding support for nvptxcompiler is like extra 20 lines of code in this patch, and a couple on CMake, how do we feel about having both?

If nvptxcompiler is detected during build then nvptxcompiler is used, if not, then ptxas is used, and we also give the user the ability to choose on CMake?

mehdi_amini added a comment.EditedAug 4 2023, 11:20 AM

If nvptxcompiler is detected during build then nvptxcompiler is used, if not, then ptxas is used, and we also give the user the ability to choose on CMake?

Adding a CMake option looks good to me, I would pick a default and stick to it (that is fail the cmake configuration with a helpful message).

I don't like "magic" fallback because detection is fragile and it makes the behavior dependent on the environment of the user. It's harder to know what you build, and getting bug reports it makes it also harder to know what's happening.

fmorac updated this revision to Diff 547733.Aug 7 2023, 5:18 AM

Added NVPTX compiler support & removed unnecessary fwd declarations. The CMake variable MLIR_ENABLE_NVPTXCOMPILER controls the usage of the NVPTXCompiler library and is disabled by default, as CMake 3.20 doesn't support CUDA::nvptxcompiler_static to find the library.

tra added inline comments.Aug 7 2023, 11:05 AM
mlir/lib/Target/LLVM/NVVM/Target.cpp
124

A word of caution -- some linux distributions scatter CUDA SDK across the 'standard' linux filsystem paths, so a single getToolkitPath() would not be able to find all the necessary bits, as libdevice and the binaries will be in different places. You may need additional heuristics along the lines of what clang driver does.
https://github.com/llvm/llvm-project/blob/1b74459df8a6d960f7387f0c8379047e42811f58/clang/lib/Driver/ToolChains/Cuda.cpp#L180

fmorac added inline comments.Aug 7 2023, 2:03 PM
mlir/lib/Target/LLVM/NVVM/Target.cpp
124

Thank you, that's good to know. I'll see how to rework it or add docs indicating how to make it work.
As it stands the user could specify the CUDA path to /usr/lib/cuda for libdevice, somewhat similar to --cuda-path.

For ptxas this mechanism searches several places in the following order:

  1. Command line option.
  2. PATH variable.
  3. The toolkit path as specified by a couple of env variables or the one detected by CMake.

So it would work on Debian & Ubuntu, but the mechanism could be over burdening the user.

fmorac updated this revision to Diff 548030.Aug 7 2023, 7:25 PM

Reapplied clang-format.
Adds a couple of syntax tests & fixes a bug not displaying the link option in the attribute.

mehdi_amini accepted this revision.Aug 7 2023, 7:37 PM

Thanks Fabian!

mlir/CMakeLists.txt
122

The description isn't super clear, because it conflicts with what "the NVPTX backend" is in LLVM.
What about: "Statically link the nvptxlibrary instead of calling ptxas as a subprocess for compiling PTX to cubin" ?

This revision is now accepted and ready to land.Aug 7 2023, 7:37 PM
fmorac updated this revision to Diff 548180.Aug 8 2023, 6:06 AM

Changed the description of the CMake variable MLIR_ENABLE_NVPTXCOMPILER. Applied clang-format again.

fmorac updated this revision to Diff 548238.Aug 8 2023, 8:19 AM

Fixed merge conflicts with D157183.

This revision was automatically updated to reflect the committed changes.