This is an archive of the discontinued LLVM Phabricator instance.

[llvm][nvptx] Add sm_90a
ClosedPublic

Authored by guraypp on Jul 20 2023, 8:29 AM.

Details

Summary

This works adds sm_90a as nvptx target. sm_90a is required to generate wgmma and setmaxnreg instructions.

Here is information about "a" prefix in PTX document:
Target architectures with suffix “a”, such as sm_90a, include architecture-accelerated features that are supported on the specified architecture only, hence such targets do not follow the onion layer model. Therefore, PTX code generated for such targets cannot be run on later generation devices. Architecture-accelerated features can only be used with targets that support these features.

Diff Detail

Event Timeline

guraypp created this revision.Jul 20 2023, 8:29 AM
guraypp requested review of this revision.Jul 20 2023, 8:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2023, 8:29 AM

I am using NVPTX backend from MLIR. Adding sm_90a like this works fine to generate wgmma instructions in my case.

Please let me if I am missing something here.

tra added a comment.Jul 20 2023, 10:16 AM

It's unfortunate that NVIDIA decided to break existing naming convention. :-/

PTX code generated for such targets cannot be run on later generation devices. Architecture-accelerated features can only be used with targets that support these features.

We do have a number of places where we compare SM values, so adding support for sm90a will require changing the way we encode SM version and the way we determine availability of various instructions/intrinsics/builtins. Properly implementing it will be a bit of a pain.

This patch effectively makes sm_90a a synonym for sm_90, only with additional constraint of requiring PTX 8.0.

It does not hurt, but it also does not do anything you would not be able to do by specifying sm_90 and ptx80 manually. Do you really need this change? Or do you plan making further changes to fully implement support for sm_90a?

llvm/lib/Target/NVPTX/NVPTX.td
46

Nit: I'd move it up to where we define other SMxx records.

Thanks for review! Having a synonym sm_90a is actually a big help for generating .target sm_90a in PTX code. ptxas throws an error for wgmma instructions with .target sm_90.

By the way, wgmma are new tensor core instructions for hopper, and they're only supported for sm_90a. MLIR generates them as inline assembly, and one can generate them using asm in CUDA (like CUTLASS).

It would be really great if we can land this workaround until we find a proper solution.

I saw that sm_90a requires PTX 8.0, see below:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes-ptx-release-history

tra accepted this revision.Jul 20 2023, 1:24 PM

SGTM.

This revision is now accepted and ready to land.Jul 20 2023, 1:24 PM
guraypp updated this revision to Diff 542760.Jul 20 2023, 9:26 PM

move sm90a definition to up

This revision was landed with ongoing or failed builds.Jul 24 2023, 6:12 AM
This revision was automatically updated to reflect the committed changes.