This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Start documenting calling conventions. NFC
ClosedPublic

Authored by rovka on Jun 2 2023, 6:44 AM.

Details

Summary

Add a section to AMDGPUUsage.rst about calling conventions and list the
ones from the CallingConv enum. Full descriptions can come later (help
appreciated).

Diff Detail

Event Timeline

rovka created this revision.Jun 2 2023, 6:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 2 2023, 6:44 AM
rovka requested review of this revision.Jun 2 2023, 6:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 2 2023, 6:44 AM
Herald added a subscriber: wdng. · View Herald Transcript
arsenm added inline comments.Jun 2 2023, 12:14 PM
llvm/docs/AMDGPUUsage.rst
1084

"graphics targets?" Doesn't really say how it's not an entry point

There could probably be a general section here to state that in the graphics calling conventions, inreg is used to denote arguments mapped to SGPRs, while other arguments are mapped to VGPRs.

llvm/docs/AMDGPUUsage.rst
1050–1054

The logic between Mesa and AMDPAL here is actually the same.

Can you reupload with full context? If you're using the web interface I think https://www.llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface is still accurate, so something like:

git show HEAD -U999999 > mypatch.patch

I ask because I think this should be cross-referenced with (or maybe even combined with) the existing Call Convention docs around line ~13500:

Call Convention
~~~~~~~~~~~~~~~

.. note::

  This section is currently incomplete and has inaccuracies. It is WIP that will
  be updated as information is determined.

See :ref:`amdgpu-dwarf-address-space-identifier` for information on swizzled
addresses. Unswizzled addresses are normal linear addresses.

.. _amdgpu-amdhsa-function-call-convention-kernel-functions:

Kernel Functions
++++++++++++++++

This section describes the call convention ABI for the outer kernel function.
...
llvm/docs/AMDGPUUsage.rst
1039

Nit: capitalize

1049

The omission of the default ccc convention seems like it might lead to confusion. The backend supports it, so I would suggest just adding it at the front of the list and making it clear that it is the default.

scchan added a subscriber: scchan.Jun 6 2023, 3:50 PM
scchan added inline comments.
llvm/docs/AMDGPUUsage.rst
1048

Nit: this list seems to be in a random order, can we sort this in alphabetical order or organize it in a more meaningful way?

rovka updated this revision to Diff 529266.Jun 7 2023, 5:27 AM

Address review comments.

rovka marked 3 inline comments as done.Jun 7 2023, 5:32 AM
rovka added inline comments.
llvm/docs/AMDGPUUsage.rst
1048

Ok, I alphabetized it for now, except for the ccc which looks good at the top imo. I'm open to other suggestions.

1084

"graphics targets?"

...apparently :) I just copied the comments from the enum, I have no idea what this is used for (one of the reasons why I thought we should start writing these things up in more detail).

Doesn't really say how it's not an entry point

I mentioned that now. Ideally we'd describe it more thoroughly in a follow-up patch.

rovka added a comment.Jun 7 2023, 5:59 AM

Can you reupload with full context? If you're using the web interface I think https://www.llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface is still accurate, so something like:

git show HEAD -U999999 > mypatch.patch

Sorry, I haven't used the web interface in a while so I forgot that part... Anyway, I got arc working again now, so it shouldn't happen again in the future.

I ask because I think this should be cross-referenced with (or maybe even combined with) the existing Call Convention docs around line ~13500:

Call Convention
~~~~~~~~~~~~~~~

.. note::

  This section is currently incomplete and has inaccuracies. It is WIP that will
  be updated as information is determined.

See :ref:`amdgpu-dwarf-address-space-identifier` for information on swizzled
addresses. Unswizzled addresses are normal linear addresses.

.. _amdgpu-amdhsa-function-call-convention-kernel-functions:

Kernel Functions
++++++++++++++++

This section describes the call convention ABI for the outer kernel function.
...

I added a :ref: to this from `amdgpu_kernel. There should probably be a reference to amdgpu-amdhsa-function-call-convention-non-kernel-functions` somewhere but I'm not sure where. Is this actually the ccc?

Can you reupload with full context? If you're using the web interface I think https://www.llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface is still accurate, so something like:

git show HEAD -U999999 > mypatch.patch

Sorry, I haven't used the web interface in a while so I forgot that part... Anyway, I got arc working again now, so it shouldn't happen again in the future.

No worries!

I ask because I think this should be cross-referenced with (or maybe even combined with) the existing Call Convention docs around line ~13500:

Call Convention
~~~~~~~~~~~~~~~

.. note::

  This section is currently incomplete and has inaccuracies. It is WIP that will
  be updated as information is determined.

See :ref:`amdgpu-dwarf-address-space-identifier` for information on swizzled
addresses. Unswizzled addresses are normal linear addresses.

.. _amdgpu-amdhsa-function-call-convention-kernel-functions:

Kernel Functions
++++++++++++++++

This section describes the call convention ABI for the outer kernel function.
...

I added a :ref: to this from `amdgpu_kernel. There should probably be a reference to amdgpu-amdhsa-function-call-convention-non-kernel-functions` somewhere but I'm not sure where. Is this actually the ccc?

Perfect, thank you!

And yes, AFAIU we just use the default ccc for non-kernel functions. I am actually only basing this off the observation that the IR doesn't print an explicit CC, though; I'm not really sure what else it could be?

rovka updated this revision to Diff 529521.Jun 8 2023, 12:54 AM

Add ref to the description of the calling convention for AMDHSA non-kernel functions to the entry about the ccc.

arsenm accepted this revision.Jun 8 2023, 5:46 AM
arsenm added inline comments.
llvm/docs/AMDGPUUsage.rst
1080

Also fastcc but currently treated identically to the default ccc

This revision is now accepted and ready to land.Jun 8 2023, 5:46 AM
rovka updated this revision to Diff 529926.Jun 9 2023, 5:28 AM

Added the fast and cold calling conventions. AFAICT the only difference between these and the C calling convention is how hard we try to TCO.

sebastian-ne added inline comments.
llvm/docs/AMDGPUUsage.rst
1096–1099

For the description: A difference between amdgpu_gfx and the C calling convention is that in amdgpu_gfx, SGPR arguments are callee-save (preserved), in the C cc, they are not.
Otherwise, there is not much difference.

rovka added inline comments.Jun 13 2023, 4:16 AM
llvm/docs/AMDGPUUsage.rst
1096–1099

Surely that's one difference, but I'm not 100% convinced it's the only one. I think we should leave the more detailed description for a future patch.

nhaehnle accepted this revision.Jun 19 2023, 9:11 AM
This revision was automatically updated to reflect the committed changes.

There could probably be a general section here to state that in the graphics calling conventions, inreg is used to denote arguments mapped to SGPRs, while other arguments are mapped to VGPRs.

@arsenm does compute have this available? I've periodically wanted to be able to write a function which assumes a parameter is passed in SGPRs and hope for a compile time error calling it if that doesn't work out.

@arsenm does compute have this available? I've periodically wanted to be able to write a function which assumes a parameter is passed in SGPRs and hope for a compile time error calling it if that doesn't work out.

No, I just consider this a bug. The handling for inreg was artificially limited to amdgpu_gfx. You would also get a waterfall loop, not an error.

Semi-related, I think we should start interpreting i1 argument values as SGPR masks (i1 inreg would be extended to sreg32)

Semi-related, I think we should start interpreting i1 argument values as SGPR masks (i1 inreg would be extended to sreg32)

Yeah, that makes a lot of sense. I think one challenge for compute to truly leverage this is that I believe Clang emits bool as i8. Or is that only in struct types?

Semi-related, I think we should start interpreting i1 argument values as SGPR masks (i1 inreg would be extended to sreg32)

Yeah, that makes a lot of sense. I think one challenge for compute to truly leverage this is that I believe Clang emits bool as i8. Or is that only in struct types?

If it's a pure bool I know you get zeroext i1. It does look like struct elements give i8