This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][Docs] Update Code Object V3 example to includes args section
ClosedPublic

Authored by krzysz00 on Jun 30 2021, 12:04 PM.

Details

Summary

The documentation for the AMDGPU assembler's examples don't show the
.args section, which, if ommitted, will cause arguments to silently
not be passed into the kernel. This commit fixes this issue.

Diff Detail

Event Timeline

krzysz00 created this revision.Jun 30 2021, 12:04 PM
krzysz00 requested review of this revision.Jun 30 2021, 12:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2021, 12:04 PM
krzysz00 added a reviewer: Restricted Project.Jun 30 2021, 12:05 PM

LGTM! I'd like to hear from @t-tye or @kzhuravl , whom to my knowledge are contributors to this documentation.

Thank you for improving the docs! This definitely just looks like an oversight on my part when implementing the new metadata.

I have a couple small requests, but otherwise LGTM

llvm/docs/AMDGPUUsage.rst
12191–12196

The argument in this case it a pointer, which I believe we would describe with .value_kind: global_buffer.

We are also using flat store, so this likely should have .address_space: generic, although I think in the frontend we coerce pointer arguments to the global address space because we know the arguments are not in e.g. scratch; with that in mind I think .address_space: global may actually be correct? The TODO in the description of .address_space reads:

.. TODO
                                                        Is "global_buffer" only "global"
                                                        or "constant"? Is
                                                        "dynamic_shared_pointer" always
                                                        "local"? Can HCC allow "generic"?
                                                        How can "private" or "region"
                                                        ever happen?

Maybe one of @kzhuravl @t-tye @rampitec know what the .address_space should be in this case?

Also, expanding on this, would it be useful for us to include a source-language "equivalent" for the example? I.e. the following HIP program should compile to something essentially equivalent:

__global__ void hello_world(float *p) {
  *p = 3.14159f;
}

My understanding of the documentation is that global_buffer causes the kernel launcher to put the argument into a buffer. Is that not the case?

t-tye added a comment.Jun 30 2021, 9:17 PM

My understanding of the documentation is that global_buffer causes the kernel launcher to put the argument into a buffer. Is that not the case?

For compute we are not using buffers in the sense of graphics. I think buffer is more in the spirit (and was the original motivation for) the runtime managing OpenCL buffers.

@kzhuravl or @scott.linder can you actually get a dump of a real HIP program to see what is being used? @yaxunl may also be able to advise.

My understanding of the documentation is that global_buffer causes the kernel launcher to put the argument into a buffer. Is that not the case?

For compute we are not using buffers in the sense of graphics. I think buffer is more in the spirit (and was the original motivation for) the runtime managing OpenCL buffers.

@kzhuravl or @scott.linder can you actually get a dump of a real HIP program to see what is being used? @yaxunl may also be able to advise.

We seem to generate the global_buffer kind for a HIP kernel with a pointer argument: https://gcc.godbolt.org/z/q71nfEs91

I also looked into the address_space question briefly, and at least our OpenCL runtime does not seem to support the generic address space on global_buffer arguments.

krzysz00 updated this revision to Diff 355901.Jul 1 2021, 8:53 AM
  • Add source-language example, per review comments
krzysz00 updated this revision to Diff 355961.Jul 1 2021, 11:45 AM
  • Update argument type to global_buffer

@scott.linder Thanks for checking up on that, I've updated the documentation (and the code that I was writing that spawned this PR) accordingly.

scott.linder accepted this revision.EditedJul 6 2021, 7:49 AM

LGTM, thank you!

If someone notices something I missed, please just follow up with another review, or mention it here.

As a note to to myself, it would likely be a good idea to also pin down the remaining TODO's in the relevant docs for the metadata keys. We could also reword the description of the global_buffer or add some text to make it more obvious to the reader how it relates to source language semantics and runtime concepts. The relevant code in ROCclr seems to be:

There also seems to be a workaround for Image/Pipe arguments at https://github.com/ROCm-Developer-Tools/ROCclr/blob/5cefcaf62893fcd86c8feed6bb1ebb84850fcd2f/device/devkernel.cpp#L1336 that should be looked into (assuming we haven't already addressed it).

Edit: it turns out ROCclr actually just considers "generic" to be "global" for the purposes of global_buffer arguments: https://github.com/ROCm-Developer-Tools/ROCclr/blob/5cefcaf62893fcd86c8feed6bb1ebb84850fcd2f/device/devkernel.hpp#L277

This revision is now accepted and ready to land.Jul 6 2021, 7:49 AM

I don't have write access - can someone land this?