[LIBOMPTARGET] Adding AMD to llvm-omp-device-info
Closed · Public

Authored by josemonsalve2 on Jun 1 2022, 3:46 PM.

Details

Summary

Adds device information printing for AMD devices to the
llvm-omp-device-info command-line tool. The output format is
inspired by the rocminfo command-line tool.

This commit adds the missing HSA functions, enums and structs
needed to query additional information from the HSA agents.
A generic message for the generic-elf-64bit plugin is also added.

Example output:

llvm-omp-device-info
Device (0):
    This is a generic-elf-64bit device

Device (1):
    This is a generic-elf-64bit device

Device (2):
    This is a generic-elf-64bit device

Device (3):
    This is a generic-elf-64bit device

Device (4):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           0
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (5):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           1
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (6):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           2
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE

Device (7):
    HSA Runtime Version:                1.1
    HSA OpenMP Device Number:           3
    Device Name:                        gfx906
    Vendor Name:                        AMD
    Device Type:                        GPU
    Max Queues:                         128
    Queue Min Size:                     64
    Queue Max Size:                     131072
    Cache:
      L0:                               16384 bytes
      L1:                               8388608 bytes
    Cacheline Size:                     64
    Max Clock Freq(MHz):                1725
    Compute Units:                      60
    SIMD per CU:                        4
    Fast F16 Operation:                 TRUE
    Wavefront Size:                     64
    Workgroup Max Size:                 1024
    Workgroup Max Size per Dimension:
      x:                                1024
      y:                                1024
      z:                                1024
    Max Waves Per CU:                   40
    Max Work-item Per CU:               2560
    Grid Max Size:                      4294967295
    Grid Max Size per Dimension:
      x:                                4294967295
      y:                                4294967295
      z:                                4294967295
    Max fbarriers/Workgrp:              32
    Memory Pools:
      Pool GLOBAL; FLAGS: COARSE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GLOBAL; FLAGS: FINE GRAINED, :
        Size:                            34342961152 bytes
        Allocatable:                     TRUE
        Runtime Alloc Granule:           4096 bytes
        Runtime Alloc alignment:         4096 bytes
        Accessable by all:               FALSE
      Pool GROUP:
        Size:                            65536 bytes
        Allocatable:                     FALSE
        Runtime Alloc Granule:           0 bytes
        Runtime Alloc alignment:         0 bytes
        Accessable by all:               FALSE
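In the top-level rows of each device block above, values appear to start at column 40: a 4-space indent, the label and its colon, then space padding. A minimal C++ sketch of a padding helper that reproduces that layout follows. `formatField` and `ValueColumn` are hypothetical names, not functions from rtl.cpp, and the column width is inferred from the sample output rather than taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical: column at which values start in the sample output
// (4-space indent + label + colon + padding = 40 characters).
constexpr std::size_t ValueColumn = 40;

// Builds one "label: value" row. The label is indented by Indent spaces and
// padded so the value begins at ValueColumn; labels too long to fit get a
// single separating space instead.
std::string formatField(std::size_t Indent, const std::string &Label,
                        const std::string &Value) {
  std::string Row(Indent, ' ');
  Row += Label;
  Row += ':';
  if (Row.size() < ValueColumn)
    Row.append(ValueColumn - Row.size(), ' ');
  else
    Row += ' ';
  Row += Value;
  return Row;
}
```

For example, `formatField(4, "Device Name", "gfx906")` should produce the `Device Name:` row shown for each GPU above. Keeping the padding in one helper, rather than in each printf format string, would also address the readability concern about interleaving HSA calls with printf.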

Event Timeline

josemonsalve2 created this revision. Jun 1 2022, 3:46 PM
Herald added a project: Restricted Project. Jun 1 2022, 3:46 PM
josemonsalve2 requested review of this revision. Jun 1 2022, 3:46 PM
Herald added a project: Restricted Project. Jun 1 2022, 3:46 PM
JonChesterfield added a comment (edited). Jun 1 2022, 9:56 PM

I think this is reasonable, assuming that's a feature we want from this tool. I'm not keen on the interleaving of HSA calls and printf, as it makes the code relatively difficult to read, but the overall complexity is still low. Will leave it open for other comments but intend to accept.

Edit: the formatted output is not easily parsable, which is annoying, but it does look very much like rocminfo, so maybe that's fine. I'd have been tempted to render it as JSON or similar by default, and add a helper that turns that format into the rocminfo format if requested. That would be more work than the current patch.

openmp/libomptarget/plugins/amdgpu/dynamic_hsa/hsa_ext_amd.h, line 61

Several of these fields are unused, not sure it's good to add them

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp, line 555

Can this be static or a free function? There's lots of state in it, but I'm hopeful it doesn't mutate anything.

Could you please add support for HSA_ISA_INFO_NAME using hsa_isa_get_info_alt?
It gives the triple and target ID information, e.g. amdgcn-amd-amdhsa--gfx908:sramecc-:xnack-
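The target ID string quoted above bundles several pieces of information. As a sketch of how a caller might split it, the hypothetical `parseIsaName` helper below assumes the layout `<arch>-<vendor>-<os>-<environment>-<processor>` followed by zero or more `:feature+` / `:feature-` suffixes. That layout is inferred from the example string, not taken from the HSA documentation; the empty environment field is what produces the double dash.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Parts of an ISA name such as "amdgcn-amd-amdhsa--gfx908:sramecc-:xnack-".
// Hypothetical type; the layout is inferred from that example string.
struct IsaName {
  std::string Triple;                // e.g. "amdgcn-amd-amdhsa"
  std::string Processor;             // e.g. "gfx908"
  std::vector<std::string> Features; // e.g. {"sramecc-", "xnack-"}
};

IsaName parseIsaName(const std::string &Name) {
  IsaName Result;

  // Everything after the first ':' is a ':'-separated list of target
  // features, each ending in '+' (on) or '-' (off).
  std::size_t Colon = Name.find(':');
  const std::string Target = Name.substr(0, Colon);
  while (Colon != std::string::npos) {
    std::size_t Next = Name.find(':', Colon + 1);
    std::size_t Len =
        Next == std::string::npos ? std::string::npos : Next - Colon - 1;
    Result.Features.push_back(Name.substr(Colon + 1, Len));
    Colon = Next;
  }

  // The processor is the last '-'-separated component; what precedes it is
  // the triple, whose empty environment field leaves trailing '-'s.
  std::size_t Dash = Target.rfind('-');
  if (Dash == std::string::npos) {
    Result.Processor = Target;
    return Result;
  }
  Result.Processor = Target.substr(Dash + 1);
  Result.Triple = Target.substr(0, Dash);
  while (!Result.Triple.empty() && Result.Triple.back() == '-')
    Result.Triple.pop_back();
  return Result;
}
```

With the example string, this sketch yields the triple `amdgcn-amd-amdhsa`, the processor `gfx908`, and the features `sramecc-` and `xnack-`.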

@saiislam I will look into that.

@JonChesterfield I looked at the rocminfo code, and it has separate structs for all the different fields. I didn't want to replicate all of that, since it is not really that important for the runtime itself. However, the JSON idea is good; let's consider it and restructure this.

One option could be to add a device info layer with all the details, which can be accessed by the runtime as well as output as text or JSON. But that should be an extension of this patch.
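A minimal sketch of the device info layer idea, under stated assumptions: the plugin would fill one flat list of label/value pairs, and separate renderers would produce rocminfo-style text or JSON from it. `DeviceInfo`, `renderText` and `renderJson` are hypothetical names; nested groups such as caches and memory pools are left out, and every value is emitted as a JSON string for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical device info layer: a flat, ordered list of label/value pairs
// filled once by the plugin and consumed by any number of renderers.
struct DeviceInfo {
  std::vector<std::pair<std::string, std::string>> Entries;
  void add(std::string Key, std::string Value) {
    Entries.emplace_back(std::move(Key), std::move(Value));
  }
};

// Escapes quotes, backslashes and control characters, as JSON strings require.
std::string jsonEscape(const std::string &S) {
  std::string Out;
  for (char C : S) {
    if (C == '"' || C == '\\') {
      Out += '\\';
      Out += C;
    } else if (static_cast<unsigned char>(C) < 0x20) {
      char Buf[7]; // "\u00XX" plus the terminating NUL
      std::snprintf(Buf, sizeof(Buf), "\\u%04x",
                    static_cast<unsigned>(static_cast<unsigned char>(C)));
      Out += Buf;
    } else {
      Out += C;
    }
  }
  return Out;
}

// Human-readable rendering, one indented "label: value" line per entry.
std::string renderText(const DeviceInfo &Info) {
  std::ostringstream OS;
  for (const auto &[Key, Value] : Info.Entries)
    OS << "    " << Key << ": " << Value << '\n';
  return OS.str();
}

// Machine-readable rendering as a single JSON object.
std::string renderJson(const DeviceInfo &Info) {
  std::ostringstream OS;
  OS << '{';
  for (std::size_t I = 0; I < Info.Entries.size(); ++I) {
    if (I != 0)
      OS << ',';
    OS << '"' << jsonEscape(Info.Entries[I].first) << "\":\""
       << jsonEscape(Info.Entries[I].second) << '"';
  }
  OS << '}';
  return OS.str();
}
```

Separating collection from rendering is what would let the runtime query the same data that the tool prints, and a rocminfo-style renderer could later replace `renderText` without touching the HSA queries.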

openmp/libomptarget/plugins/amdgpu/dynamic_hsa/hsa_ext_amd.h, line 61

OK, I will remove the ones that are not used.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp, line 555

That's reasonable. It should not mutate anything.

I have added the HSA_ISA_INFO_NAME, made the function static, and removed
the unused elements in the enums.

JonChesterfield added inline comments. Jun 2 2022, 3:48 PM
openmp/libomptarget/plugins/amdgpu/dynamic_hsa/hsa_ext_amd.h, line 61

I guess that was unclear; I meant all the enums. For example, a brief manual search suggests HSA_REGION_INFO_RUNTIME_ALLOC_ALLOWED is dead. I would suggest commenting out all of the new ones and then recompiling, as a crude but quick way to produce the minimal set.

I could probably be persuaded that we can have dead code here that happens to match the downstream hsa.h, as it's supposed to be a stable interface, but I have vague fears of the numbers diverging between the two and confusion resulting. Perhaps the right thing to do is to create a test case that checks this file against hsa.h and errors if they diverge.
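One way to sketch the suggested divergence test is a set of `static_assert`s comparing each enumerator in the vendored header with its counterpart in the installed hsa.h. Getting both definitions into one translation unit is the hard part in practice, since both headers declare the same names, so here two namespaces stand in for them. The namespaces, the `CHECK_SAME_VALUE` macro and the enumerator values are placeholders chosen for illustration, not the real HSA definitions.

```cpp
// Stand-in for the enum as vendored in the plugin's dynamic_hsa headers.
namespace vendored {
enum hsa_agent_info_t {
  HSA_AGENT_INFO_NAME = 0,
  HSA_AGENT_INFO_VENDOR_NAME = 1
};
} // namespace vendored

// Stand-in for the same enum as declared by the installed hsa.h.
namespace upstream {
enum hsa_agent_info_t {
  HSA_AGENT_INFO_NAME = 0,
  HSA_AGENT_INFO_VENDOR_NAME = 1
};
} // namespace upstream

// Fails to compile, naming the enumerator, if the two copies disagree.
#define CHECK_SAME_VALUE(NAME)                                                 \
  static_assert(static_cast<long>(vendored::NAME) ==                          \
                    static_cast<long>(upstream::NAME),                        \
                #NAME " differs between dynamic_hsa and hsa.h")

CHECK_SAME_VALUE(HSA_AGENT_INFO_NAME);
CHECK_SAME_VALUE(HSA_AGENT_INFO_VENDOR_NAME);
```

Because the check happens at compile time, a diverging value would break the build with the enumerator's name in the message instead of surfacing as a confusing runtime query result.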

I missed those. I will check all of them.

There's one problem related to hsa_cache_info_t: the version of the cache query I am using is deprecated, but the current version (commented out in the code) segfaults. I think it is best to use the deprecated version, as that is also what rocminfo uses by default. However, I am not sure whether I should keep the commented-out code and the hsa_cache_info_t struct.

Thanks. The comment about using the deprecated interface because the new one segfaults is good, but the commented out code should go. It's straightforward to reimplement later if we wish to.

Cleaning up code to remove unused enums. Removed the commented-out cache_iteration code and all related functions.

kosarev added a reviewer: Restricted Project. Jun 8 2022, 3:31 AM

Thank you!

JonChesterfield accepted this revision. Jun 8 2022, 4:31 AM

I think that's all the comments addressed. Thanks!

This revision is now accepted and ready to land. Jun 8 2022, 4:31 AM

Thanks Jon for all the comments.

There's an error that I think comes from clang-format; do I need to address it before merging?

I checked, and clang-format would change the order of two header files. That was not introduced by me.

Thanks

jhuber6 added a comment (edited). Jun 8 2022, 5:35 AM

The following command should format only the portions you introduced or changed:

git clang-format HEAD~1

arsenm added a subscriber: arsenm. Jun 8 2022, 5:55 AM

It seems unfortunate that the vendor name reported by HSA isn't identical to the one reported by OpenCL.

Running clang-format

This revision was automatically updated to reflect the committed changes.
HaohaiWen added inline comments.
openmp/libomptarget/plugins/amdgpu/src/rtl.cpp, line 750

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp:750:7: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default]
There's a warning for this switch body.
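Clang's -Wcovered-switch-default fires when a `switch` over an enumeration handles every enumerator and still has a `default:` label. The usual fix is to drop the `default:` and handle out-of-range values after the switch, which also keeps -Wswitch able to flag enumerators added later. The sketch below uses a hypothetical stand-in enum loosely modelled on the pool segment names in the sample output (GLOBAL, GROUP); it is not the switch at rtl.cpp:750, whose contents are not shown here.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in enum. The fixed underlying type makes casting any
// int to it well defined, which the fallback path below relies on.
enum segment_t : int {
  SEGMENT_GLOBAL,
  SEGMENT_READONLY,
  SEGMENT_PRIVATE,
  SEGMENT_GROUP
};

std::string_view segmentName(segment_t Segment) {
  // Every enumerator has a case and there is no `default:` label, so
  // -Wcovered-switch-default stays quiet and -Wswitch will still warn if a
  // new enumerator is added without a matching case.
  switch (Segment) {
  case SEGMENT_GLOBAL:
    return "GLOBAL";
  case SEGMENT_READONLY:
    return "READONLY";
  case SEGMENT_PRIVATE:
    return "PRIVATE";
  case SEGMENT_GROUP:
    return "GROUP";
  }
  // Only reached for values outside the enumerators (e.g. a cast int);
  // returning here also avoids -Wreturn-type.
  return "UNKNOWN";
}
```

The LLVM Coding Standards recommend this pattern and typically place `llvm_unreachable` after such a switch; the sketch returns a fallback string instead so that it does not depend on LLVM headers.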

josemonsalve2 reopened this revision. Jun 9 2022, 3:27 AM

Ah! Sorry about this. Quick fix. Working on it.

This revision is now accepted and ready to land. Jun 9 2022, 3:27 AM

Fixing the warning issue on the switch statement.

This revision was automatically updated to reflect the committed changes.