This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][libomptarget] Improve plugin device info printing
ClosedPublic

Authored by kevinsala on Apr 12 2023, 3:39 PM.

Details

Summary

This patch improves the device info printing in the NextGen plugins. The device info consists of key-value properties, each encapsulated in a PrintInfoTy object. Each vendor-specific plugin pushes these properties into a std::deque, and the PluginInterface later processes and prints them. This implementation uses std::string extensively, but that should not be a performance issue, since the printing code is not on any critical path.

For the moment, this patch adds the device info for AMDGPU. It's missing the same changes for the CUDA plugin.
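
The design described in the summary can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: the names InfoEntry, InfoQueue, add, print, and toString are hypothetical, the real PrintInfoTy interface is not shown on this page, and the sketch writes to a std::ostream so that it builds without the LLVM headers. It assumes each property carries a key, an optional value, optional units, and a level that stands for its depth in the info tree (its indentation).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a single device info property.
struct InfoEntry {
  std::string Key;
  std::string Value;
  std::string Units;
  std::size_t Level; // Depth in the info tree, i.e. the indentation level.
};

// Hypothetical queue that plugins fill and the generic layer prints.
class InfoQueue {
  std::deque<InfoEntry> Queue;
  static constexpr std::size_t IndentSize = 4;

public:
  // Called by the vendor-specific code, once per property.
  void add(const std::string &Key, const std::string &Value = "",
           const std::string &Units = "", std::size_t Level = 0) {
    assert(!Key.empty() && "a property needs a key");
    Queue.push_back({Key, Value, Units, Level});
  }

  // Called by the generic code: prints all properties with aligned values.
  void print(std::ostream &OS) const {
    // Widest "indentation + key + colon" prefix among all entries.
    std::size_t MaxKeySize = 0;
    for (const InfoEntry &E : Queue)
      MaxKeySize =
          std::max(MaxKeySize, E.Level * IndentSize + E.Key.size() + 1);

    for (const InfoEntry &E : Queue) {
      std::string Line(E.Level * IndentSize, ' ');
      Line += E.Key + ":";
      if (!E.Value.empty()) {
        // Pad so that every value starts in the same column.
        Line.append(MaxKeySize + IndentSize - Line.size(), ' ');
        Line += E.Value;
        if (!E.Units.empty())
          Line += " " + E.Units;
      }
      OS << Line << "\n";
    }
  }

  // Convenience wrapper that renders the queue into a string.
  std::string toString() const {
    std::ostringstream OS;
    print(OS);
    return OS.str();
  }
};
```

In this sketch, a plugin would call add("Cache") followed by add("L0", "16384", "", 1) to produce a parent entry with an indented child. Collecting every entry before printing is what allows the value column to be sized to the widest key, which a line-by-line printf approach cannot do without a hard-coded width.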

Event Timeline

kevinsala created this revision. Apr 12 2023, 3:39 PM
Herald added a project: Restricted Project. Apr 12 2023, 3:39 PM
kevinsala requested review of this revision. Apr 12 2023, 3:39 PM

Example of the device info printed for an AMDGPU device:

HSA Runtime Version:                 1.1
HSA OpenMP Device Number:            1
Product Name:                        
Device Name:                         gfx906
Vendor Name:                         AMD
Device Type:                         GPU
Max Queues:                          128
Queue Min Size:                      64
Queue Max Size:                      131072
Cache:                               
    L0:                              16384
    L1:                              8388608
Cacheline Size:                      64
Max Clock Freq(MHz):                 1725
Compute Units:                       60
SIMD per CU:                         4
Fast F16 Operation:                  TRUE
Wavefront Size:                      64
Workgroup Max Size:                  1024
Workgroup Max Size per Dimension:    
    x:                               1024
    y:                               1024
    z:                               1024
Max Waves Per CU:                    40
Max Work-item Per CU:                2560
Grid Max Size:                       4294967295
Grid Max Size per Dimension:         
    x:                               4294967295
    y:                               4294967295
    z:                               4294967295
Max fbarriers/Workgrp:               32
Memory Pools:                        
    Pool GLOBAL:                     
        Flags:                       COARSE GRAINED 
        Size:                        34342961152 bytes
        Allocatable:                 TRUE
        Runtime Alloc Granule:       4096 bytes
        Runtime Alloc Alignment:     4096 bytes
        Accessable by all:           FALSE
    Pool GROUP:                      
        Size:                        65536 bytes
        Allocatable:                 FALSE
        Runtime Alloc Granule:       0 bytes
        Runtime Alloc Alignment:     0 bytes
        Accessable by all:           FALSE
ISAs:                                
    Name:                            amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-

Notice it's very similar to the device information printed by the original AMDGPU plugin.

kevinsala updated this revision to Diff 515248. Apr 20 2023, 1:38 AM
kevinsala added a reviewer: josemonsalve2.

Added support for CUDA plugin and improved generic code.

Examples of llvm-omp-device-info output on NVIDIA and AMDGPU devices:

Device (7):
    CUDA Driver Version:              10020
    CUDA OpenMP Device Number:        3
    Device Name:                      Tesla V100-SXM2-16GB
    Global Memory Size:               16911433728 bytes
    Number of Multiprocessors:        80
    Concurrent Copy and Execution:    Yes
    Total Constant Memory:            65536 bytes
    Max Shared Memory per Block:      49152 bytes
    Registers per Block:              65536
    Warp Size:                        32
    Maximum Threads per Block:        1024
    Maximum Block Dimensions:         
        x:                            1024
        y:                            1024
        z:                            64
    Maximum Grid Dimensions:          
        x:                            2147483647
        y:                            65535
        z:                            65535
    Maximum Memory Pitch:             2147483647 bytes
    Texture Alignment:                512 bytes
    Clock Rate:                       1530000 kHz
    Execution Timeout:                No
    Integrated Device:                No
    Can Map Host Memory:              Yes
    Compute Mode:                     Default
    Concurrent Kernels:               Yes
    ECC Enabled:                      Yes
    Memory Clock Rate:                877000 kHz
    Memory Bus Width:                 4096 bits
    L2 Cache Size:                    6291456 bytes
    Max Threads Per SMP:              2048
    Async Engines:                    4
    Unified Addressing:               Yes
    Managed Memory:                   Yes
    Concurrent Managed Memory:        Yes
    Preemption Supported:             Yes
    Cooperative Launch:               Yes
    Multi-Device Boars:               No
    Compute Capabilities:             sm_70
Device (5):
    HSA Runtime Version:                 1.1
    HSA OpenMP Device Number:            1
    Product Name:                        
    Device Name:                         gfx906
    Vendor Name:                         AMD
    Device Type:                         GPU
    Max Queues:                          128
    Queue Min Size:                      64
    Queue Max Size:                      131072
    Cache:                               
        L0:                              16384
        L1:                              8388608
    Cacheline Size:                      64
    Max Clock Freq:                      1725 MHz
    Compute Units:                       60
    SIMD per CU:                         4
    Fast F16 Operation:                  Yes
    Wavefront Size:                      64
    Workgroup Max Size:                  1024
    Workgroup Max Size per Dimension:    
        x:                               1024
        y:                               1024
        z:                               1024
    Max Waves Per CU:                    40
    Max Work-item Per CU:                2560
    Grid Max Size:                       4294967295
    Grid Max Size per Dimension:         
        x:                               4294967295
        y:                               4294967295
        z:                               4294967295
    Max fbarriers/Workgrp:               32
    Memory Pools:                        
        Pool Global:                     
            Flags:                       Coarse Grained 
            Size:                        34342961152 bytes
            Allocatable:                 Yes
            Runtime Alloc Granule:       4096 bytes
            Runtime Alloc Alignment:     4096 bytes
            Accessable by all:           No
        Pool Group:                      
            Size:                        65536 bytes
            Allocatable:                 No
            Runtime Alloc Granule:       0 bytes
            Runtime Alloc Alignment:     0 bytes
            Accessable by all:           No
    ISAs:                                
        Name:                            amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
kevinsala added inline comments. Apr 20 2023, 1:51 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
124

Can be const.

tianshilei1992 added inline comments. Apr 20 2023, 5:33 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
144

why not just printf?

jhuber6 added inline comments. Apr 20 2023, 5:34 AM
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
144

We could even use llvm::outs() if so inclined.

kevinsala updated this revision to Diff 515331. Apr 20 2023, 7:45 AM

Fixing review comments and other improvements.

kevinsala marked 2 inline comments as done. Apr 20 2023, 7:46 AM
kevinsala added inline comments.
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
144

It looks cleaner with llvm::outs()

Other than the comment this looks good to me, cleaner than before. Thanks Kevin.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
88

Maybe add a comment to explain what level means.

/// Level represents the level in the info tree print (i.e. indentation)

Or something like that?

kevinsala updated this revision to Diff 515644. Apr 21 2023, 1:42 AM
kevinsala marked an inline comment as done.

Fixed reviewers' comments

kevinsala updated this revision to Diff 515645. Apr 21 2023, 1:42 AM
kevinsala marked an inline comment as done.

Fixed format

This revision is now accepted and ready to land. May 6 2023, 8:28 PM