This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix GCNSubtarget::getMinNumVGPRs, add unit test to check consistency between GCNSubtarget's getMinNumVGPRs, getMaxNumVGPRs and getOccupancyWithNumVGPRs.
ClosedPublic

Authored by vpykhtin on Nov 21 2022, 8:32 AM.

Details

Summary
/// \returns Minimum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.
unsigned getMinNumVGPRs(unsigned WavesPerEU) const;

/// \returns Maximum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.
unsigned getMaxNumVGPRs(unsigned WavesPerEU) const;

/// Return the maximum number of waves per SIMD for kernels using \p VGPRs
/// VGPRs
unsigned getOccupancyWithNumVGPRs(unsigned VGPRs) const;

While working on RP tracking issues I noticed that getMinNumVGPRs return incorrect
values: the problem is large VGPR granule sizes on GFX10+ architectures. Some of the
occupancies aren't reachable because require the same amount of VGPR granules as others.
For example 19 waves occupancy on gfx1010 require the same amount of granules as 20 waves
so the resultng occupancy would be 20.

SGPRs have the same issue and even have inconsistency between getMaxNumSGPRs and getOccupancyWithNumSGPRs.
It will be addressed in the next patch.

Legend:

  1. MinVGPR and MaxVGPR are values returned by getMinNumVGPRs and getMaxNumVGPRs for a given Occ.
  2. (ONumber) is the value returned by getOccupancyWithNumVGPRs for a given MinVGPR or MaxVGPR.
  3. R means range problem: MinVGPR should be less than MaxVGPR and both should refer to the same occupancy.

Unit test output without the fix:

./build/unittests/Target/AMDGPU/AMDGPUTests --gtest_filter=AMDGPU.TestVGPRLimitsPerOccupancy --print-cpu-reg-limits

 gfx90a gfx940:
Occ    MinVGPR        MaxVGPR
  8        0 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     80  (O6)
  5       81 (O5)     96  (O5)
  4       97 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      257 (O1)     512 (O1)

 gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c:
Occ    MinVGPR        MaxVGPR
 10        0 (O10)    24  (O10)
  9       25 (O9)     28  (O9)
  8       29 (O8)     32  (O8)
  7       33 (O7)     36  (O7)
  6       37 (O6)     40  (O6)
  5       41 (O5)     48  (O5)
  4       49 (O4)     64  (O4)
  3       65 (O3)     84  (O3)
  2       85 (O2)     128 (O2)
  1      129 (O1)     256 (O1)

 gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    32  (O16)
 15       33 (O12) R  32  (O16)
 14       33 (O12) R  32  (O16)
 13       33 (O12) R  32  (O16)
 12       33 (O12)    40  (O12)
 11       41 (O10) R  40  (O12)
 10       41 (O10)    48  (O10)
  9       49 (O9)     56  (O9)
  8       57 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     80  (O6)
  5       81 (O5)     96  (O5)
  4       97 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      256 (O2) R   256 (O2)

 gfx1100w64 gfx1101w64:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    48  (O16)
 15       49 (O12) R  48  (O16)
 14       49 (O12) R  48  (O16)
 13       49 (O12) R  48  (O16)
 12       49 (O12)    60  (O12)
 11       61 (O10) R  60  (O12)
 10       61 (O10)    72  (O10)
  9       73 (O9)     84  (O9)
  8       85 (O8)     96  (O8)
  7       97 (O7)     108 (O7)
  6      109 (O6)     120 (O6)
  5      121 (O5)     144 (O5)
  4      145 (O4)     192 (O4)
  3      193 (O3)     252 (O3)
  2      253 (O2)     256 (O2)
  1      256 (O2) R   256 (O2)

 gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    64  (O16)
 15       65 (O12) R  64  (O16)
 14       65 (O12) R  64  (O16)
 13       65 (O12) R  64  (O16)
 12       65 (O12)    80  (O12)
 11       81 (O10) R  80  (O12)
 10       81 (O10)    96  (O10)
  9       97 (O9)     112 (O9)
  8      113 (O8)     128 (O8)
  7      129 (O7)     144 (O7)
  6      145 (O6)     160 (O6)
  5      161 (O5)     192 (O5)
  4      193 (O4)     256 (O4)
  3      256 (O4) R   256 (O4)
  2      256 (O4) R   256 (O4)
  1      256 (O4) R   256 (O4)

 gfx1100w32 gfx1101w32:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    96  (O16)
 15       97 (O12) R  96  (O16)
 14       97 (O12) R  96  (O16)
 13       97 (O12) R  96  (O16)
 12       97 (O12)    120 (O12)
 11      121 (O10) R  120 (O12)
 10      121 (O10)    144 (O10)
  9      145 (O9)     168 (O9)
  8      169 (O8)     192 (O8)
  7      193 (O7)     216 (O7)
  6      217 (O6)     240 (O6)
  5      241 (O5)     256 (O5)
  4      256 (O5) R   256 (O5)
  3      256 (O5) R   256 (O5)
  2      256 (O5) R   256 (O5)
  1      256 (O5) R   256 (O5)

 gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64:
Occ    MinVGPR        MaxVGPR
 20        0 (O20)    24  (O20)
 19       25 (O18) R  24  (O20)
 18       25 (O18)    28  (O18)
 17       29 (O16) R  28  (O18)
 16       29 (O16)    32  (O16)
 15       33 (O14) R  32  (O16)
 14       33 (O14)    36  (O14)
 13       37 (O12) R  36  (O14)
 12       37 (O12)    40  (O12)
 11       41 (O11)    44  (O11)
 10       45 (O10)    48  (O10)
  9       49 (O9)     56  (O9)
  8       57 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     84  (O6)
  5       85 (O5)     100 (O5)
  4      101 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      256 (O2) R   256 (O2)

 gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32:
Occ    MinVGPR        MaxVGPR
 20        0 (O20)    48  (O20)
 19       49 (O18) R  48  (O20)
 18       49 (O18)    56  (O18)
 17       57 (O16) R  56  (O18)
 16       57 (O16)    64  (O16)
 15       65 (O14) R  64  (O16)
 14       65 (O14)    72  (O14)
 13       73 (O12) R  72  (O14)
 12       73 (O12)    80  (O12)
 11       81 (O11)    88  (O11)
 10       89 (O10)    96  (O10)
  9       97 (O9)     112 (O9)
  8      113 (O8)     128 (O8)
  7      129 (O7)     144 (O7)
  6      145 (O6)     168 (O6)
  5      169 (O5)     200 (O5)
  4      201 (O4)     256 (O4)
  3      256 (O4) R   256 (O4)
  2      256 (O4) R   256 (O4)
  1      256 (O4) R   256 (O4)

After the fix:

 gfx90a gfx940:
Occ    MinVGPR        MaxVGPR
  8        0 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     80  (O6)
  5       81 (O5)     96  (O5)
  4       97 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      257 (O1)     512 (O1)

 gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c:
Occ    MinVGPR        MaxVGPR
 10        0 (O10)    24  (O10)
  9       25 (O9)     28  (O9)
  8       29 (O8)     32  (O8)
  7       33 (O7)     36  (O7)
  6       37 (O6)     40  (O6)
  5       41 (O5)     48  (O5)
  4       49 (O4)     64  (O4)
  3       65 (O3)     84  (O3)
  2       85 (O2)     128 (O2)
  1      129 (O1)     256 (O1)

 gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    32  (O16)
 15        0 (O16)    32  (O16)
 14        0 (O16)    32  (O16)
 13        0 (O16)    32  (O16)
 12       33 (O12)    40  (O12)
 11       33 (O12)    40  (O12)
 10       41 (O10)    48  (O10)
  9       49 (O9)     56  (O9)
  8       57 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     80  (O6)
  5       81 (O5)     96  (O5)
  4       97 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      169 (O2)     256 (O2)

 gfx1100w64 gfx1101w64:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    48  (O16)
 15        0 (O16)    48  (O16)
 14        0 (O16)    48  (O16)
 13        0 (O16)    48  (O16)
 12       49 (O12)    60  (O12)
 11       49 (O12)    60  (O12)
 10       61 (O10)    72  (O10)
  9       73 (O9)     84  (O9)
  8       85 (O8)     96  (O8)
  7       97 (O7)     108 (O7)
  6      109 (O6)     120 (O6)
  5      121 (O5)     144 (O5)
  4      145 (O4)     192 (O4)
  3      193 (O3)     252 (O3)
  2      253 (O2)     256 (O2)
  1      253 (O2)     256 (O2)

 gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    64  (O16)
 15        0 (O16)    64  (O16)
 14        0 (O16)    64  (O16)
 13        0 (O16)    64  (O16)
 12       65 (O12)    80  (O12)
 11       65 (O12)    80  (O12)
 10       81 (O10)    96  (O10)
  9       97 (O9)     112 (O9)
  8      113 (O8)     128 (O8)
  7      129 (O7)     144 (O7)
  6      145 (O6)     160 (O6)
  5      161 (O5)     192 (O5)
  4      193 (O4)     256 (O4)
  3      193 (O4)     256 (O4)
  2      193 (O4)     256 (O4)
  1      193 (O4)     256 (O4)

 gfx1100w32 gfx1101w32:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    96  (O16)
 15        0 (O16)    96  (O16)
 14        0 (O16)    96  (O16)
 13        0 (O16)    96  (O16)
 12       97 (O12)    120 (O12)
 11       97 (O12)    120 (O12)
 10      121 (O10)    144 (O10)
  9      145 (O9)     168 (O9)
  8      169 (O8)     192 (O8)
  7      193 (O7)     216 (O7)
  6      217 (O6)     240 (O6)
  5      241 (O5)     256 (O5)
  4      241 (O5)     256 (O5)
  3      241 (O5)     256 (O5)
  2      241 (O5)     256 (O5)
  1      241 (O5)     256 (O5)

 gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64:
Occ    MinVGPR        MaxVGPR
 20        0 (O20)    24  (O20)
 19        0 (O20)    24  (O20)
 18       25 (O18)    28  (O18)
 17       25 (O18)    28  (O18)
 16       29 (O16)    32  (O16)
 15       29 (O16)    32  (O16)
 14       33 (O14)    36  (O14)
 13       33 (O14)    36  (O14)
 12       37 (O12)    40  (O12)
 11       41 (O11)    44  (O11)
 10       45 (O10)    48  (O10)
  9       49 (O9)     56  (O9)
  8       57 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     84  (O6)
  5       85 (O5)     100 (O5)
  4      101 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      169 (O2)     256 (O2)

 gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32:
Occ    MinVGPR        MaxVGPR
 20        0 (O20)    48  (O20)
 19        0 (O20)    48  (O20)
 18       49 (O18)    56  (O18)
 17       49 (O18)    56  (O18)
 16       57 (O16)    64  (O16)
 15       57 (O16)    64  (O16)
 14       65 (O14)    72  (O14)
 13       65 (O14)    72  (O14)
 12       73 (O12)    80  (O12)
 11       81 (O11)    88  (O11)
 10       89 (O10)    96  (O10)
  9       97 (O9)     112 (O9)
  8      113 (O8)     128 (O8)
  7      129 (O7)     144 (O7)
  6      145 (O6)     168 (O6)
  5      169 (O5)     200 (O5)
  4      201 (O4)     256 (O4)
  3      201 (O4)     256 (O4)
  2      201 (O4)     256 (O4)
  1      201 (O4)     256 (O4)

Diff Detail

Event Timeline

vpykhtin created this revision.Nov 21 2022, 8:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 21 2022, 8:32 AM
vpykhtin requested review of this revision.Nov 21 2022, 8:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 21 2022, 8:32 AM
vpykhtin retitled this revision from [AMDGPU] Fix getMinNumVGPRs, add unit test to check consistency between getMinNumVGPRs, getMaxNumVGPRs and getOccupancyWithNumVGPRs. to [AMDGPU] Fix GCNSubtarget::getMinNumVGPRs, add unit test to check consistency between GCNSubtarget's getMinNumVGPRs, getMaxNumVGPRs and getOccupancyWithNumVGPRs..Nov 21 2022, 8:49 AM
vpykhtin edited the summary of this revision. (Show Details)
vpykhtin added reviewers: Restricted Project, rampitec.

I thought I noticed funny behavior of the min occupancy while working on D115559

llvm/unittests/Target/AMDGPU/AMDGPUUnitTests.cpp
21

Not sure why you need this

29–31

This is confusing. Probably should move this somewhere else and have a proper declaration for it. Also the name is too generic

33

static

62

raw_string_ostream or raw_svector_ostream

72

static

73–74

Can you go just off the features? This is going to be annoying to maintain (I thought we already had a feature to indicate it supports wave32 vs. wave64 separate from the mode itself, but I don't see it)

77–79

static const std::array?

81

static

87

DenseMap

99

Same as above

111

I'm not sure why you're printing everything to a string before printing it. Why not just directly print if printing is enabled?

122

Can you also check SGPRs and LDS?

arsenm requested changes to this revision.Nov 22 2022, 6:45 AM
This revision now requires changes to proceed.Nov 22 2022, 6:45 AM
vpykhtin marked 2 inline comments as done.Nov 22 2022, 12:10 PM
vpykhtin added inline comments.
llvm/unittests/Target/AMDGPU/AMDGPUUnitTests.cpp
29–31

I just reuse the function from the llvm/unittests/Target/AMDGPU/DwarfRegMappings.cpp, but this can be changed.

62

I have to work with std::string and std::ostream because of the unittest framework.

73–74

The problem is that I cannot know the feature until I have GCNSubtarget ready, but then I have to recreate the machine. Maybe I can test if the GPU has w32 mode and then run it again with w64 mode.

87

DenseMap doesn't work with std::string as a key.

111

This is how unittest framework works. It's multitreaded and uses std::string and std::ostream everywhere. DenseMap doesn't work with std::string.

The actual output happens here as an indication of error:

118  EXPECT_TRUE(ErrStr.empty()) << ErrStr;

When the test pass it shouldn't print anything. I added "--print-cpu-reg-limits" command line option just to be able to print entire results if needed.

122

This is planned with subsequent patches.

vpykhtin updated this revision to Diff 477411.Nov 23 2022, 1:11 AM
vpykhtin marked 2 inline comments as done.
vpykhtin edited the summary of this revision. (Show Details)

Per review issues:

  • Don't rely on arch name, use feature bits for wave32.
  • Refactored common code for unit tests, unified test namings.
vpykhtin marked 10 inline comments as done.
foad added a comment.Nov 23 2022, 3:22 AM
/// \returns Minimum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.
unsigned getMinNumVGPRs(unsigned WavesPerEU) const;

/// \returns Maximum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.
unsigned getMaxNumVGPRs(unsigned WavesPerEU) const;

I think your patch shows that this documentation needs improving here. My understanding is:

getMaxNumVGPRs returns the maximum number of VGPRs that you can use and still achieve at least the specified occupancy.
getMinNumVGPRs returns the minimum number of VGPRs that will prevent you from achieving more than the specified occupancy.

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
629

Why change terminology from "Occupancy" to "NumWaves"? AMDGPUBaseInfo seems to use "WavesPerEU" a lot instead of just "NumWaves".

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
1027

This is divideCeil(NumVGPRs, Granule) * Granule. But how about adding uint64_t alignUp(uint64_t Value, uint64_t Align) to MathExtras.h (like alignDown but no need for the Skew parameter) and using it here?

I think your patch shows that this documentation needs improving here. My understanding is:
getMaxNumVGPRs returns the maximum number of VGPRs that you can use and still achieve at least the specified occupancy.
getMinNumVGPRs returns the minimum number of VGPRs that will prevent you from achieving more than the specified occupancy.

Right, your description is precise, I'll change the documentation.

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
629

Do you think it would be more consistent to use WavesPerEU in AMDGPUBaseInfo? Like getNumWavesPerEUWithNumVGPRs?

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
1027

I just didn't want to change the function body since I moved it but if I changed the name the body can be also imporved :)
MathExtras.h already has alignTo, I'll use it here.

inline uint64_t alignTo(uint64_t Value, uint64_t Align) {
  assert(Align != 0u && "Align can't be 0.");
  return (Value + Align - 1) / Align * Align;
}
vpykhtin updated this revision to Diff 477694.Nov 24 2022, 12:35 AM
  • Updated description for getMinNumVGPRs, getMaxNumVGPRs
  • Used alignTo in getNumWavesWithNumVGPRs
  • Rebased
arsenm added inline comments.Nov 29 2022, 5:54 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
629

Yes

vpykhtin updated this revision to Diff 478597.Nov 29 2022, 7:50 AM
  • Changed function name to getNumWavesPerEUWithNumVGPRs.
  • Rebased.
vpykhtin marked 2 inline comments as done.Nov 29 2022, 7:51 AM
arsenm accepted this revision.Nov 29 2022, 12:42 PM

LGTM with nits

llvm/lib/Target/AMDGPU/GCNSubtarget.h
1221–1222

Remove "you from"

1227–1228

s/"you can use"/"can be used"

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
1027

I'd somewhat prefer non-moving changes to be separate, but doesn't really matter

llvm/unittests/Target/AMDGPU/AMDGPUUnitTests.cpp
101

StringMap?

This revision is now accepted and ready to land.Nov 29 2022, 12:42 PM
vpykhtin updated this revision to Diff 478951.Nov 30 2022, 7:40 AM

Fixed documentation.

vpykhtin marked 3 inline comments as done.Nov 30 2022, 7:45 AM
vpykhtin added inline comments.
llvm/unittests/Target/AMDGPU/AMDGPUUnitTests.cpp
101

I need std::string as a key, StringMap uses StringRef but it doesn't own string data. Its not a big deal here because its only used on test fail or when printing.

arsenm accepted this revision.Nov 30 2022, 8:00 AM
arsenm added inline comments.
llvm/unittests/Target/AMDGPU/AMDGPUUnitTests.cpp
101

DenseMap

vpykhtin marked 2 inline comments as done.Nov 30 2022, 8:51 AM
vpykhtin added inline comments.
llvm/unittests/Target/AMDGPU/AMDGPUUnitTests.cpp
101

DenseMap doesn't work with std::string, it's not specialized for it.

foad added a comment.Dec 1 2022, 2:57 AM

I have no objection to committing this.

The only thing I'd like to try cleaning up in future is that getMinNumVGPRs relies on calling getNumWavesPerEUWithNumVGPRs with a NumVGPRs value greater than getAddressableNumVGPRs and still getting sensible answers, which seems to me like a case that getNumWavesPerEUWithNumVGPRs should not have to handle. I.e. I think we should be able to do this:

--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -1020,6 +1020,7 @@ unsigned getAddressableNumVGPRs(const MCSubtargetInfo *STI) {
 
 unsigned getNumWavesWithNumVGPRs(const MCSubtargetInfo *STI,
                                  unsigned NumVGPRs) {
+  assert(NumVGPRs <= getAddressableNumVGPRs(STI));
   NumVGPRs = std::min(std::max(NumVGPRs, 1u), getAddressableNumVGPRs(STI));
   return std::min(getTotalNumVGPRs(STI) /
                       unsigned(alignTo(NumVGPRs, getVGPRAllocGranule(STI))),

but currently we can't.

llvm/unittests/Target/AMDGPU/AMDGPUUnitTests.cpp
101

StringMap<std:string>?

jmmartinez added inline comments.
llvm/unittests/Target/AMDGPU/AMDGPUUnitTests.cpp
33

Maybe replace std::string by StringRef ?

101

I need std::string as a key, StringMap uses StringRef but it doesn't own string data. Its not a big deal here because its only used on test fail or when printing.

@vpykhtin What do you mean that it doesn't own the string data? When creating the StringMapEntry (see StringMapEntryBase::allocateWithKey) it performs a copy of the input StringRef into its own internal buffer.

The awkward side is that you'll have to use raw_string_ostream for building the ErrStr as shown below (notice the call to .first() instead of .first):

std::string ErrStr;
raw_string_ostream OS(ErrStr);
for (auto &P : TablePerCPUs) {
  for (auto &CPUName : P.second)
    OS << ' ' << CPUName;
  OS << ":\nOcc    Min" << RegName << "        Max" << RegName << '\n'
     << P.first() << '\n';
}
EXPECT_TRUE(ErrStr.empty()) << ErrStr;
jmmartinez added inline comments.Dec 1 2022, 4:58 AM
llvm/unittests/Target/AMDGPU/CMakeLists.txt
6–9

I think AMDGPUUtils is missing from the list. Otherwise I get an undefined reference to llvm::AMDGPU::IsaInfo::getMaxNumVGPRs(llvm::MCSubtargetInfo const*, unsigned int) on my local build.

vpykhtin updated this revision to Diff 479328.Dec 1 2022, 9:06 AM
vpykhtin marked an inline comment as done.
  • Replaced call to getNumWavesPerEUWithNumVGPRs with NumVGPRs above addressable with better code.
  • Assert cannot be added right now at getNumWavesPerEUWithNumVGPRs because it fail other tests.
  • Better naming.

The only thing I'd like to try cleaning up in future is that getMinNumVGPRs relies on calling getNumWavesPerEUWithNumVGPRs with a NumVGPRs value greater than getAddressableNumVGPRs and still getting sensible answers, which seems to me like a case that getNumWavesPerEUWithNumVGPRs should not have to handle.

@foad I replaced call to getNumWavesPerEUWithNumVGPRs with a NumVGPRs value greater than getAddressableNumVGPRs with:

if (MaxNumVGPRs == alignDown(TotNumVGPRs / MaxWavesPerEU, Granule))
  return 0;

but I cannot add assert now because it fails other tests at the moment.

llvm/unittests/Target/AMDGPU/AMDGPUUnitTests.cpp
101

This way you specify Value type as std::string but not the key.

101

<< P.first() << '\n';

I don't see () in my code.

101

When creating the StringMapEntry (see StringMapEntryBase::allocateWithKey) it performs a copy of the input StringRef into its own internal buffer

Ok, but here is the final argument: strings are sorted at the map and this gives nice picture at output :) This map is only used on error or printing, it's not used during usual llvm-check routine so there is no penalty.

llvm/unittests/Target/AMDGPU/CMakeLists.txt
6–9

not sure, my build is ok, I build AMDGPUTests executable. Probably should go to another patch

jmmartinez added inline comments.Dec 2 2022, 8:14 AM
llvm/unittests/Target/AMDGPU/AMDGPUUnitTests.cpp
101

Sorry, I might have not been clear. I meant the .first() in my code snippet.

101

Pretty-printing as an argument sounds good to me.

llvm/unittests/Target/AMDGPU/CMakeLists.txt
6–9

Are you building with -DBUILD_SHARED_LIBS=OFF ? If that's the case please consier reproducing the problem with -DBUILD_SHARED_LIBS=ON and adding the modification I suggested, to avoid breaking other peoples's builds.

vpykhtin updated this revision to Diff 480032.Dec 5 2022, 2:52 AM
  • Rebased.
  • Added AMDGPUUtils reference to CMakelists.txt
vpykhtin marked an inline comment as done.Dec 5 2022, 2:53 AM
vpykhtin added inline comments.
llvm/unittests/Target/AMDGPU/CMakeLists.txt
6–9

Done, thank you!

This revision was landed with ongoing or failed builds.Dec 6 2022, 12:15 AM
This revision was automatically updated to reflect the committed changes.
vpykhtin marked an inline comment as done.