This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Calculate number of min/max SGPRs/VGPRs for WavesPerEU instead of using switch statement
ClosedPublic

Authored by kzhuravl on Feb 8 2017, 5:28 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

kzhuravl created this revision.Feb 8 2017, 5:28 PM
arsenm accepted this revision.Feb 8 2017, 5:37 PM

LGTM

This revision is now accepted and ready to land.Feb 8 2017, 5:37 PM
tony-tye added inline comments.Feb 8 2017, 5:53 PM
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
211–233 ↗(On Diff #87740)

Should this also have a calculation that takes into account if a trap handler is present?

243 ↗(On Diff #87740)

Shouldn't this take into account the presence of a trap handler?

return std::min(MaxNumSGPRs - (TRAP_ENABLED ? TRAP_NUM_SGPRS : 0),
                AddressableNumSGPRs);
262–276 ↗(On Diff #87740)

Since getMaxNumVGPRs() was updated to use a calculation should this one too?

kzhuravl retitled this revision from [AMDGPU] Calculate number of available SGPRs/VGPRs for WavesPerEU instead of using switch statement to [AMDGPU] Calculate number of min/max SGPRs/VGPRs for WavesPerEU instead of using switch statement.Feb 8 2017, 9:12 PM
kzhuravl updated this revision to Diff 87763.Feb 8 2017, 10:46 PM
kzhuravl marked an inline comment as done.

Address review comments.

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
211–233 ↗(On Diff #87740)

I do not think so. getMinNumSGPRs is used to bump up granulated sgpr count in pgm rsrc1 register to match maximum number of requested waves per execution unit.

243 ↗(On Diff #87740)

It should. But it depends on trap handler subtarget feature from D26010, which has not landed yet.

tony-tye added inline comments.Feb 9 2017, 12:08 AM
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
211–233 ↗(On Diff #87740)

Right, but the value required to achieve that is based on determining how many registers the kernel must use. That value is reduced if there is a trap handler as the hardware will automatically allocate registers for the trap handler. It would also seem better if the hardcoded numbers here were replaced by the calculation that produces them which can then use the already existing target specific constants. If future hardware changes those limits then the calculation will still be correct.

tony-tye added inline comments.Feb 9 2017, 12:38 AM
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
211–233 ↗(On Diff #87740)

Sorry, missed that the function was updated with the calculation. So ignore that part: of the comment-)

kzhuravl updated this revision to Diff 87868.Feb 9 2017, 1:38 PM
tony-tye accepted this revision.Feb 9 2017, 1:42 PM

LGTM

This revision was automatically updated to reflect the committed changes.