Details
- Reviewers: arsenm, nhaehnle
- Commits: rG8e760e1018d1: [TBLGEN] Inhibit generation of unneeded psets
Event Timeline
FYI, these are all the PSets generated for AMDGPU with this patch, instead of 255 of them:
// Get the name of this register unit pressure set.
const char *AMDGPUGenRegisterInfo::
getRegPressureSetName(unsigned Idx) const {
  static const char *const PressureNameTable[] = {
    "SReg_32",
    "AGPR_32",
    "VGPR_32",
  };
  return PressureNameTable[Idx];
}

// Get the register unit pressure limit for this dimension.
// This limit must be adjusted dynamically for reserved registers.
unsigned AMDGPUGenRegisterInfo::
getRegPressureSetLimit(const MachineFunction &MF, unsigned Idx) const {
  static const uint16_t PressureLimitTable[] = {
    144,  // 0: SReg_32
    256,  // 1: AGPR_32
    256,  // 2: VGPR_32
  };
  return PressureLimitTable[Idx];
}

/// Table of pressure sets per register class or unit.
static const int RCSetsTable[] = {
  /* 0 */ 0, -1,
  /* 2 */ 1, -1,
  /* 4 */ 2, -1,
};
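For readers unfamiliar with the layout: RCSetsTable packs several lists of pressure set IDs back to back, each terminated by -1, and the /* N */ comments mark the offset where each list starts. Below is a minimal standalone sketch of walking one such list. The tables are copied from the output above; the helper pressureSetNamesAt is hypothetical and not part of LLVM or of this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Tables copied from the generated AMDGPU output quoted above.
static const char *const PressureNameTable[] = {"SReg_32", "AGPR_32", "VGPR_32"};
static const int RCSetsTable[] = {
    /* 0 */ 0, -1,
    /* 2 */ 1, -1,
    /* 4 */ 2, -1,
};

// Hypothetical helper (not an LLVM API): collect the names of the pressure
// sets in the -1 terminated list that starts at Offset in RCSetsTable.
std::vector<std::string> pressureSetNamesAt(std::size_t Offset) {
  const std::size_t TableSize = sizeof(RCSetsTable) / sizeof(RCSetsTable[0]);
  assert(Offset < TableSize && "offset outside RCSetsTable");
  std::vector<std::string> Names;
  // Walk the list until the -1 terminator is reached.
  for (const int *PSet = &RCSetsTable[Offset]; *PSet != -1; ++PSet)
    Names.push_back(PressureNameTable[*PSet]);
  return Names;
}
```

With only three sets, every list in the table has a single entry, so each walk ends after one step.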
I did not measure that, but I would guess it should, given that there is much less walking of pset iterators.
@rampitec Since you've been looking at register pressure sets, do you have any suggestions on how to prevent VK16 and VK16WM from being merged on X86? Most of our instructions that use these classes use the VK16WM class, which contains one register fewer than VK16. All of the registers in both classes are allocatable, so tracking VK16WM pressure correctly is important to make sure we don't overcount by 1 in situations where everything is constrained to VK16WM.
From what you are describing, this is dynamic, i.e. it depends on some enabled features and on the constrained RC. Maybe you can do something like what is done in SIRegisterInfo::getRegUnitPressureSets(): https://llvm.org/doxygen/SIRegisterInfo_8cpp_source.html#l01762
I had to exclude register M0 from pressure with this:
static const int Empty[] = { -1 };
if (hasRegUnit(AMDGPU::M0, RegUnit))
  return Empty;
Perhaps you can exclude that extra register conditionally.
We also have to override TRI::getRegPressureLimit() to return different limits depending on some compile-time conditions.
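As a rough illustration of the two ideas above (returning an empty pressure set list for one register unit under some condition, and adjusting the limit to match), here is a standalone sketch. Everything in it is hypothetical: PressureConfig, ExtraRegUnit, the base limit of 8, and both function names are made up for the example and are not X86 or LLVM APIs. A real implementation would live in the target's TargetRegisterInfo overrides.

```cpp
#include <cassert>

// Hypothetical condition, e.g. derived from subtarget features.
struct PressureConfig {
  bool ExcludeExtraUnit;
};

// Hypothetical register unit of the one register that should not be counted.
constexpr unsigned ExtraRegUnit = 0;

static const int MaskSets[] = {0, -1}; // normal list: pressure set 0
static const int Empty[] = {-1};       // empty list: contributes to no set

// Return the -1 terminated list of pressure sets for a register unit,
// dropping the extra unit from all sets when the condition holds.
const int *regUnitPressureSets(unsigned RegUnit, const PressureConfig &Cfg) {
  if (Cfg.ExcludeExtraUnit && RegUnit == ExtraRegUnit)
    return Empty;
  return MaskSets;
}

// Return the limit for the single pressure set modeled here, reduced by one
// when the extra unit is excluded. The base limit is an assumed value.
unsigned pressureSetLimit(unsigned Idx, const PressureConfig &Cfg) {
  assert(Idx == 0 && "sketch models a single pressure set");
  const unsigned BaseLimit = 8;
  return Cfg.ExcludeExtraUnit ? BaseLimit - 1 : BaseLimit;
}
```

Whether such a condition can be expressed cleanly depends on what information is available at the point where the pressure sets are queried, so treat this only as a shape for the idea.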