[TBLGEN] Inhibit generation of unneeded psets
ClosedPublic

Authored by rampitec on Feb 17 2020, 3:28 PM.

Event Timeline

rampitec created this revision. Feb 17 2020, 3:28 PM
Herald added a project: Restricted Project. Feb 17 2020, 3:28 PM
arsenm accepted this revision. Feb 17 2020, 3:32 PM

How much does this help compile time?

This revision is now accepted and ready to land. Feb 17 2020, 3:32 PM

FYI, these are all the PSets generated for AMDGPU with this patch, instead of 255 of them:

// Get the name of this register unit pressure set.
const char *AMDGPUGenRegisterInfo::
getRegPressureSetName(unsigned Idx) const {
  static const char *const PressureNameTable[] = {
    "SReg_32",
    "AGPR_32",
    "VGPR_32",
  };
  return PressureNameTable[Idx];
}

// Get the register unit pressure limit for this dimension.
// This limit must be adjusted dynamically for reserved registers.
unsigned AMDGPUGenRegisterInfo::
getRegPressureSetLimit(const MachineFunction &MF, unsigned Idx) const {
  static const uint16_t PressureLimitTable[] = {
    144,        // 0: SReg_32
    256,        // 1: AGPR_32
    256,        // 2: VGPR_32
  };
  return PressureLimitTable[Idx];
}

/// Table of pressure sets per register class or unit.
static const int RCSetsTable[] = {
  /* 0 */ 0, -1,
  /* 2 */ 1, -1,
  /* 4 */ 2, -1,
};
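
Each list in RCSetsTable ends with a -1 sentinel, so a consumer walks entries until it reaches -1; fewer sets mean shorter walks. A minimal standalone sketch of that walk, assuming the commented offsets (0, 2, 4) mark where each list starts. collectPSets is a hypothetical helper for illustration, not an LLVM function:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy of the generated table above: every list of pressure set IDs ends with -1.
static const int RCSetsTable[] = {
  /* 0 */ 0, -1,
  /* 2 */ 1, -1,
  /* 4 */ 2, -1,
};

// Hypothetical helper (not part of LLVM): collect the pressure set IDs of the
// list that starts at index Start in RCSetsTable.
std::vector<int> collectPSets(std::size_t Start) {
  assert(Start < sizeof(RCSetsTable) / sizeof(RCSetsTable[0]) && "bad offset");
  std::vector<int> Sets;
  // Walk until the -1 sentinel that terminates this list.
  for (const int *PSet = &RCSetsTable[Start]; *PSet != -1; ++PSet)
    Sets.push_back(*PSet);
  return Sets;
}
```

With this table, each start offset yields a single pressure set ID, so every walk stops after one step.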

How much does this help compile time?

I did not measure that, but I guess it should help, given how much less walking of pset iterators there is.

This revision was automatically updated to reflect the committed changes.

@rampitec Since you've been looking at register pressure sets, do you have any suggestions on how to prevent VK16 and VK16WM from being merged on X86? Most of our instructions that use these classes use the VK16WM class, which contains one less register than VK16. All of the registers in both classes are allocatable, so tracking VK16WM pressure correctly is important to make sure we don't overcount by 1 in situations where everything is constrained to VK16WM.

From what you are describing, this is dynamic, i.e. it depends on which features are enabled and on the constrained RC. Maybe you can do something like what SIRegisterInfo::getRegUnitPressureSets() does: https://llvm.org/doxygen/SIRegisterInfo_8cpp_source.html#l01762
I had to exclude register M0 from pressure with this:

static const int Empty[] = { -1 };

if (hasRegUnit(AMDGPU::M0, RegUnit))
  return Empty;

Perhaps you can exclude that extra register conditionally.

We also have to override TRI::getRegPressureLimit() to return different limits depending on some compile-time conditions.
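
The two ideas above (conditionally returning an empty pressure set list for one register unit, and returning a limit that depends on a condition) can be modelled outside LLVM roughly as follows. This is only a sketch under stated assumptions: ToyRegisterInfo, ExtraRegUnit, ExcludeExtra, MaskSets, and the limits 7 and 8 are all hypothetical, and the real hooks take LLVM types such as MachineFunction that are omitted here.

```cpp
#include <cassert>

// Hypothetical stand-ins for tablegen-generated data. Lists end with -1.
static const int Empty[] = { -1 };
static const int MaskSets[] = { 0, -1 };

// Toy model, not an LLVM class: it only mirrors the shape of the overrides.
struct ToyRegisterInfo {
  unsigned ExtraRegUnit; // unit of the register to exclude (assumed)
  bool ExcludeExtra;     // condition under which it is excluded (assumed)

  // Report no pressure sets for the excluded unit, the default list otherwise.
  const int *getRegUnitPressureSets(unsigned RegUnit) const {
    if (ExcludeExtra && RegUnit == ExtraRegUnit)
      return Empty;
    return MaskSets;
  }

  // Illustrative limits only: one unit fewer when the extra unit is excluded.
  unsigned getRegPressureSetLimit(unsigned Idx) const {
    assert(Idx == 0 && "toy model has a single pressure set");
    return ExcludeExtra ? 7u : 8u;
  }
};
```

For example, ToyRegisterInfo{0, true} reports an empty list for unit 0 and a limit of 7, while ToyRegisterInfo{0, false} reports set 0 for every unit and a limit of 8.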