Index: llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
===================================================================
--- llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -8,7 +8,64 @@
 /// \file
 /// This file implements the targeting of the RegisterBankInfo class for
 /// AMDGPU.
-/// \todo This should be generated by TableGen.
+///
+/// \par
+///
+/// AMDGPU has unique register bank constraints that require special high-level
+/// strategies to deal with. There are two main true physical register banks:
+/// VGPR (vector) and SGPR (scalar). Additionally, the VCC register bank is a
+/// sort of pseudo-register bank needed to represent SGPRs used in a vector
+/// boolean context. There is also the AGPR bank, which is a special-purpose
+/// physical register bank present on some subtargets.
+///
+/// Copying from VGPR to SGPR is generally illegal, unless the value is known to
+/// be uniform. It is generally not valid to legalize operands by inserting
+/// copies as on other targets. Operations which require uniform SGPR operands
+/// generally require scalarization by repeatedly executing the instruction,
+/// activating each set of lanes using a unique set of input values. This is
+/// referred to as a waterfall loop.
+///
+/// \par Booleans
+///
+/// Booleans (s1 values) require special consideration. A vector compare result
+/// is naturally a bitmask with one bit per lane, in a 32-bit or 64-bit
+/// register. These are represented with the VCC bank. During selection, we need
+/// to be able to unambiguously go back from a register class to a register
+/// bank. To distinguish whether an SGPR should use the SGPR or VCC register
+/// bank, we need to know the use context type. An SGPR s1 value always means a
+/// VCC bank value; otherwise it will be the SGPR bank. A scalar compare sets
+/// SCC, which is a 1-bit unaddressable register. This will need to be copied to
+/// a 32-bit virtual register. Taken together, this means we need to adjust the
+/// type of boolean operations to be regbank legal. All SALU booleans need to be
+/// widened to 32 bits, and all VALU booleans need to be s1 values.
+///
+/// A noteworthy exception to the s1-means-VCC rule is for legalization artifact
+/// casts. G_TRUNC s1 results and G_SEXT/G_ZEXT/G_ANYEXT sources are never VCC
+/// bank. A non-boolean source (such as a truncate from a 1-bit load from
+/// memory) will require a copy to the VCC bank, which will require clearing the
+/// high bits and inserting a compare.
+///
+/// \par Constant bus restriction
+///
+/// VALU instructions have a limitation known as the constant bus
+/// restriction. Most VALU instructions can use SGPR operands, but may read at
+/// most 1 SGPR or constant literal value (this is raised to 2 in gfx10 for most
+/// instructions). The limit counts unique SGPRs, so the same SGPR may be used
+/// for multiple operands. From a register bank perspective, any combination of
+/// operands should be legal as an SGPR, but this is contextually dependent on
+/// the SGPR operands all being the same register. It is therefore optimal to
+/// choose the SGPR with the most uses to minimize the number of copies.
+///
+/// We avoid trying to solve this problem in RegBankSelect. Any VALU G_*
+/// operation should have its source operands all mapped to VGPRs (except for
+/// VCC), inserting copies from any SGPR operands. This is the most trivial
+/// legal mapping. Anything beyond the simplest 1:1 instruction selection would
+/// be too complicated to solve here. Every optimization pattern or instruction
+/// selected to multiple outputs would have to enforce this rule, and there
+/// would be additional complexity in tracking this rule for every G_*
+/// operation. Forcing all inputs to VGPRs also simplifies the task of
+/// picking the optimal operand combination in a post-isel optimization pass.
+///
 //===----------------------------------------------------------------------===//
 
 #include "AMDGPURegisterBankInfo.h"