This patch fixes the latency/throughput of LEA instructions in the BtVer2 scheduling model.
On Jaguar, a 3-operands LEA has a latency of 2cy, and a reciprocal throughput of 1.
That is because it uses 1cy of SAGU followed by 1cy of ALU1.
An LEA with a "Scale" operand is also slow, and it has the same latency profile as the 3-operands LEA.
An LEA16r has a latency of 3cy, and a throughput of 0.5 (i.e. an RThroughput of 2.0).
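To make the relationship between the two throughput figures explicit, here is a minimal standalone sketch. The `SchedProfile` struct and the `throughput` helper are hypothetical names used only for illustration; they are not part of the scheduling model or of any LLVM API. The numbers are the ones quoted above.

```cpp
// Hypothetical container for the numbers quoted in this description.
struct SchedProfile {
  unsigned LatencyCycles; // cycles until the result is available
  double RThroughput;     // reciprocal throughput (cycles per instruction)
};

// 3-operands LEA, or LEA with a Scale operand: 1cy SAGU + 1cy ALU1.
constexpr SchedProfile SlowLEA{2, 1.0};
// 16-bit LEA.
constexpr SchedProfile LEA16r{3, 2.0};

// Throughput (instructions per cycle) is the inverse of RThroughput.
constexpr double throughput(const SchedProfile &P) {
  return 1.0 / P.RThroughput;
}
```

With these values, `throughput(LEA16r)` evaluates to 0.5, which matches the figure given for LEA16r.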
This patch adds a new TIIPredicate named IsThreeOperandsLEAFn to X86Schedule.td.
The tablegen backend (for instruction-info) expands that definition into this (file X86GenInstrInfo.inc):
    static bool isThreeOperandsLEA(const MachineInstr &MI) {
      return (
        MI.getOperand(1).isReg()
        && MI.getOperand(1).getReg() != 0
        && MI.getOperand(3).isReg()
        && MI.getOperand(3).getReg() != 0
        && (
          (
            MI.getOperand(4).isImm()
            && MI.getOperand(4).getImm() != 0
          )
          || (MI.getOperand(4).isGlobal())
        )
      );
    }
A similar method is generated in the X86_MC namespace, and included into X86MCTargetDesc.cpp (the declaration lives in X86MCTargetDesc.h).
Back to the BtVer2 scheduling model:
A new scheduling predicate named JSlowLEAPredicate now checks whether the instruction is either a three-operands LEA, or an LEA with a Scale value other than 1.
A variant scheduling class uses that new predicate to correctly select the appropriate latency profile.
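The selection logic described above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: `LeaAddress`, `isSlowLEA`, `LeaSchedClass` and `selectLeaSchedClass` are hypothetical names that model the address operands of an LEA directly, rather than going through MachineInstr or the tablegen'd predicates. Register number 0 stands for the invalid register, as in the generated code.

```cpp
#include <cstdint>

// Hypothetical, simplified view of the address operands of an LEA.
struct LeaAddress {
  unsigned BaseReg;  // 0 models the invalid register (no base)
  int64_t Scale;     // scale applied to the index register
  unsigned IndexReg; // 0 models the invalid register (no index)
  int64_t Disp;      // immediate displacement
  bool DispIsGlobal; // displacement is a global address instead
};

// Mirrors the generated check: base and index are both valid registers,
// and the displacement is a non-zero immediate or a global.
bool isThreeOperandsLEA(const LeaAddress &A) {
  return A.BaseReg != 0 && A.IndexReg != 0 &&
         (A.Disp != 0 || A.DispIsGlobal);
}

// Slow if it is a three-operands LEA, or if Scale is other than 1.
bool isSlowLEA(const LeaAddress &A) {
  return isThreeOperandsLEA(A) || A.Scale != 1;
}

// Illustrative stand-in for the two alternatives of the variant class.
enum class LeaSchedClass { Default, Slow };

LeaSchedClass selectLeaSchedClass(const LeaAddress &A) {
  return isSlowLEA(A) ? LeaSchedClass::Slow : LeaSchedClass::Default;
}
```

For instance, an address with base, index and a displacement of 8 selects `LeaSchedClass::Slow`, while the same address without an index register (and Scale equal to 1) selects `LeaSchedClass::Default`.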
This patch is essentially structured in two parts:
- A first part that adds a common MCPredicate to X86Schedule.td
- A second part that uses the new predicate to construct a variant class for LEA in the BtVer2 model only.
To help the implementation of part 1, the predicate expander gained the ability to check whether a register operand references the invalid register. That check is needed by IsThreeOperandsLEAFn.
This new TII hook can now be used in other parts of LLVM.
For example, this patch uses it in X86FixupLEA.cpp. Note that the auto-generated isThreeOperandsLEA() is semantically equivalent to the old function in X86FixupLEA.cpp.
As a side note: if people prefer, this patch can be committed in two revisions. The first part would be an NFC, while the second part would contain the actual change to the BtVer2 model.
Please let me know what you think.
Thanks
-Andrea
Add a TODO explaining that this should move to the scheduler model variants at some point?