This is an archive of the discontinued LLVM Phabricator instance.

[X86][BtVer2] correctly model the latency/throughput of LEA instructions.
ClosedPublic

Authored by andreadb on Jul 17 2018, 10:37 AM.

Details

Summary

This patch fixes the latency/throughput of LEA instructions in the BtVer2 scheduling model.

On Jaguar, A 3-operands LEA has a latency of 2cy, and a reciprocal throughput of 1.
That is because it uses one cycle of SAGU followed by 1cy of ALU1.

An LEA with a "Scale" operand is also slow, and it has the same latency profile as the 3-operands LEA.
An LEA16r has a latency of 3cy, and a throughput of 0.5 (i.e. RThrouhgput of 2.0).

This patch adds a new TIIPredicate named IsThreeOperandsLEAFn to X86Schedule.td.
The tablegen backend (for instruction-info) expands that definition into this (file X86GenInstrInfo.inc):

static bool isThreeOperandsLEA(const MachineInstr &MI) {
  return (
    MI.getOperand(1).isReg()
    && MI.getOperand(1).getReg() != 0
    && MI.getOperand(3).isReg()
    && MI.getOperand(3).getReg() != 0
    && (
      (
        MI.getOperand(4).isImm()
        && MI.getOperand(4).getImm() != 0
      )
      || (MI.getOperand(4).isGlobal())
    )
  );
}

A similar method is generated in the X86_MC namespace, and included into X86MCTargetDesc.cpp (the declaration lives in X86MCTargetDesc.h).

Back to the BtVer2 scheduling model:
A new scheduling predicate named JSlowLEAPredicate now checks if either the instruction is a three-operands LEA, or it is an LEA with a Scale value different than 1.
A variant scheduling class uses that new predicate to correctly select the appropriate latency profile.

This patch is essentially structured in two parts:

  • A first part that adds a common MCPredicate to X86Schedule.td
  • A second part that uses the new predicate to construct a variant class for LEA in the BtVer2 model only.

To help the implementation of part 1, the predicate expanded gained the ability to check if a register operand references the invalid register. That check is needed by IsThreeOperandsLEAFn.

This new TII hook can now be used in other parts of LLVM.
For example, this patch uses it in X86FixupLEA.cpp. Note that the auto-generated isThreeOperandsLEA() is semantically equivalent to the old function in X86FixupLEA.cpp.

As a side note: if people prefer, this patch can be committed in two revisions. The first part would be an NFC, while the second part would contain the actual change to the BtVer2 model.

Please let me know what you think.

Thanks
-Andrea

Diff Detail

Repository
rL LLVM

Event Timeline

andreadb created this revision.Jul 17 2018, 10:37 AM
mattd added a comment.Jul 17 2018, 5:38 PM

I think it makes sense to have a separate patch for checkInvalidRegOperand. I do like that idea in particular, it simplifies reading the tablegen sources.

utils/TableGen/PredicateExpander.cpp
50 ↗(On Diff #155911)

I'm curious about the hardcoded value "0", is the invalid register always 0 for all targets?

craig.topper added inline comments.Jul 17 2018, 5:59 PM
utils/TableGen/PredicateExpander.cpp
50 ↗(On Diff #155911)

It should be. Every target should get NoRegister as the first entry in its enum.

I think it makes sense to have a separate patch for checkInvalidRegOperand. I do like that idea in particular, it simplifies reading the tablegen sources.

I will commit it directly since it is a very small/minor change.
I will upload a new patch soon.

andreadb updated this revision to Diff 156053.Jul 18 2018, 5:33 AM

Patch updated.

Tablegen changes related to the new "invalid register operand" checker are no longer part of this patch. Those changes have been committed at r337378.

Scheduling predicates that are common to all X86 subtargets have been moved to a separate .td file named X86SchedPredicates.td.
This chage is motivated by the fact that CheckOpcode predicates require that tablegen instruction definitions are already visible/accessible. That means, file X86SchedPredicates.td has to be included after the #include X86InstrInfo.td in X86.td.

The auto-generated definition of isThreeOperandsLEA() now looks like this:

static bool isThreeOperandsLEA(const MachineInstr &MI) {
  return (
    (
      MI.getOpcode() == X86::LEA32r
      || MI.getOpcode() == X86::LEA64r
      || MI.getOpcode() == X86::LEA64_32r
      || MI.getOpcode() == X86::LEA16r
    )
    && MI.getOperand(1).isReg()
    && MI.getOperand(1).getReg() != 0
    && MI.getOperand(3).isReg()
    && MI.getOperand(3).getReg() != 0
    && (
      (
        MI.getOperand(4).isImm()
        && MI.getOperand(4).getImm() != 0
      )
      || (MI.getOperand(4).isGlobal())
    )
  );
}
RKSimon added inline comments.Jul 19 2018, 4:12 AM
lib/Target/X86/X86FixupLEAs.cpp
289 ↗(On Diff #156053)

Add a TODO explaining that this should move to the scheduler model variants at some point?

lib/Target/X86/X86SchedPredicates.td
16 ↗(On Diff #156053)

Add description comments to each predicate.

andreadb updated this revision to Diff 156249.Jul 19 2018, 5:48 AM

Patch updated.

Addressed review comments.

RKSimon accepted this revision.Jul 19 2018, 9:11 AM

LGTM

This revision is now accepted and ready to land.Jul 19 2018, 9:11 AM
This revision was automatically updated to reflect the committed changes.