This is an archive of the discontinued LLVM Phabricator instance.

[TableGen][SubtargetEmitter] Add the ability for processor models to describe dependency breaking instructions.
ClosedPublic

Authored by andreadb on Sep 17 2018, 7:39 AM.

Details

Summary

This patch adds the ability for processor models to describe dependency breaking instructions.

Different processors may specify a different set of dependency-breaking instructions.
That means, we cannot assume that all processors of the same target would use the same rules to classify dependency breaking instructions.

The main goal of this patch is to provide the means to describe dependency breaking instructions directly via tablegen, and have the following TargetSubtargetInfo hooks redefined in overrides by tabegen'd XXXGenSubtargetInfo classes (here, XXX is a Target name).

virtual bool isZeroIdiom(const MachineInstr *MI, APInt &Mask) const {
  return false;
}

virtual bool isDependencyBreaking(const MachineInstr *MI, APInt &Mask) const {
  return isZeroIdiom(MI);
}

An instruction MI is a dependency-breaking instruction if a call to method isDependencyBreaking(MI) on the STI (TargetSubtargetInfo object) evaluates to true. Similarly, an instruction MI is a special case of zero-idiom dependency breaking instruction if a call to STI.isZeroIdiom(MI) returns true.
The extra APInt is used for those targets that may want to select which machine operands have their dependency broken (see comments in code).
Note that by default, subtargets don't know about the existence of dependency-breaking. In the absence of external information, those method calls would always return false.

A new tablegen class named STIPredicate has been added by this patch to let processor models classify instructions that have properties in common. The idea is that, a MCInstrPredicate definition can be used to "generate" an instruction equivalence class, with the idea that instructions of a same class all have a property in common.

STIPredicate definitions are essentially a collection of instruction equivalence classes.
Also, different processor models can specify a different variant of the same STIPredicate with different rules (i.e. predicates) to classify instructions. Tablegen backends (in this particular case, the SubtargetEmitter) will be able to process STIPredicate definitions, and automatically generate functions in XXXGenSubtargetInfo.

This patch introduces two special kind of STIPredicate classes named IsZeroIdiomFunction and IsDepBreakingFunction in tablegen. It also adds a definition for those in the BtVer2 scheduling model only.

The definition of zero-idioms in BtVer2 is quite big. For simplicity, in the example below I only reported GPR and AVX zero-idioms variants:

def : IsZeroIdiomFunction<[
  // GPR Zero-idioms.
  DepBreakingClass<[ SUB32rr, SUB64rr, XOR32rr, XOR64rr ], ZeroIdiomPredicate>,

  // AVX Zero-idioms.
  DepBreakingClass<[
    VPXORrr, VPANDNrr, VXORPSrr, VXORPDrr,
    VXORPSYrr, VXORPDYrr, VANDNPSrr, VANDNPDrr,
    VPSUBBrr, VPSUBDrr, VPSUBQrr, VPSUBWrr,
    VPCMPGTBrr, VPCMPGTDrr, VPCMPGTQrr, VPCMPGTWrr
  ], ZeroIdiomPredicate>
>;

This is what the SubtargetEmitter generates for those variants:

bool X86GenSubtargetInfo::isZeroIdiom(const MachineInstr *MI, APInt &Mask) const {
  unsigned ProcessorID = getSchedModel().getProcessorID();
  switch(MI->getOpcode()) {
  default:
    break;
  case X86::SUB32rr:
  case X86::SUB64rr:
  case X86::XOR32rr:
  case X86::XOR64rr:
  case X86::VPXORrr:
  case X86::VPANDNrr:
  case X86::VXORPSrr:
  case X86::VXORPDrr:
  case X86::VXORPSYrr:
  case X86::VXORPDYrr:
  case X86::VANDNPSrr:
  case X86::VANDNPDrr:
  case X86::VPSUBBrr:
  case X86::VPSUBDrr:
  case X86::VPSUBQrr:
  case X86::VPSUBWrr:
  case X86::VPCMPGTBrr:
  case X86::VPCMPGTDrr:
  case X86::VPCMPGTQrr:
  case X86::VPCMPGTWrr:
    if (ProcessorID == 4) {
      Mask.clearAllBits();
      return MI->getOperand(1).getReg() == MI->getOperand(2).getReg();
    }
    break;
  }

  return false;
} // X86GenSubtargetInfo::isZeroIdiom

Rules for different zero-idioms are discriminated based on the processor identifier which comes from the scheduling model.
Note that a similar definition can be generated for MCInst. This patch shows how to do it, and those extra definitions are currently expanded into X86MCInstrDesc.

This patch supersedes the one committed at r338372 for D49310.

The main advantages are:

  • we can describe subtarget predicates via tablegen using STIPredicates.
  • we can describe zero-idioms / dep-breaking instructions directly via tablegen in the scheduling models.

In future, the STIPredicates framework can be used for solving other problems. For example:

  • teach how to identify optimizable register-register moves
  • teach how to identify slow LEA instructions (each subtarget defining its own concept of "slow" LEA).
  • teach how to identify instructions that have undocumented false dependencies on the output registers on some processors only.
  • etc.

It is also (in my opinion) an elegant way to expose knowledge to both external tools like llvm-mca, and codegen passes.
For example, machine schedulers in LLVM could reuse that information when internally constructing the data dependency graph for a code region.

This new design feature is also an "opt-in" feature. Processor models don't have to use the new STIPredicates. It has all been designed to be as unintrusive as possible.

Please let me know what you think.

Thanks,
Andrea

Diff Detail

Repository
rL LLVM

Event Timeline

andreadb created this revision.Sep 17 2018, 7:39 AM
RKSimon added inline comments.Sep 17 2018, 8:11 AM
include/llvm/CodeGen/TargetSubtargetInfo.h
152 ↗(On Diff #165750)

relationship

include/llvm/MC/MCInstrAnalysis.h
94 ↗(On Diff #165750)

\param MI

114 ↗(On Diff #165750)

Please can you make it clear that Mask's bitwidth is total number of operands or just register operands?

Some minors

lib/Target/X86/X86ScheduleBtVer2.td
716 ↗(On Diff #165750)

VANDNPSYrr/VANDNPDYrr?

test/tools/llvm-mca/X86/BtVer2/zero-idioms-avx-256.s
68 ↗(On Diff #165750)

Shouldn't this only take a single resource cycle (0.5 rtp)? IIRC dep-breaking 256-bit ops only needs to process the upper half

tools/llvm-mca/include/Instruction.h
319 ↗(On Diff #165750)

Comments?

tools/llvm-mca/lib/Stages/DispatchStage.cpp
115 ↗(On Diff #165750)

Does this comment need updating to be zero-idiom specific?

utils/TableGen/CodeGenSchedule.cpp
412 ↗(On Diff #165750)

Pass by const ref?

utils/TableGen/CodeGenSchedule.h
311 ↗(On Diff #165750)

Why SmallVector here - the other added classes use std::vector

andreadb marked 8 inline comments as done.Sep 17 2018, 9:33 AM
andreadb added inline comments.
lib/Target/X86/X86ScheduleBtVer2.td
716 ↗(On Diff #165750)

Interestingly, those were missing in the original implementation of X86InstrAnalysis::isDependencyBreaking().

I have added them to the set.
I have also added two extra tests for VANDNPSYrr/VANDNPDYrr in test/tools/llvm-mca/BtVer2/zero-idioms-avx-256.s

test/tools/llvm-mca/X86/BtVer2/zero-idioms-avx-256.s
68 ↗(On Diff #165750)

Nice catch.

I think the latency/throughput of instructions should be fixed by a separate patch. This patch should only help to identify independent operands of an instruction.
I will add a TODO to this test.

utils/TableGen/CodeGenSchedule.h
311 ↗(On Diff #165750)

That vector is expected to be small. In the worst case scenario, it has exactly one element per processor model. It is initialized to 8, because most targets define less than 8 predicates. X86 is the target with most models (9).

If you want, I can use a vector here for consistency.

andreadb updated this revision to Diff 165783.Sep 17 2018, 9:56 AM
andreadb marked 2 inline comments as done.

Patch updated.

Addressed review comments.

Added a TODO comment to test zero-idioms-avx-256.s. Processor resource cycles consumed by VXORPSYrr and VXORPDYrr are wrongly set to 2cy (instead of 1cy).

I like this change. I have a few nits in the comments, but this looks good to me, caveat: I am not a tablegen expert.

include/llvm/Target/TargetInstrPredicate.td
245 ↗(On Diff #165750)

*instructions

266 ↗(On Diff #165750)

*remove one of the 'operands'

andreadb updated this revision to Diff 165944.Sep 18 2018, 5:32 AM
andreadb marked 2 inline comments as done.

Patch updated.

Addressed review comments.

andreadb updated this revision to Diff 165985.Sep 18 2018, 8:29 AM

Patch rebased.

Nice! I only have one comment on the schema, the rest is cosmetic.

include/llvm/Target/TargetInstrPredicate.td
292 ↗(On Diff #165985)

This feels like a double negation to me. What about "ExpandForMC" ? It would also make it more future-rpoof if we want to expand to more stuff.

tools/llvm-mca/lib/InstrBuilder.cpp
458 ↗(On Diff #165985)

This could use two temporary bool variables for readability.

utils/TableGen/CodeGenSchedule.cpp
257 ↗(On Diff #165985)

*processSTIPredicate

andreadb marked 5 inline comments as done.Sep 19 2018, 5:02 AM
andreadb added inline comments.
include/llvm/Target/TargetInstrPredicate.td
292 ↗(On Diff #165985)

Sure, I will change it. Thanks for the feedback!

306 ↗(On Diff #165985)

I will update this comment too.

tools/llvm-mca/lib/InstrBuilder.cpp
458 ↗(On Diff #165985)

I will change it.
I tried to expand that if-stmt a bit, adding extra code comments. Hopefully, it is more readable with code comments. Let me know if you prefer it done in a different way.

utils/TableGen/CodeGenSchedule.cpp
257 ↗(On Diff #165985)

I will fix it. Thanks.

andreadb updated this revision to Diff 166100.Sep 19 2018, 5:22 AM
andreadb marked 3 inline comments as done.

Address review comments.

Also:

  • Improved the description of class PredicateInfo.
  • Removed two unused fields from tablegen class STIPredicate.
  • Added extra verification checks to: a) explicitly disallow InstructionEquivalenceClass definitions with an empty set of opcodes; b) avoid that an instruction opcode is used by multiple equivalence classes of a same STIPredicate.
RKSimon accepted this revision.Sep 19 2018, 7:09 AM

LGTM - the vandps/vandpd btver2 zero-idiom cases need cleaning up but have confirmed with @andreadb offline that he'll do this as a follow-up

lib/Target/X86/X86ScheduleBtVer2.td
716 ↗(On Diff #165750)

Possibly sort all these instructions so its easier to find a specific one?

This revision is now accepted and ready to land.Sep 19 2018, 7:09 AM
This revision was automatically updated to reflect the committed changes.