This patch adds the ability for processor models to describe dependency-breaking instructions.
Different processors may specify a different set of dependency-breaking instructions.
That means we cannot assume that all processors of the same target use the same rules to classify dependency-breaking instructions.
The main goal of this patch is to provide the means to describe dependency-breaking instructions directly via tablegen, and to have the following TargetSubtargetInfo hooks overridden by the tablegen'd XXXGenSubtargetInfo classes (here, XXX is a Target name).
  virtual bool isZeroIdiom(const MachineInstr *MI, APInt &Mask) const {
    return false;
  }

  virtual bool isDependencyBreaking(const MachineInstr *MI, APInt &Mask) const {
    return isZeroIdiom(MI, Mask);
  }
An instruction MI is a dependency-breaking instruction if a call to method isDependencyBreaking(MI) on the STI (TargetSubtargetInfo object) evaluates to true. Similarly, an instruction MI is a zero-idiom (a special case of dependency-breaking instruction) if a call to STI.isZeroIdiom(MI) returns true.
The extra APInt is used for those targets that may want to select which machine operands have their dependency broken (see comments in code).
Note that by default, subtargets don't know about the existence of dependency-breaking instructions. In the absence of external information, those method calls always return false.
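To make the default behavior concrete, here is a minimal standalone sketch of the two hooks. It does not use LLVM: ToyInstr, OperandMask, ToySubtargetInfo, ToyCpuSubtargetInfo and the TOY_* opcodes are hypothetical stand-ins for MachineInstr, APInt and a tablegen'd subtarget class. It also assumes that an all-zero mask means "the dependency is broken for every register use"; the real convention is the one documented in the code comments.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for MachineInstr and APInt; these are not LLVM types.
struct ToyInstr {
  unsigned Opcode;
  std::vector<unsigned> Regs; // Regs[0] is the definition, the rest are uses.
};
using OperandMask = std::uint64_t; // Bit i refers to operand i.

enum : unsigned { TOY_ADD = 1, TOY_XOR = 2 };

struct ToySubtargetInfo {
  virtual ~ToySubtargetInfo() = default;

  // Default: a subtarget knows nothing about zero-idioms.
  virtual bool isZeroIdiom(const ToyInstr &MI, OperandMask &Mask) const {
    return false;
  }

  // Default: the only dependency-breaking instructions are zero-idioms.
  virtual bool isDependencyBreaking(const ToyInstr &MI,
                                    OperandMask &Mask) const {
    return isZeroIdiom(MI, Mask);
  }
};

// A processor that opts in, in the way a generated override might.
struct ToyCpuSubtargetInfo : ToySubtargetInfo {
  bool isZeroIdiom(const ToyInstr &MI, OperandMask &Mask) const override {
    assert(MI.Regs.size() >= 3 && "expected one definition and two uses");
    if (MI.Opcode != TOY_XOR)
      return false;
    Mask = 0; // Assumption: an empty mask selects all register uses.
    return MI.Regs[1] == MI.Regs[2];
  }
};
```

With this sketch, querying a plain ToySubtargetInfo returns false for every instruction and leaves the mask untouched, while ToyCpuSubtargetInfo reports an XOR of a register with itself as both a zero-idiom and a dependency-breaking instruction.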
A new tablegen class named STIPredicate has been added by this patch to let processor models classify instructions that have properties in common. The idea is that a MCInstrPredicate definition can be used to "generate" an instruction equivalence class, where all the instructions of a class share the same property.
STIPredicate definitions are essentially a collection of instruction equivalence classes.
Also, different processor models can specify a different variant of the same STIPredicate with different rules (i.e. predicates) to classify instructions. Tablegen backends (in this particular case, the SubtargetEmitter) will be able to process STIPredicate definitions, and automatically generate functions in XXXGenSubtargetInfo.
This patch introduces two special kinds of STIPredicate classes named IsZeroIdiomFunction and IsDepBreakingFunction in tablegen. It also adds a definition for those in the BtVer2 scheduling model only.
The definition of zero-idioms in BtVer2 is quite big. For simplicity, the example below only reports the GPR and AVX zero-idiom variants:
  def : IsZeroIdiomFunction<[
    // GPR Zero-idioms.
    DepBreakingClass<[ SUB32rr, SUB64rr, XOR32rr, XOR64rr ], ZeroIdiomPredicate>,

    // AVX Zero-idioms.
    DepBreakingClass<[
      VPXORrr, VPANDNrr, VXORPSrr, VXORPDrr,
      VXORPSYrr, VXORPDYrr, VANDNPSrr, VANDNPDrr,
      VPSUBBrr, VPSUBDrr, VPSUBQrr, VPSUBWrr,
      VPCMPGTBrr, VPCMPGTDrr, VPCMPGTQrr, VPCMPGTWrr
    ], ZeroIdiomPredicate>
  ]>;
This is what the SubtargetEmitter generates for those variants:
  bool X86GenSubtargetInfo::isZeroIdiom(const MachineInstr *MI, APInt &Mask) const {
    unsigned ProcessorID = getSchedModel().getProcessorID();
    switch(MI->getOpcode()) {
    default:
      break;
    case X86::SUB32rr:
    case X86::SUB64rr:
    case X86::XOR32rr:
    case X86::XOR64rr:
    case X86::VPXORrr:
    case X86::VPANDNrr:
    case X86::VXORPSrr:
    case X86::VXORPDrr:
    case X86::VXORPSYrr:
    case X86::VXORPDYrr:
    case X86::VANDNPSrr:
    case X86::VANDNPDrr:
    case X86::VPSUBBrr:
    case X86::VPSUBDrr:
    case X86::VPSUBQrr:
    case X86::VPSUBWrr:
    case X86::VPCMPGTBrr:
    case X86::VPCMPGTDrr:
    case X86::VPCMPGTQrr:
    case X86::VPCMPGTWrr:
      if (ProcessorID == 4) {
        Mask.clearAllBits();
        return MI->getOperand(1).getReg() == MI->getOperand(2).getReg();
      }
      break;
    }

    return false;
  } // X86GenSubtargetInfo::isZeroIdiom
Rules for different zero-idioms are discriminated based on the processor identifier which comes from the scheduling model.
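As a rough illustration of how one STIPredicate can carry different rules per processor, the sketch below models it as a list of instruction equivalence classes, each tagged with a processor identifier. This is a table-driven analogue of the generated switch, not what the SubtargetEmitter emits. All names (ToyMI, ToyInstrClass, ToySTIPredicate, makeToyZeroIdiomPredicate, the TOYOP_* opcodes) and the processor IDs 1 and 2 are made up for the example.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Hypothetical two-source instruction; not an LLVM type.
struct ToyMI {
  unsigned Opcode;
  unsigned Src1;
  unsigned Src2;
};

enum : unsigned { TOYOP_SUB = 1, TOYOP_XOR = 2, TOYOP_CMPGT = 3 };

// One equivalence class: opcodes sharing a predicate on one processor.
struct ToyInstrClass {
  unsigned ProcessorID;
  std::set<unsigned> Opcodes;
  std::function<bool(const ToyMI &)> Predicate;
};

// A predicate function is modeled as a collection of equivalence classes.
struct ToySTIPredicate {
  std::vector<ToyInstrClass> Classes;

  bool evaluate(unsigned ProcessorID, const ToyMI &MI) const {
    for (const ToyInstrClass &C : Classes) {
      assert(C.Predicate && "every class needs a predicate");
      if (C.ProcessorID == ProcessorID && C.Opcodes.count(MI.Opcode) != 0)
        return C.Predicate(MI);
    }
    return false; // Unknown processor or opcode: no information.
  }
};

// Two made-up processor models that disagree on TOYOP_CMPGT.
inline ToySTIPredicate makeToyZeroIdiomPredicate() {
  auto SameSources = [](const ToyMI &MI) { return MI.Src1 == MI.Src2; };
  ToySTIPredicate P;
  P.Classes.push_back({1, {TOYOP_SUB, TOYOP_XOR}, SameSources});
  P.Classes.push_back({2, {TOYOP_SUB, TOYOP_XOR, TOYOP_CMPGT}, SameSources});
  return P;
}
```

In this sketch, a compare-greater-than of a register with itself is a zero-idiom on processor 2 but not on processor 1, and a processor with no classes at all gets false for everything.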
Note that a similar definition can be generated for MCInst. This patch shows how to do it, and those extra definitions are currently expanded into X86MCInstrDesc.
This patch supersedes the one committed at r338372 for D49310.
The main advantages are:
- we can describe subtarget predicates via tablegen using STIPredicates.
- we can describe zero-idioms / dep-breaking instructions directly via tablegen in the scheduling models.
In the future, the STIPredicates framework can be used for solving other problems. For example:
- teach how to identify optimizable register-register moves
- teach how to identify slow LEA instructions (each subtarget defining its own concept of "slow" LEA).
- teach how to identify instructions that have undocumented false dependencies on the output registers on some processors only.
- etc.
It is also (in my opinion) an elegant way to expose knowledge to both external tools like llvm-mca, and codegen passes.
For example, machine schedulers in LLVM could reuse that information when internally constructing the data dependency graph for a code region.
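A simplified sketch of that idea follows. It is not how the LLVM machine schedulers are implemented. SchedInstr, isToyDependencyBreaking and buildRawEdges are hypothetical names, and the builder only tracks read-after-write edges inside a single straight-line region.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical instruction with one definition and two register uses.
struct SchedInstr {
  unsigned Opcode;
  unsigned Def;
  unsigned Use1;
  unsigned Use2;
};

enum : unsigned { SCHED_ADD = 1, SCHED_XOR = 2 };

// Stand-in for a subtarget query such as isDependencyBreaking().
inline bool isToyDependencyBreaking(const SchedInstr &MI) {
  return MI.Opcode == SCHED_XOR && MI.Use1 == MI.Use2;
}

// Returns read-after-write edges as (producer index, consumer index).
inline std::vector<std::pair<std::size_t, std::size_t>>
buildRawEdges(const std::vector<SchedInstr> &Region) {
  std::vector<std::pair<std::size_t, std::size_t>> Edges;
  std::map<unsigned, std::size_t> LastDef; // Register -> index of last writer.

  for (std::size_t I = 0; I < Region.size(); ++I) {
    const SchedInstr &MI = Region[I];
    auto AddEdge = [&](unsigned Reg) {
      auto It = LastDef.find(Reg);
      if (It == LastDef.end())
        return; // Live-in register: no producer inside the region.
      assert(It->second < I && "producers always precede consumers");
      Edges.emplace_back(It->second, I);
    };

    // A dependency-breaking instruction does not wait for its inputs.
    if (!isToyDependencyBreaking(MI)) {
      AddEdge(MI.Use1);
      if (MI.Use2 != MI.Use1)
        AddEdge(MI.Use2);
    }
    LastDef[MI.Def] = I;
  }
  return Edges;
}
```

For a region made of an ADD that writes r1, an XOR of r1 with itself into r1, and an ADD that reads r1, the only edge produced goes from the XOR to the last ADD; the XOR itself does not depend on the first ADD.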
This new design is also "opt-in". Processor models don't have to use the new STIPredicates. It has all been designed to be as unintrusive as possible.
Please let me know what you think.
Thanks,
Andrea