TLDR: Currently, by the time CustomBehaviour and InstrPostProcess get involved, some of an instruction's attributes have been made const and cannot be modified. This patch is a proposal to inject a new method (InstrPostProcess::modifyInstrDesc()) after these specific attributes have been set, but before they have been made const.
CustomBehaviour and InstrPostProcess were designed for the purpose of giving developers the ability to modify an instruction's behaviour within mca without having to modify the tablegen. Because of where CustomBehaviour is injected, it is only capable of adding hazards that aren't detected automatically. It is not capable of removing hazards.
InstrPostProcess is injected earlier and gives developers a chance to make modifications to the instruction before the simulation even begins. However, some of an instruction's important attributes are stored within its InstrDesc object. After this InstrDesc object is created (but before InstrPostProcess gets a chance to make any modifications), the object is made const. This limits what InstrPostProcess is capable of modifying.
My original motivation for this patch came because I was working on a group of downstream targets where we wanted to set the RetireOOO flag for all of our targets, but we didn't want to have to modify the tablegen for every single one of those targets. At first, I was thinking that the simplest solution might be to just add a -retireooo command line argument to mca. However, when I was looking at where the RetireOOO flag was being saved (within InstrDesc) I realized that this const object contains some other attributes that developers may want to modify in the future.
Because of this, I felt that by making this patch instead of just adding the -retireooo flag, I'd be solving my problem while also making CustomBehaviour and InstrPostProcess more flexible / capable. If you disagree with the overall method and/or motivation of this patch, I can scrap it and instead just make a new proposal for a -retireooo flag. If you feel like this patch is appropriate, then I'd like some input on some issues that I'll discuss below.
(Skip reading this block of code for now.)
/// An instruction descriptor struct InstrDesc { SmallVector<WriteDescriptor, 2> Writes; // Implicit writes are at the end. SmallVector<ReadDescriptor, 4> Reads; // Implicit reads are at the end. // For every resource used by an instruction of this kind, this vector // reports the number of "consumed cycles". SmallVector<std::pair<uint64_t, ResourceUsage>, 4> Resources; // A bitmask of used hardware buffers. uint64_t UsedBuffers; // A bitmask of used processor resource units. uint64_t UsedProcResUnits; // A bitmask of implicit uses of processor resource units. uint64_t ImplicitlyUsedProcResUnits; // A bitmask of used processor resource groups. uint64_t UsedProcResGroups; unsigned MaxLatency; // Number of MicroOps for this instruction. unsigned NumMicroOps; // SchedClassID used to construct this InstrDesc. // This information is currently used by views to do fast queries on the // subtarget when computing the reciprocal throughput. unsigned SchedClassID; unsigned MayLoad : 1; unsigned MayStore : 1; unsigned HasSideEffects : 1; unsigned BeginGroup : 1; unsigned EndGroup : 1; unsigned RetireOOO : 1; // True if all buffered resources are in-order, and there is at least one // buffer which is a dispatch hazard (BufferSize = 0). unsigned MustIssueImmediately : 1; // A zero latency instruction doesn't consume any scheduler resources. bool isZeroLatency() const { return !MaxLatency && Resources.empty(); } InstrDesc() = default; InstrDesc(const InstrDesc &Other) = delete; InstrDesc &operator=(const InstrDesc &Other) = delete; };
The code block above is the InstrDesc definition. As mentioned earlier, this object is made const soon after its creation and so InstrPostProcess currently has no way to modify any of these attributes. It's reasonable to assume that if a developer wants to modify the mca-specific behaviour of an instruction, they'll want / need to modify some of these attributes.
One thing that I find potentially unsettling about allowing developers to modify this object is that the Writes and Reads vectors are built in a specific way. For example, in the Writes vector, the explicit defs come first, then the implicit defs, then optional defs, then variadic defs. I'm not positive whether or not this order is important during mca's simulations, but I did add a comment within the generic modifyInstrDesc definition that says to maintain that order.
Another issue with this patch is that there are no test cases associated with it. This is obviously not ideal and not best practice, but the purpose of the patch is for a downstream target and I'm not personally aware of any upstream targets that can immediately benefit from something in the InstrDesc object being modified for one of their instructions. If anybody knows of a test case that would make sense (or an instruction from an upstream target that could have one of these attributes modified to make it more correct within mca), I'd be happy to add it.
One of the more complex parts of this patch had to do with debug output. When mca is run with the -debug flag, some instruction information is printed while the InstrDesc objects are created. An example of this output is shown below:
Opcode Name= ADDPDrr SchedClassID=24 Opcode=408 Resource Mask=0x00000000000008, Reserved=0, #Units=1, cy=1 Buffer Mask=0x00000000000800 Used Units=0x00000000000008 Implicitly Used Units=0x00000000000000 Used Groups=0x00000000000e80 [Def] OpIdx=0, Latency=3, WriteResourceID=0 [Use] OpIdx=1, UseIndex=0 [Use] OpIdx=2, UseIndex=1 [Use][I] OpIdx=0, UseIndex=2, RegisterID=MXCSR MaxLatency=3 NumMicroOps=1
From the perspective of the user, this output all happens at once. However, within the code, this output is printed in slightly different places. For example, the instruction's defs are printed within the InstrBuilder::populateWrites() method while the instruction's resource information is printed within the initializeUsedResources() static function (also in InstrBuilder.cpp).
All of this information is printed before the InstrDesc has been fully created. The IPP::modifyInstrDesc method isn't injected until after the InstrDesc has been fully created, therefore, if modifyInstrDesc actually modifies an instruction, the printed information may no longer be accurate.
To account for this, modifyInstrDesc returns a bool which is true if and only if the instruction has been modified. If this method returns true, then we print a message stating that the instruction has been modified and then we re-print all of this information. Because re-printing this information would require the same LLVM_DEBUG code, I took this code and made functions for each of them that are then called from their respective places. Also, for these functions to be able to tell what type of def/use a particular write/read is, I had to add the IsVariadicDef and IsVariadicUse attributes to the WriteDescriptor and ReadDescriptor structs.
This above paragraph is something that I'm not sure I did properly (best practice wise). As far as I understand, the LLVM_DEBUG() blocks are code that will be omitted when building llvm in Release mode. I surrounded the bodies of these new functions with the LLVM_DEBUG blocks and I surrounded the calls to these new functions with the LLVM_DEBUG blocks, but I did not surround the actual definitions of these functions. Should I?
For example, should:
static void debugPrintExplicitWrite(const WriteDescriptor &Write) { assert(!Write.isImplicitWrite() && !Write.IsVariadicDef && "This function should only be called for non-variadic explicit " "writes."); // Negative integer for implicit writes LLVM_DEBUG({ dbgs() << "\t\t[Def] OpIdx=" << Write.OpIndex << ", Latency=" << Write.Latency << ", WriteResourceID=" << Write.SClassOrWriteResourceID << '\n'; }); }
be:
LLVM_DEBUG({ static void debugPrintExplicitWrite(const WriteDescriptor &Write) { assert(!Write.isImplicitWrite() && !Write.IsVariadicDef && "This function should only be called for non-variadic explicit " "writes."); // Negative integer for implicit writes dbgs() << "\t\t[Def] OpIdx=" << Write.OpIndex << ", Latency=" << Write.Latency << ", WriteResourceID=" << Write.SClassOrWriteResourceID << '\n'; } });
or something else entirely?
Here is an example of one of these instruction's being called:
Write.IsOptionalDef = false; Write.IsVariadicDef = false; LLVM_DEBUG(debugPrintExplicitWrite(Write)); CurrentDef++;
As always, thank you so much for your time and interest. I would appreciate any and all feedback / questions / suggestions / criticisms.
PS. While I was moving the debug output into their own functions, I added a line for printing the opcode number of the instruction (below the OpcodeName and SchedClassID). I often find myself needing to know the opcode of an instruction and it can be a bit of a hassle to find it out in other ways so this additional line seems useful, intuitive, and not too disruptive. If this is at all controversial, I can take it out of this patch and maybe make a post for it by itself to be discussed specifically. Let me know if you think it should be removed.
In the cases where Modified is true (and it's not variadic / variant), should we still cache InstrDesc by the opcode?
Or more specifically, can modifyInstrDesc create different changes for two separate MCInst-s that has the same opcode? (e.g. I'm only changing the MayStore flag in an ADD32 MCInst if it's preceded by a LD16, given the fact that IPP can have state)
Because if that's true, I think the Descriptors cache here will return a stale / incorrect InstrDesc.
I understand that in your motivating example ("...where we wanted to set the RetireOOO flag for all of our targets, but we didn't want to have to modify the tablegen for every single one of those targets...") you're probably making homogeneous changes across all MCInst-s that have the same opcode but I feel like there needs to be some clarifications on the model of modifyInstrDesc.