This is an archive of the discontinued LLVM Phabricator instance.

[MCA] Proposing the InstrPostProcess:modifyInstrDesc() method.
AbandonedPublic

Authored by holland11 on Jan 16 2022, 9:07 PM.

Details

Summary

TLDR: Currently, by the time CustomBehaviour and InstrPostProcess get involved, some of an instruction's attributes have been made const and cannot be modified. This patch is a proposal to inject a new method (InstrPostProcess::modifyInstrDesc()) after these specific attributes have been set, but before they have been made const.

CustomBehaviour and InstrPostProcess were designed for the purpose of giving developers the ability to modify an instruction's behaviour within mca without having to modify the tablegen. Because of where CustomBehaviour is injected, it is only capable of adding hazards that aren't detected automatically. It is not capable of removing hazards.

InstrPostProcess is injected earlier and gives developers a chance to make modifications to the instruction before the simulation even begins. However, some of an instruction's important attributes are stored within its InstrDesc object. After this InstrDesc object is created (but before InstrPostProcess gets a chance to make any modifications), the object is made const. This limits what InstrPostProcess is capable of modifying.

My original motivation for this patch came because I was working on a group of downstream targets where we wanted to set the RetireOOO flag for all of our targets, but we didn't want to have to modify the tablegen for every single one of those targets. At first, I was thinking that the simplest solution might be to just add a -retireooo command line argument to mca. However, when I was looking at where the RetireOOO flag was being saved (within InstrDesc) I realized that this const object contains some other attributes that developers may want to modify in the future.

Because of this, I felt that by making this patch instead of just adding the -retireooo flag, I'd be solving my problem while also making CustomBehaviour and InstrPostProcess more flexible / capable. If you disagree with the overall method and/or motivation of this patch, I can scrap it and instead just make a new proposal for a -retireooo flag. If you feel like this patch is appropriate, then I'd like some input on some issues that I'll discuss below.

(Skip reading this block of code for now.)

/// An instruction descriptor
struct InstrDesc {
  SmallVector<WriteDescriptor, 2> Writes; // Implicit writes are at the end.
  SmallVector<ReadDescriptor, 4> Reads;   // Implicit reads are at the end.

  // For every resource used by an instruction of this kind, this vector
  // reports the number of "consumed cycles".
  SmallVector<std::pair<uint64_t, ResourceUsage>, 4> Resources;

  // A bitmask of used hardware buffers.
  uint64_t UsedBuffers;

  // A bitmask of used processor resource units.
  uint64_t UsedProcResUnits;

  // A bitmask of implicit uses of processor resource units.
  uint64_t ImplicitlyUsedProcResUnits;

  // A bitmask of used processor resource groups.
  uint64_t UsedProcResGroups;

  unsigned MaxLatency;
  // Number of MicroOps for this instruction.
  unsigned NumMicroOps;
  // SchedClassID used to construct this InstrDesc.
  // This information is currently used by views to do fast queries on the
  // subtarget when computing the reciprocal throughput.
  unsigned SchedClassID;

  unsigned MayLoad : 1;
  unsigned MayStore : 1;
  unsigned HasSideEffects : 1;
  unsigned BeginGroup : 1;
  unsigned EndGroup : 1;
  unsigned RetireOOO : 1;

  // True if all buffered resources are in-order, and there is at least one
  // buffer which is a dispatch hazard (BufferSize = 0).
  unsigned MustIssueImmediately : 1;

  // A zero latency instruction doesn't consume any scheduler resources.
  bool isZeroLatency() const { return !MaxLatency && Resources.empty(); }

  InstrDesc() = default;
  InstrDesc(const InstrDesc &Other) = delete;
  InstrDesc &operator=(const InstrDesc &Other) = delete;
};

The code block above is the InstrDesc definition. As mentioned earlier, this object is made const soon after its creation and so InstrPostProcess currently has no way to modify any of these attributes. It's reasonable to assume that if a developer wants to modify the mca-specific behaviour of an instruction, they'll want / need to modify some of these attributes.

One thing that I find potentially unsettling about allowing developers to modify this object is that the Writes and Reads vectors are built in a specific way. For example, in the Writes vector, the explicit defs come first, then the implicit defs, then optional defs, then variadic defs. I'm not positive whether or not this order is important during mca's simulations, but I did add a comment within the generic modifyInstrDesc definition that says to maintain that order.

Another issue with this patch is that there are no test cases associated with it. This is obviously not ideal and not best practice, but the purpose of the patch is for a downstream target and I'm not personally aware of any upstream targets that can immediately benefit from something in the InstrDesc object being modified for one of their instructions. If anybody knows of a test case that would make sense (or an instruction from an upstream target that could have one of these attributes modified to make it more correct within mca), I'd be happy to add it.

One of the more complex parts of this patch had to do with debug output. When mca is run with the -debug flag, some instruction information is printed while the InstrDesc objects are created. An example of this output is shown below:

		Opcode Name= ADDPDrr
		SchedClassID=24
		Opcode=408
		Resource Mask=0x00000000000008, Reserved=0, #Units=1, cy=1
		Buffer Mask=0x00000000000800
		 Used Units=0x00000000000008
		Implicitly Used Units=0x00000000000000
		Used Groups=0x00000000000e80
		[Def]    OpIdx=0, Latency=3, WriteResourceID=0
		[Use]    OpIdx=1, UseIndex=0
		[Use]    OpIdx=2, UseIndex=1
		[Use][I] OpIdx=0, UseIndex=2, RegisterID=MXCSR
		MaxLatency=3
		NumMicroOps=1

From the perspective of the user, this output all happens at once. However, within the code, this output is printed in slightly different places. For example, the instruction's defs are printed within the InstrBuilder::populateWrites() method while the instruction's resource information is printed within the initializeUsedResources() static function (also in InstrBuilder.cpp).

All of this information is printed before the InstrDesc has been fully created. The IPP::modifyInstrDesc method isn't injected until after the InstrDesc has been fully created, therefore, if modifyInstrDesc actually modifies an instruction, the printed information may no longer be accurate.

To account for this, modifyInstrDesc returns a bool which is true if and only if the instruction has been modified. If this method returns true, then we print a message stating that the instruction has been modified and then we re-print all of this information. Because re-printing this information would require the same LLVM_DEBUG code, I took this code and made functions for each of them that are then called from their respective places. Also, for these functions to be able to tell what type of def/use a particular write/read is, I had to add the IsVariadicDef and IsVariadicUse attributes to the WriteDescriptor and ReadDescriptor structs.

This above paragraph is something that I'm not sure I did properly (best practice wise). As far as I understand, the LLVM_DEBUG() blocks are code that will be omitted when building llvm in Release mode. I surrounded the bodies of these new functions with the LLVM_DEBUG blocks and I surrounded the calls to these new functions with the LLVM_DEBUG blocks, but I did not surround the actual definitions of these functions. Should I?

For example, should:

static void debugPrintExplicitWrite(const WriteDescriptor &Write) {
  assert(!Write.isImplicitWrite() && !Write.IsVariadicDef &&
         "This function should only be called for non-variadic explicit "
         "writes."); // Negative integer for implicit writes

  LLVM_DEBUG({
    dbgs() << "\t\t[Def]    OpIdx=" << Write.OpIndex
           << ", Latency=" << Write.Latency
           << ", WriteResourceID=" << Write.SClassOrWriteResourceID << '\n';
  });
}

be:

LLVM_DEBUG({
  static void debugPrintExplicitWrite(const WriteDescriptor &Write) {
    assert(!Write.isImplicitWrite() && !Write.IsVariadicDef &&
           "This function should only be called for non-variadic explicit "
           "writes."); // Negative integer for implicit writes

    dbgs() << "\t\t[Def]    OpIdx=" << Write.OpIndex
           << ", Latency=" << Write.Latency
           << ", WriteResourceID=" << Write.SClassOrWriteResourceID << '\n';
  }
});

or something else entirely?

Here is an example of one of these instruction's being called:

Write.IsOptionalDef = false;
Write.IsVariadicDef = false;

LLVM_DEBUG(debugPrintExplicitWrite(Write));

CurrentDef++;

As always, thank you so much for your time and interest. I would appreciate any and all feedback / questions / suggestions / criticisms.

PS. While I was moving the debug output into their own functions, I added a line for printing the opcode number of the instruction (below the OpcodeName and SchedClassID). I often find myself needing to know the opcode of an instruction and it can be a bit of a hassle to find it out in other ways so this additional line seems useful, intuitive, and not too disruptive. If this is at all controversial, I can take it out of this patch and maybe make a post for it by itself to be discussed specifically. Let me know if you think it should be removed.

Diff Detail

Event Timeline

holland11 created this revision.Jan 16 2022, 9:07 PM
holland11 requested review of this revision.Jan 16 2022, 9:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 16 2022, 9:07 PM
holland11 updated this revision to Diff 400431.Jan 16 2022, 9:10 PM
holland11 updated this revision to Diff 400433.Jan 16 2022, 9:12 PM

Fixed some typos in my comments.

myhsu added a subscriber: myhsu.Jan 16 2022, 9:46 PM

For example, should:

static void debugPrintExplicitWrite(const WriteDescriptor &Write) {

assert(!Write.isImplicitWrite() && !Write.IsVariadicDef &&
       "This function should only be called for non-variadic explicit "
       "writes."); // Negative integer for implicit writes

LLVM_DEBUG({
  dbgs() << "\t\t[Def]    OpIdx=" << Write.OpIndex
         << ", Latency=" << Write.Latency
         << ", WriteResourceID=" << Write.SClassOrWriteResourceID << '\n';
});

}
be:

LLVM_DEBUG({

static void debugPrintExplicitWrite(const WriteDescriptor &Write) {
  assert(!Write.isImplicitWrite() && !Write.IsVariadicDef &&
         "This function should only be called for non-variadic explicit "
         "writes."); // Negative integer for implicit writes

  dbgs() << "\t\t[Def]    OpIdx=" << Write.OpIndex
         << ", Latency=" << Write.Latency
         << ", WriteResourceID=" << Write.SClassOrWriteResourceID << '\n';
}

});
or something else entirely?

I think in this case, you can simply wrap the entire function with #ifndef NDEBUG / #endif.

myhsu added inline comments.Jan 16 2022, 10:05 PM
llvm/lib/MCA/InstrBuilder.cpp
736

In the cases where Modified is true (and it's not variadic / variant), should we still cache InstrDesc by the opcode?
Or more specifically, can modifyInstrDesc create different changes for two separate MCInst-s that has the same opcode? (e.g. I'm only changing the MayStore flag in an ADD32 MCInst if it's preceded by a LD16, given the fact that IPP can have state)
Because if that's true, I think the Descriptors cache here will return a stale / incorrect InstrDesc.

I understand that in your motivating example ("...where we wanted to set the RetireOOO flag for all of our targets, but we didn't want to have to modify the tablegen for every single one of those targets...") you're probably making homogeneous changes across all MCInst-s that have the same opcode but I feel like there needs to be some clarifications on the model of modifyInstrDesc.

@myhsu

In the cases where Modified is true (and it's not variadic / variant), should we still cache InstrDesc by the opcode?
Or more specifically, can modifyInstrDesc create different changes for two separate MCInst-s that has the same opcode? (e.g. I'm only changing the MayStore flag in an ADD32 MCInst if it's preceded by a LD16, given the fact that IPP can have state)
Because if that's true, I think the Descriptors cache here will return a stale / incorrect InstrDesc.

I understand that in your motivating example ("...where we wanted to set the RetireOOO flag for all of our targets, but we didn't want to have to modify the tablegen for every single one of those targets...") you're probably making homogeneous changes across all MCInst-s that have the same opcode but I feel like there needs to be some clarifications on the model of modifyInstrDesc.

Yeah I should have made that more clear, sorry about that. Each MCInst that shares the same opcode will also share the same InstrDesc object. So if you have two instructions that share the same opcode and you only want to modify one of them (and the modifications require modifications to the InstrDesc object) then this patch won't help with that. InstrPostProcess::postProcessInstruction() can modify specific instructions, however if the attributes you want to modify are part of the InstrDesc class then you're out of luck since that will be const (and shared) by then.

This is maybe something for me to look into in the future though as I do see the potential benefit.

One thing that I find potentially unsettling about allowing developers to modify this object is that the Writes and Reads vectors are built in a specific way. For example, in the Writes vector, the explicit defs come first, then the implicit defs, then optional defs, then variadic defs. I'm not positive whether or not this order is important during mca's simulations, but I did add a comment within the generic modifyInstrDesc definition that says to maintain that order.

Field OpIndex in ReadDescriptor/WriteDescriptor stores the MCOperand index from the original MCInst. Changing that value might break the logic in InstrBuilder that peeks into the original MCInst.

For implicit reads and writes, the register definition is already encoded in tablegen, and therefore InstrBuilder doesn't need to peek the actual register identifier from the MCOperand. The register value is already known statically, and its value is stored in field RegisterID (see for example ReadDescriptor::RegisterID). This effectively means that, at least for implicit reads/writes, we don't need to worry about whether OpIndex is a valid MCOperand index or not. That being said, some debug printing logic might still try to decode the original operand index when pretty-printing defs/uses.

Example: this is how we lower reads:

MCPhysReg RegID = 0;
for (const ReadDescriptor &RD : D.Reads) {
  if (!RD.isImplicitRead()) {
    // explicit read.
    const MCOperand &Op = MCI.getOperand(RD.OpIndex);
    // Skip non-register operands.
    if (!Op.isReg())
      continue;
    RegID = Op.getReg();
  } else {
    // Implicit read.
    RegID = RD.RegisterID;
  }

Changing the value of field ReadDescriptor::UseIndex has the potential of breaking the way how read-advance entries are resolved during the simulation. If possible, those indices should never be touched.

It would be better if we could be more specific about what can be post-processed, and possibly define a more restrictive API. Toggling some feature flags ON/OFF is generally safe. Changes to RetireOOO and MayRead/MayWrite don't have the potential of breaking the simulation.

Changing OpIndex and UseIndex are likely to break the lowering logic in InstrBuilder, and introduce bugs in the read-advance resolution logic which are very hard to triage.

Things get much more complicated (and potentially problematic) if we decide to let users tweak register defs/uses. However, it should be fairly simple to allow a few basic changes.

For example, users should be allowed to:

  1. Eliminate/disable specific register defs/uses.
  2. Add new IMPLICIT defs/uses.

About 1. We could introduce a special Ignored flag in the read/write descriptor. InstrBuilder would then skip operands whose read/write descriptor has been specifically marked as Ignored. This is just an idea; you can probably achieve this same behaviour in other ways.

About 2: Implicit defs/uses encode their modified/read register identifier in field RegisterID. Adding implicit reads/writes should be relatively easy because InstrBuilder doesn't require a valid OpIndex for those. For those new operands, OpIndex could reference a "special invalid index" (since it won't be used). For the purpose of ReadAdvance, UseIndex should default to an invalid operand index. I don't think that there is a good way to encode read-advance for custom new operands.

Things gets much more complicated if we allow adding "explicit" register defs/uses. I don't think that there is a good way to add support for those. In most cases, being able to add implicit defs/uses is already more than enough. There may be specific cases which would require support for adding explicit register uses. For example this one: https://github.com/llvm/llvm-project/issues/38161, which is about instructions that have an un-modelled false dependency on a register definition. To fix those cases, we may need to introduce a special "aliasing register use" which behaves like a normal explicit use, with the difference that the register index would be interpreted as an alias of an already defined read/write operand in mca::Instruction. This is just an idea. But I think it would be enough to fix that bug.

To account for this, modifyInstrDesc returns a bool which is true if and only if the instruction has been modified. If this method returns true, then we print a message stating that the instruction has been modified and then we re-print all of this information. Because re-printing this information would require the same LLVM_DEBUG code, I took this code and made functions for each of them that are then called from their respective places. Also, for these functions to be able to tell what type of def/use a particular write/read is, I had to add the IsVariadicDef and IsVariadicUse attributes to the WriteDescriptor and ReadDescriptor structs.

Sounds good.

This above paragraph is something that I'm not sure I did properly (best practice wise). As far as I understand, the LLVM_DEBUG() blocks are code that will be omitted when building llvm in Release mode. I surrounded the bodies of these new functions with the LLVM_DEBUG blocks and I surrounded the calls to these new functions with the LLVM_DEBUG blocks, but I did not surround the actual definitions of these functions. Should I?

I agree with Min on this. You should wrap definitions with NDEBUG if those are only meant to be used in debug.

holland11 added a comment.EditedJan 17 2022, 7:02 PM

Okay, that makes a lot of sense.

Since CustomBehaviour and InstrPostProcess::postProcessInstruction() are already capable of adding new dependencies, the main two purposes of this patch were to

  1. Be able to modify the flags contained within InstrDesc.
  2. Be able to remove dependencies.

I'll think about this for a bit then most likely modify the patch to allow for removing uses / defs (likely through an Ignored flag like you suggested) and to have a more restrictive API. I'll consider adding the ability to add implicit uses / defs (not explicit ones), but that isn't as much of a priority in my mind so if an elegant solution to it doesn't come to me then I probably won't add it.

Will also wrap the debug functions with NDEBUG.

Okay, that makes a lot of sense.

Since CustomBehaviour and InstrPostProcess::postProcessInstruction() are already capable of adding new dependencies, the main two purposes of this patch were to

  1. Be able to modify the flags contained within InstrDesc.
  2. Be able to remove dependencies.

I'll think about this for a bit then most likely modify the patch to allow for removing uses / defs (likely through an Ignored flag like you suggested) and to have a more restrictive API. I'll consider adding the ability to add implicit uses / defs (not explicit ones), but that isn't as much of a priority in my mind so if an elegant solution to it doesn't come to me then I probably won't add it.

Will also wrap the debug functions with NDEBUG.

Sounds good.

Thanks

Hey Patrick,

Are there any updates on this patch?

holland11 added a comment.EditedFeb 9 2022, 2:27 PM

Hey Patrick,

Are there any updates on this patch?

Yes, sorry about the delay. I've been meaning to come here to ask for your input on how to go about restricting the API.

bool Modified = IPP.modifyInstrDesc(*ID, MCI);

This is the current way that IPP gets access to modifying the InstrDesc. This isn't ideal because it gives IPP the ability to modify anything that they want.

As a reminder, here is the InstrDesc struct:

/// An instruction descriptor
struct InstrDesc {
  SmallVector<WriteDescriptor, 2> Writes; // Implicit writes are at the end.
  SmallVector<ReadDescriptor, 4> Reads;   // Implicit reads are at the end.

  // For every resource used by an instruction of this kind, this vector
  // reports the number of "consumed cycles".
  SmallVector<std::pair<uint64_t, ResourceUsage>, 4> Resources;

  // A bitmask of used hardware buffers.
  uint64_t UsedBuffers;

  // A bitmask of used processor resource units.
  uint64_t UsedProcResUnits;

  // A bitmask of implicit uses of processor resource units.
  uint64_t ImplicitlyUsedProcResUnits;

  // A bitmask of used processor resource groups.
  uint64_t UsedProcResGroups;

  unsigned MaxLatency;
  // Number of MicroOps for this instruction.
  unsigned NumMicroOps;
  // SchedClassID used to construct this InstrDesc.
  // This information is currently used by views to do fast queries on the
  // subtarget when computing the reciprocal throughput.
  unsigned SchedClassID;

  unsigned MayLoad : 1;
  unsigned MayStore : 1;
  unsigned HasSideEffects : 1;
  unsigned BeginGroup : 1;
  unsigned EndGroup : 1;
  unsigned RetireOOO : 1;

  // True if all buffered resources are in-order, and there is at least one
  // buffer which is a dispatch hazard (BufferSize = 0).
  unsigned MustIssueImmediately : 1;

  // A zero latency instruction doesn't consume any scheduler resources.
  bool isZeroLatency() const { return !MaxLatency && Resources.empty(); }

  InstrDesc() = default;
  InstrDesc(const InstrDesc &Other) = delete;
  InstrDesc &operator=(const InstrDesc &Other) = delete;
};

What I'd like to do is make it so that IPP has the ability to modify everything except the Writes and Reads vectors. Ideally, the WriteDescriptor and ReadDescriptor classes would have a new attribute (Ignored) that can be set to true by IPP, but that would be the only thing that IPP could do to those vectors.

I can think of several ways to achieve the above, but none of them are particularly elegant and so I've just been trying to think a little bit each day to see if I can come up with a better solution. Do you have any ideas?

Hey Patrick,

Are there any updates on this patch?

Yes, sorry about the delay. I've been meaning to come here to ask for your input on how to go about restricting the API.

bool Modified = IPP.modifyInstrDesc(*ID, MCI);

<snip>

What I'd like to do is make it so that IPP has the ability to modify everything except the Writes and Reads vectors. Ideally, the WriteDescriptor and ReadDescriptor classes would have a new attribute (Ignored) that can be set to true by IPP, but that would be the only thing that IPP could do to those vectors.

I can think of several ways to achieve the above, but none of them are particularly elegant and so I've just been trying to think a little bit each day to see if I can come up with a better solution. Do you have any ideas?

The way how I see it, is that method postProcessInstruction has been conveniently designed to allow for maximum flexibility. Targets are basically free to modify whatever aspect they want of the mca intermediate form. From a practical perspective, this is both great and scary; we are essentially exposing all the implementation details to the users. In terms of design, that is always a questionable choice.

Method postProcessInstruction is just a generic entrypoint for some custom/target specific logic. There is no structure and no emphasis to what steps are performed by the algorithm.
Introducing a more structured/restrictive API would require a more formal definition of what the post-processing stage does. I am afraid that there is no escape from this.

It essentially means that the sequence of steps performed by the post-processing logic must be defined in a target independent way. What I am suggesting is implementing some sort of policy base design, where the main algorithm essentially describes a workflow, but each step can be customised by Targets by overriding specific hooks.

Understanding how to structure the workflow is probably the hardest part. At the very least, it requires good knowledge of all the requirements derived from known use cases.

As far as I am aware of, these have always been the three major requirements:

  1. Add the ability to model special target specific hazards. This was solved by introducing custom MCAOperand operands to mca::Instruction, and adding a CustomBehaviour class.
  2. Add the ability to toggle some instruction flags ON/OFF. For example, some Targets may want to be able to set flag RetireOOO.
  3. Add the ability to disable individual read/write operands.

Assuming that my understanding is correct, this is how a possible API for the Target policy might look like:

virtual void populateCustomOperands(SmallVectorImpl<MCAOperand> &InVec, const MCInst &MCI) { }
virtual void toggleFlags(mca::InstructionFlags& Flags, const MCInst &MCI) { }
virtual bool shouldBeDisabled(mca::ReadState& Read, const MCInst &MCI) { return false; };
virtual bool shouldBeDisabled(mca::WriteState& Write, const MCInst &MCI) { return false; };

Here, every functionality implements an individual step in the post-processing algorithm. Targets that want to customise a specific step, only need to override the corresponding method(s) of the post-processing interface. For example: if a Targets wants to be able to toggle mca::Instruction flags, then it needs to override method toggleFlags; etc.

This is how a process might look like:

 postProcessInstruction(std::unique_ptr<Instruction> &Inst, const MCInst &MCI)
{

    // Step 1. Populate custom operands.
    // NOTE1: We need to convert the std::vector of MCAOperand into a SmallVector.
    // NOTE2: we need a getMCAOperands() to obtain a reference to the MCAOperand vector in mca::Instruction. 

    populateCustomOperands(Inst->getMCAOperands(), const MCInst &MCI);

   // Step 2. Add the ability to manipulate instruction flags.
   // NOTE: We need to move all the flags into their own struct. This example assumes the presence of 
   // a struct named `mca::InstructionFlags`.
   //
   // NOTE2: Changing flags at InstrDesc level might be problematic since it has the potential of changing the
   // semantic of a class of instruction (and not just those with a specific opcode).
   // Consider adding a copy of the flags in every mca::Instruction. The mca pipeline would then check the
   // copy in mca::InstructionBase, rather than the default values from InstrDesc.

    toggleFlags(Inst->getFlags(), MCI);
   
    // Step 3. Disable specific def/uses.
    for(auto &def : Inst->getDefs())
      if (shouldDisable(def, opcode))
        disable(def);
     
    for each(use : Inst->getUses())
      if (shouldDisable(use, opcode))
        disable(use)

PROS & CONS:

Introducing a more restrictive API essentially requires that we standardise the post-processing workflow. It is basically some sort of strategy pattern, where each Target provides its own policy for implementing individual steps.

This is obviously far less flexible that what we had before. However, it may be argued that previously, Targets had potentially way too much freedom.
For some proprietary targets this new design might not be good enough. That being said, for most of those targets it would be just a matter of adding yet-another step to the sequence performed by the post-processing stage.

I have also added some notes in my pseudo-code to highlight some issues/things that need to be addressed for this patch to work.
My biggest concern is about how flags are manipulated. We should probably considering copying flags into mca::InstructionBase. That way, we avoid manipulating InstrDesc directly. This would also require that we fix every direct reference to InstrDesc flags from the various mca pipeline stages. Hopefully this would be simple to fix in case.

In general, I can't say for sure that what I have suggested here is a good step forward. However, I think that it would be an improvement in general.

What do you think? Is this a too crazy idea? I hope not :)

-Andrea

Thank you for that detailed suggestion! This was somewhat similar to one of the potential solutions that I had in mind, but I was hesitant to work on it because of how much would need to be modified around the rest of the mca pipeline to get it to work. I may try implementing something similar, but a bit less disruptive. What I do may result in some of the (potentially scary) flexibility of IPP/CB remaining, but the ability to modify InstrDesc should be much more rigid. I'll see what you think once I've been able to implement and post it here for your review.

holland11 added a comment.EditedMar 11 2022, 7:01 PM

I really like your ideas and I played around with a few different ways to implement variations of them. To do it properly would likely be quite disruptive which isn't necessarily a bad thing, but it would require thorough testing to make sure there were no bugs or flaws in my logic. The internal testing that already exists would be very useful in making sure that I didn't break mca's existing behaviour, but those tests wouldn't help me verify that CustomBehaviour and InstrPostProcess would still be modifying things in the expected and correct way. To do that, I would need to write a bunch of my own tests where I implement modifications for a target with CB & IPP and then ensure that the new mca simulations behave as intended and expected.

Therefore, to be able to implement the changes that I'd like to in the way that I'd like to, I'd need to be able to test them with an instruction set that I am intimately familiar with. That instruction set would be the one that I worked with in the summer and will be working with again starting next month. Over the summer, I worked as an intern on Apple's GPU Compiler team and am starting back full-time with them in April. Since I am not currently employed by them, I do not have access to the instruction set and so I wouldn't feel confident enough in my ability to implement all of the desired changes without introducing some serious flaws or bugs.

I do still think that making the API more restricted and self-explanatory is a really good idea. I also think that giving developers a way to use the API to disable specific uses / defs for any given instruction could be really useful. These things should be done properly though and my current familiarity of open source instruction sets isn't strong enough for me to feel confident in verifying my work.

The main reason that I initially started working on all of this was because I wanted to make it easy to have the RetireOOO flag be set globally without having to modify tablegen. Your suggestion of copying the instruction flags from the InstrDesc object into the Instruction object would actually accomplish that (and do it in a minimally disruptive way) so I've decided to make that change. I hope and plan to come back to this in the not-too-distance future to make the restrictive API and disabling uses / defs changes, but for now, this is the change that I am most comfortable with and that accomplishes my most basic goal with this patch.

This updated patch is quite a bit different from the initial one posted with this diff so I think it's probably best to make a new diff for it and close this one.

Edit: Here is the new diff https://reviews.llvm.org/D121508

Herald added a project: Restricted Project. · View Herald TranscriptMar 11 2022, 7:01 PM

I really like your ideas and I played around with a few different ways to implement variations of them. To do it properly would likely be quite disruptive which isn't necessarily a bad thing, but it would require thorough testing to make sure there were no bugs or flaws in my logic. The internal testing that already exists would be very useful in making sure that I didn't break mca's existing behaviour, but those tests wouldn't help me verify that CustomBehaviour and InstrPostProcess would still be modifying things in the expected and correct way. To do that, I would need to write a bunch of my own tests where I implement modifications for a target with CB & IPP and then ensure that the new mca simulations behave as intended and expected.

Therefore, to be able to implement the changes that I'd like to in the way that I'd like to, I'd need to be able to test them with an instruction set that I am intimately familiar with. That instruction set would be the one that I worked with in the summer and will be working with again starting next month. Over the summer, I worked as an intern on Apple's GPU Compiler team and am starting back full-time with them in April. Since I am not currently employed by them, I do not have access to the instruction set and so I wouldn't feel confident enough in my ability to implement all of the desired changes without introducing some serious flaws or bugs.

I do still think that making the API more restricted and self-explanatory is a really good idea. I also think that giving developers a way to use the API to disable specific uses / defs for any given instruction could be really useful. These things should be done properly though and my current familiarity of open source instruction sets isn't strong enough for me to feel confident in verifying my work.

The main reason that I initially started working on all of this was because I wanted to make it easy to have the RetireOOO flag be set globally without having to modify tablegen. Your suggestion of copying the instruction flags from the InstrDesc object into the Instruction object would actually accomplish that (and do it in a minimally disruptive way) so I've decided to make that change. I hope and plan to come back to this in the not-too-distance future to make the restrictive API and disabling uses / defs changes, but for now, this is the change that I am most comfortable with and that accomplishes my most basic goal with this patch.

This updated patch is quite a bit different from the initial one posted with this diff so I think it's probably best to make a new diff for it and close this one.

Edit: Here is the new diff https://reviews.llvm.org/D121508

Thanks Patrick. I understand your situation.

Feel free to come back to this patch whenever you feel more comfortable about it. There is no need to rush :-) .

Meanwhile, I had a look at the other patch, and it looks good to me (modulo a few nits).

As always, thanks for sending patches to mca!

I really appreciate your contributions.

-Andrea

holland11 abandoned this revision.Mar 12 2022, 11:33 AM

Sounds good. I'll refer back to this if/when I start working on it again. Thank you!