This is an archive of the discontinued LLVM Phabricator instance.

Differential D104149

[MCA] Adding the CustomBehaviour class to llvm-mca
ClosedPublic

Authored by holland11 on Jun 11 2021, 1:23 PM.

Details

Reviewers

arsenm
qcolombet
andreadb

Commits

rGef16c8eaa5cd: Reapply "[MCA] Adding the CustomBehaviour class to llvm-mca".
rGf7a23ecece52: [MCA] Adding the CustomBehaviour class to llvm-mca

Summary

TLDR:
Some instructions are not defined well enough within the target’s scheduling model for llvm-mca to be able to properly simulate their behaviour. The ideal solution to this situation is to modify the scheduling model, but that’s not always a viable strategy. Maybe other parts of the backend depend on that instruction being modelled the way that it is. Or maybe the instruction is quite complex and it’s difficult to fully capture its behaviour with tablegen. The CustomBehaviour class (which I will refer to as CB frequently) is designed to provide intuitive scaffolding for developers to implement the correct modelling for these instructions.

Some quick notes before I get into some of the implementation details:
I built this class specifically for a downstream architecture, but I tried to design it in a way that would be useful for all targets. I wanted to be able to submit these changes upstream so I found an instruction from the AMDGPU instruction set that is used quite frequently, but isn’t currently modelled properly (resulting in significantly inaccurate analysis generated by llvm-mca). My knowledge of AMDGPU is limited and there is also an issue with flags not being set correctly on many of the relevant AMDGPU instructions so my implementation isn’t perfect. However, it should still provide an example use case for CB and also an example of how someone could use the class. If any AMDGPU developers would like to improve my implementation, I encourage you to do so. And on that note, if the AMDGPU folks don’t want this at all, we can definitely remove it from the patch and stick to pushing CB without any upstream examples.

This patch only has CB integrated into the in-order pipeline; however, it was designed to fit into both pipelines. I’m much more familiar with the in-order pipeline (and CB fits into it a bit more intuitively), but I plan on adding it to the out-of-order pipeline in the future. I would like some feedback on what would be the best way to do that, but for now I want to keep the scope of this patch as limited as possible, so I will make another post soon to ask for feedback and open up a discussion about integrating CB within the out-of-order pipeline.

Also, I'm Canadian so I hope the 'u' in behaviour doesn't bother anyone :).

Implementation details:

llvm-mca does its best to extract relevant register, resource, and memory information from every MCInst when lowering them to an mca::Instruction. It then uses this information to detect dependencies and simulate stalls within the pipeline. For some instructions, the information that gets captured within the mca::Instruction is not enough for mca to simulate them properly. In these cases, there are two main possibilities:

(1) The instruction has a dependency that isn’t detected by mca.
(2) mca is incorrectly enforcing a dependency that shouldn’t exist.

For the rest of this discussion, I will be focusing on (1), but I have put some thought into (2) and I may revisit it in the future.

So we have an instruction that has dependencies that aren’t picked up by mca. The basic idea for both pipelines in mca is that when an instruction wants to be dispatched, we first check for register hazards and then we check for resource hazards. This is where CB is injected. If no register or resource hazards have been detected, we make a call to CustomBehaviour::checkCustomHazard() to give the target specific CB the chance to detect and enforce any custom dependencies.

The return value for checkCustomHazard() is an unsigned int representing the (minimum) number of cycles that the instruction needs to stall for. It’s fine to underestimate this value because when StallCycles gets down to 0, we’ll end up checking for all the hazards again before the instruction is actually dispatched. However, it’s important not to overestimate the value, and the more accurate your estimate is, the more efficient mca’s execution can be.
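To illustrate why an underestimate is harmless while accuracy still matters, here is a minimal, self-contained sketch. It is not llvm-mca code: HazardCheck, DispatchResult, simulateDispatch, exactEstimate, and lowEstimate are hypothetical names, and the loop only approximates how the hazards are re-checked once the previous stall estimate has expired.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a hazard check: given the current cycle, it
// returns a lower bound on the remaining stall (0 means "may dispatch").
using HazardCheck = std::function<unsigned(unsigned Cycle)>;

struct DispatchResult {
  unsigned Cycle;  // cycle at which the instruction is finally dispatched
  unsigned Checks; // how many times the hazard check had to run
};

// Re-runs the hazard check every time the previous stall estimate expires.
DispatchResult simulateDispatch(const HazardCheck &Check, unsigned StartCycle) {
  assert(Check && "hazard check must be callable");
  DispatchResult R{StartCycle, 0};
  for (;;) {
    ++R.Checks;
    unsigned Stall = Check(R.Cycle);
    if (Stall == 0)
      return R;
    R.Cycle += Stall;
  }
}

// Exact estimate: the blocking instruction completes at cycle ReadyAt.
unsigned exactEstimate(unsigned Cycle, unsigned ReadyAt) {
  return Cycle < ReadyAt ? ReadyAt - Cycle : 0;
}

// Deliberate underestimate: claims a single cycle for as long as it is blocked.
unsigned lowEstimate(unsigned Cycle, unsigned ReadyAt) {
  return Cycle < ReadyAt ? 1 : 0;
}
```

With a blocking instruction that completes at cycle 20 and a first dispatch attempt at cycle 3, both estimates dispatch at cycle 20, but the exact one needs 2 checks while the one-cycle underestimate needs 18. An overestimate, by contrast, would move the dispatch cycle itself.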

In general, for checkCustomHazard() to be able to detect these custom dependencies, it needs information about the current instruction and also all of the instructions that are still executing within the pipeline. The mca pipeline uses mca::Instruction rather than MCInst and the current information encoded within each mca::Instruction isn’t sufficient for my use cases. I had to add a few extra attributes to the mca::Instruction class and have them get set by the MCInst during instruction building. For example, the current mca::Instruction doesn’t know its opcode, and it also doesn’t know anything about its immediate operands (both of which I had to add to the class).

With information about the current instruction, a list of all currently executing instructions, and some target specific objects (MCSubtargetInfo and MCInstrInfo which the base CB class has references to), developers should be able to detect and enforce most custom dependencies within checkCustomHazard. If you need more information than is present in the mca::Instruction, feel free to add attributes to that class and have them set during the lowering sequence from MCInst.

Fortunately, in the in-order pipeline, it’s very convenient for us to pass these arguments to checkCustomHazard. The hazard checking is taken care of within InOrderIssueStage::canExecute(). This function takes a const InstRef as a parameter (representing the instruction that currently wants to be dispatched) and the InOrderIssueStage class maintains a SmallVector<InstRef, 4> which holds all of the currently executing instructions. For the out-of-order pipeline, it’s a bit trickier to get the list of executing instructions and this is why I have held off on implementing it myself. This is the main topic I will bring up when I eventually make a post to discuss and ask for feedback.

CB is a base class where targets implement their own derived classes. If a target specific CB does not exist (or we pass in the -disable-cb flag), the base class is used. This base class trivially returns 0 from its checkCustomHazard() implementation (meaning that the current instruction needs to stall for 0 cycles aka no hazard is detected). For this reason, targets or users who choose not to use CB shouldn’t see any negative impacts to accuracy or performance (in comparison to pre-patch llvm-mca).
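The relationship between the base class and a target's derived class could be sketched roughly as follows. This is a simplified, hypothetical model: Instr, CustomBehaviourSketch, ToyTargetBehaviour, stallCycles, and the opcode constants are invented for illustration, whereas the real interface works on InstRef and mca::Instruction.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for an in-flight instruction.
struct Instr {
  unsigned Opcode;
  unsigned CyclesLeft; // cycles until the instruction finishes executing
};

// Made-up opcodes for a toy target.
constexpr unsigned kLoad = 1, kBarrier = 2, kAdd = 3;

class CustomBehaviourSketch {
public:
  virtual ~CustomBehaviourSketch() = default;
  // Default behaviour: no custom hazard, so a stall of 0 cycles.
  virtual unsigned checkCustomHazard(const std::vector<Instr> &Executing,
                                     const Instr &Current) const {
    (void)Executing;
    (void)Current;
    return 0;
  }
};

// Toy target rule: a kBarrier must wait for every in-flight kLoad.
class ToyTargetBehaviour : public CustomBehaviourSketch {
public:
  unsigned checkCustomHazard(const std::vector<Instr> &Executing,
                             const Instr &Current) const override {
    if (Current.Opcode != kBarrier)
      return 0;
    unsigned Stall = 0;
    for (const Instr &I : Executing) {
      assert(I.CyclesLeft > 0 && "finished instructions should not be listed");
      if (I.Opcode == kLoad && I.CyclesLeft > Stall)
        Stall = I.CyclesLeft;
    }
    return Stall;
  }
};

// Mirrors the order described in the summary: the custom check is consulted
// only when no register or resource hazard was found first.
unsigned stallCycles(unsigned RegisterStall, unsigned ResourceStall,
                     const CustomBehaviourSketch &CB,
                     const std::vector<Instr> &Executing,
                     const Instr &Current) {
  if (RegisterStall)
    return RegisterStall;
  if (ResourceStall)
    return ResourceStall;
  return CB.checkCustomHazard(Executing, Current);
}
```

Passing the base class instead of the toy target makes the custom check a no-op, which is the behaviour described for targets without a CB and for -disable-cb.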

AMDGPU Example:

In AMDGPU, there is a family of s_waitcnt instructions. The purpose of these instructions is to force a stall until certain memory operations have been completed. For example, if we are loading a value to gpr5 and then using gpr5, one of the backend passes will detect this and place an s_waitcnt instruction before the use to ensure that gpr5 has been written to. These instructions interact with 4 different ‘counters’ (vmcnt, expcnt, lgkmcnt, and vscnt) and s_waitcnt can force a wait on any number of them. Each memory instruction will also interact with 1 or more of these counters and (this is an oversimplification) the instruction will increment those counters when it begins executing, and then decrement the counters once it has written its result.

From a high level, when CB sees an s_waitcnt instruction, it will first decode its operand to determine what it’s waiting for. CB then needs to look at all of the currently executing instructions to determine if we need to stall or not. This is where things get a little tricky with respect to AMDGPU.
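The check described above could be sketched as follows, under the assumption that the s_waitcnt operand has already been decoded into a per-counter threshold and that each in-flight instruction already carries a mask of the counters it touches (which is exactly the hard part for AMDGPU). Counter, InFlight, and stallForCounter are hypothetical names, and this is not the AMDGPUCustomBehaviour implementation from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical bitmask values for the four counters.
enum Counter : unsigned {
  VMCNT = 1u << 0,
  EXPCNT = 1u << 1,
  LGKMCNT = 1u << 2,
  VSCNT = 1u << 3
};

struct InFlight {
  unsigned Counters;   // bitmask of counters this instruction incremented
  unsigned CyclesLeft; // cycles until it writes its result and decrements them
};

// Stall needed so that at most `Allowed` in-flight instructions still hold
// counter `C`. Simplifying assumption: the instruction with fewer cycles
// left finishes first.
unsigned stallForCounter(const std::vector<InFlight> &Executing, Counter C,
                         unsigned Allowed) {
  std::vector<unsigned> Pending;
  for (const InFlight &I : Executing) {
    assert(I.CyclesLeft > 0 && "finished instructions should not be listed");
    if (I.Counters & C)
      Pending.push_back(I.CyclesLeft);
  }
  if (Pending.size() <= Allowed)
    return 0; // counter is already low enough, no stall
  std::sort(Pending.begin(), Pending.end());
  // Wait until all but `Allowed` of the pending instructions have finished.
  return Pending[Pending.size() - Allowed - 1];
}
```

For an s_waitcnt that names several counters, the stall would be the maximum of the per-counter results.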

For each instruction within the pipeline, we need to determine if that instruction interacts with any of the counters (and which ones specifically). The backend pass that I mentioned earlier (that adds s_waitcnt instructions during codegen) has a function that does just that. However, this pass is dealing with ‘pseudo’ instructions, whereas in mca, we are dealing with ‘real’ instructions. This is relevant because many of the flags that the logic depends on (such as mayLoad, mayStore, and even some target specific flags) are not always shared between the ‘pseudo’ and ‘real’ instructions. This makes it really difficult to determine with certainty which instructions interact with which counters.

To combat this, I made some conservative assumptions (such as assuming mayLoad and mayStore will always be true). These assumptions result in some instructions being flagged as interacting with more CNTs than they actually do. I do not make any assumptions about the target specific flags though and this results in certain instructions (such as ds_read and ds_write variants) not being detected at all. I highly suggest that if the AMDGPU folks would like to use llvm-mca, they look into having the flags properly applied to the ‘real’ instructions and then modifying the logic from this stage to be copied from the SIInsertWaitcnts::updateEventWaitcntAfter() function.

Another issue that I had to face was that mca (properly) enforces in-order writes on all instructions (even ones that don’t write anything such as s_waitcnt). This means that if there is a 300 cycle memory instruction currently executing, and then an s_waitcnt comes along (that isn’t meant for the 300 cycle instruction), we’ll still end up stalling for those ~300 cycles (because otherwise, the s_waitcnt instruction would finish before the memory instruction). However, mca uses a RetireOOO flag that can be set on scheduling classes within tablegen. For this reason, I modified the AMDGPU tablegen slightly to set that flag for the s_waitcnt variants. The flag is only used within mca, so (in theory) this shouldn’t have any side-effects. As mentioned earlier, if the AMDGPU folks do not want these changes, I’m happy to remove this example from the patch.

With all that background information out of the way, let’s take a look at a few examples.

ds_read_u8 v1, v0
s_waitcnt expcnt(0)
v_mul_lo_u32 v5, v3, v4
s_waitcnt lgkmcnt(0)

ds_read_u8 doesn’t interact with expcnt at all so the first s_waitcnt shouldn’t force a stall. The behaviour of the v_mul instruction is one that I’m not positive on though as far as in-order writes are concerned (in AMDGPU, should that first v_mul be allowed to complete before the ds_read?). Either way, the first wait should be dispatched and executed as soon as it’s seen.

Here’s the mca timeline BEFORE this patch:

Timeline view:
                    0123456789          
Index     0123456789          0123456789
[0,0]     DeeeeeeeeeeeeeeeeeeeE    .      ds_read_u8 v1, v0
[0,1]     .    .    .    .  DeE    .      s_waitcnt expcnt(0)
[0,2]     .    .    .    .   DeeeeeeeE    v_mul_lo_u32 v5, v3, v4
[0,3]     .    .    .    .    .    DeE    s_waitcnt lgkmcnt(0)

Notice how mca’s enforcement of the in-order writes causes the first s_waitcnt to stall.

Now here’s the mca timeline without CB, but with the RetireOOO flag set for all s_waitcnt instructions:

Timeline view:
                    0123456789 
Index     0123456789          0
[0,0]     DeeeeeeeeeeeeeeeeeeeE   ds_read_u8 v1, v0
[0,1]     .DeE .    .    .    .   s_waitcnt expcnt(0)
[0,2]     .    .    . DeeeeeeeE   v_mul_lo_u32 v5, v3, v4
[0,3]     .    .    .  DeE    .   s_waitcnt lgkmcnt(0)

And here’s with CB:

Timeline view:
                    0123456789   
Index     0123456789          012
[0,0]     DeeeeeeeeeeeeeeeeeeeE .   ds_read_u8 v1, v0
[0,1]     .DeE .    .    .    . .   s_waitcnt expcnt(0)
[0,2]     .    .    . DeeeeeeeE .   v_mul_lo_u32 v5, v3, v4
[0,3]     .    .    .    .    DeE   s_waitcnt lgkmcnt(0)

The first s_waitcnt correctly doesn’t get stalled. The second s_waitcnt correctly does get stalled.

Here’s a different example (the formatting breaks in this post due to the long sequence of characters so I use screenshots for the timelines):

s_load_dwordx4 s[0:3], s[4:5], 0x0
s_waitcnt vmcnt(0) lgkmcnt(0)
s_waitcnt_vscnt null, 0x0
global_load_dword v1, v0, s[0:1] glc dlc
s_waitcnt vmcnt(0)

Here is the timeline BEFORE this patch has been applied:
https://i.imgur.com/OZQtrE1.png

You’ll notice that the simulation is actually fairly accurate for this example, but it’s accurate for the wrong reasons. The s_waitcnt instructions aren’t waiting because they know they have to wait for the specific s_load_dword and global_load_dword instructions. They are waiting because mca is enforcing the in-order write-backs. While this may produce a reasonable output in this case, it’s not good in general.

Here is the timeline AFTER this patch has been applied, but with the -disable-cb flag (so basically what it would look like to run the example through mca with s_waitcnt instructions having the RetireOOO flag):
https://i.imgur.com/s62cKYJ.png

You can see now that the s_waitcnt instructions are allowed to finish out-of-order, and without CB to enforce their behaviour, they don’t force a stall.

Now here’s the patch with CB enabled:
https://i.imgur.com/VMlG0DO.png

CB is able to detect that s_waitcnt vmcnt(0) lgkmcnt(0) needs to wait for s_load_dwordx4 to finish. And s_waitcnt vmcnt(0) needs to wait for global_load_dword to finish.

Overall, due to the flags not being reliable, if the AMDGPU developers are not interested in improving / correcting my implementation, it would probably be best to remove it from the patch. I really just wanted an upstream example to show how CB could be used, but I didn’t anticipate all of the complications that arose from this specific example. llvm-mca can be a really helpful tool for many different purposes, but especially for finding mistakes in the scheduling models. For example, I noticed that a sequence of ds_read instructions does not get simulated properly:

ds_read_u8 v1, v0
ds_read_u8 v2, v0 offset:1
ds_read_u8 v3, v0 offset:2
ds_read_u8 v4, v0 offset:3
ds_read_u8 v5, v0 offset:4

Timeline view:
                    0123456789          0123456789          0123456789          0123456789          0123456789 
Index     0123456789          0123456789          0123456789          0123456789          0123456789          0
[0,0]     DeeeeeeeeeeeeeeeeeeeE    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .   ds_read_u8 v1, v0
[0,1]     .    .    .    .    DeeeeeeeeeeeeeeeeeeeE    .    .    .    .    .    .    .    .    .    .    .    .   ds_read_u8 v2, v0 offset:1
[0,2]     .    .    .    .    .    .    .    .    DeeeeeeeeeeeeeeeeeeeE    .    .    .    .    .    .    .    .   ds_read_u8 v3, v0 offset:2
[0,3]     .    .    .    .    .    .    .    .    .    .    .    .    DeeeeeeeeeeeeeeeeeeeE    .    .    .    .   ds_read_u8 v4, v0 offset:3
[0,4]     .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    DeeeeeeeeeeeeeeeeeeeE   ds_read_u8 v5, v0 offset:4

As far as I understand, these instructions shouldn’t have to wait for each other. However, ds_read_u8 is modeled as having both a read and a write to the M0 register which mca detects as a register hazard. I looked in the ISA and I’m fairly confident that this isn’t accurate. If you start using mca, you will likely be able to find a lot of other discrepancies such as this one. And if you do want to start using the tool, the CB class can offer you the scaffolding to implement any instruction behaviour that you can’t / don’t want to model in the tablegen itself.
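A toy model can show why a shared read and write of a single register serializes the whole sequence. This is a rough sketch only: ToyInst, dispatchCycles, makeLoads, the register numbers, and the 20-cycle latency are assumptions chosen to resemble the timeline above, not values taken from the AMDGPU scheduling model or from mca's RegisterFile.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

struct ToyInst {
  std::vector<unsigned> Reads;  // register IDs read
  std::vector<unsigned> Writes; // register IDs written
  unsigned Latency;             // cycles until written values are ready
};

// Dispatch cycle of each instruction on a toy in-order core that dispatches
// at most one instruction per cycle and blocks on any register that still
// has a pending write.
std::vector<unsigned> dispatchCycles(const std::vector<ToyInst> &Seq) {
  std::map<unsigned, unsigned> ReadyAt; // register -> cycle its value is ready
  std::vector<unsigned> Cycles;
  unsigned Next = 0; // earliest cycle the next instruction may dispatch
  for (const ToyInst &I : Seq) {
    assert(I.Latency > 0 && "toy model expects a non-zero latency");
    unsigned Start = Next;
    auto WaitFor = [&](unsigned Reg) {
      auto It = ReadyAt.find(Reg);
      if (It != ReadyAt.end())
        Start = std::max(Start, It->second);
    };
    for (unsigned R : I.Reads)
      WaitFor(R);
    for (unsigned R : I.Writes)
      WaitFor(R);
    for (unsigned R : I.Writes)
      ReadyAt[R] = Start + I.Latency;
    Cycles.push_back(Start);
    Next = Start + 1;
  }
  return Cycles;
}

// Five loads from v0 into v1..v5, optionally also reading and writing a
// shared register standing in for M0.
std::vector<ToyInst> makeLoads(bool TouchesM0) {
  constexpr unsigned V0 = 0, M0 = 1000; // made-up register IDs
  std::vector<ToyInst> Seq;
  for (unsigned I = 1; I <= 5; ++I) {
    ToyInst Load{{V0}, {I}, 20};
    if (TouchesM0) {
      Load.Reads.push_back(M0);
      Load.Writes.push_back(M0);
    }
    Seq.push_back(Load);
  }
  return Seq;
}
```

In this model the loads dispatch at cycles 0, 20, 40, 60, 80 when every one of them reads and writes the shared register, and at cycles 0, 1, 2, 3, 4 when that def and use are absent.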

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

holland11 created this revision. Jun 11 2021, 1:23 PM

Herald added subscribers: foad, kerbowa, jfb and 6 others. Jun 11 2021, 1:23 PM

holland11 requested review of this revision. Jun 11 2021, 1:23 PM

Herald added a subscriber: wdng. Jun 11 2021, 1:23 PM

Removed some unnecessary includes and fixed header formatting for a few files.

Harbormaster completed remote builds in B108892: Diff 351556. Jun 11 2021, 2:42 PM

Made some modifications to please clang-tidy.

Harbormaster completed remote builds in B108927: Diff 351610. Jun 11 2021, 6:28 PM

More clang-tidy warning fixes.

Harbormaster completed remote builds in B108936: Diff 351619. Jun 11 2021, 7:41 PM

RKSimon added a subscriber: RKSimon. Jun 12 2021, 12:16 AM

clang-format minor tweak.

Harbormaster completed remote builds in B108963: Diff 351647. Jun 12 2021, 3:43 AM

Thanks for the patch.

I like the idea of adding a CustomBehaviour class in llvm-mca to customise/post-process dependencies and model extra delays due to unsimulated structural hazards.

I had a quick look at the patch, and so far, I like the general idea/design. There are a couple of things that must be done differently or that need to be improved, but hopefully all of those points can be easily addressed.

Regarding the AMDGPU-specific changes: it may be worth splitting this patch into at least two patches.
Specifically, the scheduling model changes related to the RetireOOO flags (as well as the new tests) should be committed as a separate patch, and possibly reviewed by @foad (and/or whoever else is interested in supporting llvm-mca for AMDGPU).

Thanks,
Andrea

llvm/docs/CommandGuide/llvm-mca.rst
989–994	I think this paragraph should be more explicit about the goal of the CustomBehaviour class. The first two sentences should probably be merged together. The focus of those two sentences is the "lack of expressiveness" (of scheduling models and machine instructions). That lack of expressiveness is what eventually leads to poor simulations. Starting the entire paragraph with that first sentence is not great; in the absence of context, it sounds more like there are bugs in llvm-mca itself. I also noticed that you don't explain the main goal of this new CustomBehaviour class. You have done a much better job of introducing it in the summary of this review. At this point, you should at least drop a hint about its most common goal: "it allows targets to customise data dependencies, and model delays which cannot be normally predicted just by simply evaluating register defs/uses". Something like that... Maybe you can be more accurate, but keep in mind that the description should not be too low level in this document.
llvm/include/llvm/MCA/Instruction.h
512	Can this vector be fully optional? I am thinking about using a simple std::vector here, rather than a SmallVector, which would force us to consume space for a default number of inline elements (8 in this case). For the majority of targets, which don't require/implement a custom behaviour class, that field is literally wasted space. In general, I am already concerned about the `sizeof` of mca::Instruction. Those objects tend to be too big already, and this change has the potential to make them even bigger.
llvm/lib/MCA/InstrBuilder.cpp
619–621	Three questions:
1. Is it necessary to add MCAOperands to every single instruction?
2. Is it possible to do this as a post-processing step, if requested by the simulator?
3. Can we limit the number of MCAOperand objects that we store in an mca::Instruction?
Ideally, this logic should be split from the normal `InstrBuilder::createInstruction` and moved to a separate `InstrBuilder::postProcessInstruction` step. That post-processing step would then be performed only if the so-called "custom behaviour" is requested. In order to address points 1. and 3., it may be useful to implement a new "MCA Lowering Context" class, which knows about: a) which instructions require a list of MCAOperand, and b) which operands must be lowered to MCAOperand for the custom hazard check to work. The MCA lowering context would know which instructions/operands require a custom post-processing step, and skip the post-processing step for those instructions that can be safely ignored. Essentially, have a context class that filters opcodes and decides which operands are important to keep as MCAOperand, and which are not important for the custom behaviour class. To give you an idea: if we know that the custom behaviour class is only interested in the first operand of a specific load opcode, then we should only create MCAOperand objects for that load, and just for its first operand. In most cases (correct me if I am wrong), I suspect that we may not need to store MCAOperands for every single instruction out there. So, blindly lowering every MCOperand into MCAOperand, while it works in general, is a bit extreme in my opinion. Keep in mind that most targets won't implement any custom behaviour for their subtargets, and for those targets, translating to MCAOperand would be unnecessary, and it would simply waste space.
Essentially, if a custom behaviour is not requested/implemented, then we should ignore this extra post-processing step entirely, and not populate the vector of MCAOperands (this is also why it may be worth making the new MCAOperand vector field a std::vector). I hope it makes sense.
llvm/tools/llvm-mca/Views/DispatchStatistics.cpp
80	Please just use something like "Uncategorised Structural Hazards stalls" (or something like that).
llvm/tools/llvm-mca/lib/AMDGPU/AMDGPUCustomBehaviour.cpp
44–45	I am not familiar with those pseudos. However, I would be careful about asserting here. One day, people might want to drive an mca pipeline from a backend pass, and even after regalloc there might still be MachineInstr pseudos around. Just merge these with the "default" case for now (and maybe add a small code comment).
48–52	I don't think it is a good idea to generate a warning here. Instead, just raise a bug for it once this patch is committed in main. For now, please convert it into a TODO comment.

andreadb added inline comments. Jun 12 2021, 6:37 AM

llvm/lib/MCA/InstrBuilder.cpp
619–621	I forgot to mention that, if ordering of operands is important for the custom behaviour check to work, and you are concerned about having gaps in the MCAOperand sequence, then you could use an extra field to store the `original operand index`.
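The filtering idea from these inline comments, including the extra "original operand index" field, might look something like the sketch below. Every type and function here (SrcOperand, SrcInst, KeptOperand, LoweringFilter, lowerOperands, immAt) is a hypothetical stand-in; none of them are the real MC or MCA classes, and the opcode numbers in any usage are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a source operand and a source instruction.
struct SrcOperand {
  bool IsImm;
  int64_t Value;
};
struct SrcInst {
  unsigned Opcode;
  std::vector<SrcOperand> Operands;
};

// A kept operand remembers where it sat in the source operand list, so gaps
// left by filtered-out operands do not lose ordering information.
struct KeptOperand {
  unsigned OrigIndex;
  int64_t Imm;
};

// Records which (opcode, operand index) pairs the custom hazard check needs.
class LoweringFilter {
  std::set<std::pair<unsigned, unsigned>> Wanted;

public:
  void request(unsigned Opcode, unsigned OpIdx) {
    Wanted.emplace(Opcode, OpIdx);
  }
  bool wants(unsigned Opcode, unsigned OpIdx) const {
    return Wanted.count(std::make_pair(Opcode, OpIdx)) != 0;
  }
};

// Keeps only the requested immediates; the result starts empty, so
// instructions nobody asked about cost no extra storage.
std::vector<KeptOperand> lowerOperands(const SrcInst &MI,
                                       const LoweringFilter &F) {
  std::vector<KeptOperand> Kept;
  for (unsigned I = 0; I < MI.Operands.size(); ++I)
    if (MI.Operands[I].IsImm && F.wants(MI.Opcode, I))
      Kept.push_back({I, MI.Operands[I].Value});
  return Kept;
}

// Looks up a kept immediate by its original operand index.
int64_t immAt(const std::vector<KeptOperand> &Kept, unsigned OrigIndex) {
  for (const KeptOperand &K : Kept)
    if (K.OrigIndex == OrigIndex)
      return K.Imm;
  assert(false && "operand was not requested by the filter");
  return 0;
}
```

With an empty filter, nothing is stored for any instruction, which matches the goal of not paying for operands when no custom behaviour is requested.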

@andreadb I appreciate all of the suggestions. They all make sense to me and I'll get started on implementing them.

Once I've made the changes, would it be better for me to update the diff within this post, or create a new post with the same title/body (but with a new diff)?

Your concerns about the MCAOperand list are completely reasonable. I'll get started on making the list optional and only storing the operands that the target's CB requests. What do you think would be the best way to make it 'optional' (with respect to the actual datatype of the MCAOperand list within the mca::Instruction class)? Would making it a std::vector that defaults to a size of 0 be smart or can you think of a better way?

What I'm thinking is that within the CustomBehaviour.h file, there will be another class InstrPostProcess (open to better names). This class will have a public method processInstruction() (modify may be a better word). The generic class's method won't do anything, but targets who implement their own CB will also implement their own InstrPostProcess where they can choose to override the processInstruction() method. After the MCInst is lowered to mca::Instruction, but before we add it to the LoweredSequence, we will call InstrPostProcess::processInstruction(). This method will take both the MCInst and mca::Instruction and will make any modifications / additions to the mca::Instruction.

This way, for targets that don't have a CB or don't need any modifications to the mca::Instruction (such as storing specific operands), the processInstruction() method can trivially return without doing anything. And for targets who do want to make modifications to the mca::Instruction, they can have a switch statement within the processInstruction() to only modify the desired instructions.
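A rough sketch of that plan, using simplified stand-in types so that it compiles on its own: SrcInst, LoweredInst, InstrPostProcessSketch, ToyPostProcess, lower, and the kWait opcode are all hypothetical and only mirror the shape of the proposal, not the code that eventually landed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for MCInst and mca::Instruction.
struct SrcInst {
  unsigned Opcode;
  std::vector<int64_t> Imms;
};
struct LoweredInst {
  unsigned Opcode = 0;
  std::vector<int64_t> KeptImms; // empty unless a target asks for operands
};

class InstrPostProcessSketch {
public:
  virtual ~InstrPostProcessSketch() = default;
  // Generic version: leave the lowered instruction untouched.
  virtual void processInstruction(LoweredInst &, const SrcInst &) {}
};

constexpr unsigned kWait = 7; // made-up opcode for a wait-style instruction

class ToyPostProcess : public InstrPostProcessSketch {
public:
  void processInstruction(LoweredInst &Inst, const SrcInst &MI) override {
    switch (MI.Opcode) {
    case kWait:
      // Only this opcode needs its immediate for the custom hazard check.
      assert(!MI.Imms.empty() && "wait instruction without an immediate");
      Inst.KeptImms.push_back(MI.Imms.front());
      break;
    default:
      break; // every other instruction is left as lowered
    }
  }
};

// Lowering followed by the post-processing hook, run before the instruction
// would be appended to the lowered sequence.
LoweredInst lower(const SrcInst &MI, InstrPostProcessSketch &IPP) {
  LoweredInst Inst;
  Inst.Opcode = MI.Opcode;
  IPP.processInstruction(Inst, MI);
  return Inst;
}
```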

I mentioned in the original post that CB is currently only designed to handle adding dependencies, but not removing them. As a bonus with this design, we'd get an easy way to remove undesired dependencies (on top of the original intention to only store what we need).

Edit: I just did a quick, rough, and untested re-design for the MCAOperand issue if you want a more explicit look at what this plan looks like (don't feel any pressure to look at it though, I'm just sharing in case you're curious) https://pastebin.com/rL7w0WC4

In D104149#2815523, @holland11 wrote:

@andreadb I appreciate all of the suggestions. They all make sense to me and I'll get started on implementing them.

Once I've made the changes, would it be better for me to update the diff within this post, or create a new post with the same title/body (but with a new diff)?

You don't need to create a new review. You can always update the review description if you want, and keep everything as part of this same review.

Your concerns about the MCAOperand list are completely reasonable. I'll get started on making the list optional and only storing the operands that the target's CB requests. What do you think would be the best way to make it 'optional' (with respect to the actual datatype of the MCAOperand list within the mca::Instruction class)? Would making it a std::vector that defaults to a size of 0 be smart or can you think of a better way?

I would simply use a std::vector. That way, it would start empty (which is what we want most of the time).

What I'm thinking is that within the CustomBehaviour.h file, there will be another class InstrPostProcess (open to better names). This class will have a public method processInstruction() (modify may be a better word). The generic class's method won't do anything, but targets who implement their own CB will also implement their own InstrPostProcess where they can choose to override the processInstruction() method. After the MCInst is lowered to mca::Instruction, but before we add it to the LoweredSequence, we will call InstrPostProcess::processInstruction(). This method will take both the MCInst and mca::Instruction and will make any modifications / additions to the mca::Instruction.

Sounds good.

This way, for targets that don't have a CB or don't need any modifications to the mca::Instruction (such as storing specific operands), the processInstruction() method can trivially return without doing anything. And for targets who do want to make modifications to the mca::Instruction, they can have a switch statement within the processInstruction() to only modify the desired instructions.

Awesome.

I mentioned in the original post that CB is currently only designed to handle adding dependencies, but not removing them. As a bonus with this design, we'd get an easy way to remove undesired dependencies (on top of the original intention to only store what we need).

Great.
However, let's just pretend for now that this is all about adding new dependencies.
We can re-evaluate the importance of this particular use case in future if we find that it is required by some code.

Edit: I just did a quick, rough, and untested re-design for the MCAOperand issue if you want a more explicit look at what this plan looks like (don't feel any pressure to look at it though, I'm just sharing in case you're curious) https://pastebin.com/rL7w0WC4

From what I can see, it looks good!

Only a couple of minor nits:

Protected fields of InstrPostProcess should be private (even if at the cost of adding const& getters).
Do we actually even need to add a derived class for x86? For X86 the base class seems already enough.

Thanks
-Andrea

Protected fields of InstrPostProcess should be private (even if at the cost of adding const& getters).

Does this hold true in general for objects that get stored in base classes, but used in derived classes? I also have the MCSubtargetInfo, SrcManager, and MCInstrInfo objects within the base CB as protected. Should I change them to private and then use getters for the derived classes to access them?

Do we actually even need to add a derived class for x86? For X86 the base class seems already enough.

We probably don't need it. I was originally going to make an empty derived class for every upstream target just so they'd have all their cmake stuff setup already for when/if people wanted to add to them. But then I just decided to keep two of them so that it would be easy for people to refer to how they are setup. Though, keeping the X86 one seems a bit misleading considering that CB doesn't even play a part in the out-of-order pipeline yet. Do you think I should remove it (the X86 derived class)?

However, let's just pretend for now that this is all about adding new dependencies.
We can re-evaluate the importance of this particular use case in future if we find that it is required by some code.

I'm really tempted to use this altered design to fix the ds_read issue that I described at the end of the original post. It would be a much cleaner example use case than the s_waitcnt 'fix' (although it wouldn't use the actual CB class at all so it wouldn't be an example of CB directly).

From a high level, all I need to do is remove the WriteState that refers to the M0 register. From a lower level, this WriteState exists in the mca::Instruction as part of SmallVector<WriteState, 2> Defs;. From looking at the way the WriteState objects interact with the RegisterFile (most notably, what happens in both pipelines when an instruction gets dispatched and needs to add its defs to the register file), I could either (1) remove the WriteState from the SmallVector, (2) set the RegisterID of the WriteState to 0, or (3) set the Latency of the WriteState to 0.

(1) As far as I can tell, there is no way to remove an element from a SmallVector. I could maybe build a new SmallVector without the specific WriteState, but this seems less than ideal.

(2) This would cause an assertion to fail within RegisterFile::addRegisterWrite. Would it be alright to change this to an if statement that exits the function instead? Edit: Actually, I just tried this, and there are other assertions that would need to be changed as well (one from RegisterFile::removeRegisterWrite and one from RegisterFile::onInstructionExecuted), so maybe this isn't a great option.

void RegisterFile::addRegisterWrite(WriteRef Write,
                                    MutableArrayRef<unsigned> UsedPhysRegs) {
  WriteState &WS = *Write.getWriteState();
  MCPhysReg RegID = WS.getRegisterID();
  assert(RegID && "Adding an invalid register definition?");

void RegisterFile::addRegisterWrite(WriteRef Write,
                                    MutableArrayRef<unsigned> UsedPhysRegs) {
  WriteState &WS = *Write.getWriteState();
  MCPhysReg RegID = WS.getRegisterID();
  if (!RegID)  return;

(3) This is my least desired route since I don't fully understand the ramifications that it would have, and it doesn't seem as 'explicit' as far as removing a def from an instruction goes.
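A note on option (1): llvm::SmallVector does provide an erase() member with the usual iterator-based interface, so removing a single def may be simpler than rebuilding the vector. The sketch below uses std::vector as a stand-in so that it compiles without LLVM headers; WriteStateSketch, dropDefsOf, and the register numbers in any usage are hypothetical, and whether dropping a def at this point is safe with respect to other mca state that refers to it is not addressed here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for mca's WriteState; only the register ID matters.
struct WriteStateSketch {
  unsigned RegisterID;
};

// Removes every def of register Reg and returns how many were dropped.
std::size_t dropDefsOf(std::vector<WriteStateSketch> &Defs, unsigned Reg) {
  assert(Reg != 0 && "0 is treated as 'no register' in this sketch");
  // Move the defs to keep to the front, then erase the leftover tail.
  auto NewEnd = std::remove_if(
      Defs.begin(), Defs.end(),
      [Reg](const WriteStateSketch &WS) { return WS.RegisterID == Reg; });
  std::size_t Dropped = static_cast<std::size_t>(Defs.end() - NewEnd);
  Defs.erase(NewEnd, Defs.end());
  return Dropped;
}
```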

If none of these options are desirable, another option could be to have a new method InstrPostProcess::skipRegDef() that gets called from InstrBuilder::createInstruction():

// Initialize writes.
unsigned WriteIndex = 0;
for (const WriteDescriptor &WD : D.Writes) {
  RegID = WD.isImplicitWrite() ? WD.RegisterID
                               : MCI.getOperand(WD.OpIndex).getReg();
  // Check if this is a optional definition that references NoReg.
  if (WD.IsOptionalDef && !RegID) {
    ++WriteIndex;
    continue;
  }

  assert(RegID && "Expected a valid register ID!");
  NewIS->getDefs().emplace_back(WD, RegID,
                                /* ClearsSuperRegs */ WriteMask[WriteIndex],
                                /* WritesZero */ IsZeroIdiom);
  ++WriteIndex;
}

To

// Initialize writes.
unsigned WriteIndex = 0;
for (const WriteDescriptor &WD : D.Writes) {
  RegID = WD.isImplicitWrite() ? WD.RegisterID
                               : MCI.getOperand(WD.OpIndex).getReg();
  // Check if this is a optional definition that references NoReg.
  if ((WD.IsOptionalDef && !RegID) || IPP->skipRegDef(WD, MCI)) {
    ++WriteIndex;
    continue;
  }

  assert(RegID && "Expected a valid register ID!");
  NewIS->getDefs().emplace_back(WD, RegID,
                                /* ClearsSuperRegs */ WriteMask[WriteIndex],
                                /* WritesZero */ IsZeroIdiom);
  ++WriteIndex;
}

The more I think about it, the more I like the last option best. What do you think?
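A minimal, self-contained sketch of the skipRegDef() idea, assuming simplified stand-in types: WriteDesc, PostProcess, DropRegister and buildDefs are hypothetical names that only mirror the shape of the loop above, not the real mca classes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for mca::WriteDescriptor.
struct WriteDesc {
  unsigned RegisterID; // 0 models "NoReg"
  bool IsOptionalDef;
};

// Hypothetical stand-in for the post-processing hook.
struct PostProcess {
  virtual ~PostProcess() = default;
  // Default behaviour: never skip a definition.
  virtual bool skipRegDef(const WriteDesc &) const { return false; }
};

// Example target hook that drops every def of one particular register.
struct DropRegister : PostProcess {
  unsigned Reg;
  explicit DropRegister(unsigned R) : Reg(R) {}
  bool skipRegDef(const WriteDesc &WD) const override {
    return WD.RegisterID == Reg;
  }
};

// Mirrors the "Initialize writes" loop and returns the register IDs that
// survive as definitions.
inline std::vector<unsigned> buildDefs(const std::vector<WriteDesc> &Writes,
                                       const PostProcess &IPP) {
  std::vector<unsigned> Defs;
  for (const WriteDesc &WD : Writes) {
    if ((WD.IsOptionalDef && WD.RegisterID == 0) || IPP.skipRegDef(WD))
      continue;
    assert(WD.RegisterID && "Expected a valid register ID!");
    Defs.push_back(WD.RegisterID);
  }
  return Defs;
}
```

With the base hook every valid def is kept; a target-specific hook such as DropRegister filters defs before they ever reach the instruction, so the register file never sees them.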

However, this pass is dealing with ‘pseudo’ instructions, whereas in mca, we are dealing with ‘real’ instructions. This is relevant because many of the flags that the logic depends on (such as mayLoad, mayStore, and even some target-specific flags) are not always shared between the ‘pseudo’ and ‘real’ instructions. This makes it really difficult to determine with certainty which instructions interact with which counters.

In most cases the reason that some flags are not set accurately on Real instructions is just that there has been no need for them. Copying them across from the corresponding Pseudo instruction is almost a no-brainer. I have done some of this before in D99187 and am happy to do more.

However, ds_read_u8 is modeled as having both a read and a write to the M0 register

I'm surprised by this. I don't see anything in the tablegen files to say that DS instructions write to M0.

In D104149#2815836, @holland11 wrote:

Protected fields of InstrPostProcess should be private (even if at the cost of adding const& getters).

Does this hold true in general for objects that get stored in base classes, but used in derived classes? I also have the MCSubtargetInfo, SrcManager, and MCInstrInfo objects within the base CB as protected. Should I change them to private and then use getters for the derived classes to access them?

In my code, I tend to use private instead of protected whenever possible. This is just my coding style though. I don't think that LLVM imposes a particular coding style for it. Since mca has been mainly written by me, you'll find out that protected is almost never used in practice.

That being said, it is fine for you to use protected for your new class. It is your code, and -strictly speaking - there is nothing fundamentally wrong with it.

Do we actually even need to add a derived class for x86? For X86 the base class seems already enough.

We probably don't need it. I was originally going to make an empty derived class for every upstream target just so they'd have all their cmake stuff set up already for when/if people wanted to add to them. But then I just decided to keep two of them so that it would be easy for people to refer to how they are set up. Though, keeping the X86 one seems a bit misleading considering that CB doesn't even play a part in the out-of-order pipeline yet. Do you think I should remove it (the X86 derived class)?

Yes please.

If needed, any X86-specific changes should go into a separate patch.

However, let's just pretend for now that this is all about adding new dependencies.
We can re-evaluate the importance of this particular use case in future if we find that it is required by some code.

I'm really tempted to use this altered design to fix the ds_read issue that I described at the end of the original post. It would be a much cleaner example use case than the s_waitcnt 'fix' (although it wouldn't use the actual CB class at all so it wouldn't be an example of CB directly).

From a high level, all I need to do is remove the WriteState that refers to the M0 register. From a lower level, this WriteState exists in the mca::Instruction as part of SmallVector<WriteState, 2> Defs;. From looking at the way the WriteState objects interact with the RegisterFile (most notably, what happens in both pipelines when an instruction gets dispatched and needs to add its defs to the register file), I could either (1) remove the WriteState from the SmallVector, (2) set the RegisterID of the WriteState to 0, or (3) set the Latency of the WriteState to 0.

I don't like the idea of using your new framework to work around problems/bugs that are not in llvm-mca.

If you think that there is a genuine bug in their instruction definitions, then please raise a bug for it.

We should never "fix" non-mca issues within mca. This is true in general, and it doesn't only apply to mca. If there is a bug in someone else's code, we create a bug report for it, and optionally we send a patch for it. We don't pretend that the issue doesn't exist, and (worse) hack some workarounds for those bugs in our code.

So, I strongly recommend to raise bugs for all these problems you have encountered when testing the AMDGPU models.
If there is a reason why there is a write to M0, then we can consider implementing a solution based on your new class/post-processing step.
If instead that was an oversight, then it will be fixed outside of mca without the need for a custom hack in mca.

As far as I can tell, there is no way to remove an element from a SmallVector. I could maybe build a new SmallVector without the specific WriteState, but this seems less than ideal.

Setting the RegisterID to 0 would cause an assertion to fail within RegisterFile::addRegisterWrite. Would it be alright to change this to an if statement that exits the function instead? Edit: Actually, I just tried this, and there are other assertions that would need to be changed as well (one in RegisterFile::removeRegisterWrite and one in RegisterFile::onInstructionExecuted), so maybe this isn't a great option.
void RegisterFile::addRegisterWrite(WriteRef Write,
                                    MutableArrayRef<unsigned> UsedPhysRegs) {
  WriteState &WS = *Write.getWriteState();
  MCPhysReg RegID = WS.getRegisterID();
  assert(RegID && "Adding an invalid register definition?");
To
void RegisterFile::addRegisterWrite(WriteRef Write,
                                    MutableArrayRef<unsigned> UsedPhysRegs) {
  WriteState &WS = *Write.getWriteState();
  MCPhysReg RegID = WS.getRegisterID();
  if (!RegID)  return;
This is my least desired route since I don't fully understand the ramifications that it would have and doesn't seem as 'explicit' as far as removing a def from an instruction goes.

If none of these options are desirable, another option could be to have a new method InstrPostProcess::skipRegDef() that gets called from InstrBuilder::createInstruction():
// Initialize writes.
unsigned WriteIndex = 0;
for (const WriteDescriptor &WD : D.Writes) {
  RegID = WD.isImplicitWrite() ? WD.RegisterID
                               : MCI.getOperand(WD.OpIndex).getReg();
  // Check if this is a optional definition that references NoReg.
  if (WD.IsOptionalDef && !RegID) {
    ++WriteIndex;
    continue;
  }

  assert(RegID && "Expected a valid register ID!");
  NewIS->getDefs().emplace_back(WD, RegID,
                                /* ClearsSuperRegs */ WriteMask[WriteIndex],
                                /* WritesZero */ IsZeroIdiom);
  ++WriteIndex;
}
To
// Initialize writes.
unsigned WriteIndex = 0;
for (const WriteDescriptor &WD : D.Writes) {
  RegID = WD.isImplicitWrite() ? WD.RegisterID
                               : MCI.getOperand(WD.OpIndex).getReg();
  // Check if this is a optional definition that references NoReg.
  if ((WD.IsOptionalDef && !RegID) || IPP->skipRegDef(WD, MCI)) {
    ++WriteIndex;
    continue;
  }

  assert(RegID && "Expected a valid register ID!");
  NewIS->getDefs().emplace_back(WD, RegID,
                                /* ClearsSuperRegs */ WriteMask[WriteIndex],
                                /* WritesZero */ IsZeroIdiom);
  ++WriteIndex;
}
The more I think about it, the more I like the last option best. What do you think?

Assuming that it always works, Option 2 seems like the right way to go. A write to an invalid register should always be ignored.
That being said, let's wait to see if the issue with the write to M0 can be fixed in a different way.
If not, then it is OK to add a post-processing step for it in mca. But that would go into a separate patch anyway.

-Andrea

Another issue that I had to face was that mca (properly) enforces in-order writes on all instructions (even ones that don’t write anything such as s_waitcnt). This means that if there is a 300 cycle memory instruction currently executing, and then an s_waitcnt comes along (that isn’t meant for the 300 cycle instruction), we’ll still end up stalling for those ~300 cycles (because otherwise, the s_waitcnt instruction would finish before the memory instruction). However, mca uses a RetireOOO flag that can be set on scheduling classes within tablegen. For this reason, I modified the AMDGPU tablegen slightly to set that flag for the s_waitcnt variants. The flag is only used within mca, so (in theory) this shouldn’t have any side-effects. As mentioned earlier, if the AMDGPU folks do not want these changes, I’m happy to remove this example from the patch.
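The stall described above can be illustrated with a toy calculation. This is only a sketch of the idea, assuming a single in-order constraint (a younger instruction may not finish before the longest-running older one unless it is allowed to retire out of order); stallCycles and its parameters are hypothetical and do not reproduce mca's actual issue logic.

```cpp
#include <cassert>

// Toy model: how many cycles a new instruction must wait before issuing so
// that it does not complete before the slowest older instruction.
//   Latency               - latency of the instruction about to issue
//   RetireOOO             - models the RetireOOO flag on its scheduling class
//   LongestOlderRemaining - cycles left on the slowest in-flight instruction
inline unsigned stallCycles(unsigned Latency, bool RetireOOO,
                            unsigned LongestOlderRemaining) {
  if (RetireOOO || Latency >= LongestOlderRemaining)
    return 0; // free to issue now
  return LongestOlderRemaining - Latency;
}
```

Under this model a 1-cycle s_waitcnt issued behind a memory instruction with 300 cycles remaining waits 299 cycles, and waits 0 cycles once the flag is set.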

The rules are quite complicated for which types of instruction are guaranteed to complete in order with respect to which other types of instruction. The whole point of s_waitcnt is to enforce that ordering in cases where it would otherwise be unspecified. I'm not sure that a single RetireOOO flag is enough to model that complexity. It is certainly true that in most cases a simple ALU instruction can complete while there are still outstanding memory instructions.

I don't like the idea of using your new framework to work around problems/bugs that are not in llvm-mca.

If you think that there is a genuine bug in their instruction definitions, then please raise a bug for it.

We should never "fix" non-mca issues within mca. This is true in general, and it doesn't only apply to mca. If there is a bug in someone else's code, we create a bug report for it, and optionally we send a patch for it. We don't pretend that the issue doesn't exist, and (worse) hack some workarounds for those bugs in our code.

Yeah, this makes sense. I suppose I'm just trying to find different use cases where this patch could be useful in the future. That way, I can make sure that the framework is already set up in a way that facilitates that use case rather than myself or someone else needing to tweak it to do what they want. I think there are legitimate cases where we'd want to use this pattern (removing a Def from an instruction) without wanting to touch the tablegen. But this probably isn't one of those cases, so I should just be using it to make sure the design is reasonable rather than trying to submit the example upstream.

In the event that I do completely remove the AMDGPU example from the patch (and then submit bug reports for all the issues I found), should I still keep an empty AMDGPUCustomBehaviour class within the patch so there is at least one example implementation? That way, if anyone wants to implement their own CB for their target, they have an example to look at for setting it up (mainly with respect to the directory structure and cmake files).

In D104149#2817469, @holland11 wrote:

I don't like the idea of using your new framework to work around problems/bugs that are not in llvm-mca.

If you think that there is a genuine bug in their instruction definitions, then please raise a bug for it.

We should never "fix" non-mca issues within mca. This is true in general, and it doesn't only apply to mca. If there is a bug in someone else's code, we create a bug report for it, and optionally we send a patch for it. We don't pretend that the issue doesn't exist, and (worse) hack some workarounds for those bugs in our code.

Yeah, this makes sense. I suppose I'm just trying to find different use cases where this patch could be useful in the future. That way, I can make sure that the framework is already set up in a way that facilitates that use case rather than myself or someone else needing to tweak it to do what they want. I think there are legitimate cases where we'd want to use this pattern (removing a Def from an instruction) without wanting to touch the tablegen. But this probably isn't one of those cases, so I should just be using it to make sure the design is reasonable rather than trying to submit the example upstream.

In the event that I do completely remove the AMDGPU example from the patch (and then submit bug reports for all the issues I found), should I still keep an empty AMDGPUCustomBehaviour class within the patch so there is at least one example implementation? That way, if anyone wants to implement their own CB for their target, they have an example to look at for setting it up (mainly with respect to the directory structure and cmake files).

As far as I understand, your CustomBehaviour plus the RetireOOO change does improve the AMDGPU simulation.
@foad, what do you think about it?

Basically, if the AMDGPU changes are a good improvement, then I still think that it makes sense to keep them as part of the patch.

About future developments:
I'd be happy to review a patch that checks for the presence of invalid writes. I have been thinking about it, and I believe that there is no harm in accepting a patch like that one. I also agree that there may be legitimate use cases for removing register definitions. So, I am fine with it. The only thing is: please, send it as a follow-up patch :)

The more I think about it, the more I believe that your framework is powerful enough to unblock a number of interesting use cases. For example, we could use it to annotate instructions based on external data generated - for example - by a profile. That way, we could refine latencies for loads; branch probabilities (if we decide to improve the simulator and enable different traces of executions); etc. These are just ideas...

Back to the patch: let's see what @foad says about the new tests. If he is happy with the new numbers, then I have no further objections, and the patch can go in with the existing tests / changes to AMDGPU. If worst comes to worst, I am willing to accept your patch anyway (even with no tests), because I think this framework is very useful in general.

-Andrea

I just spoke with Quentin, and we both agree with you. If it's alright with @foad, we'd like to push the current patch with the AMDGPU example (after I finish making the modifications Andrea suggested in his first comment). Afterwards, I will submit a bug report regarding the relevant flags not being transferred from pseudo -> real instructions. This will allow me to clean up the cnt detection logic and make it a lot more similar to the logic found in the si-insert-waitcnts pass (which will make the s_waitcnt modelling more accurate).

I'm surprised by this. I don't see anything in the tablegen files to say that DS instructions write to M0.

This is interesting. The MCInst for DS_READ_U8_gfx10 definitely has a Def for M0.

		Opcode Name= DS_READ_U8_gfx10
		SchedClassID=4
		Resource Mask=0x00000000000004, Reserved=0, #Units=1, cy=1
		Resource Mask=0x00000000000008, Reserved=0, #Units=1, cy=1
		Buffer Mask=0x00000000000004
		Buffer Mask=0x00000000000008
		 Used Units=0x0000000000000c
		Used Groups=0x00000000000000
		[Def]    OpIdx=0, Latency=20, WriteResourceID=0
		[Def][V] OpIdx=4, Latency=20, WriteResourceID=0
		[Use]    OpIdx=1, UseIndex=0
		[Use][V] OpIdx=4, UseIndex=3
		MaxLatency=20

The M0 def corresponds to [Def][V] OpIdx=4, Latency=20, WriteResourceID=0 where the [V] signifies that it's a 'variadic' operand (I'm not entirely sure what this means).

I can submit a separate bug report for this if you'd like. It's not really related to the CustomBehaviour class or the s_waitcnt instruction.

In D104149#2817752, @holland11 wrote:
I just spoke with Quentin, and we both agree with you. If it's alright with @foad, we'd like to push the current patch with the AMDGPU example (after I finish making the modifications Andrea suggested in his first comment). Afterwards, I will submit a bug report regarding the relevant flags not being transferred from pseudo -> real instructions. This will allow me to clean up the cnt detection logic and make it a lot more similar to the logic found in the si-insert-waitcnts pass (which will make the s_waitcnt modelling more accurate).

I'm surprised by this. I don't see anything in the tablegen files to say that DS instructions write to M0.

This is interesting. The MCInst for DS_READ_U8_gfx10 definitely has a Def for M0.
		Opcode Name= DS_READ_U8_gfx10
		SchedClassID=4
		Resource Mask=0x00000000000004, Reserved=0, #Units=1, cy=1
		Resource Mask=0x00000000000008, Reserved=0, #Units=1, cy=1
		Buffer Mask=0x00000000000004
		Buffer Mask=0x00000000000008
		 Used Units=0x0000000000000c
		Used Groups=0x00000000000000
		[Def]    OpIdx=0, Latency=20, WriteResourceID=0
		[Def][V] OpIdx=4, Latency=20, WriteResourceID=0
		[Use]    OpIdx=1, UseIndex=0
		[Use][V] OpIdx=4, UseIndex=3
		MaxLatency=20
The M0 def corresponds to [Def][V] OpIdx=4, Latency=20, WriteResourceID=0 where the [V] signifies that it's a 'variadic' operand (I'm not entirely sure what this means).

I can submit a separate bug report for this if you'd like. It's not really related to the CustomBehaviour class or the s_waitcnt instruction.

I think this is probably a bug in mca.

According to AMDGPUGenInstrInfo.inc, DS_READ_U8_gfx10 has four operands (of which, the first one is a definition).

This is the output from llvm-mc:

ds_read_u8 v1, v0                       ; <MCInst #6688 DS_READ_U8_gfx10
                                ;  <MCOperand Reg:468>
                                ;  <MCOperand Reg:467>
                                ;  <MCOperand Imm:0>
                                ;  <MCOperand Imm:0>
                                ;  <MCOperand Reg:314>>

That extra register operand at the end (i.e. Reg:314) is a variadic operand.

For some reason, mca::InstrBuilder conservatively decides to assume that Reg:314 is both defined and used.
That's a bug. It should simply check if method MCInstrDesc::variadicOpsAreDefs() returns true. Otherwise, it should assume that variadic operands are not definitions.
I'll see if I can fix it tomorrow.
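The rule proposed above can be sketched as follows. This is an illustration under simplifying assumptions, not the actual InstrBuilder change: InstrDescLike, OperandRole and classifyOperands are hypothetical names, and the register/immediate distinction is ignored (every non-def operand is simply labelled a use).

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for the MCInstrDesc facts used here.
struct InstrDescLike {
  unsigned NumOperands;    // declared (non-variadic) operands
  unsigned NumDefs;        // leading declared operands that are definitions
  bool VariadicOpsAreDefs; // models MCInstrDesc::variadicOpsAreDefs()
};

enum class OperandRole { Def, Use };

// Declared operands follow NumDefs; operands beyond NumOperands are variadic
// and count as definitions only when the descriptor says so.
inline std::vector<OperandRole> classifyOperands(const InstrDescLike &D,
                                                 unsigned ActualNumOperands) {
  std::vector<OperandRole> Roles;
  for (unsigned I = 0; I < ActualNumOperands; ++I) {
    if (I < D.NumOperands)
      Roles.push_back(I < D.NumDefs ? OperandRole::Def : OperandRole::Use);
    else
      Roles.push_back(D.VariadicOpsAreDefs ? OperandRole::Def
                                           : OperandRole::Use);
  }
  return Roles;
}
```

For an instruction shaped like the DS_READ_U8_gfx10 example (four declared operands, one def, one trailing variadic register), the trailing operand would be classified as a use only.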

Made some changes suggested by Andrea.

Removed the X86CustomBehaviour class
Modified the MCAOperands to only be stored for specific instructions (and only specific operands from those instructions)
Added the InstrPostProcess class for modifying the mca::Instruction objects after they are created, but before they are added to the LoweredSequence
Modified the CustomBehaviour section of the docs
Changed some assert(RegID) lines into if (!RegID) lines that either exit the current function or continue the current loop (this will make it so that InstrPostProcess can remove instruction Defs by setting that Def's RegisterID to 0)
And some other small changes.
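The last item in the list above (removing a Def by zeroing its RegisterID) could look roughly like this. It is a sketch with hypothetical stand-in types (DefLike, InstructionLike, dropDefsOf, countTrackedWrites), not the code from the diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for mca::WriteState and mca::Instruction.
struct DefLike {
  unsigned RegisterID; // 0 means "invalid / ignore this write"
};
struct InstructionLike {
  std::vector<DefLike> Defs;
};

// Post-processing step: "remove" defs of Reg by zeroing their RegisterID.
// The vector itself is left untouched, so its size does not change.
inline void dropDefsOf(InstructionLike &Inst, unsigned Reg) {
  for (DefLike &D : Inst.Defs)
    if (D.RegisterID == Reg)
      D.RegisterID = 0;
}

// Register-file side: skip zero IDs instead of asserting on them.
inline unsigned countTrackedWrites(const InstructionLike &Inst) {
  unsigned Tracked = 0;
  for (const DefLike &D : Inst.Defs) {
    if (!D.RegisterID)
      continue; // previously: assert(RegID && "...")
    ++Tracked;
  }
  return Tracked;
}
```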

Harbormaster completed remote builds in B109214: Diff 352014. Jun 14 2021, 5:04 PM

clang-format fix.

Harbormaster completed remote builds in B109216: Diff 352021. Jun 14 2021, 6:07 PM

The patch looks good to me.
I have requested to commit the changes to the RegisterFile as a follow-up. I'd like to keep those changes separate from this initial patch (see my comment below).

Also, please wait for the feedback from @foad about the changes/improvements to the AMDGPU model.

P.s.: The issue with the extra definition of register M0 has been fixed by commit beb5213a2ee5: [MCA][InstrBuilder] Check for the presence of flag VariadicOpsAreDefs.

Thanks
-Andrea

llvm/lib/MCA/HardwareUnits/RegisterFile.cpp
325–328 (On Diff #352021): In one of my previous messages, I explicitly asked for this particular change to be sent as a follow-up patch. If possible, I'd like to keep this change separate from the rest of the work on the CustomBehaviour class. It is fine if there are no tests for it. Also, you don't need to send another patch for review just for these extra register checks. You can simply commit them straight to main as soon as the rest of this patch is in main too.

As far as I understand, your CustomBehaviour plus the RetireOOO change does improve the AMDGPU simulation.
@foad, what do you think about it?

Basically, if the AMDGPU changes are a good improvement, then I still think that it makes sense to keep them as part of the patch.

The AMDGPU improvements are nice (thanks!) but there is a fair bit of code to review in AMDGPUCustomBehaviour.cpp, so I would prefer that all the AMDGPU parts were split out into a separate patch for ease of review. In particular I don't like working around the lack of flags on Real DS instructions; I would much rather fix that properly.

But if you really want to commit this patch as-is and then review and fix the AMDGPU parts later then I won't block it.

foad mentioned this in D104293: [AMDGPU] Set more flags on Real instructions. Jun 15 2021, 5:21 AM

Made the AMDGPUCustomBehaviour class empty so that we can commit the implementation in a separate patch. Reverted the if (!RegID) changes back to their original asserts (so that they can be changed in a separate patch as well).

Forgot to revert the AMDGPU scheduling model changes in the previous diff. AMDGPU should be unchanged now and should have an empty AMDGPUCustomBehaviour implementation.

Rushed my previous updated diff and it had a mistake.

Okay, the AMDGPU code should be trivial now. The AMDGPUCB should be an empty implementation and the changes to the scheduling model should all have been reverted. I will submit a new patch later on once we get the flags sorted out.

I also reverted the RegisterFile changes (changed the ifs back to asserts). This should be ready to be committed if it still looks good to everyone.

I still don't have commit access, so could you do the honours again Andrea? I will request commit access after this patch and then hopefully be able to commit the small "assert -> if" changes within the RegisterFile as part of their own patch by myself.

Patrick Holland patrickeholland@gmail.com

Thanks a lot for all your help and feedback with this.

Made the AMDGPUCustomBehaviour class empty so that we can commit the implementation in a separate patch.

Thanks. Adding the empty AMDGPU class is fine, of course.

llvm/lib/MCA/CustomBehaviour.cpp
21	Could the first argument be `ArrayRef<InstRef> IssuedInst`?

Could the first argument be ArrayRef<InstRef> IssuedInst?

The place that this argument comes from (in the in-order pipeline) is

class InOrderIssueStage final : public Stage {
  ...
  /// Instructions that were issued, but not executed yet.
  SmallVector<InstRef, 4> IssuedInst;
  ...
  ...
};

I didn't set up that SmallVector and I don't know enough about how those data structures work to know if it would be smart to change it to an ArrayRef or have it get cast into an ArrayRef on its way into the checkCustomHazard function (if that's even a possible cast).

I would guess the answer to your question is no, but you probably have more familiarity with these data structures than I do so you probably know better.

I am happy to commit this patch for you once you address the minor comment from @foad about the SmallVector.

You can then send the other change to the RegisterFile for review (that would be straightforward), and the rest of the changes to AMDGPU.

Thanks for working on this!

P.s.: if you plan to do more work on llvm then it might be worth requesting commit access.

llvm/lib/MCA/CustomBehaviour.cpp
21	Yeah, it should be immutable.

This revision is now accepted and ready to land. Jun 15 2021, 11:45 AM

Harbormaster completed remote builds in B109336: Diff 352188. Jun 15 2021, 12:04 PM

Yeah, it should be immutable.

Does this mean that I should make that change? So I can pass a SmallVector into a function that expects an ArrayRef? (And this just makes it so that the SmallVector is now immutable?)

In D104149#2819949, @holland11 wrote:
Could the first argument be ArrayRef<InstRef> IssuedInst?

The place that this argument comes from (in the in-order pipeline) is
class InOrderIssueStage final : public Stage {
  ...
  /// Instructions that were issued, but not executed yet.
  SmallVector<InstRef, 4> IssuedInst;
  ...
  ...
};
I didn't set up that SmallVector and I don't know enough about how those data structures work to know if it would be smart to change it to an ArrayRef or have it get cast into an ArrayRef on its way into the checkCustomHazard function (if that's even a possible cast).

If method checkCustomHazard is not expected to add/remove and/or modify any elements in IssuedInst, then that SmallVector can simply be passed as an ArrayRef.
You don't need to worry about changing how method checkCustomHazard is called, because there is a very convenient constructor in ArrayRef which accepts a SmallVector (https://llvm.org/doxygen/ArrayRef_8h_source.html#l00090).
So, you simply change the signature of your method, and everything should work fine.

Note that by passing a SmallVector as an ArrayRef, you prevent functions from modifying its content. The internal array-like structure is effectively immutable (there are only getters).

There are variations to the concept of "array reference". For example, class MutableArrayRef lets you mutate elements (but not add/remove existing elements). So the array size would still be fixed.

For maximum flexibility, a SmallVector can always be passed around as a SmallVectorImpl. The main difference between SmallVector and SmallVectorImpl, is that SmallVectorImpl doesn't require that you specify the default (template) number of inline elements from the declaration.

I suggest having a quick look at the docs when you have time. Some of those classes are used very often in llvm. ArrayRef in particular is a very, very useful class.

I hope this makes sense.
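To make the conversion concrete without pulling in LLVM headers, here is a miniature of the idea behind ArrayRef: a non-owning, read-only (pointer, length) view with an implicit constructor from a container. ArrayRefLike and countBusy are hypothetical names, and std::vector stands in for SmallVector; the real llvm::ArrayRef has a much richer interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of a read-only array view; not the real class.
template <typename T> class ArrayRefLike {
  const T *Data = nullptr;
  std::size_t Length = 0;

public:
  // Implicit on purpose, so a container can be passed where a view is
  // expected without any change at the call site.
  ArrayRefLike(const std::vector<T> &V) : Data(V.data()), Length(V.size()) {}

  std::size_t size() const { return Length; }
  const T &operator[](std::size_t I) const {
    assert(I < Length && "index out of range");
    return Data[I];
  }
  // Only const access is offered, so callees cannot modify the elements.
  const T *begin() const { return Data; }
  const T *end() const { return Data + Length; }
};

// Stand-in for a hazard check that only reads the issued instructions.
inline unsigned countBusy(ArrayRefLike<int> IssuedCyclesLeft) {
  unsigned N = 0;
  for (int C : IssuedCyclesLeft)
    if (C > 0)
      ++N;
  return N;
}
```

A caller holding a std::vector<int> can write countBusy(Issued) directly; the view is built by the implicit constructor and is cheap to pass by value, but it must not outlive the container it refers to.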

That makes a lot of sense, thank you. Submitting new diff now.

Changed function signature for checkCustomHazard to take ArrayRef<InstRef> rather than const SmallVector<InstRef, 4> &.

This revision was landed with ongoing or failed builds. Jun 15 2021, 1:32 PM

Closed by commit rGf7a23ecece52: [MCA] Adding the CustomBehaviour class to llvm-mca (authored by holland11, committed by andreadb).

This revision was automatically updated to reflect the committed changes.

andreadb added a commit: rGf7a23ecece52: [MCA] Adding the CustomBehaviour class to llvm-mca.

I am seeing failures on buildbots that don't build the AMDGPU backend.
So I am going to revert it for now.

This is one of the errors I am seeing:

FAILED: bin/llvm-mca 
: && /usr/bin/g++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -fuse-ld=gold    -Wl,-O3 -Wl,--gc-sections tools/llvm-mca/CMakeFiles/llvm-mca.dir/llvm-mca.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/CodeRegion.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/CodeRegionGenerator.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/PipelinePrinter.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/BottleneckAnalysis.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/DispatchStatistics.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/InstructionInfoView.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/InstructionView.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/RegisterFileStatistics.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/ResourcePressureView.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/RetireControlUnitStatistics.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/SchedulerStatistics.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/SummaryView.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/TimelineView.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/View.cpp.o -o bin/llvm-mca  -Wl,-rpath,"\$ORIGIN/../lib"  lib/libLLVMX86AsmParser.a  lib/libLLVMX86Desc.a  lib/libLLVMX86Disassembler.a  lib/libLLVMX86Info.a  lib/libLLVMMCA.a  lib/libLLVMMC.a  lib/libLLVMMCParser.a  lib/libLLVMSupport.a  -lpthread  lib/libLLVMMCDisassembler.a  lib/libLLVMMC.a  lib/libLLVMBinaryFormat.a  lib/libLLVMDebugInfoCodeView.a  lib/libLLVMSupport.a  -lrt  -ldl  -lpthread  -lm  /usr/lib/x86_64-linux-gnu/libz.so  lib/libLLVMDemangle.a && :
tools/llvm-mca/CMakeFiles/llvm-mca.dir/llvm-mca.cpp.o:llvm-mca.cpp:function createCustomBehaviour(llvm::Triple const&, llvm::MCSubtargetInfo const&, llvm::mca::SourceMgr const&, llvm::MCInstrInfo const&) [clone .localalias]: error: undefined reference to 'llvm::mca::AMDGPUCustomBehaviour::AMDGPUCustomBehaviour(llvm::MCSubtargetInfo const&, llvm::mca::SourceMgr const&, llvm::MCInstrInfo const&)'
collect2: error: ld returned 1 exit status

FAILED: bin/llvm-mca 
: && /usr/local/bin/c++  -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG  -Wl,-rpath-link,/home/tcwg-buildslave/worker/clang-cmake-armv7-quick/stage1/./lib  -Wl,-O3 -Wl,--gc-sections tools/llvm-mca/CMakeFiles/llvm-mca.dir/llvm-mca.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/CodeRegion.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/CodeRegionGenerator.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/PipelinePrinter.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/BottleneckAnalysis.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/DispatchStatistics.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/InstructionInfoView.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/InstructionView.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/RegisterFileStatistics.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/ResourcePressureView.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/RetireControlUnitStatistics.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/SchedulerStatistics.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/SummaryView.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/TimelineView.cpp.o tools/llvm-mca/CMakeFiles/llvm-mca.dir/Views/View.cpp.o  -o bin/llvm-mca  -Wl,-rpath,"\$ORIGIN/../lib"  lib/libLLVMARMAsmParser.a  lib/libLLVMAArch64AsmParser.a  lib/libLLVMARMDesc.a  lib/libLLVMAArch64Desc.a  lib/libLLVMARMDisassembler.a  lib/libLLVMAArch64Disassembler.a  lib/libLLVMARMInfo.a  lib/libLLVMAArch64Info.a  lib/libLLVMMCA.a  lib/libLLVMMC.a  lib/libLLVMMCParser.a  lib/libLLVMSupport.a  -lpthread  lib/libLLVMARMDesc.a  
lib/libLLVMARMInfo.a  lib/libLLVMARMUtils.a  lib/libLLVMAArch64Desc.a  lib/libLLVMAArch64Info.a  lib/libLLVMAArch64Utils.a  lib/libLLVMMCDisassembler.a  lib/libLLVMMC.a  lib/libLLVMBinaryFormat.a  lib/libLLVMDebugInfoCodeView.a  lib/libLLVMSupport.a  -lrt  -ldl  -lpthread  -lm  /usr/lib/arm-linux-gnueabihf/libz.so  /usr/lib/arm-linux-gnueabihf/libtinfo.so  lib/libLLVMDemangle.a && :
tools/llvm-mca/CMakeFiles/llvm-mca.dir/llvm-mca.cpp.o: In function `main':
llvm-mca.cpp:(.text.main+0x2a40): undefined reference to `llvm::mca::AMDGPUCustomBehaviour::AMDGPUCustomBehaviour(llvm::MCSubtargetInfo const&, llvm::mca::SourceMgr const&, llvm::MCInstrInfo const&)'
clang-12: error: linker command failed with exit code 1 (use -v to see invocation)

andreadb added a reverting change: rGa04f01bab2da: Revert "[MCA] Adding the CustomBehaviour class to llvm-mca". Jun 15 2021, 1:42 PM

The problem is that llvm-mca.cpp includes "lib/AMDGPU/AMDGPUCustomBehaviour.h".
However, according to the new CMakeLists.txt, that directory is not built if this check fails:

if (LLVM_TARGETS_TO_BUILD MATCHES "AMDGPU")

So that explains why we are only seeing failures on buildbots that don't build the AMDGPU target.

I am afraid that you need a different mechanism for getting target specific info...

holland11 added a comment. Jun 15 2021, 1:54 PM

This comment was removed by holland11.

Good call. I'll work something out and then update the diff once I've tested it properly.

Quentin came up with a solution to the build problem and I implemented it. We define macros within /llvm-mca/lib/CMakeLists.txt based on which targets are being built. Then we use those macros within llvm-mca.cpp to guard the includes and references to target specific CBs.

Tested by building for all targets and by building for only x86.

Should be good for another commit attempt if you're alright with that change.
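
The macro-guard approach described above can be sketched roughly as follows. This is a simplified, self-contained illustration and not the code from the patch: `HAS_AMDGPU` is a hypothetical macro name, the two structs are stand-ins for the real classes, and `createCustomBehaviour` is an illustrative helper. The assumption is that the build system defines the macro only when the AMDGPU target is being built.

```cpp
#include <memory>
#include <string>

// Stand-in for llvm::mca::CustomBehaviour (the generic class does nothing).
struct CustomBehaviour {
  virtual ~CustomBehaviour() = default;
  virtual std::string name() const { return "generic"; }
};

// HAS_AMDGPU is a hypothetical macro name. The assumption is that CMake
// defines it only when AMDGPU is part of LLVM_TARGETS_TO_BUILD.
#ifdef HAS_AMDGPU
// In the real tool this would be an #include of the target-specific header.
struct AMDGPUCustomBehaviour : CustomBehaviour {
  std::string name() const override { return "amdgpu"; }
};
#endif

// Pick a CustomBehaviour for the requested architecture. Falls back to the
// generic class when custom behaviour is disabled or when no target-specific
// class was compiled in.
std::unique_ptr<CustomBehaviour> createCustomBehaviour(const std::string &Arch,
                                                       bool DisableCB) {
  if (!DisableCB) {
#ifdef HAS_AMDGPU
    if (Arch == "amdgcn")
      return std::make_unique<AMDGPUCustomBehaviour>();
#endif
  }
  (void)Arch; // Unused when no target-specific class is available.
  return std::make_unique<CustomBehaviour>();
}
```

Because every reference to the target class sits inside the guard, a build without the AMDGPU target never sees the undefined symbol that caused the link failure above.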

Harbormaster completed remote builds in B109400: Diff 352274. Jun 15 2021, 8:33 PM

Restructured the way the target macros were being set in the cmake files so that they are properly scoped.

foad mentioned this in rG6f778fed8e50: [AMDGPU] Set more flags on Real instructions. Jun 16 2021, 2:00 AM

Thanks!
It must have been pretty annoying to have to address that issue. I am glad that you were able to find a workaround for it.

Going back to the AMDGPU changes (sorry if I reopen the discussion).
I noticed that @foad has committed a sequence of patches aiming at improving llvm-mca simulation (thanks @foad !).
Do you know if in future we will still need a CustomBehaviour class for AMDGPU?

In general, I think that it is valuable to have your entire initial patch in tree.
People that want to add CustomBehaviour classes for their own targets would encounter the same problems that you had to solve with those macro definitions. Having an example that shows how to address that issue is very useful in general.

Going back to the AMDGPU changes (sorry if I reopen the discussion).
I noticed that @foad has committed a sequence of patches aiming at improving llvm-mca simulation (thanks @foad !).
Do you know if in future we will still need a CustomBehaviour class for AMDGPU?

I think we will still need the class. My patches only remove the need for AMDGPUCustomBehaviour::manuallySetWaitCntInfo with its long list of DS instruction opcodes.

In D104149#2821977, @foad wrote:

Going back to the AMDGPU changes (sorry if I reopen the discussion).
I noticed that @foad has committed a sequence of patches aiming at improving llvm-mca simulation (thanks @foad !).
Do you know if in future we will still need a CustomBehaviour class for AMDGPU?

I think we will still need the class. My patches only remove the need for AMDGPUCustomBehaviour::manuallySetWaitCntInfo with its long list of DS instruction opcodes.

Thanks for clarifying it.

In which case, the patch still LGTM.
Hopefully this time I won't need to revert it :-p

Cheers
-Andrea

This revision was landed with ongoing or failed builds. Jun 16 2021, 8:57 AM

andreadb added a commit: rGef16c8eaa5cd: Reapply "[MCA] Adding the CustomBehaviour class to llvm-mca"..

Harbormaster completed remote builds in B109475: Diff 352374. Jun 16 2021, 10:27 AM

holland11 mentioned this in D104730: [MCA] [AMDGPU] Adding CustomBehaviour implementation for AMDGPU.. Jun 22 2021, 11:20 AM

holland11 mentioned this in D106775: [MCA] Moving the target specific CustomBehaviour impl. from /tools/llvm-mca/ to /lib/Target/.. Jul 25 2021, 2:43 PM

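Revision Contents

Path                                                        Size
llvm/docs/CommandGuide/llvm-mca.rst                         34 lines
llvm/include/llvm/MCA/Context.h                             7 lines
llvm/include/llvm/MCA/CustomBehaviour.h                     86 lines
llvm/include/llvm/MCA/HWEventListener.h                     1 line
llvm/include/llvm/MCA/Instruction.h                         132 lines
llvm/include/llvm/MCA/Stages/InOrderIssueStage.h            13 lines
llvm/lib/MCA/CMakeLists.txt                                 1 line
llvm/lib/MCA/Context.cpp                                    10 lines
llvm/lib/MCA/CustomBehaviour.cpp                            26 lines
llvm/lib/MCA/InstrBuilder.cpp                               3 lines
llvm/lib/MCA/Stages/InOrderIssueStage.cpp                   14 lines
llvm/tools/llvm-mca/CMakeLists.txt                          8 lines
llvm/tools/llvm-mca/Views/DispatchStatistics.cpp            2 lines
llvm/tools/llvm-mca/lib/AMDGPU/AMDGPUCustomBehaviour.h      57 lines
llvm/tools/llvm-mca/lib/AMDGPU/AMDGPUCustomBehaviour.cpp    33 lines
llvm/tools/llvm-mca/lib/AMDGPU/CMakeLists.txt               17 lines
llvm/tools/llvm-mca/lib/CMakeLists.txt                      11 lines
llvm/tools/llvm-mca/llvm-mca.cpp                            58 lines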

Diff 352374

llvm/docs/CommandGuide/llvm-mca.rst

Show First 20 Lines • Show All 206 Lines • ▼ Show 20 Lines	.. option:: -bottleneck-analysis
in the summary view. Bottleneck analysis is currently not supported for		in the summary view. Bottleneck analysis is currently not supported for
processors with an in-order backend.		processors with an in-order backend.

.. option:: -json		.. option:: -json

Print the requested views in JSON format. The instructions and the processor		Print the requested views in JSON format. The instructions and the processor
resources are printed as members of special top level JSON objects. The		resources are printed as members of special top level JSON objects. The
individual views refer to them by index.		individual views refer to them by index.

		.. option:: -disable-cb

		Force usage of the generic CustomBehaviour class rather than using the target
		specific class. The generic class never detects any custom hazards.


EXIT STATUS		EXIT STATUS
-----------		-----------

:program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed		:program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed
to standard error, and the tool returns 1.		to standard error, and the tool returns 1.

USING MARKERS TO ANALYZE SPECIFIC CODE BLOCKS		USING MARKERS TO ANALYZE SPECIFIC CODE BLOCKS
▲ Show 20 Lines • Show All 749 Lines • ▼ Show 20 Lines
soon as their operand registers are available and resource requirements are		soon as their operand registers are available and resource requirements are
met. Multiple instructions can be issued in one cycle according to the value of		met. Multiple instructions can be issued in one cycle according to the value of
the ``IssueWidth`` parameter in LLVM's scheduling model.		the ``IssueWidth`` parameter in LLVM's scheduling model.

Once issued, an instruction is moved to ``IssuedInst`` set until it is ready to		Once issued, an instruction is moved to ``IssuedInst`` set until it is ready to
retire. :program:`llvm-mca` ensures that writes are committed in-order. However,		retire. :program:`llvm-mca` ensures that writes are committed in-order. However,
an instruction is allowed to commit writes and retire out-of-order if		an instruction is allowed to commit writes and retire out-of-order if
``RetireOOO`` property is true for at least one of its writes.		``RetireOOO`` property is true for at least one of its writes.

		Custom Behaviour
		""""""""""""""""""""""""""""""""""""
		Due to certain instructions not being expressed perfectly within their
		scheduling model, :program:`llvm-mca` isn't always able to simulate them
		perfectly. Modifying the scheduling model isn't always a viable
		option though (maybe because the instruction is modeled incorrectly on
		purpose or the instruction's behaviour is quite complex). The
		CustomBehaviour class can be used in these cases to enforce proper
		andreadb (inline comment): I think this paragraph should be more explicit about the goal of the CustomBehaviour class. The first two sentences should probably be merged together. The focus of those two sentences is the "lack of expressiveness" (of scheduling models and machine instructions). That lack of expressiveness is what eventually leads to poor simulations. Starting the entire paragraph with that first sentence is not great; in the absence of context, it sounds more like there are bugs in llvm-mca itself.
		I also noticed that you don't explain the main goal of this new CustomBehaviour class. You have done a much better job of introducing it in the summary of this bug. At this point, you should at least drop a hint about its most common goal: "it allows targets to customise data dependencies, and model delays which cannot be normally predicted just by simply evaluating register defs/uses". Something like that...
		Maybe you can be more accurate, but keep in mind that the description should not be too low level in this document.
		instruction modeling (often by customizing data dependencies and detecting
		hazards that :program:`llvm-mca` has no way of knowing about).

		:program:`llvm-mca` comes with one generic and multiple target specific
		CustomBehaviour classes. The generic class will be used if the ``-disable-cb``
		flag is used or if a target specific CustomBehaviour class doesn't exist for
		that target. (The generic class does nothing.) Currently, the CustomBehaviour
		class is only a part of the in-order pipeline, but there are plans to add it
		to the out-of-order pipeline in the future.

		CustomBehaviour's main method is ``checkCustomHazard()`` which uses the
		current instruction and a list of all instructions still executing within
		the pipeline to determine if the current instruction should be dispatched.
		As output, the method returns an integer representing the number of cycles
		that the current instruction must stall for (this can be an underestimate
		if you don't know the exact number and a value of 0 represents no stall).

		If you'd like to add a CustomBehaviour class for a target that doesn't
		already have one, refer to an existing implementation to see how to set it
		up. Remember to look at (and add to) ``/llvm-mca/lib/CMakeLists.txt``.
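
As a rough illustration of the `checkCustomHazard()` contract described in the documentation text above, the following self-contained sketch uses simplified stand-in types rather than the real llvm-mca headers. The opcodes, the `ToyCustomBehaviour` class, and the `CyclesLeft` field on the stand-in `InstRef` are all hypothetical; the made-up rule is that a "wait" instruction must stall until every in-flight load has finished.

```cpp
#include <vector>

// Simplified stand-in for mca::InstRef (not the real type).
struct InstRef {
  unsigned Opcode;
  unsigned CyclesLeft; // Cycles until this in-flight instruction finishes.
};

// Stand-in for the generic CustomBehaviour: it never reports a hazard.
class CustomBehaviour {
public:
  virtual ~CustomBehaviour() = default;
  virtual unsigned checkCustomHazard(const std::vector<InstRef> &IssuedInst,
                                     const InstRef &IR) {
    (void)IssuedInst;
    (void)IR;
    return 0;
  }
};

// Hypothetical opcodes for a made-up target.
constexpr unsigned OP_ADD = 0;
constexpr unsigned OP_LOAD = 1;
constexpr unsigned OP_WAIT = 2;

// Target-specific behaviour: OP_WAIT stalls until all issued loads complete.
class ToyCustomBehaviour : public CustomBehaviour {
public:
  unsigned checkCustomHazard(const std::vector<InstRef> &IssuedInst,
                             const InstRef &IR) override {
    if (IR.Opcode != OP_WAIT)
      return 0; // No custom hazard for any other instruction.
    unsigned Stall = 0;
    for (const InstRef &I : IssuedInst)
      if (I.Opcode == OP_LOAD && I.CyclesLeft > Stall)
        Stall = I.CyclesLeft; // Wait for the slowest outstanding load.
    return Stall;
  }
};
```

With loads that have 3 and 1 cycles left in flight, the toy class reports a stall of 3 for a wait instruction and 0 for anything else.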

llvm/include/llvm/MCA/Context.h

Show All 13 Lines
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_MCA_CONTEXT_H		#ifndef LLVM_MCA_CONTEXT_H
#define LLVM_MCA_CONTEXT_H		#define LLVM_MCA_CONTEXT_H

#include "llvm/MC/MCRegisterInfo.h"		#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"		#include "llvm/MC/MCSubtargetInfo.h"
		#include "llvm/MCA/CustomBehaviour.h"
#include "llvm/MCA/HardwareUnits/HardwareUnit.h"		#include "llvm/MCA/HardwareUnits/HardwareUnit.h"
#include "llvm/MCA/Pipeline.h"		#include "llvm/MCA/Pipeline.h"
#include "llvm/MCA/SourceMgr.h"		#include "llvm/MCA/SourceMgr.h"
#include <memory>		#include <memory>

namespace llvm {		namespace llvm {
namespace mca {		namespace mca {

Show All 32 Lines	public:

void addHardwareUnit(std::unique_ptr<HardwareUnit> H) {		void addHardwareUnit(std::unique_ptr<HardwareUnit> H) {
Hardware.push_back(std::move(H));		Hardware.push_back(std::move(H));
}		}

/// Construct a basic pipeline for simulating an out-of-order pipeline.		/// Construct a basic pipeline for simulating an out-of-order pipeline.
/// This pipeline consists of Fetch, Dispatch, Execute, and Retire stages.		/// This pipeline consists of Fetch, Dispatch, Execute, and Retire stages.
std::unique_ptr<Pipeline> createDefaultPipeline(const PipelineOptions &Opts,		std::unique_ptr<Pipeline> createDefaultPipeline(const PipelineOptions &Opts,
SourceMgr &SrcMgr);		SourceMgr &SrcMgr,
		CustomBehaviour &CB);

/// Construct a basic pipeline for simulating an in-order pipeline.		/// Construct a basic pipeline for simulating an in-order pipeline.
/// This pipeline consists of Fetch, InOrderIssue, and Retire stages.		/// This pipeline consists of Fetch, InOrderIssue, and Retire stages.
std::unique_ptr<Pipeline> createInOrderPipeline(const PipelineOptions &Opts,		std::unique_ptr<Pipeline> createInOrderPipeline(const PipelineOptions &Opts,
SourceMgr &SrcMgr);		SourceMgr &SrcMgr,
		CustomBehaviour &CB);
};		};

} // namespace mca		} // namespace mca
} // namespace llvm		} // namespace llvm
#endif // LLVM_MCA_CONTEXT_H		#endif // LLVM_MCA_CONTEXT_H

llvm/include/llvm/MCA/CustomBehaviour.h

This file was added.

				//===---------------------- CustomBehaviour.h -------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				///
				/// This file defines the base class CustomBehaviour which can be inherited from
				/// by specific targets (ex. llvm/tools/llvm-mca/lib/X86CustomBehaviour.h).
				/// CustomBehaviour is designed to enforce custom behaviour and dependencies
				/// within the llvm-mca pipeline simulation that llvm-mca isn't already capable
				/// of extracting from the Scheduling Models.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_MCA_CUSTOMBEHAVIOUR_H
				#define LLVM_MCA_CUSTOMBEHAVIOUR_H

				#include "llvm/MC/MCInst.h"
				#include "llvm/MC/MCInstrInfo.h"
				#include "llvm/MC/MCSubtargetInfo.h"
				#include "llvm/MCA/SourceMgr.h"

				namespace llvm {
				namespace mca {

				/// Class which can be overridden by targets to modify the
				/// mca::Instruction objects before the pipeline starts.
				/// A common usage of this class is to add immediate operands to certain
				/// instructions or to remove Defs/Uses from an instruction where the
				/// scheduling model is incorrect.
				class InstrPostProcess {
				protected:
				const MCSubtargetInfo &STI;
				const MCInstrInfo &MCII;

				public:
				InstrPostProcess(const MCSubtargetInfo &STI, const MCInstrInfo &MCII)
				: STI(STI), MCII(MCII) {}

				virtual ~InstrPostProcess() {}

				virtual void postProcessInstruction(std::unique_ptr<Instruction> &Inst,
				const MCInst &MCI) {}
				};

				/// Class which can be overridden by targets to enforce instruction
				/// dependencies and behaviours that aren't expressed well enough
				/// within the scheduling model for mca to automatically simulate
				/// them properly.
				/// If you implement this class for your target, make sure to also implement
				/// a target specific InstrPostProcess class as well.
				class CustomBehaviour {
				protected:
				const MCSubtargetInfo &STI;
				const SourceMgr &SrcMgr;
				const MCInstrInfo &MCII;

				public:
				CustomBehaviour(const MCSubtargetInfo &STI, const SourceMgr &SrcMgr,
				const MCInstrInfo &MCII)
				: STI(STI), SrcMgr(SrcMgr), MCII(MCII) {}

				virtual ~CustomBehaviour() {}

				// Before the llvm-mca pipeline dispatches an instruction, it first checks
				// for any register or resource dependencies / hazards. If it doesn't find
				// any, this method will be invoked to determine if there are any custom
				// hazards that the instruction needs to wait for.
				// The return value of this method is the number of cycles that the
				// instruction needs to wait for.
				// It's safe to underestimate the number of cycles to wait for since these
				// checks will be invoked again before the instruction gets dispatched.
				// However, it's not safe (accurate) to overestimate the number of cycles
				// to wait for since the instruction will wait for AT LEAST that number of
				// cycles before attempting to be dispatched again.
				virtual unsigned checkCustomHazard(ArrayRef<InstRef> IssuedInst,
				const InstRef &IR);
				};

				} // namespace mca
				} // namespace llvm

				#endif /* LLVM_MCA_CUSTOMBEHAVIOUR_H */
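
The comment on `checkCustomHazard()` above says that underestimating the stall is safe while overestimating is not. A toy model makes the reasoning concrete. It assumes the issue stage waits for the reported number of cycles and then queries the hook again; `dispatchCycle` and the three estimate functions are hypothetical names, and the hazard is assumed to clear at cycle 10.

```cpp
#include <functional>

// Returns the cycle at which the instruction is finally dispatched.
// Estimate(Cycle) plays the role of checkCustomHazard() queried at that cycle;
// a return value of 0 means "no hazard, dispatch now".
unsigned dispatchCycle(const std::function<unsigned(unsigned)> &Estimate) {
  unsigned Cycle = 0;
  while (unsigned Stall = Estimate(Cycle))
    Cycle += Stall; // Wait at least Stall cycles, then ask again.
  return Cycle;
}

// In this toy setup the real hazard clears at cycle 10.
unsigned exactEstimate(unsigned Cycle) { return Cycle < 10 ? 10 - Cycle : 0; }

// Always reports a 1-cycle stall while the hazard is active (underestimate).
unsigned underEstimate(unsigned Cycle) { return Cycle < 10 ? 1 : 0; }

// Reports a 15-cycle stall while the hazard is active (overestimate).
unsigned overEstimate(unsigned Cycle) { return Cycle < 10 ? 15 : 0; }
```

Both the exact and the underestimating hooks dispatch at cycle 10, since the underestimate is simply re-checked each cycle. The overestimating hook dispatches at cycle 15, five cycles later than the hazard requires, which is the inaccuracy the comment warns about.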

llvm/include/llvm/MCA/HWEventListener.h

Show First 20 Lines • Show All 109 Lines • ▼ Show 20 Lines	enum GenericEventType {
// Generic stall events generated by the DispatchStage.		// Generic stall events generated by the DispatchStage.
RegisterFileStall,		RegisterFileStall,
RetireControlUnitStall,		RetireControlUnitStall,
// Generic stall events generated by the Scheduler.		// Generic stall events generated by the Scheduler.
DispatchGroupStall,		DispatchGroupStall,
SchedulerQueueFull,		SchedulerQueueFull,
LoadQueueFull,		LoadQueueFull,
StoreQueueFull,		StoreQueueFull,
		CustomBehaviourStall,
LastGenericEvent		LastGenericEvent
};		};

HWStallEvent(unsigned type, const InstRef &Inst) : Type(type), IR(Inst) {}		HWStallEvent(unsigned type, const InstRef &Inst) : Type(type), IR(Inst) {}

// The exact meaning of the stall event type depends on the subtarget.		// The exact meaning of the stall event type depends on the subtarget.
const unsigned Type;		const unsigned Type;

▲ Show 20 Lines • Show All 61 Lines • Show Last 20 Lines

llvm/include/llvm/MCA/Instruction.h

Show All 27 Lines
#include <memory>		#include <memory>

namespace llvm {		namespace llvm {

namespace mca {		namespace mca {

constexpr int UNKNOWN_CYCLES = -512;		constexpr int UNKNOWN_CYCLES = -512;

		/// A representation of an mca::Instruction operand
		/// for use in mca::CustomBehaviour.
		class MCAOperand {
		// This class is mostly copied from MCOperand within
		// MCInst.h except that we don't keep track of
		// expressions or sub-instructions.
		enum MCAOperandType : unsigned char {
		kInvalid, ///< Uninitialized, Relocatable immediate, or Sub-instruction.
		kRegister, ///< Register operand.
		kImmediate, ///< Immediate operand.
		kSFPImmediate, ///< Single-floating-point immediate operand.
		kDFPImmediate, ///< Double-Floating-point immediate operand.
		};
		MCAOperandType Kind = kInvalid;

		union {
		unsigned RegVal;
		int64_t ImmVal;
		uint32_t SFPImmVal;
		uint64_t FPImmVal;
		};

		// We only store specific operands for specific instructions
		// so an instruction's operand 3 may be stored within the list
		// of MCAOperand as element 0. This Index attribute keeps track
		// of the original index (3 for this example).
		unsigned Index;

		public:
		MCAOperand() : FPImmVal(0) {}

		bool isValid() const { return Kind != kInvalid; }
		bool isReg() const { return Kind == kRegister; }
		bool isImm() const { return Kind == kImmediate; }
		bool isSFPImm() const { return Kind == kSFPImmediate; }
		bool isDFPImm() const { return Kind == kDFPImmediate; }

		/// Returns the register number.
		unsigned getReg() const {
		assert(isReg() && "This is not a register operand!");
		return RegVal;
		}

		int64_t getImm() const {
		assert(isImm() && "This is not an immediate");
		return ImmVal;
		}

		uint32_t getSFPImm() const {
		assert(isSFPImm() && "This is not an SFP immediate");
		return SFPImmVal;
		}

		uint64_t getDFPImm() const {
		assert(isDFPImm() && "This is not an FP immediate");
		return FPImmVal;
		}

		void setIndex(const unsigned Idx) { Index = Idx; }

		unsigned getIndex() const { return Index; }

		static MCAOperand createReg(unsigned Reg) {
		MCAOperand Op;
		Op.Kind = kRegister;
		Op.RegVal = Reg;
		return Op;
		}

		static MCAOperand createImm(int64_t Val) {
		MCAOperand Op;
		Op.Kind = kImmediate;
		Op.ImmVal = Val;
		return Op;
		}

		static MCAOperand createSFPImm(uint32_t Val) {
		MCAOperand Op;
		Op.Kind = kSFPImmediate;
		Op.SFPImmVal = Val;
		return Op;
		}

		static MCAOperand createDFPImm(uint64_t Val) {
		MCAOperand Op;
		Op.Kind = kDFPImmediate;
		Op.FPImmVal = Val;
		return Op;
		}

		static MCAOperand createInvalid() {
		MCAOperand Op;
		Op.Kind = kInvalid;
		Op.FPImmVal = 0;
		return Op;
		}
		};

/// A register write descriptor.		/// A register write descriptor.
struct WriteDescriptor {		struct WriteDescriptor {
// Operand index. The index is negative for implicit writes only.		// Operand index. The index is negative for implicit writes only.
// For implicit writes, the actual operand index is computed performing		// For implicit writes, the actual operand index is computed performing
// a bitwise not of the OpIndex.		// a bitwise not of the OpIndex.
int OpIndex;		int OpIndex;
// Write latency. Number of cycles before write-back stage.		// Write latency. Number of cycles before write-back stage.
unsigned Latency;		unsigned Latency;
▲ Show 20 Lines • Show All 111 Lines • ▼ Show 20 Lines	WriteState(const WriteDescriptor &Desc, MCPhysReg RegID,
DependentWriteCyclesLeft(0), CRD() {}		DependentWriteCyclesLeft(0), CRD() {}

WriteState(const WriteState &Other) = default;		WriteState(const WriteState &Other) = default;
WriteState &operator=(const WriteState &Other) = default;		WriteState &operator=(const WriteState &Other) = default;

int getCyclesLeft() const { return CyclesLeft; }		int getCyclesLeft() const { return CyclesLeft; }
unsigned getWriteResourceID() const { return WD->SClassOrWriteResourceID; }		unsigned getWriteResourceID() const { return WD->SClassOrWriteResourceID; }
MCPhysReg getRegisterID() const { return RegisterID; }		MCPhysReg getRegisterID() const { return RegisterID; }
		void setRegisterID(const MCPhysReg RegID) { RegisterID = RegID; }
unsigned getRegisterFileID() const { return PRFID; }		unsigned getRegisterFileID() const { return PRFID; }
unsigned getLatency() const { return WD->Latency; }		unsigned getLatency() const { return WD->Latency; }
unsigned getDependentWriteCyclesLeft() const {		unsigned getDependentWriteCyclesLeft() const {
return DependentWriteCyclesLeft;		return DependentWriteCyclesLeft;
}		}
const WriteState *getDependentWrite() const { return DependentWrite; }		const WriteState *getDependentWrite() const { return DependentWrite; }
const CriticalDependency &getCriticalRegDep() const { return CRD; }		const CriticalDependency &getCriticalRegDep() const { return CRD; }

▲ Show 20 Lines • Show All 233 Lines • ▼ Show 20 Lines	class InstructionBase {
// Output dependencies.		// Output dependencies.
// One entry per each implicit and explicit register definition.		// One entry per each implicit and explicit register definition.
SmallVector<WriteState, 2> Defs;		SmallVector<WriteState, 2> Defs;

// Input dependencies.		// Input dependencies.
// One entry per each implicit and explicit register use.		// One entry per each implicit and explicit register use.
SmallVector<ReadState, 4> Uses;		SmallVector<ReadState, 4> Uses;

		// List of operands which can be used by mca::CustomBehaviour
		std::vector<MCAOperand> Operands;
		andreadb (inline comment): Can this vector be fully optional? I am thinking about using a simple std::vector here, rather than a SmallVector, which would force us to consume space for a default number of inline elements (8 in this case). For the majority of targets that don't require or implement a custom behaviour class, that field is literally wasted space. In general, I am already concerned about the `sizeof` of mca::Instruction. Those objects tend to be too big, and this change has the potential of making them even bigger.

		// Instruction opcode which can be used by mca::CustomBehaviour
		unsigned Opcode;

public:		public:
InstructionBase(const InstrDesc &D) : Desc(D), IsOptimizableMove(false) {}		InstructionBase(const InstrDesc &D, const unsigned Opcode)
		: Desc(D), IsOptimizableMove(false), Operands(0), Opcode(Opcode) {}

SmallVectorImpl<WriteState> &getDefs() { return Defs; }		SmallVectorImpl<WriteState> &getDefs() { return Defs; }
ArrayRef<WriteState> getDefs() const { return Defs; }		ArrayRef<WriteState> getDefs() const { return Defs; }
SmallVectorImpl<ReadState> &getUses() { return Uses; }		SmallVectorImpl<ReadState> &getUses() { return Uses; }
ArrayRef<ReadState> getUses() const { return Uses; }		ArrayRef<ReadState> getUses() const { return Uses; }
const InstrDesc &getDesc() const { return Desc; }		const InstrDesc &getDesc() const { return Desc; }

unsigned getLatency() const { return Desc.MaxLatency; }		unsigned getLatency() const { return Desc.MaxLatency; }
unsigned getNumMicroOps() const { return Desc.NumMicroOps; }		unsigned getNumMicroOps() const { return Desc.NumMicroOps; }
		unsigned getOpcode() const { return Opcode; }

		/// Return the MCAOperand which corresponds to index Idx within the original
		/// MCInst.
		const MCAOperand *getOperand(const unsigned Idx) const {
		auto It = std::find_if(
		Operands.begin(), Operands.end(),
		[&Idx](const MCAOperand &Op) { return Op.getIndex() == Idx; });
		if (It == Operands.end())
		return nullptr;
		return &(*It);
		}
		unsigned getNumOperands() const { return Operands.size(); }
		void addOperand(const MCAOperand Op) { Operands.push_back(Op); }

bool hasDependentUsers() const {		bool hasDependentUsers() const {
return any_of(Defs,		return any_of(Defs,
[](const WriteState &Def) { return Def.getNumUsers() > 0; });		[](const WriteState &Def) { return Def.getNumUsers() > 0; });
}		}

unsigned getNumUsers() const {		unsigned getNumUsers() const {
unsigned NumUsers = 0;		unsigned NumUsers = 0;
▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines	class Instruction : public InstructionBase {
// This field is set to zero only if execution is not delayed during this		// This field is set to zero only if execution is not delayed during this
// cycle because of unavailable pipeline resources.		// cycle because of unavailable pipeline resources.
uint64_t CriticalResourceMask;		uint64_t CriticalResourceMask;

// True if this instruction has been optimized at register renaming stage.		// True if this instruction has been optimized at register renaming stage.
bool IsEliminated;		bool IsEliminated;

public:		public:
Instruction(const InstrDesc &D)		Instruction(const InstrDesc &D, const unsigned Opcode)
: InstructionBase(D), Stage(IS_INVALID), CyclesLeft(UNKNOWN_CYCLES),		: InstructionBase(D, Opcode), Stage(IS_INVALID),
RCUTokenID(0), LSUTokenID(0), UsedBuffers(D.UsedBuffers),		CyclesLeft(UNKNOWN_CYCLES), RCUTokenID(0), LSUTokenID(0),
CriticalRegDep(), CriticalMemDep(), CriticalResourceMask(0),		UsedBuffers(D.UsedBuffers), CriticalRegDep(), CriticalMemDep(),
IsEliminated(false) {}		CriticalResourceMask(0), IsEliminated(false) {}

unsigned getRCUTokenID() const { return RCUTokenID; }		unsigned getRCUTokenID() const { return RCUTokenID; }
unsigned getLSUTokenID() const { return LSUTokenID; }		unsigned getLSUTokenID() const { return LSUTokenID; }
void setLSUTokenID(unsigned LSUTok) { LSUTokenID = LSUTok; }		void setLSUTokenID(unsigned LSUTok) { LSUTokenID = LSUTok; }

uint64_t getUsedBuffers() const { return UsedBuffers; }		uint64_t getUsedBuffers() const { return UsedBuffers; }
void setUsedBuffers(uint64_t Mask) { UsedBuffers = Mask; }		void setUsedBuffers(uint64_t Mask) { UsedBuffers = Mask; }
void clearUsedBuffers() { UsedBuffers = 0ULL; }		void clearUsedBuffers() { UsedBuffers = 0ULL; }
▲ Show 20 Lines • Show All 96 Lines • Show Last 20 Lines
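
The `Index` field and `getOperand()` lookup in the Instruction.h changes above can be illustrated with a trimmed-down sketch. The `Operand` and `Inst` classes below are stand-ins (immediate-only, not the real `MCAOperand` or `InstructionBase`), and `postProcess` is a hypothetical helper playing the role of a target's `postProcessInstruction()`. It stores only the original operand 3, so that operand ends up as element 0 of the list but is still found by its original index.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Stand-in for MCAOperand: immediate-only, plus the original operand index.
class Operand {
  int64_t ImmVal = 0;
  unsigned Index = 0;

public:
  static Operand createImm(int64_t Val) {
    Operand Op;
    Op.ImmVal = Val;
    return Op;
  }
  int64_t getImm() const { return ImmVal; }
  void setIndex(unsigned Idx) { Index = Idx; }
  unsigned getIndex() const { return Index; }
};

// Stand-in for the operand-related part of InstructionBase.
class Inst {
  std::vector<Operand> Operands;

public:
  void addOperand(const Operand &Op) { Operands.push_back(Op); }

  // Look an operand up by its index in the original instruction, not by its
  // position in the (sparse) Operands list. Returns nullptr if not stored.
  const Operand *getOperand(unsigned Idx) const {
    auto It = std::find_if(
        Operands.begin(), Operands.end(),
        [Idx](const Operand &Op) { return Op.getIndex() == Idx; });
    return It == Operands.end() ? nullptr : &*It;
  }
};

// Hypothetical post-processing step: record only the original operand 3.
void postProcess(Inst &I, int64_t Operand3Value) {
  Operand Op = Operand::createImm(Operand3Value);
  Op.setIndex(3);
  I.addOperand(Op);
}
```

A caller then checks the returned pointer before use: `getOperand(3)` yields the stored immediate, while `getOperand(0)` yields `nullptr` because that operand was never recorded.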

llvm/include/llvm/MCA/Stages/InOrderIssueStage.h

//===---------------------- InOrderIssueStage.h ------------------ C++ --===//		//===---------------------- InOrderIssueStage.h ------------------ C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
/// \file		/// \file
///		///
/// InOrderIssueStage implements an in-order execution pipeline.		/// InOrderIssueStage implements an in-order execution pipeline.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_MCA_STAGES_INORDERISSUESTAGE_H		#ifndef LLVM_MCA_STAGES_INORDERISSUESTAGE_H
#define LLVM_MCA_STAGES_INORDERISSUESTAGE_H		#define LLVM_MCA_STAGES_INORDERISSUESTAGE_H

		#include "llvm/MCA/CustomBehaviour.h"
#include "llvm/MCA/HardwareUnits/ResourceManager.h"		#include "llvm/MCA/HardwareUnits/ResourceManager.h"
#include "llvm/MCA/SourceMgr.h"		#include "llvm/MCA/SourceMgr.h"
#include "llvm/MCA/Stages/Stage.h"		#include "llvm/MCA/Stages/Stage.h"

namespace llvm {		namespace llvm {
namespace mca {		namespace mca {
class RegisterFile;		class RegisterFile;

struct StallInfo {		struct StallInfo {
enum class StallKind { DEFAULT, REGISTER_DEPS, DISPATCH, DELAY };		enum class StallKind {
		DEFAULT,
		REGISTER_DEPS,
		DISPATCH,
		DELAY,
		CUSTOM_STALL
		};

InstRef IR;		InstRef IR;
unsigned CyclesLeft;		unsigned CyclesLeft;
StallKind Kind;		StallKind Kind;

StallInfo() : IR(), CyclesLeft(), Kind(StallKind::DEFAULT) {}		StallInfo() : IR(), CyclesLeft(), Kind(StallKind::DEFAULT) {}

StallKind getStallKind() const { return Kind; }		StallKind getStallKind() const { return Kind; }
unsigned getCyclesLeft() const { return CyclesLeft; }		unsigned getCyclesLeft() const { return CyclesLeft; }
const InstRef &getInstruction() const { return IR; }		const InstRef &getInstruction() const { return IR; }
InstRef &getInstruction() { return IR; }		InstRef &getInstruction() { return IR; }

bool isValid() const { return (bool)IR; }		bool isValid() const { return (bool)IR; }
void clear();		void clear();
void update(const InstRef &Inst, unsigned Cycles, StallKind SK);		void update(const InstRef &Inst, unsigned Cycles, StallKind SK);
void cycleEnd();		void cycleEnd();
};		};

class InOrderIssueStage final : public Stage {		class InOrderIssueStage final : public Stage {
const MCSubtargetInfo &STI;		const MCSubtargetInfo &STI;
RegisterFile &PRF;		RegisterFile &PRF;
ResourceManager RM;		ResourceManager RM;
		CustomBehaviour &CB;

/// Instructions that were issued, but not executed yet.		/// Instructions that were issued, but not executed yet.
SmallVector<InstRef, 4> IssuedInst;		SmallVector<InstRef, 4> IssuedInst;

/// Number of instructions issued in the current cycle.		/// Number of instructions issued in the current cycle.
unsigned NumIssued;		unsigned NumIssued;

StallInfo SI;		StallInfo SI;
Show All 39 Lines	class InOrderIssueStage final : public Stage {
void notifyInstructionExecuted(const InstRef &IR);		void notifyInstructionExecuted(const InstRef &IR);
void notifyInstructionRetired(const InstRef &IR,		void notifyInstructionRetired(const InstRef &IR,
ArrayRef<unsigned> FreedRegs);		ArrayRef<unsigned> FreedRegs);

/// Retire instruction once it is executed.		/// Retire instruction once it is executed.
void retireInstruction(InstRef &IR);		void retireInstruction(InstRef &IR);

public:		public:
InOrderIssueStage(const MCSubtargetInfo &STI, RegisterFile &PRF);		InOrderIssueStage(const MCSubtargetInfo &STI, RegisterFile &PRF,
		CustomBehaviour &CB);

unsigned getIssueWidth() const;		unsigned getIssueWidth() const;
bool isAvailable(const InstRef &) const override;		bool isAvailable(const InstRef &) const override;
bool hasWorkToComplete() const override;		bool hasWorkToComplete() const override;
Error execute(InstRef &IR) override;		Error execute(InstRef &IR) override;
Error cycleStart() override;		Error cycleStart() override;
Error cycleEnd() override;		Error cycleEnd() override;
};		};

} // namespace mca		} // namespace mca
} // namespace llvm		} // namespace llvm

#endif // LLVM_MCA_STAGES_INORDERISSUESTAGE_H		#endif // LLVM_MCA_STAGES_INORDERISSUESTAGE_H

llvm/lib/MCA/CMakeLists.txt

	add_llvm_component_library(LLVMMCA			add_llvm_component_library(LLVMMCA
	CodeEmitter.cpp			CodeEmitter.cpp
	Context.cpp			Context.cpp
				CustomBehaviour.cpp
	HWEventListener.cpp			HWEventListener.cpp
	HardwareUnits/HardwareUnit.cpp			HardwareUnits/HardwareUnit.cpp
	HardwareUnits/LSUnit.cpp			HardwareUnits/LSUnit.cpp
	HardwareUnits/RegisterFile.cpp			HardwareUnits/RegisterFile.cpp
	HardwareUnits/ResourceManager.cpp			HardwareUnits/ResourceManager.cpp
	HardwareUnits/RetireControlUnit.cpp			HardwareUnits/RetireControlUnit.cpp
	HardwareUnits/Scheduler.cpp			HardwareUnits/Scheduler.cpp
	InstrBuilder.cpp			InstrBuilder.cpp
	Show All 19 Lines

llvm/lib/MCA/Context.cpp

[23 lines not shown]
#include "llvm/MCA/Stages/InOrderIssueStage.h"		#include "llvm/MCA/Stages/InOrderIssueStage.h"
#include "llvm/MCA/Stages/MicroOpQueueStage.h"		#include "llvm/MCA/Stages/MicroOpQueueStage.h"
#include "llvm/MCA/Stages/RetireStage.h"		#include "llvm/MCA/Stages/RetireStage.h"

namespace llvm {		namespace llvm {
namespace mca {		namespace mca {

std::unique_ptr<Pipeline>		std::unique_ptr<Pipeline>
Context::createDefaultPipeline(const PipelineOptions &Opts, SourceMgr &SrcMgr) {		Context::createDefaultPipeline(const PipelineOptions &Opts, SourceMgr &SrcMgr,
		CustomBehaviour &CB) {
const MCSchedModel &SM = STI.getSchedModel();		const MCSchedModel &SM = STI.getSchedModel();

if (!SM.isOutOfOrder())		if (!SM.isOutOfOrder())
return createInOrderPipeline(Opts, SrcMgr);		return createInOrderPipeline(Opts, SrcMgr, CB);

// Create the hardware units defining the backend.		// Create the hardware units defining the backend.
auto RCU = std::make_unique<RetireControlUnit>(SM);		auto RCU = std::make_unique<RetireControlUnit>(SM);
auto PRF = std::make_unique<RegisterFile>(SM, MRI, Opts.RegisterFileSize);		auto PRF = std::make_unique<RegisterFile>(SM, MRI, Opts.RegisterFileSize);
auto LSU = std::make_unique<LSUnit>(SM, Opts.LoadQueueSize,		auto LSU = std::make_unique<LSUnit>(SM, Opts.LoadQueueSize,
Opts.StoreQueueSize, Opts.AssumeNoAlias);		Opts.StoreQueueSize, Opts.AssumeNoAlias);
auto HWS = std::make_unique<Scheduler>(SM, *LSU);		auto HWS = std::make_unique<Scheduler>(SM, *LSU);

[19 lines not shown]	StagePipeline->appendStage(std::make_unique<MicroOpQueueStage>(
Opts.MicroOpQueueSize, Opts.DecodersThroughput));		Opts.MicroOpQueueSize, Opts.DecodersThroughput));
StagePipeline->appendStage(std::move(Dispatch));		StagePipeline->appendStage(std::move(Dispatch));
StagePipeline->appendStage(std::move(Execute));		StagePipeline->appendStage(std::move(Execute));
StagePipeline->appendStage(std::move(Retire));		StagePipeline->appendStage(std::move(Retire));
return StagePipeline;		return StagePipeline;
}		}

std::unique_ptr<Pipeline>		std::unique_ptr<Pipeline>
Context::createInOrderPipeline(const PipelineOptions &Opts, SourceMgr &SrcMgr) {		Context::createInOrderPipeline(const PipelineOptions &Opts, SourceMgr &SrcMgr,
		CustomBehaviour &CB) {
const MCSchedModel &SM = STI.getSchedModel();		const MCSchedModel &SM = STI.getSchedModel();
auto PRF = std::make_unique<RegisterFile>(SM, MRI, Opts.RegisterFileSize);		auto PRF = std::make_unique<RegisterFile>(SM, MRI, Opts.RegisterFileSize);

// Create the pipeline stages.		// Create the pipeline stages.
auto Entry = std::make_unique<EntryStage>(SrcMgr);		auto Entry = std::make_unique<EntryStage>(SrcMgr);
auto InOrderIssue = std::make_unique<InOrderIssueStage>(STI, *PRF);		auto InOrderIssue = std::make_unique<InOrderIssueStage>(STI, *PRF, CB);
auto StagePipeline = std::make_unique<Pipeline>();		auto StagePipeline = std::make_unique<Pipeline>();

// Pass the ownership of all the hardware units to this Context.		// Pass the ownership of all the hardware units to this Context.
addHardwareUnit(std::move(PRF));		addHardwareUnit(std::move(PRF));

// Build the pipeline.		// Build the pipeline.
StagePipeline->appendStage(std::move(Entry));		StagePipeline->appendStage(std::move(Entry));
StagePipeline->appendStage(std::move(InOrderIssue));		StagePipeline->appendStage(std::move(InOrderIssue));
return StagePipeline;		return StagePipeline;
}		}

} // namespace mca		} // namespace mca
} // namespace llvm		} // namespace llvm

llvm/lib/MCA/CustomBehaviour.cpp

This file was added.

				//===--------------------- CustomBehaviour.cpp ------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				///
				/// This file implements methods from the CustomBehaviour interface.
				///
				//===----------------------------------------------------------------------===//

				#include "llvm/MCA/CustomBehaviour.h"

				namespace llvm {
				namespace mca {

				unsigned CustomBehaviour::checkCustomHazard(ArrayRef<InstRef> IssuedInst,
				const InstRef &IR) {
				// 0 signifies that there are no hazards that need to be waited on
				[inline comment] foad: Could the first argument be `ArrayRef<InstRef> IssuedInst`?
				[inline comment] andreadb: Yeah, it should be immutable.
				return 0;
				}

				} // namespace mca
				} // namespace llvm
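
The default `checkCustomHazard` above always returns 0, so the hook only matters once a target overrides it. The following is a minimal, self-contained sketch of what such an override might look like. It does not use the real MCA classes: `ToyInst`, `ToyOpcode`, `ToyCustomBehaviour` and `ToyWaitBehaviour` are hypothetical stand-ins invented for illustration, and the "wait until all in-flight loads finish" rule is an assumed example hazard, not the AMDGPU logic from this patch.

```cpp
#include <vector>

// Hypothetical stand-in for mca::InstRef / mca::Instruction.
struct ToyInst {
  unsigned Opcode;
  unsigned CyclesLeft; // cycles until this instruction finishes executing
};

// Hypothetical opcodes, for illustration only.
enum ToyOpcode : unsigned { TOY_LOAD = 1, TOY_WAIT = 2, TOY_ADD = 3 };

// Mirrors the shape of the default class: no custom hazards, never stall.
class ToyCustomBehaviour {
public:
  virtual ~ToyCustomBehaviour() = default;
  virtual unsigned checkCustomHazard(const std::vector<ToyInst> &IssuedInst,
                                     const ToyInst &IR) {
    (void)IssuedInst;
    (void)IR;
    return 0; // 0 means there is nothing to wait on
  }
};

// Assumed target rule: TOY_WAIT may not issue while any load is in flight.
class ToyWaitBehaviour : public ToyCustomBehaviour {
public:
  unsigned checkCustomHazard(const std::vector<ToyInst> &IssuedInst,
                             const ToyInst &IR) override {
    if (IR.Opcode != TOY_WAIT)
      return 0;
    unsigned Stall = 0;
    // Stall for as long as the slowest in-flight load still needs.
    for (const ToyInst &I : IssuedInst)
      if (I.Opcode == TOY_LOAD && I.CyclesLeft > Stall)
        Stall = I.CyclesLeft;
    return Stall;
  }
};
```

The return value plays the same role as in the patch: a non-zero result is the number of cycles the issue stage should wait before trying the instruction again.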

llvm/lib/MCA/InstrBuilder.cpp

	[610 lines not shown]
	}			}

	Expected<std::unique_ptr<Instruction>>			Expected<std::unique_ptr<Instruction>>
	InstrBuilder::createInstruction(const MCInst &MCI) {			InstrBuilder::createInstruction(const MCInst &MCI) {
	Expected<const InstrDesc &> DescOrErr = getOrCreateInstrDesc(MCI);			Expected<const InstrDesc &> DescOrErr = getOrCreateInstrDesc(MCI);
	if (!DescOrErr)			if (!DescOrErr)
	return DescOrErr.takeError();			return DescOrErr.takeError();
	const InstrDesc &D = *DescOrErr;			const InstrDesc &D = *DescOrErr;
	std::unique_ptr<Instruction> NewIS = std::make_unique<Instruction>(D);			std::unique_ptr<Instruction> NewIS =
				std::make_unique<Instruction>(D, MCI.getOpcode());

				[inline comment] andreadb: Three questions:
				1. Is it necessary to add MCAOperands to every single instruction?
				2. Is it possible to do this as a post-processing step, if requested by the simulator?
				3. Can we limit the number of MCAOperand objects that we store in an mca::Instruction?

				Ideally, this logic should be split from the normal `InstrBuilder::createInstruction` and moved to a separate `InstrBuilder::postProcessInstruction` step. That post-processing step would then be performed only if the so-called "custom behaviour" is requested.

				In order to address points 1. and 3., it may be useful to implement a new "MCA Lowering Context" class, which knows about: a) which instructions require a list of MCAOperand, and b) which operands must be lowered to MCAOperand for the custom hazard check to work. The MCA lowering context would know about which instructions/operands require a custom post-processing step, and skip the post-processing step for those instructions that can be safely ignored. Essentially, have a context class that filters opcodes and rules out which operands are important to keep as MCAOperand, and which are not important for the custom behaviour class.

				To give you an idea: if we know that the custom behaviour class is only interested in the first operand of a specific load opcode, then we should only create MCAOperand objects for that load, and just for its first operand. In most cases (correct me if I am wrong), I suspect that we may not need to store MCAOperands for every single instruction out there. So, blindly lowering every MCOperand into MCAOperand, while it works in general, is a bit extreme in my opinion. Keep in mind that most targets won't implement any custom behaviour for their subtargets, and for those targets, translating to MCAOperand would be unnecessary, and it would simply waste space.

				Essentially, if a custom behaviour is not requested/implemented, then we should ignore this extra post-processing step entirely, and not populate the vector of MCAOperands (this is also why it may be worth making the new MCAOperand vector field a std::vector). I hope it makes sense.

				[inline comment] andreadb: I forgot to mention that, if ordering of operands is important for the custom behaviour check to work, and you are concerned about having gaps in the MCAOperand sequence, then you could use an extra field to store the `original operand index`.
	// Check if this is a dependency breaking instruction.			// Check if this is a dependency breaking instruction.
	APInt Mask;			APInt Mask;

	bool IsZeroIdiom = false;			bool IsZeroIdiom = false;
	bool IsDepBreaking = false;			bool IsDepBreaking = false;
	if (MCIA) {			if (MCIA) {
	unsigned ProcID = STI.getSchedModel().getProcessorID();			unsigned ProcID = STI.getSchedModel().getProcessorID();
	IsZeroIdiom = MCIA->isZeroIdiom(MCI, Mask, ProcID);			IsZeroIdiom = MCIA->isZeroIdiom(MCI, Mask, ProcID);
	[85 lines not shown]
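
andreadb's inline comment on this file proposes lowering only the operands a custom behaviour actually needs, and recording each kept operand's original index so that gaps in the sequence are harmless. A rough sketch of that idea, under stated assumptions, could look like the code below. None of these types exist in LLVM: `ToyMCInst`, `ToyOperand`, `ToyMCAOperand` and `ToyLoweringContext` are hypothetical names, and the real design (if adopted) may differ substantially.

```cpp
#include <map>
#include <vector>

// Hypothetical stand-ins for MCOperand and MCInst.
struct ToyOperand {
  long Value;
};
struct ToyMCInst {
  unsigned Opcode;
  std::vector<ToyOperand> Operands;
};

// A lowered operand that remembers where it sat in the original instruction.
struct ToyMCAOperand {
  unsigned OrigIndex;
  long Value;
};

// Hypothetical "lowering context": records which operands of which opcodes
// the custom behaviour cares about, and lowers only those.
class ToyLoweringContext {
  std::map<unsigned, std::vector<unsigned>> Wanted;

public:
  void keep(unsigned Opcode, unsigned OpIdx) {
    Wanted[Opcode].push_back(OpIdx);
  }

  // Returns an empty vector for opcodes nobody registered interest in,
  // so uninteresting instructions pay no storage cost.
  std::vector<ToyMCAOperand> lower(const ToyMCInst &MCI) const {
    std::vector<ToyMCAOperand> Out;
    auto It = Wanted.find(MCI.Opcode);
    if (It == Wanted.end())
      return Out;
    for (unsigned Idx : It->second)
      if (Idx < MCI.Operands.size())
        Out.push_back({Idx, MCI.Operands[Idx].Value});
    return Out;
  }
};
```

For example, registering `keep(LoadOpcode, 0)` would make `lower` return a single entry for that load and nothing at all for every other opcode.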

llvm/lib/MCA/Stages/InOrderIssueStage.cpp

[37 lines not shown]	void StallInfo::cycleEnd() {

if (!CyclesLeft)		if (!CyclesLeft)
return;		return;

--CyclesLeft;		--CyclesLeft;
}		}

InOrderIssueStage::InOrderIssueStage(const MCSubtargetInfo &STI,		InOrderIssueStage::InOrderIssueStage(const MCSubtargetInfo &STI,
RegisterFile &PRF)		RegisterFile &PRF, CustomBehaviour &CB)
: STI(STI), PRF(PRF), RM(STI.getSchedModel()), NumIssued(), SI(),		: STI(STI), PRF(PRF), RM(STI.getSchedModel()), CB(CB), NumIssued(), SI(),
CarryOver(), Bandwidth(), LastWriteBackCycle() {}		CarryOver(), Bandwidth(), LastWriteBackCycle() {}

unsigned InOrderIssueStage::getIssueWidth() const {		unsigned InOrderIssueStage::getIssueWidth() const {
return STI.getSchedModel().IssueWidth;		return STI.getSchedModel().IssueWidth;
}		}

bool InOrderIssueStage::hasWorkToComplete() const {		bool InOrderIssueStage::hasWorkToComplete() const {
return !IssuedInst.empty() || SI.isValid() || CarriedOver;		return !IssuedInst.empty() || SI.isValid() || CarriedOver;
[64 lines not shown]	if (unsigned Cycles = checkRegisterHazard(PRF, STI, IR)) {
return false;		return false;
}		}

if (hasResourceHazard(RM, IR)) {		if (hasResourceHazard(RM, IR)) {
SI.update(IR, /* delay */ 1, StallInfo::StallKind::DISPATCH);		SI.update(IR, /* delay */ 1, StallInfo::StallKind::DISPATCH);
return false;		return false;
}		}

		if (unsigned CustomStallCycles = CB.checkCustomHazard(IssuedInst, IR)) {
		SI.update(IR, CustomStallCycles, StallInfo::StallKind::CUSTOM_STALL);
		return false;
		}

if (LastWriteBackCycle) {		if (LastWriteBackCycle) {
if (!IR.getInstruction()->getDesc().RetireOOO) {		if (!IR.getInstruction()->getDesc().RetireOOO) {
unsigned NextWriteBackCycle = findFirstWriteBackCycle(IR);		unsigned NextWriteBackCycle = findFirstWriteBackCycle(IR);
// Delay the instruction to ensure that writes happen in program order.		// Delay the instruction to ensure that writes happen in program order.
if (NextWriteBackCycle < LastWriteBackCycle) {		if (NextWriteBackCycle < LastWriteBackCycle) {
SI.update(IR, LastWriteBackCycle - NextWriteBackCycle,		SI.update(IR, LastWriteBackCycle - NextWriteBackCycle,
StallInfo::StallKind::DELAY);		StallInfo::StallKind::DELAY);
return false;		return false;
[192 lines not shown]	void InOrderIssueStage::notifyStallEvent() {
}		}
case StallInfo::StallKind::DISPATCH: {		case StallInfo::StallKind::DISPATCH: {
notifyEvent<HWStallEvent>(		notifyEvent<HWStallEvent>(
HWStallEvent(HWStallEvent::DispatchGroupStall, IR));		HWStallEvent(HWStallEvent::DispatchGroupStall, IR));
notifyEvent<HWPressureEvent>(		notifyEvent<HWPressureEvent>(
HWPressureEvent(HWPressureEvent::RESOURCES, IR));		HWPressureEvent(HWPressureEvent::RESOURCES, IR));
break;		break;
}		}
		case StallInfo::StallKind::CUSTOM_STALL: {
		notifyEvent<HWStallEvent>(
		HWStallEvent(HWStallEvent::CustomBehaviourStall, IR));
		break;
		}
}		}
}		}

llvm::Error InOrderIssueStage::cycleStart() {		llvm::Error InOrderIssueStage::cycleStart() {
NumIssued = 0;		NumIssued = 0;
Bandwidth = getIssueWidth();		Bandwidth = getIssueWidth();

PRF.cycleStart();		PRF.cycleStart();
[48 lines not shown]
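
The hunks above record a custom stall with `SI.update(IR, CustomStallCycles, StallInfo::StallKind::CUSTOM_STALL)` and count it down in `StallInfo::cycleEnd()`. The interaction can be sketched in isolation as follows. `ToyStallInfo` and `cyclesUntilIssue` are hypothetical simplifications: the real `StallInfo` also tracks the stalled `InstRef`, and its `isValid()` may be defined differently from the assumption made here.

```cpp
// Simplified stand-in for StallInfo; the instruction reference is omitted.
struct ToyStallInfo {
  enum class StallKind { DEFAULT, CUSTOM_STALL };

  unsigned CyclesLeft = 0;
  StallKind Kind = StallKind::DEFAULT;

  // Assumption: a stall is "valid" while cycles remain.
  bool isValid() const { return CyclesLeft != 0; }

  void update(unsigned Cycles, StallKind K) {
    CyclesLeft = Cycles;
    Kind = K;
  }

  // Same shape as the cycleEnd() shown in the diff.
  void cycleEnd() {
    if (!CyclesLeft)
      return;
    --CyclesLeft;
  }
};

// Simulates how many whole cycles pass before the instruction can issue,
// given the value a custom hazard check returned.
unsigned cyclesUntilIssue(unsigned HazardCycles) {
  ToyStallInfo SI;
  if (HazardCycles)
    SI.update(HazardCycles, ToyStallInfo::StallKind::CUSTOM_STALL);
  unsigned Waited = 0;
  while (SI.isValid()) {
    SI.cycleEnd();
    ++Waited;
  }
  return Waited;
}
```

Under these assumptions a hazard check that returns N delays issue by N cycles, and a return value of 0 never creates a stall record at all.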

llvm/tools/llvm-mca/CMakeLists.txt

include_directories(include)		include_directories(include)

		add_subdirectory(lib)

set(LLVM_LINK_COMPONENTS		set(LLVM_LINK_COMPONENTS
AllTargetsAsmParsers		AllTargetsAsmParsers
AllTargetsDescs		AllTargetsDescs
AllTargetsDisassemblers		AllTargetsDisassemblers
AllTargetsInfos		AllTargetsInfos
MCA		MCA
MC		MC
MCParser		MCParser
[14 lines not shown]	add_llvm_tool(llvm-mca
Views/RetireControlUnitStatistics.cpp		Views/RetireControlUnitStatistics.cpp
Views/SchedulerStatistics.cpp		Views/SchedulerStatistics.cpp
Views/SummaryView.cpp		Views/SummaryView.cpp
Views/TimelineView.cpp		Views/TimelineView.cpp
Views/View.cpp		Views/View.cpp
)		)

set(LLVM_MCA_SOURCE_DIR ${CURRENT_SOURCE_DIR})		set(LLVM_MCA_SOURCE_DIR ${CURRENT_SOURCE_DIR})

		target_link_libraries(llvm-mca PRIVATE
		${LLVM_MCA_CUSTOMBEHAVIOUR_TARGETS}
		)

		add_definitions(${LLVM_MCA_MACROS_TO_DEFINE})

llvm/tools/llvm-mca/Views/DispatchStatistics.cpp

[71 lines not shown]	void DispatchStatistics::printDispatchStalls(raw_ostream &OS) const {
SS << "\nSCHEDQ - Scheduler full: ";		SS << "\nSCHEDQ - Scheduler full: ";
printStalls(SS, HWStalls[HWStallEvent::SchedulerQueueFull], NumCycles);		printStalls(SS, HWStalls[HWStallEvent::SchedulerQueueFull], NumCycles);
SS << "\nLQ - Load queue full: ";		SS << "\nLQ - Load queue full: ";
printStalls(SS, HWStalls[HWStallEvent::LoadQueueFull], NumCycles);		printStalls(SS, HWStalls[HWStallEvent::LoadQueueFull], NumCycles);
SS << "\nSQ - Store queue full: ";		SS << "\nSQ - Store queue full: ";
printStalls(SS, HWStalls[HWStallEvent::StoreQueueFull], NumCycles);		printStalls(SS, HWStalls[HWStallEvent::StoreQueueFull], NumCycles);
SS << "\nGROUP - Static restrictions on the dispatch group: ";		SS << "\nGROUP - Static restrictions on the dispatch group: ";
printStalls(SS, HWStalls[HWStallEvent::DispatchGroupStall], NumCycles);		printStalls(SS, HWStalls[HWStallEvent::DispatchGroupStall], NumCycles);
		SS << "\nUSH - Uncategorised Structural Hazard: ";
		[inline comment] andreadb: Please just use something like "Uncategorised Structural Hazards stalls" (or something like that).
		printStalls(SS, HWStalls[HWStallEvent::CustomBehaviourStall], NumCycles);
SS << '\n';		SS << '\n';
SS.flush();		SS.flush();
OS << Buffer;		OS << Buffer;
}		}

} // namespace mca		} // namespace mca
} // namespace llvm		} // namespace llvm

llvm/tools/llvm-mca/lib/AMDGPU/AMDGPUCustomBehaviour.h

This file was added.

				//===------------------- AMDGPUCustomBehaviour.h -----------------C++ - -===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				///
				/// This file defines the AMDGPUCustomBehaviour class which inherits from
				/// CustomBehaviour.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TOOLS_LLVM_MCA_LIB_AMDGPU_AMDGPUCUSTOMBEHAVIOUR_H
				#define LLVM_TOOLS_LLVM_MCA_LIB_AMDGPU_AMDGPUCUSTOMBEHAVIOUR_H

				#include "llvm/ADT/SmallVector.h"
				#include "llvm/MCA/CustomBehaviour.h"
				#include "llvm/Support/TargetParser.h"

				namespace llvm {
				namespace mca {

				class AMDGPUInstrPostProcess : public InstrPostProcess {
				public:
				AMDGPUInstrPostProcess(const MCSubtargetInfo &STI, const MCInstrInfo &MCII)
				: InstrPostProcess(STI, MCII) {}

				~AMDGPUInstrPostProcess() {}

				void postProcessInstruction(std::unique_ptr<Instruction> &Inst,
				const MCInst &MCI) override {}
				};

				class AMDGPUCustomBehaviour : public CustomBehaviour {
				public:
				AMDGPUCustomBehaviour(const MCSubtargetInfo &STI, const SourceMgr &SrcMgr,
				const MCInstrInfo &MCII);

				~AMDGPUCustomBehaviour() {}

				/// This method is used to determine if an instruction
				/// should be allowed to be dispatched. The return value is
				/// how many cycles until the instruction can be dispatched.
				/// This method is called after MCA has already checked for
				/// register and hardware dependencies so this method should only
				/// implement custom behaviour and dependencies that are not picked up
				/// by MCA naturally.
				unsigned checkCustomHazard(ArrayRef<InstRef> IssuedInst,
				const InstRef &IR) override;
				};

				} // namespace mca
				} // namespace llvm

				#endif /* LLVM_TOOLS_LLVM_MCA_LIB_AMDGPU_AMDGPUCUSTOMBEHAVIOUR_H */

llvm/tools/llvm-mca/lib/AMDGPU/AMDGPUCustomBehaviour.cpp

This file was added.

				//===------------------ AMDGPUCustomBehaviour.cpp ----------------C++ - -===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				///
				/// This file implements methods from the AMDGPUCustomBehaviour class.
				///
				//===----------------------------------------------------------------------===//

				#include "AMDGPUCustomBehaviour.h"
				#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
				#include "SIInstrInfo.h"
				#include "llvm/Support/WithColor.h"

				namespace llvm {
				namespace mca {

				AMDGPUCustomBehaviour::AMDGPUCustomBehaviour(const MCSubtargetInfo &STI,
				const SourceMgr &SrcMgr,
				const MCInstrInfo &MCII)
				: CustomBehaviour(STI, SrcMgr, MCII) {}

				unsigned AMDGPUCustomBehaviour::checkCustomHazard(ArrayRef<InstRef> IssuedInst,
				const InstRef &IR) {
				return 0;
				}

				} // namespace mca
				} // namespace llvm
				[inline comment] andreadb: I am not familiar with those pseudos. However, I would be careful about asserting here. One day, people might want to drive an mca pipeline from a backend pass, and even after regalloc there might still be MachineInstr pseudos around. Just merge these with the "default" case for now (and maybe add a small code comment).
				[inline comment] andreadb: I don't think it is a good idea to generate a warning here. Instead, just raise a bug for it once this patch is committed in main. For now, please convert it into a TODO comment.

llvm/tools/llvm-mca/lib/AMDGPU/CMakeLists.txt

This file was added.

				include_directories(
				${LLVM_MAIN_SRC_DIR}/lib/Target/AMDGPU
				${LLVM_BINARY_DIR}/lib/Target/AMDGPU
				)

				set(LLVM_LINK_COMPONENTS
				AMDGPU
				Core
				Support
				)

				add_llvm_library(LLVMMCACustomBehaviourAMDGPU
				AMDGPUCustomBehaviour.cpp

				DEPENDS
				AMDGPUCommonTableGen
				)

llvm/tools/llvm-mca/lib/CMakeLists.txt

This file was added.

				set(TARGETS_TO_APPEND "")
				set(MACROS_TO_APPEND "")

				if (LLVM_TARGETS_TO_BUILD MATCHES "AMDGPU")
				add_subdirectory(AMDGPU)
				list(APPEND TARGETS_TO_APPEND LLVMMCACustomBehaviourAMDGPU)
				list(APPEND MACROS_TO_APPEND -DHAS_AMDGPU)
				endif()

				set(LLVM_MCA_CUSTOMBEHAVIOUR_TARGETS ${TARGETS_TO_APPEND} PARENT_SCOPE)
				set(LLVM_MCA_MACROS_TO_DEFINE ${MACROS_TO_APPEND} PARENT_SCOPE)

llvm/tools/llvm-mca/llvm-mca.cpp

[26 lines not shown]
#include "Views/DispatchStatistics.h"		#include "Views/DispatchStatistics.h"
#include "Views/InstructionInfoView.h"		#include "Views/InstructionInfoView.h"
#include "Views/RegisterFileStatistics.h"		#include "Views/RegisterFileStatistics.h"
#include "Views/ResourcePressureView.h"		#include "Views/ResourcePressureView.h"
#include "Views/RetireControlUnitStatistics.h"		#include "Views/RetireControlUnitStatistics.h"
#include "Views/SchedulerStatistics.h"		#include "Views/SchedulerStatistics.h"
#include "Views/SummaryView.h"		#include "Views/SummaryView.h"
#include "Views/TimelineView.h"		#include "Views/TimelineView.h"
		#ifdef HAS_AMDGPU
		#include "lib/AMDGPU/AMDGPUCustomBehaviour.h"
		#endif
#include "llvm/MC/MCAsmBackend.h"		#include "llvm/MC/MCAsmBackend.h"
#include "llvm/MC/MCAsmInfo.h"		#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCCodeEmitter.h"		#include "llvm/MC/MCCodeEmitter.h"
#include "llvm/MC/MCContext.h"		#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCObjectFileInfo.h"		#include "llvm/MC/MCObjectFileInfo.h"
#include "llvm/MC/MCRegisterInfo.h"		#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"		#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/MC/MCTargetOptionsCommandFlags.h"		#include "llvm/MC/MCTargetOptionsCommandFlags.h"
#include "llvm/MCA/CodeEmitter.h"		#include "llvm/MCA/CodeEmitter.h"
#include "llvm/MCA/Context.h"		#include "llvm/MCA/Context.h"
		#include "llvm/MCA/CustomBehaviour.h"
#include "llvm/MCA/InstrBuilder.h"		#include "llvm/MCA/InstrBuilder.h"
#include "llvm/MCA/Pipeline.h"		#include "llvm/MCA/Pipeline.h"
#include "llvm/MCA/Stages/EntryStage.h"		#include "llvm/MCA/Stages/EntryStage.h"
#include "llvm/MCA/Stages/InstructionTables.h"		#include "llvm/MCA/Stages/InstructionTables.h"
#include "llvm/MCA/Support.h"		#include "llvm/MCA/Support.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/ErrorOr.h"		#include "llvm/Support/ErrorOr.h"
[162 lines not shown]	static cl::opt<bool> EnableBottleneckAnalysis(
cl::desc("Enable bottleneck analysis (disabled by default)"),		cl::desc("Enable bottleneck analysis (disabled by default)"),
cl::cat(ViewOptions), cl::init(false));		cl::cat(ViewOptions), cl::init(false));

static cl::opt<bool> ShowEncoding(		static cl::opt<bool> ShowEncoding(
"show-encoding",		"show-encoding",
cl::desc("Print encoding information in the instruction info view"),		cl::desc("Print encoding information in the instruction info view"),
cl::cat(ViewOptions), cl::init(false));		cl::cat(ViewOptions), cl::init(false));

		static cl::opt<bool> DisableCustomBehaviour(
		"disable-cb",
		cl::desc(
		"Disable custom behaviour (use the default class which does nothing)."),
		cl::cat(ViewOptions), cl::init(false));

namespace {		namespace {

const Target *getTarget(const char *ProgName) {		const Target *getTarget(const char *ProgName) {
if (TripleName.empty())		if (TripleName.empty())
TripleName = Triple::normalize(sys::getDefaultTargetTriple());		TripleName = Triple::normalize(sys::getDefaultTargetTriple());
Triple TheTriple(TripleName);		Triple TheTriple(TripleName);

// Get the target specific parser.		// Get the target specific parser.
[49 lines not shown]	const cl::opt<bool> &Default =
: EnableAllViews;		: EnableAllViews;
processOptionImpl(PrintRegisterFileStats, Default);		processOptionImpl(PrintRegisterFileStats, Default);
processOptionImpl(PrintDispatchStats, Default);		processOptionImpl(PrintDispatchStats, Default);
processOptionImpl(PrintSchedulerStats, Default);		processOptionImpl(PrintSchedulerStats, Default);
if (IsOutOfOrder)		if (IsOutOfOrder)
processOptionImpl(PrintRetireStats, Default);		processOptionImpl(PrintRetireStats, Default);
}		}

		std::unique_ptr<mca::InstrPostProcess>
		createInstrPostProcess(const Triple &TheTriple, const MCSubtargetInfo &STI,
		const MCInstrInfo &MCII) {
		// Might be a good idea to have a separate flag so that InstrPostProcess
		// can be used with or without CustomBehaviour
		if (DisableCustomBehaviour)
		return std::make_unique<mca::InstrPostProcess>(STI, MCII);
		#ifdef HAS_AMDGPU
		if (TheTriple.isAMDGPU())
		return std::make_unique<mca::AMDGPUInstrPostProcess>(STI, MCII);
		#endif
		return std::make_unique<mca::InstrPostProcess>(STI, MCII);
		}

		std::unique_ptr<mca::CustomBehaviour>
		createCustomBehaviour(const Triple &TheTriple, const MCSubtargetInfo &STI,
		const mca::SourceMgr &SrcMgr, const MCInstrInfo &MCII) {
		// Build the appropriate CustomBehaviour object for the current target.
		// The CustomBehaviour class should never depend on the source code,
		// but it can depend on the list of mca::Instruction and any classes
		// that can be built using just the target info. If you need extra
		// information from the source code or the list of MCInst, consider
		// adding that information to the mca::Instruction class and setting
		// it during InstrBuilder::createInstruction().
		if (DisableCustomBehaviour)
		return std::make_unique<mca::CustomBehaviour>(STI, SrcMgr, MCII);
		#ifdef HAS_AMDGPU
		if (TheTriple.isAMDGPU())
		return std::make_unique<mca::AMDGPUCustomBehaviour>(STI, SrcMgr, MCII);
		#endif
		return std::make_unique<mca::CustomBehaviour>(STI, SrcMgr, MCII);
		}

// Returns true on success.		// Returns true on success.
static bool runPipeline(mca::Pipeline &P) {		static bool runPipeline(mca::Pipeline &P) {
// Handle pipeline errors here.		// Handle pipeline errors here.
Expected<unsigned> Cycles = P.run();		Expected<unsigned> Cycles = P.run();
if (!Cycles) {		if (!Cycles) {
WithColor::error() << toString(Cycles.takeError());		WithColor::error() << toString(Cycles.takeError());
return false;		return false;
}		}
[197 lines not shown]	if (Region->startLoc().isValid() || Region->endLoc().isValid()) {
if (!Desc.empty())		if (!Desc.empty())
TOF->os() << " - " << Desc;		TOF->os() << " - " << Desc;
TOF->os() << "\n\n";		TOF->os() << "\n\n";
}		}

// Lower the MCInst sequence into an mca::Instruction sequence.		// Lower the MCInst sequence into an mca::Instruction sequence.
ArrayRef<MCInst> Insts = Region->getInstructions();		ArrayRef<MCInst> Insts = Region->getInstructions();
mca::CodeEmitter CE(*STI, *MAB, *MCE, Insts);		mca::CodeEmitter CE(*STI, *MAB, *MCE, Insts);
		std::unique_ptr<mca::InstrPostProcess> IPP =
		createInstrPostProcess(TheTriple, *STI, *MCII);
std::vector<std::unique_ptr<mca::Instruction>> LoweredSequence;		std::vector<std::unique_ptr<mca::Instruction>> LoweredSequence;
for (const MCInst &MCI : Insts) {		for (const MCInst &MCI : Insts) {
Expected<std::unique_ptr<mca::Instruction>> Inst =		Expected<std::unique_ptr<mca::Instruction>> Inst =
IB.createInstruction(MCI);		IB.createInstruction(MCI);
if (!Inst) {		if (!Inst) {
if (auto NewE = handleErrors(		if (auto NewE = handleErrors(
Inst.takeError(),		Inst.takeError(),
[&IP, &STI](const mca::InstructionError<MCInst> &IE) {		[&IP, &STI](const mca::InstructionError<MCInst> &IE) {
std::string InstructionStr;		std::string InstructionStr;
raw_string_ostream SS(InstructionStr);		raw_string_ostream SS(InstructionStr);
WithColor::error() << IE.Message << '\n';		WithColor::error() << IE.Message << '\n';
IP->printInst(&IE.Inst, 0, "", *STI, SS);		IP->printInst(&IE.Inst, 0, "", *STI, SS);
SS.flush();		SS.flush();
WithColor::note()		WithColor::note()
<< "instruction: " << InstructionStr << '\n';		<< "instruction: " << InstructionStr << '\n';
})) {		})) {
// Default case.		// Default case.
WithColor::error() << toString(std::move(NewE));		WithColor::error() << toString(std::move(NewE));
}		}
return 1;		return 1;
}		}

		IPP->postProcessInstruction(Inst.get(), MCI);

LoweredSequence.emplace_back(std::move(Inst.get()));		LoweredSequence.emplace_back(std::move(Inst.get()));
}		}

mca::SourceMgr S(LoweredSequence, PrintInstructionTables ? 1 : Iterations);		mca::SourceMgr S(LoweredSequence, PrintInstructionTables ? 1 : Iterations);

if (PrintInstructionTables) {		if (PrintInstructionTables) {
// Create a pipeline, stages, and a printer.		// Create a pipeline, stages, and a printer.
auto P = std::make_unique<mca::Pipeline>();		auto P = std::make_unique<mca::Pipeline>();
[11 lines not shown]	if (PrintInstructionTables) {

if (!runPipeline(*P))		if (!runPipeline(*P))
return 1;		return 1;

Printer.printReport(TOF->os());		Printer.printReport(TOF->os());
continue;		continue;
}		}

		// Create the CustomBehaviour object for enforcing Target Specific
		// behaviours and dependencies that aren't expressed well enough
		// in the tablegen. CB cannot depend on the list of MCInst or
		// the source code (but it can depend on the list of
		// mca::Instruction or any objects that can be reconstructed
		// from the target information).
		std::unique_ptr<mca::CustomBehaviour> CB =
		createCustomBehaviour(TheTriple, *STI, S, *MCII);

// Create a basic pipeline simulating an out-of-order backend.		// Create a basic pipeline simulating an out-of-order backend.
auto P = MCA.createDefaultPipeline(PO, S);		auto P = MCA.createDefaultPipeline(PO, S, *CB);
mca::PipelinePrinter Printer(*P, PrintJson ? mca::View::OK_JSON		mca::PipelinePrinter Printer(*P, PrintJson ? mca::View::OK_JSON
: mca::View::OK_READABLE);		: mca::View::OK_READABLE);

// When we output JSON, we add a view that contains the instructions		// When we output JSON, we add a view that contains the instructions
// and CPU resource information.		// and CPU resource information.
if (PrintJson)		if (PrintJson)
Printer.addView(		Printer.addView(
std::make_unique<mca::InstructionView>(*STI, *IP, Insts, MCPU));		std::make_unique<mca::InstructionView>(*STI, *IP, Insts, MCPU));
[55 lines not shown]
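
`createInstrPostProcess` and `createCustomBehaviour` above share one selection pattern: honour `-disable-cb` first, then pick a target-specific class if the build enabled it, otherwise fall back to the do-nothing base class. A stripped-down sketch of that pattern is shown below. The names `ToyBehaviour`, `ToyAMDGPUBehaviour`, `createToyBehaviour` and `TOY_HAS_AMDGPU` are hypothetical; the macro is defined inline here only so the sketch compiles on its own, whereas in the patch the analogous `HAS_AMDGPU` is supplied by CMake through `add_definitions`.

```cpp
#include <memory>
#include <string>

// Normally supplied by the build system; defined here for illustration only.
#define TOY_HAS_AMDGPU 1

// Do-nothing base class, analogous to the default CustomBehaviour.
struct ToyBehaviour {
  virtual ~ToyBehaviour() = default;
  virtual std::string name() const { return "default"; }
};

#ifdef TOY_HAS_AMDGPU
// Target-specific subclass, only compiled when the target is enabled.
struct ToyAMDGPUBehaviour : ToyBehaviour {
  std::string name() const override { return "amdgpu"; }
};
#endif

// Arch is a plain string here; the patch inspects an llvm::Triple instead.
std::unique_ptr<ToyBehaviour> createToyBehaviour(const std::string &Arch,
                                                 bool DisableCB) {
  // The command-line switch always wins.
  if (DisableCB)
    return std::make_unique<ToyBehaviour>();
#ifdef TOY_HAS_AMDGPU
  if (Arch == "amdgcn")
    return std::make_unique<ToyAMDGPUBehaviour>();
#endif
  // Targets without a custom implementation get the base class.
  (void)Arch;
  return std::make_unique<ToyBehaviour>();
}
```

Because the fallback is the base class rather than a null pointer, callers can dereference the result unconditionally, which is what lets the patch pass `*CB` straight into `createDefaultPipeline`.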

Diff 352374