Within the in-order pipeline, when an instruction gets issued and dispatched, it gets placed in the InOrderIssueStage::IssuedInst list. Then, at the beginning of every cycle, this list is iterated over and each instruction has their CyclesLeft attribute decremented. If any instructions have their CyclesLeft attribute become 0, then functions are invoked to handle everything associated with an instruction finishing.
At this time, an instruction with 0 latency is handled the same as any other instruction. Once it's dispatched, it gets placed within the IssuedInst list. This is a problem because that list doesn't get iterated over until the following cycle so the instruction will be retired 1 cycle later than it should. Before that even happens though, we'll fail an assert and the program will terminate.
My tablegen investigative skills aren't very strong and I wasn't able to find an existing 0 latency instruction within any of the in-order upstream targets, but if you want to test it for yourself, you can follow these steps:
Within llvm/lib/Target/AMDGPU/SISchedule.td, change (around line 252)
def : HWWriteRes<WriteIntMul, [HWVALU, HWRC], 8>;
To
def : HWWriteRes<WriteIntMul, [], 0>;
Rebuild llvm-mca then run llvm-mca with the following parameters
-mtriple=amdgcn -mcpu=gfx1010
On the following source input
v_mul_lo_u32 v5, v1, v4
You should then get this error:
Assertion failed: (isExecuting() && "Instruction not in-flight?"), function cycleEvent, file /llvm/lib/MCA/Instruction.cpp, line 235.
I believe that the in-order pipeline was intended to be able to handle 0 latency instructions, but they're pretty uncommon so it was probably never tested. If we look at Instruction::execute()
void Instruction::execute(unsigned IID) { assert(Stage == IS_READY); Stage = IS_EXECUTING; // Set the cycles left before the write-back stage. CyclesLeft = getLatency(); for (WriteState &WS : getDefs()) WS.onInstructionIssued(IID); // Transition to the "executed" stage if this is a zero-latency instruction. if (!CyclesLeft) Stage = IS_EXECUTED; }
We can see a comment near the bottom that demonstrates this. This function is invoked from within InOrderIssueStage::tryIssue() and the code that I added comes a few lines after Instruction::execute() is run. Since Instruction::execute() will set 0 latency instructions to IS_EXECUTED, my new block checks for this state and invokes the appropriate 'retirement' functions (and then returns so that we don't end up adding the instruction to the IssuedInst list).
After making this fix, I noticed that the TimelineView was not working properly if the first instruction in the source code was a 0 latency instruction. That is the motivation for the change I made to TimelineView.cpp. I'm not quite sure why that if-statement -> early exit block is in that loop. I tried using git blame and git log to find any clues, but couldn't figure it out. I then asked Andrea and he wasn't sure either so I think it should be safe to remove it.