This is a really simple bug fix, but I mainly wanted to post a review because I think this could motivate taking the duplicated code and turning it into a function so that this kind of thing is less likely to happen in the future.
Essentially, when an instruction finishes executing, there are a few functions that get called. You can see this in the InOrderIssueStage::updateIssuedInst() function (look at the second half of the function after the if (!IS.isExecuted()) block) :
void InOrderIssueStage::updateIssuedInst() {
// Update other instructions. Executed instructions will be retired during the
// next cycle.
unsigned NumExecuted = 0;
for (auto I = IssuedInst.begin(), E = IssuedInst.end();
I != (E - NumExecuted);) {
InstRef &IR = *I;
Instruction &IS = *IR.getInstruction();
IS.cycleEvent();
if (!IS.isExecuted()) {
LLVM_DEBUG(dbgs() << "[N] Instruction #" << IR
<< " is still executing\n");
++I;
continue;
}
PRF.onInstructionExecuted(&IS);
LSU.onInstructionExecuted(IR);
notifyInstructionExecuted(IR);
++NumExecuted;
retireInstruction(*I);
std::iter_swap(I, E - NumExecuted);
}
if (NumExecuted)
IssuedInst.resize(IssuedInst.size() - NumExecuted);
}Since this function gets called at the beginning of a cycle for each instruction that is still executing from previous cycles, it wasn't properly handling 0 latency instructions that should start and finish within the same cycle. To fix this issue, I added the following block to the InOrderIssueStage::tryIssue() function:
// If the instruction has a latency of 0, we need to handle
// the execution and retirement now.
if (IS.isExecuted()) {
PRF.onInstructionExecuted(&IS);
notifyEvent<HWInstructionEvent>(
HWInstructionEvent(HWInstructionEvent::Executed, IR));
LLVM_DEBUG(dbgs() << "[E] Instruction #" << IR << " is executed\n");
retireInstruction(IR);
return llvm::ErrorSuccess();
}When the LSUnit was added to the in-order pipeline a few weeks ago, the LSU.onInstructionExecuted(IR); line was added to the InOrderIssueStage::updateIssuedInst() function. However, it was not added to the 0 latency block above.
This caused mca to run forever for one of my assembly files.
The fix is really simple and it's just adding that line to the 0 latency block. However, it might be a good idea to turn this duplicated code into a function so that this is less likely to occur in the future.