This is a very simple bug fix. I mainly wanted to post a review because I think it could motivate turning the duplicated code into a function, so that this kind of bug is less likely to happen in the future.
Essentially, when an instruction finishes executing, a few functions get called. You can see this in the InOrderIssueStage::updateIssuedInst() function (look at the second half of the function, after the if (!IS.isExecuted()) block):
    void InOrderIssueStage::updateIssuedInst() {
      // Update other instructions. Executed instructions will be retired during the
      // next cycle.
      unsigned NumExecuted = 0;
      for (auto I = IssuedInst.begin(), E = IssuedInst.end();
           I != (E - NumExecuted);) {
        InstRef &IR = *I;
        Instruction &IS = *IR.getInstruction();
        IS.cycleEvent();
        if (!IS.isExecuted()) {
          LLVM_DEBUG(dbgs() << "[N] Instruction #" << IR
                            << " is still executing\n");
          ++I;
          continue;
        }
        PRF.onInstructionExecuted(&IS);
        LSU.onInstructionExecuted(IR);
        notifyInstructionExecuted(IR);
        ++NumExecuted;
        retireInstruction(*I);
        std::iter_swap(I, E - NumExecuted);
      }
      if (NumExecuted)
        IssuedInst.resize(IssuedInst.size() - NumExecuted);
    }
This function gets called at the beginning of a cycle and only looks at instructions that are still executing from previous cycles, so it did not properly handle 0-latency instructions, which should start and finish within the same cycle. To fix this issue, I added the following block to the InOrderIssueStage::tryIssue() function:
    // If the instruction has a latency of 0, we need to handle
    // the execution and retirement now.
    if (IS.isExecuted()) {
      PRF.onInstructionExecuted(&IS);
      notifyEvent<HWInstructionEvent>(
          HWInstructionEvent(HWInstructionEvent::Executed, IR));
      LLVM_DEBUG(dbgs() << "[E] Instruction #" << IR << " is executed\n");
      retireInstruction(IR);
      return llvm::ErrorSuccess();
    }
When the LSUnit was added to the in-order pipeline a few weeks ago, the LSU.onInstructionExecuted(IR); line was added to the InOrderIssueStage::updateIssuedInst() function. However, it was not added to the 0-latency block above.
This caused mca to run forever for one of my assembly files.
The fix is simply to add that line to the 0-latency block. However, it might be a good idea to turn this duplicated code into a function so that the two paths cannot drift apart again.
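To make the refactoring suggestion concrete, here is a minimal sketch of what sharing the code could look like. It is a toy model, not the actual patch: ToyInOrderIssueStage, Counters, and the onExecuted() helper are hypothetical names, and the counters merely stand in for the real PRF, LSU, listener, and retirement calls. The point is only that both the 0-latency path and the start-of-cycle path go through one helper, so a new step such as the LSU notification has to be added in just one place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for mca::Instruction (hypothetical, for illustration only).
struct Instruction {
  unsigned CyclesLeft; // 0 models a 0-latency instruction
  void cycleEvent() {
    if (CyclesLeft)
      --CyclesLeft;
  }
  bool isExecuted() const { return CyclesLeft == 0; }
};

// Counts how often each "instruction executed" step ran.
struct Counters {
  unsigned PRFNotified = 0;
  unsigned LSUNotified = 0;
  unsigned ListenersNotified = 0;
  unsigned Retired = 0;
};

class ToyInOrderIssueStage {
  std::vector<Instruction> IssuedInst;
  Counters C;

  // Hypothetical helper: the single place that lists every step that must
  // run once an instruction has finished executing.
  void onExecuted(const Instruction &IS) {
    assert(IS.isExecuted() && "instruction has not finished executing");
    ++C.PRFNotified;       // stands in for PRF.onInstructionExecuted(&IS)
    ++C.LSUNotified;       // stands in for LSU.onInstructionExecuted(IR)
    ++C.ListenersNotified; // stands in for notifyInstructionExecuted(IR)
    ++C.Retired;           // stands in for retireInstruction(IR)
  }

public:
  // Issue path: a 0-latency instruction is handled immediately.
  void tryIssue(Instruction IS) {
    if (IS.isExecuted()) {
      onExecuted(IS);
      return;
    }
    IssuedInst.push_back(IS);
  }

  // Start-of-cycle path: same loop shape as the original function.
  void updateIssuedInst() {
    unsigned NumExecuted = 0;
    for (auto I = IssuedInst.begin(), E = IssuedInst.end();
         I != (E - NumExecuted);) {
      I->cycleEvent();
      if (!I->isExecuted()) {
        ++I;
        continue;
      }
      onExecuted(*I);
      ++NumExecuted;
      std::iter_swap(I, E - NumExecuted);
    }
    if (NumExecuted)
      IssuedInst.resize(IssuedInst.size() - NumExecuted);
  }

  const Counters &counters() const { return C; }
  std::size_t pending() const { return IssuedInst.size(); }
};
```

In this model, issuing an Instruction{0} bumps all four counters right away, while an Instruction{2} is only counted after two calls to updateIssuedInst(). Whether the real helper should also own the debug output and the return value of tryIssue() is a design choice I have left open here.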