This is an archive of the discontinued LLVM Phabricator instance.

[MCA] Fixing bug causing LSUnit to not be notified when a 0 latency instruction finishes executing.
ClosedPublic

Authored by holland11 on Aug 19 2021, 8:31 PM.

Details

Summary

This is a really simple bug fix, but I mainly wanted to post a review because I think this could motivate taking the duplicated code and turning it into a function so that this kind of thing is less likely to happen in the future.

Essentially, when an instruction finishes executing, there are a few functions that get called. You can see this in the InOrderIssueStage::updateIssuedInst() function (look at the second half of the function after the if (!IS.isExecuted()) block) :

void InOrderIssueStage::updateIssuedInst() {
  // Update other instructions. Executed instructions will be retired during the
  // next cycle.
  unsigned NumExecuted = 0;
  for (auto I = IssuedInst.begin(), E = IssuedInst.end();
       I != (E - NumExecuted);) {
    InstRef &IR = *I;
    Instruction &IS = *IR.getInstruction();

    IS.cycleEvent();
    if (!IS.isExecuted()) {
      LLVM_DEBUG(dbgs() << "[N] Instruction #" << IR
                        << " is still executing\n");
      ++I;
      continue;
    }

    PRF.onInstructionExecuted(&IS);
    LSU.onInstructionExecuted(IR);
    notifyInstructionExecuted(IR);
    ++NumExecuted;

    retireInstruction(*I);

    std::iter_swap(I, E - NumExecuted);
  }

  if (NumExecuted)
    IssuedInst.resize(IssuedInst.size() - NumExecuted);
}

Since this function gets called at the beginning of a cycle for each instruction that is still executing from previous cycles, it wasn't properly handling 0 latency instructions that should start and finish within the same cycle. To fix this issue, I added the following block to the InOrderIssueStage::tryIssue() function:

// If the instruction has a latency of 0, we need to handle
// the execution and retirement now.
if (IS.isExecuted()) {
  PRF.onInstructionExecuted(&IS);

  notifyEvent<HWInstructionEvent>(
      HWInstructionEvent(HWInstructionEvent::Executed, IR));
  LLVM_DEBUG(dbgs() << "[E] Instruction #" << IR << " is executed\n");

  retireInstruction(IR);
  return llvm::ErrorSuccess();
}

When the LSUnit was added to the in-order pipeline a few weeks ago, the LSU.onInstructionExecuted(IR); line was added to the InOrderIssueStage::updateIssuedInst() function. However, it was not added to the 0 latency block above.

This caused mca to run forever for one of my assembly files.

The fix is really simple and it's just adding that line to the 0 latency block. However, it might be a good idea to turn this duplicated code into a function so that this is less likely to occur in the future.

Diff Detail

Event Timeline

holland11 created this revision.Aug 19 2021, 8:31 PM
holland11 requested review of this revision.Aug 19 2021, 8:31 PM
holland11 edited the summary of this revision. (Show Details)
andreadb accepted this revision.EditedAug 20 2021, 3:30 AM

LGTM

Presumably this bug was found while experimenting with your downstream target.
That would explain why you don't provide a test case for it. If my assumption is correct, then it is fine to commit this patch in this form. The change is clearly good.

That being said, I am not entirely sure how you end up in a situation like this in general. It doesn't feel right for a load/store like instruction to have zero latency.
Was the instruction somehow eliminated? I wouldn't expect this to happen for memory operations. But then, I don't know your target.

This revision is now accepted and ready to land.Aug 20 2021, 3:30 AM

Thank you!
Can you please also add a LIT test that triggered this bug?

Thank you!
Can you please also add a LIT test that triggered this bug?

I believe that the reason why the test wasn't added is because it was only reproducible on his downstream target. I may be wrong though.

holland11 added a comment.EditedAug 20 2021, 10:27 AM

Yeah, it's on a downstream target and a custom scheduling model where I've made some changes to be more accurate within llvm-mca. I don't think that it would be likely to find a proper instruction within an upstream target to recreate this bug. But I do still think that it is something worth supporting.

Out of curiosity, how would creating a lit test work for this bug? The bug itself (under the input file and scheduling model that I have with my downstream target) causes mca to never terminate. Does lit have a default timeout so that we could just give the test an input that would cause this infinite run-time and then make sure that mca actually terminates without hitting the timeout?