This is an archive of the discontinued LLVM Phabricator instance.

[MCA] Disable RCU for InOrderIssueStage
ClosedPublic

Authored by asavonic on Mar 15 2021, 4:52 AM.

Details

Summary

This is a follow-up for:
D98604 [MCA] Ensure that writes occur in-order

When instructions are aligned by the order of writes, they retire
in-order naturally. There is no need for an RCU, so it is disabled.
RCU statistic is adjusted to handle this.

Diff Detail

Event Timeline

asavonic created this revision.Mar 15 2021, 4:52 AM
asavonic requested review of this revision.Mar 15 2021, 4:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 15 2021, 4:52 AM
foad added a subscriber: foad.Mar 15 2021, 4:59 AM
foad added a comment.Mar 15 2021, 5:01 AM

Can you remove some of the code added to RetireControlUnit.cpp by D94928 and D98356, if it no longer has to cope with in-order CPUs?

andreadb requested changes to this revision.Mar 15 2021, 5:37 AM

There is nothing wrong with using a dummy RCU for the InOrderIssue stage.

The RCUtokenID is always stored in the instruction at dispatch time. You don't need any changes to pre-existing dispatch stage APIs (and listeners).

Note that your code still does this:

unsigned RCUTokenID = RetireControlUnit::UnhandledTokenID;
IS.dispatch(RCUTokenID);

Your check on TotalROBEntries is fine. However, RetireControlUnitStatistics could (and probably should) be disabled for in-order processors (since it doesn't really makes sense).
Excluding those API changes which I don't particularly like, the other changes look mostly harmless and are basically a NFC restructuring.

I strongly suggest that we focus on fixing D98604 first.
Once D98604 is fixed, then you can update this patch and discuss the changes to RetireControlUnitStatistics (and whether those make any sense in practice for in-order processors). I don't think that we should touch the dispatch stage, nor the event associated to it.

This revision now requires changes to proceed.Mar 15 2021, 5:37 AM

There is nothing wrong with using a dummy RCU for the InOrderIssue stage.

The RCUtokenID is always stored in the instruction at dispatch time. You don't need any changes to pre-existing dispatch stage APIs (and listeners).

Thanks, I missed that.

Note that your code still does this:

unsigned RCUTokenID = RetireControlUnit::UnhandledTokenID;
IS.dispatch(RCUTokenID);

Your check on TotalROBEntries is fine. However, RetireControlUnitStatistics could (and probably should) be disabled for in-order processors (since it doesn't really makes sense).

I made RCU optional and disabled the corresponding statistic for in-order processors.

Can you remove some of the code added to RetireControlUnit.cpp by D94928 and D98356, if it no longer has to cope with in-order CPUs?

Done. RCU is initialized and used only for out-of-order processors now.

asavonic updated this revision to Diff 331652.Mar 18 2021, 12:19 PM

RCU is now optional for RetireStage
RCU statistic is disabled for in-order processors.

I personally don't like the idea of making the RCU optional in the RetireStage. The is no point in having a RetireStage if there is no RCU.

For in-order processors, the RetireStage is not really needed. Instructions are always issued in-order, and for some instructions, we allow out-of-order commit.
For the purpose of MCA, we don't require an RCU; the retire stage would simply propagates "instruction retired" events. Having a stage just for that purpouse is a bit silly.

If you want to fully get rid of the RCU, then there is a better way: just get rid of the RetireStage for in-order processors, and generate "instruction retired" events directly from the InOrderIssueStage.
For in-order, the "instruction executed" and "instruction retired" events always match (both events are generated during the same cycle).

I suggest the following changes:

  1. Remove the RetireStage and always notify that an instruction is retired after it has finished execution.
  2. (As a consequence of 1.) You don't need any changes to the RetireStage. The RCU should not be optional, so pass it always by reference to the constructor.
  3. Please revert the changes to the RetireStatistics. We don't use those stats for in-order processors, so there is no reason why that file should be touched.
  4. (As a consequence of 1. and 2.) Do not print character R in the TimelineView if E and R are the same cycle. This means ignoring a loop at around line 244.

Point 4. means that we no longer get the R character in the timeline. In my opinion this is more accurate, since focus is on the write-back, and there is no delayed commit. We could add a note about this in the section which describes the timeline view.

Minor nit:
Please rename function notifyInstructionExecute (in InOrderIssueStage.cpp) to notifyInstructionIssue.

asavonic updated this revision to Diff 331942.Mar 19 2021, 11:08 AM

Removed RetireStage from the in-order pipeline.
Retire and execute events are emitted in the same cycle.

andreadb accepted this revision.Mar 19 2021, 12:24 PM

LGTM.

Thanks!

llvm/test/tools/llvm-mca/ARM/m7-negative-readadvance.s
63

I am assuming that this change is correct.
Not sure if we have already enough load/store tests. If not, then it may be worthy to add some tests (maybe as a follow-up. But only if you think that it is required).

This revision is now accepted and ready to land.Mar 19 2021, 12:24 PM
asavonic added inline comments.Mar 22 2021, 8:50 AM
llvm/test/tools/llvm-mca/ARM/m7-negative-readadvance.s
63

No, I think the change is incorrect. Thank you very much for spotting this! VLDR is issued 1 cycle earlier, and it does not obey to -1 ReadAdvance anymore.

This happens because we check ReadAdvance only for current WriteState objects. Since we now retire instructions in the same cycle as they're executed, the corresponding WriteState objects are also invalidated in the same cycle. A subsequent instruction cannot check for a negative ReadAdvance because the WriteState objects are invalidated at this point.

I'm thinking on how to fix this. Can we free registers when an instruction is retired, but keep WriteState objects for N cycles more?

andreadb added inline comments.Mar 22 2021, 9:36 AM
llvm/test/tools/llvm-mca/ARM/m7-negative-readadvance.s
63

mm.. this is a bit of annoying to model as it would require a structural change.
I think that you can solve this problem by introducing a counter in the register file which acts as a timer. At the beginning of each cycle, that counter is incremented by one.
When a write is executed, we could store the actual value of that counter into one of its fields. So that we know the relative cycle at which it has finished execution. So even if the object is invalidated, we still retain the "last write cycle" information.

Not sure if it makes sense...

Hey Andrew,

could you please rebase your patch? I have committed a fix for the issue with negative read-advance cycles.
Now you need to notify the PRF every time an instruction is executed. That way, the PRF will be able to correctly keep track of the "write executed" cycle.
I don't think that there are other important changes to make.

Cheers,
-Andrea

asavonic updated this revision to Diff 332757.Mar 23 2021, 12:43 PM
  • Enabled negative readadvance tracking.
  • Reverted unnecessary changes in RetireStage.
andreadb accepted this revision.Mar 23 2021, 1:03 PM

LGTM. Thanks Andrew!

PR49524 can now be resolved either as invalid or fixed (mentioning that we no longer use a RetireStage for in-order processors).
Since the issue with structural hazards has been fixed, you can also resolve PR41796.

-Andrea

This revision was landed with ongoing or failed builds.Mar 24 2021, 3:56 AM
This revision was automatically updated to reflect the committed changes.