Page MenuHomePhabricator

[MachineScheduler] Update available queue on the first mop of a new cycle
Needs ReviewPublic

Authored by dmgreen on Mar 27 2020, 1:46 AM.

Details

Summary

If a resource can be held for multiple cycles in the schedule model, then an instruction can be placed into the available queue, another instruction can be scheduled, but the first will not be take back out if the two instructions hazard. To fix this, make sure that we update the Available queue even on the first MOp of a cycle, pushing available instructions back into the ready queue if they now conflict.

This happens with some downstream schedules we have around MVE instruction scheduling where we use ResourceCycles=[2] to show the instruction executing over two beats. I'm not sure about the test changes here, but this improves scheduling decisions on our in-order cpu's.

Diff Detail

Event Timeline

dmgreen created this revision.Mar 27 2020, 1:46 AM
dmgreen marked an inline comment as done.Mar 27 2020, 1:49 AM
dmgreen added inline comments.
llvm/test/CodeGen/AArch64/misched-fusion-aes.ll
82

This one I do know about. The A53 schedule holds the load pipelines for 3 resource cycles, so now decides to push one of the loads down later to not hazard with the other loads. This is similar to the aes_load_store case below I believe.

Can we guard it with getMicroOpBufferSize() == 0 to avoid the impact of the out of order CPU as you motivate this patch with inorder cpu ?

jsji added a subscriber: jhibbits.Mar 27 2020, 7:31 AM

Looks like this change has more impact to pre-P9 models, especially ppc32/ppc64, which we were using itineraries . (And I know there might be bugs in those itineraries)
Although some changes doesn't look good at first glance, we should dig into itineraries of each model to see whether this is problem in itineraries or due to this change.

@steven.zhang Can you help to look into some of them , especially for P6-P8 tests. Thanks.
@jhibbits FYI, you may want to check whether those changes in G3/SPE are good.

llvm/test/CodeGen/PowerPC/fp128-bitcast-after-operation.ll
31

This looks like a regression, rlwinm is using HI0, and lwz should be 3-4 cycle.
But we may need to check the itineraries to see whether this is due to resource hazard.

73

This looks like unexpected? we are moving something across BARRIER now?

llvm/test/CodeGen/PowerPC/vec_splat.ll
24

There doesn't look great -- all stfs are waiting fadds now.

dmgreen marked 2 inline comments as done.Mar 30 2020, 3:39 AM

Looks like this change has more impact to pre-P9 models, especially ppc32/ppc64, which we were using itineraries . (And I know there might be bugs in those itineraries)
Although some changes doesn't look good at first glance, we should dig into itineraries of each model to see whether this is problem in itineraries or due to this change.

@steven.zhang Can you help to look into some of them , especially for P6-P8 tests. Thanks.
@jhibbits FYI, you may want to check whether those changes in G3/SPE are good.

Can we guard it with getMicroOpBufferSize() == 0 to avoid the impact of the out of order CPU as you motivate this patch with inorder cpu ?

Thanks for taking a look guys. I believe this patch is improving the modelling of any cpu, given the schedule model it's working with. If I am understanding this correctly then if the last instruction scheduled in a cycle might hazard with others in the available queue, then the start of the next cycle can pick an instruction that should not be available.

I don't have any knowledge of whether it's actually better for these cpu's that have changed here though. Happy enough to add something to turn it off if required. How about an option in MachineSchedPolicy like DisableLatencyHeuristic?

llvm/test/CodeGen/PowerPC/fp128-bitcast-after-operation.ll
31

I think schedule might be believing that the second lwz will stall for a number of cycles because of the resource conflict between it and the first lwz? So from the schedule's point of view we are a few cycles further on than we would expect.

73

Yeah, the BARRIER is a CHECK-NOT. The only difference in code was that the xoris and lwz have switched places, so the CHECK-DAG's no longer match as they did in the past.

LGTM on the power side changes, as no perf regression found with this change. I will leave others to accept this patch as I am not qualified to do.

LGTM on the power side changes, as no perf regression found with this change. I will leave others to accept this patch as I am not qualified to do.

Thanks! Good to know these changes aren't causing a problem. Thanks for checking.

Ping!

Sorry I didnt see this. Please give me a day or so to get back on this.

Herald added a project: Restricted Project. · View Herald TranscriptWed, May 6, 9:19 AM
dmgreen planned changes to this revision.Wed, May 6, 3:19 PM

Thanks, but hold up. Although this does fix some problems I was seeing, it may cause others that I should look into first.

dmgreen updated this revision to Diff 266000.Mon, May 25, 4:12 AM

I managed to sort out the problems I was seeing. It was due to some incorrect information in one of our schedule models. With that out of the way, I think this is the best way forward again.