Scheduler sends NumLoads argument into shouldClusterMemOps()
one less the actual cluster length. So for 2 instructions
it will pass just 1. Correct this number.
This is NFC for in tree targets.
Differential D73292
[AMDGPU] Correct NumLoads in clustering rampitec on Jan 23 2020, 1:27 PM. Authored by
Details Scheduler sends NumLoads argument into shouldClusterMemOps() This is NFC for in tree targets.
Diff Detail
Event TimelineComment Actions Comments there argue about how much should we cluster, but regardless I do not think we should use a wrong data. If we want more clustering we need to increase thresholds, but still rely on a correct input. Comment Actions I also have an early version of a patch which can reschedule without clustering if no optimal schedule were found for a region. We may postpone this change, use that patch and then finally correct this number while increasing clustering threshold at the same time. It may be easier in terms of tests update and less prone to performance regressions. Comment Actions I agree. I also think we should fix this properly in MachineScheduler: --- a/llvm/lib/CodeGen/MachineScheduler.cpp +++ b/llvm/lib/CodeGen/MachineScheduler.cpp @@ -1584,7 +1584,7 @@ void BaseMemOpClusterMutation::clusterNeighboringMemOps( SUnit *SUb = MemOpRecords[Idx+1].SU; if (TII->shouldClusterMemOps(MemOpRecords[Idx].BaseOps, MemOpRecords[Idx + 1].BaseOps, - ClusterLength)) { + ClusterLength + 1)) { if (SUa->NodeNum > SUb->NodeNum) std::swap(SUa, SUb); if (DAG->addEdge(SUb, SDep(SUa, SDep::Cluster))) { ... and adjust any other target implementations of shouldClusterMemOps accordingly. Comment Actions That I agree too. I just cannot deal with any single performance regression in all of the targets, including out of tree targets. This bug is here at least for 6 years when I first noticed it. I belive every in-tree target need to do the same as in this patch and then we can collectively switch away from this bug. Sounds like the only discussion is about the process how to get there. Comment Actions
We can adjust AArch64 (the only other affected in-tree target) so there's no change in behaviour, and out-of-tree target maintainers can adjust their own targets. Comment Actions Changed patch to only correct clusterNeighboringMemOps. Threshold logic adjusted accordingly. Comment Actions LGTM! Perhaps add a comment in TargetInstrInfo.h documenting the argument? E.g. "the number of loads that will be in the cluster if this hook returns true"...? |