Scheduler sends NumLoads argument into shouldClusterMemOps()
one less the actual cluster length. So for 2 instructions
it will pass just 1. Correct this number.
This is NFC for in tree targets.
Paths
| Differential D73292
[AMDGPU] Correct NumLoads in clustering ClosedPublic Authored by rampitec on Jan 23 2020, 1:27 PM.
Details Summary Scheduler sends NumLoads argument into shouldClusterMemOps() This is NFC for in tree targets.
Diff Detail Event TimelineComment Actions
Comments there argue about how much should we cluster, but regardless I do not think we should use a wrong data. If we want more clustering we need to increase thresholds, but still rely on a correct input. Comment Actions I also have an early version of a patch which can reschedule without clustering if no optimal schedule were found for a region. We may postpone this change, use that patch and then finally correct this number while increasing clustering threshold at the same time. It may be easier in terms of tests update and less prone to performance regressions. Comment Actions
I agree. I also think we should fix this properly in MachineScheduler: --- a/llvm/lib/CodeGen/MachineScheduler.cpp +++ b/llvm/lib/CodeGen/MachineScheduler.cpp @@ -1584,7 +1584,7 @@ void BaseMemOpClusterMutation::clusterNeighboringMemOps( SUnit *SUb = MemOpRecords[Idx+1].SU; if (TII->shouldClusterMemOps(MemOpRecords[Idx].BaseOps, MemOpRecords[Idx + 1].BaseOps, - ClusterLength)) { + ClusterLength + 1)) { if (SUa->NodeNum > SUb->NodeNum) std::swap(SUa, SUb); if (DAG->addEdge(SUb, SDep(SUa, SDep::Cluster))) { ... and adjust any other target implementations of shouldClusterMemOps accordingly. Comment Actions
That I agree too. I just cannot deal with any single performance regression in all of the targets, including out of tree targets. This bug is here at least for 6 years when I first noticed it. I belive every in-tree target need to do the same as in this patch and then we can collectively switch away from this bug. Sounds like the only discussion is about the process how to get there. Comment Actions
We can adjust AArch64 (the only other affected in-tree target) so there's no change in behaviour, and out-of-tree target maintainers can adjust their own targets. This revision is now accepted and ready to land.Jan 24 2020, 6:26 AM Comment Actions Changed patch to only correct clusterNeighboringMemOps. Threshold logic adjusted accordingly. Comment Actions LGTM! Perhaps add a comment in TargetInstrInfo.h documenting the argument? E.g. "the number of loads that will be in the cluster if this hook returns true"...? This revision is now accepted and ready to land.Jan 24 2020, 12:21 PM Closed by commit rGbe8e38cbd978: Correct NumLoads in clustering (authored by rampitec). · Explain WhyJan 24 2020, 12:46 PM This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 239998 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
llvm/test/CodeGen/AMDGPU/bitreverse.ll
llvm/test/CodeGen/AMDGPU/call-argument-types.ll
llvm/test/CodeGen/AMDGPU/copy-illegal-type.ll
llvm/test/CodeGen/AMDGPU/ctlz.ll
llvm/test/CodeGen/AMDGPU/ctpop64.ll
llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll
llvm/test/CodeGen/AMDGPU/fneg-combines.ll
llvm/test/CodeGen/AMDGPU/idot2.ll
llvm/test/CodeGen/AMDGPU/idot4s.ll
llvm/test/CodeGen/AMDGPU/idot4u.ll
llvm/test/CodeGen/AMDGPU/idot8s.ll
llvm/test/CodeGen/AMDGPU/idot8u.ll
llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll
llvm/test/CodeGen/AMDGPU/insert_vector_elt.ll
llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ubfe.ll
llvm/test/CodeGen/AMDGPU/llvm.round.f64.ll
llvm/test/CodeGen/AMDGPU/lshr.v2i16.ll
llvm/test/CodeGen/AMDGPU/madak.ll
llvm/test/CodeGen/AMDGPU/memory_clause.ll
llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll
llvm/test/CodeGen/AMDGPU/scratch-simple.ll
llvm/test/CodeGen/AMDGPU/setcc-limit-load-shrink.ll
llvm/test/CodeGen/AMDGPU/shl.v2i16.ll
llvm/test/CodeGen/AMDGPU/sign_extend.ll
llvm/test/CodeGen/AMDGPU/smrd-vccz-bug.ll
llvm/test/CodeGen/AMDGPU/store-weird-sizes.ll
llvm/test/CodeGen/AMDGPU/sub.v2i16.ll
llvm/test/CodeGen/AMDGPU/wwm-reserved.ll
|