Clean-up code around mem ops clustering logic. This patch cleans up code within
the function clusterNeighboringMemOps(). It is WIP, and this patch is a first cut.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This causes lots of test failures:
Failing Tests (35): LLVM :: CodeGen/AMDGPU/GlobalISel/cvt_f32_ubyte.ll LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.fmas.ll LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.scale.ll LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ubfe.ll LLVM :: CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll LLVM :: CodeGen/AMDGPU/amdhsa-trap-num-sgprs.ll LLVM :: CodeGen/AMDGPU/bitreverse.ll LLVM :: CodeGen/AMDGPU/call-argument-types.ll LLVM :: CodeGen/AMDGPU/cvt_f32_ubyte.ll LLVM :: CodeGen/AMDGPU/fshr.ll LLVM :: CodeGen/AMDGPU/idot2.ll LLVM :: CodeGen/AMDGPU/idot4s.ll LLVM :: CodeGen/AMDGPU/idot4u.ll LLVM :: CodeGen/AMDGPU/idot8s.ll LLVM :: CodeGen/AMDGPU/idot8u.ll LLVM :: CodeGen/AMDGPU/insert_vector_dynelt.ll LLVM :: CodeGen/AMDGPU/insert_vector_elt.v2i16.ll LLVM :: CodeGen/AMDGPU/llvm.amdgcn.raw.buffer.load.ll LLVM :: CodeGen/AMDGPU/llvm.maxnum.f16.ll LLVM :: CodeGen/AMDGPU/llvm.minnum.f16.ll LLVM :: CodeGen/AMDGPU/llvm.round.f64.ll LLVM :: CodeGen/AMDGPU/memory_clause.ll LLVM :: CodeGen/AMDGPU/merge-stores.ll LLVM :: CodeGen/AMDGPU/promote-constOffset-to-imm.ll LLVM :: CodeGen/AMDGPU/salu-to-valu.ll LLVM :: CodeGen/AMDGPU/sdiv.ll LLVM :: CodeGen/AMDGPU/sdiv64.ll LLVM :: CodeGen/AMDGPU/setcc-limit-load-shrink.ll LLVM :: CodeGen/AMDGPU/sgpr-control-flow.ll LLVM :: CodeGen/AMDGPU/smrd.ll LLVM :: CodeGen/AMDGPU/srem64.ll LLVM :: CodeGen/AMDGPU/trunc-combine.ll LLVM :: CodeGen/AMDGPU/udiv64.ll LLVM :: CodeGen/AMDGPU/urem64.ll LLVM :: CodeGen/AMDGPU/use-sgpr-multiple-times.ll Testing Time: 358.21s Unsupported Tests : 102 Expected Passes : 15421 Expected Failures : 49 Unexpected Failures: 35
(1) NumLoads argument can initially be set to 2 instead of 1, this avoids incrementing NumLoads while calling shouldClusterMemOps()
Alternatively you could move ++ClusterLength to just before the call to shouldClusterMemOps. But I really don't think this is very important.
(2) Re-arrange the code within clusterNeighboringMemOps(), so that the code is more convenient to understand
I agree that using continue is more readable.
(3) Improve the logic within shouldClusterMemOps() and remove all FIXMEs
You've changed it to accurately count bytes in a cluster, which is nice, but (a) it's a change in behvaiour so needs test updates and benchmarking, and (b) I don't particularly like the implementation. I think a cleaner way to do this would be to change all target getMemOperandsWithOffset functions to return a Width (in bytes) as well (some targets already have an internal getMemOperandsWithOffsetWidth function which does this) and change MachineScheduler to use this information to track the size of a cluster, and pass that information into shouldClusterMemOps.
Since it is safe to do clean-up step-by-step, the previous somewhat larger
patch is reverted back, and in this modified patch, initial step is taken
towards the clean-up.
Hi Jay,
I would not want to submit bigger patch since it is bit risky. So, let's take it step-by-step at time. This modified patch is the first cut.
Looks OK to me. Please also update the "summary" section at the top of this review (unfortunately "arc diff" does not do this for you when you edit the git commit message, so you have to do it by hand).