This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU/MemOpsCluster] Code clean-up around mem ops clustering logic
ClosedPublic

Authored by hsmhsm on May 18 2020, 5:25 AM.

Details

Summary

Clean-up code around mem ops clustering logic. This patch cleans up code within
the function clusterNeighboringMemOps(). It is WIP, and this patch is a first cut.

Diff Detail

Event Timeline

hsmhsm created this revision.May 18 2020, 5:25 AM
foad added a comment.May 18 2020, 6:09 AM

This causes lots of test failures:

Failing Tests (35):
  LLVM :: CodeGen/AMDGPU/GlobalISel/cvt_f32_ubyte.ll
  LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.fmas.ll
  LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.scale.ll
  LLVM :: CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.ubfe.ll
  LLVM :: CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
  LLVM :: CodeGen/AMDGPU/amdhsa-trap-num-sgprs.ll
  LLVM :: CodeGen/AMDGPU/bitreverse.ll
  LLVM :: CodeGen/AMDGPU/call-argument-types.ll
  LLVM :: CodeGen/AMDGPU/cvt_f32_ubyte.ll
  LLVM :: CodeGen/AMDGPU/fshr.ll
  LLVM :: CodeGen/AMDGPU/idot2.ll
  LLVM :: CodeGen/AMDGPU/idot4s.ll
  LLVM :: CodeGen/AMDGPU/idot4u.ll
  LLVM :: CodeGen/AMDGPU/idot8s.ll
  LLVM :: CodeGen/AMDGPU/idot8u.ll
  LLVM :: CodeGen/AMDGPU/insert_vector_dynelt.ll
  LLVM :: CodeGen/AMDGPU/insert_vector_elt.v2i16.ll
  LLVM :: CodeGen/AMDGPU/llvm.amdgcn.raw.buffer.load.ll
  LLVM :: CodeGen/AMDGPU/llvm.maxnum.f16.ll
  LLVM :: CodeGen/AMDGPU/llvm.minnum.f16.ll
  LLVM :: CodeGen/AMDGPU/llvm.round.f64.ll
  LLVM :: CodeGen/AMDGPU/memory_clause.ll
  LLVM :: CodeGen/AMDGPU/merge-stores.ll
  LLVM :: CodeGen/AMDGPU/promote-constOffset-to-imm.ll
  LLVM :: CodeGen/AMDGPU/salu-to-valu.ll
  LLVM :: CodeGen/AMDGPU/sdiv.ll
  LLVM :: CodeGen/AMDGPU/sdiv64.ll
  LLVM :: CodeGen/AMDGPU/setcc-limit-load-shrink.ll
  LLVM :: CodeGen/AMDGPU/sgpr-control-flow.ll
  LLVM :: CodeGen/AMDGPU/smrd.ll
  LLVM :: CodeGen/AMDGPU/srem64.ll
  LLVM :: CodeGen/AMDGPU/trunc-combine.ll
  LLVM :: CodeGen/AMDGPU/udiv64.ll
  LLVM :: CodeGen/AMDGPU/urem64.ll
  LLVM :: CodeGen/AMDGPU/use-sgpr-multiple-times.ll


Testing Time: 358.21s
  Unsupported Tests  :   102
  Expected Passes    : 15421
  Expected Failures  :    49
  Unexpected Failures:    35
foad added a comment.May 18 2020, 7:16 AM

(1) NumLoads argument can initially be set to 2 instead of 1, this avoids incrementing NumLoads while calling shouldClusterMemOps()

Alternatively you could move ++ClusterLength to just before the call to shouldClusterMemOps. But I really don't think this is very important.

(2) Re-arrange the code within clusterNeighboringMemOps(), so that the code is more convenient to understand

I agree that using continue is more readable.

(3) Improve the logic within shouldClusterMemOps() and remove all FIXMEs

You've changed it to accurately count bytes in a cluster, which is nice, but (a) it's a change in behvaiour so needs test updates and benchmarking, and (b) I don't particularly like the implementation. I think a cleaner way to do this would be to change all target getMemOperandsWithOffset functions to return a Width (in bytes) as well (some targets already have an internal getMemOperandsWithOffsetWidth function which does this) and change MachineScheduler to use this information to track the size of a cluster, and pass that information into shouldClusterMemOps.

hsmhsm updated this revision to Diff 265782.May 22 2020, 12:26 PM

Since it is safe to do clean-up step-by-step, the previous somewhat larger
patch is reverted back, and in this modified patch, initial step is taken
towards the clean-up.

Hi Jay,

I would not want to submit bigger patch since it is bit risky. So, let's take it step-by-step at time. This modified patch is the first cut.

foad accepted this revision.May 26 2020, 1:22 AM

Looks OK to me. Please also update the "summary" section at the top of this review (unfortunately "arc diff" does not do this for you when you edit the git commit message, so you have to do it by hand).

This revision is now accepted and ready to land.May 26 2020, 1:22 AM
hsmhsm edited the summary of this revision. (Show Details)May 26 2020, 3:13 AM
This revision was automatically updated to reflect the committed changes.