Clustering loads has caching benefits, but as far as I know there is no
advantage to clustering stores on any AMDGPU subtargets.
The disadvantage is that it tends to increase register pressure and
restricts scheduling freedom.
Differential D85530
[AMDGPU] Don't cluster stores
Closed, Public. Authored by foad on Aug 7 2020, 8:31 AM.
Details
Summary
Clustering loads has caching benefits, but as far as I know there is no advantage to clustering stores on any AMDGPU subtargets. The disadvantage is that it tends to increase register pressure and restricts scheduling freedom.
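To make the register-pressure point in the summary concrete, here is a toy model, not code from this revision or from LLVM. It assumes each value occupies one register from the instruction that defines it until the store that consumes it, and it counts the peak number of live values for two hypothetical schedules. The names `Instr`, `maxLiveValues`, `interleavedSchedule` and `clusteredSchedule` are made up for illustration.

```cpp
#include <algorithm>
#include <vector>

// Toy instruction: either defines value `id` or stores it (its last use).
struct Instr {
  bool isStore;
  int id;
};

// Peak number of simultaneously live values over a linear schedule.
// Assumption: a value becomes live at its definition and dies at its store.
int maxLiveValues(const std::vector<Instr> &schedule) {
  int live = 0;
  int peak = 0;
  for (const Instr &I : schedule) {
    live += I.isStore ? -1 : 1;
    peak = std::max(peak, live);
  }
  return peak;
}

// Each store is issued right after the value it consumes is computed.
std::vector<Instr> interleavedSchedule(int n) {
  std::vector<Instr> s;
  for (int i = 0; i < n; ++i) {
    s.push_back({false, i}); // define value i
    s.push_back({true, i});  // store value i immediately
  }
  return s;
}

// All stores are grouped at the end, as a clustering mutation might force.
std::vector<Instr> clusteredSchedule(int n) {
  std::vector<Instr> s;
  for (int i = 0; i < n; ++i)
    s.push_back({false, i}); // define every value first
  for (int i = 0; i < n; ++i)
    s.push_back({true, i});  // then store them back to back
  return s;
}
```

In this model the interleaved schedule never holds more than one value at a time, while the clustered schedule keeps every value live until the first store issues. A real scheduler weighs many more factors, so this only illustrates the direction of the effect described in the summary.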
Event Timeline

Some statistics for this change, from statically compiling 9785 graphics shaders from games for GFX9:

I think we should get some benefit from write combining. Do you have any performance numbers? That should be the deciding point.

I tried this change with game traces on GFX10. I could not convince myself that there were any statistically significant changes in performance. I do, however, wonder whether this would be better as a tuning option.

This revision is now accepted and ready to land. Aug 10 2020, 11:41 AM

Closed by commit rGc799f873cb9f: [AMDGPU] Don't cluster stores (authored by foad). Sep 14 2020, 5:40 AM

This revision was automatically updated to reflect the committed changes.

Did you try this with xnack enabled? This will reduce the number of soft clauses formed for stores.

Probably not. I don't think xnack is enabled for any of the platforms we usually care about for Vulkan graphics. Is that a problem, and if so why?
Revision Contents
Diff 291551
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement-stack-lower.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement-stack-lower.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.i16.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.large.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/load-unaligned.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/store-local.128.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/store-local.96.ll
llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
llvm/test/CodeGen/AMDGPU/call-argument-types.ll
llvm/test/CodeGen/AMDGPU/cluster_stores.ll
llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.global.ll
llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.private.ll
llvm/test/CodeGen/AMDGPU/fshr.ll
llvm/test/CodeGen/AMDGPU/half.ll
llvm/test/CodeGen/AMDGPU/insert_vector_elt.ll
llvm/test/CodeGen/AMDGPU/local-memory.amdgcn.ll
llvm/test/CodeGen/AMDGPU/memory_clause.ll
llvm/test/CodeGen/AMDGPU/merge-stores.ll
llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll
llvm/test/CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll
llvm/test/CodeGen/AMDGPU/store-local.128.ll
llvm/test/CodeGen/AMDGPU/store-local.96.ll
llvm/test/CodeGen/AMDGPU/store-weird-sizes.ll
llvm/test/CodeGen/AMDGPU/token-factor-inline-limit-test.ll
llvm/test/CodeGen/AMDGPU/widen-smrd-loads.ll