This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Disable the load and store cluster mutations for the schedulers
Abandoned · Public

Authored by tstellar on Jul 25 2019, 9:12 AM.

Details

Summary

These tend to increase register usage, and I'm not sure they provide
that much benefit now that the SILoadStoreOptimizer is run before
scheduling.

shader-db results:

SGPRS: 2395408 -> 2245808 (-6.25 %)
VGPRS: 1385652 -> 1377068 (-0.62 %)
Spilled SGPRs: 13732 -> 12147 (-11.54 %)
Spilled VGPRs: 67 -> 104 (55.22 %)
Private memory VGPRs: 5872 -> 5872 (0.00 %)
Scratch size: 6848 -> 6864 (0.23 %) dwords per thread
Code Size: 57847052 -> 58209484 (0.63 %) bytes
LDS: 132 -> 132 (0.00 %) blocks
Max Waves: 470488 -> 472363 (0.40 %)
Wait states: 0 -> 0 (0.00 %)
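
The percentage column above is consistent with the relative change (after - before) / before, rounded to two decimals. A minimal sketch for rechecking a row follows; the helper names `percentChange` and `roundTo2` are illustrative and not part of shader-db, and rows with a zero baseline (such as Wait states) would need separate handling.

```cpp
#include <cassert>
#include <cmath>

// Relative change from `before` to `after`, in percent.
// Assumes a non-zero baseline; a row like "0 -> 0" must be special-cased.
double percentChange(long long before, long long after) {
  assert(before != 0 && "relative change is undefined for a zero baseline");
  return 100.0 * static_cast<double>(after - before) /
         static_cast<double>(before);
}

// Round to two decimals, matching the precision shown in the results.
double roundTo2(double x) { return std::round(x * 100.0) / 100.0; }
```

For example, `roundTo2(percentChange(2395408, 2245808))` reproduces the SGPRS figure, and `roundTo2(percentChange(67, 104))` reproduces the Spilled VGPRs figure.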

Event Timeline

tstellar created this revision. Jul 25 2019, 9:12 AM
Herald added a project: Restricted Project. Jul 25 2019, 9:12 AM
rampitec requested changes to this revision. Jul 25 2019, 9:20 AM

It is known that clustering increases register pressure. However, in our experiments it gives performance benefits through better cache utilization. Clustering might need tuning, but it should not be disabled.

You can add an option to disable them though.
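
The idea of putting the mutations behind an option can be sketched with a small self-contained model. Everything here is a hypothetical stand-in: `ToyScheduleDAG`, `buildScheduler`, and `enableClustering` are illustrative names, not LLVM APIs. In the actual backend this would presumably be a `cl::opt<bool>` checked where the scheduler is constructed, but that is an assumption about how such a patch would look.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a scheduler DAG; it only records which
// mutations were registered.
struct ToyScheduleDAG {
  std::vector<std::string> mutations;
  void addMutation(const std::string &name) { mutations.push_back(name); }
};

// Build a scheduler, registering the clustering mutations only when enabled.
// `enableClustering` models a command-line option that defaults to on, so
// existing behavior is unchanged unless the option is explicitly turned off.
ToyScheduleDAG buildScheduler(bool enableClustering = true) {
  ToyScheduleDAG dag;
  if (enableClustering) {
    dag.addMutation("load-cluster");
    dag.addMutation("store-cluster");
  }
  return dag;
}
```

Defaulting the option to on keeps the cache-utilization benefit while still allowing the register-pressure effect to be measured with clustering turned off.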

This revision now requires changes to proceed. Jul 25 2019, 9:20 AM
tstellar abandoned this revision. Jul 25 2019, 8:39 PM

Ok, thanks for the info. I will drop this patch.