These tend to increase register usage, and I'm not sure they provide
much benefit now that the SILoadStoreOptimizer runs before
scheduling.

shader-db results:
SGPRS: 2395408 -> 2245808 (-6.25 %)
VGPRS: 1385652 -> 1377068 (-0.62 %)
Spilled SGPRs: 13732 -> 12147 (-11.54 %)
Spilled VGPRs: 67 -> 104 (55.22 %)
Private memory VGPRs: 5872 -> 5872 (0.00 %)
Scratch size: 6848 -> 6864 (0.23 %) dwords per thread
Code Size: 57847052 -> 58209484 (0.63 %) bytes
LDS: 132 -> 132 (0.00 %) blocks
Max Waves: 470488 -> 472363 (0.40 %)
Wait states: 0 -> 0 (0.00 %)