This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4
ClosedPublic

Authored by mareko on Oct 17 2017, 10:29 AM.

Details

Summary

Only 56 shaders (out of 48486) are affected.

Totals from affected shaders (changed stats only):
SGPRS: 2420 -> 2460 (1.65 %)
Spilled VGPRs: 94 -> 112 (19.15 %)
Scratch size: 524 -> 528 (0.76 %) dwords per thread
Code Size: 187400 -> 184992 (-1.28 %) bytes

One DiRT Showdown shader spills 6 more VGPRs.
One Grid Autosport shader spills 12 more VGPRs.

The other 54 shaders only have a decrease in code size.
(I'm ignoring the SGPR noise)

Event Timeline

mareko created this revision.Oct 17 2017, 10:29 AM
nhaehnle edited edge metadata.Oct 23 2017, 8:50 AM

One minor comment inline.

The volatile / hasOrderedMemoryRef issue is worrying, and I don't think the change should go in as-is. Besides, it's probably not as big a win as the other patches, especially due to the added VGPR spills. Do you have all these patches in a branch somewhere?

lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
754

Remove the assignment "MIB = MIB" here and below, it shouldn't be necessary. The return value is just a convenience to allow chains of .add(..).add(..) etc.

test/CodeGen/AMDGPU/fmul-2-combine-multi-use.ll
42–51 ↗(On Diff #119350)

In this case, the stores are explicitly marked as volatile in the input IR. Merging those seems like a very bad idea.

56–66 ↗(On Diff #119350)

Same here.

One minor comment inline.

The volatile / hasOrderedMemoryRef issue is worrying, and I don't think the change should go in as-is. Besides, it's probably not as big a win as the other patches, especially due to the added VGPR spills. Do you have all these patches in a branch somewhere?

The spills don't bother me because the shaders already spill VGPRs and any random codegen change tends to make it better or worse randomly. I will push my patches to some branch later.

mareko updated this revision to Diff 121057.Oct 31 2017, 2:04 PM
  • don't ignore MOVolatile
This revision was automatically updated to reflect the committed changes.