This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4
ClosedPublic

Authored by mareko on Oct 16 2017, 6:04 AM.

Details

Summary

Only constant offsets (*_IMM opcodes) are merged.
It reuses code for LDS load/store merging.
It relies on the scheduler to group loads.

The results are mixed, I think they are mostly positive. Most shaders are
affected, so here are total stats only:

SGPRS: 2072198 -> 2151462 (3.83 %)
VGPRS: 1628024 -> 1634612 (0.40 %)
Spilled SGPRs: 7883 -> 8942 (13.43 %)
Spilled VGPRs: 97 -> 101 (4.12 %)
Scratch size: 1488 -> 1492 (0.27 %) dwords per thread
Code Size: 60222620 -> 52940672 (-12.09 %) bytes
Max Waves: 374337 -> 373066 (-0.34 %)

There is 13.4% increase in SGPR spilling, DiRT Showdown spills a few more
VGPRs (now 37), but 12% decrease in code size.

These are the new stats for SGPR spilling. We already spill a lot SGPRs,
so it's uncertain whether more spilling will make any difference since
SGPRs are always spilled to VGPRs:

SGPR SPILLING APPS Shaders SpillSGPR AvgPerSh
alien_isolation 2938 100 0.0
batman_arkham_origins 589 6 0.0
bioshock-infinite 1769 4 0.0
borderlands2 3968 22 0.0
counter_strike_glob.. 1142 60 0.1
deus_ex_mankind_div.. 1410 79 0.1
dirt-showdown 533 4 0.0
dirt_rally 364 1163 3.2
divinity 1052 2 0.0
dota2 1747 7 0.0
f1-2015 776 1515 2.0
grid_autosport 1767 1505 0.9
hitman 1413 273 0.2
left_4_dead_2 1762 4 0.0
life_is_strange 1296 26 0.0
mad_max 358 96 0.3
metro_2033_redux 2670 60 0.0
payday2 1362 22 0.0
portal 474 3 0.0
saints_row_iv 1704 8 0.0
serious_sam_3_bfe 392 1348 3.4
shadow_of_mordor 1418 12 0.0
shadow_warrior 3956 239 0.1
talos_principle 324 1735 5.4
thea 172 17 0.1
tomb_raider 1449 215 0.1
total_war_warhammer 242 56 0.2
ue4_effects_cave 295 55 0.2
ue4_elemental 572 12 0.0
unigine_tropics 210 56 0.3
unigine_valley 278 152 0.5
victor_vran 1262 84 0.1
yofrankie 82 2 0.0

Diff Detail

Repository
rL LLVM

Event Timeline

mareko created this revision.Oct 16 2017, 6:04 AM
nhaehnle accepted this revision.Oct 23 2017, 8:31 AM

One comment inline, apart from that LGTM. I imagine some of those additional spills could be improved with better register allocation, and that's a bullet someone has to bite eventually, but not now.

lib/Target/AMDGPU/AMDGPUSubtarget.h
326–328 ↗(On Diff #119144)

Could you add an explanation of what the bug is in the comment? (I think it may be the page-crossing bug that I vaguely remember but don't find the reference to right now, in which case we could still consider merging stores that are properly aligned in a future patch.)

This revision is now accepted and ready to land.Oct 23 2017, 8:31 AM
mareko added inline comments.Oct 31 2017, 12:05 PM
lib/Target/AMDGPU/AMDGPUSubtarget.h
326–328 ↗(On Diff #119144)

The bug is very well documented internally at the usual place.

This revision was automatically updated to reflect the committed changes.