AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4

Description

AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4

Summary:
Only constant offsets (*_IMM opcodes) are merged.
It reuses code for LDS load/store merging.
It relies on the scheduler to group loads.

The results are mixed, I think they are mostly positive. Most shaders are
affected, so here are total stats only:

SGPRS: 2072198 -> 2151462 (3.83 %)
VGPRS: 1628024 -> 1634612 (0.40 %)
Spilled SGPRs: 7883 -> 8942 (13.43 %)
Spilled VGPRs: 97 -> 101 (4.12 %)
Scratch size: 1488 -> 1492 (0.27 %) dwords per thread
Code Size: 60222620 -> 52940672 (-12.09 %) bytes
Max Waves: 374337 -> 373066 (-0.34 %)

There is 13.4% increase in SGPR spilling, DiRT Showdown spills a few more
VGPRs (now 37), but 12% decrease in code size.

These are the new stats for SGPR spilling. We already spill a lot SGPRs,
so it's uncertain whether more spilling will make any difference since
SGPRs are always spilled to VGPRs:

SGPR SPILLING APPS Shaders SpillSGPR AvgPerSh
alien_isolation 2938 100 0.0
batman_arkham_origins 589 6 0.0
bioshock-infinite 1769 4 0.0
borderlands2 3968 22 0.0
counter_strike_glob.. 1142 60 0.1
deus_ex_mankind_div.. 1410 79 0.1
dirt-showdown 533 4 0.0
dirt_rally 364 1163 3.2
divinity 1052 2 0.0
dota2 1747 7 0.0
f1-2015 776 1515 2.0
grid_autosport 1767 1505 0.9
hitman 1413 273 0.2
left_4_dead_2 1762 4 0.0
life_is_strange 1296 26 0.0
mad_max 358 96 0.3
metro_2033_redux 2670 60 0.0
payday2 1362 22 0.0
portal 474 3 0.0
saints_row_iv 1704 8 0.0
serious_sam_3_bfe 392 1348 3.4
shadow_of_mordor 1418 12 0.0
shadow_warrior 3956 239 0.1
talos_principle 324 1735 5.4
thea 172 17 0.1
tomb_raider 1449 215 0.1
total_war_warhammer 242 56 0.2
ue4_effects_cave 295 55 0.2
ue4_elemental 572 12 0.0
unigine_tropics 210 56 0.3
unigine_valley 278 152 0.5
victor_vran 1262 84 0.1
yofrankie 82 2 0.0

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38949

Details