This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Scratch setup fix on AMDPAL gfx9+ merge shader
ClosedPublic

Authored by tpr on Jan 17 2018, 3:10 PM.

Details

Summary

With OS type AMDPAL, the scratch descriptor is hardwired to be loaded
from offset 0 of the global information table, whose low pointer is
passed in s0. For a merge shader on gfx9+, it needs to be s8 instead, as
the hardware reserves s0-s7.

Diff Detail

Repository
rL LLVM

Event Timeline

tpr created this revision.Jan 17 2018, 3:10 PM
tpr added a reviewer: nhaehnle.Feb 2 2018, 2:51 AM
arsenm added inline comments.Feb 2 2018, 8:22 AM
lib/Target/AMDGPU/SIFrameLowering.cpp
397 ↗(On Diff #130287)

Why aren't there corresponding changes in SIISelLowering? I would expect the argument lowering code would need to be aware of this

test/CodeGen/AMDGPU/amdpal_scratch_mergeshader.ll
1 ↗(On Diff #130287)

Use GCN instead of CHECK with multiple prefixes

12 ↗(On Diff #130287)

You don't need the linkage. Most of these arguments also look unused.

Why is most of this function body necessary? Shouldn't you only need a single volatile scratch access?

tpr added inline comments.Feb 5 2018, 1:06 AM
lib/Target/AMDGPU/SIFrameLowering.cpp
397 ↗(On Diff #130287)

The frontend does actually set up args for s0-s7 as well as the user sgprs starting at s8. So arg lowering does not need to do anything different.

tpr updated this revision to Diff 133031.Feb 6 2018, 9:36 AM

V2: Rebased. Addressed some review comments in the test.

tpr marked 2 inline comments as done.Feb 6 2018, 9:38 AM
tpr added inline comments.
test/CodeGen/AMDGPU/amdpal_scratch_mergeshader.ll
12 ↗(On Diff #130287)

Removed the linkage and arg9 onwards. arg0-arg8 are needed. I got that body by using bugpoint on a real shader; any further manual reduction results in the scratch access disappearing. It needs to be an alloca to test what the fix is doing.

tpr marked an inline comment as done.Feb 13 2018, 3:51 AM

Ping. Can I land the revised fix?

timcorringham accepted this revision.Feb 21 2018, 7:50 AM

Looks good to me.

This revision is now accepted and ready to land.Feb 21 2018, 7:50 AM
nhaehnle accepted this revision.Feb 21 2018, 9:13 AM

One tiny nitpick, apart from that LGTM.

lib/Target/AMDGPU/AMDGPUSubtarget.h
859 ↗(On Diff #133031)

Merged, here and elsewhere?

This revision was automatically updated to reflect the committed changes.