This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Add amdgpu-ps-wqm-outputs function attributes
ClosedPublic

Authored by nhaehnle on May 31 2016, 3:16 PM.

Details

Summary

The presence of this attribute indicates that VGPR outputs should be computed
in whole quad mode. This will be used by Mesa for prolog pixel shaders, so
that derivatives can be taken of shader inputs computed by the prolog, fixing
a bug.

The generated code could certainly be improved: if a prolog pixel shader is
used (which isn't common in modern OpenGL - they're used for gl_Color, polygon
stipples, and forcing per-sample interpolation), Mesa will use this attribute
unconditionally, because it has to be conservative. So WQM may be used in the
prolog when it isn't really needed, and furthermore a silly back-and-forth
switch is likely to happen at the boundary between prolog and main shader
parts.

Fixing this is a bit involved: we'd first have to add a mechanism by which
LLVM writes the WQM-related input requirements to the main shader part binary,
and then Mesa specializes the prolog part accordingly. At that point, we may
as well just compile a monolithic shader...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130

Diff Detail

Repository
rL LLVM

Event Timeline

nhaehnle updated this revision to Diff 59138.May 31 2016, 3:16 PM
nhaehnle retitled this revision from to AMDGPU: Add amdgpu-ps-wqm-outputs function attributes.
nhaehnle updated this object.
nhaehnle added a subscriber: llvm-commits.
mareko edited edge metadata.May 31 2016, 5:11 PM

I don't understand the comment.

If a pixel shader doesn't use DDX/DDY and sample/gather, WQM can be disabled in the prolog. tgsi_scan can get that information.

If a pixel shader uses those, WQM can be enabled at the beginning of the prolog and disabled at the end, so that the main part starts with the original EXEC mask. This adds 2 unnecessary SALU (s_mov_b64, s_wqm_b64) at the prolog->main boundary, but I don't think that matters much. Prologs are usually empty with the GL core profile anyway.

arsenm added inline comments.May 31 2016, 6:19 PM
lib/Target/AMDGPU/SIWholeQuadMode.cpp
172 ↗(On Diff #59138)

Looking for specific copies is usually concerning

nhaehnle updated this revision to Diff 59195.Jun 1 2016, 3:50 AM
nhaehnle marked an inline comment as done.
nhaehnle edited edge metadata.

Instead of looking for copies, look for defs of physical VGPRs. Since this
is before register allocation, the only physical VGPRs should be shader
inputs and outputs, and we never overwrite the inputs, so that should do the
right thing.

Marek, the point of the comment is that even when the main part uses
derivatives, it's probably not very common to use derivatives of gl_Color at
least, and in that case the prolog still wouldn't need WQM (per-sample
interpolants are different).

But as you say, PS prologs aren't very common, the additional number of
instructions is small, and memory accesses are a non-issue - that's why I
didn't bother to do anything more involved.

mareko added a comment.Jun 1 2016, 5:09 AM

I think we also want to run in WQM when changing the interpolation weights in the prolog. For now, that's just for force_persample_interp, which is pretty rare, but when we start to use bc_optimize, we'll run into WQM issues more often. Sadly I didn't realize this when I was working on the prolog. From mesa - si_shader.h:

/* TODO:

  • - add force_center_interp if MSAA is disabled and centroid or
  • sample are present
  • - add force_center_interp_bc_optimize to force center interpolation
  • based on the bc_optimize SGPR bit if MSAA is enabled, centroid is
  • present and sample isn't present.
	 */

Since the decision when to use WQM is up to Mesa, are there any other comments on this LLVM patch?

mareko added a comment.Jun 2 2016, 2:01 PM

Since the decision when to use WQM is up to Mesa, are there any other comments on this LLVM patch?

Not from me.

This revision was automatically updated to reflect the committed changes.