This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Make amdgcn.s.buffer.load a memory-reading intrinsic
AbandonedPublic

Authored by krzysz00 on Mar 30 2023, 10:19 AM.

Details

Summary

Change the s.buffer.load to have a memory read effect, correctly
representing that it reads memory and matching the backend's
representation of the intrinsic.

This is forked out of D146761, a larger refactoring of the buffer
intrinsics.

On its own, this commit may have performance implications for users of
s.buffer.load (who are, to my knowledge, mainly Mesa) who may be
relying on that lack of memory effect to hoist loads from constant,
invariant buffers out of loops, for example.

However, when the full buffer refactoring lands, s.buffer.load's new
form will take a pointer-valued argument and thus, the constant-ness
of the buffer will be expressible using annotations as the callsite,
such as "noalias", so this fixup should not have a long-term
performance impact.

Diff Detail

Event Timeline

krzysz00 created this revision.Mar 30 2023, 10:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2023, 10:19 AM
krzysz00 requested review of this revision.Mar 30 2023, 10:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2023, 10:19 AM

Mesa sets "invariant.load" metadata on all "s.buffer.load" call sites. There are no pointers, only descriptors. Is that sufficient to get the readnone behavior?

The plan here is to introduce new intrinsics that do use pointers - that's a future patch in the series - so that we can properly reason about things like buffer alias analysis. So this is an intermediate step that will break perf until there's migration to the new intrinsics.

We can't break perf, not even temporarily.

foad added a comment.Mar 31 2023, 3:08 AM

I had a quick look at some codegen differences this caused in Vulkan shaders compiled by LLPC. The biggest difference was that the DAG nodes for scalar loads now have a chain operand, which has a big effect on DAG scheduling.

Mesa sets "invariant.load" metadata on all "s.buffer.load" call sites. There are no pointers, only descriptors. Is that sufficient to get the readnone behavior?

I don't know if that's even valid. Without looking I'm 95% sure that won't do anything. Ideally that would work

I had a quick look at some codegen differences this caused in Vulkan shaders compiled by LLPC. The biggest difference was that the DAG nodes for scalar loads now have a chain operand, which has a big effect on DAG scheduling.

Oh right, I forgot about the DAG chaining. The chain must be determined by the declaration (otherwise code would have to randomly support chained and unchained variants of memory intrinsics)

Ok, so, just to clarify:

  1. There's an undocumented invariant that the buffer you pass in to s.buffer.load is constant for the duration of the program.
  2. A bunch of folks are relying on that invariant for performance.
  3. That aside, s.buffer.load is, more or less, raw.buffer.load

Given that those things are true, I'm going to back out of this change to s.buffer.load, and we'll leave all these codepaths untouched during the buffer intrinsic updates for now. Once address space 7 properly exists, folks'll be able to migrate to that system and use a typical load with the appropriate "this points to constant memory" annotations, so we may end up not needing this special intrinsic.

So, I'll go ahead and file a new revision to update the docs.

From what I can tell, if !invariant.load is supported for load-like intrinsics, that's not documented anywhere. It might be neat to make that work, but we can cross that bridge when we get there.

On the other hand, I'm imaging the following sequence of rewrites:

%v = load T, ptr addrspace(7) %const.buffer, !invariant.load

=> (late in the frontend)

%buf, %offset = ...(%const.buffer)
%v = call T @llvm.amdgcn.raw.buffer.load(%buf, i32 %off. [...]) !invariant.load

=> (in IR translation)

INTRINSIC_W_SIDE_EFFECT(@llvm.amdgcn.raw.buffer.load, ...) :: (dereferencable invariant load from BufferAddr(%buf, %off) ....)

=>(because the load was marked invariant, assuming %off is uniform)

AMDGCN_S_BUFFER_LOAD %buf, %off, ... :: (...)
foad added a comment.Mar 31 2023, 8:39 AM

From what I can tell, if !invariant.load is supported for load-like intrinsics, that's not documented anywhere. It might be neat to make that work, but we can cross that bridge when we get there.

As far as I know it's only supported ad hoc for some AMDGPU load-like intrinsics: D119739

krzysz00 abandoned this revision.Apr 25 2023, 11:18 AM

This here isn't going to work, closing to unclutter everyone's review lists