This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Introduce pseudo to copy sp in call sequences
ClosedPublic

Authored by arsenm on Jan 12 2022, 10:47 AM.

Details

Summary

Arbitrary stack pointers are accessed using MUBUF instructions with
the voffset field, which is interpreted as the swizzled address. We
want to fold fold into the MUBUF form to use the SP in the SGPR
offset, and previously we were special casing the interpretation of
the pointer value if the access memory operand said it was relative to
the stack pointer.

690f5b7a0128a210093e9b217932743ad35b5c5a removed this check, and moved
the DAG path to special casing copies from SGPRs. This is not an
entirely sound approach, since it's still changing the interpretation
of pointer values based the context.

Introduce a new pseudo which corresponds to the wave-to-vector address
transform. This way the memory instruction has consistent semantics
where the incoming pointer is always interpreted as a vector address,
and we're not obligated to optimize into the MUBUF offset-only
addressing mode. The DAG should probably have an equivalent pseudo.

This should fix some correctness issues, and folding this into
addressing modes will be a future optimization patch.

Diff Detail

Event Timeline

arsenm created this revision.Jan 12 2022, 10:47 AM
arsenm requested review of this revision.Jan 12 2022, 10:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2022, 10:47 AM
Herald added a subscriber: wdng. · View Herald Transcript
sebastian-ne added inline comments.Jan 13 2022, 9:40 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/call-outgoing-stack-args.ll
3

Could you also run tests with -amdgpu-enable-flat-scratch? I guess we don’t want to multiply by wavesize then

arsenm added inline comments.Jan 13 2022, 9:44 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/call-outgoing-stack-args.ll
3

I thought all the offsets in scratch instructions were unswizzled, so it would still be scaled

sebastian-ne added inline comments.Jan 13 2022, 9:57 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/call-outgoing-stack-args.ll
3

If I remember correctly, the stack pointer is not scaled when flat scratch is enabled.

I.e. if I understand it right, for buffer instructions we have

sp = n * wavesize
buffer_store voffset = sp / wavesize  ; hardware internally swizzles, so we end up with voffset = n * wavesize + laneid

with flat scratch it is

sp = n
scratch_store voffset = sp  ; hardware internally swizzles, so we end up with voffset = n * wavesize + laneid
sebastian-ne added inline comments.Jan 13 2022, 9:59 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/call-outgoing-stack-args.ll
3

I guess we don’t want to multiply by wavesize then

I meant divide by wavesize, I confused the shifts.

arsenm updated this revision to Diff 400974.Jan 18 2022, 1:45 PM

Fix scratch ABI handling where the offsets are unswizzled

sebastian-ne accepted this revision.Jan 19 2022, 1:18 AM

Is there an advantage of introducing a new pseudo vs using G_LSHR?

This revision is now accepted and ready to land.Jan 19 2022, 1:18 AM

Is there an advantage of introducing a new pseudo vs using G_LSHR?

G_LSHR is not allowed to directly read a physical register. It also doesn't encode the knowledge that we're reading from something we want to convert to a swizzled address. An arbitrary pointer value should be treated as a swizzled vector address

Is there an advantage of introducing a new pseudo vs using G_LSHR?

G_LSHR is not allowed to directly read a physical register. It also doesn't encode the knowledge that we're reading from something we want to convert to a swizzled address. An arbitrary pointer value should be treated as a swizzled vector address

Plus G_LSHR can't operate on pointer types