This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Restrict immediate scratch offsets
ClosedPublic

Authored by sebastian-ne on Apr 26 2021, 6:42 AM.

Details

Summary

gfx9 does not work with negative offsets, gfx10 works only with
aligned negative offsets, but not with unaligned negative offsets.

This is slightly more conservative than needed, gfx9 does support
negative offsets when a VGPR address is used and gfx10 supports
negative, unaligned offsets when an SGPR address is used, but we
do not make use of that with this patch.

I don't know if later gfx9.x cards still have these issues, I can provide an OpenCL test application if someone has the hardware.

Diff Detail

Event Timeline

sebastian-ne created this revision.Apr 26 2021, 6:42 AM
sebastian-ne requested review of this revision.Apr 26 2021, 6:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 26 2021, 6:42 AM

Summary says:

"gfx10 works only with aligned negative offsets, but not with aligned negative offsets."

Presumably one of those should say "unaligned"?

sebastian-ne edited the summary of this revision. (Show Details)Apr 26 2021, 6:52 AM
sebastian-ne added a reviewer: t-tye.

Summary says:

"gfx10 works only with aligned negative offsets, but not with aligned negative offsets."

Presumably one of those should say "unaligned"?

Ah yes, unaligned, negative offsets do not work.

foad added inline comments.Apr 26 2021, 7:08 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
7418

Is 4 always the correct value to use here? Doesn't it depend on the size of the access?

sebastian-ne added inline comments.Apr 26 2021, 7:18 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
7418

It doesn’t depend on the size of the access. scratch_store_byte/short/dword/dwordx2/dwordx3/dwordx4 all behave the same.
So yes, 4 is always correct here.

foad added inline comments.Apr 26 2021, 8:30 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
7418

Please mention that in the description of FeatureNegativeUnalignedScratchOffsetBug.

sebastian-ne marked an inline comment as done.

Mention 4-alignment in the bug description.

Jay, is it clear like that or should I extend the description?

foad added a comment.Apr 26 2021, 8:56 AM

Mention 4-alignment in the bug description.

Jay, is it clear like that or should I extend the description?

That's clear, thanks!

I didn't realize the offset bugs were this convoluted. Can you add a comment/table or something to explain all the permutations somewhere?

llvm/lib/Target/AMDGPU/AMDGPU.td
178

"are not working" -> "do not work"

This doesn't read like a description of the feature, and more like a comment for the feature

184

Ditto

sebastian-ne marked 2 inline comments as done.

I added a table and edited the feature descriptions. I hope this is more in line with what you expected.

foad added inline comments.Apr 30 2021, 12:53 PM
llvm/lib/Target/AMDGPU/AMDGPU.td
184

Should be "a negative immediate offset" (singular).

sebastian-ne marked an inline comment as done.

Fix typo.
Friendly ping for review

foad accepted this revision.May 7 2021, 3:06 AM

LGTM, just nits inline.

llvm/lib/Target/AMDGPU/AMDGPU.td
175

Nit: maybe define these next to FlatSegmentOffsetBug since they're closely related?

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
7373–7374

Nit: "flat instruction offsets can only be non-negative, global and scratch instruction offsets can ..."

llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll
305

Seems a shame that this wasn't CSE'd away.

This revision is now accepted and ready to land.May 7 2021, 3:06 AM
This revision was automatically updated to reflect the committed changes.
sebastian-ne marked 2 inline comments as done.