This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Remove the s_buffer workaround for GFX9 chips
Closed, Public

Authored by mareko on Jan 31 2018, 11:47 AM.

Details

Summary

I checked the AMD closed-source compiler, and the workaround is only
needed when an x3 load is emulated as x4, which we don't do in LLVM.

SMEM x3 opcodes don't exist; instead, an x4 opcode can be used
with the last component left unused. If that last component is out of
buffer bounds and falls on the next 4K page, the hardware hangs.
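The hang condition described above can be modeled as a small predicate. The following is a minimal C++ sketch under stated assumptions: 4 KiB pages, a dword-aligned load address, and "out of buffer bounds" taken to mean that the unused fourth dword does not fit entirely within the buffer. The function `emulatedX3MayHang` and its parameters are hypothetical illustrations, not LLVM or driver code, and not a documented hardware rule.

```cpp
#include <cstdint>

// Hypothetical model of the hang condition; not LLVM code and not a
// documented hardware rule. Assumes 4 KiB pages and a dword-aligned address.
constexpr std::uint64_t kPageSize = 4096;
constexpr std::uint64_t kDwordSize = 4;

// Returns true if a 3-dword load emulated as a 4-dword load at byte offset
// `offset` into a buffer that starts at address `base` and is `bufferSize`
// bytes long would put its unused 4th dword both outside the buffer and on
// the page after the one holding the 3rd (last used) dword.
constexpr bool emulatedX3MayHang(std::uint64_t base, std::uint64_t offset,
                                 std::uint64_t bufferSize) {
  const std::uint64_t thirdDword = base + offset + 2 * kDwordSize;
  const std::uint64_t fourthDword = base + offset + 3 * kDwordSize;
  // The unused dword is out of bounds if it does not fit entirely in the buffer.
  const bool outOfBounds = offset + 4 * kDwordSize > bufferSize;
  // It falls on the next page if it starts on a different 4 KiB page than
  // the last dword that is actually used.
  const bool onNextPage = fourthDword / kPageSize != thirdDword / kPageSize;
  return outOfBounds && onNextPage;
}
```

For instance, with `base = 0x10000`, `offset = 4084`, and `bufferSize = 4096`, the three used dwords end exactly at the page boundary, so the padding dword starts on the next page and the predicate is true. With `bufferSize = 8192` it is false, because the padding dword is still in bounds.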

Event Timeline

mareko created this revision. Jan 31 2018, 11:47 AM
arsenm added a comment. Feb 1 2018, 9:21 AM

We probably do want to do that optimization at some point, although in that case I would hope we would avoid producing them in the buggy case. Can you add more details to the comment here, and possibly leave it?

mareko added a comment. Feb 1 2018, 9:37 AM

> We probably do want to do that optimization at some point, although in that case I would hope we would avoid producing them in the buggy case. Can you add more details to the comment here, and possibly leave it?

What details? Can you be more specific about what you're asking here?

> > We probably do want to do that optimization at some point, although in that case I would hope we would avoid producing them in the buggy case. Can you add more details to the comment here, and possibly leave it?

> What details? Can you be more specific about what you're asking here?

Like what you mentioned in the commit message, that there is a problem with x3 loads only.

> > > We probably do want to do that optimization at some point, although in that case I would hope we would avoid producing them in the buggy case. Can you add more details to the comment here, and possibly leave it?

> > What details? Can you be more specific about what you're asking here?

> Like what you mentioned in the commit message, that there is a problem with x3 loads only.

SMEM x3 opcodes don't exist; instead, an x4 opcode can be used with the last component left unused. If that last component is out of buffer bounds and falls on the next 4K page, the hardware hangs.

mareko updated this revision to Diff 132453. Feb 1 2018, 12:39 PM
mareko edited the summary of this revision.

This revision was not accepted when it landed; it landed in state Needs Review. Feb 7 2018, 8:05 AM
This revision was automatically updated to reflect the committed changes.