This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Use ds_read/write_b96/b128 when possible for SDag
ClosedPublic

Authored by mbrkusanin on Jul 23 2020, 6:59 AM.

Details

Summary

Don't break down local loads and stores so ds_read/write_b96/b128 in
ISelLowering can be selected on subtargets that support them and if align
requirements allow them.

Diff Detail

Event Timeline

mbrkusanin created this revision.Jul 23 2020, 6:59 AM
  • Rebase
  • Ping
arsenm added inline comments.Aug 10 2020, 12:14 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
7946

I don't want to spread generation checks anywhere outside of AMDGPUSubtarget, so this should check some other feature or helper inside the subtarget

8419

Ditto

8422

I assume there's a Store->getAlign() to avoid the Align construction?

  • Addressed comments
arsenm added inline comments.Aug 18 2020, 9:14 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8412

I think <= is not right here, it needs to be == 16 or == 12

mbrkusanin marked an inline comment as done.
arsenm accepted this revision.Aug 18 2020, 9:27 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8412

Whether usesDS128 should apply to B96 is open for interpretation (although I'm not sure we even need this toggleable feature anymore)

This revision is now accepted and ready to land.Aug 18 2020, 9:27 AM
mbrkusanin added inline comments.Aug 20 2020, 4:46 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8412

It looks like it was introduced because of https://bugs.freedesktop.org/show_bug.cgi?id=105464 to avoid a bug with some 128bit vectors but it was fixed in the meantime and this option is now enabled by default. Not sure if the same bug affected b96 or not. Either way if there is no need for this feature anymore it might make more sense to remove it then expand on it.
Reason for hasDS96AndDS128() was to avoid expanding on that one.

mareko added a subscriber: mareko.Aug 31 2020, 8:20 PM

This breaks LDS. LLVMSetAlignment(inst, 4) on loads and stores has no effect. The IR says "align 4", yet the backend still selects b128.

Please revert ASAP. @nhaehnle

This breaks LDS. LLVMSetAlignment(inst, 4) on loads and stores has no effect. The IR says "align 4", yet the backend still selects b128.

On what subtargets? GFX9 and 10 should select b128 for align 4. That is the purpose of the patch. Are you saying it selects it for SI, CI or VI?

arsenm added a comment.Sep 1 2020, 6:14 AM

This breaks LDS. LLVMSetAlignment(inst, 4) on loads and stores has no effect. The IR says "align 4", yet the backend still selects b128.

Please revert ASAP. @nhaehnle

As far as I can tell, b64/b96/b128 do work correctly with unaligned access, and it's potentially a bug we don't use them more aggressively

mareko added a comment.Sep 1 2020, 6:16 AM

This breaks LDS. LLVMSetAlignment(inst, 4) on loads and stores has no effect. The IR says "align 4", yet the backend still selects b128.

On what subtargets? GFX9 and 10 should select b128 for align 4. That is the purpose of the patch. Are you saying it selects it for SI, CI or VI?

On GFX10. Apparently b128 with align 4 doesn't work there.

This breaks LDS. LLVMSetAlignment(inst, 4) on loads and stores has no effect. The IR says "align 4", yet the backend still selects b128.

On what subtargets? GFX9 and 10 should select b128 for align 4. That is the purpose of the patch. Are you saying it selects it for SI, CI or VI?

On GFX10. Apparently b128 with align 4 doesn't work there.

I've checked a couple Vulkan CTS tests that now produce b128 instructions for SDag and they work fine. I also did not find any regressions on others. Can you give us any more details? Or a test to reproduce the issue?

This breaks LDS. LLVMSetAlignment(inst, 4) on loads and stores has no effect. The IR says "align 4", yet the backend still selects b128.

On what subtargets? GFX9 and 10 should select b128 for align 4. That is the purpose of the patch. Are you saying it selects it for SI, CI or VI?

On GFX10. Apparently b128 with align 4 doesn't work there.

I've checked a couple Vulkan CTS tests that now produce b128 instructions for SDag and they work fine. I also did not find any regressions on others. Can you give us any more details? Or a test to reproduce the issue?

The broken application is Unigine Heaven with Mesa OpenGL. It's a pretty standard app, so most likely all tessellation is broken and other LDS users possibly too.

mareko added a comment.Sep 2 2020, 7:19 PM

This breaks LDS. LLVMSetAlignment(inst, 4) on loads and stores has no effect. The IR says "align 4", yet the backend still selects b128.

On what subtargets? GFX9 and 10 should select b128 for align 4. That is the purpose of the patch. Are you saying it selects it for SI, CI or VI?

On GFX10. Apparently b128 with align 4 doesn't work there.

I've checked a couple Vulkan CTS tests that now produce b128 instructions for SDag and they work fine. I also did not find any regressions on others. Can you give us any more details? Or a test to reproduce the issue?

More information:

  • (gfx9 hasn't been tested)
  • gfx10.1 has the corruption in WGP mode only (CU mode works)
  • gfx10.3 works

It looks like it's a gfx10.1 hw bug in WGP mode, so a fix or workaround is needed. The driver always uses WGP mode.

foad added a comment.Sep 3 2020, 12:31 AM

This breaks LDS. LLVMSetAlignment(inst, 4) on loads and stores has no effect. The IR says "align 4", yet the backend still selects b128.

On what subtargets? GFX9 and 10 should select b128 for align 4. That is the purpose of the patch. Are you saying it selects it for SI, CI or VI?

On GFX10. Apparently b128 with align 4 doesn't work there.

I've checked a couple Vulkan CTS tests that now produce b128 instructions for SDag and they work fine. I also did not find any regressions on others. Can you give us any more details? Or a test to reproduce the issue?

More information:

  • (gfx9 hasn't been tested)
  • gfx10.1 has the corruption in WGP mode only (CU mode works)
  • gfx10.3 works

It looks like it's a gfx10.1 hw bug in WGP mode, so a fix or workaround is needed. The driver always uses WGP mode.

Do you know what the driver sets SH_MEM_CONFIG.alignment_mode to?

This breaks LDS. LLVMSetAlignment(inst, 4) on loads and stores has no effect. The IR says "align 4", yet the backend still selects b128.

On what subtargets? GFX9 and 10 should select b128 for align 4. That is the purpose of the patch. Are you saying it selects it for SI, CI or VI?

On GFX10. Apparently b128 with align 4 doesn't work there.

I've checked a couple Vulkan CTS tests that now produce b128 instructions for SDag and they work fine. I also did not find any regressions on others. Can you give us any more details? Or a test to reproduce the issue?

More information:

  • (gfx9 hasn't been tested)
  • gfx10.1 has the corruption in WGP mode only (CU mode works)
  • gfx10.3 works

It looks like it's a gfx10.1 hw bug in WGP mode, so a fix or workaround is needed. The driver always uses WGP mode.

Do you know what the driver sets SH_MEM_CONFIG.alignment_mode to?

Unaligned.