Don't break down local loads and stores in ISelLowering, so that ds_read/write_b96/b128 can be selected on subtargets that support them when alignment requirements allow.
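This is a minimal, hypothetical model of what "breaking down" a local (LDS) access means. It is not the actual SIISelLowering code. The function name `splitLocalAccess` and the `maxOpBytes` parameter are assumptions for illustration. The model splits an access into DS operations no wider than the widest width the subtarget and alignment allow.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model (not LLVM's implementation): split a local (LDS) access
// of `sizeBytes` into DS operations no wider than `maxOpBytes`.
// Widths are in bytes: 4 = b32, 8 = b64, 12 = b96, 16 = b128.
std::vector<unsigned> splitLocalAccess(unsigned sizeBytes, unsigned maxOpBytes) {
  assert(sizeBytes % 4 == 0 && "sketch only models whole dwords");
  assert((maxOpBytes == 4 || maxOpBytes == 8 || maxOpBytes == 12 ||
          maxOpBytes == 16) && "unsupported DS width");
  std::vector<unsigned> ops;
  unsigned remaining = sizeBytes;
  while (remaining > 0) {
    // Take the widest piece allowed; the tail may be narrower.
    unsigned width = remaining < maxOpBytes ? remaining : maxOpBytes;
    ops.push_back(width);
    remaining -= width;
  }
  return ops;
}
```

Under this model, a 16-byte load stays a single ds_read_b128 when 16-byte operations are allowed (`splitLocalAccess(16, 16)` yields one op). If it is broken down to dwords, it becomes four ds_read_b32 (`splitLocalAccess(16, 4)`).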
llvm/lib/Target/AMDGPU/SIISelLowering.cpp, line 8412:
I think <= is not right here; it needs to be == 16 or == 12.
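The following sketch illustrates the comparison point in the comment on line 8412. It assumes the size in question is measured in bytes, which is consistent with 16 and 12 matching b128 and b96. Both predicate names are hypothetical, and neither is the actual condition in SIISelLowering.cpp.

```cpp
// Hypothetical predicates illustrating the review comment; neither is the
// actual condition at SIISelLowering.cpp line 8412. Sizes are in bytes.

// With `<=`, every size up to 16 bytes (4, 8, 12, 16) passes, so narrower
// accesses would also take a path meant only for b96/b128.
constexpr bool matchesWithLessEqual(unsigned sizeBytes) {
  return sizeBytes <= 16;
}

// Checking for exactly 12 or 16 bytes only matches the sizes that map to
// ds_read/write_b96 and ds_read/write_b128.
constexpr bool matchesB96OrB128(unsigned sizeBytes) {
  return sizeBytes == 12 || sizeBytes == 16;
}

// Compile-time checks documenting the difference.
static_assert(matchesWithLessEqual(8), "<= also accepts an 8-byte access");
static_assert(!matchesB96OrB128(8), "only 12- and 16-byte accesses match");
```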
llvm/lib/Target/AMDGPU/SIISelLowering.cpp, line 8412:
It looks like it was introduced because of https://bugs.freedesktop.org/show_bug.cgi?id=105464 to avoid a bug with some 128-bit vectors, but that bug has since been fixed and this option is now enabled by default. I'm not sure whether the same bug affected b96. Either way, if this feature is no longer needed, it might make more sense to remove it than to expand on it.
This breaks LDS. LLVMSetAlignment(inst, 4) on loads and stores has no effect. The IR says "align 4", yet the backend still selects b128.
Please revert ASAP. @nhaehnle
On what subtargets? GFX9 and GFX10 should select b128 for align 4; that is the purpose of the patch. Are you saying it selects it for SI, CI, or VI?
As far as I can tell, b64/b96/b128 do work correctly with unaligned access, and it's potentially a bug that we don't use them more aggressively.
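Here is a simplified, hypothetical model of the alignment rule described in these replies. GFX9 and GFX10 select b128 for 4-byte alignment. On other targets, the sketch assumes that the wide path is only taken for 16-byte-aligned accesses. That threshold is an assumption, not taken from LLVM. The flag `supportsUnalignedDS` and the function name are illustrative, not LLVM's AMDGPU API.

```cpp
#include <cassert>

// Hypothetical model: may a single ds_read/write_b96 (12 bytes) or
// ds_read/write_b128 (16 bytes) be selected for an access with the given
// alignment? `supportsUnalignedDS` stands in for "GFX9 or GFX10" per the
// reply above; it is an illustrative flag, not a real LLVM feature name.
bool canSelectWideDS(bool supportsUnalignedDS, unsigned sizeBytes,
                     unsigned alignBytes) {
  assert((sizeBytes == 12 || sizeBytes == 16) && "only b96/b128 are modeled");
  if (supportsUnalignedDS)
    return alignBytes >= 4;   // GFX9/GFX10: align 4 is enough (per the reply)
  return alignBytes >= 16;    // assumption: other targets need 16-byte alignment
}
```

The model treats "align 4 selects b128" as intended behavior on GFX9/GFX10. So the IR-level `align 4` alone does not prevent b128 on those targets.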
I've checked a couple of Vulkan CTS tests that now produce b128 instructions with SelectionDAG, and they work fine. I also did not find any regressions in other tests. Can you give us more details, or a test that reproduces the issue?
The broken application is Unigine Heaven with Mesa OpenGL. It's a pretty standard app, so most likely all tessellation is broken, and possibly other LDS users too.
More information:
- gfx9: not tested
- gfx10.1: corruption in WGP mode only (CU mode works)
- gfx10.3: works
It looks like a gfx10.1 hardware bug in WGP mode. The driver always uses WGP mode, so a fix or workaround is needed.
I don't want generation checks spread anywhere outside of AMDGPUSubtarget, so this should check some other feature or a helper inside the subtarget.
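The following sketch is one hypothetical way to follow this request. The lowering code asks the subtarget a question and never compares generations. The generation-specific knowledge (gfx10.1 affected in WGP mode, CU mode and gfx10.3 unaffected, as reported above) lives where the subtarget flags are set. `SubtargetModel`, `hasMisalignedLDSBug()`, and `shouldSelectWideDS` are illustrative names, not LLVM's actual API. The fallback to 16-byte alignment on affected targets is likewise an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for a subtarget: lowering queries helpers and never
// checks the GPU generation directly. Names are illustrative only.
class SubtargetModel {
public:
  SubtargetModel(bool unalignedDS, bool misalignedLDSBugInWGP, bool wgpMode)
      : UnalignedDS(unalignedDS), MisalignedLDSBugInWGP(misalignedLDSBugInWGP),
        WGPMode(wgpMode) {}

  bool hasUnalignedDSAccess() const { return UnalignedDS; }

  // Only reports the bug when the affected mode (WGP) is in use.
  bool hasMisalignedLDSBug() const { return MisalignedLDSBugInWGP && WGPMode; }

private:
  bool UnalignedDS;
  bool MisalignedLDSBugInWGP;
  bool WGPMode;
};

// Lowering-side decision for b96 (12 bytes) / b128 (16 bytes).
bool shouldSelectWideDS(const SubtargetModel &ST, unsigned sizeBytes,
                        unsigned alignBytes) {
  assert((sizeBytes == 12 || sizeBytes == 16) && "only b96/b128 are modeled");
  if (ST.hasMisalignedLDSBug())
    return alignBytes >= 16;  // assumption: workaround keeps only aligned accesses
  if (ST.hasUnalignedDSAccess())
    return alignBytes >= 4;
  return alignBytes >= 16;
}
```

For example, a gfx10.1-like configuration in WGP mode (`SubtargetModel(true, true, true)`) rejects a 4-byte-aligned b128. The same chip in CU mode (`SubtargetModel(true, true, false)`) accepts it.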