There's still a lot more to do, but this handles decomposing due to
alignment. I've gotten it to the point where nothing crashes or
sends the legalizer into an infinite loop.
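As a rough illustration of what "decomposing due to alignment" means, here is a minimal standalone sketch. It is not the code from this patch: `splitByAlignment` is a hypothetical helper, and it assumes the simplest possible policy, where an access is broken into power-of-two pieces no wider than the known alignment.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the patch): break a memory access of
// SizeInBits into power-of-two pieces that are each no wider than the
// known alignment, so every piece is naturally aligned.
std::vector<unsigned> splitByAlignment(unsigned SizeInBits,
                                       unsigned AlignInBits) {
  assert(SizeInBits % 8 == 0 && "size must be a whole number of bytes");
  assert(AlignInBits >= 8 && (AlignInBits & (AlignInBits - 1)) == 0 &&
         "alignment must be a power of two of at least one byte");

  std::vector<unsigned> Pieces;
  unsigned Remaining = SizeInBits;
  while (Remaining != 0) {
    // Start from the alignment and halve until the piece fits in what is
    // left. Pieces never grow, so each one starts at an aligned offset.
    unsigned Piece = AlignInBits;
    while (Piece > Remaining)
      Piece /= 2;
    Pieces.push_back(Piece);
    Remaining -= Piece;
  }
  return Pieces;
}
```

Under this assumed policy, a 64-bit access with 16-bit alignment becomes four 16-bit pieces, while a sufficiently aligned access stays whole. The real legalizer rules also depend on the address space and subtarget, which this sketch ignores.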
Just to clarify, how is selection of global_load_ubyte and friends going to work? I assume similar to today where the load returns an s32 value, but instruction selection does matching based on the MemOperand remembering the size?
Why are unaligned global loads split up on CI+? I see that you're trying to handle this in the code, but apparently it doesn't work correctly?
lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
549–550 | Shouldn't the max size for global be 128? It only goes up to dwordx4. |
Yes, it's based on the MMO size, as it has always worked.
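To make the MMO-based matching concrete, here is a small standalone sketch. It is an assumption-laden model, not the actual selector: `MemOperandSketch` and `selectGlobalZExtLoad` are hypothetical names, only zero-extending loads into an s32 result are covered, and the returned strings are just the instruction mnemonics.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the memory operand: only the access size matters
// for this sketch.
struct MemOperandSketch {
  unsigned SizeInBits;
};

// Hypothetical selector: the load always produces a 32-bit register value,
// and the instruction is chosen from the size recorded on the memory operand.
// Returns an empty string for sizes this sketch does not model.
std::string selectGlobalZExtLoad(unsigned ResultBits,
                                 const MemOperandSketch &MMO) {
  if (ResultBits != 32)
    return "";
  switch (MMO.SizeInBits) {
  case 8:
    return "global_load_ubyte";
  case 16:
    return "global_load_ushort";
  case 32:
    return "global_load_dword";
  default:
    return "";
  }
}
```

The point is only that the result type alone (s32) cannot distinguish these cases; the memory size has to come from the memory operand.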
> Why are unaligned global loads split up on CI+? I see that you're trying to handle this in the code, but apparently it doesn't work correctly?
These tests use mesa run lines. We only assume unaligned access is enabled for amdhsa (although I think the kernel hardcodes this). Most of the challenge of this patch is managing the number of combinations for the tests, so I'll go through all of these again eventually. I was working on a program to generate all of these, but then got tired of it.
lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
549–550 | It goes up to 512 for SMRD loads. The constant address space doesn't really exist: if the global load is uniform and constant, it can use an SMRD load. It will be split up during RegBankSelect. |
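The size limits discussed in this inline comment can be sketched as follows. This is a hypothetical model, not code from the patch: the constants and `numLoadsAfterSplit` are made-up names, and it assumes that a load which cannot use SMRD is the one that gets split into dwordx4-sized pieces during RegBankSelect.

```cpp
#include <cassert>

// Limits taken from the discussion above; the names are hypothetical.
constexpr unsigned MaxScalarLoadBits = 512; // SMRD loads
constexpr unsigned MaxVectorLoadBits = 128; // global loads, up to dwordx4

// Hypothetical model of the later split: the legalizer accepts up to 512 bits
// for global, and a load that turns out not to be uniform and constant is
// broken into pieces of at most 128 bits.
unsigned numLoadsAfterSplit(unsigned SizeInBits, bool IsUniform,
                            bool IsConstant) {
  assert(SizeInBits != 0 && SizeInBits <= MaxScalarLoadBits &&
         "size outside the range accepted in this sketch");
  if (IsUniform && IsConstant)
    return 1; // can stay a single SMRD load
  // Round up to the number of 128-bit pieces.
  return (SizeInBits + MaxVectorLoadBits - 1) / MaxVectorLoadBits;
}
```

For example, a 512-bit uniform constant load stays as one load in this model, while the same load with a divergent address becomes four.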
Add comment and a separate HSA run line to test unaligned loads. We should probably just assume unaligned is always on, since I think the kernel hardcodes this.