This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: First pass at attempting to legalize load/stores
ClosedPublic

Authored by arsenm on Jul 17 2019, 5:14 PM.

Details

Summary

There's still a lot more to do, but this handles decomposing due to
alignment. I've gotten it to the point where nothing crashes or
infinite loops the legalizer.

Diff Detail

Event Timeline

arsenm created this revision.Jul 17 2019, 5:14 PM

Just to clarify, how is selection of global_load_ubyte and friends going to work? I assume similar to today where the load returns an s32 value, but instruction selection does matching based on the MemOperand remembering the size?

Why are unaligned global loads split up on CI+? I see that you're trying to handle this in the code, but apparently it doesn't work correctly?

lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
572–573

Shouldn't the max size for global be 128? It only goes up to dwordx4.

arsenm marked an inline comment as done.Jul 24 2019, 11:20 AM

Just to clarify, how is selection of global_load_ubyte and friends going to work? I assume similar to today where the load returns an s32 value, but instruction selection does matching based on the MemOperand remembering the size?

Yes, it's passed on the MMO size as it has always worked.

Why are unaligned global loads split up on CI+? I see that you're trying to handle this in the code, but apparently it doesn't work correctly?

These are using mesa run lines. We only assume unaligned access is enabled for amdhsa (although I think the kernel hardcodes this). Most of the challenge of this patch is managing the number of combinations for the tests, so I'll go through all of these again eventually. I was working on a program to generate all of these, but then got tired of it

lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
572–573

It goes up to 512 for SMRD loads. Constant address space really doesn't exist. If the global load is uniform and constant, it can use an SMRD load. It will be split up during RegBankSelect

arsenm updated this revision to Diff 212715.Jul 31 2019, 7:20 PM

Add comment, separate HSA run line to test unaligned loads. We should probably just assume unaligned is always on, since I think the kernel hardcodes this

arsenm updated this revision to Diff 218009.Aug 29 2019, 6:47 PM

Rebase and fix failures

rampitec added inline comments.Sep 9 2019, 1:42 PM
lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
603

You can combine two conditions into a single if.

lib/Target/AMDGPU/SIISelLowering.cpp
1286–1305

"Align > 4"?

arsenm updated this revision to Diff 219459.Sep 9 2019, 5:11 PM

Address comments and rebase testss

rampitec accepted this revision.Sep 9 2019, 5:24 PM

LGTM

This revision is now accepted and ready to land.Sep 9 2019, 5:24 PM
arsenm closed this revision.Sep 10 2019, 9:20 AM

r371533. Had to split some of the tests to avoid differences in release builds