This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Legalize odd sized loads with widening
ClosedPublic

Authored by arsenm on Aug 8 2020, 11:38 AM.

Details

Summary

Custom lower and widen odd sized loads up to the alignment. The
default set of legalization actions doesn't have a way to represent
this. This fixes naturally aligned <3 x s8> and <3 x s16> loads.

This also starts moving towards eliminating the buggy and
overcomplicated legalization rules for narrowing. All the memory size
changes should be done in the lower or custom action, not NarrowScalar
/ FewerElements. These currently have redundant and ambiguous code
with the lower action.

Diff Detail

Event Timeline

arsenm created this revision.Aug 8 2020, 11:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 8 2020, 11:38 AM
arsenm requested review of this revision.Aug 8 2020, 11:38 AM
foad added a comment.Aug 17 2020, 7:15 AM

Looks OK to me but I don't understand all the details. @mbrkusanin could you take a look at this one?

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2303–2308

Ty.changeElementSize(PowerOf2Ceil(Ty.getScalarSizeInBits()))?

arsenm added inline comments.Aug 17 2020, 7:18 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2303–2308

I think that should only be used to handle cases that could be a vector or not, but it’s known scalar here so doesn’t save code

foad added inline comments.Aug 17 2020, 8:05 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2303–2308

I meant as a replacement for all of lines 2303-2308, but now I see that in the vector case you're increasing the number of elements, not the size of each element.

Could use Ty.changeNumElements(PowerOf2Ceil(Ty.getNumElements())) in the vector case.

arsenm updated this revision to Diff 286630.Aug 19 2020, 12:07 PM

Use changeNumElements

foad accepted this revision.Aug 20 2020, 1:56 AM

Looks good, just a couple of questions inline.

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-constant.mir
12387–12391

Is there any good reason why this widens to s64 and then s128, insteading of doing it in one step?

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-memory-metadata.mir
36–37

Where is the tbaa preserved in the GMIR for these tests? I don't see it.

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-store.mir
643–666

Ouch that's a big expansion! I hope most of it can be optimized away one day.

This revision is now accepted and ready to land.Aug 20 2020, 1:56 AM
arsenm added inline comments.Aug 20 2020, 1:15 PM
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-constant.mir
12387–12391

I think this makes more sense for zext, where the high 64-bits can be done with a single s_mov_b64, and anyext just inherits from this logic

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-memory-metadata.mir
36–37

It's not, this is for the follow on commit to fix it

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-store.mir
643–666

There are a lot of legalization messes to deal with