This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Implement waterfall loop for MIMG instructions with 256-bit SRsrc
ClosedPublic

Authored by cfang on Jun 25 2020, 2:28 PM.

Diff Detail

Event Timeline

cfang created this revision.Jun 25 2020, 2:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 25 2020, 2:28 PM

Could you make this generic over the VGPR register class instead? That code duplication is rather annoying.

arsenm requested changes to this revision.Jun 29 2020, 8:42 AM

Desperately needs tests

This revision now requires changes to proceed.Jun 29 2020, 8:42 AM
cfang updated this revision to Diff 280623.Jul 24 2020, 5:03 PM
cfang added a reviewer: nhaehnle.

Add waterloop tests, and update existing tests.

cfang added a comment.Jul 24 2020, 5:13 PM

Could you make this generic over the VGPR register class instead? That code duplication is rather annoying.

I agree it does not look nice with the duplication. However, the existing implementation is easy to follow,
and I could not figure out a way that can easily generalize over vgpr class without make the code messy.

We will continue to work on to come out with a generalized solution. But in the mean time, we need
the fix for a correctness issue which has been bothering us for a long time.

Could you make this generic over the VGPR register class instead? That code duplication is rather annoying.

I agree it does not look nice with the duplication. However, the existing implementation is easy to follow,
and I could not figure out a way that can easily generalize over vgpr class without make the code messy.

We will continue to work on to come out with a generalized solution. But in the mean time, we need
the fix for a correctness issue which has been bothering us for a long time.

GlobalISel already has a generalized implementation, it just relies on generic registers. I'd rather copy something from there and dirty it up with register classes

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
4986

llvm_unreachable

cfang updated this revision to Diff 285894.Aug 16 2020, 12:34 PM

Update based on the reviewers' request to generate the waterfall loop for resource registers of
arbitrary sizes.

Update a few LIT tests, mainly because of the changes in instruction order, or additional move instructions under -O0.

cfang updated this revision to Diff 286071.Aug 17 2020, 10:43 AM

A minor change to use the register directly in stead of getting it from the instruction.

NOTE: this is based on the version we have generalized the resource register size based on the implementation from GISel.
cfang added a comment.Aug 18 2020, 3:03 PM

Ping. Is there any further suggestions to move this patch ahead? Thanks.

arsenm accepted this revision.Aug 18 2020, 3:43 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
5075

What about the SGPR offset? I guess this was broken before anyway

This revision is now accepted and ready to land.Aug 18 2020, 3:43 PM
cfang added inline comments.Aug 18 2020, 4:03 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
5075

Do you mean we will have to legalize the case with SGPR Offset? If you can come up with a test case with SGPR offset, we can fix the broken, in a separate patch of course. Thanks.