
[AMDGPU] Insert waterfall loops for divergent calls

Authored by Flakebi on Sep 25 2020, 3:54 AM.



Extend loadSRsrcFromVGPR to allow moving a range of instructions into
the loop. The call instruction is preceded by copies into physical
registers which should be part of the waterfall loop, as the registers
can be overwritten by the call.
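For readers unfamiliar with the term, the control flow of a waterfall loop can be sketched as a small host-side C++ simulation. This is not the code from this revision and does not use any LLVM API: `kWaveSize`, `Callee`, `WaterfallResult`, `firstActiveLane`, `waterfallCall`, and the two sample callees are hypothetical names chosen for illustration, and a 32-lane wave is assumed. Each iteration picks the callee of the first remaining lane, runs the call once for every lane that shares that callee, and then retires those lanes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of one 32-lane wave; none of these names come from LLVM.
constexpr std::size_t kWaveSize = 32;
using Callee = int (*)(int);

// Two sample callees so that the function pointer can differ between lanes.
inline int addOne(int X) { return X + 1; }
inline int timesTwo(int X) { return X * 2; }

struct WaterfallResult {
  std::array<int, kWaveSize> Values{}; // per-lane return values, 0 if inactive
  unsigned Iterations = 0;             // loop trips taken
};

// Index of the lowest set bit. Mask must be non-zero.
inline unsigned firstActiveLane(std::uint32_t Mask) {
  unsigned Lane = 0;
  while (((Mask >> Lane) & 1u) == 0)
    ++Lane;
  return Lane;
}

// Dest holds the (possibly divergent) callee of each lane, Args the per-lane
// argument, and Exec the mask of lanes that are active at the call site.
inline WaterfallResult waterfallCall(const std::array<Callee, kWaveSize> &Dest,
                                     const std::array<int, kWaveSize> &Args,
                                     std::uint32_t Exec) {
  WaterfallResult R;
  std::uint32_t Remaining = Exec;
  while (Remaining != 0) {
    // Step 1: take the callee of the first remaining lane as the uniform value.
    Callee Uniform = Dest[firstActiveLane(Remaining)];

    // Step 2: find every remaining lane that wants the same callee.
    std::uint32_t Match = 0;
    for (unsigned L = 0; L < kWaveSize; ++L)
      if (((Remaining >> L) & 1u) && Dest[L] == Uniform)
        Match |= (1u << L);

    // Step 3: run the call with only the matching lanes enabled. In this model
    // the argument setup and the result read both sit inside the loop body,
    // mirroring the copies around the call that the revision moves into it.
    for (unsigned L = 0; L < kWaveSize; ++L)
      if ((Match >> L) & 1u)
        R.Values[L] = Uniform(Args[L]);

    // Step 4: retire the lanes that were just handled.
    Remaining &= ~Match;
    ++R.Iterations;
  }
  return R;
}
```

In this model the loop runs once per distinct callee among the active lanes, so a uniform function pointer costs a single iteration.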


Event Timeline

Flakebi created this revision. Sep 25 2020, 3:54 AM
Herald added a project: Restricted Project. Sep 25 2020, 3:54 AM
Flakebi requested review of this revision. Sep 25 2020, 3:54 AM
madhur13490 added inline comments. Sep 29 2020, 1:16 AM

Dump() is not required.


Should this block be executed for AGPRs too? If this is meant only for VGPRs then !SGPR is not correct.

Flakebi updated this revision to Diff 294914. Sep 29 2020, 2:42 AM

Remove debug dumps

Flakebi added inline comments. Sep 29 2020, 2:48 AM

Thanks, I forgot to remove them.


I don’t know how AGPRs work, and documentation seems to be scarce. The check for calls is the same as the one for image operations above.
It seems like AGPRs can be copied to VGPRs but not to SGPRs, so I think calling a function pointer that is stored in AGPRs should copy them to VGPRs and insert a waterfall loop.

madhur13490 added inline comments. Sep 29 2020, 3:10 AM

My main concern is that this check is overly relaxed and allows all non-SGPR classes, including any future register class we may add. AGPR is just an example. What about enabling it just for VGPRs for now and adding a TODO?

nhaehnle added inline comments. Sep 30 2020, 4:43 AM

Dest is the destination we're calling, i.e. the function pointer. The point of the logic, AFAIU, is that if the function pointer is non-uniform, we need to do something about that. AGPRs are non-uniform... so I'd say !isSGPR is correct.
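The difference between the two checks under discussion can be shown with a toy model. This is only a sketch: `RegBank` and the four predicates below are hypothetical stand-ins, not the register-class queries the patch actually calls, and only three banks are assumed.

```cpp
// Hypothetical register banks for illustration; not LLVM's register classes.
enum class RegBank { SGPR, VGPR, AGPR };

constexpr bool isSGPR(RegBank B) { return B == RegBank::SGPR; }
constexpr bool isVGPR(RegBank B) { return B == RegBank::VGPR; }

// The check as argued for above: anything that is not an SGPR may be
// non-uniform, so the call needs a waterfall loop.
constexpr bool needsWaterfallNotSGPR(RegBank Dest) { return !isSGPR(Dest); }

// The stricter alternative raised in review: only VGPR destinations qualify.
constexpr bool needsWaterfallVGPROnly(RegBank Dest) { return isVGPR(Dest); }
```

The two predicates agree on SGPR and VGPR destinations and differ only on AGPR (or any bank added later): the first still inserts a loop there, while the second would leave such a call unhandled.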

madhur13490 accepted this revision. Oct 4 2020, 8:31 AM
This revision is now accepted and ready to land. Oct 4 2020, 8:31 AM
Flakebi updated this revision to Diff 296931. Oct 8 2020, 5:15 AM

Fix return value handling: the COPYs following the call also need to be moved into the loop.

This revision was automatically updated to reflect the committed changes.