This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add s_nop WaitStates between neighboring mfma
ClosedPublic

Authored by kerbowa on Mar 10 2022, 6:48 PM.

Details

Summary

In some cases, padding the bubbles between sequential MFMA instructions
may improve inter-wave performance. Add an option to pad some portion
of these stall cycles with s_nops.
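
The padding idea can be sketched as follows. This is a minimal illustration and not the code from this patch: the function name `paddingWaitStates`, its parameters, and the truncating integer division are all assumptions. It computes how many s_nop wait states would still be needed to cover a requested percentage of a neighboring MFMA's latency, given the wait states that have already elapsed.

```cpp
#include <algorithm>

// Hypothetical helper (not from the patch): number of wait states to pad.
// latencyWaitStates: latency of the preceding MFMA, in wait states.
// ratioPercent:      requested share of that latency to cover, 0..100.
// elapsedWaitStates: wait states already elapsed since that MFMA issued.
int paddingWaitStates(int latencyWaitStates, int ratioPercent,
                      int elapsedWaitStates) {
  if (ratioPercent <= 0)
    return 0; // Padding disabled: early exit.
  // Integer division truncates (an assumption of this sketch).
  int target = latencyWaitStates * ratioPercent / 100;
  // Never return a negative amount of padding.
  return std::max(0, target - elapsedWaitStates);
}
```

For example, with a latency of 16 wait states, a ratio of 50, and 2 wait states already elapsed, the sketch asks for 6 more wait states of padding.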

Event Timeline

kerbowa created this revision. Mar 10 2022, 6:48 PM
Herald added a project: Restricted Project. Mar 10 2022, 6:48 PM
kerbowa requested review of this revision. Mar 10 2022, 6:48 PM
arsenm added inline comments. Mar 10 2022, 7:21 PM
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
44

I'm not sure what a percentage means here

270–271

Why does this need to parse a name?

kerbowa updated this revision to Diff 414578. Mar 10 2022, 10:35 PM

Add early exit.

rampitec added inline comments. Mar 11 2022, 10:19 AM
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
44

Yep, it should have text saying this is a percentage of the MFMA latency.

270

If you must compare strings, it is better to find this MCProcResourceDesc once and then compare the pointer.
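
The suggestion above (resolve the name once, then compare pointers) can be sketched in isolation. This is only an illustration under stated assumptions: `ProcResource`, `Resources`, `findResource`, and `isSameResource` are hypothetical stand-ins and are not LLVM's `MCProcResourceDesc` or any scheduling-model API, and the resource names are placeholders.

```cpp
#include <cstring>

// Stand-in for a processor resource descriptor (hypothetical, not LLVM's type).
struct ProcResource {
  const char *Name;
};

// Hypothetical resource table; the names are placeholders.
static const ProcResource Resources[] = {{"ResA"}, {"ResB"}, {"ResC"}};

// Look the descriptor up by name once; returns nullptr if it is absent.
const ProcResource *findResource(const char *Name) {
  for (const ProcResource &R : Resources)
    if (std::strcmp(R.Name, Name) == 0)
      return &R;
  return nullptr;
}

// Afterwards, each check is a pointer comparison with no string parsing.
bool isSameResource(const ProcResource *Used, const ProcResource *Cached) {
  return Cached != nullptr && Used == Cached;
}
```

The string comparison then runs once per lookup rather than once per instruction checked.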

kerbowa updated this revision to Diff 417013. Mar 21 2022, 10:38 AM

Remove gfx90a for now.
Don't parse HWXDL proc resource since all gfx908 MFMA use HWXDL.
Add more detailed comments.

rampitec added inline comments. Mar 21 2022, 11:03 AM
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
1397

Longest MAI is 64 cycles. You may want to move your code to the top, as it can produce the longest nop sequence.

kerbowa added inline comments. Mar 23 2022, 11:41 AM
llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
1397

Isn't MFMA32x32WritesAGPRAccVgprReadWaitStates longer than 64 cycles? Max wait for padding should be 16 wait states versus 18.

rampitec accepted this revision. Mar 23 2022, 11:48 AM

LGTM

llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
1397

Right, I forgot it is divided by 4.
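
The arithmetic behind this exchange can be written out as a small sketch. The divisor of 4 and the figures 64, 16, and 18 come from the comments above; the function name `cyclesToWaitStates` and the constant names are made up for illustration and are not identifiers from the patch.

```cpp
// Figures taken from the review comments; all names are hypothetical.
constexpr int CyclesPerWaitState = 4;     // "it is divided by 4"
constexpr int LongestMAICycles = 64;      // "Longest MAI is 64 cycles"
constexpr int ExistingMaxWaitStates = 18; // the 18 the padding is compared against

// Convert a latency in cycles to wait states.
constexpr int cyclesToWaitStates(int Cycles) {
  return Cycles / CyclesPerWaitState;
}

// 64 cycles correspond to 16 wait states, which is below 18.
static_assert(cyclesToWaitStates(LongestMAICycles) == 16, "64 / 4 == 16");
static_assert(cyclesToWaitStates(LongestMAICycles) < ExistingMaxWaitStates,
              "padding wait is shorter than the 18-wait-state check");
```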

This revision is now accepted and ready to land. Mar 23 2022, 11:48 AM
This revision was automatically updated to reflect the committed changes.