In some cases, padding the bubbles between sequential MFMA instructions
may improve inter-wave performance. Add an option to pad some portion of
these stall cycles with s_nops.
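
As a rough illustration of what "some portion of these stall cycles" could mean, here is a minimal sketch that assumes the option is expressed as a percentage of the previous MFMA's latency. The function name and all three parameters are hypothetical and are not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not the patch's actual code.
// latencyWaitStates: pipeline latency of the previous MFMA, in wait states.
// paddingPercent:    requested share of that latency to fill with s_nop (0-100).
// waitStatesSince:   wait states already elapsed since the previous MFMA issued.
int mfmaPaddingWaitStates(int latencyWaitStates, int paddingPercent,
                          int waitStatesSince) {
  assert(paddingPercent >= 0 && paddingPercent <= 100);
  // Portion of the stall we would like to cover with nops (integer division
  // truncates toward zero).
  int wanted = latencyWaitStates * paddingPercent / 100;
  // Wait states that have already passed count toward the padding, and the
  // result is clamped so it never goes negative.
  return std::max(0, wanted - waitStatesSince);
}
```

Under these assumptions, a 16-wait-state MFMA with a 50% request and no intervening instructions would get 8 wait states of padding, and none if 10 wait states have already elapsed.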
Repository: rG LLVM Github Monorepo
Event Timeline
Remove gfx90a for now.
Don't parse the HWXDL proc resource, since all gfx908 MFMA instructions use HWXDL.
Add more detailed comments.
| llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp | |
|---|---|
| 1397 | The longest MAI is 64 cycles. You may want to move your code to the top, as it can produce the longest nop sequence. |
| llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp | |
|---|---|
| 1397 | Isn't MFMA32x32WritesAGPRAccVgprReadWaitStates longer than 64 cycles? Max wait for padding should be 16 wait states versus 18. |
LGTM
| llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp | |
|---|---|
| 1397 | Right, I forgot it is divided by 4. |
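
The arithmetic behind this exchange can be written out as a small sketch. It assumes, as the "divided by 4" remark suggests, that one wait state spans 4 cycles. The constant and function names are hypothetical.

```cpp
#include <cassert>

// Assumption taken from the review thread: one wait state spans 4 cycles.
constexpr int kCyclesPerWaitState = 4;

// Converts a latency in cycles to wait states. Integer division truncates,
// which is exact for the multiples of 4 discussed here.
constexpr int cyclesToWaitStates(int cycles) {
  return cycles / kCyclesPerWaitState;
}

// The longest MAI mentioned in the review: 64 cycles, i.e. 16 wait states.
static_assert(cyclesToWaitStates(64) == 16, "64-cycle MAI is 16 wait states");
```

By the same conversion, the 18 wait states mentioned for MFMA32x32WritesAGPRAccVgprReadWaitStates would correspond to 72 cycles, which is why it looked longer than the 64-cycle MAI when the two numbers were compared directly.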
I'm not sure what a percentage means here