This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add iglp_opt builtin and MFMA GEMM Opt strategy
ClosedPublic

Authored by kerbowa on Aug 17 2022, 3:55 PM.

Details

Summary

Adds a builtin that serves as an optimization hint to apply specific optimized
DAG mutations during scheduling. This also disables any other mutations or
clustering that may interfere with the desired pipeline. The first optimization
strategy that is added here is designed to improve the performance of small gemm
kernels on gfx90a.

Diff Detail

Event Timeline

kerbowa created this revision.Aug 17 2022, 3:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 17 2022, 3:55 PM
kerbowa requested review of this revision.Aug 17 2022, 3:55 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 17 2022, 3:55 PM

Hey Austin --

Just have a small question about the purpose of shouldApplyStrategy -- other than that, LGTM.

llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
759

Is the plan to use heuristics on top of the builtin at some point? Not sure I understand this.

llvm/lib/Target/AMDGPU/SIPostRABundler.cpp
135

Maybe not in this patch due to time constraints, but perhaps in future work we can extract checking for IGLP_OPT / SCHED_GROUP_BARRIER to an analysis patch so we don't need to keep checking for it.

kerbowa updated this revision to Diff 453904.Aug 19 2022, 12:46 AM

Update pipeline, remove edges from iglp_opt.

jrbyrnes added inline comments.Aug 19 2022, 7:23 AM
llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
1063

I think this makes more sense if you parse the entire dag first, then check if neither were found.

kerbowa updated this revision to Diff 454030.Aug 19 2022, 8:53 AM

Address comment.

kerbowa updated this revision to Diff 454074.Aug 19 2022, 11:55 AM

Don't loop over all instructions again in pre-RA scheduler.

Just a couple nitpicks

llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
1069–1070

Have a fully unguarded entry point into PS construction / PS.solve() makes me a bit uneasy -- and it is at best inefficient. Can you guard this with foundSGB || foundIGLP?

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
427–428

I think you can remove this as well since you're doing it from within the scheduler.

kerbowa updated this revision to Diff 454083.Aug 19 2022, 12:51 PM

Address comments.

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
427–428

It's not added in the scheduler for plain SCHED_BARRIER.

jrbyrnes accepted this revision.Aug 19 2022, 12:58 PM

LGTM

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
427–428

Oh okay -- I see

This revision is now accepted and ready to land.Aug 19 2022, 12:58 PM
This revision was landed with ongoing or failed builds.Aug 19 2022, 3:50 PM
This revision was automatically updated to reflect the committed changes.
llvm/lib/Target/AMDGPU/SIPostRABundler.cpp