Adds a builtin that serves as an optimization hint to apply specific optimized
DAG mutations during scheduling. This also disables any other mutations or
clustering that may interfere with the desired pipeline. The first optimization
strategy that is added here is designed to improve the performance of small gemm
kernels on gfx90a.
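For orientation, here is a minimal sketch of where such a hint might sit in user code. It assumes the builtin is the one Clang exposes as `__builtin_amdgcn_iglp_opt` and that strategy ID `0` selects the small GEMM strategy; neither detail is stated in the summary above. The `IGLP_OPT` wrapper macro and the `smallGemm` function are hypothetical names used only for illustration.

```cpp
#include <cstddef>

// Hypothetical wrapper: expands to the builtin where the compiler provides it
// and to a no-op elsewhere, so the sketch still builds on a host compiler.
#if defined(__has_builtin)
#if __has_builtin(__builtin_amdgcn_iglp_opt)
#define IGLP_OPT(Strategy) __builtin_amdgcn_iglp_opt(Strategy)
#endif
#endif
#ifndef IGLP_OPT
#define IGLP_OPT(Strategy) ((void)0)
#endif

// Hypothetical small GEMM body: C = A * B for row-major N x N matrices.
void smallGemm(const float *A, const float *B, float *C, std::size_t N) {
  // Assumption: 0 is the ID of the small GEMM strategy. The hint only asks
  // the scheduler to use that strategy for the enclosing scheduling region;
  // it does not change what the code computes.
  IGLP_OPT(0);
  for (std::size_t I = 0; I < N; ++I) {
    for (std::size_t J = 0; J < N; ++J) {
      float Sum = 0.0f;
      for (std::size_t K = 0; K < N; ++K)
        Sum += A[I * N + K] * B[K * N + J];
      C[I * N + J] = Sum;
    }
  }
}
```

On a host build the macro expands to nothing, so the function is an ordinary matrix multiply; the hint can only have an effect when compiling for an AMDGPU target.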
Repository: rG LLVM Github Monorepo
Hey Austin --
Just have a small question about the purpose of shouldApplyStrategy -- other than that, LGTM.
File | Line | Comment
---|---|---
llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp | 759 | Is the plan to use heuristics on top of the builtin at some point? Not sure I understand this.
llvm/lib/Target/AMDGPU/SIPostRABundler.cpp | 135 | Maybe not in this patch due to time constraints, but perhaps in future work we can extract checking for IGLP_OPT / SCHED_GROUP_BARRIER to an analysis patch so we don't need to keep checking for it.
File | Line | Comment
---|---|---
llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp | 1063 | I think this makes more sense if you parse the entire DAG first, then check whether neither was found.
Just a couple nitpicks
File | Line | Comment
---|---|---
llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp | 1069–1070 | Having a fully unguarded entry point into PS construction / PS.solve() makes me a bit uneasy -- and it is at best inefficient. Can you guard this with foundSGB \|\| foundIGLP?
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 427–428 | I think you can remove this as well since you're doing it from within the scheduler.
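The two AMDGPUIGroupLP.cpp suggestions (scan the whole DAG first, then only enter pipeline construction if a hint was found) can be sketched roughly as below. This is a self-contained mock, not the patch's code: `Opcode`, `Node`, `ScanResult`, `scanForHints`, and `applyMutation` are hypothetical stand-ins, and only the `FoundSGB` / `FoundIGLP` flags mirror the names used in the comments.

```cpp
#include <vector>

// Hypothetical stand-ins for the real opcodes and scheduling-DAG nodes.
enum class Opcode { Other, SchedGroupBarrier, IglpOpt };

struct Node {
  Opcode Op;
};

struct ScanResult {
  bool FoundSGB = false;  // at least one SCHED_GROUP_BARRIER was seen
  bool FoundIGLP = false; // at least one IGLP_OPT was seen
};

// Walk the entire DAG once and record which kinds of hint are present.
ScanResult scanForHints(const std::vector<Node> &DAG) {
  ScanResult R;
  for (const Node &N : DAG) {
    if (N.Op == Opcode::SchedGroupBarrier)
      R.FoundSGB = true;
    else if (N.Op == Opcode::IglpOpt)
      R.FoundIGLP = true;
  }
  return R;
}

// Returns true only if the solver step would actually be entered.
bool applyMutation(const std::vector<Node> &DAG) {
  const ScanResult R = scanForHints(DAG);
  if (!R.FoundSGB && !R.FoundIGLP)
    return false; // No hint anywhere: skip pipeline construction entirely.
  // Placeholder for building the pipeline and calling the solver.
  return true;
}
```

Separating the scan from the guard keeps the early exit in one place, so the solver is never reached for regions that contain neither kind of hint.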
Address comments.
File | Line | Comment
---|---|---
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 427–428 | It's not added in the scheduler for plain SCHED_BARRIER.