Based on experiments, this performs better on the targeted small GEMM kernels.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Comment Actions
Hi Austin,
The changes look fine, and if experiments show better performance then I suppose it is better. But the pipeline seems rather arbitrary: in fact, in the test, the previous pipeline already satisfies the requirements of the new one. Maybe, since the DAG is less constrained, the scheduler is better able to produce an improved schedule?
Also, a pipeline with 3x as many MFMA SchedGroups as there are MFMAs is impossible to satisfy. I assume you also tried I < MFMACount?
Comment Actions
Matching specific pipelines is difficult, and the resulting schedule often doesn't correlate well with the requested pipeline. For now, we need to rely on what experimentally gives the best results. This will change again in the future.