This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Update MFMASmallGemmOpt with better performing strategy
Closed · Public

Authored by kerbowa on Dec 2 2022, 1:35 PM.

Details

Summary

Based on experiments, this strategy performs better on the targeted small GEMM kernels.

Event Timeline

kerbowa created this revision. Dec 2 2022, 1:35 PM
Herald added a project: Restricted Project. Dec 2 2022, 1:35 PM
kerbowa requested review of this revision. Dec 2 2022, 1:35 PM
Herald added a project: Restricted Project. Dec 2 2022, 1:35 PM

Hi Austin,

Changes look fine -- and if experiments show it has better performance, then I suppose it is better. But the pipeline seems rather arbitrary -- in fact, in the test, the previous pipeline fits the requirements of the new one. Maybe, since the DAG is less constrained, the scheduler is better able to produce an improved schedule?

Also, a pipeline with 3x as many MFMA SchedGroups as there are MFMAs is impossible to satisfy. I assume you also tried I < MFMACount?

Matching specific pipelines is difficult, and the resulting schedule often doesn't correlate well with the requested pipeline. For now, we need to rely on whatever gives the best results experimentally. This will change again in the future.

jrbyrnes accepted this revision. Dec 9 2022, 2:05 PM

Alright -- that sounds like a shortcoming of the PipelineSolver.

This revision is now accepted and ready to land. Dec 9 2022, 2:05 PM
This revision was landed with ongoing or failed builds. Dec 9 2022, 7:10 PM
This revision was automatically updated to reflect the committed changes.