This is an archive of the discontinued LLVM Phabricator instance.

[flang] Improved performance of runtime Matmul/MatmulTranspose.
Closed · Public

Authored by vzakhari on Aug 29 2023, 12:23 PM.

Details

Summary

This patch mostly affects the performance of code produced by
HLFIR lowering. If a MATMUL argument is an array slice, HLFIR
lowering passes the slice to the runtime as is, whereas FIR
lowering would create a contiguous temporary for the slice.
With this patch, the runtime might perform better than the generic
implementation in cases where the leading dimension is contiguous.
This patch improves CPU2000/178.galgel, making the HLFIR version
faster than the FIR version (because it avoids the temporary copies
for MATMUL arguments).
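
To illustrate why a contiguous leading dimension matters, here is a minimal C++ sketch. It is not the flang runtime code: the `MatrixView` type, the `MatmulView` function, and the `ld` field are hypothetical names chosen for this example. The idea is that a column-major slice whose columns are each contiguous can be multiplied in place, with a unit-stride inner loop, without first copying it into a contiguous temporary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view of a column-major matrix slice.
// Elements within one column are contiguous (stride 1); consecutive
// columns are `ld` elements apart, with ld >= rows.
struct MatrixView {
  const double *data;
  std::size_t rows, cols, ld;
  double at(std::size_t i, std::size_t j) const { return data[i + j * ld]; }
};

// Computes C = A * B and returns C as a contiguous column-major buffer.
// The innermost loop walks one column of A and one column of C with unit
// stride, so the slice A is read directly and no temporary copy is made.
std::vector<double> MatmulView(const MatrixView &a, const MatrixView &b) {
  assert(a.cols == b.rows);
  std::vector<double> c(a.rows * b.cols, 0.0);
  for (std::size_t j = 0; j < b.cols; ++j) {
    double *cCol = c.data() + j * a.rows;
    for (std::size_t k = 0; k < a.cols; ++k) {
      const double bkj = b.at(k, j);
      const double *aCol = a.data + k * a.ld; // start of column k of the slice
      for (std::size_t i = 0; i < a.rows; ++i) {
        cCol[i] += aCol[i] * bkj;
      }
    }
  }
  return c;
}
```

For example, a 2x2 slice taken from the top-left corner of a 4x4 column-major array would be described by `rows = 2`, `cols = 2`, `ld = 4`, and passed to `MatmulView` as is.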

Event Timeline

vzakhari created this revision. Aug 29 2023, 12:23 PM
Herald added a project: Restricted Project. Aug 29 2023, 12:23 PM
Herald added a subscriber: jdoerfert.
vzakhari requested review of this revision. Aug 29 2023, 12:23 PM

178.galgel speeds up from 74 seconds to 26 seconds on icelake with this change.

klausler accepted this revision. Aug 29 2023, 12:28 PM
klausler added inline comments.
flang/runtime/matmul-transpose.cpp, line 96

Sorry to ask for an NFC here, but I'd find this bit of code easier to fully understand if all of the template flag arguments were explicitly stated, not defaulted.

This revision is now accepted and ready to land. Aug 29 2023, 12:28 PM
vzakhari added inline comments. Aug 29 2023, 12:34 PM
flang/runtime/matmul-transpose.cpp, line 96

No worries! I will make them non-default.
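
The requested style can be illustrated with a small C++ sketch. This is not the code from matmul-transpose.cpp: the `Dot` kernel, the `DotDispatch` wrapper, and the flag names `X_STRIDED` and `Y_STRIDED` are hypothetical. The template flags have no default values, so every instantiation has to spell out both of them and the reader sees which variant each call site selects.

```cpp
#include <cstddef>

// Hypothetical kernel with two compile-time flags. Neither flag has a
// default value, so callers must name both explicitly.
template <bool X_STRIDED, bool Y_STRIDED>
double Dot(const double *x, const double *y, std::size_t n,
           std::size_t xStride, std::size_t yStride) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    const double xv = X_STRIDED ? x[i * xStride] : x[i];
    const double yv = Y_STRIDED ? y[i * yStride] : y[i];
    sum += xv * yv;
  }
  return sum;
}

// Runtime dispatch: each call states both template flags, so the selected
// specialization is visible without looking up any defaults.
double DotDispatch(const double *x, const double *y, std::size_t n,
                   std::size_t xStride, std::size_t yStride) {
  const bool xStrided = xStride != 1;
  const bool yStrided = yStride != 1;
  if (xStrided) {
    return yStrided ? Dot<true, true>(x, y, n, xStride, yStride)
                    : Dot<true, false>(x, y, n, xStride, yStride);
  }
  return yStrided ? Dot<false, true>(x, y, n, xStride, yStride)
                  : Dot<false, false>(x, y, n, xStride, yStride);
}
```

With defaulted flags, a call such as `Dot<true>(...)` would compile but would hide the value of the second flag; removing the defaults turns that into a compile error.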

vzakhari updated this revision to Diff 554463. Aug 29 2023, 12:40 PM
  • Got rid of default values for the template parameters.
This revision was landed with ongoing or failed builds. Aug 29 2023, 5:04 PM
This revision was automatically updated to reflect the committed changes.