This fused operation should run a lot faster than first transposing the
lhs array and then multiplying the matrices separately.
Based on flang/runtime/matmul.cpp
Depends on D145959
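For illustration only, the difference between the two approaches can be sketched as below. This is a minimal sketch, not the code in flang/runtime/matmul-transpose.cpp: the function names `transposeThenMatmul` and `fusedMatmulTranspose` are hypothetical, and it assumes dense column-major `double` matrices held in `std::vector`. The fused version reads the lhs with swapped indices, so it never allocates or fills a transposed temporary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major storage, as in Fortran: element (i, j) of a matrix with
// `rows` rows lives at index i + j * rows.
// In both functions a is k x m, b is k x n, and the result is m x n.

// Unfused reference: materialize TRANSPOSE(a), then multiply.
std::vector<double> transposeThenMatmul(const std::vector<double> &a,
    const std::vector<double> &b, std::size_t k, std::size_t m,
    std::size_t n) {
  assert(a.size() == k * m && b.size() == k * n);
  std::vector<double> at(m * k); // temporary m x k copy of a
  for (std::size_t j{0}; j < m; ++j) {
    for (std::size_t i{0}; i < k; ++i) {
      at[j + i * m] = a[i + j * k]; // at(j, i) = a(i, j)
    }
  }
  std::vector<double> c(m * n, 0.0);
  for (std::size_t col{0}; col < n; ++col) {
    for (std::size_t p{0}; p < k; ++p) {
      for (std::size_t row{0}; row < m; ++row) {
        c[row + col * m] += at[row + p * m] * b[p + col * k];
      }
    }
  }
  return c;
}

// Fused sketch: c(row, col) = sum over p of a(p, row) * b(p, col).
// No transposed copy of a is created.
std::vector<double> fusedMatmulTranspose(const std::vector<double> &a,
    const std::vector<double> &b, std::size_t k, std::size_t m,
    std::size_t n) {
  assert(a.size() == k * m && b.size() == k * n);
  std::vector<double> c(m * n, 0.0);
  for (std::size_t col{0}; col < n; ++col) {
    for (std::size_t row{0}; row < m; ++row) {
      double sum{0.0};
      for (std::size_t p{0}; p < k; ++p) {
        sum += a[p + row * k] * b[p + col * k];
      }
      c[row + col * m] = sum;
    }
  }
  return c;
}
```

In this sketch the inner loop of the fused version walks both operands with unit stride, because a column of `a` becomes a row of its transpose. Whether the real runtime gains the same way depends on details not shown here.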
Differential D145960
[flang] add fused matmul-transpose to the runtime
Closed, Public
Authored by tblah on Mar 13 2023, 10:09 AM.
Event Timeline
tblah added a child revision: D145961: [flang][hlfir] lower hlfir.matmul_transpose to runtime call. Mar 13 2023, 10:10 AM
Comment Actions Can the original MATMUL be extended in place as template(s) with one or two argument transposition flags, so that all the runtime APIs have a common implementation?
Comment Actions Thanks for review. I've updated to initialize using braces. The original MATMUL could be extended in place but I think it wouldn't share
This revision is now accepted and ready to land. Mar 15 2023, 12:17 PM
Closed by commit rG4ff8ba72b583: [flang] add fused matmul-transpose to the runtime (authored by tblah). Mar 17 2023, 2:31 AM
This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 505749
flang/include/flang/Runtime/matmul-transpose.h
flang/runtime/CMakeLists.txt
flang/runtime/matmul-transpose.cpp
flang/unittests/Runtime/CMakeLists.txt
flang/unittests/Runtime/MatmulTranspose.cpp
Braced initialization is used in the runtime for improved protection against inadvertent truncation.
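As a small illustration of that remark (a sketch with hypothetical helper names, not code from the flang runtime): C++ list-initialization rejects narrowing conversions at compile time, while parenthesized initialization accepts them and silently drops the high bits.

```cpp
#include <cstdint>

// Hypothetical helper: parenthesized initialization compiles even though
// the 64-bit value may not fit; only the low 32 bits are kept.
std::uint32_t copyWithParens(std::uint64_t extent) {
  std::uint32_t n(extent);
  return n;
}

// Hypothetical helper: with braces, the narrowing has to be spelled out.
std::uint32_t copyWithBraces(std::uint64_t extent) {
  // std::uint32_t n{extent}; // ill-formed: narrowing conversion in braces
  std::uint32_t n{static_cast<std::uint32_t>(extent)};
  return n;
}
```

Here `copyWithParens(0x100000001ULL)` returns 1 with no diagnostic required, whereas the commented-out braced line would be rejected by the compiler until the cast makes the truncation explicit.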