This op is the batched version of linalg.mmt4d. It performs matrix-matrix-transpose multiplication of batched 4-d (5d) inputs as the following:
C[b, m1, n1, m0, n0] = sum_{b, k1, k0}(A[b, m1, k1, m0, k0] * B[b, n1, k1, n0, k0])
The current use is to provide linalg.batch_matmul a lowering path similar to linalg.matmul -> linalg.mmt4d.