This is an archive of the discontinued LLVM Phabricator instance.

[mlir] [VectorOps] Add choice between dot and axpy lowering of vector.contract
ClosedPublic

Authored by aartbik on Jul 1 2020, 5:36 PM.

Details

Summary

Default vector.contract lowering essentially yields a series of sdot/ddot
operations. However, for some layouts a series of saxpy/daxpy operations,
chained through fma are more efficient. This CL introduces a choice between
the two lowering paths. A default heuristic is to follow.

Some preliminary avx2 performance numbers for matrix-times-vector.
Here, dot performs best for 64x64 A x b and saxpy for 64x64 A^T x b.

------------------------------------------------------------
            A x b                          A^T x b
------------------------------------------------------------
GFLOPS    sdot (reassoc)    saxpy    sdot (reassoc)    saxpy
------------------------------------------------------------
1x1        0.6               0.9       0.6             0.9
2x2        2.5               3.2       2.4             3.5
4x4        6.4               8.4       4.9             11.8
8x8       11.7               6.1       5.0             29.6
16x16     20.7              10.8       7.3             43.3
32x32     29.3               7.9       6.4             51.8
64x64     38.9                                         79.3
128x128   32.4                                         40.7
------------------------------------------------------------

Diff Detail

Event Timeline

aartbik created this revision.Jul 1 2020, 5:36 PM
aartbik edited the summary of this revision. (Show Details)Jul 1 2020, 5:38 PM
aartbik edited the summary of this revision. (Show Details)Jul 1 2020, 5:45 PM
aartbik edited the summary of this revision. (Show Details)
aartbik updated this revision to Diff 274978.Jul 1 2020, 5:59 PM
aartbik edited the summary of this revision. (Show Details)

fixed typos

ftynse accepted this revision.Jul 2 2020, 1:35 AM
ftynse added inline comments.
mlir/include/mlir/Dialect/Vector/VectorTransforms.h
144

Ultra-nit: TODOs shouldn't go to doxygen, so use // instead of /// for them.

mlir/lib/Dialect/Vector/VectorTransforms.cpp
1774

Would it make sense to have a fused matchAndRewrite and avoid duplicate checks here? We can just have the trailing else below that returns failure().

1781

Nit: why is this using getDimSize(), but everything below uses getShape()[] ?

This revision is now accepted and ready to land.Jul 2 2020, 1:35 AM
nicolasvasilache accepted this revision.Jul 2 2020, 4:08 AM

Thanks Aart, let's land this and we can figure out the refactoring in a subsequent CL depending on what we decide for the semantics of outerproduct.

mlir/include/mlir/Dialect/Vector/VectorOps.h
56

AXPY and OuterProduct are fundamentally the same strategy in the 2dx2d -> 2d and 1dx2d->1d / 2dx1d->2d, respectively.
It would therefore be good to use a single entry point.

Going progressively through vector.outerproduct can be beneficial to compose with patterns that operate on vector.outerproduct.
vector.outerproduct then lowers to the appropriate extract/insert + splat + FMA.
For the matvec special case, I was thinking of relaxing the semantics of vector.outerproduct to take either 1dx1d -> 2d, scalar x 1d -> 1d and 1d x scalar -> 1d.

Thoughts?

mlir/lib/Dialect/Vector/VectorTransforms.cpp
1766

Great, thanks for keeping this consistent with the contraction lowering.

1774

@ftynse this is similar to the vector.outerproduct case for which there is additional fallback logic to degrade to the FMA (now called Dot) version.

1781

I'd say let's use getDimSize everywhere (including the outerproduct part).

1804

With`vector.outerproduct` vector/scalar polymorphism, this piece of code would be exactly l. 1720:

for (unsigned k = 0; k < reductionSize; ++k) {
  Value a = rewriter.create<vector::ExtractOp>(op.getLoc(), lhs, k);
  Value b = rewriter.create<vector::ExtractOp>(op.getLoc(), rhs, k);
  res = rewriter.create<vector::OuterProductOp>(op.getLoc(), a, b, res);
}
rewriter.replaceOp(op, res);

In other words, AXPY is a non-progressive lowering to broadcast + FMA.
Making it progressive would allow some nice refactoring IMO, but it requires agreeing that vector.outerproduct should work with either 1dx1d -> 2d, scalar x 1d -> 1d and 1d x scalar -> 1d forms.

@aartbik @ftynse @mehdi_amini thoughts?

ftynse added inline comments.Jul 2 2020, 4:26 AM
mlir/lib/Dialect/Vector/VectorTransforms.cpp
1774

It wouldn't anyhow affect the logic. You can fail the pattern in matchAndRewrite and the next pattern will be applied. The default implementation of matchAndRewrite is literally calling match and rewrite in a row, https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/IR/PatternMatch.h#L146.

1804

but it requires agreeing that vector.outerproduct should work with either 1dx1d -> 2d, scalar x 1d -> 1d and 1d x scalar -> 1d forms.

I won't object to this, sounds like a reasonable generalization

aartbik marked 13 inline comments as done.Jul 2 2020, 12:38 PM
aartbik added inline comments.
mlir/include/mlir/Dialect/Vector/VectorOps.h
56

Yes, I agree. If we allow scalars in outerproduct and let that "degenerate" to axpy I can unify this code quite a bit. And I agree that having fewer cases and more progressive lowering will compose much better in the end.

mlir/lib/Dialect/Vector/VectorTransforms.cpp
1774

For now, I kept the three special cases of contract lowering in this form.

1804

That's the plan (since dotproduct with a vector/scalar is just a loop around axpy).

aartbik updated this revision to Diff 275201.Jul 2 2020, 12:46 PM
aartbik marked 3 inline comments as done.

addressed comments

This revision was automatically updated to reflect the committed changes.
mehdi_amini added inline comments.Jul 4 2020, 9:56 PM
mlir/include/mlir/Dialect/Vector/VectorTransforms.h
144

Nit: we don't add usernames in TODO in LLVM.

rriddle added inline comments.Jul 6 2020, 6:51 PM
mlir/include/mlir/Dialect/Vector/VectorTransforms.h
144

Unresolved?

mlir/lib/Dialect/Vector/VectorTransforms.cpp
1734

Please do not add usernames in TODOs, same below.

mehdi_amini added inline comments.Jul 6 2020, 9:28 PM
mlir/include/mlir/Dialect/Vector/VectorTransforms.h
144

(To be fair: it was a post-commit review)

aartbik marked an inline comment as done.Jul 6 2020, 9:33 PM
aartbik added inline comments.
mlir/include/mlir/Dialect/Vector/VectorTransforms.h
144

and, it is not the first named TODO we added to the code :-)
I count 437 right now under mlir, I even count 21 TODO(riverriddle)....

But, I can keep this in mind and cleanup going forward....

mehdi_amini added inline comments.Jul 6 2020, 9:35 PM
mlir/include/mlir/Dialect/Vector/VectorTransforms.h
144

These likely predates the integration into LLVM, when MLIR was developed inside Google

rriddle added inline comments.Jul 6 2020, 10:02 PM
mlir/include/mlir/Dialect/Vector/VectorTransforms.h
144

Exactly. Also, patches welcome if you want to do something about it instead of just counting :)

rriddle added inline comments.Jul 7 2020, 1:43 AM
mlir/include/mlir/Dialect/Vector/VectorTransforms.h
144

Went ahead and did it for you in: 9db53a182705ac1f652c6ee375735bea5539272c