This is an archive of the discontinued LLVM Phabricator instance.

[Matrix] Add forward shape propagation and first shape aware lowerings.
ClosedPublic

Authored by fhahn on Dec 2 2019, 5:37 AM.

Details

Summary

This patch adds infrastructure for forward shape propagation to
LowerMatrixIntrinsics. It also updates the pass to make use of
the shape information to break up larger vector operations and to
eliminate unnecessary conversion operations between columnwise matrices
and flattened vectors: if shape information is available for an
instruction, lower the operation to a set of instructions operating on
columns. For example, a store of a matrix is broken down into separate
stores for each column. For users that do not have shape
information (e.g. because they do not yet support shape information
aware lowering), we pack the result columns into a flat vector and
update those users.

It also adds shape aware lowering for the first non-intrinsic
instruction: vector stores.

Example:

For

%c  = call <4 x double> @llvm.matrix.transpose(<4 x double> %a, i32 2, i32 2)
store <4 x double> %c, <4 x double>* %Ptr

We generate the code below without shape propagation. Note %9 which
combines the columns of the transposed matrix into a flat vector.

%split = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 0, i32 1>
%split1 = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 2, i32 3>
%1 = extractelement <2 x double> %split, i64 0
%2 = insertelement <2 x double> undef, double %1, i64 0
%3 = extractelement <2 x double> %split1, i64 0
%4 = insertelement <2 x double> %2, double %3, i64 1
%5 = extractelement <2 x double> %split, i64 1
%6 = insertelement <2 x double> undef, double %5, i64 0
%7 = extractelement <2 x double> %split1, i64 1
%8 = insertelement <2 x double> %6, double %7, i64 1
%9 = shufflevector <2 x double> %4, <2 x double> %8, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
store <4 x double> %9, <4 x double>* %Ptr

With this patch, we propagate the 2x2 shape information from the
transpose to the store and we generate the code below. Note that we
store the columns directly and do not need an extra shuffle.

%9 = bitcast <4 x double>* %Ptr to double*
%10 = bitcast double* %9 to <2 x double>*
store <2 x double> %4, <2 x double>* %10, align 8
%11 = getelementptr double, double* %9, i32 2
%12 = bitcast double* %11 to <2 x double>*
store <2 x double> %8, <2 x double>* %12, align 8

Diff Detail

Event Timeline

fhahn created this revision.Dec 2 2019, 5:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 2 2019, 5:37 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
fhahn updated this revision to Diff 231685.Dec 2 2019, 5:40 AM

strip unnecessary test changes

Build result: FAILURE - Could not check out parent git hash "30959a9a1249e0d3b2f18c6622847da457308e49". It was not found in the repository. Did you configure the "Parent Revision" in Phabricator properly? Trying to apply the patch to the master branch instead...

ERROR: arc patch failed with error code 1. Check build log for details.
Log files: console-log.txt, CMakeCache.txt

reames resigned from this revision.Dec 2 2019, 4:49 PM
anemet added a comment.Dec 3 2019, 9:29 PM

Also the existing test diffs are hard to read, please explain what's going on there.

llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
109–130

Needs an update explaining the shape propagation and its use.

191

Is this supposed to indicate empty or undefined? I am not sure that the implicit conversion is self-explanatory here.

219–222

Update comment

230–241

The comment says you're returning the ColumnMatrix here but you're not.

290

I may be missing something but do these need the lambda?

548

Please add a comment of what's happening here.

fhahn updated this revision to Diff 232136.Dec 4 2019, 8:03 AM

Address Adam's comments

fhahn marked 5 inline comments as done.Dec 4 2019, 8:06 AM

Also the existing test diffs are hard to read, please explain what's going on there.

I've added comments to break up the check lines

Build result: FAILURE -
Log files: console-log.txt, CMakeCache.txt

LuoYuanke added inline comments.Dec 8 2019, 2:35 AM
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
362

It seems only the store instruction is propagated with the shape information. Why? Take the pseudocode below as an example. Are v2 and v3 propagated with the shape information?

v1 = matrix_columnwise_load(..., m, n)
v2 = max(v1, 0)
v3 = v1 / v2
store v3, ptr
fhahn marked an inline comment as done.Dec 8 2019, 3:24 AM
fhahn added inline comments.
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
362

This patch mostly adds the infrastructure for the propagation and only uses it for store instructions. So in the example, the shape is not propagated by this patch.

Additional support for loads (D70900), binary operators (D70898) and back-propagation (D70899) is added in follow-up commits, to make reviewing them more manageable. The whole patch series is linked in Phabricator (see the 'stack' section).

Please note that we could propagate shape information to more instructions, e.g. phis or selects. That can be added as a follow-up as well; it is just a matter of priorities (we found loads/stores/binary operators to be by far the most common operations in matrix expressions). Any help with covering more cases would be very welcome :)

LuoYuanke added inline comments.Dec 8 2019, 4:33 AM
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
362

Thank you for the reply. Do you propagate the shape information recursively? If the matrix is stored to memory, does the propagation break? Can we get the shape information when reloading the matrix from memory?

v1 = matrix_multiply(..., m, n, k)
store v1, ptr *
v2 = load ptr*

How do we pass the shape information across functions and return a shape from a function? Do you plan to have matrix as a first-class type?

v1 = matrix_multiply(..., m, n, k)
v2 = call foo(v1)
fhahn marked an inline comment as done.Dec 8 2019, 7:00 AM
fhahn added inline comments.
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
362

Thank you for the reply. Do you propagate the shape information recursively?

It is propagated iteratively: once we have propagated shape information to an instruction, we add its users to the worklist. Later patches add back-propagation as well, and D70901 implements iteration until no new shape information can be discovered.

If the matrix is stored to memory, does the propagation break? Can we get the shape information when reload the matrix from memory?

Currently yes, we do not propagate through memory instructions. Simple cases like the one above should not really show up, however, as such loads should be promoted to use the value directly. We could handle more involved cases by using MemorySSA or additional alias analysis. Currently that is not a high priority for us, but we would be happy to collaborate on that as well.

How do we pass the shape information across functions and return a shape from a function? Do you plan to have matrix as a first-class type?

Currently we do not propagate the shape information across function boundaries and we do not plan on proposing a dedicated matrix type. The original proposal was focused around a dedicated type, but it was decided to go with a more lightweight solution and potentially revisit the matrix type once there is a strong need.

For propagating across function boundaries, one way to go about it would be to turn the lowering into a module pass.

fhahn updated this revision to Diff 233628.Dec 12 2019, 8:40 AM

Update after changing %stride.

Also the existing test diffs are hard to read, please explain what's going on there.

I've added comments to break up the check lines

Thanks for that. Can you also explain the nature of the changes with one example? I am assuming we're removing embedVectors/extractVectors, i.e. a bunch of shuffles; pointing to one example would be useful.

llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
116

produced

121

all instruction *that* we have

130

Nice write-up!

560

nit: ShapeMap.count?

llvm/test/Transforms/LowerMatrixIntrinsics/propagate-forward.ll
38

FileCheck is never executed with the SHAPE prefix.

anemet requested changes to this revision.Dec 19 2019, 3:14 PM
This revision now requires changes to proceed.Dec 19 2019, 3:14 PM
fhahn updated this revision to Diff 234807.Dec 19 2019, 4:43 PM

Address comments, thanks!

fhahn edited the summary of this revision. (Show Details)Dec 19 2019, 4:43 PM

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

fhahn marked 4 inline comments as done.Dec 19 2019, 4:47 PM

Also the existing test diffs are hard to read, please explain what's going on there.

I've added comments to break up the check lines

Thanks for that. Can you also explain the nature of the changes with one example? I am assuming we're removing embedVectors/extractVectors, i.e. a bunch of shuffles; pointing to one example would be useful.

I've updated the description of the patch to include an example.

llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
560

I usually prefer find(), but I guess for DenseMap it does not matter performance-wise and I can change it before committing if you prefer.

llvm/test/Transforms/LowerMatrixIntrinsics/propagate-forward.ll
38

I've dropped those, but added an explanation of what we are checking.

anemet accepted this revision.Dec 20 2019, 9:20 AM

LGTM

This revision is now accepted and ready to land.Dec 20 2019, 9:20 AM
This revision was automatically updated to reflect the committed changes.