This is an archive of the discontinued LLVM Phabricator instance.

[Matrix] Add initial tiling for load/multiply/store chains.
ClosedPublic

Authored by fhahn on Mar 3 2020, 1:43 PM.

Details

Summary

This patch adds initial fusion for load/multiply/store chains of matrix
operations.

The patch contains roughly two parts:

  1. Code generation for a fused load/multiply/store chain (LowerMatrixMultiplyFused).

First, we ensure that both loads of the multiply operands do not alias the store. If they do, we create new non-aliasing copies of the operands. Note that this may introduce new basic block. Finally we process TileSize x TileSize blocks. That is: load tiles from the input operands, multiply and store them.

  1. Identify fusion candidates & matrix instructions.

As a first step, collect all instructions with shape info and fusion candidates (currently @llvm.matrix.multiply calls). Next, try to fuse candidates and collect instructions eliminated by fusion. Finally iterate over all matrix instructions, skip the ones eliminated by fusion and lower the rest as usual.

Diff Detail

Event Timeline

fhahn created this revision.Mar 3 2020, 1:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2020, 1:43 PM
fhahn edited the summary of this revision. (Show Details)Mar 3 2020, 1:46 PM
fhahn retitled this revision from [Matrix] Add initial tiling for multiplies. to [Matrix] Add initial tiling for load/multiply/store chains..Mar 3 2020, 1:48 PM
fhahn updated this revision to Diff 248022.Mar 3 2020, 1:51 PM
fhahn edited the summary of this revision. (Show Details)

Strip outdated comment.

fhahn updated this revision to Diff 248324.Mar 4 2020, 2:42 PM

Use correct strides for loads/stores and ensure we always move the instructions after the multiply/store to a new BB.

Can you please describe the approach in the description/in a comment?

fhahn updated this revision to Diff 251599.Mar 20 2020, 3:51 AM

Rebased and fixed a small merge error.

fhahn edited the summary of this revision. (Show Details)Mar 20 2020, 8:20 AM
fhahn updated this revision to Diff 251654.Mar 20 2020, 8:36 AM

A couple of small code simplifications: use range based iterator over BB, use pattern match for @llvm.matrix.multiply.

I have a few specific comments below but overall it would be great if we could simplify VisitBBFusion to avoid recursion and invalidating the iterator...

This patch adds initial fusion for load/multiply/store chains of matrix
operations.

The patch contains roughly two parts:

Code generation for a fused load/multiply/store chain (LowerMatrixMultiplyFused).
First, we ensure that both loads of the multiply operands do not alias.

You mean any of the loads and the store?

If they do, we create new non-aliasing copies of the operands. Note that this may introduce new basic block. Then we split the block containing the multiply at the multiply,

At the multiply or at the store?

to simplify processing by returning the remainder of the original block to continue analysis (see 2.). Finally we process TileSize x TileSize blocks, that is, load tiles from the input operands, multiply and store them.

Identify fusion candidates.
To identify candidates for fusion, we look for @llvm.matrix.multiply with operands that are loads and a single use of the result in a store. To avoid generating unnecessary code for loads that later on get fused, we do a first pass over the function and only try fusing instructions, while keeping track of all other instructions with shape information in the function. We continue with the regular code generation for the remaining instructions with shape information after finishing fusion.

llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
544

Document MatrixInsts

551

The name NextBB is strange here. It suggests that that is next one we are going to iterate to. Sounds like this is more like a NewBB?

561–563

Why is this necessary now? We weren't doing this before during traversal.

When Touched is true we always return so I think Touched is always false here, no?

607

Needs comment on the logic here, what is returned in MatrixInsts, etc.

1100

updated LoweredMatrix

1102

Document the BB returned

fhahn updated this revision to Diff 252334.Mar 24 2020, 8:29 AM
fhahn marked 11 inline comments as done.

I have a few specific comments below but overall it would be great if we could simplify VisitBBFusion to avoid recursion and invalidating the iterator...

Thanks Adam!

I've split the code into 3 parts:

  1. Collect all instructions with shape information and fusion candidates.
  2. Iterate over fusion candidates and try to fuse. Also collect set of instructions completely eliminated by fusion.
  3. Iterate over all instructions with shape info, skip the ones eliminated by fusion and lower the rest.
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
544

Code is gone.

551

Code is gone.

561–563

Code is gone.

607

Code is gone.

1102

The return value is gone.

fhahn edited the summary of this revision. (Show Details)Mar 24 2020, 8:32 AM

Very nice!

llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
52–54

Add cl::desc

952

Please add a comment explaining the code that is generated below this point

anemet requested changes to this revision.Mar 29 2020, 12:32 PM
This revision now requires changes to proceed.Mar 29 2020, 12:32 PM
fhahn updated this revision to Diff 253448.Mar 29 2020, 12:46 PM
fhahn marked 2 inline comments as done.

Thanks Adam.

Added cl::desc to new options and alos cl::hidden. Added comments to getNonAliasingPointer

fhahn updated this revision to Diff 253453.Mar 29 2020, 12:57 PM

Adjust bb numbers in tests after change to creating BBs first.

LuoYuanke added inline comments.Mar 29 2020, 8:25 PM
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
1000

It seems the cost of Copy is not added to the cost model.

1010

I'm confused about line 1034 and 1035. Should it be this? There is no edge from Fusion to Copy and from Copy to Check1.
DTUpdates.push_back({DT.Insert, Copy, Fusion});
DTUpdates.push_back({DT.Insert, Check1, Copy});

1082

It seems the tile size should be 2 dimension. 1 for row, and 1 for column.

fhahn updated this revision to Diff 253543.Mar 30 2020, 3:42 AM
fhahn marked 6 inline comments as done.

Remove unnecessary updates, use DT::dominates in stead of OrderedInstructions, after DT::dominates now uses the BB local numbering recently committed.

fhahn added inline comments.Mar 30 2020, 3:44 AM
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
1000

Yes currently the cost-model is kept simple, because we initially are focusing on bringing up the code-generation and are trying to make sure the matrix intrinsics are applicable to a wide range of uses cases. My current priority is progressing the clang patches, adding support for row-major matrixes and running the lowering pass by default on IR containing matrix intrinsics.

I've added a few additional TODOs to improve the cost-modeling. Currently we only automatically fuse operations, if we would run out of registers without fusion, so it only kicks in for matrixes sizes where the cost of coping should be negligible. But this should definitely be improved in the future. Not sure when we will get to it though and any contributions in that direction would be very welcome.

1010

Right, those updates are not needed, which is much clearer after grouping all the update code together. It seems like the verification does not catch those unnecessary updates.

1082

Agreed, but as mentioned above I think using square tiles is a good compromise to bring up the infrastructure. After the initial commit, it should also be easier for other people to work on improving the tiling.

anemet requested changes to this revision.Apr 5 2020, 1:41 PM
anemet added inline comments.
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
57

Say it's square-shaped.

985–987

Are you being overly conservative here? If the end of the store is before the beginning of the load they still don't alias.

This revision now requires changes to proceed.Apr 5 2020, 1:41 PM
fhahn updated this revision to Diff 255198.Apr 5 2020, 2:05 PM
fhahn marked an inline comment as done.

Clarify that the tile size option is for a square shaped tile and add TODO at option to allow non-square tiles.

llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
985–987

Yes, I think but the previous condition (line 1008) ensures that the load begins before the end of the store if we check the condition here.

anemet accepted this revision.Apr 5 2020, 2:12 PM

LGTM!

llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
985–987

Ah ok!

This revision is now accepted and ready to land.Apr 5 2020, 2:12 PM
This revision was automatically updated to reflect the committed changes.