This is an archive of the discontinued LLVM Phabricator instance.

[Matrix] Add special case dot product lowering
ClosedPublic

Authored by virnarula on Aug 3 2022, 3:51 PM.

Details

Summary

Add special case to matrix lowering for dot products. Normal matrix lowering if optimized for either row-major or column-major, which results in many shufflevector instructions being generated for one vector. We work around this in our special case. We can also use vector-reduce adds instead of sequential adds to sum the result of the element-wise multiplication, which takes advantage of SIMD instructions.

Diff Detail

Event Timeline

virnarula created this revision.Aug 3 2022, 3:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2022, 3:51 PM
virnarula requested review of this revision.Aug 3 2022, 3:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2022, 3:51 PM
virnarula updated this revision to Diff 449816.Aug 3 2022, 3:53 PM

Fix formatting with clang-format

virnarula edited the summary of this revision. (Show Details)Aug 3 2022, 4:03 PM
virnarula added a reviewer: fhahn.
fhahn added inline comments.Aug 4 2022, 5:59 AM
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
885

I don't think this is necessary and it looks a bit odd to overwrite the earlier Changed status. It should be sufficient to remove setting Changed on line 877.

1214

Please add a comment describing what this function does.

1217

I think we should be able to just look up the shape of operands 0 and 1 in ShapeMap instead?

1220

nit: coding style recommends omitting {} for single-line blocks.

1229

nit: Coding style uses UpperCase for variables.

1249

might be good to explain what costs we are comparing.

1252

nit: the style guide recommends avoiding auto in most cases where the type is not entirely obvious from the context (e.g. auto for dyn_cast is fine).

1262

nit: use UpperCase for variables.

1264

To check for a specific intrinsic, you could either use something like match(&I, m_Intrinsic<Intrinsic::matrix_transpose>(m_Value(TA))) or cast to IntrinsicInst and checking getIntrinsicID().

1292

I am not sure if that's correct. Here we need to load a vector with 1 row and N columns. llvm.matrix.column.load may have a stride > 1 between columns, which is ignored. We should probably limit this to .llvm.matrix.column.load with a stride of 1 here for now. Needs a test case.

1317

nit: according to the style guide, comments are meant to be full sentences, with the first letter capitalized and a period at the end. Applies to most comments added in the patch.

1323

nit: no need for {}

1326

nit: no need for {}

1330

nit: no need to return at the end of a void function.

thegameg added inline comments.Aug 4 2022, 9:41 AM
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
1240

You can probably remove this and use IRBuilder::Create(F)AddReduce directly below?

1244

Maybe we can do all the FMF and cost checks before everything?

1272

You might want to specify what you mean by "differently" (not using LowerLoad because we would end up with scalar loads for the row vector).

1274

To add on top of what @fhahn said above, you can combine matchers like this:

match(&I, m_CombineOr(m_Load(), m_Intrinsic<Intrinsic::matrix_column_major_load>())
llvm/test/Transforms/LowerMatrixIntrinsics/dot-product.ll
165–170

Very nice!

virnarula updated this revision to Diff 450957.Aug 8 2022, 2:38 PM
virnarula marked 17 inline comments as done.

Address first round of comments.

virnarula added inline comments.Aug 11 2022, 8:23 AM
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
1217

Using ShapeMap works when we load matrix inside the function. However, it doesn't if the matrix is passed in as an argument. As such, I have not implemented this.

1292

Most of our test cases didn't account for this (they used strides greater than 1). I am submitting an additional review that will update the test cases to use strides of 1 but keep some that don't as test cases.

1292
thegameg accepted this revision.Aug 11 2022, 12:02 PM

LGTM, thank you!

llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
1237

Not sure what's going on here. Intrinsic with Instruction::Add? It also doesn't seem to be really used.

1253

You cold match the stride directly here:

ConstantInt *LoadStride = nullptr;
if (match(Val, m_Intrinsic<Intrinsic::matrix_column_major_load>(
                       m_Value(), m_ConstantInt(LoadStride), m_Value(), m_Value()) &&
    LoadStride->getZExtValue() == Stride) {

and pass Stride as an argument of GetBuiltinLoad.

This revision is now accepted and ready to land.Aug 11 2022, 12:02 PM
virnarula marked 2 inline comments as done.Aug 14 2022, 5:58 PM
virnarula added inline comments.
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
1237

I use it to get the function type when estimating sequential add cost. line 1243. Do you suggest doing it a different way?

1253

I tried this but I realized that I would have to have a control flow to check if the load has size of n anyways. This is because if we return null, it could mean there's a LoadInst (instead of a builtin load) or the stride in incorrect, which require different behaviors.

thegameg added inline comments.Aug 15 2022, 4:57 PM
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
1237

It looks like you only use AddOpCode and Add is only used for the type, right?

fhahn updated this revision to Diff 509988.Mar 31 2023, 4:35 AM

Rebased. I also extended int coverage, so we have enough coverage to land this and build in it.

fhahn accepted this revision.Mar 31 2023, 4:37 AM

LGTM, thanks!

This revision was landed with ongoing or failed builds.Mar 31 2023, 4:41 AM
This revision was automatically updated to reflect the committed changes.