Add special case to matrix lowering for dot products. Normal matrix lowering if optimized for either row-major or column-major, which results in many shufflevector instructions being generated for one vector. We work around this in our special case. We can also use vector-reduce adds instead of sequential adds to sum the result of the element-wise multiplication, which takes advantage of SIMD instructions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
1000 | I don't think this is necessary and it looks a bit odd to overwrite the earlier Changed status. It should be sufficient to remove setting Changed on line 877. | |
1321 | Please add a comment describing what this function does. | |
1324 | I think we should be able to just look up the shape of operands 0 and 1 in ShapeMap instead? | |
1327 | nit: coding style recommends omitting {} for single-line blocks. | |
1336 | nit: Coding style uses UpperCase for variables. | |
1356 | might be good to explain what costs we are comparing. | |
1359 | nit: the style guide recommends avoiding auto in most cases where the type is not entirely obvious from the context (e.g. auto for dyn_cast is fine). | |
1369 | nit: use UpperCase for variables. | |
1371 | To check for a specific intrinsic, you could either use something like match(&I, m_Intrinsic<Intrinsic::matrix_transpose>(m_Value(TA))) or cast to IntrinsicInst and checking getIntrinsicID(). | |
1399 | I am not sure if that's correct. Here we need to load a vector with 1 row and N columns. llvm.matrix.column.load may have a stride > 1 between columns, which is ignored. We should probably limit this to .llvm.matrix.column.load with a stride of 1 here for now. Needs a test case. | |
1424 | nit: according to the style guide, comments are meant to be full sentences, with the first letter capitalized and a period at the end. Applies to most comments added in the patch. | |
1430 | nit: no need for {} | |
1433 | nit: no need for {} | |
1437 | nit: no need to return at the end of a void function. |
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
1347 | You can probably remove this and use IRBuilder::Create(F)AddReduce directly below? | |
1351 | Maybe we can do all the FMF and cost checks before everything? | |
1379 | You might want to specify what you mean by "differently" (not using LowerLoad because we would end up with scalar loads for the row vector). | |
1381 | To add on top of what @fhahn said above, you can combine matchers like this: match(&I, m_CombineOr(m_Load(), m_Intrinsic<Intrinsic::matrix_column_major_load>()) | |
llvm/test/Transforms/LowerMatrixIntrinsics/dot-product.ll | ||
165 ↗ | (On Diff #449816) | Very nice! |
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
1324 | Using ShapeMap works when we load matrix inside the function. However, it doesn't if the matrix is passed in as an argument. As such, I have not implemented this. | |
1399 | Most of our test cases didn't account for this (they used strides greater than 1). I am submitting an additional review that will update the test cases to use strides of 1 but keep some that don't as test cases. | |
1399 |
LGTM, thank you!
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
1344 | Not sure what's going on here. Intrinsic with Instruction::Add? It also doesn't seem to be really used. | |
1360 | You cold match the stride directly here: ConstantInt *LoadStride = nullptr; if (match(Val, m_Intrinsic<Intrinsic::matrix_column_major_load>( m_Value(), m_ConstantInt(LoadStride), m_Value(), m_Value()) && LoadStride->getZExtValue() == Stride) { and pass Stride as an argument of GetBuiltinLoad. |
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
1344 | I use it to get the function type when estimating sequential add cost. line 1243. Do you suggest doing it a different way? | |
1360 | I tried this but I realized that I would have to have a control flow to check if the load has size of n anyways. This is because if we return null, it could mean there's a LoadInst (instead of a builtin load) or the stride in incorrect, which require different behaviors. |
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
1344 | It looks like you only use AddOpCode and Add is only used for the type, right? |
Rebased. I also extended int coverage, so we have enough coverage to land this and build in it.
I don't think this is necessary and it looks a bit odd to overwrite the earlier Changed status. It should be sufficient to remove setting Changed on line 877.