Add special case to matrix lowering for dot products. Normal matrix lowering if optimized for either row-major or column-major, which results in many shufflevector instructions being generated for one vector. We work around this in our special case. We can also use vector-reduce adds instead of sequential adds to sum the result of the element-wise multiplication, which takes advantage of SIMD instructions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
885 | I don't think this is necessary and it looks a bit odd to overwrite the earlier Changed status. It should be sufficient to remove setting Changed on line 877. | |
1214 | Please add a comment describing what this function does. | |
1217 | I think we should be able to just look up the shape of operands 0 and 1 in ShapeMap instead? | |
1220 | nit: coding style recommends omitting {} for single-line blocks. | |
1229 | nit: Coding style uses UpperCase for variables. | |
1249 | might be good to explain what costs we are comparing. | |
1252 | nit: the style guide recommends avoiding auto in most cases where the type is not entirely obvious from the context (e.g. auto for dyn_cast is fine). | |
1262 | nit: use UpperCase for variables. | |
1264 | To check for a specific intrinsic, you could either use something like match(&I, m_Intrinsic<Intrinsic::matrix_transpose>(m_Value(TA))) or cast to IntrinsicInst and checking getIntrinsicID(). | |
1292 | I am not sure if that's correct. Here we need to load a vector with 1 row and N columns. llvm.matrix.column.load may have a stride > 1 between columns, which is ignored. We should probably limit this to .llvm.matrix.column.load with a stride of 1 here for now. Needs a test case. | |
1317 | nit: according to the style guide, comments are meant to be full sentences, with the first letter capitalized and a period at the end. Applies to most comments added in the patch. | |
1323 | nit: no need for {} | |
1326 | nit: no need for {} | |
1330 | nit: no need to return at the end of a void function. |
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
1240 | You can probably remove this and use IRBuilder::Create(F)AddReduce directly below? | |
1244 | Maybe we can do all the FMF and cost checks before everything? | |
1272 | You might want to specify what you mean by "differently" (not using LowerLoad because we would end up with scalar loads for the row vector). | |
1274 | To add on top of what @fhahn said above, you can combine matchers like this: match(&I, m_CombineOr(m_Load(), m_Intrinsic<Intrinsic::matrix_column_major_load>()) | |
llvm/test/Transforms/LowerMatrixIntrinsics/dot-product.ll | ||
165 | Very nice! |
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
1217 | Using ShapeMap works when we load matrix inside the function. However, it doesn't if the matrix is passed in as an argument. As such, I have not implemented this. | |
1292 | Most of our test cases didn't account for this (they used strides greater than 1). I am submitting an additional review that will update the test cases to use strides of 1 but keep some that don't as test cases. | |
1292 |
LGTM, thank you!
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
1237 | Not sure what's going on here. Intrinsic with Instruction::Add? It also doesn't seem to be really used. | |
1253 | You cold match the stride directly here: ConstantInt *LoadStride = nullptr; if (match(Val, m_Intrinsic<Intrinsic::matrix_column_major_load>( m_Value(), m_ConstantInt(LoadStride), m_Value(), m_Value()) && LoadStride->getZExtValue() == Stride) { and pass Stride as an argument of GetBuiltinLoad. |
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
1237 | I use it to get the function type when estimating sequential add cost. line 1243. Do you suggest doing it a different way? | |
1253 | I tried this but I realized that I would have to have a control flow to check if the load has size of n anyways. This is because if we return null, it could mean there's a LoadInst (instead of a builtin load) or the stride in incorrect, which require different behaviors. |
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
1237 | It looks like you only use AddOpCode and Add is only used for the type, right? |
Rebased. I also extended int coverage, so we have enough coverage to land this and build in it.
clang-format not found in user’s local PATH; not linting file.