Add special case to matrix lowering for dot products. Normal matrix lowering if optimized for either row-major or column-major, which results in many shufflevector instructions being generated for one vector. We work around this in our special case. We can also use vector-reduce adds instead of sequential adds to sum the result of the element-wise multiplication, which takes advantage of SIMD instructions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
892 | I don't think this is necessary and it looks a bit odd to overwrite the earlier Changed status. It should be sufficient to remove setting Changed on line 877. | |
1213 | Please add a comment describing what this function does. | |
1216 | I think we should be able to just look up the shape of operands 0 and 1 in ShapeMap instead? | |
1219 | nit: coding style recommends omitting {} for single-line blocks. | |
1228 | nit: Coding style uses UpperCase for variables. | |
1248 | might be good to explain what costs we are comparing. | |
1251 | nit: the style guide recommends avoiding auto in most cases where the type is not entirely obvious from the context (e.g. auto for dyn_cast is fine). | |
1261 | nit: use UpperCase for variables. | |
1263 | To check for a specific intrinsic, you could either use something like match(&I, m_Intrinsic<Intrinsic::matrix_transpose>(m_Value(TA))) or cast to IntrinsicInst and checking getIntrinsicID(). | |
1291 | I am not sure if that's correct. Here we need to load a vector with 1 row and N columns. llvm.matrix.column.load may have a stride > 1 between columns, which is ignored. We should probably limit this to .llvm.matrix.column.load with a stride of 1 here for now. Needs a test case. | |
1316 | nit: according to the style guide, comments are meant to be full sentences, with the first letter capitalized and a period at the end. Applies to most comments added in the patch. | |
1322 | nit: no need for {} | |
1325 | nit: no need for {} | |
1329 | nit: no need to return at the end of a void function. |
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
1239 | You can probably remove this and use IRBuilder::Create(F)AddReduce directly below? | |
1243 | Maybe we can do all the FMF and cost checks before everything? | |
1271 | You might want to specify what you mean by "differently" (not using LowerLoad because we would end up with scalar loads for the row vector). | |
1273 | To add on top of what @fhahn said above, you can combine matchers like this: match(&I, m_CombineOr(m_Load(), m_Intrinsic<Intrinsic::matrix_column_major_load>()) | |
llvm/test/Transforms/LowerMatrixIntrinsics/dot-product.ll | ||
165–170 | Very nice! |
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
1216 | Using ShapeMap works when we load matrix inside the function. However, it doesn't if the matrix is passed in as an argument. As such, I have not implemented this. | |
1291 | Most of our test cases didn't account for this (they used strides greater than 1). I am submitting an additional review that will update the test cases to use strides of 1 but keep some that don't as test cases. | |
1291 |
LGTM, thank you!
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
1236 | Not sure what's going on here. Intrinsic with Instruction::Add? It also doesn't seem to be really used. | |
1252 | You cold match the stride directly here: ConstantInt *LoadStride = nullptr; if (match(Val, m_Intrinsic<Intrinsic::matrix_column_major_load>( m_Value(), m_ConstantInt(LoadStride), m_Value(), m_Value()) && LoadStride->getZExtValue() == Stride) { and pass Stride as an argument of GetBuiltinLoad. |
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
1236 | I use it to get the function type when estimating sequential add cost. line 1243. Do you suggest doing it a different way? | |
1252 | I tried this but I realized that I would have to have a control flow to check if the load has size of n anyways. This is because if we return null, it could mean there's a LoadInst (instead of a builtin load) or the stride in incorrect, which require different behaviors. |
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp | ||
---|---|---|
1236 | It looks like you only use AddOpCode and Add is only used for the type, right? |
Rebased. I also extended int coverage, so we have enough coverage to land this and build in it.
clang-format not found in user’s local PATH; not linting file.