In the long run, we want to unify the dot product codegen solutions across all target architectures, but this intrinsic enables experimenting with AVX-specific implementations in the meantime.
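For reference, a minimal sketch of how the intrinsic-level op might be used (the x86vector.avx.intr.dot spelling and the vector<8xf32> operands are taken from the review thread below; the per-lane behavior is assumed to follow the underlying AVX dp.ps (vdpps) instruction):

  // Assumed spelling from the discussion below; %a and %b are vector<8xf32>.
  // vdpps computes one 4-element dot product per 128-bit lane.
  %dot = x86vector.avx.intr.dot %a, %b : vector<8xf32>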
Event Timeline
mlir/include/mlir/Dialect/X86Vector/X86Vector.td
Line 302: Like we argued/settled in other dialects (i.e. neon IIRC), this should have a "codegen public" form x86.avx.dot on vector<2x4xf32> and an x86.avx.intrin.dot form on vector<8xf32>.
Line 314: Then this can become:

  %0 = x86vector.avx.dot %a, %b : vector<2x4xf32>
  %1 = vector.extract %0[0, 0] : vector<2x4xf32>
  %2 = vector.extract %0[1, 0] : vector<2x4xf32>
  %d = addf %1, %2 : f32
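For contrast, a rough sketch of the same reduction written directly against the flat vector<8xf32> intrinsic form (assuming vdpps semantics, where each 128-bit lane's dot product is broadcast within that lane; op name as per the rename discussed below):

  // %a and %b are vector<8xf32>; one dot product per 128-bit lane.
  %0 = x86vector.avx.intr.dot %a, %b : vector<8xf32>
  // Element 0 holds the low-lane result, element 4 the high-lane result.
  %1 = vector.extract %0[0] : vector<8xf32>
  %2 = vector.extract %0[4] : vector<8xf32>
  %d = addf %1, %2 : f32

The explicit low/high lane bookkeeping here is exactly what the 2-D vector<2x4xf32> form above hides.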
mlir/include/mlir/Dialect/X86Vector/X86Vector.td
Line 302: Ah, I suppose you are referring to https://reviews.llvm.org/D98198. I somehow missed the second part of that discussion during the review; let me have a careful read. Are you okay with this CL + a later improvement, or do you want me to make those changes first?
I'd like to make sure we are on the same page globally on this.
As a simple step forward, assuming you'd be in agreement, you could just rename the op to x86vector.avx.intr.dot and not change anything else.
The "codegen exposed" version would then be a new op, and the lowering to x86vector would be a simple reshape (sketched below).
The big thing then will be the iterative canonicalization of reshapes all the way to vector.transfer ops, which we will have to do in a portable fashion.
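A minimal sketch of that lowering, assuming the reshape is expressed with vector.shape_cast and using the op names from this thread (illustrative only, not the final design):

  // Hypothetical "codegen public" op on the 2-D type:
  //   %d2 = x86vector.avx.dot %a2, %b2 : vector<2x4xf32>
  // could lower to the flat intrinsic form roughly as:
  %fa = vector.shape_cast %a2 : vector<2x4xf32> to vector<8xf32>
  %fb = vector.shape_cast %b2 : vector<2x4xf32> to vector<8xf32>
  %fd = x86vector.avx.intr.dot %fa, %fb : vector<8xf32>
  %d2 = vector.shape_cast %fd : vector<8xf32> to vector<2x4xf32>

Folding such shape casts away, eventually all the way to vector.transfer ops, is the portable canonicalization work mentioned above.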
Yes, I am! The lower/higher lane semantics are always a bit confusing, and I like the solution of exploiting the higher dimensionality of the vector dialect to represent them better.
I will make both intrinsics, and then add a new op in the next revision.