Adds mask.compress to the AVX512 dialect and defines a lowering to the LLVM dialect.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/include/mlir/Dialect/AVX512/AVX512.td | ||
---|---|---|
34 | There are _mm512_maskz_compress_ps etc. intrinsics for zeroed compress, but they use the same AVX instruction as _mm512_mask_compress_ps: mask.compress. So technically we don't really need the zeroed variant as a separate operation, and can instead set src := vector of all zeros. Should I keep MaskzCompressOp or remove it from this commit? (In most cases (e.g., sparse dot product), we probably use the zeroed variant.) |
mlir/include/mlir/Dialect/AVX512/AVX512.td | ||
---|---|---|
34 | I don't see the value proposition of the zeroed variant, especially since it lowers to the same LLVM IR intrinsic as the non-zeroed one. If you want to avoid looking up the src operand and checking if it is a constant zero vector, you can either have a unit attribute zero on the "main" op + make the src argument optional + add a verifier that either the operand or the attribute is present or have a constant_src attribute with _any_ value (not only zero), that can be used instead of the src argument + the same scheme as above. The latter is closer in spirit to Linalg, which is the dialect that will target this I presume. | |
47 | Surrounding variable names with quotes or backticks is a generally good practice. Doing so for k in error messages in TypesMatchWith would also address the issue of misreading them as a typo. | |
mlir/include/mlir/Dialect/LLVMIR/LLVMAVX512.td | ||
36 | Naming nit: this is a specific kind of overloading that is fully defined by the first result, there may be other kinds so I wouldn't use IntrOverloadedOp here only to find out later that it cannot be used for, e.g., an op that is overloaded by its second operand. | |
42 | Nit: add a comment for 1 for consistency with other literal arguments. |
mlir/include/mlir/Dialect/AVX512/AVX512.td | ||
---|---|---|
34 | Hmm I thought I had replied but apparently did not commit .. I'm be curious to see if using one vs the other can result in significant difference in the final assembly generated by LLVM in interesting cases (likely a few commits down the line until we have a full dot/matvec example). Technically there may be a difference for LLVM's optimizations. Re. where the funneling should go, we could indeed drop 1 op and use an attribute. |
mlir/include/mlir/Dialect/AVX512/AVX512.td | ||
---|---|---|
34 | I decided to add a constant_src attribute. If neither constant_src nor src is specified, a vector of all zeros is used (same as maskz.compress). | |
mlir/include/mlir/Dialect/LLVMIR/LLVMAVX512.td | ||
36 | Added a comment. I can change this if necessary when adding additional ops. |
There are _mm512_maskz_compress_ps etc. intrinsics for zeroed compress, but they use the same AVX instruction as _mm512_mask_compress_ps: mask.compress. So technically we don't really need the zeroed variant as a separate operation, and can instead set src := vector of all zeros.
Should I keep MaskzCompressOp or remove it from this commit? (In most cases (e.g., sparse dot product), we probably use the zeroed variant.)