This is an archive of the discontinued LLVM Phabricator instance.

[mlir] MathApproximations: unroll virtual vectors into hardware vectors for ISA-specific operations
ClosedPublic

Authored by ezhulenev on Oct 28 2021, 10:38 AM.

Diff Detail

Event Timeline

ezhulenev created this revision. Oct 28 2021, 10:38 AM
ezhulenev requested review of this revision. Oct 28 2021, 10:38 AM
cota accepted this revision. Oct 28 2021, 12:20 PM

Thanks for this! Just some nits below, feel free to ignore.

One question: do you think the added helper would be useful to others? I wonder whether putting it somewhere in the realm of the Vector dialect (VectorUtils?) would make sense.

mlir/lib/Dialect/Math/Transforms/PolynomialApproximation.cpp
131

'rank' is only used here so consider dropping it and doing innerDim = inputShape.back().

132

Would it be worth asserting that vectorWidth > 0 before dividing by it?

141

Consider adding this comment after the if instead of the two inline comments below it:
// Expand shape from {..., innerDim} to {..., expansionDim, vectorWidth}

142

Would be more readable to use vectorWidth here
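
For context, the shape expansion these inline comments refer to can be sketched in isolation as follows. This is a hypothetical standalone helper; the name expandInnerDim and its signature are illustrative, not the actual code in the patch:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch: given a virtual-vector shape {..., innerDim} and a hardware vector
// width, expand the innermost dimension into {..., expansionDim, vectorWidth}
// so the trailing dimension matches the ISA register width.
std::vector<int64_t> expandInnerDim(const std::vector<int64_t> &inputShape,
                                    int64_t vectorWidth) {
  assert(vectorWidth > 0 && "vector width must be positive");
  int64_t innerDim = inputShape.back();
  assert(innerDim % vectorWidth == 0 &&
         "inner dimension must be divisible by the hardware vector width");
  // Keep all leading dimensions, then split the inner one in two.
  std::vector<int64_t> expandedShape(inputShape.begin(), inputShape.end() - 1);
  expandedShape.push_back(innerDim / vectorWidth); // expansionDim
  expandedShape.push_back(vectorWidth);
  return expandedShape;
}
```

For example, a virtual vector<2x32xf32> with an 8-wide AVX2 register would expand to {2, 4, 8}.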

mlir/test/Dialect/Math/polynomial-approximation.mlir
470

Consider adding AVX2-NOT: vector.shape_cast here, as below.

This revision is now accepted and ready to land. Oct 28 2021, 12:20 PM
ezhulenev marked 5 inline comments as done. Oct 28 2021, 12:37 PM

Address comments

silvas added a subscriber: silvas. Oct 28 2021, 2:05 PM

This doesn't seem to be the right pass to do target-specific lowerings. Can you add math.rsqrt and move the vector unrolling to a generic pass in the vector dialect? Then this pass wouldn't depend on target-details.

> This doesn't seem to be the right pass to do target-specific lowerings. Can you add math.rsqrt and move the vector unrolling to a generic pass in the vector dialect? Then this pass wouldn't depend on target-details.

We already have math.rsqrt, and it can take virtual vectors of arbitrary rank; the problem is that math.rsqrt is approximated using the hardware rsqrt instruction (which is weird, but the AVX2 rsqrt instruction on its own does not have enough precision).

I didn't add this vector unrolling anywhere in the vector passes/utils because, in my opinion, that would violate the principles of the vector dialect: everything should stay in virtual vectors for as long as possible.

Can we move this RsqrtOp lowering to the x86vector dialect? It seems really out of place in the math dialect, which should remain target agnostic. (also, since this approximation is newton-raphson based instead of polynomial based, it seems out of place in PolynomialApproximation.cpp as well)

> Can we move this RsqrtOp lowering to the x86vector dialect? It seems really out of place in the math dialect, which should remain target agnostic. (also, since this approximation is newton-raphson based instead of polynomial based, it seems out of place in PolynomialApproximation.cpp as well)

  1. Do you mean to update the x86vector.avx operation to take arbitrary virtual vectors and do the unrolling in the lowering to LLVM? I have mixed feelings: I kind of like this idea, but I'm not a big fan of the avx operation no longer mapping directly to an avx intrinsic. Inviting @aartbik for his opinion. Although if we have this "virtual avx" we can unroll it to avx2 or avx512 depending on the vector width.

Anyway, if we do this, we still won't be able to get rid of the avx flag; it will just become an x86 flag.

  1. We can just remove the polynomial bit from the file name, I'd say any math approximation should be able to live here.
ezhulenev added a comment (edited). Oct 28 2021, 2:41 PM

Maybe we can have a generic x86vector.rsqrt operation (without an avx or avx512 prefix), and then it can be unrolled and lowered to ISA-specific intrinsics by the x86vector dialect passes? /cc @aartbik

  1. I would just like Math dialect transforms to not depend on target-specific dialects. So concretely just RsqrtOpApproximation pattern would move to x86vector/Transform/, since it is the target specific one.
  2. I agree, removing Polynomial would be good.

But avx and avx512 rsqrt have different accuracy:

_mm256_rsqrt_ps: the maximum relative error for this approximation is less than 1.5*2^-12.
_mm512_rsqrt23_ps: computes the approximate reciprocal square root to 23 bits of accuracy.

(+ another version with 14 bits of accuracy)

So this approximation only makes sense in the AVX2 case, not the AVX512 one.
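
For reference, the Newton-Raphson refinement this approximation is based on is the standard one-step iteration; each step roughly doubles the bits of accuracy, which is why a single step can take AVX2's ~12-bit rsqrt estimate close to full float precision. A generic scalar sketch, not the patch's exact vector lowering:

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step refining a low-precision estimate y0 ~ 1/sqrt(x):
//   y1 = y0 * (3 - x * y0^2) / 2
// Each step roughly doubles the number of correct bits in the estimate.
float refineRsqrt(float x, float y0) {
  return y0 * (1.5f - 0.5f * x * y0 * y0);
}
```

Applied to the hardware estimate produced by _mm256_rsqrt_ps (elementwise over the vector lanes), this is what brings the AVX2 result up to usable precision.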

Basically add a pass something like X86vectorMathApproximationsPass?

> Basically add a pass something like X86vectorMathApproximationsPass?

Correct (something that lives in X86Vector/Transforms). That pass would have target-specific knowledge of avx vs avx512 approximations.

cota added a comment. Oct 28 2021, 3:54 PM
>   1. I would just like Math dialect transforms to not depend on target-specific dialects. So concretely just RsqrtOpApproximation pattern would move to x86vector/Transform/, since it is the target specific one.
>   2. I agree, removing Polynomial would be good.

Sounds good to me, I'll do both. Thanks for the feedback.