This expands the reduction cost of i1 and/or/xor so that larger type sizes are handled by the existing code. For i1 reductions, and-reductions will use maxv, or-reductions will use minv, and xor-reductions will use addv. Larger vectors also pay the cost of legalizing the type using and/or/xor. The i1 vectors will be legalized to wider integer vectors (say v16i8), and this patch overrides the cost of those legalized reductions. As with all i1 vectors, the types the i1 vector is created with may not match how it is used. That mismatch can introduce extra extends that are not necessarily cost-modelled.
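Why the max/min/add mapping works can be checked with a small scalar C model. This is an illustrative sketch, not LLVM code, and all function names are hypothetical. It assumes that each i1 lane has been widened to a sign-extended byte: 0 for false and -1 (all ones) for true, the form a NEON compare produces. The patch does not state this representation. Under that assumption:

- A signed maximum over the lanes gives AND.
- A signed minimum gives OR.
- The low bit of a wrapping byte sum gives XOR.

The 32-lane helper illustrates the wider-vector case. The two 16-lane halves are combined with one lane-wise AND, then a single 16-lane reduction runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: every i1 lane has been widened to a
   sign-extended byte, 0 for false and -1 (all ones) for true. */

/* AND-reduction as a signed max: any false (0) lane beats -1. */
int8_t mask_reduce_and(const int8_t *m, size_t n) {
    int8_t r = m[0];
    for (size_t i = 1; i < n; i++)
        if (m[i] > r) r = m[i];
    return r;
}

/* OR-reduction as a signed min: any true (-1) lane beats 0. */
int8_t mask_reduce_or(const int8_t *m, size_t n) {
    int8_t r = m[0];
    for (size_t i = 1; i < n; i++)
        if (m[i] < r) r = m[i];
    return r;
}

/* XOR-reduction as a wrapping byte add: each true lane adds 0xFF (odd),
   so the low bit of the sum is the parity of the number of true lanes. */
int mask_reduce_xor(const int8_t *m, size_t n) {
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint8_t)(sum + (uint8_t)m[i]);
    return sum & 1;
}

/* Hypothetical 32-lane case: fold the two 16-lane halves together with
   one lane-wise AND, then perform a single 16-lane reduction. */
int8_t mask_reduce_and_32(const int8_t m[32]) {
    int8_t folded[16];
    for (int i = 0; i < 16; i++)
        folded[i] = (int8_t)(m[i] & m[i + 16]);
    return mask_reduce_and(folded, 16);
}
```

Under these assumptions, the extra work for a wider mask is one lane-wise and/or/xor per halving. The final reduction is the same as for the legal width.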
https://godbolt.org/z/6Gc9K6b7T
Details
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
llvm/test/Analysis/CostModel/AArch64/reduce-xor.ll
Line 20: Interestingly, we can also do much better for xor reductions like v16i8, v8i16, etc. by using SVE if available too. For a v8i16 xor reduction we can just do:

    ptrue p0.h, vl8
    eorv h0, p0, z0.h
    fmov w0, s0

whereas I see we currently do:

    ext v1.16b, v0.16b, v0.16b, #8
    eor v0.8b, v0.8b, v1.8b
    fmov x8, d0
    eor x8, x8, x8, lsr #32
    lsr x9, x8, #16
    eor w0, w8, w9
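The current NEON sequence for a v8i16 xor reduction is a halving tree. A scalar C model makes its three folding steps explicit:

1. 128 to 64 bits with ext/eor.
2. 64 to 32 bits with the `lsr #32` eor.
3. 32 to 16 bits with the final `lsr #16` eor.

A plain loop stands in for what the single predicated eorv computes. This is an illustrative sketch with hypothetical function names, not LLVM code. It assumes the eight i16 lanes are stored contiguously, as in the vector register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference: XOR all eight 16-bit lanes, as one predicated EORV would. */
uint16_t xor_reduce_v8i16_ref(const uint16_t v[8]) {
    uint16_t acc = 0;
    for (int i = 0; i < 8; i++)
        acc ^= v[i];
    return acc;
}

/* Model of the current NEON sequence as a halving tree. */
uint16_t xor_reduce_v8i16_fold(const uint16_t v[8]) {
    uint64_t lo, hi;
    memcpy(&lo, v, sizeof lo);      /* lanes 0-3 (low half of v0) */
    memcpy(&hi, v + 4, sizeof hi);  /* lanes 4-7 (high half, moved down by ext #8) */
    uint64_t x8 = lo ^ hi;          /* eor v0.8b, v0.8b, v1.8b; fmov x8, d0 */
    x8 ^= x8 >> 32;                 /* eor x8, x8, x8, lsr #32 */
    uint32_t w8 = (uint32_t)x8;
    uint32_t w9 = (uint32_t)(x8 >> 16); /* lsr x9, x8, #16 */
    /* eor w0, w8, w9: only the low 16 bits hold the i16 result. */
    return (uint16_t)(w8 ^ w9);
}
```

Every fold pairs 16-bit chunks at 16-bit-aligned offsets, so the model gives the same result regardless of byte order.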