Vector-reduction arithmetic accepts vectors as inputs and produces scalars as outputs.
This class of vector operation forms the basis of many scientific computations.
In vector-reduction arithmetic, the evaluation of the reduction function f is independent of the order of the input elements of the vector V.
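As an illustration of the order-independence property, here is a minimal scalar sketch of a max reduction over 64-bit elements. The helper name `reduce_max_i64` is hypothetical and is not one of the intrinsics added by this patch; it only models what a reduce-max is expected to compute.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical reference helper (not part of avx512fintrin.h):
   reduce a vector of n elements to a single scalar by taking the maximum. */
static long long reduce_max_i64(const long long *v, size_t n) {
    long long acc = LLONG_MIN; /* neutral element for max */
    for (size_t i = 0; i < n; ++i) {
        if (v[i] > acc)
            acc = v[i];
    }
    return acc;
}
```

Because max is commutative and associative, calling `reduce_max_i64` on any permutation of the same eight elements should return the same scalar.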
Details
- Reviewers: AsafBadouh, delena, craig.topper, igorb, aymanmus
- Commits:
  - rG25eb42023355: [X86][AVX512][Clang][Intrinsics][reduce] Adding missing reduce (max|min)…
  - rC285493: [X86][AVX512][Clang][Intrinsics][reduce] Adding missing reduce (max|min)…
  - rL285493: [X86][AVX512][Clang][Intrinsics][reduce] Adding missing reduce (max|min)…
Diff Detail
- Repository: rL LLVM
Event Timeline
lib/Headers/avx512fintrin.h | ||
---|---|---|
344 ↗ | (On Diff #75883) | Do we really need these new set1 macros? The epi ones should be fine, shouldn't they? |
9955 ↗ | (On Diff #75883) | long long |
9960 ↗ | (On Diff #75883) | long long |
9970 ↗ | (On Diff #75883) | long long |
9975 ↗ | (On Diff #75883) | long long |
9992 ↗ | (On Diff #75883) | intialize is misspelled, but even then I don't think this sentence reads right. |
9998 ↗ | (On Diff #75883) | Use uppercase 0xFFF.... to match the constants below. |
10009 ↗ | (On Diff #75883) | Can we just call the set1 macro outside and pass the result in for Neutral instead of needing T4? |
10028 ↗ | (On Diff #75883) | Use uppercase for consistency. |
10031 ↗ | (On Diff #75883) | long long |
10037 ↗ | (On Diff #75883) | long long |
10101 ↗ | (On Diff #75883) | Do these 512-bit shuffles get narrowed to 256-bit and 128-bit ops for the later stages due to the high bit undefs or do we end up doing 512-bit operations all the way through? |
lib/Headers/avx512fintrin.h | ||
---|---|---|
10101 ↗ | (On Diff #75883) | It will stay 512-bit all the way through. These intrinsics are only defined for AVX512F, so we can only use the 512-bit versions of the max and min intrinsics. |
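The exchange above can be pictured with a portable model of a log-step reduction in which every stage operates on all eight lanes, even though only the low lanes carry useful data in the later stages. This is a sketch under assumptions, not the code in the header: `max_all_lanes` and `reduce_max_log_steps` are hypothetical names, and the shuffle pattern (upper half moved onto the lower half, remaining lanes filled with the neutral element) is chosen for illustration only.

```c
#include <limits.h>

#define LANES 8 /* a 512-bit vector holds eight 64-bit lanes */

/* Hypothetical model of one full-width operation:
   dst[i] = max(a[i], b[i]) on all eight lanes. */
static void max_all_lanes(long long *dst, const long long *a,
                          const long long *b) {
    for (int i = 0; i < LANES; ++i)
        dst[i] = a[i] > b[i] ? a[i] : b[i];
}

/* Reduce eight lanes to one scalar in three stages (half = 4, 2, 1).
   Each stage still calls the full-width max, mirroring the reply that
   the operations stay 512-bit all the way through. */
static long long reduce_max_log_steps(const long long *v) {
    long long cur[LANES], shuf[LANES];
    for (int i = 0; i < LANES; ++i)
        cur[i] = v[i];
    for (int half = LANES / 2; half >= 1; half /= 2) {
        /* Assumed shuffle: bring the upper half down; pad the rest with
           the neutral element so unused lanes cannot affect the result. */
        for (int i = 0; i < LANES; ++i)
            shuf[i] = (i < half) ? cur[i + half] : LLONG_MIN;
        max_all_lanes(cur, cur, shuf);
    }
    return cur[0]; /* the result ends up in lane 0 */
}
```

In this model the work per stage does not shrink; narrowing the later stages to 256-bit or 128-bit operations would require narrower max/min instructions, which the reply says are not available with AVX512F alone.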
LGTM with that 1 comment.
lib/Headers/avx512fintrin.h | ||
---|---|---|
10046 ↗ | (On Diff #76032) | extra space after the 4th -1 |