The standard doesn't define the order of execution for transform_reduce on purpose, so we might as well make use of it.
Details
- Reviewers
- None
- Group Reviewers
Restricted Project
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Benchmarks:
------------------------------------------------------ Benchmark old new ------------------------------------------------------ bm_reduce<float>/1 0.555 ns 0.710 ns bm_reduce<float>/2 0.703 ns 1.58 ns bm_reduce<float>/3 1.56 ns 1.88 ns bm_reduce<float>/4 1.88 ns 2.19 ns bm_reduce<float>/5 2.21 ns 2.51 ns bm_reduce<float>/6 2.50 ns 2.84 ns bm_reduce<float>/7 2.82 ns 3.16 ns bm_reduce<float>/8 3.13 ns 3.45 ns bm_reduce<float>/16 5.63 ns 1.69 ns bm_reduce<float>/64 26.9 ns 2.88 ns bm_reduce<float>/512 412 ns 17.9 ns bm_reduce<float>/4096 3778 ns 219 ns bm_reduce<float>/32768 30722 ns 1913 ns bm_reduce<float>/262144 246213 ns 15445 ns bm_reduce<float>/1048576 994918 ns 62948 ns ---------------------------------------------------------------- Benchmark old new ---------------------------------------------------------------- bm_transform_reduce<float>/1 0.863 ns 0.872 ns bm_transform_reduce<float>/2 1.57 ns 1.58 ns bm_transform_reduce<float>/3 1.88 ns 1.92 ns bm_transform_reduce<float>/4 2.19 ns 2.23 ns bm_transform_reduce<float>/5 2.50 ns 2.51 ns bm_transform_reduce<float>/6 2.83 ns 2.82 ns bm_transform_reduce<float>/7 3.13 ns 3.14 ns bm_transform_reduce<float>/8 3.44 ns 3.46 ns bm_transform_reduce<float>/16 3.23 ns 2.24 ns bm_transform_reduce<float>/64 19.6 ns 4.56 ns bm_transform_reduce<float>/512 356 ns 30.3 ns bm_transform_reduce<float>/4096 3720 ns 241 ns bm_transform_reduce<float>/32768 31159 ns 3177 ns bm_transform_reduce<float>/262144 246021 ns 25263 ns bm_transform_reduce<float>/1048576 1069957 ns 111642 ns
libcxx/include/__numeric/reduce.h | ||
---|---|---|
38 | Same here. | |
libcxx/include/__numeric/transform_reduce.h | ||
51 | We shouldn't be optimizing for user-defined types even if the operation is "trivial" (also applies to the other algo). This makes me think that we need to revisit our definition of a trivial operation. std::plus<> is trivial but only for a specific set of types. And in fact I am not certain what we mean by "is trivial" anymore. | |
59–63 | I would call the unseq version in the PSTL instead, that's more general and we'll end up using the appropriate backend. |
Not looked at the patch closely but I wonder about the usage of the pragma and their scope.
libcxx/include/__numeric/reduce.h | ||
---|---|---|
41 | Will this restore the old value of clang loop vectorize? |
Same here.