The threshold was set in terms of total vector size, while the idea was to limit
the number of instructions. Now that the lowering also works with doubles,
the threshold needs to be updated.
Details
- Reviewers
- arsenm
- Commits
- rG1dfd1b3e4b2b: [AMDGPU] Tune threshold for cmp/select vector lowering
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
To expand a little on the reasoning: 256 bits of float/int yield 8 compares and 8 cndmasks, 16 instructions in total. For doubles to fall under 16 instructions the limit is double5: 5 compares and 10 cndmasks, 15 instructions. Currently the limit is double4, which will be expanded.
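The arithmetic above can be sketched as a small standalone helper. This is only an illustration, not the code from the patch: the names `cmpSelectInstCount` and `fitsInstBudget` are hypothetical, and the sketch assumes one compare per element plus one cndmask per 32 bits of each element, with the budget of 16 taken from the float/int case.

```cpp
#include <cassert>

// Hypothetical helper (not LLVM API): estimated instruction count for
// expanding a vector into a compare/select chain.
unsigned cmpSelectInstCount(unsigned eltBits, unsigned numElts) {
  assert(eltBits > 0 && numElts > 0 && "expected a non-empty vector");
  // Assumption: one cndmask moves 32 bits, so a 64-bit element needs two.
  unsigned dwordsPerElt = (eltBits + 31) / 32;
  unsigned compares = numElts;                // one compare per element
  unsigned cndmasks = numElts * dwordsPerElt; // one cndmask per dword
  return compares + cndmasks;
}

// Hypothetical predicate: does the expansion stay within the budget?
// The default of 16 matches the 256-bit float/int case (8 + 8).
bool fitsInstBudget(unsigned eltBits, unsigned numElts, unsigned budget = 16) {
  return cmpSelectInstCount(eltBits, numElts) <= budget;
}
```

Under these assumptions, 8 x 32-bit gives 16 instructions, double4 gives 12, double5 gives 15, and double6 gives 18, which is why double5 is the largest double vector that stays within the budget.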
I have done perf measurements comparing this expansion to s_set_gpr_idx on Vega10, and it breaks even at around 5-6 elements with a tiny margin.
The condition became too complicated for me to understand, so I have just hoisted it into a predicate function. I also think we may later move this predicate somewhere else, as we need it at least in GlobalISel and maybe in some other places too. In any case, the same condition was already used in two places.