This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Tune threshold for cmp/select vector lowering
ClosedPublic

Authored by rampitec on May 20 2020, 2:07 PM.

Details

Summary

The threshold was set in terms of total vector size, while the idea was
to limit the number of instructions. Now that it has started to work with
doubles, the threshold needs to be updated.

Event Timeline

rampitec created this revision. May 20 2020, 2:07 PM

To expand a little bit on the reasoning: 256 bits of float/int yield 8 compares and 8 cndmasks, 16 instructions in total. For doubles, the largest vector that stays under 16 instructions is double5: 5 compares and 10 cndmasks. Currently it is double4, which will be expanded.
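
The arithmetic above can be sketched as a small cost function. This is only an illustration of the counting argument, not the code from the patch: the names `expansionCost` and `fitsInstructionBudget` are hypothetical, and the limit is assumed to be inclusive at 16 because the 8-element float case (exactly 16 instructions) is described as the reference point.

```cpp
// Illustrative sketch only; names are hypothetical, not from the LLVM sources.
constexpr unsigned kDwordBits = 32;
// Assumption: the budget is inclusive, so a cost of exactly 16 still fits.
constexpr unsigned kMaxInsts = 16;

// One compare against the index per vector element.
constexpr unsigned numCompares(unsigned NumElts) { return NumElts; }

// One 32-bit cndmask per dword of each element:
// 1 for float/int, 2 for double.
constexpr unsigned numCndMasks(unsigned NumElts, unsigned EltBits) {
  return NumElts * ((EltBits + kDwordBits - 1) / kDwordBits);
}

// Total instruction count of the cmp/select expansion.
constexpr unsigned expansionCost(unsigned NumElts, unsigned EltBits) {
  return numCompares(NumElts) + numCndMasks(NumElts, EltBits);
}

// True when the expansion stays within the instruction budget.
constexpr bool fitsInstructionBudget(unsigned NumElts, unsigned EltBits) {
  return expansionCost(NumElts, EltBits) <= kMaxInsts;
}
```

Under these assumptions, 8 floats cost 16 instructions, double4 costs 12, double5 costs 15, and double6 costs 18, so double5 is the largest vector of doubles that fits. Only 32-bit and 64-bit elements are discussed in this review; the sketch does not model any special handling of narrower elements.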

I have done perf measurements to compare this expansion to s_set_gpr_idx on Vega10 and it breaks even around 5-6 elements with a tiny margin.

The condition became too complicated for me to understand, so I have just hoisted it into a predicate function. I also think we may move this predicate somewhere later, as we need it at least in GlobalISel, maybe in some other places too. Anyway, the same condition was already used in two places.

arsenm accepted this revision. May 21 2020, 6:49 AM

It would be good if we could commit these benchmarks somewhere.

This revision is now accepted and ready to land. May 21 2020, 6:49 AM
This revision was automatically updated to reflect the committed changes.