
[CostModel] remove cost-kind predicate for ctlz/cttz intrinsics in basic TTI implementation
Closed · Public

Authored by spatel on Oct 15 2020, 6:35 AM.

Details

Summary

The cost modeling for intrinsics is a patchwork built around the differing expectations of its callers, so it's a mess. I'm hoping to untangle this to allow canonicalization to the new min/max intrinsics in IR.
The general goal is to remove the cost-kind restriction here in the basic implementation class. I.e., if some intrinsic has a throughput cost of 104, assume that it has the same size, latency, and blended costs. Effectively, an intrinsic with cost N is treated as if it were composed of N simple instructions. If that's not correct, the target should provide a more accurate override.
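To make the "same cost for every cost kind, unless the target overrides it" idea concrete, here is a minimal sketch. It is not the LLVM code: `CostKind`, `BasicCostModel`, `TargetCostModel`, and `intrinsicCost` are hypothetical stand-ins invented for illustration, and the override returning 1 for code size is an arbitrary assumption.

```cpp
// Hypothetical stand-ins for illustration only; these are not the LLVM types.
enum class CostKind { RecipThroughput, Latency, CodeSize, SizeAndLatency };

struct BasicCostModel {
    virtual ~BasicCostModel() = default;

    // Basic implementation: the cost kind is deliberately ignored, so an
    // intrinsic with cost N is reported as N for every cost kind.
    virtual unsigned intrinsicCost(unsigned baseCost, CostKind /*kind*/) const {
        return baseCost;
    }
};

struct TargetCostModel : BasicCostModel {
    // A target that knows better overrides the default. Here we assume
    // (arbitrarily) that the intrinsic is a single instruction for code size.
    unsigned intrinsicCost(unsigned baseCost, CostKind kind) const override {
        if (kind == CostKind::CodeSize)
            return 1;
        return BasicCostModel::intrinsicCost(baseCost, kind);
    }
};
```

With this shape, a caller asking the basic model for any cost kind gets the same number back, and only targets that opt in with an override see a different answer.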

The x86-64 SSE2 subtarget cost diffs require explanation:

  1. The scalar ctlz/cttz costs assume "BSR+XOR+CMOV" or "TEST+BSF+CMOV/BRANCH", so they are not cheap.
  2. The 128-bit SSE vector width versions assume a cost of 18 or 26 (no explanation is provided in the tables, but this corresponds to a bunch of shift/logic/compare).
  3. The 512-bit vectors in the test file are scaled up by a factor of 4 from the legal vector width costs.
  4. The plain latency cost-kind is not affected by this patch because that calculation is diverted before we get to getIntrinsicInstrCost().
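The factor-of-4 scaling in item 3 is simple arithmetic: a 512-bit vector splits into four 128-bit legal pieces, and each piece is charged the legal-width cost. A small sketch follows, assuming a hypothetical helper `scaledVectorCost` (not an LLVM API) and using the 18 and 26 figures from item 2.

```cpp
// Hypothetical helper, not an LLVM API: the cost of an over-wide vector is
// (number of legal-width pieces, rounded up) * (cost of one legal-width op).
constexpr unsigned scaledVectorCost(unsigned vectorBits, unsigned legalBits,
                                    unsigned legalCost) {
    return ((vectorBits + legalBits - 1) / legalBits) * legalCost;
}

// 512 bits over 128-bit legal vectors is 4 pieces.
static_assert(scaledVectorCost(512, 128, 18) == 72, "4 * 18");
static_assert(scaledVectorCost(512, 128, 26) == 104, "4 * 26");
// A vector that is already legal is not scaled.
static_assert(scaledVectorCost(128, 128, 18) == 18, "1 * 18");
```

Under these assumptions, the larger table entry scales to 104, the same figure used as the example throughput cost in the summary.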

Event Timeline

spatel created this revision. Oct 15 2020, 6:35 AM
Herald added a project: Restricted Project. Oct 15 2020, 6:35 AM
spatel requested review of this revision. Oct 15 2020, 6:35 AM
samparker accepted this revision. Oct 15 2020, 7:00 AM

I'm sorry for not finishing what I started with this intrinsic nonsense... Any simplification of these winding paths sounds great to me.

This revision is now accepted and ready to land. Oct 15 2020, 7:00 AM

> I'm sorry for not finishing what I started with this intrinsic nonsense... Any simplification of these winding paths sounds great to me.

I definitely appreciate what you accomplished! And sorry for not being quicker to review your patches. Intrinsic handling is truly awful...