[LV] Account for the cost of predication of scalarized load/store

Authored by dmgreen on Mar 17 2021, 3:57 AM.

Description

This adds the cost of an i1 extract and a branch to the cost computed by
getMemInstScalarizationCost when the instruction is predicated. Such
predicated loads and stores generate blocks like:

  %c1 = extractelement <4 x i1> %C, i32 1
  br i1 %c1, label %if, label %else
if:
  %sa = extractelement <4 x i32> %a, i32 1
  %sb = getelementptr inbounds float, float* %pg, i32 %sa
  %sv = extractelement <4 x float> %x, i32 1
  store float %sv, float* %sb, align 4
else:

So this increases the cost by that of the extract and the branch. This is
probably still too low in many cases, given the cost of all that
branching, but there is already an existing hack,
useEmulatedMaskMemRefHack, that increases the cost of a memop if it is a
load or if there is more than one store. This patch improves the cost
for the case where there is only a single store, and hopefully the hack
can be removed at some point in the future.
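
As a rough illustration of the accounting described above, here is a minimal
standalone sketch. It is not the code from this patch. The names UnitCosts,
ProhibitiveCost and scalarizedMemOpCost are hypothetical, the unit costs are
placeholders for values a target cost model would supply, and charging one
i1 extract and one branch per lane is an assumption based on the per-lane
block shown above. Any other adjustments the real cost model makes are left
out.

```cpp
// Hypothetical unit costs; a real target cost model would supply these.
struct UnitCosts {
  unsigned AddrCompute = 1;    // scalar address computation (the GEP)
  unsigned ScalarMemOp = 1;    // one scalar load or store
  unsigned ExtractLane = 1;    // extract of one index/data lane
  unsigned ExtractMaskBit = 1; // extract of one i1 lane from the mask
  unsigned Branch = 1;         // conditional branch guarding the lane
};

// Illustrative stand-in for the very large cost the existing hack assigns.
constexpr unsigned ProhibitiveCost = 3000000;

// Toy model of the cost of scalarizing one memory instruction over VF lanes.
constexpr unsigned scalarizedMemOpCost(unsigned VF, bool IsPredicated,
                                       bool IsLoad,
                                       unsigned NumPredicatedStores,
                                       UnitCosts C = UnitCosts{}) {
  // Base cost: per lane, compute the address, extract the operand, access memory.
  unsigned Cost = VF * (C.AddrCompute + C.ExtractLane + C.ScalarMemOp);
  if (!IsPredicated)
    return Cost;

  // Assumed shape of the new term: one i1 extract plus one branch per lane.
  Cost += VF * (C.ExtractMaskBit + C.Branch);

  // The existing hack as described: loads, or more than one predicated
  // store, are made prohibitively expensive.
  if (IsLoad || NumPredicatedStores > 1)
    return ProhibitiveCost;

  return Cost;
}
```

With these placeholder costs and VF = 4, an unpredicated store costs 12 and a
single predicated store costs 20. That single-store case is the one where the
hack does not apply, so it is the only case in this model where the added
extract and branch show up in the result.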

Differential Revision: https://reviews.llvm.org/D98243

Details