This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] splat before (f)mul to allow mul-by-element isel
Abandoned · Public

Authored by spatel on Apr 18 2019, 3:06 PM.

Details

Summary

A splat of a vector multiply (either integer or FP) can be turned into a multiply-by-element:

splat (mul X, Y), Lane --> mul (splat X, Lane), (splat Y, Lane) --> mul-by-element (splat X, Lane), Y.[Lane]
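The rewrite chain above can be checked lane by lane with a small model. This is a minimal sketch, assuming 4-lane vectors represented as Python lists. The helpers `splat`, `vmul`, `vmul_by_element` and `rewrite_holds` are hypothetical stand-ins for DUP-by-element, a full-vector MUL/FMUL, and MUL/FMUL by element. They are not LLVM or SelectionDAG APIs.

```python
# Model 4-lane vectors as Python lists to check the lane algebra of the rewrite.

def splat(v, lane):
    """Broadcast element `lane` of v to every lane (like AArch64 DUP by element)."""
    return [v[lane]] * len(v)

def vmul(a, b):
    """Lane-wise multiply of two full vectors (like vector MUL/FMUL)."""
    return [x * y for x, y in zip(a, b)]

def vmul_by_element(a, b, lane):
    """Multiply every lane of a by the single element b[lane] (like MUL/FMUL by element)."""
    return [x * b[lane] for x in a]

def rewrite_holds(X, Y):
    """Check that all three forms of the rewrite agree for every choice of lane."""
    for lane in range(len(X)):
        original = splat(vmul(X, Y), lane)
        both_splatted = vmul(splat(X, lane), splat(Y, lane))
        by_element = vmul_by_element(splat(X, lane), Y, lane)
        if not (original == both_splatted == by_element):
            return False
    return True

print(rewrite_holds([1, 2, 3, 4], [10, 20, 30, 40]))                  # True
print(rewrite_holds([0.1, 1.5, -2.25, 3.0], [7.0, 0.3, 1e-3, -4.5]))  # True
```

Every form computes the same single product `X[Lane] * Y[Lane]` for each result lane, with no reassociation. That is why the rewrite is exact for FP multiplies as well as integer ones.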

These patterns showed up as an ARM regression in D60214, but we have this transform in IR, so it's an existing problem IIUC.

The constant cases look better, but I'm not sure if this is a win if both operands are variables.


Event Timeline

spatel created this revision. Apr 18 2019, 3:06 PM

> The constant cases look better, but I'm not sure if this is a win if both operands are variables.

The key really isn't whether one of the operands is a constant; it's whether the operand is free or cheap to splat. Constants are usually free to splat. A splat is free to splat... although I guess that's unlikely to come up in practice given other optimizations. A load is cheap to splat (the addressing mode for ld1r is very limited, so you're likely adding an extra instruction for address computation). A loop-invariant operand is likely cheap to splat, but I don't think there's any way to handle that in SelectionDAG at the moment. And of course, a multiply with an operand that's free to splat is itself free to splat, recursively. Probably worth adding a testcase with more than one multiply.
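The recursive rule in the comment above can be sketched as a tiny cost function. This is a hypothetical model, not SelectionDAG code. The `Node` kinds (`"const"`, `"splat"`, `"load"`, `"mul"`, `"var"`) and the three cost levels are assumptions made for illustration. A multiply takes the cost of its cheapest operand, because the by-element form needs only one operand splatted. Loop-invariant operands are not modeled because, as noted, SelectionDAG cannot see them.

```python
from dataclasses import dataclass, field
from enum import IntEnum

class SplatCost(IntEnum):
    FREE = 0       # no extra instruction: the splat folds away
    CHEAP = 1      # e.g. a load that can become ld1r (possibly plus address math)
    NEEDS_DUP = 2  # an arbitrary value: a real DUP is required

@dataclass
class Node:
    kind: str                                     # hypothetical: "const", "splat", "load", "mul", "var"
    operands: list = field(default_factory=list)  # child nodes (two for "mul")

def splat_cost(n: Node) -> SplatCost:
    """Hypothetical estimate of the cost of splatting one lane of node n."""
    if n.kind in ("const", "splat"):
        return SplatCost.FREE
    if n.kind == "load":
        return SplatCost.CHEAP
    if n.kind == "mul":
        # splat(mul(a, b), lane) -> mul-by-element(splat(a, lane), b.[lane]):
        # only one operand has to be splatted, so the multiply is as cheap
        # to splat as its cheapest operand, applied recursively.
        return min(splat_cost(op) for op in n.operands)
    return SplatCost.NEEDS_DUP

x, y, c = Node("var"), Node("var"), Node("const")
print(splat_cost(Node("mul", [c, x])).name)                    # FREE
print(splat_cost(Node("mul", [Node("mul", [c, x]), y])).name)  # FREE
print(splat_cost(Node("mul", [x, y])).name)                    # NEEDS_DUP
```

The nested case is the kind of "more than one multiply" testcase suggested above. The constant makes the inner multiply free to splat, which in turn makes the outer one free.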

For two arbitrary variables, it's basically neutral, like you've noted; splatting an operand has the same cost as splatting a result.
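The "basically neutral" point comes down to instruction counting. This sketch rests on several assumptions: a hypothetical register assignment (X in v1, Y in v2, lane 1, result in v0), 32-bit integer lanes, and a constant that costs the same to materialize before and after the rewrite, so its materialization is left out of the count. The AArch64 sequences are illustrative only, not compiler output. FMUL would have the same shape.

```python
# Hypothetical register assignment: X in v1, Y in v2, lane 1, result in v0.
# Constant materialization is assumed to cost the same in both forms and is not counted.

def before(x_is_const):
    # splat(mul(X, Y), 1): full-vector multiply, then DUP lane 1 of the product.
    # The sequence is the same whether or not X is a constant.
    return ["mul v0.4s, v1.4s, v2.4s",
            "dup v0.4s, v0.s[1]"]

def after(x_is_const):
    # mul-by-element(splat(X, 1), Y.[1])
    seq = []
    if not x_is_const:
        seq.append("dup v3.4s, v1.s[1]")  # a variable X needs a real DUP
    # If X is a constant, splat(X, 1) is just another constant, assumed to be in v3.
    seq.append("mul v0.4s, v3.4s, v2.s[1]")
    return seq

for x_is_const in (True, False):
    print(x_is_const, len(before(x_is_const)), len(after(x_is_const)))
# True 2 1
# False 2 2
```

The constant case saves one instruction. With two variables, the DUP simply moves from the result to an operand. This counts instructions only and ignores the multiply-throughput differences mentioned below.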

It's worth noting that at least on some chips, 128-bit multiplies have half the throughput of scalar and 64-bit vector multiplies. But I don't think that directly affects this patch; you probably wouldn't want to add extra instructions just to make a multiply smaller.

spatel abandoned this revision. May 26 2020, 5:28 AM

Abandoning: implemented a limited version of splat re-ordering for DAGCombiner with D79886.