This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Resolving PR34474 by transforming mul(x, 2^c +/- 1) -> sub/add(shl(x, c) x) for any type including vector types
Changes PlannedPublic

Authored by pacxx on Sep 15 2017, 1:35 AM.

Details

Summary

PR34474 describes problems for code generation of multiplications with a factor of 2^C +/-1. The patch transforms multiplications of this type to a left shift and an additional addition or subtraction.

The test cases which produced bad MC:

define <2 x i64> @mul7(<2 x i64>) {
  %2 = mul <2 x i64> %0, <i64 7, i64 7>
  ret <2 x i64> %2
}

define <2 x i64> @mul17(<2 x i64>) {
  %2 = mul <2 x i64> %0, <i64 17, i64 17>
  ret <2 x i64> %2
}

define <16 x i8> @mul31(<16 x i8>) {
  %2 = mul <16 x i8> %0, <i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31, i8 31>
  ret <16 x i8> %2
}

and are now lowered as follows:

mul7:
  # BB#0: 
          vpsllq  $3, %xmm0, %xmm1
          vpsubq  %xmm0, %xmm1, %xmm0
          retq


 mul17:
  # BB#0:                    
          vpsllq  $4, %xmm0, %xmm1
          vpaddq  %xmm0, %xmm1, %xmm0
          retq


mul31:
  # BB#0:                                 # %entry
          vpsllw  $5, %xmm0, %xmm1
          vpand   .LCPI4_0(%rip), %xmm1, %xmm1
          vpsubb  %xmm0, %xmm1, %xmm0
          retq

Diff Detail

Event Timeline

pacxx created this revision.Sep 15 2017, 1:35 AM
RKSimon edited edge metadata.Sep 15 2017, 4:25 AM

I've added a range of test cases at rL313353 - please can you update?

It might be we need suitable combines for (add (mul X, C), X) -> (mul (X, C+1)) and (sub (mul X, C), X) -> (mul (X, C-1)) as well

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2824

You should be able to match non-uniform constant vectors with matchUnaryPredicate

2836

The other pow2 combines are before the mul (shl X, c1), c2 case - put this there with them?

pacxx planned changes to this revision.Sep 15 2017, 8:40 AM

Hi Simon, I will update the patch as soon as it covers the new requirements and all tests pass.

mcrosier added a comment.EditedSep 15 2017, 12:17 PM

Both ARM and AArch64 have similar combines in ISelLowering.cpp. It would be nice if these were moved to the target independent code, if possible.

For example, AArch64 currently handles:

// (mul x, 2^N + 1) => (add (shl x, N), x)
// (mul x, 2^N - 1) => (sub (shl x, N), x)
// (mul x, (2^N + 1) * 2^M) => (shl (add (shl x, N), x), M)
// (mul x, -(2^N - 1)) => (sub x, (shl x, N))
// (mul x, -(2^N + 1)) => - (add (shl x, N), x)
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2819

Comments should be in English prose, which means they should use proper capitalization, punctuation, etc.

2820

Would it be interesting to handle negative constants?

// (mul x, -(2^N - 1)) => (sub x, (shl x, N))
// (mul x, -(2^N + 1)) => - (add (shl x, N), x)
mcrosier added a subscriber: llvm-commits.
spatel added a subscriber: spatel.Sep 15 2017, 1:29 PM
pacxx updated this revision to Diff 116473.Sep 23 2017, 1:59 PM

Updated version that transforms all proposed patterns for multiplications into shift patterns. Since the code got quite long it was moved to an own function in DAGCombiner. To avoid regressions with existing transformations, the patch currently only transforms multiplications of vector types.
To avoid problems with the Hexagon backend, the transformation is only performed iff the used vector type is a legal type for the backend.

More tests for the patch were added to vector-mul.ll.

Failing tests were updated to preserve semantic of the tests where necessary and auto updated otherwise.

Please can you commit the new tests you've added to vector-mul.ll with trunk's current codegen and then update this patch to show the diff?

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2624

This should be an early-out

if (!TLI.isTypeLegal(VT))
  return SDValue();
2644

bool Match

2660

(style) Remove braces around single lines

2671
if (!Match)
  return SDValue();
2672

Isn't VT is guaranteed to be a vector?

2697

Is there a need for these to be pointers?

2700

braces

pacxx planned changes to this revision.Sep 28 2017, 3:15 AM

I submitted the (old) code generation for vector-mul.ll in https://reviews.llvm.org/D38350

VT can also be a scalar type. Because of that I opt out on non-vector types since some backends including x86 already handle those multiplications.

I will polish the code as requested.

pacxx updated this revision to Diff 117131.Sep 29 2017, 6:19 AM

Updated the style issues

Please remember to create your diff with context: http://llvm.org/docs/Phabricator.html

pacxx updated this revision to Diff 117281.Oct 1 2017, 7:14 AM

diff with context

RKSimon added inline comments.Oct 8 2017, 4:22 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2675

Please don't use a reference on a SDValue like this, its makes the code a lot more confusing.

pacxx updated this revision to Diff 119577.Oct 19 2017, 5:59 AM

removed the reference as suggested

RKSimon added inline comments.Oct 29 2017, 4:39 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2626

Your reference to a broken lea-3.ll test is down right suspicious, do you have any more information?

2658

What happens when different lanes require different SignDirection values?

2694

Won't this work?

SDValue Shl = DAG.getNode(ISD::SHL, DL, VT, N0, LogBase2, N->getFlags());
2714

Can you clang-format the whole of TransformMulWithPow2DisplacedBy1 - its seems a bit off.

pacxx planned changes to this revision.Nov 6 2017, 9:51 PM
pacxx marked an inline comment as done.

I will reevaluate enabling the patch for scalar types. At a first look with current llvm trunk the regression seemed to be gone and was possible not related to this patch.

When signs of the multiplier mismatch AllPow2 is not incremented and the test if all vector lanes match fails. Nothing will happen.

@pacxx Are you still looking at this?

pacxx added a comment.Jan 25 2018, 7:50 AM

@RKSimon sorry I hat a lot of work in the last months I hope to get back to this in the next two weeks

pacxx added a comment.Feb 15 2018, 5:29 AM

I enabled this opt for scalar values as sugested and have now additional 60 tests failing. The problem is, that for scalar values, this optimization does not only affect multiplication instructions but also address calculations. E.g., test/CodeGen/AArch64/machine-combiner-madd.ll fails because for the tested indexing in a load a madd instruction is expected but this optimization is not performed on the new generated instruction pattern which includes now a left shift and a subtraction.

Another problem is that this optimization on scalar values seems to hinder target specific optimizations like:

16 define i32 @test3(i32 %x) {
17 ; CHECK-LABEL: test3
18 ; CHECK: add w0, w0, w0, lsl #1
19
20   %mul = mul nsw i32 %x, 3
21   ret i32 %mul
22 }

in CodeGen/AArch64/mul_pow2.ll which will now generate also a left shift subtraction pattern instead of the expected add with 3 arguments. Since this goes far beyond the scope of this transformation I would suggest to let this optimization disabled for scalar values because I cannot estimate the impact and possible regressions arising from this optimization for all (11) affected targets.

@pacxx Are you still looking at this at all?

The x86 vector motivating examples for this patch are handled by rL342554. That adds a target hook, so targets can decide when they'd like to enable the transform. TODO comments note possible extensions that would handle what this patch was aiming to do, so I think this can be abandoned.

RKSimon resigned from this revision.Sep 29 2021, 2:59 AM

This can be abandoned - we have decomposeMulByConstant to handle this already