This is an archive of the discontinued LLVM Phabricator instance.

Differential D108091

[AggressiveInstCombine] Add shift left instruction to `TruncInstCombine` DAG
Closed, Public

Authored by anton-afanasyev on Aug 15 2021, 10:39 AM.

Details

Reviewers

lebedev.ri
RKSimon
spatel
aaboud

Commits

rG1f3e35b6d165: [AggressiveInstCombine] Add shift left instruction to `TruncInstCombine` DAG

Summary

Add the `shl` instruction to the DAG post-dominated by `trunc`, allowing
TruncInstCombine to reduce the bitwidth of expressions containing left shifts.

The only condition we need to check is that the target bitwidth
is strictly greater than the maximal shift amount: https://alive2.llvm.org/ce/z/AwArqu

Part of https://reviews.llvm.org/D107766
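
The condition in the summary can be sanity-checked outside LLVM with plain fixed-width integers. The following is only a minimal sketch and not part of the patch: the helper names `wideThenTrunc`, `narrowShl` and `narrowingIsSafe` are hypothetical, `uint32_t`/`uint16_t` stand in for `i32`/`i16`, and an empty `std::optional` models the poison value that `shl` produces when the shift amount is not smaller than the bitwidth.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Models trunc(shl(zext i8 x to i32, s)) to i16. Requires s < 32.
uint16_t wideThenTrunc(uint8_t x, unsigned s) {
  assert(s < 32 && "shl i32 by 32 or more is poison");
  return static_cast<uint16_t>(static_cast<uint32_t>(x) << s);
}

// Models the narrowed form shl(zext i8 x to i16, s). Returns std::nullopt
// where the i16 shift would be poison (s >= 16).
std::optional<uint16_t> narrowShl(uint8_t x, unsigned s) {
  if (s >= 16)
    return std::nullopt;
  return static_cast<uint16_t>(static_cast<uint32_t>(x) << s);
}

// True if the narrow form is defined and equal to the wide form for every
// i8 input and every shift amount in [0, maxShift].
bool narrowingIsSafe(unsigned maxShift) {
  assert(maxShift < 32);
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned s = 0; s <= maxShift; ++s) {
      std::optional<uint16_t> narrow = narrowShl(static_cast<uint8_t>(x), s);
      if (!narrow || *narrow != wideThenTrunc(static_cast<uint8_t>(x), s))
        return false;
    }
  }
  return true;
}
```

Under these assumptions, `narrowingIsSafe(15)` holds while `narrowingIsSafe(16)` does not, which matches the requirement that the 16-bit target be wider than the largest shift amount.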

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

anton-afanasyev created this revision. Aug 15 2021, 10:39 AM

Herald added a subscriber: hiraditya. Aug 15 2021, 10:39 AM

anton-afanasyev requested review of this revision. Aug 15 2021, 10:39 AM

Herald added a project: Restricted Project. Aug 15 2021, 10:39 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B119619: Diff 366511. Aug 15 2021, 10:40 AM

anton-afanasyev mentioned this in D107766: [AggressiveInstCombine] Add shift instructions to `TruncInstCombine` DAG. Aug 15 2021, 10:42 AM

lebedev.ri edited the summary of this revision. Aug 15 2021, 10:50 AM

lebedev.ri added inline comments. Aug 15 2021, 10:58 AM

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
112–114
385–387	This doesn't sound right? https://alive2.llvm.org/ce/z/DMaieM We drop no-wrap flags on all other instructions here, let's just not bother?

lebedev.ri added inline comments. Aug 15 2021, 11:06 AM

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
110	I think you might want to also pass `DT`, `/*CxtI=*/CurrentTruncInst`; i guess we don't yet have `AssumptionCache` here in AIC..

anton-afanasyev marked 3 inline comments as done. Aug 15 2021, 12:05 PM

anton-afanasyev added inline comments.

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
110	No, we don't have `AC`, only `DT`. Do you mean I should add it to AIC to use for `computeKnownBits()`?
112–114	Thanks, done
385–387	Changed, thanks! Also fixed the test to cover this case.
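
Why the no-wrap flags cannot simply be carried over to the narrowed `shl` can be illustrated with a small model. This is a hedged sketch rather than anything taken from the patch or from the linked Alive2 proof: `shlIsNuw` is a hypothetical helper that checks the `nuw` condition (no set bits are shifted out) for a given bitwidth.

```cpp
#include <cassert>
#include <cstdint>

// True if shifting `value` left by `s` inside a `width`-bit integer shifts
// out no set bits, i.e. the condition under which `shl nuw` is not poison.
// Assumes `value` fits in `width` bits and s < width <= 32.
bool shlIsNuw(uint32_t value, unsigned s, unsigned width) {
  assert(s < width && width <= 32);
  const uint64_t mask = (uint64_t{1} << width) - 1;
  const uint64_t shifted = uint64_t{value} << s;
  return (shifted & ~mask) == 0;
}
```

For example, shifting the value 2 left by 15 keeps every set bit in 32 bits (`shlIsNuw(2, 15, 32)` is true) but loses one in 16 bits (`shlIsNuw(2, 15, 16)` is false), so a `nuw` flag that was valid on the `i32` shift would make the `i16` shift poison. Dropping the flags on the narrowed instruction sidesteps the problem.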

Address comments

Harbormaster completed remote builds in B119626: Diff 366518. Aug 15 2021, 12:06 PM

@spatel this appears correct, but is this the right place for this logic, or should it be in getMinBitWidth()?

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
110	Let's just keep this as-is for now, and change later with a test case, i guess?

In D108091#2945739, @lebedev.ri wrote:

@spatel this appears correct, but is this the right place for this logic, or should it be in getMinBitWidth()?

Hmm...I haven't looked at this code much before, so I'm not sure yet.

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
36–37	If the left shift amount >= the truncated bit-width, the result must be 0. Looks like -instcombine gets this, but not -instsimplify. Can we adjust this test to something that does not completely fold away?
49–51	If the intent is to check for a shift amount with no known bits, it would be better to just add a function argument `%y`, so we're not mixing up that constraint with number of uses or some other factor.
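
The observation on lines 36–37 (a left shift by at least the truncated bitwidth always yields 0 after truncation) can be checked exhaustively for the `i8` to `i32` to `i16` shape used in the test. A minimal sketch, with hypothetical helper names and `uint32_t`/`uint16_t` standing in for `i32`/`i16`:

```cpp
#include <cassert>
#include <cstdint>

// Models trunc(shl(zext i8 x to i32, s)) to i16. Requires s < 32.
uint16_t truncOfWideShl(uint8_t x, unsigned s) {
  assert(s < 32);
  return static_cast<uint16_t>(static_cast<uint32_t>(x) << s);
}

// True if the truncated result is 0 for every i8 input at shift amount s.
bool foldsToZero(unsigned s) {
  for (unsigned x = 0; x < 256; ++x)
    if (truncOfWideShl(static_cast<uint8_t>(x), s) != 0)
      return false;
  return true;
}
```

With this model `foldsToZero(16)` holds, while `foldsToZero(15)` does not, because the input 1 shifted by 15 still sets bit 15 of the `i16` result.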

Move to getBestTruncatedType() and add early return

Harbormaster completed remote builds in B119652: Diff 366548. Aug 15 2021, 10:27 PM

In D108091#2945739, @lebedev.ri wrote:

@spatel this appears correct, but is this the right place for this logic, or should it be in getMinBitWidth()?

I've moved the code closer to getMinBitWidth(), so that known bits for shifts are computed only once a correct DAG has been built. Also added an early exit.

Fix test

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
36–37	Hmm, yes, you're right. But this could be adjusted only if the shift amount is variable, and that case is covered by the `@shl_var_not_commute()` function below. The only alternative I see is to remove this test entirely.
49–51	Ok, done

Harbormaster completed remote builds in B119654: Diff 366550. Aug 15 2021, 10:52 PM

anton-afanasyev added a reviewer: aaboud. Aug 15 2021, 11:39 PM

I'm not sure how much this will catch, but LG, thank you.
@spatel ?

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
4	Pedantic nitpick: commute means `mul %x, %y <=> mul %y, %x` Here you want to use `negative_test` in place of `not_commute`, and drop `commute`.

This revision is now accepted and ready to land. Aug 16 2021, 12:21 AM

anton-afanasyev marked an inline comment as done. Aug 16 2021, 12:50 AM

anton-afanasyev added inline comments.

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
4	"Commute" here means that `trunc` and `shl` commute as operators: `trunc ∘ shl = shl ∘ trunc`, i.e. `trunc(shl(, )) = shl(trunc(), trunc())`. But I'll change it, thanks.

lebedev.ri added inline comments. Aug 16 2021, 12:54 AM

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
4	I have literally never seen such usage in all of LLVM. But that may be sample size issue. Usually it's marked as `fold`.

LGTM too - see inline for test suggestions.

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
4	I was also confused by the use of "commute" here. The file name says we're truncating shifts, so I'd just make these all "shl..." or "narrow_shl..." (assuming we'll add the other shift ops in follow-up patches).
36–37	It's fine to leave here. It does suggest a possible optimization (or two) - we could be trying to call SimplifyInst or similar utility to make sure the IR is reduced coming in or going out.

anton-afanasyev marked 4 inline comments as done. Aug 17 2021, 2:15 AM

anton-afanasyev added inline comments.

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
4	Ok, renamed tests.

Rename tests

Harbormaster completed remote builds in B119862: Diff 366842. Aug 17 2021, 2:16 AM

anton-afanasyev mentioned this in rG8f8f9260a95f: [Test][AggressiveInstCombine] Add test for shifts. Aug 17 2021, 2:40 AM

This revision was landed with ongoing or failed builds. Aug 17 2021, 3:17 AM

Closed by commit rG1f3e35b6d165: [AggressiveInstCombine] Add shift left instruction to `TruncInstCombine` DAG (authored by anton-afanasyev).

This revision was automatically updated to reflect the committed changes.

anton-afanasyev marked an inline comment as not done.

anton-afanasyev added a commit: rG1f3e35b6d165: [AggressiveInstCombine] Add shift left instruction to `TruncInstCombine` DAG.

For future self-reference, i should have written these proofs originally:
https://godbolt.org/z/1f3aaYcjW
https://alive2.llvm.org/ce/z/fQEvdF

spatel added inline comments. Aug 17 2021, 11:52 AM

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
282	This logic is getting more complicated in D108201, so I want to step back to this patch to check my understanding:
Can SrcBitWidth be different than OrigBitWidth?
If so, can we write a test for that?
If not, then assert that I->getOperand(1)->getType()->getScalarSizeInBits == OrigBitWidth. Then simplify this code to something like:

KnownBits KnownRHS = computeKnownBits(I->getOperand(1), DL);
unsigned MinBitWidth = KnownRHS.getMaxValue()
                           .uadd_sat(APInt(OrigBitWidth, 1))
                           .getZExtValue();
if (MinBitWidth >= OrigBitWidth)
  return nullptr;
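
The arithmetic behind this suggestion can be sketched without LLVM types. This is an illustration under stated assumptions, not the patch's code: `uaddSat` is a hypothetical 64-bit stand-in for `APInt::uadd_sat` (which saturates at the `APInt`'s own width), and `minBitWidthForShl` is a hypothetical helper that returns an empty `std::optional` where the real code would `return nullptr`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Saturating unsigned addition on 64-bit values.
uint64_t uaddSat(uint64_t a, uint64_t b) {
  const uint64_t sum = a + b; // wraps modulo 2^64 on overflow
  return sum < a ? std::numeric_limits<uint64_t>::max() : sum;
}

// Smallest bitwidth that keeps a left shift well defined when the shift
// amount is known to be at most maxShiftAmount: one more than that maximum.
// Returns std::nullopt if that width is not smaller than the original one,
// i.e. if narrowing is not possible.
std::optional<uint64_t> minBitWidthForShl(uint64_t maxShiftAmount,
                                          uint64_t origBitWidth) {
  assert(origBitWidth > 0);
  const uint64_t minBitWidth = uaddSat(maxShiftAmount, 1);
  if (minBitWidth >= origBitWidth)
    return std::nullopt;
  return minBitWidth;
}
```

A shift amount bounded by 15 in a 32-bit expression gives a minimum width of 16, while a bound of 31 (or an unknown amount) leaves nothing to narrow. The saturating add matters only for the extreme case where the maximum is already the largest representable value.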

Seen some failures and I'm suspecting this commit.

For input like this:

define i32 @foo(i32 %call62, i16 %call63) {
  %conv64142 = zext i32 %call62 to i64
  %conv65 = sext i16 %call63 to i64
  %sh_prom66 = and i64 %conv65, 4294967295
  %shl67 = shl i64 %conv64142, %sh_prom66
  %conv68 = trunc i64 %shl67 to i32
  ret i32 %conv68
}

we now get the following after aggressive instcombine

define i32 @foo(i32 %call62, i16 %call63) {
  %conv65 = sext i16 %call63 to i32
  %sh_prom66 = and i32 %conv65, -1
  %shl67 = shl i32 %call62, %sh_prom66
  ret i32 %shl67
}

which is more poisonous according to https://alive2.llvm.org/ce/z/88tNrs

In my original test case %call63 is 32, so we shift left by 32, which is ok when shifting the i64 but not after having rewritten into a 32-bit shift.

anton-afanasyev mentioned this in rG803270c0c691: [AggressiveInstCombine] Fix unsigned overflow. Aug 17 2021, 10:43 PM

In D108091#2950930, @bjope wrote:
Seen some failures and I'm suspecting this commit.

For input like this:
define i32 @foo(i32 %call62, i16 %call63) {
  %conv64142 = zext i32 %call62 to i64
  %conv65 = sext i16 %call63 to i64
  %sh_prom66 = and i64 %conv65, 4294967295
  %shl67 = shl i64 %conv64142, %sh_prom66
  %conv68 = trunc i64 %shl67 to i32
  ret i32 %conv68
}
we now get the following after aggressive instcombine
define i32 @foo(i32 %call62, i16 %call63) {
  %conv65 = sext i16 %call63 to i32
  %sh_prom66 = and i32 %conv65, -1
  %shl67 = shl i32 %call62, %sh_prom66
  ret i32 %shl67
}
which is more poisonous according to https://alive2.llvm.org/ce/z/88tNrs

In my original test case %call63 is 32, so we shift left by 32, which is ok when shifting the i64 but not after having rewritten into a 32-bit shift.

Thanks @bjope, it was overflow of unsigned = uint64_t assignment, fixed it here: https://reviews.llvm.org/rG803270c0c691
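
The kind of overflow described here can be reproduced in isolation. In the reported IR the shift amount is masked with 4294967295, so its largest possible value is 2^32 - 1 and "maximum plus one" is 2^32, which does not fit in 32 bits. The sketch below is an illustration only: the helper names are hypothetical, `uint32_t` stands in for `unsigned`, and the clamped variant is one possible repair, not necessarily what rG803270c0c691 does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Mirrors the problematic pattern: a 64-bit result stored into a 32-bit
// variable, which silently drops the upper 32 bits.
uint32_t minBitWidthTruncating(uint64_t maxShiftAmount) {
  assert(maxShiftAmount < UINT64_MAX);
  const uint64_t wide = maxShiftAmount + 1;
  return static_cast<uint32_t>(wide);
}

// One possible repair (an assumption for illustration): clamp to the source
// bitwidth while the value is still 64 bits wide, then narrow.
uint32_t minBitWidthClamped(uint64_t maxShiftAmount, uint32_t srcBitWidth) {
  assert(maxShiftAmount < UINT64_MAX);
  const uint64_t wide = maxShiftAmount + 1;
  return static_cast<uint32_t>(std::min<uint64_t>(wide, srcBitWidth));
}
```

With a maximum shift amount of 4294967295 the truncating version yields 0, which makes any narrowing look legal; the clamped version yields the source bitwidth (64 in the reported case), so a check of the form `MinBitWidth >= OrigBitWidth` rejects the transform.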

anton-afanasyev mentioned this in rG0988488ed461: [Test][AggressiveInstCombine] Add one more test for shift truncation. Aug 17 2021, 11:31 PM

anton-afanasyev marked an inline comment as done. Aug 17 2021, 11:32 PM

anton-afanasyev added inline comments.

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
282	Yes, `SrcBitWidth` can be different than `OrigBitWidth` (smaller or larger). `OrigBitWidth` is bitwidth of original (dominating) `trunc` operand (before truncation) whereas `SrcBitWidth` is bitwidth of shift instruction. Added tests for such cases: https://reviews.llvm.org/rG0988488ed461

In D108091#2951392, @anton-afanasyev wrote:
Thanks @bjope, it was overflow of unsigned = uint64_t assignment, fixed it here: https://reviews.llvm.org/rG803270c0c691

I've cherry-picked your fix and the test case that failed now passes again. Thanks!

anton-afanasyev marked an inline comment as done. Aug 19 2021, 7:34 AM

anton-afanasyev added inline comments.

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
282	Btw, I was wrong here: actually `SrcBitWidth == OrigBitWidth`, since `zext`, `sext` and `trunc` can only be leaves of a `trunc`-dominated DAG, so `OrigBitWidth` is the bitwidth of all the DAG's instructions. I've removed the corresponding tests since they were all positive and didn't check anything. (Fixed in the last patch here: https://reviews.llvm.org/D108355)

This seems to be causing a code-size regression (https://bugs.llvm.org/show_bug.cgi?id=52289). It would be great if you could take a look.

In D108091#3088848, @fhahn wrote:

This seems to be causing a code-size regression (https://bugs.llvm.org/show_bug.cgi?id=52289). It would be great if you could take a look.

Sure, thanks, I've already assigned this to myself and referred to llvm.org/PR52253. Actually, PR52289 was split out from PR52253 by Theodoros at my request since they have different causes. I'll fix them one after another.

In D108091#3089444, @anton-afanasyev wrote:

Sure, thanks, I've already assigned this to myself and referred to llvm.org/PR52253. Actually, PR52289 was split out from PR52253 by Theodoros at my request since they have different causes. I'll fix them one after another.

Sounds great, thanks!

anton-afanasyev mentioned this in D113179: [Passes] Move AggressiveInstCombine after InstCombine. Nov 4 2021, 3:47 AM

anton-afanasyev mentioned this in rGc34d157fc739: [Passes] Move AggressiveInstCombine after InstCombine. Dec 4 2021, 3:24 AM

Revision Contents

Path	Size
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp	26 lines
llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll	46 lines

Diff 366550

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp

(23 lines not shown)
  //
  //===----------------------------------------------------------------------===//
  #include "AggressiveInstCombineInternal.h"
  #include "llvm/ADT/STLExtras.h"
  #include "llvm/ADT/Statistic.h"
  #include "llvm/Analysis/ConstantFolding.h"
  #include "llvm/Analysis/TargetLibraryInfo.h"
+ #include "llvm/Analysis/ValueTracking.h"
  #include "llvm/IR/DataLayout.h"
  #include "llvm/IR/Dominators.h"
  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/Instruction.h"
+ #include "llvm/Support/KnownBits.h"
  using namespace llvm;
  #define DEBUG_TYPE "aggressive-instcombine"
  STATISTIC(
      NumDAGsReduced,
      "Number of truncations eliminated by reducing bit width of expression DAG");

(12 lines not shown; context: case Instruction::SExt:)
      // their operands are not relevent.
      break;
    case Instruction::Add:
    case Instruction::Sub:
    case Instruction::Mul:
    case Instruction::And:
    case Instruction::Or:
    case Instruction::Xor:
+   case Instruction::Shl:
      Ops.push_back(I->getOperand(0));
      Ops.push_back(I->getOperand(1));
      break;
    case Instruction::Select:
      Ops.push_back(I->getOperand(1));
      Ops.push_back(I->getOperand(2));
      break;
    default:

(27 lines not shown; context: if (!Stack.empty() && Stack.back() == I) {)
        Worklist.pop_back();
        Stack.pop_back();
        // Insert I to the Info map.
        InstInfoMap.insert(std::make_pair(I, Info()));
        continue;
      }
      if (InstInfoMap.count(I)) {
        Worklist.pop_back();

        continue;
      }
      // Add the instruction to the stack before start handling its operands.

lebedev.ri (inline suggestion, lines 112–114):

  const unsigned SrcBitWidth = KnownRHS.getBitWidth();
- MinBitWidth = KnownRHS.getMaxValue().getZExtValue();
- if (MinBitWidth != std::numeric_limits<unsigned>::max())
-   MinBitWidth++;
+ MinBitWidth = KnownRHS.getMaxValue().uadd_sat(APInt(SrcBitWidth, 1)).getZExtValue();
  MinBitWidth = std::min(MinBitWidth, SrcBitWidth);

anton-afanasyev: Thanks, done

      Stack.push_back(I);
      unsigned Opc = I->getOpcode();
      switch (Opc) {
      case Instruction::Trunc:
      case Instruction::ZExt:
      case Instruction::SExt:
        // trunc(trunc(x)) -> trunc(x)
        // trunc(ext(x)) -> ext(x) if the source type is smaller than the new dest
        // trunc(ext(x)) -> trunc(x) if the source type is larger than the new
        // dest
        break;
      case Instruction::Add:
      case Instruction::Sub:
      case Instruction::Mul:
      case Instruction::And:
      case Instruction::Or:
      case Instruction::Xor:
+     case Instruction::Shl:
      case Instruction::Select: {
        SmallVector<Value *, 2> Operands;
        getRelevantOperands(I, Operands);
        append_range(Worklist, Operands);
        break;
      }
      default:
        // TODO: Can handle more cases here:
        // 1. shufflevector, extractelement, insertelement
        // 2. udiv, urem
-       // 3. shl, lshr, ashr
+       // 3. lshr, ashr
        // 4. phi node(and loop handling)
        // ...
        return false;
      }
    }
    return true;
  }

(116 lines not shown; context: for (auto *U : I->users()))
            return nullptr;
          DesiredBitWidth = ExtInstBitWidth;
        }
    unsigned OrigBitWidth =
        CurrentTruncInst->getOperand(0)->getType()->getScalarSizeInBits();
+   // Initialize MinBitWidth for `shl` instructions with the minimum number
+   // that is greater than shift amount (i.e. shift amount + 1).
+   // Also normalize MinBitWidth not to be greater than source bitwidth.
+   for (auto &Itr : InstInfoMap) {
+     Instruction *I = Itr.first;
+     if (I->getOpcode() == Instruction::Shl) {

+       KnownBits KnownRHS = computeKnownBits(I->getOperand(1), DL);
+       const unsigned SrcBitWidth = KnownRHS.getBitWidth();
+       unsigned MinBitWidth =
+           KnownRHS.getMaxValue().uadd_sat(APInt(SrcBitWidth, 1)).getZExtValue();
+       MinBitWidth = std::min(MinBitWidth, SrcBitWidth);
+       if (MinBitWidth >= OrigBitWidth)
+         return nullptr;
+       Itr.second.MinBitWidth = MinBitWidth;
+     }
+   }
    // Calculate minimum allowed bit-width allowed for shrinking the currently
    // visited truncate's operand.
    unsigned MinBitWidth = getMinBitWidth();
    // Check that we can shrink to smaller bit-width than original one and that
    // it is similar to the DesiredBitWidth is such exists.
    if (MinBitWidth >= OrigBitWidth ||
        (DesiredBitWidth && DesiredBitWidth != MinBitWidth))

(70 lines not shown; context: case Instruction::SExt: {)
        Worklist.push_back(NewCI);
        break;
      }
      case Instruction::Add:
      case Instruction::Sub:
      case Instruction::Mul:
      case Instruction::And:
      case Instruction::Or:
-     case Instruction::Xor: {
+     case Instruction::Xor:
+     case Instruction::Shl: {
        Value *LHS = getReducedOperand(I->getOperand(0), SclTy);
        Value *RHS = getReducedOperand(I->getOperand(1), SclTy);
        Res = Builder.CreateBinOp((Instruction::BinaryOps)Opc, LHS, RHS);
        break;
      }
      case Instruction::Select: {

        Value *Op0 = I->getOperand(0);
        Value *LHS = getReducedOperand(I->getOperand(1), SclTy);
        Value *RHS = getReducedOperand(I->getOperand(2), SclTy);
        Res = Builder.CreateSelect(Op0, LHS, RHS);
        break;
      }
      default:
        llvm_unreachable("Unhandled instruction");
(63 lines not shown)

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll

  ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
  ; RUN: opt < %s -aggressive-instcombine -S | FileCheck %s

  define i16 @shl_1_commute(i8 %x) {
  ; CHECK-LABEL: @shl_1_commute(
- ; CHECK-NEXT: [[ZEXT:%.*]] = zext i8 [[X:%.*]] to i32
- ; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[ZEXT]], 1
- ; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[SHL]] to i16
- ; CHECK-NEXT: ret i16 [[TRUNC]]
+ ; CHECK-NEXT: [[ZEXT:%.*]] = zext i8 [[X:%.*]] to i16
+ ; CHECK-NEXT: [[SHL:%.*]] = shl i16 [[ZEXT]], 1
+ ; CHECK-NEXT: ret i16 [[SHL]]
  ;
    %zext = zext i8 %x to i32
    %shl = shl i32 %zext, 1
    %trunc = trunc i32 %shl to i16
    ret i16 %trunc
  }

  define i16 @shl_15_commute(i8 %x) {
  ; CHECK-LABEL: @shl_15_commute(
- ; CHECK-NEXT: [[ZEXT:%.*]] = zext i8 [[X:%.*]] to i32
- ; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[ZEXT]], 15
- ; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[SHL]] to i16
- ; CHECK-NEXT: ret i16 [[TRUNC]]
+ ; CHECK-NEXT: [[ZEXT:%.*]] = zext i8 [[X:%.*]] to i16
+ ; CHECK-NEXT: [[SHL:%.*]] = shl i16 [[ZEXT]], 15
+ ; CHECK-NEXT: ret i16 [[SHL]]
  ;
    %zext = zext i8 %x to i32
    %shl = shl i32 %zext, 15
    %trunc = trunc i32 %shl to i16
    ret i16 %trunc
  }

  define i16 @shl_16_not_commute(i8 %x) {
  ; CHECK-LABEL: @shl_16_not_commute(
  ; CHECK-NEXT: [[ZEXT:%.*]] = zext i8 [[X:%.*]] to i32
  ; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[ZEXT]], 16
  ; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[SHL]] to i16
  ; CHECK-NEXT: ret i16 [[TRUNC]]
  ;
    %zext = zext i8 %x to i32
    %shl = shl i32 %zext, 16
    %trunc = trunc i32 %shl to i16
    ret i16 %trunc
  }

  define i16 @shl_var_not_commute(i8 %x, i8 %y) {
  ; CHECK-LABEL: @shl_var_not_commute(
  ; CHECK-NEXT: [[ZEXT_X:%.*]] = zext i8 [[X:%.*]] to i32
  ; CHECK-NEXT: [[ZEXT_Y:%.*]] = zext i8 [[Y:%.*]] to i32
  ; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[ZEXT_X]], [[ZEXT_Y]]
  ; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[SHL]] to i16
  ; CHECK-NEXT: ret i16 [[TRUNC]]
  ;
    %zext.x = zext i8 %x to i32
    %zext.y = zext i8 %y to i32
    %shl = shl i32 %zext.x, %zext.y
    %trunc = trunc i32 %shl to i16
    ret i16 %trunc
  }

  define i16 @shl_var_commute(i8 %x, i8 %y) {
  ; CHECK-LABEL: @shl_var_commute(
- ; CHECK-NEXT: [[ZEXT_X:%.*]] = zext i8 [[X:%.*]] to i32
- ; CHECK-NEXT: [[ZEXT_Y:%.*]] = zext i8 [[Y:%.*]] to i32
- ; CHECK-NEXT: [[AND:%.*]] = and i32 [[ZEXT_Y]], 15
- ; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[ZEXT_X]], [[AND]]
- ; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[SHL]] to i16
- ; CHECK-NEXT: ret i16 [[TRUNC]]
+ ; CHECK-NEXT: [[ZEXT_X:%.*]] = zext i8 [[X:%.*]] to i16
+ ; CHECK-NEXT: [[ZEXT_Y:%.*]] = zext i8 [[Y:%.*]] to i16
+ ; CHECK-NEXT: [[AND:%.*]] = and i16 [[ZEXT_Y]], 15
+ ; CHECK-NEXT: [[SHL:%.*]] = shl i16 [[ZEXT_X]], [[AND]]
+ ; CHECK-NEXT: ret i16 [[SHL]]
  ;
    %zext.x = zext i8 %x to i32
    %zext.y = zext i8 %y to i32
    %and = and i32 %zext.y, 15
    %shl = shl i32 %zext.x, %and
    %trunc = trunc i32 %shl to i16
    ret i16 %trunc
  }

  define <2 x i16> @shl_vector_commute(<2 x i8> %x) {
  ; CHECK-LABEL: @shl_vector_commute(
- ; CHECK-NEXT: [[Z:%.*]] = zext <2 x i8> [[X:%.*]] to <2 x i32>
- ; CHECK-NEXT: [[S:%.*]] = shl <2 x i32> [[Z]], <i32 4, i32 10>
- ; CHECK-NEXT: [[T:%.*]] = trunc <2 x i32> [[S]] to <2 x i16>
- ; CHECK-NEXT: ret <2 x i16> [[T]]
+ ; CHECK-NEXT: [[Z:%.*]] = zext <2 x i8> [[X:%.*]] to <2 x i16>
+ ; CHECK-NEXT: [[S:%.*]] = shl <2 x i16> [[Z]], <i16 4, i16 10>
+ ; CHECK-NEXT: ret <2 x i16> [[S]]
  ;
    %z = zext <2 x i8> %x to <2 x i32>
    %s = shl <2 x i32> %z, <i32 4, i32 10>
    %t = trunc <2 x i32> %s to <2 x i16>
    ret <2 x i16> %t
  }

  define <2 x i8> @shl_vector_commute_but_no_new_vector_type(<2 x i8> %x) {
(19 lines not shown)
    %z = zext <2 x i8> %x to <2 x i32>
    %s = shl <2 x i32> %z, <i32 16, i32 5>
    %t = trunc <2 x i32> %s to <2 x i16>
    ret <2 x i16> %t
  }

  define i16 @shl_nuw(i8 %x) {
  ; CHECK-LABEL: @shl_nuw(
- ; CHECK-NEXT: [[Z:%.*]] = zext i8 [[X:%.*]] to i32
- ; CHECK-NEXT: [[S:%.*]] = shl nuw i32 [[Z]], 15
- ; CHECK-NEXT: [[T:%.*]] = trunc i32 [[S]] to i16
- ; CHECK-NEXT: ret i16 [[T]]
+ ; CHECK-NEXT: [[Z:%.*]] = zext i8 [[X:%.*]] to i16
+ ; CHECK-NEXT: [[S:%.*]] = shl i16 [[Z]], 15
+ ; CHECK-NEXT: ret i16 [[S]]
  ;
    %z = zext i8 %x to i32
    %s = shl nuw i32 %z, 15
    %t = trunc i32 %s to i16
    ret i16 %t
  }

  define i16 @shl_nsw(i8 %x) {
  ; CHECK-LABEL: @shl_nsw(
- ; CHECK-NEXT: [[Z:%.*]] = zext i8 [[X:%.*]] to i32
- ; CHECK-NEXT: [[S:%.*]] = shl nsw i32 [[Z]], 15
- ; CHECK-NEXT: [[T:%.*]] = trunc i32 [[S]] to i16
- ; CHECK-NEXT: ret i16 [[T]]
+ ; CHECK-NEXT: [[Z:%.*]] = zext i8 [[X:%.*]] to i16
+ ; CHECK-NEXT: [[S:%.*]] = shl i16 [[Z]], 15
+ ; CHECK-NEXT: ret i16 [[S]]
  ;
    %z = zext i8 %x to i32
    %s = shl nsw i32 %z, 15
    %t = trunc i32 %s to i16
    ret i16 %t
  }

  define i16 @lshr_commute(i16 %x) {
(204 lines not shown)
