This is an archive of the discontinued LLVM Phabricator instance.

Differential D76483

[DivRemPairs] Freeze operands if they can be undef values
ClosedPublic

Authored by aqjune on Mar 20 2020, 2:09 AM.

Details

Reviewers

spatel
lebedev.ri
george.burgess.iv

Commits

rG49f75132bcdf: [DivRemPairs] Freeze operands if they can be undef values

Summary

DivRemPairs is unsound with respect to undef values.

// bb1:
//   %rem = srem %x, %y
// bb2:
//   %div = sdiv %x, %y
// -->
// bb1:
//   %div = sdiv %x, %y
//   %mul = mul %div, %y
//   %rem = sub %x, %mul
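
The rewrite above relies on the identity rem = x - (x / y) * y for truncating division. The following is a minimal C++ sketch, not code from the patch, that checks the identity exhaustively for 8-bit operands. The function name `decompositionMatchesRem` is made up for illustration, and operand pairs for which sdiv/srem are undefined in LLVM IR are skipped.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: checks that the decomposed form x - (x / y) * y
// agrees with x % y for every pair of 8-bit operands where sdiv/srem are
// defined (y != 0, and not INT8_MIN / -1).
bool decompositionMatchesRem() {
  for (int x = INT8_MIN; x <= INT8_MAX; ++x) {
    for (int y = INT8_MIN; y <= INT8_MAX; ++y) {
      if (y == 0 || (x == INT8_MIN && y == -1))
        continue; // undefined behavior for sdiv/srem in LLVM IR
      int div = x / y;   // %div = sdiv %x, %y
      int mul = div * y; // %mul = mul %div, %y
      int rem = x - mul; // %rem = sub %x, %mul
      if (rem != x % y)
        return false;
    }
  }
  return true;
}
```

The identity only holds when both uses of x (and both uses of y) see the same value, which is exactly what breaks once an operand is undef.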

If X can be undef, X should be frozen first.
For example, let's assume that Y = 1 & X = undef:

  %div = sdiv undef, 1 // %div = undef
  %rem = srem undef, 1 // %rem = 0
=>
  %div = sdiv undef, 1 // %div = undef
  %mul = mul %div, 1   // %mul = undef
  %rem = sub %x, %mul  // %rem = undef - undef = undef

http://volta.cs.utah.edu:8080/z/m7Xrx5

Same for Y. If X = 1 and Y = (undef | 1), %rem in src is either 1 or 0,
but %rem in tgt can be one of many integer values.
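
The two cases above can be replayed with a small brute-force model. This is only a sketch under stated assumptions, not the Alive output behind the link and not LLVM code: values are 8-bit, an undef operand is modeled as a list of candidate values where every use may pick its own candidate, and freezing is modeled by forcing all uses to agree. The names `sourceRem`, `targetRem`, `allI8` and `oddI8` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

using Values = std::vector<int>;

// Wrap an int to 8-bit two's complement to mimic i8 arithmetic.
static int wrap8(int v) { return static_cast<int8_t>(static_cast<uint8_t>(v)); }

// Every i8 value: the candidates for a plain undef operand.
Values allI8() {
  Values vs;
  for (int v = INT8_MIN; v <= INT8_MAX; ++v)
    vs.push_back(v);
  return vs;
}

// Every odd i8 value: the candidates for (undef | 1).
Values oddI8() {
  Values vs;
  for (int v = INT8_MIN; v <= INT8_MAX; ++v)
    if (v % 2 != 0)
      vs.push_back(v);
  return vs;
}

// Source program: %rem = srem %x, %y. Each operand is used once.
std::set<int> sourceRem(const Values &xs, const Values &ys) {
  std::set<int> results;
  for (int x : xs)
    for (int y : ys) {
      assert(y != 0);
      results.insert(x % y);
    }
  return results;
}

// Target program: %div = sdiv %x, %y; %mul = mul %div, %y; %rem = sub %x, %mul.
// Without freeze, the two uses of x (and of y) may observe different values.
// With frozen == true, both uses are forced to agree.
std::set<int> targetRem(const Values &xs, const Values &ys, bool frozen) {
  std::set<int> results;
  for (int xDiv : xs)
    for (int yDiv : ys)
      for (int xSub : xs)
        for (int yMul : ys) {
          if (frozen && (xSub != xDiv || yMul != yDiv))
            continue;
          assert(yDiv != 0);
          int div = xDiv / yDiv;
          int mul = wrap8(div * yMul);
          results.insert(wrap8(xSub - mul));
        }
  return results;
}
```

In this model, `sourceRem(allI8(), {1})` contains only 0, while `targetRem(allI8(), {1}, false)` contains every i8 value, so the unfrozen target is not a refinement of the source. Passing `frozen = true` brings the target's result set back to the source's.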

This resolves https://bugs.llvm.org/show_bug.cgi?id=42619 .

This miscompilation would go away if undef values were removed from the IR, but that may take a while.
DivRemPairs runs fairly late in the optimization pipeline, so it seemed a better candidate than the other broken optimizations for a freeze-based fix without a major regression.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aqjune created this revision. Mar 20 2020, 2:09 AM

Herald added a project: Restricted Project. Mar 20 2020, 2:09 AM

Herald added subscribers: llvm-commits, hiraditya, nemanjai.

aqjune edited the summary of this revision. Mar 20 2020, 2:11 AM

aqjune added reviewers: spatel, lebedev.ri, george.burgess.iv.

aqjune added subscribers: nlopes, regehr.

Herald added a subscriber: wuzish. Mar 20 2020, 2:11 AM

Harbormaster failed remote builds in B49848: Diff 251573! Mar 20 2020, 3:13 AM

I agree that this is a good candidate for introducing more freeze.
Just to check my understanding of current behavior: the freeze insts will survive to SelectionDAGBuilder, and there they get removed?

llvm/lib/Transforms/Scalar/DivRemPairs.cpp
307	freezed -> frozen
315	freezed -> frozen

spatel added inline comments. Mar 20 2020, 5:00 AM

llvm/lib/Transforms/Scalar/DivRemPairs.cpp
318	I think this should either be "FreezeInst *" or "auto *" to conform to LLVM coding conventions. (Same for line 325).

I assessed the performance impact by running test-suite and comparing the generated assembly before and after this patch, and found that only one assembly file differed.
It was SingleSource/Regression/C/gcc-c-torture/execute/scal-to-vec1.c , and I found that freeze of a vector constant with non-undef integer elements was not being simplified. I'll update https://reviews.llvm.org/D76010 to cover it.

Just to check my understanding of current behavior: the freeze insts will survive to SelectionDAGBuilder, and there they get removed?

Yes, freeze is simply removed when lowering to SelDag.
The relevant patch is https://reviews.llvm.org/D29014 . I had been postponing landing it on master because the MachineIR people didn't accept the patch.
I think it is a good idea to split the patch D29014 to cover only SelDag first. What do you think?

Address comments

aqjune marked 3 inline comments as done. Mar 21 2020, 12:27 AM

Harbormaster completed remote builds in B49986: Diff 251824. Mar 21 2020, 1:01 AM

aqjune edited the summary of this revision. Mar 21 2020, 1:07 AM

In D76483#1934978, @aqjune wrote:

Yes, freeze is simply removed when lowering to SelDag.
The relevant patch is https://reviews.llvm.org/D29014 . I had been postponing landing it on master because the MachineIR people didn't accept the patch.
I think it is a good idea to split the patch D29014 to cover only SelDag first. What do you think?

Yes, let's split it up, so we can make progress. Thanks for pushing things forward!

LGTM - but let's get the relevant parts of D76010 and D29014 committed first, so we don't introduce any known regressions/logic bugs.
You can make this patch depend on the others here in Phabricator to show the relationship.

This revision is now accepted and ready to land. Mar 22 2020, 8:39 AM

aqjune added parent revisions: D76010: [ValueTracking] Let isGuaranteedNotToBeUndefOrPoison look into more constants/instructions, D29014: [SelDag] Add FREEZE. Mar 22 2020, 10:59 PM

aqjune mentioned this in D29014: [SelDag] Add FREEZE. Mar 23 2020, 12:19 AM

Now that D76702 is merged, I'm looking at how SingleSource/Regression/C/gcc-c-torture/execute/scal-to-vec1.c has changed.

I checked that the generated IR of scal-to-vec1.c with this patch is equivalent to the one without this patch because freeze was constant-folded away.
May I land this? Now I'll work on resolving the regression that was described at D29014.

In D76483#1939666, @aqjune wrote:

I checked that the generated IR of scal-to-vec1.c with this patch is equivalent to the one without this patch because freeze was constant-folded away.
May I land this? Now I'll work on resolving the regression that was described at D29014.

Yes, please land.

aqjune edited parent revisions, added: D76702: [ValueTracking] improve undef/poison analysis for constant vectors; removed: D76010: [ValueTracking] Let isGuaranteedNotToBeUndefOrPoison look into more constants/instructions. Mar 24 2020, 11:46 AM

Closed by commit rG49f75132bcdf: [DivRemPairs] Freeze operands if they can be undef values (authored by aqjune). Mar 24 2020, 11:50 AM

This revision was automatically updated to reflect the committed changes.

Hi, we're seeing a performance drop of 1.3% on SPEC 2017 mcf_r (compiled with LTO enabled) on AArch64 that bisects down to this patch. I'm testing whether D76010 happens to fix this regression (I'll comment when I get the results), but if not then this might need some investigation to see what's going on.

In D76483#1946766, @sanwou01 wrote:

Hi, we're seeing a performance drop of 1.3% on SPEC 2017 mcf_r (compiled with LTO enabled) on AArch64 that bisects down to this patch. I'm testing whether D76010 happens to fix this regression (I'll comment when I get the results), but if not then this might need some investigation to see what's going on.

Hi, thank you for the info. Once the blocked transformation is identified, I'll help find a solution to the problem.

aqjune mentioned this in D77076: [InstSimplify] Allow some arithmetic optimizations for add/sub/div/rem look through freeze. Mar 30 2020, 11:15 AM

Hi, I can confirm that D76010 unfortunately doesn't fix the regression.

I've narrowed the problem down to Loop Strength Reduction. It looks like SCEV can't see "through" the freeze node.

I'll try to get a reduced reproducer that shows the problem; and I'll be happy to test a patch.

Thanks!

In D76483#1950450, @sanwou01 wrote:

Hi, I can confirm that D76010 unfortunately doesn't fix the regression.

I've narrowed the problem down to Loop Strength Reduction. It looks like SCEV can't see "through" the freeze node.

I'll try to get a reduced reproducer that shows the problem; and I'll be happy to test a patch.

Thanks!

I see. This link might be helpful: https://reviews.llvm.org/D70623
A possible workaround would be to remove the nsw/nuw flags from the induction variable and freeze the initial value.

loop:
  %i = phi %init, %i.inc.fr
  %i.inc = add nsw 1, %i
  %i.inc.fr = freeze %i.inc // we can optimize this out by removing nsw flag from %i.inc and freezing %init
  br (%i.inc.fr < %n), loop, exit
=>
  %init.fr = freeze %init
loop:
  %i = phi %init.fr, %i.inc
  %i.inc = add 1, %i
  br (%i.inc < %n), loop, exit
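
To make the workaround above more concrete, here is a rough C++ model of the two loop shapes. It is a sketch under several assumptions and not LLVM code: values are 8-bit, poison is modeled as an empty `std::optional`, and freeze of poison returns a caller-supplied `pick` that stays the same for every execution (in LLVM each execution of freeze may choose differently). All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// An i8 value, where std::nullopt models poison (assumption of this sketch).
using I8 = std::optional<int>;

// Wrap an int to 8-bit two's complement to mimic i8 arithmetic.
static int wrap8(int v) { return static_cast<int8_t>(static_cast<uint8_t>(v)); }

// add nsw: the result is poison if the signed addition overflows.
static I8 addNsw(I8 a, I8 b) {
  if (!a || !b)
    return std::nullopt;
  int sum = *a + *b;
  if (sum < INT8_MIN || sum > INT8_MAX)
    return std::nullopt;
  return sum;
}

// freeze: passes a non-poison value through, otherwise yields `pick`.
static int freeze(I8 v, int pick) {
  assert(pick >= INT8_MIN && pick <= INT8_MAX);
  return v ? *v : pick;
}

// Before: the incremented value is frozen inside the loop.
int tripCountFreezeInLoop(I8 init, int n, int pick) {
  I8 i = init;
  int trips = 0;
  for (;;) {
    ++trips;
    I8 inc = addNsw(1, i);           // %i.inc = add nsw 1, %i
    int incFr = freeze(inc, pick);   // %i.inc.fr = freeze %i.inc
    if (!(incFr < n))
      return trips;
    i = incFr;
  }
}

// After: nsw is dropped and only the initial value is frozen.
int tripCountFreezeHoisted(I8 init, int n, int pick) {
  int i = freeze(init, pick);        // %init.fr = freeze %init
  int trips = 0;
  for (;;) {
    ++trips;
    int inc = wrap8(i + 1);          // %i.inc = add 1, %i (wrapping)
    if (!(inc < n))
      return trips;
    i = inc;
  }
}
```

For a non-poison `init` whose increment does not overflow, both functions run the same number of iterations regardless of `pick`. When the first increment overflows, the hoisted form wraps to a value that the in-loop freeze would also have been allowed to choose, which is the intuition for why the rewrite is acceptable in this model.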

BTW, IIUC one of the main motivations of DivRemPairs was to enable better codegen. If there is any (aggressive) optimization after DivRemPairs, couldn't it shuffle the arranged instructions?
I wonder whether making DivRemPairs fire after other optimizations in LTO makes sense.

[...] It looks like SCEV can't see "through" the freeze node. [...]

I see. This link might be helpful: https://reviews.llvm.org/D70623

Ah, so it is quite difficult to extend SCEV. (at least, it's making my head hurt already!)

I tried turning off DivRemPairs during the pre-link optimizations, but this also broke the optimization in SCEV/loop-reduce, so I don't think it will help to move DivRemPairs later in the pipeline.

I'm not sure I understand the other workaround.

However, I did manage to reduce the regression:

$ clang --target=aarch64-linux-gnu -O3 -flto -fno-strict-aliasing -o reduced.o -c reduced.c
$ clang --target=aarch64-linux-gnu -O3 -flto -fno-strict-aliasing -o reduced reduced.o

reduced.c:

typedef long a;
struct arc {
  int b;
} * h;
typedef struct {
  a c;
  struct arc arcs;
  a d, e, f;
} g;
g j;
a k, m;
g *l;
a n();
int main() { n(&j); }
a o(a p) {
  a q = p % l->d;
  if (q > l->e)
    k = p / l->d + (l->e * l->f + (q - l->e) * (l->f - 1));
  else
    k = p / l->d + l->f;
  return k;
}
a n(g *p) {
  a i;
  for (i = 0; i < p->c; i++, h = &p->arcs + o(++m))
    h->b = m;
  return 0;
}

Thank you for the reduced example. I'll have a look.

Thanks, much appreciated!

sanwou01 mentioned this in D77523: Add CanonicalizeFreezeInLoops pass. Apr 17 2020, 4:21 AM

Hi,

I wrote https://bugs.llvm.org/show_bug.cgi?id=45885 about a crash which starts happening with this patch.

My guess is that isGuaranteedNotToBeUndefOrPoison doesn't like "weird" code that may appear in blocks that are not reachable from entry.

Hi. okay, I'll have a look.

sanwou01 mentioned this in D77524: [TargetPassConfig] Add CanonicalizeFreezeInLoops before LSR. May 28 2020, 10:15 AM

aqjune mentioned this in D84940: [JumpThreading] Conditionally freeze its condition when unfolding select. Jul 30 2020, 6:00 PM

Please see: https://bugs.llvm.org/show_bug.cgi?id=50573.

Revision Contents

Path                                                                Size
llvm/lib/Transforms/Scalar/DivRemPairs.cpp                          24 lines
llvm/test/Transforms/DivRemPairs/PowerPC/div-expanded-rem-pair.ll   16 lines
llvm/test/Transforms/DivRemPairs/PowerPC/div-rem-pairs.ll           72 lines
llvm/test/Transforms/DivRemPairs/X86/div-rem-pairs.ll               8 lines

Diff 252386

llvm/lib/Transforms/Scalar/DivRemPairs.cpp

[...]
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Scalar/DivRemPairs.h"		#include "llvm/Transforms/Scalar/DivRemPairs.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/GlobalsModRef.h"		#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/DebugCounter.h"		#include "llvm/Support/DebugCounter.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/BypassSlowDivision.h"		#include "llvm/Transforms/Utils/BypassSlowDivision.h"
[...]	if (HasDivRemOp) {
// If the div and rem are in the same block, we do the same transform,		// If the div and rem are in the same block, we do the same transform,
// but any code movement would be within the same block.		// but any code movement would be within the same block.

if (!DivDominates)		if (!DivDominates)
DivInst->moveBefore(RemInst);		DivInst->moveBefore(RemInst);
Mul->insertAfter(RemInst);		Mul->insertAfter(RemInst);
Sub->insertAfter(Mul);		Sub->insertAfter(Mul);

		// If X can be undef, X should be frozen first.
		// For example, let's assume that Y = 1 & X = undef:
		//   %div = sdiv undef, 1 // %div = undef
		//   %rem = srem undef, 1 // %rem = 0
		// =>
		//   %div = sdiv undef, 1 // %div = undef
		//   %mul = mul %div, 1   // %mul = undef
		//   %rem = sub %x, %mul  // %rem = undef - undef = undef
		// If X is not frozen, %rem becomes undef after transformation.
		// TODO: We need a undef-specific checking function in ValueTracking
		if (!isGuaranteedNotToBeUndefOrPoison(X, DivInst, &DT)) {
		  auto *FrX = new FreezeInst(X, X->getName() + ".frozen", DivInst);
		  DivInst->setOperand(0, FrX);
		  Sub->setOperand(0, FrX);
		}
		// Same for Y. If X = 1 and Y = (undef | 1), %rem in src is either 1 or 0,
		// but %rem in tgt can be one of many integer values.
		if (!isGuaranteedNotToBeUndefOrPoison(Y, DivInst, &DT)) {
		  auto *FrY = new FreezeInst(Y, Y->getName() + ".frozen", DivInst);
		  DivInst->setOperand(1, FrY);
		  Mul->setOperand(1, FrY);
		}

// Now kill the explicit remainder. We have replaced it with:		// Now kill the explicit remainder. We have replaced it with:
// (sub X, (mul (div X, Y), Y)		// (sub X, (mul (div X, Y), Y)
Sub->setName(RemInst->getName() + ".decomposed");		Sub->setName(RemInst->getName() + ".decomposed");
Instruction *OrigRemInst = RemInst;		Instruction *OrigRemInst = RemInst;
// Update AssertingVH<> with new instruction so it doesn't assert.		// Update AssertingVH<> with new instruction so it doesn't assert.
RemInst = Sub;		RemInst = Sub;
// And replace the original instruction with the new one.		// And replace the original instruction with the new one.
OrigRemInst->replaceAllUsesWith(Sub);		OrigRemInst->replaceAllUsesWith(Sub);
[...]

llvm/test/Transforms/DivRemPairs/PowerPC/div-expanded-rem-pair.ll

[...]	end:
ret i8 %ret		ret i8 %ret
}		}

; Be careful with RAUW/invalidation if this is a srem-of-srem.		; Be careful with RAUW/invalidation if this is a srem-of-srem.

define i32 @srem_of_srem_unexpanded(i32 %X, i32 %Y, i32 %Z) {		define i32 @srem_of_srem_unexpanded(i32 %X, i32 %Y, i32 %Z) {
; CHECK-LABEL: @srem_of_srem_unexpanded(		; CHECK-LABEL: @srem_of_srem_unexpanded(
; CHECK-NEXT: [[T0:%.*]] = mul nsw i32 [[Z:%.*]], [[Y:%.*]]		; CHECK-NEXT: [[T0:%.*]] = mul nsw i32 [[Z:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[T1:%.*]] = sdiv i32 [[X:%.*]], [[T0]]		; CHECK-NEXT: [[X_FROZEN:%.*]] = freeze i32 [[X:%.*]]
		; CHECK-NEXT: [[T0_FROZEN:%.*]] = freeze i32 [[T0]]
		; CHECK-NEXT: [[T1:%.*]] = sdiv i32 [[X_FROZEN]], [[T0_FROZEN]]
; CHECK-NEXT: [[T2:%.*]] = mul nsw i32 [[T0]], [[T1]]		; CHECK-NEXT: [[T2:%.*]] = mul nsw i32 [[T0]], [[T1]]
; CHECK-NEXT: [[TMP1:%.*]] = mul i32 [[T1]], [[T0]]		; CHECK-NEXT: [[TMP1:%.*]] = mul i32 [[T1]], [[T0_FROZEN]]
; CHECK-NEXT: [[T3_DECOMPOSED:%.*]] = sub i32 [[X]], [[TMP1]]		; CHECK-NEXT: [[T3_DECOMPOSED:%.*]] = sub i32 [[X_FROZEN]], [[TMP1]]
; CHECK-NEXT: [[T4:%.*]] = sdiv i32 [[T3_DECOMPOSED]], [[Y]]		; CHECK-NEXT: [[T3_DECOMPOSED_FROZEN:%.*]] = freeze i32 [[T3_DECOMPOSED]]
		; CHECK-NEXT: [[Y_FROZEN:%.*]] = freeze i32 [[Y]]
		; CHECK-NEXT: [[T4:%.*]] = sdiv i32 [[T3_DECOMPOSED_FROZEN]], [[Y_FROZEN]]
; CHECK-NEXT: [[T5:%.*]] = mul nsw i32 [[T4]], [[Y]]		; CHECK-NEXT: [[T5:%.*]] = mul nsw i32 [[T4]], [[Y]]
; CHECK-NEXT: [[TMP2:%.*]] = mul i32 [[T4]], [[Y]]		; CHECK-NEXT: [[TMP2:%.*]] = mul i32 [[T4]], [[Y_FROZEN]]
; CHECK-NEXT: [[T6_DECOMPOSED:%.*]] = sub i32 [[T3_DECOMPOSED]], [[TMP2]]		; CHECK-NEXT: [[T6_DECOMPOSED:%.*]] = sub i32 [[T3_DECOMPOSED_FROZEN]], [[TMP2]]
; CHECK-NEXT: ret i32 [[T6_DECOMPOSED]]		; CHECK-NEXT: ret i32 [[T6_DECOMPOSED]]
;		;
%t0 = mul nsw i32 %Z, %Y		%t0 = mul nsw i32 %Z, %Y
%t1 = sdiv i32 %X, %t0		%t1 = sdiv i32 %X, %t0
%t2 = mul nsw i32 %t0, %t1		%t2 = mul nsw i32 %t0, %t1
%t3 = srem i32 %X, %t0		%t3 = srem i32 %X, %t0
%t4 = sdiv i32 %t3, %Y		%t4 = sdiv i32 %t3, %Y
%t5 = mul nsw i32 %t4, %Y		%t5 = mul nsw i32 %t4, %Y
[...]

llvm/test/Transforms/DivRemPairs/PowerPC/div-rem-pairs.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -div-rem-pairs -S -mtriple=powerpc64-unknown-unknown | FileCheck %s		; RUN: opt < %s -div-rem-pairs -S -mtriple=powerpc64-unknown-unknown | FileCheck %s

declare void @foo(i32, i32)		declare void @foo(i32, i32)

define void @decompose_illegal_srem_same_block(i32 %a, i32 %b) {		define void @decompose_illegal_srem_same_block(i32 %a, i32 %b) {
; CHECK-LABEL: @decompose_illegal_srem_same_block(		; CHECK-LABEL: @decompose_illegal_srem_same_block(
; CHECK-NEXT: [[DIV:%.*]] = sdiv i32 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[A_FROZEN:%.*]] = freeze i32 [[A:%.*]]
; CHECK-NEXT: [[TMP1:%.*]] = mul i32 [[DIV]], [[B]]		; CHECK-NEXT: [[B_FROZEN:%.*]] = freeze i32 [[B:%.*]]
; CHECK-NEXT: [[REM_DECOMPOSED:%.*]] = sub i32 [[A]], [[TMP1]]		; CHECK-NEXT: [[DIV:%.*]] = sdiv i32 [[A_FROZEN]], [[B_FROZEN]]
		; CHECK-NEXT: [[TMP1:%.*]] = mul i32 [[DIV]], [[B_FROZEN]]
		; CHECK-NEXT: [[REM_DECOMPOSED:%.*]] = sub i32 [[A_FROZEN]], [[TMP1]]
; CHECK-NEXT: call void @foo(i32 [[REM_DECOMPOSED]], i32 [[DIV]])		; CHECK-NEXT: call void @foo(i32 [[REM_DECOMPOSED]], i32 [[DIV]])
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%rem = srem i32 %a, %b		%rem = srem i32 %a, %b
%div = sdiv i32 %a, %b		%div = sdiv i32 %a, %b
call void @foo(i32 %rem, i32 %div)		call void @foo(i32 %rem, i32 %div)
ret void		ret void
}		}

define void @decompose_illegal_urem_same_block(i32 %a, i32 %b) {		define void @decompose_illegal_urem_same_block(i32 %a, i32 %b) {
; CHECK-LABEL: @decompose_illegal_urem_same_block(		; CHECK-LABEL: @decompose_illegal_urem_same_block(
; CHECK-NEXT: [[DIV:%.*]] = udiv i32 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[A_FROZEN:%.*]] = freeze i32 [[A:%.*]]
; CHECK-NEXT: [[TMP1:%.*]] = mul i32 [[DIV]], [[B]]		; CHECK-NEXT: [[B_FROZEN:%.*]] = freeze i32 [[B:%.*]]
; CHECK-NEXT: [[REM_DECOMPOSED:%.*]] = sub i32 [[A]], [[TMP1]]		; CHECK-NEXT: [[DIV:%.*]] = udiv i32 [[A_FROZEN]], [[B_FROZEN]]
		; CHECK-NEXT: [[TMP1:%.*]] = mul i32 [[DIV]], [[B_FROZEN]]
		; CHECK-NEXT: [[REM_DECOMPOSED:%.*]] = sub i32 [[A_FROZEN]], [[TMP1]]
; CHECK-NEXT: call void @foo(i32 [[REM_DECOMPOSED]], i32 [[DIV]])		; CHECK-NEXT: call void @foo(i32 [[REM_DECOMPOSED]], i32 [[DIV]])
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%div = udiv i32 %a, %b		%div = udiv i32 %a, %b
%rem = urem i32 %a, %b		%rem = urem i32 %a, %b
call void @foo(i32 %rem, i32 %div)		call void @foo(i32 %rem, i32 %div)
ret void		ret void
}		}

; Hoist and optionally decompose the sdiv because it's safe and free.		; Hoist and optionally decompose the sdiv because it's safe and free.
; PR31028 - https://bugs.llvm.org/show_bug.cgi?id=31028		; PR31028 - https://bugs.llvm.org/show_bug.cgi?id=31028

define i32 @hoist_sdiv(i32 %a, i32 %b) {		define i32 @hoist_sdiv(i32 %a, i32 %b) {
; CHECK-LABEL: @hoist_sdiv(		; CHECK-LABEL: @hoist_sdiv(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[DIV:%.*]] = sdiv i32 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[A_FROZEN:%.*]] = freeze i32 [[A:%.*]]
; CHECK-NEXT: [[TMP0:%.*]] = mul i32 [[DIV]], [[B]]		; CHECK-NEXT: [[B_FROZEN:%.*]] = freeze i32 [[B:%.*]]
; CHECK-NEXT: [[REM_DECOMPOSED:%.*]] = sub i32 [[A]], [[TMP0]]		; CHECK-NEXT: [[DIV:%.*]] = sdiv i32 [[A_FROZEN]], [[B_FROZEN]]
		; CHECK-NEXT: [[TMP0:%.*]] = mul i32 [[DIV]], [[B_FROZEN]]
		; CHECK-NEXT: [[REM_DECOMPOSED:%.*]] = sub i32 [[A_FROZEN]], [[TMP0]]
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[REM_DECOMPOSED]], 42		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[REM_DECOMPOSED]], 42
; CHECK-NEXT: br i1 [[CMP]], label [[IF:%.*]], label [[END:%.*]]		; CHECK-NEXT: br i1 [[CMP]], label [[IF:%.*]], label [[END:%.*]]
; CHECK: if:		; CHECK: if:
; CHECK-NEXT: br label [[END]]		; CHECK-NEXT: br label [[END]]
; CHECK: end:		; CHECK: end:
; CHECK-NEXT: [[RET:%.*]] = phi i32 [ [[DIV]], [[IF]] ], [ 3, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[RET:%.*]] = phi i32 [ [[DIV]], [[IF]] ], [ 3, [[ENTRY:%.*]] ]
; CHECK-NEXT: ret i32 [[RET]]		; CHECK-NEXT: ret i32 [[RET]]
;		;
[...]	end:
ret i32 %ret		ret i32 %ret
}		}

; Hoist and optionally decompose the udiv because it's safe and free.		; Hoist and optionally decompose the udiv because it's safe and free.

define i64 @hoist_udiv(i64 %a, i64 %b) {		define i64 @hoist_udiv(i64 %a, i64 %b) {
; CHECK-LABEL: @hoist_udiv(		; CHECK-LABEL: @hoist_udiv(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[DIV:%.*]] = udiv i64 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[A_FROZEN:%.*]] = freeze i64 [[A:%.*]]
; CHECK-NEXT: [[TMP0:%.*]] = mul i64 [[DIV]], [[B]]		; CHECK-NEXT: [[B_FROZEN:%.*]] = freeze i64 [[B:%.*]]
; CHECK-NEXT: [[REM_DECOMPOSED:%.*]] = sub i64 [[A]], [[TMP0]]		; CHECK-NEXT: [[DIV:%.*]] = udiv i64 [[A_FROZEN]], [[B_FROZEN]]
		; CHECK-NEXT: [[TMP0:%.*]] = mul i64 [[DIV]], [[B_FROZEN]]
		; CHECK-NEXT: [[REM_DECOMPOSED:%.*]] = sub i64 [[A_FROZEN]], [[TMP0]]
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i64 [[REM_DECOMPOSED]], 42		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i64 [[REM_DECOMPOSED]], 42
; CHECK-NEXT: br i1 [[CMP]], label [[IF:%.*]], label [[END:%.*]]		; CHECK-NEXT: br i1 [[CMP]], label [[IF:%.*]], label [[END:%.*]]
; CHECK: if:		; CHECK: if:
; CHECK-NEXT: br label [[END]]		; CHECK-NEXT: br label [[END]]
; CHECK: end:		; CHECK: end:
; CHECK-NEXT: [[RET:%.*]] = phi i64 [ [[DIV]], [[IF]] ], [ 3, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[RET:%.*]] = phi i64 [ [[DIV]], [[IF]] ], [ 3, [[ENTRY:%.*]] ]
; CHECK-NEXT: ret i64 [[RET]]		; CHECK-NEXT: ret i64 [[RET]]
;		;
[...]	end:
ret i64 %ret		ret i64 %ret
}		}

; Hoist the srem if it's safe and free, otherwise decompose it.		; Hoist the srem if it's safe and free, otherwise decompose it.

define i16 @hoist_srem(i16 %a, i16 %b) {		define i16 @hoist_srem(i16 %a, i16 %b) {
; CHECK-LABEL: @hoist_srem(		; CHECK-LABEL: @hoist_srem(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[DIV:%.*]] = sdiv i16 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[A_FROZEN:%.*]] = freeze i16 [[A:%.*]]
		; CHECK-NEXT: [[B_FROZEN:%.*]] = freeze i16 [[B:%.*]]
		; CHECK-NEXT: [[DIV:%.*]] = sdiv i16 [[A_FROZEN]], [[B_FROZEN]]
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[DIV]], 42		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[DIV]], 42
; CHECK-NEXT: br i1 [[CMP]], label [[IF:%.*]], label [[END:%.*]]		; CHECK-NEXT: br i1 [[CMP]], label [[IF:%.*]], label [[END:%.*]]
; CHECK: if:		; CHECK: if:
; CHECK-NEXT: [[TMP0:%.*]] = mul i16 [[DIV]], [[B]]		; CHECK-NEXT: [[TMP0:%.*]] = mul i16 [[DIV]], [[B_FROZEN]]
; CHECK-NEXT: [[REM_DECOMPOSED:%.*]] = sub i16 [[A]], [[TMP0]]		; CHECK-NEXT: [[REM_DECOMPOSED:%.*]] = sub i16 [[A_FROZEN]], [[TMP0]]
; CHECK-NEXT: br label [[END]]		; CHECK-NEXT: br label [[END]]
; CHECK: end:		; CHECK: end:
; CHECK-NEXT: [[RET:%.*]] = phi i16 [ [[REM_DECOMPOSED]], [[IF]] ], [ 3, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[RET:%.*]] = phi i16 [ [[REM_DECOMPOSED]], [[IF]] ], [ 3, [[ENTRY:%.*]] ]
; CHECK-NEXT: ret i16 [[RET]]		; CHECK-NEXT: ret i16 [[RET]]
;		;
entry:		entry:
%div = sdiv i16 %a, %b		%div = sdiv i16 %a, %b
%cmp = icmp eq i16 %div, 42		%cmp = icmp eq i16 %div, 42
br i1 %cmp, label %if, label %end		br i1 %cmp, label %if, label %end

if:		if:
%rem = srem i16 %a, %b		%rem = srem i16 %a, %b
br label %end		br label %end

end:		end:
%ret = phi i16 [ %rem, %if ], [ 3, %entry ]		%ret = phi i16 [ %rem, %if ], [ 3, %entry ]
ret i16 %ret		ret i16 %ret
}		}

; Hoist the urem if it's safe and free, otherwise decompose it.		; Hoist the urem if it's safe and free, otherwise decompose it.

define i8 @hoist_urem(i8 %a, i8 %b) {		define i8 @hoist_urem(i8 %a, i8 %b) {
; CHECK-LABEL: @hoist_urem(		; CHECK-LABEL: @hoist_urem(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[DIV:%.*]] = udiv i8 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[A_FROZEN:%.*]] = freeze i8 [[A:%.*]]
		; CHECK-NEXT: [[B_FROZEN:%.*]] = freeze i8 [[B:%.*]]
		; CHECK-NEXT: [[DIV:%.*]] = udiv i8 [[A_FROZEN]], [[B_FROZEN]]
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8 [[DIV]], 42		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8 [[DIV]], 42
; CHECK-NEXT: br i1 [[CMP]], label [[IF:%.*]], label [[END:%.*]]		; CHECK-NEXT: br i1 [[CMP]], label [[IF:%.*]], label [[END:%.*]]
; CHECK: if:		; CHECK: if:
; CHECK-NEXT: [[TMP0:%.*]] = mul i8 [[DIV]], [[B]]		; CHECK-NEXT: [[TMP0:%.*]] = mul i8 [[DIV]], [[B_FROZEN]]
; CHECK-NEXT: [[REM_DECOMPOSED:%.*]] = sub i8 [[A]], [[TMP0]]		; CHECK-NEXT: [[REM_DECOMPOSED:%.*]] = sub i8 [[A_FROZEN]], [[TMP0]]
; CHECK-NEXT: br label [[END]]		; CHECK-NEXT: br label [[END]]
; CHECK: end:		; CHECK: end:
; CHECK-NEXT: [[RET:%.*]] = phi i8 [ [[REM_DECOMPOSED]], [[IF]] ], [ 3, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[RET:%.*]] = phi i8 [ [[REM_DECOMPOSED]], [[IF]] ], [ 3, [[ENTRY:%.*]] ]
; CHECK-NEXT: ret i8 [[RET]]		; CHECK-NEXT: ret i8 [[RET]]
;		;
entry:		entry:
%div = udiv i8 %a, %b		%div = udiv i8 %a, %b
%cmp = icmp eq i8 %div, 42		%cmp = icmp eq i8 %div, 42
br i1 %cmp, label %if, label %end		br i1 %cmp, label %if, label %end

if:		if:
%rem = urem i8 %a, %b		%rem = urem i8 %a, %b
br label %end		br label %end

end:		end:
%ret = phi i8 [ %rem, %if ], [ 3, %entry ]		%ret = phi i8 [ %rem, %if ], [ 3, %entry ]
ret i8 %ret		ret i8 %ret
}		}

; Be careful with RAUW/invalidation if this is a srem-of-srem.		; Be careful with RAUW/invalidation if this is a srem-of-srem.

define i32 @srem_of_srem_unexpanded(i32 %X, i32 %Y, i32 %Z) {		define i32 @srem_of_srem_unexpanded(i32 %X, i32 %Y, i32 %Z) {
; CHECK-LABEL: @srem_of_srem_unexpanded(		; CHECK-LABEL: @srem_of_srem_unexpanded(
; CHECK-NEXT: [[T0:%.*]] = mul nsw i32 [[Z:%.*]], [[Y:%.*]]		; CHECK-NEXT: [[T0:%.*]] = mul nsw i32 [[Z:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[T1:%.*]] = sdiv i32 [[X:%.*]], [[T0]]		; CHECK-NEXT: [[X_FROZEN:%.*]] = freeze i32 [[X:%.*]]
		; CHECK-NEXT: [[T0_FROZEN:%.*]] = freeze i32 [[T0]]
		; CHECK-NEXT: [[T1:%.*]] = sdiv i32 [[X_FROZEN]], [[T0_FROZEN]]
; CHECK-NEXT: [[T2:%.*]] = mul nsw i32 [[T0]], [[T1]]		; CHECK-NEXT: [[T2:%.*]] = mul nsw i32 [[T0]], [[T1]]
; CHECK-NEXT: [[TMP1:%.*]] = mul i32 [[T1]], [[T0]]		; CHECK-NEXT: [[TMP1:%.*]] = mul i32 [[T1]], [[T0_FROZEN]]
; CHECK-NEXT: [[T3_DECOMPOSED:%.*]] = sub i32 [[X]], [[TMP1]]		; CHECK-NEXT: [[T3_DECOMPOSED:%.*]] = sub i32 [[X_FROZEN]], [[TMP1]]
; CHECK-NEXT: [[T4:%.*]] = sdiv i32 [[T3_DECOMPOSED]], [[Y]]		; CHECK-NEXT: [[T3_DECOMPOSED_FROZEN:%.*]] = freeze i32 [[T3_DECOMPOSED]]
		; CHECK-NEXT: [[Y_FROZEN:%.*]] = freeze i32 [[Y]]
		; CHECK-NEXT: [[T4:%.*]] = sdiv i32 [[T3_DECOMPOSED_FROZEN]], [[Y_FROZEN]]
; CHECK-NEXT: [[T5:%.*]] = mul nsw i32 [[T4]], [[Y]]		; CHECK-NEXT: [[T5:%.*]] = mul nsw i32 [[T4]], [[Y]]
; CHECK-NEXT: [[TMP2:%.*]] = mul i32 [[T4]], [[Y]]		; CHECK-NEXT: [[TMP2:%.*]] = mul i32 [[T4]], [[Y_FROZEN]]
; CHECK-NEXT: [[T6_DECOMPOSED:%.*]] = sub i32 [[T3_DECOMPOSED]], [[TMP2]]		; CHECK-NEXT: [[T6_DECOMPOSED:%.*]] = sub i32 [[T3_DECOMPOSED_FROZEN]], [[TMP2]]
; CHECK-NEXT: ret i32 [[T6_DECOMPOSED]]		; CHECK-NEXT: ret i32 [[T6_DECOMPOSED]]
;		;
%t0 = mul nsw i32 %Z, %Y		%t0 = mul nsw i32 %Z, %Y
%t1 = sdiv i32 %X, %t0		%t1 = sdiv i32 %X, %t0
%t2 = mul nsw i32 %t0, %t1		%t2 = mul nsw i32 %t0, %t1
%t3 = srem i32 %X, %t0		%t3 = srem i32 %X, %t0
%t4 = sdiv i32 %t3, %Y		%t4 = sdiv i32 %t3, %Y
%t5 = mul nsw i32 %t4, %Y		%t5 = mul nsw i32 %t4, %Y
[...]	end:
ret i32 %ret		ret i32 %ret
}		}

; If the target doesn't have a unified div/rem op for the type, decompose rem in-place to mul+sub.		; If the target doesn't have a unified div/rem op for the type, decompose rem in-place to mul+sub.

define i128 @dont_hoist_urem(i128 %a, i128 %b) {		define i128 @dont_hoist_urem(i128 %a, i128 %b) {
; CHECK-LABEL: @dont_hoist_urem(		; CHECK-LABEL: @dont_hoist_urem(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[DIV:%.*]] = udiv i128 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[A_FROZEN:%.*]] = freeze i128 [[A:%.*]]
		; CHECK-NEXT: [[B_FROZEN:%.*]] = freeze i128 [[B:%.*]]
		; CHECK-NEXT: [[DIV:%.*]] = udiv i128 [[A_FROZEN]], [[B_FROZEN]]
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i128 [[DIV]], 42		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i128 [[DIV]], 42
; CHECK-NEXT: br i1 [[CMP]], label [[IF:%.*]], label [[END:%.*]]		; CHECK-NEXT: br i1 [[CMP]], label [[IF:%.*]], label [[END:%.*]]
; CHECK: if:		; CHECK: if:
; CHECK-NEXT: [[TMP0:%.*]] = mul i128 [[DIV]], [[B]]		; CHECK-NEXT: [[TMP0:%.*]] = mul i128 [[DIV]], [[B_FROZEN]]
; CHECK-NEXT: [[REM_DECOMPOSED:%.*]] = sub i128 [[A]], [[TMP0]]		; CHECK-NEXT: [[REM_DECOMPOSED:%.*]] = sub i128 [[A_FROZEN]], [[TMP0]]
; CHECK-NEXT: br label [[END]]		; CHECK-NEXT: br label [[END]]
; CHECK: end:		; CHECK: end:
; CHECK-NEXT: [[RET:%.*]] = phi i128 [ [[REM_DECOMPOSED]], [[IF]] ], [ 3, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[RET:%.*]] = phi i128 [ [[REM_DECOMPOSED]], [[IF]] ], [ 3, [[ENTRY:%.*]] ]
; CHECK-NEXT: ret i128 [[RET]]		; CHECK-NEXT: ret i128 [[RET]]
;		;
entry:		entry:
%div = udiv i128 %a, %b		%div = udiv i128 %a, %b
%cmp = icmp eq i128 %div, 42		%cmp = icmp eq i128 %div, 42
[...]

llvm/test/Transforms/DivRemPairs/X86/div-rem-pairs.ll

[...]	end:
ret i32 %ret		ret i32 %ret
}		}

; If the target doesn't have a unified div/rem op for the type, decompose rem in-place to mul+sub.		; If the target doesn't have a unified div/rem op for the type, decompose rem in-place to mul+sub.

define i128 @dont_hoist_urem(i128 %a, i128 %b) {		define i128 @dont_hoist_urem(i128 %a, i128 %b) {
; CHECK-LABEL: @dont_hoist_urem(		; CHECK-LABEL: @dont_hoist_urem(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[DIV:%.*]] = udiv i128 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[A_FROZEN:%.*]] = freeze i128 [[A:%.*]]
		; CHECK-NEXT: [[B_FROZEN:%.*]] = freeze i128 [[B:%.*]]
		; CHECK-NEXT: [[DIV:%.*]] = udiv i128 [[A_FROZEN]], [[B_FROZEN]]
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i128 [[DIV]], 42		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i128 [[DIV]], 42
; CHECK-NEXT: br i1 [[CMP]], label [[IF:%.*]], label [[END:%.*]]		; CHECK-NEXT: br i1 [[CMP]], label [[IF:%.*]], label [[END:%.*]]
; CHECK: if:		; CHECK: if:
; CHECK-NEXT: [[TMP0:%.*]] = mul i128 [[DIV]], [[B]]		; CHECK-NEXT: [[TMP0:%.*]] = mul i128 [[DIV]], [[B_FROZEN]]
; CHECK-NEXT: [[REM_DECOMPOSED:%.*]] = sub i128 [[A]], [[TMP0]]		; CHECK-NEXT: [[REM_DECOMPOSED:%.*]] = sub i128 [[A_FROZEN]], [[TMP0]]
; CHECK-NEXT: br label [[END]]		; CHECK-NEXT: br label [[END]]
; CHECK: end:		; CHECK: end:
; CHECK-NEXT: [[RET:%.*]] = phi i128 [ [[REM_DECOMPOSED]], [[IF]] ], [ 3, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[RET:%.*]] = phi i128 [ [[REM_DECOMPOSED]], [[IF]] ], [ 3, [[ENTRY:%.*]] ]
; CHECK-NEXT: ret i128 [[RET]]		; CHECK-NEXT: ret i128 [[RET]]
;		;
entry:		entry:
%div = udiv i128 %a, %b		%div = udiv i128 %a, %b
%cmp = icmp eq i128 %div, 42		%cmp = icmp eq i128 %div, 42
[...]