Details

Reviewers

fhahn
spatel
lebedev.ri
RKSimon

Commits

rGf61f99a10591: [instcombine] Optimise for zero initialisation of product given fast flags are…

Summary

Currently, Clang does not take advantage of a zero-initialised product under finite math.
For example:

double f_prod = 0;
double arr[1000];
for (size_t i = 0; i < 1000; i++) {
  f_prod *= arr[i];
}

Clang ignores the fact that f_prod starts at zero and still generates assembly that iterates over the loop.
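
As a rough illustration of why the loop is redundant, the sketch below wraps the summary's loop in a function. The name product_from_zero and the array parameter are assumptions made for this example and are not part of the patch. For finite, non-NaN inputs the running product never leaves zero, so the result always compares equal to 0.0. The sign of that zero can still vary, which is why the fold also relies on signed zeros being ignorable.

```cpp
#include <cstddef>

// Same loop as in the summary, with the array passed in by the caller.
double product_from_zero(const double *arr, std::size_t n) {
  double f_prod = 0;
  for (std::size_t i = 0; i < n; i++)
    f_prod *= arr[i]; // stays +0.0 or -0.0 for any finite arr[i]
  return f_prod;
}
```

Calling product_from_zero on any array of finite values returns a zero without the loop contributing anything, which is the redundancy the patch targets.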

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

zjaffal created this revision. · Aug 11 2022, 5:34 AM

Herald added a project: Restricted Project. · Aug 11 2022, 5:34 AM

Herald added a subscriber: hiraditya.

zjaffal requested review of this revision. · Aug 11 2022, 5:34 AM

Herald added a subscriber: llvm-commits. · Aug 11 2022, 5:34 AM

Harbormaster completed remote builds in B180648: Diff 451814. · Aug 11 2022, 7:17 AM

fhahn added reviewers: spatel, lebedev.ri, RKSimon. · Aug 11 2022, 10:00 AM

fhahn added inline comments.

llvm/test/Transforms/InstCombine/remove-loop-phi-fastmul.ll
10	It would be good to split off the test changes to a separate patch that just adds the tests.
20	Would be good to use 'rotated' version of the loop to keep things simpler.
37	We should also have tests where all the incoming values are instructions or arguments. We also need some tests with phis with more than 2 incoming values.

zjaffal retitled this revision from "[opt] Optimise for zero initialisation of product given finite math in Clang" to "[instcombine] Optimise for zero initialisation of product given finite math in Clang". · Aug 12 2022, 3:33 AM

zjaffal updated this revision to Diff 452131. · Aug 12 2022, 3:37 AM

zjaffal added a parent revision: D131757: [instcombine] Test for zero initialisation optimisation of a product given fast flags.

Harbormaster completed remote builds in B180879: Diff 452131. · Aug 12 2022, 3:38 AM

spatel added inline comments. · Aug 12 2022, 4:55 AM

llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
1476 ↗	(On Diff #451814)	Can we use llvm::matchSimpleRecurrence()? Here's an example usage within instcombine: https://github.com/llvm/llvm-project/blob/1828c75d5f4ff657cf977476091fa224c8193e1d/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L2201

zjaffal added inline comments. · Aug 15 2022, 3:02 AM

llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
1476 ↗	(On Diff #451814)	I tested using `llvm::matchSimpleRecurrence`. Here is what I found:
	- We need to move the optimisation into `InstCombineMulDivRem` instead, which is fine.
	- We need to add a case to `llvm::matchSimpleRecurrence` to handle `FMul`.
	- The main issue with using `matchSimpleRecurrence` is that it doesn't handle a `phi` with more than two operands. I am not sure how common phis with more than two operands are.

Move functionality to InstCombineMulDivRem.cpp
Use matchSimpleRecurrence

Harbormaster completed remote builds in B181243: Diff 452620. · Aug 15 2022, 4:35 AM

spatel added inline comments. · Aug 15 2022, 6:14 AM

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
489	Checking for no-inf is correct, but not strictly necessary since 0.0 * Inf = NaN and we check 'nnan' (see InstSimplify for the existing constraints for the fold of X*0.0 to 0.0).
491	We don't need to walk through the phi operands again - the zero we want had to be captured as "Start"?
	if (matchSimpleRecurrence(&I, Phi, Start, Step) &&
	    match(Start, m_Zero()) && I.hasNoInfs() && I.hasNoNaNs() &&
	    I.hasNoSignedZeros())
	  return replaceInstUsesWith(I, Start);
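
The flag requirements discussed in these two comments follow from how IEEE 754 multiplication by zero behaves. The standalone check below is only a sketch: the helper name mul_by_zero_is_plus_zero is hypothetical, and it assumes a build without any fast-math options so that the arithmetic is strict.

```cpp
#include <cmath>
#include <limits>

// Returns true only when x * 0.0 is exactly +0.0 under strict IEEE 754 rules.
bool mul_by_zero_is_plus_zero(double x) {
  volatile double zero = 0.0; // volatile keeps the multiply from being folded away
  double r = x * zero;
  return r == 0.0 && !std::signbit(r);
}
```

A positive finite x gives +0.0. A negative finite x gives -0.0, which is why 'nsz' is needed. An infinite or NaN x gives NaN, which is why 'nnan' is needed and why a separate no-infs check adds nothing once 'nnan' is present.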
llvm/test/Transforms/InstCombine/remove-loop-phi-fastmul.ll
23	This test should have the minimal FMF needed to trigger the transform.
130–131	I don't think this test is adding value. It just shows the expected X*0.0 simplification to 0.0? All of the tests should be reduced to the minimum IR needed to show the transform. Looking at existing tests in the file "recurrence.ll" might be useful. If you want to verify that a larger example reduces with the usual opt pipeline, it would be better to add a test under "test/Transforms/PhaseOrdering".

Updating D131672: [instcombine] Optimise for zero initialisation of product given finite math in Clang

zjaffal retitled this revision from "[instcombine] Optimise for zero initialisation of product given finite math in Clang" to "[instcombine] Optimise for zero initialisation of product given fast flags are enabled". · Aug 15 2022, 10:10 AM

zjaffal marked 7 inline comments as done.

Harbormaster completed remote builds in B181318: Diff 452718. · Aug 15 2022, 11:06 AM

spatel added inline comments. · Aug 15 2022, 11:37 AM

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
492	Was there a reason to not use the shorter and suggested m_Zero() matcher? It probably doesn't make a difference for this particular pattern, but that code handles things like vector constants, so we won't miss any unusual types.

zjaffal added inline comments. · Aug 15 2022, 11:47 AM

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
492	I didn't know it existed. I will push a change now

Updating D131672: [instcombine] Optimise for zero initialisation of product given fast flags are enabled

Harbormaster completed remote builds in B181345: Diff 452759. · Aug 15 2022, 12:43 PM

zjaffal marked an inline comment as done. · Aug 16 2022, 12:45 AM

Updating D131672: [instcombine] Optimise for zero initialisation of product given fast flags are enabled

Harbormaster completed remote builds in B181473: Diff 452934. · Aug 16 2022, 4:06 AM

LGTM - see inline comment for formatting nit.
The tests seem fine too, but let @fhahn have another look in case there are any other suggestions.

I'm not sure what source causes this pattern, but there could be related fast-math simplifications like X * 1.0 --> X.

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
492	Remove braces around one-line 'if' clause to be consistent with surrounding code.

This revision is now accepted and ready to land. · Aug 16 2022, 4:55 AM

fhahn mentioned this in rG468a9d6d2a5b: [instcombine] Test for zero initialisation optimisation of a product given fast…. · Aug 16 2022, 6:08 AM

remove brackets around if statement

Harbormaster completed remote builds in B181518: Diff 452993. · Aug 16 2022, 7:40 AM

Thanks for the update! Could you also update the description to describe the transformation implemented, with the legality considerations as well?

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
487	Could you add a comment explaining the transform we are applying here? Also, it would probably be good to move this more towards the end of the function, so cheaper patterns are tried first.

Updating D131672: [instcombine] Optimise for zero initialisation of product given fast flags are enabled

@zjaffal Please can you rebase against trunk again? D131838 managed to affect some of your tests

Harbormaster completed remote builds in B181550: Diff 453038. · Aug 16 2022, 9:45 AM

Rebase

Harbormaster completed remote builds in B181603: Diff 453100. · Aug 16 2022, 1:30 PM

LGTM, thanks!

In D131672#3725714, @spatel wrote:

LGTM - see inline comment for formatting nit.
The tests seem fine too, but let @fhahn have another look in case there are any other suggestions.

I'm not sure what source causes this pattern, but there could be related fast-math simplifications like X * 1.0 --> X.

The source was a user report. I am not sure about FMUL recurrences that have 1.0 as the start value, but I think we might want the same fold for integer multiply as a follow-up.
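
To make the suggested integer follow-up concrete, here is a minimal sketch (the function name int_product_from_zero is an assumption for illustration, not something from the patch). With integers there are no NaNs or signed zeros to worry about, so a product that starts at zero stays zero even when the operands would otherwise wrap around.

```cpp
#include <cstddef>
#include <cstdint>

// Integer counterpart of the floating-point loop from the summary.
std::uint64_t int_product_from_zero(const std::uint64_t *arr, std::size_t n) {
  std::uint64_t prod = 0;
  for (std::size_t i = 0; i < n; i++)
    prod *= arr[i]; // 0 * x == 0 for every x, wraparound included
  return prod;
}
```

This suggests an integer version of the fold would need no flag checks at all, though whether and where to implement it is left open in the thread.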

This revision was landed with ongoing or failed builds. · Aug 17 2022, 3:12 AM

Closed by commit rGf61f99a10591: [instcombine] Optimise for zero initialisation of product given fast flags are… (authored by zjaffal, committed by fhahn).

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rGf61f99a10591: [instcombine] Optimise for zero initialisation of product given fast flags are….

Diff 453247

llvm/lib/Analysis/ValueTracking.cpp

@@ 6,532 unchanged lines hidden @@ for (unsigned i = 0; i != 2; ++i) {
     // TODO: Expand list -- xor, div, gep, uaddo, etc..
     case Instruction::LShr:
     case Instruction::AShr:
     case Instruction::Shl:
     case Instruction::Add:
     case Instruction::Sub:
     case Instruction::And:
     case Instruction::Or:
-    case Instruction::Mul: {
+    case Instruction::Mul:
+    case Instruction::FMul: {
       Value *LL = LU->getOperand(0);
       Value *LR = LU->getOperand(1);
       // Find a recurrence.
       if (LL == P)
         L = LR;
       else if (LR == P)
         L = LL;
       else
@@ 823 unchanged lines hidden @@

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp

@@ 9 unchanged lines hidden @@
 // srem, urem, frem.
 //
 //===----------------------------------------------------------------------===//

 #include "InstCombineInternal.h"
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Analysis/InstructionSimplify.h"
+#include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/Constant.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Intrinsics.h"
@@ 452 unchanged lines hidden @@ if (Instruction *FoldedMul = foldBinOpIntoSelectOrPhi(I))
     return FoldedMul;

   if (Value *FoldedMul = foldMulSelectToNegate(I, Builder))
     return replaceInstUsesWith(I, FoldedMul);

   if (Instruction *R = foldFPSignBitOps(I))
     return R;

   // X * -1.0 --> -X
   Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);
   if (match(Op1, m_SpecificFP(-1.0)))
     return UnaryOperator::CreateFNegFMF(Op0, &I);

   // -X * C --> X * -C
   Value *X, *Y;
   Constant *C;
   if (match(Op0, m_FNeg(m_Value(X))) && match(Op1, m_Constant(C)))
     if (Constant *NegC = ConstantFoldUnaryOpOperand(Instruction::FNeg, C, DL))
       return BinaryOperator::CreateFMulFMF(X, NegC, &I);

   // (select A, B, C) * (select A, D, E) --> select A, (B*D), (C*E)
   if (Value *V = SimplifySelectsFeedingBinaryOp(I, Op0, Op1))
@@ 167 unchanged lines hidden @@ if (I.isFast()) {
     }
     if (Log2) {
       Value *Log2 = Builder.CreateUnaryIntrinsic(Intrinsic::log2, X, &I);
       Value *LogXTimesY = Builder.CreateFMulFMF(Log2, Y, &I);
       return BinaryOperator::CreateFSubFMF(LogXTimesY, Y, &I);
     }
   }

+  // Simplify FMUL recurrences starting with 0.0 to 0.0 if nnan and nsz are set.
+  // Given a phi node with entry value as 0 and it used in fmul operation,
+  // we can replace fmul with 0 safely and eleminate loop operation.
+  PHINode *PN = nullptr;
+  Value *Start = nullptr, *Step = nullptr;
+  if (matchSimpleRecurrence(&I, PN, Start, Step) && I.hasNoNaNs() &&
+      I.hasNoSignedZeros() && match(Start, m_Zero()))
+    return replaceInstUsesWith(I, Start);
+
   return nullptr;
 }

 /// Fold a divide or remainder with a select instruction divisor when one of the
 /// select operands is zero. In that case, we can use the other select operand
 /// because div/rem by zero is undefined.
 bool InstCombinerImpl::simplifyDivRemOfSelectWithZeroOp(BinaryOperator &I) {
   SelectInst *SI = dyn_cast<SelectInst>(I.getOperand(1));
@@ 962 unchanged lines hidden @@
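
The shape of the new fold can be sketched outside LLVM with a toy graph. Everything below (Node, Kind, match_fmul_recurrence, fold_zero_start_fmul, demo_fold_fires) is hypothetical and only mimics the structure of the patch. It does not use the real matchSimpleRecurrence or PatternMatch APIs, and it assumes the start value must be a +0.0 constant.

```cpp
#include <cmath>
#include <vector>

// Hypothetical, heavily simplified stand-ins for LLVM IR values.
enum class Kind { ConstantFP, Argument, Phi, FMul };

struct Node {
  Kind kind = Kind::Argument;
  double constant = 0.0;        // meaningful for ConstantFP only
  std::vector<Node *> operands; // phi incoming values or fmul operands
  bool nnan = false;            // fast-math flags, meaningful for FMul only
  bool nsz = false;
};

// Toy analogue of a simple-recurrence matcher. It succeeds for
//   phi = [start, mul], mul = fmul phi, step
// and only for phis with exactly two incoming values.
bool match_fmul_recurrence(Node &mul, Node *&phi, Node *&start, Node *&step) {
  if (mul.kind != Kind::FMul || mul.operands.size() != 2)
    return false;
  for (int i = 0; i != 2; ++i) {
    Node *p = mul.operands[i];
    if (p->kind != Kind::Phi || p->operands.size() != 2)
      continue;
    for (int j = 0; j != 2; ++j) {
      if (p->operands[j] == &mul) {
        phi = p;
        start = p->operands[1 - j];
        step = mul.operands[1 - i];
        return true;
      }
    }
  }
  return false;
}

// Returns the replacement for mul, or nullptr when the fold does not apply.
Node *fold_zero_start_fmul(Node &mul) {
  Node *phi = nullptr, *start = nullptr, *step = nullptr;
  if (!match_fmul_recurrence(mul, phi, start, step))
    return nullptr;
  bool start_is_plus_zero = start->kind == Kind::ConstantFP &&
                            start->constant == 0.0 &&
                            !std::signbit(start->constant);
  return (start_is_plus_zero && mul.nnan && mul.nsz) ? start : nullptr;
}

// Builds a loop-carried product starting at 0.0 and reports whether the fold fires.
bool demo_fold_fires(bool nnan, bool nsz) {
  Node start, step, phi, mul;
  start.kind = Kind::ConstantFP;
  phi.kind = Kind::Phi;
  mul.kind = Kind::FMul;
  mul.nnan = nnan;
  mul.nsz = nsz;
  mul.operands = {&phi, &step};
  phi.operands = {&start, &mul};
  return fold_zero_start_fmul(mul) == &start;
}
```

In this model demo_fold_fires(true, true) reports that the multiply is replaced by the start value, while dropping either flag leaves it alone, mirroring the nnan and nsz requirement in the committed code.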

llvm/test/Transforms/InstCombine/remove-loop-phi-fastmul.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -passes=instcombine -S | FileCheck %s
 define double @test_mul_fast_flags(ptr %arr_d) {
 ; CHECK-LABEL: @test_mul_fast_flags(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: [[I_02:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
-; CHECK-NEXT: [[F_PROD_01:%.*]] = phi double [ 0.000000e+00, [[ENTRY]] ], [ [[MUL:%.*]], [[FOR_BODY]] ]
-; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [1000 x double], ptr [[ARR_D:%.*]], i64 0, i64 [[I_02]]
-; CHECK-NEXT: [[TMP0:%.*]] = load double, ptr [[ARRAYIDX]], align 8
-; CHECK-NEXT: [[MUL]] = fmul fast double [[F_PROD_01]], [[TMP0]]
 ; CHECK-NEXT: [[INC]] = add i64 [[I_02]], 1
 ; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC]], 1000
 ; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[END:%.*]]
 ; CHECK: end:
-; CHECK-NEXT: ret double [[MUL]]
+; CHECK-NEXT: ret double 0.000000e+00
 ;
 entry:
   br label %for.body

 for.body: ; preds = %entry, %for.body
   %i.02 = phi i64 [ 0, %entry ], [ %inc, %for.body ]
   %f_prod.01 = phi double [ 0.000000e+00, %entry ], [ %mul, %for.body ]
   %arrayidx = getelementptr inbounds [1000 x double], ptr %arr_d, i64 0, i64 %i.02
   %0 = load double, ptr %arrayidx, align 8
   %mul = fmul fast double %f_prod.01, %0
   %inc = add i64 %i.02, 1
   %cmp = icmp ult i64 %inc, 1000
   br i1 %cmp, label %for.body, label %end

 end: ; preds = %for.body
   %f_prod.0.lcssa = phi double [ %mul, %for.body ]
   ret double %f_prod.0.lcssa
 }

 define double @test_nsz_nnan_flags_enabled(ptr %arr_d) {
 ; CHECK-LABEL: @test_nsz_nnan_flags_enabled(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: [[I_02:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
-; CHECK-NEXT: [[F_PROD_01:%.*]] = phi double [ 0.000000e+00, [[ENTRY]] ], [ [[MUL:%.*]], [[FOR_BODY]] ]
-; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [1000 x double], ptr [[ARR_D:%.*]], i64 0, i64 [[I_02]]
-; CHECK-NEXT: [[TMP0:%.*]] = load double, ptr [[ARRAYIDX]], align 8
-; CHECK-NEXT: [[MUL]] = fmul nnan nsz double [[F_PROD_01]], [[TMP0]]
 ; CHECK-NEXT: [[INC]] = add i64 [[I_02]], 1
 ; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INC]], 1000
 ; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[END:%.*]]
 ; CHECK: end:
-; CHECK-NEXT: ret double [[MUL]]
+; CHECK-NEXT: ret double 0.000000e+00
 ;
 entry:
   br label %for.body

 for.body: ; preds = %entry, %for.body
   %i.02 = phi i64 [ 0, %entry ], [ %inc, %for.body ]
   %f_prod.01 = phi double [ 0.000000e+00, %entry ], [ %mul, %for.body ]
   %arrayidx = getelementptr inbounds [1000 x double], ptr %arr_d, i64 0, i64 %i.02
@@ 70 unchanged lines hidden @@ for.body: ; preds = %entry, %for.body
   %inc = add i64 %i.02, 1
   %cmp = icmp ult i64 %inc, 1000
   br i1 %cmp, label %for.body, label %end

 end: ; preds = %for.body
   %f_prod.0.lcssa = phi double [ %mul, %for.body ]
   ret double %f_prod.0.lcssa
 }

 define double @test_nsz_flag_enabled(ptr %arr_d) {
 ; CHECK-LABEL: @test_nsz_flag_enabled(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: [[I_02:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
 ; CHECK-NEXT: [[F_PROD_01:%.*]] = phi double [ 0.000000e+00, [[ENTRY]] ], [ [[MUL:%.*]], [[FOR_BODY]] ]
 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [1000 x double], ptr [[ARR_D:%.*]], i64 0, i64 [[I_02]]
 ; CHECK-NEXT: [[TMP0:%.*]] = load double, ptr [[ARRAYIDX]], align 8
@@ 136 unchanged lines hidden @@

This is an archive of the discontinued LLVM Phabricator instance.

[instcombine] Optimise for zero initialisation of product given fast flags are enabled
ClosedPublic
