
[InstCombine] PR35354: Convert load bitcast (select (Cond, &V1, &V2)) --> select(Cond, load bitcast &V1, load bitcast &V2)
ClosedPublic

Authored by ABataev on Nov 21 2017, 7:06 AM.

Details

Summary

If we have code like this:

float a, b;
a = std::max(a, b);

it is converted into something like this:

%call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float* nonnull dereferenceable(4) %a.addr, float* nonnull dereferenceable(4) %b.addr)
%1 = bitcast float* %call to i32*
%2 = load i32, i32* %1, align 4
%3 = bitcast float* %a.addr to i32*
store i32 %2, i32* %3, align 4

After inlining, this code is converted into the following:

%1 = load float, float* %a.addr
%2 = load float, float* %b.addr
%cmp.i = fcmp fast olt float %1, %2
%__b.__a.i = select i1 %cmp.i, float* %a.addr, float* %b.addr
%3 = bitcast float* %__b.__a.i to i32*
%4 = load i32, i32* %3, align 4
%5 = bitcast float* %arrayidx to i32*
store i32 %4, i32* %5, align 4

This pattern is not recognized as a minmax pattern.
The patch solves this problem by converting the sequence

load bitcast (select (Cond, &V1, &V2))

into the sequence

select(Cond, load bitcast &V1, load bitcast &V2)

After this, the code is recognized as a minmax pattern.

Diff Detail

Repository
rL LLVM

Event Timeline

ABataev created this revision. Nov 21 2017, 7:06 AM
spatel edited edge metadata. Nov 21 2017, 7:45 AM

I’m not at a dev machine so can’t review this currently, but is this the same as
https://bugs.llvm.org/show_bug.cgi?id=34603 ?


No, it is a bit different. We don't have a GEP from a select here; we have load (bitcast float* to i32* (select)) because of InstCombine.
I convert this to select (load (bitcast float* to i32*), load (bitcast float* to i32*)).

ABataev updated this revision to Diff 123809. Nov 21 2017, 8:03 AM

Added check that bitcasted address has only one use.

This patch is building on a transform that is already suspect (see discussion starting at https://bugs.llvm.org/show_bug.cgi?id=34603#c6 ).

In conjunction with the existing transform, it increases the instruction count from 3 to 5 for this example:

define i32 @store_bitcasted_load(i1 %cond, float* dereferenceable(4) %addr1, float* dereferenceable(4) %addr2) {
  %sel = select i1 %cond, float* %addr1, float* %addr2
  %bc1 = bitcast float* %sel to i32*
  %ld = load i32, i32* %bc1
  ret i32 %ld
}

$ ./opt -instcombine loadsel.ll -S

define i32 @store_bitcasted_load(i1 %cond, float* dereferenceable(4) %addr1, float* dereferenceable(4) %addr2) {
  %addr1.cast = bitcast float* %addr1 to i32*
  %addr2.cast = bitcast float* %addr2 to i32*
  %addr1.cast.val = load i32, i32* %addr1.cast, align 4
  %addr2.cast.val = load i32, i32* %addr2.cast, align 4
  %ld = select i1 %cond, i32 %addr1.cast.val, i32 %addr2.cast.val
  ret i32 %ld
}

Can you provide a reduced C++ source example for how we got to this IR? I couldn't repro with a simple case, so I must be missing some part of it.

Also, there might be something in common with:
https://bugs.llvm.org/show_bug.cgi?id=35354
or
https://bugs.llvm.org/show_bug.cgi?id=35284
?


This patch fixes PR35354, the first one from your list. I described everything already. We get the extra bitcast from the canonicalization of the load/store sequence before function inlining, but this canonicalization breaks minmax pattern recognition after the min/max functions are inlined.

dberlin edited edge metadata. Nov 27 2017, 12:35 PM

So, what Sanjoy has pointed out is that canonicalization to select like this breaks the ability to remove redundant loads and stores and to do PRE, compared to the equivalent control-flow version, in general.

While it's probably true that your canonicalization of:

load bitcast (select (Cond, &V1, &V2))

to

select(Cond, load bitcast &V1, load bitcast &V2)

is not likely to break this further, and will make things a little better, all of our redundancy-elimination and knowledge-propagation passes would be happier with the non-select control-flow version.

This is not possible to fix sanely in those passes (you'd need a fake CFG with fake instructions and fake operands to simulate the control flow).
Otherwise, the only way around this is to not canonicalize to select this way that early.

Otherwise, the only way around this is to not canonicalize to select this way that early.

And for reference, I have a patch towards that - D38566.

There may be other fixes needed to solve PR35354 (sorry for missing that in the title!)


Your patch fixes the SimplifyCFG pass, but the canonicalization occurs in InstCombine (see the combineLoadToOperationType function).


The original transformation load (select (Cond, &V1, &V2)) --> select(Cond, load &V1, load &V2) also increases the number of instructions, from 2 to 3:

define float @store_bitcasted_load(i1 %cond, float* dereferenceable(4) %addr1, float* dereferenceable(4) %addr2) {
  %sel = select i1 %cond, float* %addr1, float* %addr2
  %ld = load float, float* %sel
  ret float %ld
}

After the existing transformation we have:

define float @store_bitcasted_load(i1 %cond, float* dereferenceable(4) %addr1, float* dereferenceable(4) %addr2) {
  %addr1.val = load float, float* %addr1, align 4
  %addr2.val = load float, float* %addr2, align 4
  %ld = select i1 %cond, float %addr1.val, float %addr2.val
  ret float %ld
}

The original transformation load (select (Cond, &V1, &V2)) --> select(Cond, load &V1, load &V2) also increases the number of instructions, from 2 to 3

Yes - that's why I said the existing transform is suspect. There are 2 InstCombine problems here, as I think you've noted:

  1. The transform that hoists loads ahead of select.
  2. The transform that adds bitcasts around FP loads ("Try to canonicalize loads which are only ever stored to operate over integers instead of any other type.")

I've looked at this a bit closer now, and SimplifyCFG really wants to create a select for this:

define float* @mymax(float* dereferenceable(4) %__a, float* dereferenceable(4) %__b) {
entry:
  %__comp = alloca %"struct.std::__1::__less", align 1
  %call = call zeroext i1 @less(%"struct.std::__1::__less"* nonnull %__comp, float* nonnull dereferenceable(4) %__a, float* nonnull dereferenceable(4) %__b)
  br i1 %call, label %cond.true, label %cond.false

cond.true:                                        ; preds = %entry
  br label %cond.end

cond.false:                                       ; preds = %entry
  br label %cond.end

cond.end:                                         ; preds = %cond.false, %cond.true
  %cond-lvalue = phi float* [ %__b, %cond.true ], [ %__a, %cond.false ]
  ret float* %cond-lvalue
}

For this example, a select will be created by FoldTwoEntryPHINode(). If I stub that out, then a select will still be created in HoistThenElseCodeToIf(). If I stub that out, then a select will still be created in SpeculativelyExecuteBB(). So we would have to disable much more of SimplifyCFG to avoid creating the select. But even that is not enough - the bitcasts are interfering with further optimization.

If the consensus is that this InstCombine bitcast transform is valid:

target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
define void @store_bitcasted_load2(float* %loadaddr, float* %storeaddr) {
  %ld = load float, float* %loadaddr
  store float %ld, float* %storeaddr
  ret void
}

$ ./opt -instcombine -S bitcastload.ll

define void @store_bitcasted_load2(float* %loadaddr, float* %storeaddr) {
  %1 = bitcast float* %loadaddr to i32*
  %ld1 = load i32, i32* %1, align 4
  %2 = bitcast float* %storeaddr to i32*
  store i32 %ld1, i32* %2, align 4
  ret void
}

...then I think we have to account for that here and look through the bitcasts (as this patch proposes). We could make this patch not create more instructions than it removes by starting the pattern match at the store instruction rather than the load?

We could make this patch not create more instructions than it removes by starting the pattern match at the store instruction rather than the load?

Yes, I already thought about it. I'll try to rework it by starting the pattern matching from the store.

ABataev updated this revision to Diff 124625. Nov 28 2017, 1:07 PM

Make the conversion only if the load is part of the load/store canonicalization.

For reference, here are the bitcast-adding commit and discussion (cc @chandlerc ):
https://reviews.llvm.org/rL226781
http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html

Nobody was sure what regressions would be caused by that change...looks like we found one. :)

I'm still concerned that we're building on a fold that might get removed. Let me propose an alternate idea:

  1. Fold the bitcasts out of the 5-instruction sequence starting from visitStoreInst(): store (bitcast (load (bitcast (select ...))))
  2. Create an exception to the bitcasting canonicalization for loads fed by a select, so we don't infinite-loop.

So I think we should be able to do this fold without relying on the fold that moves loads above select, so the test case is minimized to:

define void @bitcasted_store(i1 %cond, float* %loadaddr1, float* %loadaddr2, float* %storeaddr) {
  %sel = select i1 %cond, float* %loadaddr1, float* %loadaddr2
  %int_load_addr = bitcast float* %sel to i32*
  %ld = load i32, i32* %int_load_addr
  %int_store_addr = bitcast float* %storeaddr to i32*
  store i32 %ld, i32* %int_store_addr
  ret void
}
ABataev updated this revision to Diff 124948. Nov 30 2017, 8:39 AM

Update after review.

spatel added inline comments. Nov 30 2017, 11:51 AM
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
566 ↗(On Diff #124948)

Why do we need to match min/max? Does something break if we just match select?

If we need to restrict to min/max, this isn't the way to do it. Use pattern matches (m_c_SMax, etc) or value tracking's matchSelectPattern().

1333 ↗(On Diff #124948)

It doesn't make sense to call this "decanonicalize". Whatever we choose to do here is redefining canonical. "removeBitcastsFromLoadStoreOnSelect"?

ABataev added inline comments. Dec 1 2017, 7:43 AM
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
566 ↗(On Diff #124948)
  1. Currently, we only have trouble with a particular minmax pattern, not with all select instructions. I think we should do this for minmax first; if it is required for other select-based patterns, we can extend this patch or remove it.
  2. Ok, I will do it via pattern matchers, though I can't use m_c_<Matcher> directly, because we have a slightly different minmax pattern (it compares the values but returns the addresses of those values, not the values themselves).
1333 ↗(On Diff #124948)

Ok, will rename it.

ABataev updated this revision to Diff 125146. Dec 1 2017, 8:00 AM

Update after review.

If we require matching the loads + cmp ahead of the select, then this should be the minimal test case:

define void @bitcasted_minmax_with_select_of_pointers(float* %loadaddr1, float* %loadaddr2, float* %storeaddr) {
  %ld1 = load float, float* %loadaddr1, align 4
  %ld2 = load float, float* %loadaddr2, align 4
  %cond = fcmp ogt float %ld1, %ld2
  %sel = select i1 %cond, float* %loadaddr1, float* %loadaddr2
  %int_load_addr = bitcast float* %sel to i32*
  %ld = load i32, i32* %int_load_addr, align 4
  %int_store_addr = bitcast float* %storeaddr to i32*
  store i32 %ld, i32* %int_store_addr, align 4
  ret void
}

...but I'm not getting the transform to fire. Something may be wrong with the pattern matching's use of m_Specific()?

lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
567–568 ↗(On Diff #125146)

Could make this generally available by putting it into PatternMatch.h?

591–592 ↗(On Diff #125146)

I don't think we need to loop here. Use peekThroughBitcast() instead.

ABataev updated this revision to Diff 125551. Dec 5 2017, 9:21 AM

Update after review.

spatel accepted this revision. Dec 6 2017, 3:07 PM

I don't see a better way to avoid the problem, so LGTM. See inline for some small issues. Wait a day to commit in case anyone else has ideas.

include/llvm/IR/PatternMatch.h
960–979 ↗(On Diff #125551)

Please commit this part as an NFC preliminary step.

lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
568 ↗(On Diff #125551)

I still can't read this without thinking it's a duplicate of something in matchSelectPattern(). Include "load" in the name? "isMinMaxWithLoads()"?

1377 ↗(On Diff #125551)

Remove comment - the description above the function body is enough.

This revision is now accepted and ready to land. Dec 6 2017, 3:07 PM
This revision was automatically updated to reflect the committed changes.