This is an archive of the discontinued LLVM Phabricator instance.

Differential D40304

[InstCombine] PR35354: Convert load bitcast (select (Cond, &V1, &V2)) --> select(Cond, load bitcast &V1, load bitcast &V2)
ClosedPublic

Authored by ABataev on Nov 21 2017, 7:06 AM.

Details

Reviewers

RKSimon
spatel
efriedma
hfinkel
dberlin

Commits

rGec95c6cc0a2e: [InstCombine] PR35354: Convert store(bitcast, load bitcast (select (Cond, &V1…
rL320157: [InstCombine] PR35354: Convert store(bitcast, load bitcast (select (Cond, &V1…

Summary

If we have code like this:

float a, b;
a = std::max(a, b);

it is converted into something like this:

%call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float* nonnull dereferenceable(4) %a.addr, float* nonnull dereferenceable(4) %b.addr)
%1 = bitcast float* %call to i32*
%2 = load i32, i32* %1, align 4
%3 = bitcast float* %a.addr to i32*
store i32 %2, i32* %3, align 4

After inlining, this code is converted to the following:

%1 = load float, float* %a.addr
%2 = load float, float* %b.addr
%cmp.i = fcmp fast olt float %1, %2
%__b.__a.i = select i1 %cmp.i, float* %a.addr, float* %b.addr
%3 = bitcast float* %__b.__a.i to i32*
%4 = load i32, i32* %3, align 4
%5 = bitcast float* %arrayidx to i32*
store i32 %4, i32* %5, align 4

This pattern is not recognized as a min/max pattern.
The patch solves this problem by converting the sequence

load bitcast (select (Cond, &V1, &V2))

to a sequence

select(Cond, load bitcast &V1, load bitcast &V2)

After this, the code is recognized as a min/max pattern.
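For readers less familiar with where the select of addresses comes from: std::max returns a reference to one of its arguments, so the assignment above is a load through a selected address. The sketch below writes both shapes by hand in C++. The function names are illustrative only and are not part of the patch.

```cpp
#include <algorithm>
#include <cassert>

// Shape before the transform: choose an address, then load through it.
// Mirrors `load (select (Cond, &V1, &V2))`.
inline float load_of_selected_address(const float &a, const float &b) {
  const float *p = (a < b) ? &b : &a; // select on addresses
  return *p;                          // single load
}

// Shape after the transform: load both values, then choose a value.
// Mirrors `select(Cond, load &V1, load &V2)`.
inline float select_of_loaded_values(const float &a, const float &b) {
  const float va = a;         // load &V1
  const float vb = b;         // load &V2
  return (va < vb) ? vb : va; // select on values
}

// std::max hands back a reference to one of its arguments, not a copy.
inline bool max_returns_argument_address(const float &a, const float &b) {
  const float &m = std::max(a, b);
  return &m == &a || &m == &b;
}
```

Both helper functions compute the same value as std::max for ordinary inputs; only the second one has the select-on-values shape.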

Diff Detail

Build Status

Buildable 12367
Build 12367: arc lint + arc unit

Event Timeline

ABataev created this revision. Nov 21 2017, 7:06 AM

Harbormaster completed remote builds in B12367: Diff 123785. Nov 21 2017, 7:06 AM

I’m not at a dev machine so can’t review this currently, but is this the same as
https://bugs.llvm.org/show_bug.cgi?id=34603 ?

In D40304#931720, @spatel wrote:

I’m not at a dev machine so can’t review this currently, but is this the same as
https://bugs.llvm.org/show_bug.cgi?id=34603 ?

No, it is a bit different. We don't have the GEP from select here; we have load (bitcast float* to i32* (select)) because of InstCombiner.
I convert this to select (load (bitcast float* to i32*)), (load (bitcast float* to i32*)).

Added a check that the bitcasted address has only one use.

Harbormaster completed remote builds in B12371: Diff 123809. Nov 21 2017, 8:05 AM

This patch is building on a transform that is already suspect (see discussion starting at https://bugs.llvm.org/show_bug.cgi?id=34603#c6 ).

In conjunction with the existing transform, it increases the instruction count from 3 to 5 for this example:

define i32 @store_bitcasted_load(i1 %cond, float* dereferenceable(4) %addr1, float* dereferenceable(4) %addr2) {
  %sel = select i1 %cond, float* %addr1, float* %addr2
  %bc1 = bitcast float* %sel to i32*
  %ld = load i32, i32* %bc1
  ret i32 %ld
}

$ ./opt -instcombine loadsel.ll -S

define i32 @store_bitcasted_load(i1 %cond, float* dereferenceable(4) %addr1, float* dereferenceable(4) %addr2) {
  %addr1.cast = bitcast float* %addr1 to i32*
  %addr2.cast = bitcast float* %addr2 to i32*
  %addr1.cast.val = load i32, i32* %addr1.cast, align 4
  %addr2.cast.val = load i32, i32* %addr2.cast, align 4
  %ld = select i1 %cond, i32 %addr1.cast.val, i32 %addr2.cast.val
  ret i32 %ld
}

Can you provide a reduced C++ source example for how we got to this IR? I couldn't repro with a simple case, so I must be missing some part of it.

Also, there might be something in common with:
https://bugs.llvm.org/show_bug.cgi?id=35354
or
https://bugs.llvm.org/show_bug.cgi?id=35284
?

In D40304#936480, @spatel wrote:
This patch is building on a transform that is already suspect (see discussion starting at https://bugs.llvm.org/show_bug.cgi?id=34603#c6 ).

In conjunction with the existing transform, it increases the instruction count from 3 to 5 for this example:
define i32 @store_bitcasted_load(i1 %cond, float* dereferenceable(4) %addr1, float* dereferenceable(4) %addr2) {
  %sel = select i1 %cond, float* %addr1, float* %addr2
  %bc1 = bitcast float* %sel to i32*
  %ld = load i32, i32* %bc1
  ret i32 %ld
}
$ ./opt -instcombine loadsel.ll -S
define i32 @store_bitcasted_load(i1 %cond, float* dereferenceable(4) %addr1, float* dereferenceable(4) %addr2) {
  %addr1.cast = bitcast float* %addr1 to i32*
  %addr2.cast = bitcast float* %addr2 to i32*
  %addr1.cast.val = load i32, i32* %addr1.cast, align 4
  %addr2.cast.val = load i32, i32* %addr2.cast, align 4
  %ld = select i1 %cond, i32 %addr1.cast.val, i32 %addr2.cast.val
  ret i32 %ld
}
Can you provide a reduced C++ source example for how we got to this IR? I couldn't repro with a simple case, so I must be missing some part of it.

Also, there might be something in common with:
https://bugs.llvm.org/show_bug.cgi?id=35354
or
https://bugs.llvm.org/show_bug.cgi?id=35284
?

This patch fixes PR35354, the first one from your list. I described everything already. We get an extra bitcast from the canonicalization of the load/store sequence before function inlining, but this canonicalization breaks min/max pattern recognition after the inlining of the min/max functions.

So, what Sanjoy has pointed out is that canonicalizing to select like this breaks the ability to remove redundant loads and stores and do PRE, compared to the equivalent control-flow version, in general.

While it's probably true that your canonicalization of:
load bitcast (select (Cond, &V1, &V2))
to
select(Cond, load bitcast &V1, load bitcast &V2)

is not likely to break this more, and will make things a little better, all of our redundancy elimination passes and knowledge propagation passes would be happier with the control flow non-select version.

This is not possible to fix in those passes sanely (you'd need a fake cfg with fake instructions and fake operands to simulate the control flow).
Otherwise, the only way around this is to not canonicalize to select this way that early.

In D40304#936522, @dberlin wrote:

Otherwise, the only way around this is to not canonicalize to select this way that early.

And for reference, I have a patch towards that - D38566.

There may be other fixes needed to solve PR35354 (sorry for missing that in the title!)

In D40304#936527, @spatel wrote:

In D40304#936522, @dberlin wrote:

Otherwise, the only way around this is to not canonicalize to select this way that early.

And for reference, I have a patch towards that - D38566.

There may be other fixes needed to solve PR35354 (sorry for missing that in the title!)

Your patch fixes the SimplifyCFG pass, but the canonicalization occurs in InstCombiner (see the combineLoadToOperationType function).

spatel mentioned this in D38566: [SimplifyCFG] don't sink common insts too soon (PR34603). Nov 27 2017, 12:51 PM

The original transformation load (select (Cond, &V1, &V2)) --> select(Cond, load &V1, load &V2) also increases the number of instructions, from 2 to 3:

define float @store_bitcasted_load(i1 %cond, float* dereferenceable(4) %addr1, float* dereferenceable(4) %addr2) {
  %sel = select i1 %cond, float* %addr1, float* %addr2
  %ld = load float, float* %sel
  ret float %ld
}

After the existing transformation we have:

define float @store_bitcasted_load(i1 %cond, float* dereferenceable(4) %addr1, float* dereferenceable(4) %addr2) {
  %addr1.val = load float, float* %addr1, align 4
  %addr2.val = load float, float* %addr2, align 4
  %ld = select i1 %cond, float %addr1.val, float %addr2.val
  ret float %ld
}
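Hoisting the loads above the select is only legal when both addresses can be loaded unconditionally, which is what the dereferenceable(4) attributes in these examples provide. A rough C++ analogy follows; the names are illustrative and this is not LLVM API. With raw pointers the unselected side may be null and must not be read, while references stand in for "known dereferenceable" so both loads can be hoisted.

```cpp
#include <cassert>

// Load through the selected pointer: only the chosen side is dereferenced,
// so the other pointer is allowed to be null.
inline float load_selected(bool cond, const float *p1, const float *p2) {
  const float *sel = cond ? p1 : p2;
  assert(sel != nullptr && "the selected address must be valid");
  return *sel;
}

// Both loads are hoisted above the select. References play the role of
// `dereferenceable(4)` here: both sides are known to be readable.
inline float select_loaded(bool cond, const float &v1, const float &v2) {
  const float a = v1; // unconditional load of the first address
  const float b = v2; // unconditional load of the second address
  return cond ? a : b;
}
```

When both addresses are valid the two functions return the same value; the second form costs one extra load, which matches the 2-to-3 instruction count noted above.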

In D40304#937624, @ABataev wrote:

The original transformation load (select (Cond, &V1, &V2)) --> select(Cond, load &V1, load &V2) also increases number of instructions from 2 to 3

Yes - that's why I said the existing transform is suspect. There are 2 InstCombine problems here as I think you've noted:

The transform that hoists loads ahead of select.
The transform that adds bitcasts around FP loads ("Try to canonicalize loads which are only ever stored to operate over integers instead of any other type.")

I've looked at this a bit closer now, and SimplifyCFG really wants to create a select for this:

define float* @mymax(float* dereferenceable(4) %__a, float* dereferenceable(4) %__b) {
entry:
  %__comp = alloca %"struct.std::__1::__less", align 1
  %call = call zeroext i1 @less(%"struct.std::__1::__less"* nonnull %__comp, float* nonnull dereferenceable(4) %__a, float* nonnull dereferenceable(4) %__b)
  br i1 %call, label %cond.true, label %cond.false

cond.true:                                        ; preds = %entry
  br label %cond.end

cond.false:                                       ; preds = %entry
  br label %cond.end

cond.end:                                         ; preds = %cond.false, %cond.true
  %cond-lvalue = phi float* [ %__b, %cond.true ], [ %__a, %cond.false ]
  ret float* %cond-lvalue
}

For this example, a select will be created by FoldTwoEntryPHINode(). If I stub that out, then a select will still be created in HoistThenElseCodeToIf(). If I stub that out, then a select will still be created in SpeculativelyExecuteBB(). So we would have to disable much more of SimplifyCFG to avoid creating the select. But even that is not enough - the bitcasts are interfering with further optimization.

If the consensus is that this instcombine bitcast transform is valid:

target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
define void @store_bitcasted_load2(float* %loadaddr, float* %storeaddr) {
  %ld = load float, float* %loadaddr
  store float %ld, float* %storeaddr
  ret void
}

$ ./opt -instcombine -S bitcastload.ll

define void @store_bitcasted_load2(float* %loadaddr, float* %storeaddr) {
  %1 = bitcast float* %loadaddr to i32*
  %ld1 = load i32, i32* %1, align 4
  %2 = bitcast float* %storeaddr to i32*
  store i32 %ld1, i32* %2, align 4
  ret void
}

...then I think we have to account for that here and look through the bitcasts (as this patch is proposing). We could make this patch not create more instructions than it removes by starting the pattern match at the store instruction rather than the load?
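As a rough illustration of why the integer form of a load/store pair moves the same data as the float form, the C++ sketch below copies a float once as a float and once through its 32-bit pattern. It assumes a 32-bit float (checked with static_assert), and the function names are made up for this example.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Analogue of `load float` followed by `store float`.
inline void copy_as_float(const float *src, float *dst) { *dst = *src; }

// Analogue of the canonicalized form: `load i32` and `store i32` through
// bitcasted pointers. memcpy keeps the code within C++ aliasing rules.
inline void copy_as_int_bits(const float *src, float *dst) {
  std::uint32_t bits;
  std::memcpy(&bits, src, sizeof bits);
  std::memcpy(dst, &bits, sizeof bits);
}

// Helper to read back the raw bits of a float for comparison.
inline std::uint32_t bits_of(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}
```

The copy itself is type-agnostic, which is why the canonicalization is harmless in isolation; the problem discussed in this review is that the extra bitcasts hide the float-typed select from later pattern matching.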

We could make this patch not create more instructions than it removes by starting the pattern match at the store instruction rather than the load?

Yes, I already thought about it. Will try to rework it by starting the pattern matching from the store.

Make the conversion iff the load is part of load/store canonicalization conversion.

For reference, here are the bitcast-adding commit and discussion (cc @chandlerc ):
https://reviews.llvm.org/rL226781
http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html

Nobody was sure what regressions would be caused by that change...looks like we found one. :)

I'm still concerned that we're building on a fold that might get removed. Let me propose an alternate idea:

Fold the bitcasts out of the 5 instruction sequence starting from visitStoreInst(): store (bitcast (load (bitcast (select ...))))
Create an exception to the bitcasting canonicalization for loads fed by a select, so we don't infinite-loop.

So I think we should be able to do this fold without relying on the fold that moves loads above select, so the test case is minimized to:

define void @bitcasted_store(i1 %cond, float* %loadaddr1, float* %loadaddr2, float* %storeaddr) {
  %sel = select i1 %cond, float* %loadaddr1, float* %loadaddr2
  %int_load_addr = bitcast float* %sel to i32*
  %ld = load i32, i32* %int_load_addr
  %int_store_addr = bitcast float* %storeaddr to i32*
  store i32 %ld, i32* %int_store_addr
  ret void
}

Update after review.

spatel added inline comments. Nov 30 2017, 11:51 AM

lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
567	Why do we need to match min/max? Does something break if we just match select? If we need to restrict to min/max, this isn't the way to do it. Use pattern matches (m_c_SMax, etc) or value tracking's matchSelectPattern().
1331	It doesn't make sense to call this "decanonicalize". Whatever we choose to do here is redefining canonical. "removeBitcastsFromLoadStoreOnSelect"?

ABataev added inline comments. Dec 1 2017, 7:43 AM

lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
567	Currently, we have trouble only with this particular min/max pattern, not with all select instructions. I think we should do this for min/max first. If it is required for other select-based patterns, we can extend this patch or remove it. Ok, will do it via pattern matchers. Though I can't use m_c_<Matcher> directly, because we have a slightly different min/max pattern (the values are compared, but the addresses of these values are returned, not the values themselves).
1331	Ok, will rename it.
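To make the "slightly different" pattern concrete, here is a toy model in plain C++. It is a hypothetical mini-IR written for this illustration and does not use LLVM's PatternMatch.h or any real LLVM class. It accepts a select whose condition compares two loaded values while its operands are the addresses those values were loaded from.

```cpp
#include <array>

namespace toy_ir {

enum class Kind { Pointer, Load, FCmp, Select };

// A node refers to its operands by address; callers must keep operand
// nodes alive for as long as the nodes that use them.
struct Node {
  Kind kind;
  std::array<const Node *, 3> ops; // unused slots are nullptr
};

inline Node pointer() { return {Kind::Pointer, {nullptr, nullptr, nullptr}}; }
inline Node load(const Node &addr) {
  return {Kind::Load, {&addr, nullptr, nullptr}};
}
inline Node fcmp(const Node &lhs, const Node &rhs) {
  return {Kind::FCmp, {&lhs, &rhs, nullptr}};
}
inline Node select(const Node &cond, const Node &t, const Node &f) {
  return {Kind::Select, {&cond, &t, &f}};
}

// Matches select(fcmp(load P1, load P2), X, Y) where {X, Y} == {P1, P2}:
// the compare sees the values, the select picks between the addresses.
inline bool is_min_max_on_addresses(const Node &n) {
  if (n.kind != Kind::Select)
    return false;
  const Node *cmp = n.ops[0];
  if (cmp->kind != Kind::FCmp)
    return false;
  const Node *lhs = cmp->ops[0];
  const Node *rhs = cmp->ops[1];
  if (lhs->kind != Kind::Load || rhs->kind != Kind::Load)
    return false;
  const Node *p1 = lhs->ops[0];
  const Node *p2 = rhs->ops[0];
  return (n.ops[1] == p1 && n.ops[2] == p2) ||
         (n.ops[1] == p2 && n.ops[2] == p1);
}

} // namespace toy_ir
```

A value-based min/max matcher would instead expect the select operands to be the loaded values themselves, which is why it cannot be reused directly for this shape.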

Update after review.

If we require matching the loads + cmp ahead of the select, then this should be the minimal test case:

define void @bitcasted_minmax_with_select_of_pointers(float* %loadaddr1, float* %loadaddr2, float* %storeaddr) {
  %ld1 = load float, float* %loadaddr1, align 4
  %ld2 = load float, float* %loadaddr2, align 4
  %cond = fcmp ogt float %ld1, %ld2
  %sel = select i1 %cond, float* %loadaddr1, float* %loadaddr2
  %int_load_addr = bitcast float* %sel to i32*
  %ld = load i32, i32* %int_load_addr, align 4
  %int_store_addr = bitcast float* %storeaddr to i32*
  store i32 %ld, i32* %int_store_addr, align 4
  ret void
}

...but I'm not getting the transform to fire. Something may be wrong with the pattern matching's use of m_Specific()?

lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
566–567	Could make this generally available by putting it into PatternMatch.h?
590–591	I don't think we need to loop here. Use peekThroughBitcast() instead.

Update after review.

Harbormaster completed remote builds in B12756: Diff 125551. Dec 5 2017, 9:21 AM

I don't see a better way to avoid the problem, so LGTM. See inline for some small issues. Wait a day to commit in case anyone else has ideas.

include/llvm/IR/PatternMatch.h
960–979	(On Diff #125551)	Please commit this part as an NFC preliminary step.
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
567	I still can't read this without thinking it's a duplicate of something in matchSelectPattern(). Include "load" in the name? "isMinMaxWithLoads()"?
1352	Remove comment - the description above the function body is enough.

This revision is now accepted and ready to land. Dec 6 2017, 3:07 PM

Closed by commit rL320157: [InstCombine] PR35354: Convert store(bitcast, load bitcast (select (Cond, &V1… (authored by ABataev). Dec 8 2017, 7:32 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                          Size
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp     39 lines
test/Transforms/InstCombine/load-bitcast-select.ll            7 lines

Diff 123785

lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp

[18 lines not shown]
 #include "llvm/IR/ConstantRange.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/DebugInfo.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/IR/MDBuilder.h"
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/Local.h"
+#include "llvm/Transforms/Utils/LoopUtils.h"
 using namespace llvm;

 #define DEBUG_TYPE "instcombine"

 STATISTIC(NumDeadStore, "Number of dead stores eliminated");
 STATISTIC(NumGlobalCopies, "Number of allocas copied from constant global");

 /// pointsToConstantGlobal - Return true if V (possibly indirectly) points to
[522 lines not shown; enclosing scope: case LLVMContext::MD_dereferenceable_or_null:]
       break;
     }
   }

   return NewStore;
 }

 /// \brief Combine loads to match the type of their uses' value after looking
 /// through intervening bitcasts.
 ///
 /// The core idea here is that if the result of a load is used in an operation,
 /// we should load the type most conducive to that operation. For example, when
 /// loading an integer and converting that immediately to a pointer, we should
 /// instead directly load a pointer.
 ///
 /// However, this routine must never change the width of a load or the number of
 /// loads as that would introduce a semantic change. This combine is expected to
 /// be a semantic no-op which just allows loads to more closely model the types
 /// of their consuming operations.
 ///
 /// Currently, we also refuse to change the precise type used for an atomic load
 /// or a volatile load. This is debatable, and might be reasonable to change
 /// later. However, it is risky in case some backend or other part of LLVM is
 /// relying on the exact type loaded to select appropriate atomic operations.
 static Instruction *combineLoadToOperationType(InstCombiner &IC, LoadInst &LI) {
   // FIXME: We could probably with some care handle both volatile and ordered
   // atomic loads here but it isn't clear that this is important.
   if (!LI.isUnordered())
     return nullptr;

   if (LI.use_empty())
     return nullptr;

   // swifterror values can't be bitcasted.
   if (LI.getPointerOperand()->isSwiftError())
     return nullptr;

   Type *Ty = LI.getType();
   const DataLayout &DL = IC.getDataLayout();

   // Try to canonicalize loads which are only ever stored to operate over
   // integers instead of any other type. We only do this when the loaded type
[411 lines not shown; enclosing scope: if (Op->hasOneUse()) {]
     // exposes redundancy in the code.
     //
     // Note that we cannot do the transformation unless we know that the
     // introduced loads cannot trap! Something like this is valid as long as
     // the condition is always false: load (select bool %C, int* null, int* %G),
     // but it would not be valid if we transformed it to load from null
     // unconditionally.
     //
+    // load bitcast (select (Cond, &V1, &V2)) --> select(Cond, load bitcast
+    // &V1, load bitcast &V2).
+    auto *BC = dyn_cast<BitCastInst>(Op);
+    if (BC)
+      Op = BC->getOperand(0);
+
     if (SelectInst *SI = dyn_cast<SelectInst>(Op)) {
       // load (select (Cond, &V1, &V2)) --> select(Cond, load &V1, load &V2).
       unsigned Align = LI.getAlignment();
       if (isSafeToLoadUnconditionally(SI->getOperand(1), Align, DL, SI) &&
           isSafeToLoadUnconditionally(SI->getOperand(2), Align, DL, SI)) {
-        LoadInst *V1 = Builder.CreateLoad(SI->getOperand(1),
-                                          SI->getOperand(1)->getName()+".val");
-        LoadInst *V2 = Builder.CreateLoad(SI->getOperand(2),
-                                          SI->getOperand(2)->getName()+".val");
+        Value *AddrLHS = SI->getOperand(1);
+        Value *AddrRHS = SI->getOperand(2);
+        if (BC) {
+          AddrLHS = Builder.CreateBitCast(AddrLHS, BC->getDestTy(),
+                                          AddrLHS->getName() + ".cast");
+          propagateIRFlags(AddrLHS, BC);
+          AddrRHS = Builder.CreateBitCast(AddrRHS, BC->getDestTy(),
+                                          AddrRHS->getName() + ".cast");
+          propagateIRFlags(AddrRHS, BC);
+        }
+        LoadInst *V1 = Builder.CreateLoad(AddrLHS, AddrLHS->getName() + ".val");
+        LoadInst *V2 = Builder.CreateLoad(AddrRHS, AddrRHS->getName() + ".val");
         assert(LI.isUnordered() && "implied by above");
         V1->setAlignment(Align);
         V1->setAtomic(LI.getOrdering(), LI.getSyncScopeID());
         V2->setAlignment(Align);
         V2->setAtomic(LI.getOrdering(), LI.getSyncScopeID());
         return SelectInst::Create(SI->getCondition(), V1, V2);
       }

       // load (select (cond, null, P)) -> load P
       if (isa<ConstantPointerNull>(SI->getOperand(1)) &&
           LI.getPointerAddressSpace() == 0) {
-        LI.setOperand(0, SI->getOperand(2));
+        Value *Operand = SI->getOperand(2);
+        if (BC) {
+          Operand = Builder.CreateBitCast(Operand, BC->getDestTy(),
+                                          Operand->getName() + ".cast");
+          propagateIRFlags(Operand, BC);
+        }
+        LI.setOperand(0, Operand);
         return &LI;
       }

       // load (select (cond, P, null)) -> load P
       if (isa<ConstantPointerNull>(SI->getOperand(2)) &&
           LI.getPointerAddressSpace() == 0) {
-        LI.setOperand(0, SI->getOperand(1));
+        Value *Operand = SI->getOperand(1);
+        if (BC) {
+          Operand = Builder.CreateBitCast(Operand, BC->getDestTy(),
+                                          Operand->getName() + ".cast");
+          propagateIRFlags(Operand, BC);
+        }
+        LI.setOperand(0, Operand);
         return &LI;
       }
     }
   }
   return nullptr;
 }

 /// \brief Look for extractelement/insertvalue sequence that acts like a bitcast.
[242 lines not shown; enclosing scope: static bool equivalentAddressValues(Value *A, Value *B) {]

   // Otherwise they may not be equivalent.
   return false;
 }

 Instruction *InstCombiner::visitStoreInst(StoreInst &SI) {
   Value *Val = SI.getOperand(0);
   Value *Ptr = SI.getOperand(1);

   // Try to canonicalize the stored type.
   if (combineStoreToValueType(*this, SI))
     return eraseInstFromFunction(SI);

   // Attempt to improve the alignment.
   unsigned KnownAlign = getOrEnforceKnownAlignment(
       Ptr, DL.getPrefTypeAlignment(Val->getType()), DL, &SI, &AC, &DT);
   unsigned StoreAlign = SI.getAlignment();
   unsigned EffectiveStoreAlign =
       StoreAlign != 0 ? StoreAlign : DL.getABITypeAlignment(Val->getType());

   if (KnownAlign > EffectiveStoreAlign)
     SI.setAlignment(KnownAlign);
   else if (StoreAlign == 0)
     SI.setAlignment(EffectiveStoreAlign);

   // Try to canonicalize the stored type.
   if (unpackStoreToAggregate(*this, SI))
     return eraseInstFromFunction(SI);

   // Replace GEP indices if possible.
   if (Instruction *NewGEPI = replaceGEPIdxWithZero(*this, Ptr, SI)) {
     Worklist.Add(NewGEPI);
     return &SI;
   }

   // Don't hack volatile/ordered stores.
   // FIXME: Some bits are legal for ordered atomic stores; needs refactoring.
   if (!SI.isUnordered()) return nullptr;
[228 lines not shown]

test/Transforms/InstCombine/load-bitcast-select.ll

[15 lines not shown]
 ; CHECK-NEXT: ret void
 ; CHECK: for.body:
 ; CHECK-NEXT: [[TMP0:%.*]] = zext i32 [[I_0]] to i64
 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [1000 x float], [1000 x float]* @a, i64 0, i64 [[TMP0]]
 ; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [1000 x float], [1000 x float]* @b, i64 0, i64 [[TMP0]]
 ; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX]], align 4
 ; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[ARRAYIDX2]], align 4
 ; CHECK-NEXT: [[CMP_I:%.*]] = fcmp fast olt float [[TMP1]], [[TMP2]]
-; CHECK-NEXT: [[__B___A_I:%.*]] = select i1 [[CMP_I]], float* [[ARRAYIDX2]], float* [[ARRAYIDX]]
-; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[__B___A_I]] to i32*
-; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4
-; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[ARRAYIDX]] to i32*
-; CHECK-NEXT: store i32 [[TMP4]], i32* [[TMP5]], align 4
+; CHECK-NEXT: [[DOTV:%.*]] = select i1 [[CMP_I]], float [[TMP2]], float [[TMP1]]
+; CHECK-NEXT: store float [[DOTV]], float* [[ARRAYIDX]], align 4
 ; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_0]], 1
 ; CHECK-NEXT: br label [[FOR_COND]]
 ;
 entry:
   br label %for.cond

 for.cond: ; preds = %for.body, %entry
   %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
[21 lines not shown]
