Given two store instructions with equivalent pointer operands,
they can be merged into their common successor basic block if
the value operand of one is bitcast to match the type of the
other. This patch only allows a bitcast between non-aggregate
primitive types with matching bitwidths.
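For illustration, a hedged source-level sketch of the pattern this enables (my own example, not taken from the patch or its tests): both branches store a 32-bit value through the same pointer, one as float and one as int32_t, so after bitcasting one value the two stores can sink into the join block.

```cpp
// Illustrative example (an assumption for exposition): each branch stores a
// 32-bit value through the same address. With this patch, instcombine can
// bitcast the float to i32 (or vice versa) and sink a single store into the
// common successor block.
#include <cstdint>

void merge_candidate(bool flag, void *p, float f, int32_t i) {
  if (flag)
    *static_cast<float *>(p) = f;   // store float %f, ptr %p
  else
    *static_cast<int32_t *>(p) = i; // store i32 %i, ptr %p
  // Join block: previously two stores of different types; now one store of
  // a phi over the (bitcast) values.
}
```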
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
- Line 1634: Check for isBitOrNoopPointerCastable() instead, and please add a test that requires inserting ptrtoint or inttoptr. Also, this is missing a check for hasSameSpecialState(); please add a test with, for example, different alignments.
- Line 1642: You should only insert the bitcast when actually doing the transform. We should not insert it if the transform is aborted later.
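For concreteness, a minimal sketch of the suggested gating (the helper name and the SI/Other/DL names are assumptions, not the patch's actual code; the two checks themselves are the existing LLVM APIs named above):

```cpp
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instructions.h"

// Hedged sketch: isBitOrNoopPointerCastable() subsumes a plain bitwidth
// comparison and also admits the no-op ptrtoint/inttoptr cases, while
// hasSameSpecialState() rejects stores that differ in volatility, alignment,
// ordering, etc.
static bool storesAreMergeable(llvm::StoreInst &SI, llvm::StoreInst &Other,
                               const llvm::DataLayout &DL) {
  llvm::Type *SrcTy = Other.getValueOperand()->getType();
  llvm::Type *DstTy = SI.getValueOperand()->getType();
  if (!llvm::CastInst::isBitOrNoopPointerCastable(SrcTy, DstTy, DL))
    return false;
  return SI.hasSameSpecialState(&Other);
}
```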
- Replaced the bitwidth equivalence check with isBitOrNoopPointerCastable()
- Added the hasSameSpecialState() check
- Added a test with different alignments
- Added a ptrtoint test
- Created an AMDGPU-specific test for inttoptr. This test appears to require an extra instruction-combining iteration to reach a fixed point: instcombine reports an infinite loop when instcombine-infinite-loop-threshold is set to 3, whereas the test passes when I raise the threshold to 4.
llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll
- Line 221: Can't these merge using the minimum of the alignments?
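This ends up deferred ("possible, but please not in this patch", below), but for reference the idea would be roughly a one-liner (hedged sketch; NewSI is an assumed name for the merged store):

```cpp
// Hedged sketch (assumed names): merge under the more conservative of the two
// alignments instead of refusing when they differ.
NewSI->setAlignment(std::min(SI.getAlign(), OtherStore->getAlign()));
```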
llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll
- Line 243: This one is undefined because the alloca has default align 2.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
- Line 1626: These isSingleValueType / isVectorTy checks should not be necessary; isBitOrNoopPointerCastable will already do all the necessary type checks.
- Line 1684: There is no need to rewrite to a new temporary store instruction; you can insert the bitcast directly at the phi node operand. You also need to use the IRBuilder to create the cast, otherwise it will not be queued for reprocessing. In that case you also don't need the InsertBitcast flag, because the IRBuilder will omit the bitcast if it is not needed. If you do everything correctly, your AMDGPU problem should go away as well -- the instcombine-infinite-loop-threshold flag exists specifically to detect these worklist management bugs.
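A hedged sketch of the IRBuilder approach described above (the function and variable names are assumptions; the builder's insertion point is assumed to sit at the front of the successor block, where PHIs must go):

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"

// Hedged sketch: creating the cast through the InstCombine IRBuilder queues
// it for reprocessing, and CreateBitOrPointerCast returns the value unchanged
// when no cast is needed, so no separate InsertBitcast flag is required.
static llvm::PHINode *buildMergedValue(llvm::IRBuilderBase &Builder,
                                       llvm::StoreInst &SI,
                                       llvm::StoreInst &OtherStore) {
  llvm::Type *DestTy = SI.getValueOperand()->getType();
  llvm::Value *CastedVal =
      Builder.CreateBitOrPointerCast(OtherStore.getValueOperand(), DestTy);
  // Feed the (possibly cast) value into the merged phi directly, rather than
  // rewriting OtherStore into a temporary store instruction first.
  llvm::PHINode *Merged = Builder.CreatePHI(DestTy, 2, "storemerge");
  Merged->addIncoming(SI.getValueOperand(), SI.getParent());
  Merged->addIncoming(CastedVal, OtherStore.getParent());
  return Merged;
}
```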
llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll
- Line 221: It's possible, but please not in this patch.
- Used IRBuilder to create the cast
- Eliminated the InsertBitcast flag
- Fixed the alignment test
- Removed redundant checks
- Eliminated the temporary store

@nikic The instcombine-infinite-loop-threshold issue still persists.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
- Line 1615: Doesn't look like returning StoreInst * is needed/useful anymore -- return bool instead?
- Line 1648: This break got dropped. I think this is the reason for the test regressions and the crashes in the pre-merge checks. It took me a while to spot because I usually only look at the right side of the diff...
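To show where that break sits, a simplified, hedged sketch of the backward scan (names and structure assumed, not the patch's exact code): without the break, control falls through to the memory-access check and the matching store itself aborts the transform.

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"

// Hedged sketch: scan the other predecessor backwards for the store to merge.
static llvm::StoreInst *findOtherStore(llvm::BasicBlock &OtherBB) {
  llvm::StoreInst *OtherStore = nullptr;
  for (llvm::Instruction &Inst : llvm::reverse(OtherBB)) {
    if ((OtherStore = llvm::dyn_cast<llvm::StoreInst>(&Inst)))
      break; // The accidentally dropped break: stop once the store is found.
    // Any other instruction that touches memory blocks the merge; without
    // the break above, the store itself would fail this check.
    if (Inst.mayReadFromMemory() || Inst.mayWriteToMemory())
      return nullptr;
  }
  return OtherStore;
}
```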
- Brought back the break statement that was accidentally removed
- OtherStoreIsMergeable now returns a bool instead of a StoreInst *
- Corrected tests
- Simplified OtherStoreIsMergeable as requested
The reason @inttoptr_merge(..) needs another instcombine iteration is that mergeStoreIntoSuccessor() generates a case where a store/load pair is merged into an inttoptr, which is later merged with the PHI inserted by mergeStoreIntoSuccessor() in the first iteration. I am not sure handling these cases inside mergeStoreIntoSuccessor() is viable, since it would make the code redundant.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
- Lines 1655–1659: This can be simplified as suggested, since if we find a non-mergeable store, the mayWriteToMemory test just below will do the return false. Also, I think moving the cast-to-StoreInst inside the helper function would make the patch simpler overall, i.e. something like: auto IsMergeableStore = [&](Instruction *OtherStore) -> bool ...
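A hedged sketch of the suggested shape (only the lambda signature comes from the comment above; the body and the captured SI/DL are assumed patch context):

```cpp
// Hedged sketch: do the cast-to-StoreInst inside the helper and return bool.
auto IsMergeableStore = [&](llvm::Instruction *OtherStore) -> bool {
  auto *OS = llvm::dyn_cast<llvm::StoreInst>(OtherStore);
  if (!OS || OS->getPointerOperand() != SI.getPointerOperand())
    return false;
  return llvm::CastInst::isBitOrNoopPointerCastable(
             OS->getValueOperand()->getType(),
             SI.getValueOperand()->getType(), DL) &&
         SI.hasSameSpecialState(OS);
};
```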
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
- Lines 1655–1659: We still need the StoreInst to assign to OtherStore, so I don't think moving the cast into the helper would work.
I've looked into this as well, and I think this would mostly be solved by D75362 (we visit instructions in the wrong order, in a way that is relevant here). It's okay to ignore this.
It took me a while to understand why AMDGPU is relevant here. The difference is that with the default data layout, i64 only has 4-byte alignment, so we first need to promote the alignment to 8 before the transform can be performed. For AMDGPU the alignment is 8 to start with. You can make this test AMDGPU-independent by explicitly specifying the alignment on the stores instead of using natural alignment.
- Simplified the code
- Made inttoptr_merge() target-independent by specifying align 8 on the store instructions, relying on D75362 as mentioned