This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Convert vector store to scalar store if only one element updated
Abandoned · Public

Authored by qiucf on Dec 23 2019, 12:23 AM.

Details

Reviewers
spatel
nemanjai
andrewrk
fhahn
efriedma
bogner
lebedev.ri
jdoerfert
Group Reviewers
Restricted Project
Summary

This is a simplified version of https://reviews.llvm.org/D70223. Since we can at least be confident that a single-element store won't be worse than a vector store, it's cleaner to put this into InstCombine. And since there's already some logic for transforming a shufflevector that affects only one element into an insertelement, we can take care of that single-element case here.
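
For reference, a minimal sketch of the intended fold on a hypothetical input (function name and types are illustrative, not taken from the patch's tests):

define void @insert_store(<16 x i8>* %p, i8 %s) {
  %ld = load <16 x i8>, <16 x i8>* %p
  %ins = insertelement <16 x i8> %ld, i8 %s, i32 3
  store <16 x i8> %ins, <16 x i8>* %p
  ret void
}

; The load/insertelement/store of the whole vector would become a single
; scalar store (the vector load then goes away if it has no other uses):
;   %gep = getelementptr inbounds <16 x i8>, <16 x i8>* %p, i32 0, i32 3
;   store i8 %s, i8* %gep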

Diff Detail

Event Timeline

qiucf created this revision. Dec 23 2019, 12:23 AM
RKSimon added subscribers: jdoerfert, RKSimon.

@spatel @jdoerfert Should/could we use attributor to keep track of the 'storeable' memory range?

llvm/test/Transforms/InstCombine/single-element-store.ll
1 ↗(On Diff #235102)

Regenerate with update_test_checks.py?

qiucf updated this revision to Diff 235104. Dec 23 2019, 12:52 AM

Update the test using update_test_checks.py.

qiucf marked an inline comment as done. Dec 23 2019, 12:52 AM
lebedev.ri added inline comments. Dec 23 2019, 12:55 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1200

I'd strongly suggest simply creating a new store.

llvm/test/Transforms/InstCombine/single-element-store.ll
8 ↗(On Diff #235104)

The alignment doesn't seem correct to me?

spatel requested changes to this revision. Dec 23 2019, 5:47 AM

InstCombine has some minimal load/store transforms, so this might be ok to add here, but InstCombine is probably not a pass that should try to do anything more complicated with memory ops.

This patch has the same requirements that were requested here:
https://reviews.llvm.org/D70223#1755719
...but it does not implement those.

In other words, the following tests will be miscompiled (and so they should be added to trunk independently before this patch tries to go any further):

define void @insert_store_addr(<16 x i8>* %p, <16 x i8>* %q, i8 %s) {
  %ld = load <16 x i8>, <16 x i8>* %p
  %ins = insertelement <16 x i8> %ld, i8 %s, i32 3
  store <16 x i8> %ins, <16 x i8>* %q ; store to different address
  ret void
}

define void @insert_store_mem_mod(<16 x i8>* %p, <16 x i8>* %q, i8 %s) {
  %ld = load <16 x i8>, <16 x i8>* %p
  store <16 x i8> zeroinitializer, <16 x i8>* %q ; do pointers alias?
  %ins = insertelement <16 x i8> %ld, i8 %s, i32 3
  store <16 x i8> %ins, <16 x i8>* %p
  ret void
}
This revision now requires changes to proceed. Dec 23 2019, 5:47 AM
qiucf updated this revision to Diff 235318. Dec 25 2019, 11:18 PM

Addressed comments:

  • Add a check that the load and store use the same address.
  • Add a check for any memory-writing instructions between the load and the store.
  • Add more test cases for the cases above. (Thanks to spatel)
qiucf marked an inline comment as done. Dec 25 2019, 11:18 PM
lkail added a subscriber: lkail. Dec 26 2019, 12:01 AM
lkail added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1196

I'm not an expert on opt; I just wonder whether it would be more proper to check that Load->getPointerOperand() MustAliases SI.getPointerOperand(), since there might be bitcasts of these pointers?

@spatel @jdoerfert Should/could we use attributor to keep track of the 'storeable' memory range?

For simple examples like this we already do. Run the Attributor on them and you get dereferenceable annotations for the argument pointers.
Nevertheless, we can lose information. My suggestion for keeping it is to emit something like this:

`call void @llvm.assume(i1 true) ["dereferenceable"(%ptr, sizeof(access-we-just-removed))]`

I do have an RFC on the mailing list on this topic but no conclusion yet. If we decide to go forward with it, we start emitting them, and we write a combiner pass as well as integration into things like the Attributor to use the information.

Long story short, I think doing this is fine; once we have better assume capabilities, we should emit one to keep the information about dereferenceability.
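
For a <16 x i8> store narrowed to a single byte, the emitted annotation might look something like this (a hedged sketch following the template above; the pointer type and the 16-byte size are illustrative):

call void @llvm.assume(i1 true) ["dereferenceable"(<16 x i8>* %ptr, i64 16)]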

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1196

I'd just call stripPointerCastsSameRepresentation() on both pointers. That should give you all the cases you care about (wrt. casts). Though, the match above might actually take care of it.

1216

The GEP should be inbounds and the store should keep the nontemporal flag if present on the original.
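
A hedged sketch of what addressing both points could look like where the patch builds the replacement store (Builder, Load, Idx and NewElement are assumed names from the surrounding InstCombine code, not necessarily the patch's):

// Address element Idx of the vector in place, via an inbounds GEP.
Value *GEP = Builder.CreateInBoundsGEP(
    Load->getType(), SI.getPointerOperand(), {Builder.getInt32(0), Idx});
StoreInst *NewSI = Builder.CreateStore(NewElement, GEP);
// Carry over the nontemporal hint from the original vector store, if any.
if (MDNode *NT = SI.getMetadata(LLVMContext::MD_nontemporal))
  NewSI->setMetadata(LLVMContext::MD_nontemporal, NT);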

qiucf updated this revision to Diff 235556. Dec 29 2019, 11:24 PM
  • Create the GEP with the inbounds flag
  • Strip pointer casts before comparison
  • Make sure the load and store belong to the same BB
  • Keep the nontemporal metadata on the new store
jdoerfert resigned from this revision. Dec 29 2019, 11:32 PM

I am generally fine with this but others are reviewing this change already.

qiucf marked 3 inline comments as done. Jan 4 2020, 8:23 AM

Ping..

spatel added a comment. Jan 6 2020, 9:02 AM

I think this is safe now (although limited to a single basic block), but I don't have enough experience with memop transforms to confidently approve. Please get a 2nd look from @lebedev.ri @jdoerfert @efriedma or someone else.

I'm also not sure if there's a better way to preserve metadata. For example, we have "copyMetadataForLoad()" - is there something like that that we can use here?

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1193

Using 'auto *' is recommended with dyn_cast.

1213–1217
llvm/test/Transforms/InstCombine/load-insert-store.ll
7–8

Need to regenerate the CHECK lines? All tests should have 'inbounds' on the GEP, right?

74

Please add a test comment to explain the 2 transforms. Something like:

; p and q may alias, so it is not safe to remove the first load.
; r is known to not alias the other pointers, so it is safe to remove the second load.

I think you need to check the elements are byte-sized?

I'm a little concerned we're doing this too early, and it will end up blocking optimizations like GVN (since it can't reason about the partially overlapping store). Not sure how likely that is to trigger in practice, though.

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1204

AliasAnalysis has a dedicated method getModRefInfo() to compute the exact property you want without caring about the specific kind of instruction.
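
A hedged sketch of the suggested form, assuming the surrounding InstCombine code provides AA, the matched Load, and the store SI in the same basic block (iterator bounds and names are illustrative):

// Walk the instructions between the load and the store and give up if any
// of them may write to the memory the load reads from.
MemoryLocation LoadLoc = MemoryLocation::get(Load);
for (auto BBI = std::next(Load->getIterator()), E = SI.getIterator(); BBI != E;
     ++BBI)
  if (isModSet(AA->getModRefInfo(&*BBI, LoadLoc)))
    return nullptr;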

qiucf updated this revision to Diff 236816. Jan 8 2020, 6:59 AM
qiucf marked 5 inline comments as done.

Address some comments.

  • Change some uses of auto.
  • Update test cases with comments.
  • Use getModRefInfo.

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

qiucf marked an inline comment as done. Jan 8 2020, 7:56 AM

The merge check bot seems to have some problems resolving parent-child revisions when some of them are already committed. Currently, the patches apply to master without problems and the tests pass.

llvm/test/Transforms/InstCombine/load-insert-store.ll
7–8

Sure. In some cases the inbounds might have been simplified away; now we have them :-)

lebedev.ri requested changes to this revision. Jan 8 2020, 8:07 AM
lebedev.ri added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1189

Match m_Load() specifically?

1204
  1. Is it a bitfield? Should we be checking if (AA->getModRefInfo(&*BBI, MemoryLocation::get(&SI)) & ModRefInfo::Mod)?
  2. Does this work correctly for calls too, not just plain instructions?
This revision now requires changes to proceed. Jan 8 2020, 8:07 AM
qiucf updated this revision to Diff 237891. Jan 14 2020, 2:18 AM
qiucf marked 2 inline comments as done.
  • Use dedicated method to check ModRefInfo.
  • Add tests about calls.
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1189

I think m_Load() is used for matching a load from some specific address; m_Load(Src) binds Src to the address of the load. But here we need the load instruction itself to do more checks. Is that necessary?
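
In case it helps, a hedged sketch of how both could be had at once (matcher names as in recent trees; this is illustrative, not a requirement): m_CombineAnd can require that the inserted-into value is a load while also binding the instruction for the later checks.

// Match a store of (insertelement (load %ptr), %elt, idx) and capture the
// pieces needed for the follow-up safety checks.
Instruction *SrcInst;
Value *LoadPtr, *NewElt;
ConstantInt *CIdx;
if (!match(SI.getValueOperand(),
           m_InsertElt(m_CombineAnd(m_Load(m_Value(LoadPtr)),
                                    m_Instruction(SrcInst)),
                       m_Value(NewElt), m_ConstantInt(CIdx))))
  return nullptr;
auto *Load = cast<LoadInst>(SrcInst);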

Unit tests: pass. 61802 tests passed, 0 failed and 781 were skipped.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

lebedev.ri added inline comments. Jan 14 2020, 2:39 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1201–1203

Hm, how do we know both the load and the store are in the same basic block?

1204

This looks better, but I don't have prior experience with ModRefInfo, so this still looks suspicious to me - is this conservatively correct? isModSet() looks for MustMod, which seems stronger than Mod?

lebedev.ri requested changes to this revision. Jan 14 2020, 8:40 AM

(marking as reviewed)

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1201–1203

(I would expect that there should either already be a function to do this check, or that this should be refactored into a helper function from the get-go.)

This revision now requires changes to proceed. Jan 14 2020, 8:40 AM
qiucf updated this revision to Diff 238436. Jan 16 2020, 2:40 AM
qiucf marked 4 inline comments as done.
  • Wrap the memory-modification check into a single method (which can also help other methods)
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1201–1203

We check the parents of Load and SI above; if they aren't the same, we exit.

1204

Hmm, as the doc/comment says, I think this bit indicates that the memory location may/must be modified, and there's another bit about whether it's a 'must' or a 'may'. So the three bits should mean May, Mod, and Ref respectively.

So Mod means it may write but no must-alias was found, while MustMod means it may write and a must-alias was found. Here we only need to care whether it may write at all (the Mod bit), regardless of aliasing.

Maybe I'm not correct. I found a related commit, rG50db8a, for a look :)

Unit tests: pass. 61912 tests passed, 0 failed and 783 were skipped.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

lebedev.ri resigned from this revision. Jan 25 2020, 3:19 AM
lebedev.ri requested changes to this revision. Jun 8 2020, 9:55 AM

Hm, I don't fully recall the story here, but @qiucf, are you still interested in this? I think this looks about right now, with some nits.

@spatel, @jdoerfert: any other thoughts?

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1200

We indeed can't easily (if at all?) handle the non-constant case. But there are no tests showing that we don't transform when the idx is a variable.

Likewise, I suspect constant expressions will give us trouble here, so I strongly recommend using m_ConstantInt().

1203

I think we are better off early-returning instead.

1218

Needs an explicit Align(1) now.
Though I think at least for constant offsets we could deduce the alignment, something close to:

commonAlignment(
    SI.getAlign(),
    Idx->getUniqueInteger().getLimitedValue() *
        (NewElement->getType()->getScalarSizeInBits() / 8))

But that can be done in a follow-up patch.

llvm/test/Transforms/InstCombine/load-insert-store.ll
2

I wasn't sure whether or not this is endian-sensitive; alive2 says it's not, but maybe we should be proactive and do:

; RUN: opt < %s -instcombine -S -data-layout=e | FileCheck %s --check-prefix=ALL --check-prefixes=CHECK,LE
; RUN: opt < %s -instcombine -S -data-layout=E | FileCheck %s --check-prefix=ALL --check-prefixes=CHECK,BE
4

Add the same test but with <8 x i16> (so the elements are not byte-sized)?

This revision now requires changes to proceed. Jun 8 2020, 9:55 AM
lebedev.ri added inline comments. Jun 8 2020, 9:57 AM
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1184–1185

Do we care whether these are single-use?

One thing I've realized reading this thread again is that it's not only the compiler that can get confused by a wrong-width store; the CPU itself can also run into issues with store->load forwarding. See recent discussion http://lists.llvm.org/pipermail/llvm-dev/2020-May/141837.html . So whether this transform is worthwhile might depend on the context.

Right. And if we know the problematic context/conditions, we can always perform the reverse fold later on, in the backend. Correct?

The C memory model doesn't allow reversing the fold in a lot of cases without encoding more information in the IR. Also, I'm not sure what the heuristic looks like, so I'm not sure how hard it is to compute in the backend.

I feel there are opportunities here as our vector handling needs to improve a lot.
Though, without more data to decide (1) where to put this (in the pipeline) and (2) when to enable this (platform, context,...), it is hard to say for sure.
Generally, I would like to have this in-tree (potentially under a flag).

I think the code looks good, though we can add FIXMEs for the missing parts, e.g., better alignment, the load version of this, ...

spatel added a comment. Jun 9 2020, 6:13 AM

We have -vector-combine now, and it only runs late in the pipeline. If we put this transform in there, that should bypass concerns about interfering with GVN. That pass also has access to the cost model, so we could try to limit the transform if there are known cases where it would not be profitable (ie, no need to try to implement a codegen reversal).

qiucf marked an inline comment as done. Jun 9 2020, 9:18 AM

Thanks for the comments!

I'll try moving this into vector-combine. Maybe we can do/know more with help from TTI.

llvm/test/Transforms/InstCombine/load-insert-store.ll
2

We have few docs that explicitly talk about this, and I remember endianness doesn't matter. But adding the BE/LE checks is fine. Thanks.

qiucf abandoned this revision. May 8 2021, 7:33 PM

D98240 landed.