
[DSE] Add support for not aligned begin/end
AcceptedPublic

Authored by ebrevnov on Dec 18 2020, 2:43 AM.

Details

Reviewers
fhahn
Summary

This is an attempt to improve the handling of partial overlaps in the case of unaligned begin/end offsets.

The existing implementation simply bails out when it encounters such cases. Even when it doesn't, I believe the existing code that checks the alignment constraints is not quite correct: it tries to ensure alignment of the "later" start/end offset, while it should be preserving the relative alignment between the earlier and later start/end.
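To illustrate the difference (with made-up offsets and names, not the patch's actual variables): if the base pointer's absolute alignment is unknown, checking whether the later offset itself is a multiple of the preferred alignment proves nothing; what matters is that the distance between the earlier and later offsets is a multiple of it, so the shortened access keeps the same alignment residue as the original store. A minimal sketch, assuming `PrefAlign` is a power of two:

```cpp
#include <cstdint>

// Old-style check: requires the later offset itself to be aligned.
bool absoluteAligned(uint64_t LaterStart, uint64_t PrefAlign) {
  return LaterStart % PrefAlign == 0;
}

// Relative check: the shortened access lands on the same alignment
// residue as the original store, which is what actually matters when
// the pointer's absolute alignment is unknown.
bool relativeAligned(uint64_t EarlierStart, uint64_t LaterStart,
                     uint64_t PrefAlign) {
  return (LaterStart - EarlierStart) % PrefAlign == 0;
}
```

For example, with `EarlierStart = 3`, `LaterStart = 11` and `PrefAlign = 4`, the distance (8 bytes) is a multiple of the alignment even though 11 itself is not.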

The idea behind the change is simple: when the start/end is not aligned as we wish, instead of bailing out, adjust it as necessary to get the desired alignment.
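A sketch of that idea (a hypothetical helper with invented names, not the patch code): instead of refusing to shorten when the new start of the kept store would be misaligned, remove slightly fewer bytes so that the kept store lands on an aligned offset. Assuming `PrefAlign` is a power of two:

```cpp
#include <cstdint>

// DeadStart:    offset where the overwritten (removable) prefix begins.
// ToRemoveSize: how many leading bytes we would like to drop.
// Returns a possibly smaller number of bytes to drop, chosen so that the
// store remaining after shortening starts at a PrefAlign-aligned offset.
uint64_t adjustToRemoveSize(uint64_t DeadStart, uint64_t ToRemoveSize,
                            uint64_t PrefAlign) {
  uint64_t NewStart = DeadStart + ToRemoveSize;   // start of the kept bytes
  uint64_t Misalign = NewStart & (PrefAlign - 1); // PrefAlign is a power of two
  return ToRemoveSize - Misalign; // shrink the removed region instead of bailing
}
```

For `DeadStart = 0` and `ToRemoveSize = 7` with `PrefAlign = 4`, this drops only 4 bytes, so the kept store starts at offset 4.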

I'll update with performance results measured by the test-suite; it's still running.

Diff Detail

Unit Tests: Failed

Time | Test
130 ms | x64 debian > Clang Tools.clang-tidy/checkers::readability-static-accessed-through-instance.cpp
Script: -- : 'RUN: at line 1'; /usr/bin/python3.8 /mnt/disks/ssd0/agent/llvm-project/clang-tools-extra/test/../test/clang-tidy/check_clang_tidy.py /mnt/disks/ssd0/agent/llvm-project/clang-tools-extra/test/clang-tidy/checkers/readability-static-accessed-through-instance.cpp readability-static-accessed-through-instance /mnt/disks/ssd0/agent/llvm-project/build/tools/clang/tools/extra/test/clang-tidy/checkers/Output/readability-static-accessed-through-instance.cpp.tmp
4,380 ms | x64 windows > Clang Tools.clang-tidy/checkers::readability-static-accessed-through-instance.cpp
Script: -- : 'RUN: at line 1'; C:/Python38/python.exe C:/ws/w16n2-1/llvm-project/premerge-checks/clang-tools-extra/test/../test\clang-tidy\check_clang_tidy.py C:\ws\w16n2-1\llvm-project\premerge-checks\clang-tools-extra\test\clang-tidy\checkers\readability-static-accessed-through-instance.cpp readability-static-accessed-through-instance C:\ws\w16n2-1\llvm-project\premerge-checks\build\tools\clang\tools\extra\test\clang-tidy\checkers\Output\readability-static-accessed-through-instance.cpp.tmp

Event Timeline

ebrevnov created this revision. · Dec 18 2020, 2:43 AM
ebrevnov requested review of this revision. · Dec 18 2020, 2:43 AM
Herald added a project: Restricted Project. · Dec 18 2020, 2:43 AM
ebrevnov edited the summary of this revision. · Dec 18 2020, 2:59 AM
ebrevnov added a reviewer: fhahn.
ebrevnov updated this revision to Diff 312748. · Dec 18 2020, 3:44 AM

Initial update

ebrevnov added a comment. (Edited) · Dec 20 2020, 11:55 PM

I had already tried to measure performance with the test-suite previously, without success. This time I again observe big variation. I'm using a dedicated performance machine which runs only my process. I built the test-suite as follows:

cmake -DTEST_SUITE_BENCHMARKING_ONLY=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -C../cmake/caches/O3.cmake ..; make -j16

And ran the tests 3 times. Here are the results comparing the first vs. second and second vs. third runs.

> utils/compare.py --merge-average  --filter-short ~/results.coffeelake.orig1.json  vs ~/results.coffeelake.orig2.json 
Tests: 736
Short Running: 197 (filtered out)
Remaining: 539
Metric: exec_time

Program                                        lhs     rhs     diff  
                                                                     
 test-suite....test:BM_MULADDSUB_LAMBDA/5001    10.18   12.39  21.8% 
 test-suite...da.test:BM_PIC_1D_LAMBDA/44217   514.08  436.69  -15.1%
 test-suite...sCRaw.test:BM_HYDRO_2D_RAW/171     9.97   11.12  11.5% 
 test-suite...bda.test:BM_PIC_1D_LAMBDA/5001    53.98   48.92  -9.4% 
 test-suite...lcalsCRaw.test:BM_ADI_RAW/5001    90.87   84.16  -7.4% 
 test-suite...XRayFDRMultiThreaded/threads:2   142.53  151.64   6.4% 
 test-suite...sARaw.test:BM_VOL3D_CALC_RAW/2     2.57    2.43  -5.6% 
 test-suite....test:BENCHMARK_HARRIS/256/256   341.95  359.82   5.2% 
 test-suite...a/kernels/doitgen/doitgen.test     0.67    0.70   3.8% 
 test-suite.../Applications/spiff/spiff.test     1.12    1.08  -3.5% 
 test-suite...g/correlation/correlation.test     1.27    1.23  -3.2% 
 test-suite....test:BENCHMARK_HARRIS/512/512   1854.99 1913.18  3.1% 
 test-suite...test:BM_MULADDSUB_LAMBDA/44217    95.43   98.36   3.1% 
 test-suite...aw.test:BM_MULADDSUB_RAW/44217    95.66   92.73  -3.1% 
 test-suite...CRaw.test:BM_MAT_X_MAT_RAW/171   105.88  102.78  -2.9% 
 Geomean difference                                             nan% 
                 lhs            rhs        diff
count  538.000000     539.000000     538.000000
mean   1554.916420    1548.375947   -0.000238  
std    13204.345562   13143.680859   0.015499  
min    0.610700       0.609000      -0.150543  
25%    2.729454       2.729260      -0.000955  
50%    92.554771      90.861409      0.000000  
75%    555.904231     555.793696     0.000634  
max    208982.569333  207906.284333  0.217786

>utils/compare.py --merge-average  --filter-short ~/results.coffeelake.orig2.json  vs ~/results.coffeelake.orig3.json 
Tests: 736
Short Running: 197 (filtered out)
Remaining: 539
Metric: exec_time

Program                                        lhs      rhs      diff  
                                                                       
 test-suite...sCRaw.test:BM_PIC_1D_RAW/44217   433.62   519.59   19.8% 
 test-suite....test:BM_MULADDSUB_LAMBDA/5001    12.39    10.36   -16.4%
 test-suite...CHMARK_ANISTROPIC_DIFFUSION/64   2056.09  2369.92  15.3% 
 test-suite...HMARK_ANISTROPIC_DIFFUSION/128   8850.38  10172.76 14.9% 
 test-suite...HMARK_ANISTROPIC_DIFFUSION/256   36797.14 42257.00 14.8% 
 test-suite...CHMARK_ANISTROPIC_DIFFUSION/32   457.93   523.86   14.4% 
 test-suite....test:BENCHMARK_HARRIS/256/256   359.82   310.62   -13.7%
 test-suite...lsCRaw.test:BM_PIC_1D_RAW/5001    48.22    54.78   13.6% 
 test-suite...sCRaw.test:BM_HYDRO_2D_RAW/171    11.12    11.98    7.7% 
 test-suite...flt/LoopRestructuring-flt.test     2.64     2.79    5.5% 
 test-suite...XRayFDRMultiThreaded/threads:2   151.64   143.43   -5.4% 
 test-suite...CRaw.test:BM_HYDRO_2D_RAW/5001   304.73   320.81    5.3% 
 test-suite...mbda.test:BM_INIT3_LAMBDA/5001     9.20     9.68    5.2% 
 test-suite.../Applications/spiff/spiff.test     1.08     1.11    2.8% 
 test-suite...++/Shootout-C++-ackermann.test     0.63     0.65    2.8% 
 Geomean difference                                               nan% 
                 lhs            rhs        diff
count  539.000000     538.000000     538.000000
mean   1548.375947    1552.758096    0.001124  
std    13143.680859   12998.591022   0.020444  
min    0.609000       0.609400      -0.164172  
25%    2.729260       2.770800      -0.000865  
50%    90.861409      91.507016     -0.000025  
75%    555.793696     555.931031     0.000464  
max    207906.284333  204187.808667  0.198265

Is this expected? What am I doing wrong?

Thanks
Evgeniy

fhahn added a comment. · Wed, Feb 17, 9:10 AM

Just realized that I missed this one.

With respect to the performance numbers you posted earlier, I am not sure that they are caused by the patch. For SPEC2000/SPEC2006 & MultiSource with -O3 -flto on X86, only 12 benchmarks have binary changes, and they are all very minor. The number of remaining stores is shown below (that's with --filter-hash --all). What's a bit curious is that in 2 cases we are left with a tiny number of additional stores, while the number of eliminated stores stays the same.

It seems like we do not have a stats counter to measure the number of times we successfully shortened operations. The patch basically looks good to me, but it would be good to understand where the additional stores are coming from.

Same hash: 225 (filtered out)
Remaining: 12
Metric: dse.NumRemainingStores

Program                                        base     patch    diff
 test-suite...006/450.soplex/450.soplex.test   8700.00  8701.00   0.0%
 test-suite...006/447.dealII/447.dealII.test   90533.00 90535.00  0.0%
 test-suite.../CINT2000/176.gcc/176.gcc.test   18291.00 18291.00  0.0%
 test-suite.../CINT2006/403.gcc/403.gcc.test   36093.00 36093.00  0.0%
 test-suite...nal/skidmarks10/skidmarks.test   1161.00  1161.00   0.0%
 test-suite...lications/ClamAV/clamscan.test   10078.00 10078.00  0.0%
 test-suite...lications/viterbi/viterbi.test   102.00   102.00    0.0%
 test-suite...marks/7zip/7zip-benchmark.test   35503.00 35503.00  0.0%
 test-suite...oxyApps-C/RSBench/rsbench.test   145.00   145.00    0.0%
 test-suite...oxyApps-C/XSBench/XSBench.test   120.00   120.00    0.0%
 test-suite...nsumer-jpeg/consumer-jpeg.test   5972.00  5972.00   0.0%
 test-suite...nsumer-lame/consumer-lame.test   4501.00  4501.00   0.0%
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
1096

IIUC this is the adjustment to keep ToRemoveStart aligned to PrefAlign, right? Might be worth a comment.

1132

Perhaps it would be worth adjusting the message to make it clear that the second number is the size we shorten by?

Florian,

I appreciate your taking the time to review and experiment with the patch. I agree it's kind of unexpected to see fewer stores get removed. Would it be possible for you to share an IR file where that happens (a non-reduced one is fine)? I'm asking because I don't have access to those SPEC benchmarks.

Thanks
Evgeniy

fhahn accepted this revision. · Tue, Mar 2, 3:40 AM

> Florian,
>
> I appreciate your taking the time to review and experiment with the patch. I agree it's kind of unexpected to see fewer stores get removed. Would it be possible for you to share an IR file where that happens (a non-reduced one is fine)? I'm asking because I don't have access to those SPEC benchmarks.

Unfortunately I could not reproduce this without LTO, and I don't want to hold this up until I have more time to extract a reproducer from an LTO build. Given that it is a tiny number of changes, I don't think that's necessary, and the improvements in the patch in terms of readability are more than enough to offset them. It could also be a case where we now choose a more profitable offset.

LGTM, with my remaining minor suggestions (optional) with respect to the debug message.

This revision is now accepted and ready to land. · Tue, Mar 2, 3:40 AM