This is an archive of the discontinued LLVM Phabricator instance.

[LoopDataPrefetch] Don't prefetch past a known total trip count
Abandoned · Public

Authored by jonpa on Oct 1 2019, 9:39 AM.


Details

Reviewers

uweigand
hfinkel
anemet
jfb
efriedma
fhahn

Summary

Compare the number of iterations ahead against a known constant trip count, and do not emit any prefetches if they would appear to address memory that the loop never accesses.
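The check described in the summary can be sketched outside of LLVM as a small standalone function. This is only an illustration under stated assumptions: `PrefetchParams` and `itersAheadOrZero` are hypothetical names that do not exist in LLVM, and a trip count of 0 is used to mean "unknown", mirroring how the ScalarEvolution trip count queries report that no constant is available.

```cpp
#include <cassert>

// Hypothetical stand-ins for the values the pass gets from the target.
// These names are made up for this sketch and are not LLVM APIs.
struct PrefetchParams {
  unsigned PrefetchDistance; // how far ahead to prefetch, same unit as LoopSize
  unsigned MaxItersAhead;    // upper limit on iterations ahead
};

// Returns the number of iterations ahead to prefetch, or 0 if no prefetch
// should be emitted. KnownMaxTripCount == 0 means "no constant known".
unsigned itersAheadOrZero(const PrefetchParams &P, unsigned LoopSize,
                          unsigned KnownMaxTripCount) {
  assert(LoopSize > 0 && "loop size must be non-zero");

  unsigned ItersAhead = P.PrefetchDistance / LoopSize;
  if (ItersAhead == 0)
    ItersAhead = 1;

  // Existing limit: give up if the prefetch would be too far ahead.
  if (ItersAhead > P.MaxItersAhead)
    return 0;

  // Check proposed by this patch: the loop is known to finish before the
  // prefetched address would be reached, so the prefetch would be wasted.
  if (KnownMaxTripCount != 0 && KnownMaxTripCount < ItersAhead)
    return 0;

  return ItersAhead;
}
```

With a loop size of 10 and a known trip count of 7, a distance of 100 gives 10 iterations ahead and is rejected, while a distance of 50 gives 5 and is accepted.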

Diff Detail

Event Timeline

jonpa created this revision. · Oct 1 2019, 9:39 AM

Herald added a subscriber: dexonsmith. · Oct 1 2019, 9:39 AM

I noticed that the output of Loop Strength Reduce (LSR) differs with this simple patch. The difference includes actual instructions and opcodes, and it appears even when LoopDataPrefetch does not emit any prefetches.

It seems that calling SE->getSmallConstantTripCount(L) changes data structures, so that LSR, when run later, emits different code in the preheader of the loop:

master <> patched
<     %xtraiter144 = and i64 %1, 3
17a17,19
>     %2 = trunc i64 %1 to i8
>     %3 = trunc i8 %2 to i2
>     %4 = zext i2 %3 to i64

I am not sure exactly why or what should be done. However, if I remove the AU.addPreserved<ScalarEvolutionWrapperPass>(); from LoopDataPrefetch, then this problem disappears.

Filed a bug report for ScalarEvolution about this, since the issue also occurs without this particular patch: https://bugs.llvm.org/show_bug.cgi?id=43545

Use getSmallConstantMaxTripCount() instead of getSmallConstantTripCount() to catch a few more cases.
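As a rough illustration of why an upper bound can be available when an exact count is not, consider a loop with a data-dependent early exit. The function below is a hypothetical example and is not taken from the patch or from SPEC. Its iteration count depends on the input, so no single exact constant describes it, but it can never run more than 8 times. Whether ScalarEvolution actually derives a bound for a given loop depends on its analysis. This sketch only shows the distinction.

```cpp
#include <cstddef>

// Searches a fixed-size array for Key and returns how many loop iterations
// were executed. The exact count varies with the data; the maximum is 8.
unsigned iterationsUntilFound(const int (&Data)[8], int Key) {
  unsigned Iterations = 0;
  for (std::size_t I = 0; I < 8; ++I) {
    ++Iterations;
    if (Data[I] == Key)
      break; // data-dependent early exit
  }
  return Iterations;
}
```

For a loop of this shape, an exact-count query would be expected to report "unknown" (0), whereas a max-count query could still report 8, which is enough for the comparison against the iterations ahead.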

As discussed before, it seems that the call to SE->getSmallConstantTripCount(L) changes data structures, which affects later passes like LSR. I wonder whether this has to stop us from committing this patch. If the call to getSmallConstantTripCount() causes SE to update itself, wouldn't LSR then actually make better decisions?

(On SPEC 2006, 8 files change with getSmallConstantTripCount(), and 2 more with getSmallConstantMaxTripCount(), for 10 in total. This is from just making the call, without changing anything else.)

With this patch I see 15 fewer prefetch instructions emitted on SPEC 2006 / SystemZ.

We could also check if LoopConstantTripCount == 1, and return if that's the case, but I'm not sure if that's useful. It might help avoid the problem encountered at https://bugs.llvm.org/show_bug.cgi?id=43679.

Hopefully someone more familiar with LoopDataPrefetch can review, but this looks reasonable.

For the LSR issue, see https://reviews.llvm.org/D68592.

This change is included instead in https://reviews.llvm.org/D70228.

Revision Contents

Path                                          Size
lib/Transforms/Scalar/LoopDataPrefetch.cpp    4 lines
test/CodeGen/SystemZ/prefetch-02.ll           33 lines

Diff 225623

lib/Transforms/Scalar/LoopDataPrefetch.cpp

(237 lines not shown) bool LoopDataPrefetch::runOnLoop(Loop *L) {

  unsigned ItersAhead = getPrefetchDistance() / LoopSize;
  if (!ItersAhead)
    ItersAhead = 1;

  if (ItersAhead > getMaxPrefetchIterationsAhead())
    return MadeChange;

+ unsigned LoopConstantTripCount = SE->getSmallConstantMaxTripCount(L);
+ if (LoopConstantTripCount && LoopConstantTripCount < ItersAhead)
+   return MadeChange;
+
  LLVM_DEBUG(dbgs() << "Prefetching " << ItersAhead
                    << " iterations ahead (loop size: " << LoopSize << ") in "
                    << L->getHeader()->getParent()->getName() << ": " << *L);

  SmallVector<std::pair<Instruction *, const SCEVAddRecExpr *>, 16> PrefLoads;
  for (const auto BB : L->blocks()) {
    for (auto &I : *BB) {
      Value *PtrValue;
(82 lines not shown)

test/CodeGen/SystemZ/prefetch-02.ll

This file was added.

; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z14 -prefetch-distance=100 \
; RUN:   -stop-after=loop-data-prefetch | FileCheck %s -check-prefix=FAR-PREFETCH
; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z14 -prefetch-distance=50 \
; RUN:   -stop-after=loop-data-prefetch | FileCheck %s -check-prefix=NEAR-PREFETCH
;
; Check that prefetches are not emitted when the known constant trip count of
; the loop is smaller than the estimated "iterations ahead" of the prefetch.
;
; FAR-PREFETCH-LABEL: fun
; FAR-PREFETCH-NOT: call void @llvm.prefetch

; NEAR-PREFETCH-LABEL: fun
; NEAR-PREFETCH: call void @llvm.prefetch


define void @fun(i32* nocapture %Src, i32* nocapture readonly %Dst) {
entry:
  br label %for.body

for.cond.cleanup:                                 ; preds = %for.body
  ret void

for.body:                                         ; preds = %for.body, %entry
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next.9, %for.body ]
  %arrayidx = getelementptr inbounds i32, i32* %Dst, i64 %indvars.iv
  %0 = load i32, i32* %arrayidx, align 4
  %arrayidx2 = getelementptr inbounds i32, i32* %Src, i64 %indvars.iv
  store i32 %0, i32* %arrayidx2, align 4
  %indvars.iv.next.9 = add nuw nsw i64 %indvars.iv, 1600
  %cmp.9 = icmp ult i64 %indvars.iv.next.9, 11200
  br i1 %cmp.9, label %for.body, label %for.cond.cleanup
}
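
To see why the two RUN lines are expected to differ, it can help to count the iterations of the test loop by hand. The sketch below simulates the loop in @fun: the induction variable starts at 0, steps by 1600, and the loop repeats while the stepped value is below 11200. The helper names are hypothetical. The loop size of 10 mentioned afterwards is an assumption for illustration only, since the real value comes from the target's cost model and is not shown on this page.

```cpp
#include <cassert>
#include <cstdint>

// Counts executions of a bottom-tested loop shaped like for.body in @fun:
// the body runs, the induction variable is advanced by Step, and the loop
// repeats while the advanced value is still below Bound (icmp ult).
// Assumes IV + Step does not wrap, matching the nuw flag in the test.
unsigned countBodyExecutions(uint64_t Start, uint64_t Step, uint64_t Bound) {
  assert(Step > 0 && "a zero step would never terminate");
  unsigned Count = 0;
  uint64_t IV = Start;
  for (;;) {
    ++Count; // one execution of the loop body
    uint64_t Next = IV + Step;
    if (Next >= Bound)
      break;
    IV = Next;
  }
  return Count;
}

// Iterations ahead as computed in the diff: distance / loop size, at least 1.
unsigned itersAheadFor(unsigned PrefetchDistance, unsigned LoopSize) {
  assert(LoopSize > 0 && "loop size must be non-zero");
  unsigned ItersAhead = PrefetchDistance / LoopSize;
  return ItersAhead ? ItersAhead : 1;
}
```

The loop body runs for the values 0, 1600, ..., 9600, so the trip count is 7. With the assumed loop size of 10, -prefetch-distance=100 yields 10 iterations ahead, which exceeds 7 and suppresses the prefetch (FAR-PREFETCH), while -prefetch-distance=50 yields 5, which does not (NEAR-PREFETCH).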