This is an archive of the discontinued LLVM Phabricator instance.

[LoopDataPrefetch] Don't prefetch past a known total trip count
Abandoned, Public

Authored by jonpa on Oct 1 2019, 9:39 AM.

Details

Summary

Compare the number of iterations to prefetch ahead against a constant trip count, and do not emit any prefetches when they would address memory that the loop never accesses.
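The check described in the summary can be sketched as a standalone model. This is not the code from the patch: the helper names `itersAhead` and `shouldSkipPrefetch` are hypothetical, and the sketch assumes that the number of iterations ahead is the prefetch distance divided by the loop size (at least 1), and that a trip count of 0 means "unknown", as with `getSmallConstantTripCount()`.

```cpp
// Hypothetical standalone model of the decision; it does not use LLVM.

// Number of iterations to prefetch ahead, assuming it is derived from a
// target prefetch distance and an estimated loop size (both in instructions).
inline unsigned itersAhead(unsigned prefetchDistance, unsigned loopSize) {
  if (loopSize == 0)
    return 1; // avoid dividing by zero; prefetch one iteration ahead
  unsigned n = prefetchDistance / loopSize;
  return n == 0 ? 1 : n;
}

// A prefetch issued in iteration i targets iteration i + ahead. With a known
// trip count TC the last iteration is TC - 1, so even the first prefetch
// (i = 0) lands outside the loop when TC <= ahead.
// constantTripCount == 0 is treated as "unknown": never skip in that case.
inline bool shouldSkipPrefetch(unsigned constantTripCount, unsigned ahead) {
  return constantTripCount != 0 && constantTripCount <= ahead;
}
```

Under these assumptions, a loop known to run 4 times with 10 iterations of look-ahead would get no prefetches, while a loop with an unknown trip count is handled as before.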

Event Timeline

jonpa created this revision. Oct 1 2019, 9:39 AM
jonpa added a comment. Oct 3 2019, 2:27 AM

I noticed that the output of Loop Strength Reduce (LSR) differs with this simple patch. The diff includes actual instructions and opcodes, and it shows up even when LoopDataPrefetch does not emit any prefetches.

It seems that the call SE->getSmallConstantTripCount(L) changes data structures so that when LSR is later run it outputs different code in the preheader of the loop:

master <> patched
<     %xtraiter144 = and i64 %1, 3
17a17,19
>     %2 = trunc i64 %1 to i8
>     %3 = trunc i8 %2 to i2
>     %4 = zext i2 %3 to i64

I am not sure exactly why or what should be done. However, if I remove the AU.addPreserved<ScalarEvolutionWrapperPass>(); from LoopDataPrefetch, then this problem disappears.

jonpa added a comment. Oct 3 2019, 4:01 AM

Filed a bug report against ScalarEvolution for this, since the issue also occurs without this particular patch: https://bugs.llvm.org/show_bug.cgi?id=43545

jonpa updated this revision to Diff 225623. Oct 18 2019, 7:56 AM
jonpa added reviewers: efriedma, fhahn.

Use getSmallConstantMaxTripCount() instead of getSmallConstantTripCount() to catch a few more cases.
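As an illustration of the kind of case a maximum trip count can catch, consider a loop whose exact trip count depends on a runtime value but is bounded by a constant. The sketch below is a plain C++ model with hypothetical names (`kMaxIters`, `tripCount`, `prefetchWouldOvershoot`); it does not call ScalarEvolution and is only meant to show why the bound alone can be enough to rule out prefetching.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical compile-time upper bound on the number of iterations.
constexpr std::size_t kMaxIters = 8;

// Models a loop such as `for (i = 0; i < n && i < kMaxIters; ++i)`.
// The exact trip count is min(n, kMaxIters): not a constant, but never
// larger than kMaxIters.
inline std::size_t tripCount(std::size_t n) {
  return std::min(n, kMaxIters);
}

// If even the maximum trip count does not exceed the look-ahead, every
// prefetch would target an iteration that is never executed.
inline bool prefetchWouldOvershoot(std::size_t itersAhead) {
  return kMaxIters <= itersAhead;
}
```

Here an exact constant trip count is unavailable because it depends on `n`, yet the bound of 8 still shows that a look-ahead of 8 or more iterations would only reach past the end of the loop.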

As discussed before, it seems that the call to SE->getSmallConstantTripCount(L) changes data structures in a way that affects later passes like LSR. I wonder whether this has to stop us from committing this patch. If the call to getSmallConstantTripCount() causes SE to update itself, wouldn't LSR then actually make better decisions?

(On SPEC 2006, 8 files change with getSmallConstantTripCount(), and 2 more (10 in total) with getSmallConstantMaxTripCount(). This comes from just making the call, without changing anything else.)

With this patch I see 15 fewer prefetch instructions emitted on SPEC 2006 / SystemZ.

We could also check if LoopConstantTripCount == 1, and return if that's the case, but I'm not sure if that's useful. It might help avoid the problem encountered at https://bugs.llvm.org/show_bug.cgi?id=43679.

Hopefully someone more familiar with LoopDataPrefetch can review, but this looks reasonable.

jonpa abandoned this revision. Nov 14 2019, 3:22 AM

This change is included instead in https://reviews.llvm.org/D70228.