Compare the number of iterations ahead against the loop's constant trip count, and do not emit any prefetches if they would address memory that is not accessed in the loop.
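The check in the summary can be modelled in plain C++. This is a minimal, self-contained sketch under stated assumptions, not the patch itself: the function name shouldEmitPrefetches and its parameters are hypothetical. A trip count of 0 stands for "unknown", mirroring how ScalarEvolution's getSmallConstantTripCount() and getSmallConstantMaxTripCount() report a count they cannot compute. A prefetch issued in iteration i targets the address used by iteration i + ItersAhead. If the loop runs at most TripCount iterations, even the earliest prefetch lands past the last iteration once ItersAhead >= TripCount.

```cpp
// Hypothetical model of the LoopDataPrefetch decision described in the
// summary (not LLVM code). TripCount == 0 means "unknown or not a small
// constant".
bool shouldEmitPrefetches(unsigned PrefetchDistance, unsigned LoopSize,
                          unsigned MaxItersAhead, unsigned TripCount) {
  // How many iterations ahead a prefetch must be issued to cover the
  // prefetch distance; always at least one iteration.
  unsigned ItersAhead = LoopSize ? PrefetchDistance / LoopSize : 0;
  if (ItersAhead == 0)
    ItersAhead = 1;

  // Too far ahead to be worth prefetching at all.
  if (ItersAhead > MaxItersAhead)
    return false;

  // The loop runs iterations 0 .. TripCount-1. The prefetch issued in
  // iteration 0 targets iteration ItersAhead, which lies outside the loop
  // once ItersAhead >= TripCount, so every prefetch would address memory
  // the loop never accesses.
  if (TripCount != 0 && ItersAhead >= TripCount)
    return false;

  return true;
}
```

For example, with a prefetch distance of 1000 and a loop size of 100 (ItersAhead = 10), the sketch still prefetches in a loop known to run 64 iterations or of unknown length, but not in one known to run 10 or fewer.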
I noticed that the output of Loop Strength Reduction (LSR) differs with this simple patch, even when LoopDataPrefetch does not emit any prefetches, and the diff includes actual changes to instructions and opcodes.
It seems that the call to SE->getSmallConstantTripCount(L) changes SE's data structures, so that when LSR runs later it outputs different code in the loop preheader:
master <> patched
< %xtraiter144 = and i64 %1, 3
17a17,19
> %2 = trunc i64 %1 to i8
> %3 = trunc i8 %2 to i2
> %4 = zext i2 %3 to i64
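For reference, the preheader lines shown in that diff compute the same value: the low two bits of %1. The sketch below models each IR step in C++. The function names are hypothetical, and since C++ has no 2-bit integer type, the trunc to i2 is modelled by masking the 8-bit value to its low two bits.

```cpp
#include <cstdint>

// master: %xtraiter144 = and i64 %1, 3
uint64_t xtraiterMaster(uint64_t X) { return X & 3; }

// patched: trunc i64 -> i8, trunc i8 -> i2, zext i2 -> i64.
uint64_t xtraiterPatched(uint64_t X) {
  uint8_t T8 = static_cast<uint8_t>(X);          // %2 = trunc i64 %1 to i8
  uint8_t T2 = static_cast<uint8_t>(T8 & 0x3u);  // %3 = trunc i8 %2 to i2
  return static_cast<uint64_t>(T2);              // %4 = zext i2 %3 to i64
}
```

So at least for these lines, the difference lies in the form of the code LSR emits rather than in the value it computes.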
I am not sure exactly why, or what should be done about it. However, if I remove AU.addPreserved<ScalarEvolutionWrapperPass>() from LoopDataPrefetch, the problem disappears.
Filed a bug report against ScalarEvolution for this, since the issue also occurs without this particular patch: https://bugs.llvm.org/show_bug.cgi?id=43545
Use getSmallConstantMaxTripCount() instead of getSmallConstantTripCount() to catch a few more cases.
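Why the upper bound catches more cases can be shown with a small model. This is a hypothetical sketch: the TripCounts struct and prefetchIsUseless() are illustrative stand-ins, not LLVM code, and 0 again means "unknown", as with the two ScalarEvolution queries. The loop never runs more than its maximum trip count, so testing ItersAhead against that bound is still safe. It also covers loops whose exact trip count is unknown but bounded.

```cpp
// Hypothetical stand-in for the results of the two ScalarEvolution
// queries; 0 means "unknown or not a small constant".
struct TripCounts {
  unsigned Exact; // as from getSmallConstantTripCount()
  unsigned Max;   // as from getSmallConstantMaxTripCount()
};

// True if prefetching ItersAhead iterations ahead can be ruled out as
// useless, given a trip count where 0 means "unknown".
bool prefetchIsUseless(unsigned ItersAhead, unsigned TripCount) {
  return TripCount != 0 && ItersAhead >= TripCount;
}
```

For a loop with an unknown exact count but at most 8 iterations, TripCounts{0, 8}, a prefetch 10 iterations ahead is rejected only by the check against Max.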
As discussed above, the call to SE->getSmallConstantTripCount(L) appears to change SE's data structures, which affects later passes such as LSR. Should this stop us from committing this patch? If the call causes SE to update itself, then LSR would presumably make better decisions, wouldn't it?
(On SPEC 2006, 8 files change with getSmallConstantTripCount(), and 2 more (10 in total) with getSmallConstantMaxTripCount(). This is from just making the call, without changing anything else.)
With this patch I see 15 fewer prefetch instructions emitted on SPEC 2006 / SystemZ.
We could also check whether LoopConstantTripCount == 1 and return early in that case, but I'm not sure that is useful. It might help avoid the problem encountered at https://bugs.llvm.org/show_bug.cgi?id=43679.
Hopefully someone more familiar with LoopDataPrefetch can review, but this looks reasonable.