[LoopDataPrefetch + SystemZ] Let target decide on prefetching on a per loop basis
Needs Review · Public

Authored by jonpa on Nov 14 2019, 3:14 AM.

Details

Summary

This patch adds

  • New arguments to getMinPrefetchStride(), so that the target can decide on a per-loop basis whether software prefetching should be done even when the stride is within the limit of the hardware prefetcher.
  • A new TTI hook, enableWritePrefetching(), that lets a target do write prefetching by default (the hook itself defaults to false).
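
As a rough illustration of the two hooks, here is a minimal standalone sketch. It does not use the real TargetTransformInfo interface: `LoopPrefetchInfo`, `TargetHooks`, `SelectiveTarget`, `shouldPrefetch`, the default threshold of 1, and the numbers in the heuristic are all hypothetical stand-ins. Only the kinds of information passed in (call presence, access counts, strided count, prefetch count) come from the summary.

```cpp
// Hypothetical stand-ins; this is NOT the real LLVM TTI interface.
struct LoopPrefetchInfo {
  unsigned NumMemAccesses = 0;        // all memory accesses seen in the loop
  unsigned NumStridedMemAccesses = 0; // accesses with a constant stride
  unsigned NumPrefetches = 0;         // prefetches needed to cover them
  bool HasCall = false;               // loop body contains a call
};

struct TargetHooks {
  virtual ~TargetHooks() = default;

  // Illustrative default: ignore the per-loop information and return a
  // fixed threshold (the value 1 is an assumption for this sketch).
  virtual unsigned getMinPrefetchStride(const LoopPrefetchInfo &) const {
    return 1;
  }

  // Write prefetching is off unless a target opts in.
  virtual bool enableWritePrefetching() const { return false; }
};

// A made-up target that prefetches selectively, depending on the loop.
struct SelectiveTarget : TargetHooks {
  unsigned getMinPrefetchStride(const LoopPrefetchInfo &I) const override {
    // Invented heuristic: small, call-free, fully strided loops get
    // software prefetches for any stride.
    if (!I.HasCall && I.NumPrefetches <= 4 &&
        I.NumStridedMemAccesses == I.NumMemAccesses)
      return 1;
    // Otherwise leave short strides to the hardware prefetcher.
    return 2048;
  }

  bool enableWritePrefetching() const override { return true; }
};

// The pass-side check: prefetch only if the stride reaches the minimum
// the target asked for in this particular loop.
bool shouldPrefetch(const TargetHooks &T, const LoopPrefetchInfo &I,
                    unsigned AbsStride) {
  return AbsStride >= T.getMinPrefetchStride(I);
}
```

With this shape, a target that keeps the default hook sees no behavior change, while a target that overrides it can answer differently for each loop.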

LoopDataPrefetch:

  • A search through the whole loop gathers information before any prefetches are emitted. This way the target can get the information via the new arguments to getMinPrefetchStride() and emit prefetches more selectively. The collected information includes: whether the loop has a call, how many memory accesses it has, how many of them are strided, and how many prefetches will cover them. This is NFC as long as the target does not change its definition of getMinPrefetchStride().
  • If a previous access to the exact same address was a 'read' and the current one is a 'write', make it a 'write' prefetch.
  • If two accesses that are covered by the same prefetch do not dominate each other, put the prefetch in a block that dominates both of them.
  • If the ConstantMaxTripCount is less than ItersAhead, skip the loop.
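
The read-to-write upgrade and the trip-count check could be sketched roughly as follows. This is a standalone illustration and not the pass code: `PlannedPrefetch`, `PrefetchPlan`, `recordAccess`, and `skipLoop` are hypothetical names, the address is reduced to an opaque integer id, and an unknown trip count is modeled as an empty `std::optional`, which is an assumption of this sketch.

```cpp
#include <cstdint>
#include <map>
#include <optional>

// One planned prefetch per distinct address (hypothetical structure).
struct PlannedPrefetch {
  bool IsWrite = false;
};

// Keyed by an opaque id standing in for "the same exact address".
using PrefetchPlan = std::map<std::uint64_t, PlannedPrefetch>;

// Record an access while scanning the loop. A later write to an address
// that was first seen as a read upgrades the planned prefetch to a write
// prefetch. A later read never downgrades it (a choice of this sketch).
void recordAccess(PrefetchPlan &Plan, std::uint64_t AddrId, bool IsWrite) {
  auto [It, Inserted] = Plan.try_emplace(AddrId, PlannedPrefetch{IsWrite});
  if (!Inserted && IsWrite)
    It->second.IsWrite = true;
}

// Skip the loop when its known constant maximum trip count is smaller
// than the number of iterations the prefetch would run ahead.
bool skipLoop(std::optional<unsigned> ConstantMaxTripCount,
              unsigned ItersAhead) {
  return ConstantMaxTripCount && *ConstantMaxTripCount < ItersAhead;
}
```

Because all accesses are recorded before anything is emitted, the read/write decision for an address is final by the time the prefetch is created.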

SystemZ:

  • Increase the prefetch distance (so that the hot lbm loop is prefetched 9 iterations ahead).
  • Enable write prefetching by default.
  • Emit sw prefetching for any stride according to new heuristics in getMinPrefetchStride(), which covers lbm.
  • Open question: do we need a test for getMinPrefetchStride()?

Event Timeline

jonpa created this revision. Nov 14 2019, 3:14 AM
Herald added a project: Restricted Project. Nov 14 2019, 3:14 AM
jonpa marked 2 inline comments as done. Nov 14 2019, 3:18 AM
jonpa added inline comments.
llvm/lib/Transforms/Scalar/LoopDataPrefetch.cpp
232

The MemI member of the Prefetch struct is used solely for debug/ORE output. It is currently not guarded by '#ifndef NDEBUG', but maybe it should be, even though it's just one pointer? The debug output could perhaps also be improved; I merely tried to keep what was there while also printing the newly gathered values. MemI is not needed for anything else, so if we changed the debug output instead, perhaps the member would not be needed at all?

250

I am trusting DT->findNearestCommonDominator() and DomBB->getTerminator() to do this, but I am not 100% sure that there is always a terminator in each block...?
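
For reference, the placement idea can be shown on a toy dominator tree. This is only a sketch of what a nearest-common-dominator query computes; `ToyDomTree` and its members are hypothetical and have nothing to do with the actual DominatorTree implementation. Blocks are plain indices, and the entry block is assumed to be its own immediate dominator.

```cpp
#include <vector>

// Toy dominator tree: IDom[B] is the immediate dominator of block B.
// The entry block (index 0) is its own parent (assumption of this sketch).
struct ToyDomTree {
  std::vector<int> IDom;

  // Number of immediate-dominator steps from B up to the entry block.
  int depth(int B) const {
    int D = 0;
    while (B != IDom[B]) {
      B = IDom[B];
      ++D;
    }
    return D;
  }

  // Walk the deeper block up to the other block's depth, then walk both
  // up in lockstep until they meet. The meeting point dominates both.
  int nearestCommonDominator(int A, int B) const {
    int DA = depth(A), DB = depth(B);
    while (DA > DB) {
      A = IDom[A];
      --DA;
    }
    while (DB > DA) {
      B = IDom[B];
      --DB;
    }
    while (A != B) {
      A = IDom[A];
      B = IDom[B];
    }
    return A;
  }
};
```

For a diamond where block 1 branches to blocks 2 and 3, neither arm dominates the other, so the query for 2 and 3 yields block 1, which is where a shared prefetch would go. On the terminator question: in IR that passes the verifier every basic block ends in a terminator, and BasicBlock::getTerminator() returns null only for a block that is not well formed, so a null check or an assert would make the assumption explicit.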