Irregular streams typically consist of symbolic strided accesses.
For example:
  struct MyStruct {
    int field;
    char kk[60];
  } *my_struct;

  int f(struct MyStruct *p, struct MyStruct *q, int N) {
    int total = 0;
    struct MyStruct *r = p;
    for (int i = 0; i < N/300; ++i)
      for (r = p + i; r < q; r += N)
        total += r->field;
    return total;
  }
This software prefetching scheme looks for such irregular symbolic strides and prefetches
'PrefetchDegree' cache lines ahead of the address being loaded. The patch adds a TTI interface,
'getPrefetchDegree', to query this target parameter.
This is currently enabled for Kryo only. A target would have to provide this information
to opt in to prefetch 'PrefetchDegree' cache lines ahead.
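As a rough source-level illustration of the intended effect (not the pass itself, which works on IR), the loop from the example could behave as if it had been written as below. This is a minimal sketch under stated assumptions: CACHE_LINE_BYTES = 64 and PREFETCH_DEGREE = 1 are hypothetical values chosen for illustration, f_with_prefetch is a made-up name, and the GCC/Clang builtin __builtin_prefetch stands in for whatever prefetch instruction the target would emit.

```c
#include <stdint.h>

/* Hypothetical values for illustration only; the real ones are target-specific. */
#define CACHE_LINE_BYTES 64u
#define PREFETCH_DEGREE 1u

struct MyStruct {
  int field;
  char kk[60];
};

/* Same loop nest as f() in the example, with an explicit prefetch that is
 * PREFETCH_DEGREE cache lines ahead of the address being loaded. */
int f_with_prefetch(struct MyStruct *p, struct MyStruct *q, int N) {
  int total = 0;
  for (int i = 0; i < N / 300; ++i) {
    for (struct MyStruct *r = p + i; r < q; r += N) {
      /* Compute the target as an integer so that no out-of-bounds pointer is
       * formed; a prefetch is only a hint and does not fault. */
      uintptr_t ahead =
          (uintptr_t)&r->field + PREFETCH_DEGREE * CACHE_LINE_BYTES;
      __builtin_prefetch((const void *)ahead, /*rw=*/0, /*locality=*/3);
      total += r->field;
    }
  }
  return total;
}
```

The prefetch does not change the result; it only hints that the line after the current load may be needed soon, which is the behavior the PrefetchDegree parameter is described as controlling.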
This improves performance of these Spec200x benchmarks on Kryo:
| Benchmark | Diff (%) |
| spec2006/povray:ref | +1.738% |
| spec2006/gcc:ref | +1.749% |
| spec2006/mcf:ref | +7.936% |
| spec2000/gap:ref | +16.51% |
So on one hand you are saying (MinPrefetchStride = 1024) that we shouldn't bother prefetching unless the stride is at least 1K, but on the other hand (PrefetchDegree = 1) that you want to prefetch the very next cache line any time the stride is not known at compile time.
I feel that there is a contradiction here. The former suggests that you have a pretty powerful HW prefetcher; the latter suggests that you don't and are willing to speculate aggressively to compensate for it.
It seems that something is wrong with the model here.