This is an archive of the discontinued LLVM Phabricator instance.

[LoopInterchange] Change cost function to use bytes in cache line.
Abandoned · Public

Authored by fhahn on Jul 10 2017, 10:30 AM.

Details

Summary

This patch updates the cost function to use the number of bytes that fit
into cache lines to evaluate whether it is beneficial to interchange loops.
Loops are interchanged if more bytes fit into cache lines after swapping
their order.
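
As a rough illustration of the idea (not the code from this patch), the comparison can be sketched as follows. The struct `AccessStrides`, the functions `usefulBytesPerLine` and `shouldInterchange`, and the treatment of loop-invariant accesses as the best case are all assumptions made for this sketch; the patch itself derives its information from SCEV expressions inside LLVM.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical model of one load/store in a two-deep loop nest: how many
// bytes the address advances per iteration of each loop. Not an LLVM type.
struct AccessStrides {
  int64_t outerStrideBytes; // bytes advanced per outer-loop iteration
  int64_t innerStrideBytes; // bytes advanced per inner-loop iteration
};

// Estimate how many bytes of one cache line are used by consecutive
// iterations of a loop with the given stride (assumed metric).
int64_t usefulBytesPerLine(int64_t strideBytes, int64_t accessBytes,
                           int64_t lineBytes) {
  int64_t s = std::abs(strideBytes);
  if (s == 0)
    return lineBytes; // invariant address: treated as best case (assumption)
  if (s >= lineBytes)
    return std::min(accessBytes, lineBytes); // one access per line
  return std::min(lineBytes, (lineBytes / s) * accessBytes);
}

// Interchange is considered profitable if the swapped order uses more
// bytes per cache line, summed over all accesses.
bool shouldInterchange(const std::vector<AccessStrides> &accesses,
                       int64_t accessBytes, int64_t lineBytes) {
  int64_t current = 0, swapped = 0;
  for (const AccessStrides &a : accesses) {
    current += usefulBytesPerLine(a.innerStrideBytes, accessBytes, lineBytes);
    swapped += usefulBytesPerLine(a.outerStrideBytes, accessBytes, lineBytes);
  }
  return swapped > current;
}
```

With assumed 4-byte elements and 64-byte cache lines, an access whose inner-loop stride is 100 bytes uses only 4 bytes of each line it touches, while a 4-byte stride uses the whole line, so this sketch would favor whichever order makes the small stride the inner one.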

By operating on the pointer operand of load and store instructions and
using only SCEV, we can also support accesses where the pointer
calculation is more complex, e.g. for loops like

for (int i = 0; i < n; i++)
  for (int j = 1; j < 25; j++)
    A[j*25+i] = B[n+i];
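
To make the effect on this loop concrete, here is a small standalone sketch of both loop orders. The `int` element type, the array sizes, and the function names `runOriginal` and `runInterchanged` are assumptions for illustration only. In the original order the inner `j` loop steps through `A` with a stride of 25 elements, while in the interchanged order the inner `i` loop walks both `A` and `B` with a stride of one element. The sketch assumes `n <= 25`, so that no two iterations write the same element of `A` and the two orders produce the same result.

```cpp
#include <vector>

// Original order: i outer, j inner. A is accessed with a stride of
// 25 elements in the inner loop; B[n + i] is invariant in the inner loop.
// Requires n <= 25 and B.size() >= 2 * n (assumptions of this sketch).
std::vector<int> runOriginal(const std::vector<int> &B, int n) {
  std::vector<int> A(25 * 25, 0);
  for (int i = 0; i < n; i++)
    for (int j = 1; j < 25; j++)
      A[j * 25 + i] = B[n + i];
  return A;
}

// Interchanged order: j outer, i inner. Both A and B are now accessed
// with a stride of one element in the inner loop.
std::vector<int> runInterchanged(const std::vector<int> &B, int n) {
  std::vector<int> A(25 * 25, 0);
  for (int j = 1; j < 25; j++)
    for (int i = 0; i < n; i++)
      A[j * 25 + i] = B[n + i];
  return A;
}
```

For `n > 25` the index ranges of different `j` rows overlap, so the final contents of `A` would depend on the loop order; a real transformation has to rule that out through dependence analysis before applying any cost function.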

This cost function gives a speedup on a few benchmarks (and no noticeable
regressions) across a wide range of benchmarks from the test-suite, SPEC2000,
SPEC2006 and a range of proprietary suites on AArch64.

Event Timeline

fhahn created this revision. Jul 10 2017, 10:30 AM
fhahn updated this revision to Diff 107832. Jul 23 2017, 8:24 AM
fhahn added reviewers: hfinkel, dberlin.

Fix formatting, rebased.

I think GCC uses/used a similar stride-based heuristic to check profitability.

Ping. Any thoughts?

With the new cost function, 20 more loops are interchanged in SPEC2006. I manually checked the interchanged loops, and all of them are expected to be interchanged due to the change. They don't look like hot loops though, and there is no performance change either way in SPEC2006:

/benchspec/CPU2006/464.h264ref/src/decoder.c:131:7:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/decoder.c:417:5:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/decoder.c:51:5:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/image.c:1838:5:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/image.c:1838:5:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/macroblock.c:527:7:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/macroblock.c:527:7:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/rdopt.c:1283:9:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/rdopt.c:1324:9:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/rdopt.c:1335:5:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/rdopt.c:1347:7:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/rdopt.c:1488:9:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/rdopt.c:1499:9:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/rdopt.c:1509:7:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/rdopt.c:1521:9:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/rdopt.c:2078:3:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/rdopt.c:5013:5:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/rdopt.c:5021:7:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/transform8x8.c:593:5:
      remark: Loop interchanged with enclosing loop.
/benchspec/CPU2006/464.h264ref/src/transform8x8.c:593:5:
      remark: Loop interchanged with enclosing loop.

A few noticeable speedups with this cost function are in the test-suite and proprietary suites on AArch64. I suspect that more benefits will become apparent once some of the limitations of the LoopInterchange pass have been removed.

fhahn planned changes to this revision. May 4 2018, 10:26 AM

I'll be picking this up again soonish.

fhahn abandoned this revision. Jul 1 2022, 7:23 AM
Herald added a project: Restricted Project. Jul 1 2022, 7:23 AM