This is an archive of the discontinued LLVM Phabricator instance.

[LV] Relax high loop trip count threshold for deciding to interleave the loop
Draft · Public

Authored by nilanjana_basu on Sep 25 2023, 6:30 PM.
This is a draft revision that has not yet been submitted for review.

Details

Reviewers
None
Summary

A set of microbenchmarks in llvm-test-suite (https://github.com/llvm/llvm-test-suite/pull/26), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial once the interleaved part of the loop runs at least twice. Performance improves as the trip count increases and is best for trip counts that run only the vectorized and/or interleaved loop, with no scalar remainder. For example, if VW is the vectorization width and IC is the interleaving count, loops with trip count TC > VW*IC*2 perform better with interleaving, and performance is best when TC is a multiple of VW*IC.
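The arithmetic behind this rule of thumb can be sketched in a few lines of C++. This is only an illustration of the relationship between TC, VW and IC described above. The names `TripSplit`, `splitTripCount`, `interleavingLikelyProfitable` and `hasNoScalarRemainder` are hypothetical and do not come from the LoopVectorizer or from this patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not part of LLVM): how a trip count divides
// between the vectorized/interleaved body and the scalar remainder.
struct TripSplit {
  uint64_t VectorBodyRuns;  // times the interleaved vector body executes
  uint64_t ScalarRemainder; // iterations left over for the scalar loop
};

// Each pass through the interleaved vector body consumes VW * IC
// iterations of the original loop.
TripSplit splitTripCount(uint64_t TC, uint64_t VW, uint64_t IC) {
  uint64_t Step = VW * IC;
  assert(Step != 0 && "VW and IC must be non-zero");
  return {TC / Step, TC % Step};
}

// The condition stated in the summary: TC > VW*IC*2.
bool interleavingLikelyProfitable(uint64_t TC, uint64_t VW, uint64_t IC) {
  return TC > VW * IC * 2;
}

// Best case per the summary: TC is a multiple of VW*IC, so no iterations
// fall through to the scalar remainder.
bool hasNoScalarRemainder(uint64_t TC, uint64_t VW, uint64_t IC) {
  return splitTripCount(TC, VW, IC).ScalarRemainder == 0;
}
```

Taking VW = 4 and IC = 2 as an example, a trip count of 24 runs the interleaved body three times with no remainder, while a trip count of 16 runs it exactly twice and does not satisfy the strict inequality above.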

The current trip count threshold to allow loop interleaving is 128, which seems arbitrarily high and uncorrelated with factors like VW, IC, and register pressure.

We have also found an example in an application benchmark compiled with PGO, where a hot loop with a trip count of less than 24 shows a 40% regression because it was not interleaved: the profile-driven trip count was below 128. It therefore seems reasonable to remove this threshold and instead use the trip count when computing the interleaving count.
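One way a trip-count-driven interleave count could be chosen is sketched below. It is an assumption-laden illustration, not the logic implemented in this revision: the function name `pickInterleaveCount`, the power-of-two search, and the `MaxIC` parameter are all hypothetical, and the only condition taken from the summary is TC > VW*IC*2.

```cpp
#include <cstdint>

// Hypothetical sketch (not the patch's implementation): pick the largest
// power-of-two interleave count up to MaxIC that still satisfies
// TC > VW * IC * 2. Returns 1 when no such count exists, meaning the loop
// is left uninterleaved. Assumes MaxIC is small enough that doubling the
// candidate cannot overflow.
unsigned pickInterleaveCount(uint64_t TC, uint64_t VW, unsigned MaxIC) {
  unsigned IC = 1;
  for (unsigned Candidate = 2; Candidate <= MaxIC; Candidate *= 2) {
    if (TC > VW * Candidate * 2)
      IC = Candidate; // the interleaved body still runs often enough
    else
      break;          // larger counts would only make the condition fail
  }
  return IC;
}
```

Under this sketch, a short hot loop such as TC = 24 with VW = 4 and MaxIC = 4 gets an interleave count of 2, whereas a fixed threshold of 128 would reject interleaving outright.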

Event Timeline

nilanjana_basu created this revision. · Sep 25 2023, 6:30 PM
Herald added a project: Restricted Project. · Sep 25 2023, 6:30 PM

Updated all unit tests that failed because of this change.

nilanjana_basu edited the summary of this revision. · Sep 27 2023, 9:38 AM
nilanjana_basu edited the summary of this revision. · Sep 27 2023, 1:39 PM

Updated the commit message.

Removed the -tiny-trip-count-interleave-threshold option, which will no longer be used, and updated interleave_short_tc.ll accordingly.

Removed tiny-trip-count-interleave-threshold from zero_unroll.ll

Updated test case comments