Page MenuHomePhabricator

[LoopFlatten] Add a loop-flattening pass
Needs ReviewPublic

Authored by dmgreen on Jan 22 2018, 3:45 AM.
This revision needs review, but there are no reviewers specified.



This is a simple pass that flattens nested loops.

The intention is to optimise loop nests like this, which together
access an array linearly:

for (int i = 0; i < N; ++i)
  for (int j = 0; j < M; ++j)

into one loop:

for (int i = 0; i < (N*M); ++i)

It can also flatten loops where the induction variables are not used
in the loop. This can help with codesize and runtime, especially on
simple cpus without advanced branch prediction.

This is only worth flattening if the induction variables are only used
in an expression like i*M+j. If they had any other uses, we would have
to insert a div/mod to reconstruct the original values, so this
wouldn't be profitable.

This pass was originally written by Oliver Stannard, but he had to
go do other super important things. This have been changed a little
since then. There is a patch to enable loop versioning that will

Diff Detail

Event Timeline

dmgreen created this revision.Jan 22 2018, 3:45 AM
fhahn added a subscriber: fhahn.Jan 22 2018, 3:47 AM

Hi Dave,

Do you have any performance and compile time overhead numbers?


samparker added inline comments.Jan 22 2018, 5:35 AM
114 ↗(On Diff #130861)

Why are these predicates valid and not any others?

samparker added inline comments.Jan 23 2018, 1:35 AM
422 ↗(On Diff #130861)

Couldn't you just use 'OuterLoop->isLoopInvariant(InnerLimit)?


So when this comes up, it helps to simplify loops a little. The fourinarow test in the testsuite is helped out by this, decreasing the total time at -Os on a M23 by 1.5% (at -O3 the whole loop is just unrolled either way). It also helps knock off 32bytes off the total filesize. Both these results are a little higher than I would expect, considering it's only flattening a single loop.

For an M33, something like this loop takes 880 cycles when flattened, compared to 1350 cycles from before. Both LSR and the registry allocator seem to have a better time when they don't have to deal with outer loops, along with the obvious benefits of removing an outer level of compare/branches/IV etc.

for (int i=0; i<N; i++)
  for (int j=0; j<N; j++)
    V[i*N+j] += val;

I don't have compile time numbers yet, but I don't expect this pass to be very heavy with the checks we are doing. I'll look into it to be sure.

When it comes up, it helps a little. Not much, but hopefully not an expensive pass to do so :)

114 ↗(On Diff #130861)

My understanding is that indvars simplify will canonicalise this check to NE. I'll make sure this is actually correct and update things if I can get other forms to come up.

422 ↗(On Diff #130861)

Nice. Like it.

dmgreen edited the summary of this revision. (Show Details)Jan 24 2018, 9:35 AM
dmgreen updated this revision to Diff 131970.Jan 30 2018, 8:12 AM

Added some isLoopInvariant's and updated the checks for compare predicates a little, as per review comments.

bkramer added a subscriber: bkramer.Feb 7 2020, 4:56 AM

Was this ever committed? Loop flattening looks like something that's useful to have.

Was this ever committed?


Loop flattening looks like something that's useful to have.


dmgreen marked an inline comment as done.Feb 7 2020, 6:33 AM

Nope. Reviews welcome :)

I guess this is super old and could do with a rebase. I'll try and give that a go.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2020, 7:21 AM
samparker added inline comments.Feb 10 2020, 2:11 AM

How are the limits decided for the number of users? My counting seems to be off-by-one for both the compare and the increment.


Doesn't the pattern matching make this check redundant?


Do we check for LCSSA? Or is this covered by it being a 'simple' loop?


reduce nesting here?


Is inbounds not enough? Also, why do we return on the first valid gep, don't we have to check all the users?

dmgreen marked 5 inline comments as done.Sun, May 3, 1:32 AM

Sorry for the delay here. I asked for review when I got some suddenly realized it had been years since I actually looked at this (and didn't write it to begin with!) And I apparently didn't have the time right now to look into the details again.

On a conceptual level, it would also be good if it wasn't just me and "the person sat opposite me" (current situation withstanding) that looked at this. I believe large(ish) target independent changes like this should preferably get looked at by more than just one group.

If there is anyone else out there that is interested in it, please feel free to take a look or take it off my hands :)


Increment will be used by Phi and Cmp, Cmp will be used by Br I think.


You are likely correct, but safer to keep it around?


This being a LoopPass implies LCSSA form.


Sounds good.


I believe the logic is that if we find a GEP that is inbounds, from that we can prove that we don't overflow.

hintonda removed a subscriber: hintonda.Mon, May 4, 12:34 PM