This is an archive of the discontinued LLVM Phabricator instance.

[LSR] Don't try and create post-inc expressions on non-rotated loops
ClosedPublic

Authored by jmolloy on Jun 8 2016, 6:30 AM.

Details

Summary

If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs!

Motivating testcase:

void f(float *a, float *b, float *c, int n) {
  while (n-- > 0)
    *c++ = *a++ + *b++;
}

It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage.

Brendan, this change makes the Hexagon test "hwloop-crit-edge.ll" fail. To me it looks like the HexagonLoopOptimizer is perhaps a little too rigid in what it expects? I think the change in output of LSR is good, as it reduces register pressure and should be producing one less COPY. Could you please take a look and see what you think?

Diff Detail

Repository
rL LLVM

Event Timeline

jmolloy updated this revision to Diff 60022.Jun 8 2016, 6:30 AM
jmolloy retitled this revision from to [LSR] Don't try and create post-inc expressions on non-rotated loops.
jmolloy updated this object.
jmolloy added reviewers: atrick, bcahoon, dexonsmith.
jmolloy set the repository for this revision to rL LLVM.
jmolloy added a subscriber: mcrosier.
junbuml added a subscriber: junbuml.Jun 8 2016, 6:32 AM
bcahoon edited edge metadata.Jun 8 2016, 9:32 AM

Hi James - I think your patch is reasonable. You're correct about our hardware loop pass. We would need to improve it to catch this case. I'm not sure how easy/hard it is to do that without a little more investigation. Until we do that, I'm fine with either xfailing or removing the test (or some other suggestion?).

junbuml added inline comments.Jun 8 2016, 10:42 AM
test/Transforms/LoopStrengthReduce/post-inc-optsize.ll
7

For me the comment is little confusing. I think the post-inc transformation was avoided only in "%n.addr.0" so that IV increments stay in latch block instead of being moved up before icmp.

12

Don't you think you also need to check that icmp is not in the post-inc form, so pre-inc form in while.cond?

jmolloy updated this revision to Diff 60162.Jun 9 2016, 5:15 AM
jmolloy edited edge metadata.

Thanks Jun and Brendon.

I've XFAILed the test and I've updated the comments and checks according to Jun's review.

Adding another slew of reviewers.

junbuml added inline comments.Jun 13 2016, 9:40 AM
lib/Transforms/Scalar/LoopStrengthReduce.cpp
2084

Did you want to specifically handle this for a loop with only one exiting block? Why not doing this within the if statement below in line 2130 ?

test/Transforms/LoopStrengthReduce/post-inc-optsize.ll
7

I think this test checks if IV increments (incdec.ptr, incdec.ptr1, and incdec.ptr2) are not moved to the exiting block(while.cond) because this patch avoid transforming %cmp to the post-inc form that update IVIncInsertPos to %cmp.

jmolloy updated this revision to Diff 64131.Jul 15 2016, 5:29 AM

Hi Jun,

Thanks very much for the review! I've addressed your comments, I think.

With regards multiple exiting blocks, you're right. I hadn't considered that.

junbuml edited edge metadata.Jul 15 2016, 11:44 AM

This looks good to me, but I want to let Andrew take the final review on it.

Hi Jun,

Thanks! Andy hasn't been very active recently - is there anyone else who you feel could give the final review on this?

Cheers,

James

qcolombet edited edge metadata.Jul 22 2016, 4:02 PM

Hi James,

The patch looks sensible, but why do we have to XFAIL the hexagon test?

BTW, I like the effort you put in the comments to explain the heuristic!

Couple of comments inlined.

Cheers,
-Quentin

test/CodeGen/Hexagon/hwloop-crit-edge.ll
2

That does not seem right.

test/Transforms/LoopStrengthReduce/post-inc-optsize.ll
2

Although you have check lines, you do not have a call to FileCheck :).

33

opt -instnamer first, please.

jmolloy updated this revision to Diff 65734.EditedJul 27 2016, 7:07 AM
jmolloy edited edge metadata.

Hi Quentin,

Thanks for finding the test problems! With regards the XFAIL, it's due to a deficiency in Hexagon's hardware loops pass. I asked a Hexagon dev earlier up the comments list on this patch about it and he said:

Hi James - I think your patch is reasonable. You're correct about our hardware loop pass. We would need to improve it to catch this case. I'm not sure how easy/hard it is to do that without a little more investigation. Until we do that, I'm fine with either xfailing or removing the test (or some other suggestion?).

Cheers,

James

Ping! Quentin, are you happy with the rationale for XFAILing the test?

qcolombet accepted this revision.Aug 8 2016, 10:32 AM
qcolombet edited edge metadata.

Hi James,

Thanks for your patience.

LGTM.

Cheers,
-Quentin

This revision is now accepted and ready to land.Aug 8 2016, 10:32 AM
jmolloy closed this revision.Aug 15 2016, 1:04 AM

Thanks! r278658