This is an archive of the discontinued LLVM Phabricator instance.

[IVUser] Limit the iterations to check whether a loop has dedicated exits for extreme large case
ClosedPublic

Authored by wmi on Sep 9 2019, 9:37 AM.

Details

Summary

We had a case that a single loop which has 4000 exits and the average number of predecessors of each exit is > 1000. It increased the compilation time of IVUser analysis significantly. This patch adds a limit for the iterations to check whether a loop has only dedicated exits. With the patch, the time to compile our testcase reduced from 1000s to 200s (clang release build).

Diff Detail

Repository
rL LLVM

Event Timeline

wmi created this revision.Sep 9 2019, 9:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 9 2019, 9:37 AM

I wonder whether the problem is in this function or in analysis itself?
If it is a problem of analysis it is better to restrict it but not an utility function.

wmi added a comment.Sep 9 2019, 9:18 PM

I wonder whether the problem is in this function or in analysis itself?
If it is a problem of analysis it is better to restrict it but not an utility function.

I feel the problem is in the utility function. Currently the problem is exposed in IVUser pass but the same can happen anywhere else if only the function is called for a huge loop with many exits and with each exit having many predecessors.

In D67359#1664242, @wmi wrote:

I wonder whether the problem is in this function or in analysis itself?
If it is a problem of analysis it is better to restrict it but not an utility function.

I feel the problem is in the utility function. Currently the problem is exposed in IVUser pass but the same can happen anywhere else if only the function is called for a huge loop with many exits and with each exit having many predecessors.

Can you do some measurements to ensure that the issue is really in this utility function but not in the user of it?

At the same time it makes sense to update function description to specify that with your change the function can conservatively returns no if it could not get the precise answer.

fhahn added a subscriber: fhahn.Sep 10 2019, 3:31 AM

Is there any chance you could provide the test case that shows the behavior? There might be something we could do in hasDedicatedExits to make it faster for large loops with a large number of exit blocks and successors.

wmi added a comment.Sep 10 2019, 10:27 AM
In D67359#1664242, @wmi wrote:

I wonder whether the problem is in this function or in analysis itself?
If it is a problem of analysis it is better to restrict it but not an utility function.

I feel the problem is in the utility function. Currently the problem is exposed in IVUser pass but the same can happen anywhere else if only the function is called for a huge loop with many exits and with each exit having many predecessors.

Can you do some measurements to ensure that the issue is really in this utility function but not in the user of it?

I did that. For the huge loop, every call of hasDedicatedExits took about 3~5 seconds. hasDedicatedExits is called by isSimplifiedLoopNest which is used by IVUser pass. Indeed there is another problem that IVUser pass calls isSimplifiedLoopNest for the same loop multiple times (~200). However, IVUser is a loop analysis pass, and isSimplifiedLoopNest can be called for a loop different from the current loop analyzed. So isSimplifiedLoopNest is called for the same huge loop many times for many different invocations of IVUser pass analyzing different loops, and the cache mechanism local to IVUser doesn't work for it.

I understand it is possible to mitigate the problem in IVUser (I don't know how to do that yet. The original commit to add isSimplifiedLoopNest was r152892 in 2012 and the original bug was in rdar://problem/11049788 which I cannot read) but it seems also reasonable to put a limit in hasDedicatedExits -- For my case, the loop with 4000 exits + each exit has 1000 predecessors took 3~5 seconds for each call of hasDedicatedExits. It is already noticeable to compiler users. If the loop is larger, it will take more time even if hasDedicatedExits is called only once for it. I think it is better to put a limit for such high cost function and that is independent to improve the cache or algorithm in IVUser.

At the same time it makes sense to update function description to specify that with your change the function can conservatively returns no if it could not get the precise answer.

That makes sense. I will do that.

wmi added a comment.Sep 11 2019, 2:57 PM

Is there any chance you could provide the test case that shows the behavior? There might be something we could do in hasDedicatedExits to make it faster for large loops with a large number of exit blocks and successors.

The loop contains a huge switch case statement. I will see if I can create something showing similar behavior.

wmi added a comment.Sep 23 2019, 4:02 PM

Sorry for taking a while to get back. Upload a testfile 1.cc:

.

~/workarea/llvm-r371746/rbuild/bin/clang -O2 -S 1.cc
Without the patch, it took 52s to compile.
With the patch, it took 2s to compile.
gcc took 9s to compile.

And for the testcase, disabling LSR didn't help much, so it is not an IVUser specific problem. It took so long mainly because of hasDedicatedExits.

Maybe you can add this test case to llvm test suite?

wmi added a comment.Sep 23 2019, 4:25 PM

Maybe you can add this test case to llvm test suite?

Do you know how to add a compile time testcase in testsuite usually?

Thanks,
Wei.

wmi added a comment.Sep 23 2019, 4:49 PM

Thanks, I can put the test there, but I am not sure how large should I increase the size of the test to so it will definitely timeout without the patch, and do I need to consider the test takes different time to compile on different machines? I remember there is a fixed compile time limit to build a file in testsuite but I don't remember how long it is. Do you happen to know that?

wmi updated this revision to Diff 221437.Sep 23 2019, 4:52 PM

Address Serguei's comment: update function description.

In D67359#1679990, @wmi wrote:

Thanks, I can put the test there, but I am not sure how large should I increase the size of the test to so it will definitely timeout without the patch, and do I need to consider the test takes different time to compile on different machines? I remember there is a fixed compile time limit to build a file in testsuite but I don't remember how long it is. Do you happen to know that?

I dont know :( but 2s is probably okay. Maybe @fhahn can you help here..

skatkov added inline comments.Sep 23 2019, 8:14 PM
include/llvm/Analysis/LoopInfoImpl.h
88

I'm sorry if I was not specific but I meant the comment in LoopInfo.h file for the method hasDedicatedExits not in LoopInfoImpl.h.

wmi updated this revision to Diff 221464.Sep 23 2019, 10:36 PM

Address Serguei's comment.

wmi marked an inline comment as done.Sep 23 2019, 10:37 PM
wmi added inline comments.
include/llvm/Analysis/LoopInfoImpl.h
88

Ah, got it, thanks. Fixed.

wmi added a comment.Sep 24 2019, 10:37 AM
In D67359#1679990, @wmi wrote:

Thanks, I can put the test there, but I am not sure how large should I increase the size of the test to so it will definitely timeout without the patch, and do I need to consider the test takes different time to compile on different machines? I remember there is a fixed compile time limit to build a file in testsuite but I don't remember how long it is. Do you happen to know that?

I dont know :( but 2s is probably okay. Maybe @fhahn can you help here..

I find the compile time limit in lnt nightly test is 500 seconds. To stably get time out without the patch, I estimate I need to increase the size of test so it takes > 30seconds to compile with the patch. I am not sure if that is ok.

The last thing. Would not you like to add a test examine the max-dedicate-exit-iterations functionality?

The test can use the small number for max-dedicate-exit-iterations and ensure that the function returns false? If it is difficult examine through opt you can probably write a unit test?
It seems it may protect you from future modification of this method.

If it is difficult or impossible to do -> lgtm.

wmi added a comment.Sep 25 2019, 8:38 AM

The last thing. Would not you like to add a test examine the max-dedicate-exit-iterations functionality?

Good point. Test added.

wmi updated this revision to Diff 221780.Sep 25 2019, 8:38 AM

Address Serguei's comment: add a unittest.

Test added.

Great.

skatkov accepted this revision.Sep 25 2019, 7:43 PM

Thanks.

This revision is now accepted and ready to land.Sep 25 2019, 7:43 PM
wmi added a comment.Sep 26 2019, 8:23 AM

Thanks all for the review!

This revision was automatically updated to reflect the committed changes.