We had a case that a single loop which has 4000 exits and the average number of predecessors of each exit is > 1000. It increased the compilation time of IVUser analysis significantly. This patch adds a limit for the iterations to check whether a loop has only dedicated exits. With the patch, the time to compile our testcase reduced from 1000s to 200s (clang release build).
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
I wonder whether the problem is in this function or in analysis itself?
If it is a problem of analysis it is better to restrict it but not an utility function.
I feel the problem is in the utility function. Currently the problem is exposed in IVUser pass but the same can happen anywhere else if only the function is called for a huge loop with many exits and with each exit having many predecessors.
Can you do some measurements to ensure that the issue is really in this utility function but not in the user of it?
At the same time it makes sense to update function description to specify that with your change the function can conservatively returns no if it could not get the precise answer.
Is there any chance you could provide the test case that shows the behavior? There might be something we could do in hasDedicatedExits to make it faster for large loops with a large number of exit blocks and successors.
I did that. For the huge loop, every call of hasDedicatedExits took about 3~5 seconds. hasDedicatedExits is called by isSimplifiedLoopNest which is used by IVUser pass. Indeed there is another problem that IVUser pass calls isSimplifiedLoopNest for the same loop multiple times (~200). However, IVUser is a loop analysis pass, and isSimplifiedLoopNest can be called for a loop different from the current loop analyzed. So isSimplifiedLoopNest is called for the same huge loop many times for many different invocations of IVUser pass analyzing different loops, and the cache mechanism local to IVUser doesn't work for it.
I understand it is possible to mitigate the problem in IVUser (I don't know how to do that yet. The original commit to add isSimplifiedLoopNest was r152892 in 2012 and the original bug was in rdar://problem/11049788 which I cannot read) but it seems also reasonable to put a limit in hasDedicatedExits -- For my case, the loop with 4000 exits + each exit has 1000 predecessors took 3~5 seconds for each call of hasDedicatedExits. It is already noticeable to compiler users. If the loop is larger, it will take more time even if hasDedicatedExits is called only once for it. I think it is better to put a limit for such high cost function and that is independent to improve the cache or algorithm in IVUser.
At the same time it makes sense to update function description to specify that with your change the function can conservatively returns no if it could not get the precise answer.
That makes sense. I will do that.
The loop contains a huge switch case statement. I will see if I can create something showing similar behavior.
Sorry for taking a while to get back. Upload a testfile 1.cc:
.~/workarea/llvm-r371746/rbuild/bin/clang -O2 -S 1.cc
Without the patch, it took 52s to compile.
With the patch, it took 2s to compile.
gcc took 9s to compile.
And for the testcase, disabling LSR didn't help much, so it is not an IVUser specific problem. It took so long mainly because of hasDedicatedExits.
Probably just put it here:
https://github.com/llvm/llvm-test-suite/tree/master/SingleSource/Regression/C
Thanks, I can put the test there, but I am not sure how large should I increase the size of the test to so it will definitely timeout without the patch, and do I need to consider the test takes different time to compile on different machines? I remember there is a fixed compile time limit to build a file in testsuite but I don't remember how long it is. Do you happen to know that?
include/llvm/Analysis/LoopInfoImpl.h | ||
---|---|---|
88 ↗ | (On Diff #221437) | I'm sorry if I was not specific but I meant the comment in LoopInfo.h file for the method hasDedicatedExits not in LoopInfoImpl.h. |
include/llvm/Analysis/LoopInfoImpl.h | ||
---|---|---|
88 ↗ | (On Diff #221437) | Ah, got it, thanks. Fixed. |
I find the compile time limit in lnt nightly test is 500 seconds. To stably get time out without the patch, I estimate I need to increase the size of test so it takes > 30seconds to compile with the patch. I am not sure if that is ok.
The last thing. Would not you like to add a test examine the max-dedicate-exit-iterations functionality?
The test can use the small number for max-dedicate-exit-iterations and ensure that the function returns false? If it is difficult examine through opt you can probably write a unit test?
It seems it may protect you from future modification of this method.
If it is difficult or impossible to do -> lgtm.