[CodeGen] Add support for multiple memory operands in MachineInstr::mayAlias

Authored by Kayjukh on May 21 2020, 7:30 AM.


[CodeGen] Add support for multiple memory operands in MachineInstr::mayAlias

To support all targets, the mayAlias member function needs to support instructions with multiple operands.

This revision also changes the order of the emitted instructions in some test cases.

Reviewers: efriedma, hfinkel, craig.topper, dmgreen

Reviewed By: efriedma

Subscribers: MatzeB, dmgreen, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80161

Event Timeline

This patch is causing failures on bootstrap bots. Example: http://lab.llvm.org:8011/builders/ppc64le-lld-multistage-test/builds/9545

It appears that the quadratic nature of this change adds a non-trivial amount of overhead. This seems surprising to me since I wouldn't expect a machine instruction to have more than a very small number of machine memory operands (i.e. 1-2). However, when compiling SemaChecking.cpp on ppc64le, there are 759,437,451 calls to the lambda. Before this patch, we reached a similar point in the code 92625 while compiling a similar file.
The compile time for that file on my ppc64le machine has gone from under a minute to just under an hour. This is particularly problematic for the buildbot because the build will sometimes time out and sometimes it won't. So many developers will likely receive false failure notifications when the bot is flipping from a successful build to a failing one.

We are working on a reproducer and an analysis of why this is happening. In the meantime can you please revert this patch to bring the buildbots back to green?

Kayjukh added a comment.EditedMay 22 2020, 12:23 PM

Sure. I didn't realize it would cause such an overhead. If I remember correctly, some MachineInstr instances sometimes have duplicate memory operands for the same load or store.

Thanks for investigating the issue.

nemanjai added a subscriber: anil9.May 25 2020, 9:27 AM

OK. So it turns out that what happens is the Branch Folder merges a bunch of memory operands when it is tail merging the target blocks of a jump table. The reproducer at https://pastebin.com/tRtTQdSa shows about an 80x difference in compile time (and cycles) on a PowerPC box.

We really need to do something better than a quadratic algorithm here as this is an unacceptable compile time increase. Perhaps some sort of caching mechanism is required.

P.S. I produced the "minimal" reproducer to be large enough to show a significant difference. To do so, I arbitrarily chose 1000 as the threshold for the product of the number of memory operands in the two instructions. That's why this reproducer ends up with two memory access instructions, each with 32 memory operands.

Thanks @anil9 for his work on reducing a test case for this issue.