Instead of setting Store to true conservatively, handle some cases where Store can be determined by some analysis if MI.parent() dom SuccToSinkTo and SuccToSinkTo post dom MI.parent().
Thus we can get more profitable loads to be sunk.
Paths
| Differential D86864
[MachineSinking] sink more profitable loads ClosedPublic Authored by shchenz on Aug 31 2020, 2:11 AM.
Details
Summary Instead of setting Store to true conservatively, handle some cases where Store can be determined by some analysis if MI.parent() dom SuccToSinkTo and SuccToSinkTo post dom MI.parent(). Thus we can get more profitable loads to be sunk.
Diff Detail
Event TimelineHerald added subscribers: llvm-commits, danielkiss, luismarques and 23 others. · View Herald Transcript Comment Actions Do we need some threshold to limit the analysis? I'm worried this could get very expensive.
Comment Actions 1: can not check alias for call instruction by using mayAlias Comment Actions
I add a new cache to save compiling time. Could you be more specific on how to add the threshold? Thanks.
Comment Actions Hi @efriedma, Given you've already taken a look, can you do the review or do you want me to step in? Cheers, Comment Actions I'd appreciate it if you'd step in, Quentin. I have a backlog of reviews, and I still haven't quite wrapped my head around the use of post-dominance here. Comment Actions @efriedma @qcolombet haha, thanks for your review. Please take your time. ^-^ . I want to make the alias check simple(find all basic blocks in all paths from block From to block To), so I guard the dominance between From and To as From dominates To and To post dominates From. This will make us still miss some cases in which load instruction can be sunk. But it can make the complexity simple. Comment Actions Hi, Looks reasonable but I'd like to have an idea of the compile time impact of this patch. Make sure to fix the Lint comments as well. Cheers,
Comment Actions 1: clear the cache just before function returns Comment Actions Thanks very much for your review. @qcolombet Update accordingly. I collected some compiling time numbers for llvm test-suite. No obivious degradation found.
Comment Actions Ah, I used @nikic perf testing tool for this patch on the X86 target(?). (Thanks for this great tool ^-^) Here is the result of the compiling time: https://llvm-compile-time-tracker.com/compare.php?from=2d25004a137223a02aa06e8bfd512a648f3b3f94&to=abf39366779bed8b9f2d276cd199c3b78471e3b1&stat=instructions GEO degradates about 0.02% and no obivious outlier. I guess the compiling time increase on these benchmarks should be acceptable? Comment Actions
Yes, that's fine by me.
This revision is now accepted and ready to land.Oct 27 2020, 9:15 AM This revision was landed with ongoing or failed builds.Nov 1 2020, 6:16 PM Closed by commit rG24a31922ce2a: [MachineSink] sink more profitable loads (authored by shchenz). · Explain Why This revision was automatically updated to reflect the committed changes. Comment Actions Hi, @shchenz. Our several opecncl benchmarks have appeared great compile time regression. Comment Actions
Yes, we have been aware that this patch may introduce compiling time degradations. And as you can see in previous comments, I already tested the compiling time on X86 arch. Sadly, the tested benchmarks don't expose any regressions. Could you please help to send me your regression function/IR? So I can have a look about how to fix it? Thanks. Comment Actions
I am sorry that I can't provide the case to you directly. I am trying making a small reproduce but encountered some problems. Comment Actions
I can surely do that. But I think the most reasonable solution would be fix the compiling time issue. Since compiling time tests I did before does not expose any regression, your test case must be a little special. Could you find out the special point, for example the function has too many blocks or some/many blocks in the function has too many instructions? Thanks. Comment Actions
I think the increase in compile time is because the function has too many instructions and blocks. The function has Thousands of lines of instructions. Can you add limitation for the number of instructions or the number of blocks so the check for 'store' can end early? Comment Actions
Could you please help to check if the following change can solve your issue? C diff --git a/llvm/lib/CodeGen/MachineSink.cpp b/llvm/lib/CodeGen/MachineSink.cpp index 0abdf89..8ca3520 100644 --- a/llvm/lib/CodeGen/MachineSink.cpp +++ b/llvm/lib/CodeGen/MachineSink.cpp @@ -79,6 +79,12 @@ static cl::opt<unsigned> SplitEdgeProbabilityThreshold( "splitted critical edge"), cl::init(40), cl::Hidden); +static cl::opt<unsigned> SinkLoadInstsPerBlockThreshold( + "machine-sink-load-instrs-threshold", + cl::desc("Do not try to find alias store for a load if there is a in-path " + "block whose instruction number is higher than this threshold."), + cl::init(2000), cl::Hidden); + STATISTIC(NumSunk, "Number of machine instructions sunk"); STATISTIC(NumSplit, "Number of critical edges split"); STATISTIC(NumCoalesces, "Number of copies coalesced"); @@ -1036,6 +1042,12 @@ bool MachineSinking::hasStoreBetween(MachineBasicBlock *From, HandledBlocks.insert(BB); // To post dominates BB, it must be a path from block From. if (PDT->dominates(To, BB)) { + // If this BB is too big, stop searching to save compiling time. + if (BB->size() > SinkLoadInstsPerBlockThreshold) { + HasStoreCache[BlockPair] = true; + return true; + } + for (MachineInstr &I : *BB) { // Treat as alias conservatively for a call or an ordered memory // operation. Comment Actions
Unfortunately, this patch doesn't work. I don't think the number of blocks is necessarily related to the number of instructions. Comment Actions
The above threshold is for number of MIs. BB->size() is to get instruction number of BB. I committed 31c2b93d83f63ce7f9bb4977f58de2e00bf18e0f to further reduce compiling time. You can have a try Comment Actions
I don't think so. If there is a huge block that is reachable from block From but is not dominated by From and can not flow to To, we will mark {From, To} as hasStore. This is not true, we should only care about the blocks which are dominated by From and can flow to To. Comment Actions
Thanks. It works for our tests.
Revision Contents
Diff 302198 llvm/lib/CodeGen/MachineSink.cpp
llvm/test/CodeGen/RISCV/select-optimize-multiple.ll
llvm/test/CodeGen/X86/2007-01-13-StackPtrIndex.ll
llvm/test/CodeGen/X86/MachineSink-eflags.ll
llvm/test/CodeGen/X86/avx2-masked-gather.ll
llvm/test/CodeGen/X86/cmovcmov.ll
llvm/test/CodeGen/X86/select.ll
llvm/test/CodeGen/X86/vec_int_to_fp.ll
|
Would it make sense to preserve the cache as long as we don't move stores, calls and loads with ordered memory ref?
I'm worried about the compile time impact of this patch so I am wondering if we want to be smarter about invalidating the cache.