This depends on D93694, and extends the SinkIntoLoop analysis to be less conservative. I.e., it was assuming we couldn't move across a store with isSafeToMove(AA, DontMoveAcrossStore), but now we do actually analyse that by querying HasStoreBetween(). Thus, the SinkIntoLoop part now starts reusing the infrastructure already available in MachineSink, which was the motivation of D93694. Function HasStoreBetween() doesn't analyse the From and To blocks, and now we also perform alias analysis of the instructions in these blocks to see if sinking is legal (which also reuses bits and pieces from MachineSink).
Diff Detail
Event Timeline
Rebased after D93694, and this now shows changes in the tests that were added there. New tests have been added to check alias analysis, and some other minor modifications.
Summarising:
- this now performs alias analysis of all instructions/blocks between the from and to location when an instruction is sunk.
- and isSafeToMove is less conservative because now we actually analyse hasStoreBetween rather than assuming this is the case.
llvm/lib/CodeGen/MachineSink.cpp | ||
---|---|---|
1213 | I have removed this, because otherwise it is extremely difficult to actually let this do and sink something. Thus, in this patch we address the analysis/correctness part of the transformation, and in the follow up I will address here the profitability question. |
llvm/lib/CodeGen/MachineSink.cpp | ||
---|---|---|
382 | branches. (Store instructions can still have defs, as in for post-inc's) | |
387 | Why remove this call to isSafeToMove? Don't we need to check for stores/exceptions/unmodelled side effects/etc? | |
1229 | If it's sinking into a loop, does it not have to check all the instructions in the loop for aliasing? | |
llvm/test/CodeGen/AArch64/loop-sink.mir | ||
1051–1052 | Some of these look very expensive to avoid a stack reload. Is the reload not cheaper, even if it is in the loop? |
llvm/lib/CodeGen/MachineSink.cpp | ||
---|---|---|
382 | Will clarify this comment a bit. | |
387 | This has not been deleted but, it has been moved to line 1290 and is now part of MachineSinking::IsSafeToMove(). The reason is that we need to have found the sink block and loop in order analyse DontMoveAcrossStore. | |
1229 | Yeah, good point! I was assuming/using single block loops, but not all loops confirm to that standard. ;-) | |
llvm/test/CodeGen/AArch64/loop-sink.mir | ||
1051–1052 | Yep, that might very well be the case. But this is about building up the framework to support loop sinking which requires quite a bit of surgery: after this analysis/correctness patch, we indeed need to work on the cost-model/profitability (your remark), we may need options to force or limit the number of sinks, and then we need to enable this by default. Thus, quite a few steps to go, and cost analysis will be next after this patch. |
llvm/lib/CodeGen/MachineSink.cpp | ||
---|---|---|
387 | Oh yeah. There it is. My Ctrl+f failed me. | |
llvm/test/CodeGen/AArch64/loop-sink.mir | ||
1051–1052 | Do you have the cost part yet? It's difficult to see if this will be a good thing to do without the complete picture. I guess having the code to do it shouldn't hurt in any case. Naively it would seem to trade one stack reload in the loop for a sunk instruction and the possibly the stack reload of it's arguments. And this appears to be AArch64 - a system with a lot of registers and comparatively cheap stack reloads. On v6m it would be much more likely to come up with the limited amount of registers available. Without being careful it has a high chance of making things worse, not better! |
Thanks for reviewing!
Find replies inlined to your comments/questions. I am now addressing the issue that all blocks in the loop should be analysed as remarked previously.
llvm/test/CodeGen/AArch64/loop-sink.mir | ||
---|---|---|
1051–1052 |
I think the complete picture does not exist. Or, there are many complete pictures, thus I don't think I can meaningfully sketch this or claim to have provided this. The general picture is that we have aggressive hoisting as a canonicalisation transformation, but that's the only mode that we have, and there's nothing in between, please see also my commit message rG48ecba350ed6 of the groundwork of this. This has links to discussions on the llvm dev mailing list, where this has been discussed several times and highlighted as a generic LLVM issue. This is adding the framework to at least be able to support this.
Nope, not yet. My motivating example is in D92488, which has actually been massively reduced. I think there's low hanging fruit here where we can easily do better.
Yep, sure, this needs careful tuning, like most other things. This is what we should do next. |
Analyse all blocks/instructions in the loop, not only the sink block, and a multi loops, multi-block loop test case cant_sink_multi_block_loop_with_call for that.
hasStoreBetween is a little time consuming, so it is better to do some compiling time testing also.
llvm/lib/CodeGen/MachineSink.cpp | ||
---|---|---|
449 | Will it cause something wrong if we don't clear the cache here? I guess the cached data here should not impact the following sinking from the loop preheader to the loop body. | |
1224 | I guess we only need to call hasStoreBetween for load instructions? | |
1232 | This is a little strange, if we check there are no alias instructions in the whole loop to I, do we still need the above hasStoreBetween check? I is from the loop preheader, SinkTo is a block in the loop, hasStoreBetween is also checking alias for load/store instructions. |
Thanks for looking!
This is really where I think/hope the caching of the result is going to help, so that essentially the analysis is done once, and the rest are lookups. And since we run the loop sinking as a second second, I hope it's almost free.
But I said "hope" a few times, and it's a good suggestion to check this, which I will do! Also, you made a good point about clearing this cache, see inline.
llvm/lib/CodeGen/MachineSink.cpp | ||
---|---|---|
449 | Good point. I haven't seen it doing anything wrong, I didn't try very hard though, but I put it there because I saw the preheader block being modified in the first sinking step. Loop sinking as a second step, then tries to sink instructions from that modified preheader block, so I thought about just clearing it just to be sure. But now thinking about it more, I don't think this can cause problems, so after verifying this I will remove this clearing of the cache. | |
1224 | Yep, will change that. | |
1232 | Agreed that this has become messy. I.e., it's doing more work than necessary. |
This addresses the code review comments: removes the hasStoreBetween call which is unnecessary, and some other minor changes.
Will now to see if this has any effect on compile times.
Re: compile times, I haven't found anything concerning. It's all within the noise.
I found I have one thing to fix though.
- This has been tested on the llvm test-suite for correctness and compile times, and both look good.
- This fixes an issue related to loop-invariant checks, the HasLoopPHIUse check. It was incorrectly sinking instruction when they were not loop-invariant, i.e. when they had phi uses. This was an existing check, but I missed this when I moved this from MachineLICM to MachineSink. This is now a helper in MachineLoop, and used in both passes.
- I have improved the MIR tests, especially the alias tests.
llvm/lib/CodeGen/MachineSink.cpp | ||
---|---|---|
1197 | Thanks Dave. I am investigating a miscompilation that I noticed for the llvm test suite. Will address your remark once I have a fix, and include it in the new revision for that. |
This fixes a miscompilation for x86 which was related to eflags setting instructions. This check was already present in MachineSink, which I can now "borrow" for the loop-sinking. This now passes a the llvm test suite on x86 and on aarch64, and also a bootstrap.
Fixed typo.
I will look into this separately:
Can this set the HasStoreCache for the block if it hasn't seen all the instructions in it yet?
I have uploaded D96485 to control compile-times.
Like I said, this passes my testing. So, what do you think, good to go?
Please note this is still disabled by default.
Next step:
- the interesting bit: cost-model to decide when to sink.
Friendly ping. Are we happy with this?
I appreciate that test X86/loop-sink.mir is a bit large which I added in the latest revisions, but this has been reduced from a llvm-suite miscompilation and I couldn't reduce it further within a reasonable amount of time. I thought one larger tests wouldn't be bad, actually saw some benefits.
The alternative is that I remove it. The added check is a corner case for X86, and it's an existing check that I am reusing so it is check that is covered. Just for completeness it's this check that I refactored into HasInstrSideEffects:
// If the instruction to move defines a dead physical register which is live // when leaving the basic block, don't move it because it could turn into a // "zombie" define of that preg. E.g., EFLAGS. (<rdar://problem/8030636>) static bool HasInstrSideEffects(MachineInstr &MI, MachineBasicBlock *SinkTo) {
The X86 test is indeed large! Considering that it's just an unrolled loop with the same sequence repeated MANY times, could you not just delete 3/4 of the loop and update the indvar accordingly?
Hmmmm, I have tried quite a few things, didn't see this particular check triggering, but will give it another try then.
I will look into this separately:
Can this set the HasStoreCache for the block if it hasn't seen all the instructions in it yet?
hasStoreBetween is called by SinkInstruction, and we can't break the existing code even if the rest is currently not enabled by default.
llvm/lib/CodeGen/MachineSink.cpp | ||
---|---|---|
1205 | This doesn't seem to have anything to do with side effects. I'm surprised there isn't more needed for clobbering physical registers. Because we are sinking into the start of a block, we only need to check the live-in, not any other instructions along the way? |
Why remove this call to isSafeToMove? Don't we need to check for stores/exceptions/unmodelled side effects/etc?