This is an archive of the discontinued LLVM Phabricator instance.

[MachineSink] SinkIntoLoop: analyse stores and aliases in between
AbandonedPublic

Authored by SjoerdMeijer on Jan 8 2021, 7:31 AM.

Details

Summary

This depends on D93694, and extends the SinkIntoLoop analysis to be less conservative. I.e., it was assuming we couldn't move across a store with isSafeToMove(AA, DontMoveAcrossStore), but now we do actually analyse that by querying HasStoreBetween(). Thus, the SinkIntoLoop part now starts reusing the infrastructure already available in MachineSink, which was the motivation of D93694. Function HasStoreBetween() doesn't analyse the From and To blocks, and now we also perform alias analysis of the instructions in these blocks to see if sinking is legal (which also reuses bits and pieces from MachineSink).

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Jan 8 2021, 7:31 AM
SjoerdMeijer requested review of this revision.Jan 8 2021, 7:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 8 2021, 7:31 AM
SjoerdMeijer retitled this revision from [MachineSink] SinkIntoLoop: analyse stores in between. WIP to [MachineSink] SinkIntoLoop: analyse stores in between.
SjoerdMeijer edited the summary of this revision. (Show Details)
SjoerdMeijer added a reviewer: shchenz.

Rebased after D93694, and this now shows changes in the tests that were added there. New tests have been added to check alias analysis, and some other minor modifications.

Summarising:

  • this now performs alias analysis of all instructions/blocks between the from and to location when an instruction is sunk.
  • and isSafeToMove is less conservative because now we actually analyse hasStoreBetween rather than assuming this is the case.
SjoerdMeijer retitled this revision from [MachineSink] SinkIntoLoop: analyse stores in between to [MachineSink] SinkIntoLoop: analyse stores and aliases in between.Jan 29 2021, 3:03 AM

Restored a FIXME comment.

SjoerdMeijer added inline comments.Feb 1 2021, 4:59 AM
llvm/lib/CodeGen/MachineSink.cpp
1213

I have removed this, because otherwise it is extremely difficult to actually let this do and sink something. Thus, in this patch we address the analysis/correctness part of the transformation, and in the follow up I will address here the profitability question.

dmgreen added inline comments.Feb 1 2021, 10:26 AM
llvm/lib/CodeGen/MachineSink.cpp
381

branches. (Store instructions can still have defs, as in for post-inc's)

387

Why remove this call to isSafeToMove? Don't we need to check for stores/exceptions/unmodelled side effects/etc?

1228

If it's sinking into a loop, does it not have to check all the instructions in the loop for aliasing?

llvm/test/CodeGen/AArch64/loop-sink.mir
1035–1036

Some of these look very expensive to avoid a stack reload. Is the reload not cheaper, even if it is in the loop?

SjoerdMeijer added inline comments.Feb 1 2021, 11:28 AM
llvm/lib/CodeGen/MachineSink.cpp
381

Will clarify this comment a bit.

387

This has not been deleted but, it has been moved to line 1290 and is now part of MachineSinking::IsSafeToMove(). The reason is that we need to have found the sink block and loop in order analyse DontMoveAcrossStore.

1228

Yeah, good point! I was assuming/using single block loops, but not all loops confirm to that standard. ;-)
This should indeed check all loop blocks/instructions.

llvm/test/CodeGen/AArch64/loop-sink.mir
1035–1036

Yep, that might very well be the case. But this is about building up the framework to support loop sinking which requires quite a bit of surgery: after this analysis/correctness patch, we indeed need to work on the cost-model/profitability (your remark), we may need options to force or limit the number of sinks, and then we need to enable this by default. Thus, quite a few steps to go, and cost analysis will be next after this patch.

dmgreen added inline comments.Feb 2 2021, 12:44 AM
llvm/lib/CodeGen/MachineSink.cpp
387

Oh yeah. There it is. My Ctrl+f failed me.

llvm/test/CodeGen/AArch64/loop-sink.mir
1035–1036

Do you have the cost part yet? It's difficult to see if this will be a good thing to do without the complete picture. I guess having the code to do it shouldn't hurt in any case.

Naively it would seem to trade one stack reload in the loop for a sunk instruction and the possibly the stack reload of it's arguments. And this appears to be AArch64 - a system with a lot of registers and comparatively cheap stack reloads. On v6m it would be much more likely to come up with the limited amount of registers available. Without being careful it has a high chance of making things worse, not better!

Thanks for reviewing!
Find replies inlined to your comments/questions. I am now addressing the issue that all blocks in the loop should be analysed as remarked previously.

llvm/test/CodeGen/AArch64/loop-sink.mir
1035–1036

It's difficult to see if this will be a good thing to do without the complete picture.

I think the complete picture does not exist. Or, there are many complete pictures, thus I don't think I can meaningfully sketch this or claim to have provided this. The general picture is that we have aggressive hoisting as a canonicalisation transformation, but that's the only mode that we have, and there's nothing in between, please see also my commit message rG48ecba350ed6 of the groundwork of this. This has links to discussions on the llvm dev mailing list, where this has been discussed several times and highlighted as a generic LLVM issue. This is adding the framework to at least be able to support this.

Do you have the cost part yet?

Nope, not yet. My motivating example is in D92488, which has actually been massively reduced. I think there's low hanging fruit here where we can easily do better.

Naively it would seem to trade one stack reload in the loop for a sunk instruction and the possibly the stack reload of it's arguments. And this appears to be AArch64 - a system with a lot of registers and comparatively cheap stack reloads. On v6m it would be much more likely to come up with the limited amount of registers available. Without being careful it has a high chance of making things worse, not better!

Yep, sure, this needs careful tuning, like most other things. This is what we should do next.

Analyse all blocks/instructions in the loop, not only the sink block, and a multi loops, multi-block loop test case cant_sink_multi_block_loop_with_call for that.

hasStoreBetween is a little time consuming, so it is better to do some compiling time testing also.

llvm/lib/CodeGen/MachineSink.cpp
448

Will it cause something wrong if we don't clear the cache here?

I guess the cached data here should not impact the following sinking from the loop preheader to the loop body.
Till now, all the sinkings are from the same depth loop or from a bigger depth loop to a smaller depth loop.

1223

I guess we only need to call hasStoreBetween for load instructions?

1231

This is a little strange, if we check there are no alias instructions in the whole loop to I, do we still need the above hasStoreBetween check? I is from the loop preheader, SinkTo is a block in the loop, hasStoreBetween is also checking alias for load/store instructions.

Thanks for looking!

hasStoreBetween is a little time consuming, so it is better to do some compiling time testing also.

This is really where I think/hope the caching of the result is going to help, so that essentially the analysis is done once, and the rest are lookups. And since we run the loop sinking as a second second, I hope it's almost free.
But I said "hope" a few times, and it's a good suggestion to check this, which I will do! Also, you made a good point about clearing this cache, see inline.

llvm/lib/CodeGen/MachineSink.cpp
448

Good point. I haven't seen it doing anything wrong, I didn't try very hard though, but I put it there because I saw the preheader block being modified in the first sinking step. Loop sinking as a second step, then tries to sink instructions from that modified preheader block, so I thought about just clearing it just to be sure. But now thinking about it more, I don't think this can cause problems, so after verifying this I will remove this clearing of the cache.

1223

Yep, will change that.

1231

Agreed that this has become messy. I.e., it's doing more work than necessary.
We need to analyse all instructions in preheader block, which is what we do in 1).
Then, we need to analyse all blocks/instructions in the loop, which is what we do in 3).
I think we can now indeed get rid of 2), which would be useful for more generic cases but here in our case the preheader is directly dominating the loop blocks so don't need to analyse a path in between that.

This addresses the code review comments: removes the hasStoreBetween call which is unnecessary, and some other minor changes.

Will now to see if this has any effect on compile times.

Re: compile times, I haven't found anything concerning. It's all within the noise.

I found I have one thing to fix though.

  • This has been tested on the llvm test-suite for correctness and compile times, and both look good.
  • This fixes an issue related to loop-invariant checks, the HasLoopPHIUse check. It was incorrectly sinking instruction when they were not loop-invariant, i.e. when they had phi uses. This was an existing check, but I missed this when I moved this from MachineLICM to MachineSink. This is now a helper in MachineLoop, and used in both passes.
  • I have improved the MIR tests, especially the alias tests.
dmgreen added inline comments.Feb 8 2021, 7:59 AM
llvm/lib/CodeGen/MachineSink.cpp
1093

I happened to notice betweeen -> between

1196

Can this set the HasStoreCache for the block if it hasn't seen all the instructions in it yet?

SjoerdMeijer added inline comments.Feb 9 2021, 11:08 AM
llvm/lib/CodeGen/MachineSink.cpp
1196

Thanks Dave. I am investigating a miscompilation that I noticed for the llvm test suite. Will address your remark once I have a fix, and include it in the new revision for that.

This fixes a miscompilation for x86 which was related to eflags setting instructions. This check was already present in MachineSink, which I can now "borrow" for the loop-sinking. This now passes a the llvm test suite on x86 and on aarch64, and also a bootstrap.

Fixed typo.

I will look into this separately:

Can this set the HasStoreCache for the block if it hasn't seen all the instructions in it yet?

I have uploaded D96485 to control compile-times.

Like I said, this passes my testing. So, what do you think, good to go?
Please note this is still disabled by default.

Next step:

  • the interesting bit: cost-model to decide when to sink.

Friendly ping. Are we happy with this?

I appreciate that test X86/loop-sink.mir is a bit large which I added in the latest revisions, but this has been reduced from a llvm-suite miscompilation and I couldn't reduce it further within a reasonable amount of time. I thought one larger tests wouldn't be bad, actually saw some benefits.
The alternative is that I remove it. The added check is a corner case for X86, and it's an existing check that I am reusing so it is check that is covered. Just for completeness it's this check that I refactored into HasInstrSideEffects:

// If the instruction to move defines a dead physical register which is live
// when leaving the basic block, don't move it because it could turn into a
// "zombie" define of that preg. E.g., EFLAGS. (<rdar://problem/8030636>)
static bool HasInstrSideEffects(MachineInstr &MI, MachineBasicBlock *SinkTo) {

The X86 test is indeed large! Considering that it's just an unrolled loop with the same sequence repeated MANY times, could you not just delete 3/4 of the loop and update the indvar accordingly?

Hmmmm, I have tried quite a few things, didn't see this particular check triggering, but will give it another try then.

I will look into this separately:

Can this set the HasStoreCache for the block if it hasn't seen all the instructions in it yet?

hasStoreBetween is called by SinkInstruction, and we can't break the existing code even if the rest is currently not enabled by default.

llvm/lib/CodeGen/MachineSink.cpp
1204

This doesn't seem to have anything to do with side effects. I'm surprised there isn't more needed for clobbering physical registers. Because we are sinking into the start of a block, we only need to check the live-in, not any other instructions along the way?

Matt added a subscriber: Matt.Jan 5 2022, 3:21 PM
SjoerdMeijer abandoned this revision.Mar 17 2023, 1:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2023, 1:44 AM
Herald added a subscriber: StephenFan. · View Herald Transcript