This is an archive of the discontinued LLVM Phabricator instance.

[MachineLICM][MachineSink] Move SinkIntoLoop from MachineLICM to MachineSink
ClosedPublic

Authored by SjoerdMeijer on Dec 22 2020, 5:49 AM.

Details

Summary

This moves SinkIntoLoop from MachineLICM to MachineSink.

Not only is this arguably a better home for this, but more importantly there is infrastructure available in MachineSink that is required for making SinkIntoLoop more generic. For example, hasStoreBetween that is available could be queried by SinkIntoLoop to make it less conservative. This would be the next step of this work. At the moment, this is a non-functional change.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Dec 22 2020, 5:49 AM
SjoerdMeijer requested review of this revision.Dec 22 2020, 5:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 22 2020, 5:49 AM
fhahn added inline comments.Dec 22 2020, 5:57 AM
llvm/test/CodeGen/AArch64/machine-licm-sink-instr.ll
8 ↗(On Diff #313310)

Can this be a MIR test? That would make it slightly easier to relate the test to the code changes.

SjoerdMeijer added inline comments.Dec 22 2020, 6:04 AM
llvm/test/CodeGen/AArch64/machine-licm-sink-instr.ll
8 ↗(On Diff #313310)

Yeah, I definitely thought about that. But since it is also nice/convenient to look at the final codegen, so decided to create a llc test. But if you prefer a mir test, I am happy with that too.

Now with a MIR test.

fhahn added inline comments.Dec 22 2020, 8:01 AM
llvm/lib/CodeGen/MachineLICM.cpp
780

This only checks for stores/calls in the preheader. What about the case where there are stores/clobbering calls in the loop body,

e.g. something like

%20:gpr64common = ADRP target-flags(aarch64-page) @A
STRWui %3, %20, target-flags(aarch64-pageoff, aarch64-nc) @A  :: (store 4 into `i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i64 0, i64 0)`)

in the test case?

llvm/test/CodeGen/AArch64/machine-licm-sink-instr.mir
159 ↗(On Diff #313341)

I don't think there's anything preventing _Z3usei to modify @A in the loop, so if we move the load into the loop, we may read different values on each iteration?

Hmmmm, good points, will check. I was assuming that the legality checks were performed, but yeah, this is impressively broken if not!

Yeah, so thanks again for looking at this. After looking more into this, I don't think this change at the moment makes much sense.

Sinking instructions is a bit of a mess as we have LoopSink, MachineSink, and MachineLICM all doing this (in different ways, and/or different use cases). This change here and function HasSuccessiveStoreInst() that I added is very similar to hasStoreBetween() in MachineSink. It's actually better, so I would like reuse that. As different strategies I have considered hoisting that out to a new MachineLoopUtils helper, but there's state (caching) that doesn't make this very convenient. I think the better alternative is to move function SinkIntoLoop() from MachineLICM to MachineSink because that is actually a more natural place for it, and there seems to be more/better infrastructure in MachineSink to do this (like hasStoreBetween() that I mentioned earlier).

A first prep step for this D94082, which creates a new helper isLoopInvariant() which we can then also use in MachineSink. After this, I would like to repurpose this ticket and move SinkIntoLoop() from MachineLICM to MachineSink if we agree that this makes sense. That should be a NFC, and then as a third step I would like to make the change that I wanted to make.

SjoerdMeijer retitled this revision from [MachineLICM] SinkIntoLoop: analyse successive store instructions to [MachineLICM][MachineSink] Move SinkIntoLoop from MachineLICM to MachineSink.
SjoerdMeijer edited the summary of this revision. (Show Details)

See updated Title and Summary.

Fixed formatting and description of the command line option. Related to this, and in addition to my previous message, this is off-by-default and is currently covered by exactly 1 test:
test/CodeGen/X86/sink-cheap-instructions.ll. After this NFC change, I will look into extending this.

lkail added a subscriber: lkail.Jan 7 2021, 7:04 AM
SjoerdMeijer added a comment.EditedJan 8 2021, 7:34 AM

I have upload WIP patch D94308 to show the direction and motivation for this change here.

Little friendly ping: how do we like this little reshuffle?

samparker added inline comments.
llvm/lib/CodeGen/MachineSink.cpp
86

Is there a plan to add an option for the maximum number of instructions sunk, or something like that?

363

SmallVectorImpl<MachineInstr*>&

387

What are we doing these checks for?

1172

So how do we know that MI is in a loop?

SjoerdMeijer added inline comments.Jan 15 2021, 7:20 AM
llvm/lib/CodeGen/MachineSink.cpp
86

Thanks for looking at this!

Getting loop-sinking to do something useful is going to be at least a 3-step approach:

  1. This is just moving the code, not changing it, so is a NFC and just a prep step use the infrastructure in MachineSink. Note, that loop-sink is disabled by default, so needs an option to be triggered.
  2. A first functional change to make loop-sink a little bit less restrictive, which it really is at the moment, is the change in D94308. This let's it do a bit more analysis using functions in MachineSink, making it a bit more powerful. Nothing changes much: still off by default, but is a good demonstrator for the reshuffle.
  3. This is the going to be he interesting step: decision making when and how many instructions to sink. This will be driven by the register pressure, and deciding if reducing live-ranges and loop sinking will help in better performance. I guess this is the place where also a maximum can be introduced and option to control this further, although I agree of course there's in principle nothing stopping us from introducing this earlier.
  4. Once we are happy with 3), this should be enabled by default, that should be the end goal of this exercise.
387

I think this is being conservative: when an instruction has multiple defs, it's trickier to analyse or we don't want to sink these instructions and skip over it. But I haven't looked much into this (just moved the code), so will need to do that.

1172

Yeah, good point. I think the only simple answer here is that at this point we don't.
Will look into this.

shchenz added a comment.EditedJan 17 2021, 8:23 PM

Is it possible to implement this new logic to the original MachineSink infra? If so, we can not limit to sink the instruction in loop preheader? I guess we prevent this kind of sinking in machinesink pass is because we treat it as nonprofitable in MachineSinking::isProfitableToSinkTo?
And we already have some simple register estimation model in MachineSinking::isProfitableToSinkTo, should we do this kind of sinking based on register pressure, for example, only sinking to loop if the register pressure of the loop is high?

llvm/lib/CodeGen/MachineSink.cpp
1179

Is this necessary? Assuming MI is a loop user, how does a loop COPY user impact the cost of sinking instruction I to the loop? I could still be any instruction like a load instruction.

Thank you very much for looking at this @shchenz !

Is it possible to implement this new logic to the original MachineSink infra? If so, we can not limit to sink the instruction in loop preheader? I guess we prevent this kind of sinking in machinesink pass is because we treat it as nonprofitable in MachineSinking::isProfitableToSinkTo?

It's good you mention this, because this was actually my first approach! :-) I started integrating this logic into the existing code, but run into problems. The main problem was that the existing logic and the new loop sink started interacting, which means I got infinite loops of new blocks being created and things moved. This could probably be fixed, which is what I started doing, but then found that this didn't make things clearer. Thus, since the existing logic and loop sink are slightly different algorithms, I saw the benefit of keeping them separate, for simplicity and clarity. Thus, in this draft, first the original sink logic runs, followed by loop sink. And there might also be a benefit of running them separate and one after the other, as opposed to one integrated solution (that's something I can imagine, but don't have the proof for yet). Long story short, my preference is to keep it separated for now, but that shouldn't exclude reconsidering this if we find a need for this later.

And we already have some simple register estimation model in MachineSinking::isProfitableToSinkTo, should we do this kind of sinking based on register pressure, for example, only sinking to loop if the register pressure of the loop is high?

Exactly right. That is another example and motivation to move this code to here, this is exactly the kind of things I want to reuse. Another example is function hasStoreBetween, which is what I am using in D94308. This patch is functionally complete, but I just need to add all the different MIR tests; but again, it's the justification of moving this code to MachineSink.
Please see also my reply earlier to @samparker, i.e. the four steps that I outlined that are necessary to make loop sinking useful. I am just pointing this out, because using isProfitableToSinkTo is exactly one of these next steps to let loop sink make an informed decision. And in other words, this is exactly what I will be doing once we have agreed on this NFC change, this change is the basis for these next steps.

llvm/lib/CodeGen/MachineSink.cpp
1179

You're absolutely right again, this could be any instruction, and is thus not necessary. But in absence of any cost-modeling or other decision making (based on register pressure), the original authors of this code found that it is always profitable to move the instruction if the user is a copy.
So yes, this is exactly the first thing that we will be changing when we integrate MachineSinking::isProfitableToSinkTo into this work in a next step.

SjoerdMeijer updated this revision to Diff 317537.EditedJan 19 2021, 5:30 AM
  • Added statistics NumLoopSunk, the number of instructions sunk into a loop,
  • Check that a use of a candidate is in a loop,
  • Added a bunch of negative tests.
  • Ported over the only positive test test/CodeGen/X86/sink-cheap-instructions.ll to a MIR test: this is sink_add in the new MIR test file.
  • Walk candidates in reverse order, to start at the bottom of def-use chains, and try using the uses first before attempting to sink a def.

The plan sounds good to me. Thanks for doing this @SjoerdMeijer

llvm/lib/CodeGen/MachineSink.cpp
365

Should we call TII->shouldSink() at the very beginning? Some targets may not want to sink some instructions.

373

Maybe I am wrong, but for the sinking, we must sink an instruction to a block that is dominated by the instruction's parent block. So if the destinate block is executed, BB must also be executed. Do we need the check for IsGuaranteedToExecute? I think this is not the same as hoisting. For hoisting, we must make sure the load must be executed as it will be hoisted to the loop preheader which will be executed surely.

383

This may be unnecessary as all the instructions come from loop preheader?

1175

This seems strange, MI should be a user inside the loop, why its parent block must be equal to the loop preheader?

SjoerdMeijer added inline comments.Jan 22 2021, 5:44 AM
llvm/lib/CodeGen/MachineSink.cpp
365

Thanks, agreed.

373

yeah, good catch. We are sinking from preheader to a loop block BB, and preheader dominates BB. I agree that we don't need this check, this is indeed different for hoisting and sinking. I will remove it.

383

I agreed and changed this into an assert, which seemed like to a good precondition to assert.
But this new assert triggered for one of the test cases where the preheader is the entry block, containing COPY instructions of the function arguments which use physical registers. Physical registers are marked as not loop invariant, which is why this assert triggered. So I think I will just keep this for now.

1175

I thought I had a reason, but it doesn't seem to make sense, so will remove that part of the condition.

Thanks @shchenz, comments addressed.

samparker added inline comments.Jan 22 2021, 6:55 AM
llvm/lib/CodeGen/MachineSink.cpp
459

Do we need to break here as soon as SinkIntoLoop is false? Otherwise, couldn't we sink an 'earlier' instruction that is an input to an instruction that hasn't been sunk?

SjoerdMeijer added inline comments.Jan 22 2021, 7:21 AM
llvm/lib/CodeGen/MachineSink.cpp
459

Unless I miss something, but I don't think so. I think that is too conservative. For example, if we have a number of sink candidates, and stop after e.g. the very first one if that can't be sunk for some reason, then we miss a lot of opportunities if this instruction is completely Independent from the other candidates.

Correct me if I am wrong, but I think what you describe is simply a case of not "IsSafeToMove": there is a dependency that we must respect and can't break.
That's why I also walk the candidates in reverse order. Because for a little def-use chain in a preheader, these 2 candidate instructions:

%12:gpr64sp = nuw ADDXri %9, 12, 0
%2:gpr64all = COPY %12

if we start with trying to move ADD, then that has a use in the same block, and we can't move the def passed that. If we first move the COPY, then the ADD, or at least try that, we can move the partial/whole chain.

samparker added inline comments.Jan 25 2021, 4:14 AM
llvm/lib/CodeGen/MachineSink.cpp
459

I thought IsSafeToMove is just about side-effects and memory operations? I was thinking purely about the data dependency chain. In your copy example, what's stopping us from skipping the COPY but sinking the ADD? (I understand we'll always sink the COPY really, so let's pretend it's just another general consumer of ADD)

SjoerdMeijer added inline comments.Jan 25 2021, 5:27 AM
llvm/lib/CodeGen/MachineSink.cpp
459

Yeah, okay. In my examples and added tests, I am sinking only load instructions (probably need to add some more tests). The extra argument DontMoveAcrossStore set to true makes this conservative. But you're right that for non-memory ops, this isn't sufficient. In D94308, I added a local MachineSinking::IsSafeToMove that adds alias checks.

So, long story short, this NFC and refactoring is bug-compatible with the original code. I thought separating out this refactoring from any new changes would be best, in order to make this change manageable and not too big. (Please note that this off by default.) This still looks the easiest to me, but if you think this is best, then I will start merging D94308 into this, let me know.

samparker added inline comments.Jan 25 2021, 6:53 AM
llvm/lib/CodeGen/MachineSink.cpp
459

Well how about just adding the break then? I think it's too simple to ignore and would be best to start with a base change that might be correct.

SjoerdMeijer added inline comments.Jan 25 2021, 6:58 AM
llvm/lib/CodeGen/MachineSink.cpp
459

Yeah, okay, nice one, let's do that.

Thanks, this now:

  • breaks if we can't sink an instruction,
  • and I have removed test/CodeGen/X86/sink-cheap-instructions.ll as that has no value anymore; this has been integrated as a MIR test and keeping this IR test around doesn't add anything at this point. When this loop sinking starts doing something useful and correct, better target tests need to be added.

Just added a few more test, trying to sink an add instruction: sink_add, store_after_add, and aliased_store_after_add

The last 2 are negative tests, and shouldn't sink things passed stores. The first, sink_add, is interesting because it shows that the original sink algorithm, and then loop-sink working one after the other, working "together", which is the good thing to do here. This means that a load instruction from the entry block:

Sink instr %12:gpr32common = LDRWui %9:gpr64common, 0 :: (load 4 from %ir.read, !tbaa !0)
      into block bb.1.for.body.preheader:

is sunk into the preheader so that the whole chain ends up in the preheader:

bb.1.for.body.preheader:
    successors: %bb.3(0x80000000)
    %12:gpr32common = LDRWui %9:gpr64common, 0 :: (load 4 from %ir.read, !tbaa !0)
    %14:gpr32sp = ADDWri %12, 42, 0
    %1:gpr32all = COPY %14
    B %bb.3

Which are then all considered for loop sinking (and that will be sunk with follow up patch D94308).

This revision is now accepted and ready to land.Jan 27 2021, 12:46 AM

Many thanks @shchenz, @fhahn and @samparker for your help and reviews!