This is an archive of the discontinued LLVM Phabricator instance.

[AntiDepBreaker] Don't rename callee saved register restore instruction
AcceptedPublic

Authored by Carrot on Apr 10 2017, 3:02 PM.

Details

Summary

Current AntiDepBreaker implementation can rename callee saved register restore instruction, and then callee saved register contain garbage value after return.

We should not rename callee saved register restore instruction.

Diff Detail

Event Timeline

Carrot created this revision.Apr 10 2017, 3:02 PM
echristo edited edge metadata.Apr 26 2017, 4:55 PM

Testcase? Seems reasonable otherwise, but Matthias might have something more here.

MatzeB added reviewers: hfinkel, kparzysz.EditedApr 26 2017, 5:04 PM
MatzeB edited edge metadata.

This hits on a known problem in llvms CodeGen today: There is no good way to differentiate between a register use/def that is required because of ABI (or other) constraints and just a normal assignment by the register allocator. Only the latter can be renamed/recolored which makes it tricky to it correctly today.

I personally don't care about AggressiveAntiDepBreaker too much. However this "fix" seems pretty pessimistic/conservative to me: As I understand it you just exclude all the callee saves from renaming. I think targets that rely on the anti dep breaker (seems to be used by PowerPC/Hexagon today) should at least do some benchmarking before accepting this.

FWIW this is fixing a wrong code case for Power.

As a possible better design: Last time I had a similar problem in the context of copy propagation I checked whether the uses/defs involved had a register forced in their MCInstrDesc description and I assumed all implicit uses/defs to be constrained as well as they are typically used for calls/asm.

That seems to work for most targets (some had additional constraints not matched by this heuristic but I think the big ones X86, ARM, AArch64, Power lit tests worked when I experimented with that) though none of these heuristics made it into trunk so take this with a grain of salt...

FWIW this is fixing a wrong code case for Power.

Sure I guess wrong code beats performance. Though at some point you may just as well disable the AntiDepBreaker pass for improved correctness :)

MatzeB accepted this revision.Apr 26 2017, 5:16 PM

Anyway won't be me standing in the way of this fix, I leave the decision to powerpc/hexagon folks.

This revision is now accepted and ready to land.Apr 26 2017, 5:16 PM

Testcase? Seems reasonable otherwise, but Matthias might have something more here.

Unfortunately the test case in pr32608 is not reliable, the usage of callee saved register is not optimal, future enhancement can and should remove the callee saved register usage in the test case.

This hits on a known problem in llvms CodeGen today: There is no good way to differentiate between a register use/def that is required because of ABI (or other) constraints and just a normal assignment by the register allocator. Only the latter can be renamed/recolored which makes it tricky to it correctly today.

I personally don't care about AggressiveAntiDepBreaker too much. However this "fix" seems pretty pessimistic/conservative to me: As I understand it you just exclude all the callee saves from renaming. I think targets that rely on the anti dep breaker (seems to be used by PowerPC/Hexagon today) should at least do some benchmarking before accepting this.

The patch also checks (RegRefs.count(AntiDepReg) == 1), with this condition, it almost ensures the instruction is callee saved register restore instruction, if it is not, it should be some other rare cases.

kparzysz edited edge metadata.Apr 28 2017, 3:17 PM

We add all callee saved registers as live-in on all paths from the restores to the exit, so this problem shouldn't affect us. Actually the anti-dep breaker was the reason why we did it in the first place.

We add all callee saved registers as live-in on all paths from the restores to the exit, so this problem shouldn't affect us. Actually the anti-dep breaker was the reason why we did it in the first place.

All callee saves? That would be new to me. I think we only do that for callee saves that are saved/restored *somewhere* in the function, the rest is considered "pristine" and not explicitely mentioned in the live-in lists.

All callee saves? That would be new to me. I think we only do that for callee saves that are saved/restored *somewhere* in the function, the rest is considered "pristine" and not explicitely mentioned in the live-in lists.

Yes, that's what I meant---the ones that were actually clobbered in the function.

All callee saves? That would be new to me. I think we only do that for callee saves that are saved/restored *somewhere* in the function, the rest is considered "pristine" and not explicitely mentioned in the live-in lists.

Yes, that's what I meant---the ones that were actually clobbered in the function.

In any case as far as I understand it this patch is about not renaming one of the restored registers (so we do not suddenly restore the callee save into some unrelated register that happens to be free). So it's not about liveness but about which registers are legal to rename/recolor.

This hits on a known problem in llvms CodeGen today: There is no good way to differentiate between a register use/def that is required because of ABI (or other) constraints and just a normal assignment by the register allocator. Only the latter can be renamed/recolored which makes it tricky to it correctly today.

I personally don't care about AggressiveAntiDepBreaker too much. However this "fix" seems pretty pessimistic/conservative to me: As I understand it you just exclude all the callee saves from renaming. I think targets that rely on the anti dep breaker (seems to be used by PowerPC/Hexagon today) should at least do some benchmarking before accepting this.

The patch also checks (RegRefs.count(AntiDepReg) == 1), with this condition, it almost ensures the instruction is callee saved register restore instruction, if it is not, it should be some other rare cases.

Oh indeed that sounds a lot less conservative as I initially assumed. So the effect on performance is probably minimal/negligible.

I don't think this is really the best approach. This will simply disable renaming of callee-saved registers, no matter where they are used.

In any case as far as I understand it this patch is about not renaming one of the restored registers (so we do not suddenly restore the callee save into some unrelated register that happens to be free). So it's not about liveness but about which registers are legal to rename/recolor.

Then what's the problematic situation? It was exactly my understanding that we renamed a callee-saved register defined by the restore instruction. What else could be the problem?

In any case as far as I understand it this patch is about not renaming one of the restored registers (so we do not suddenly restore the callee save into some unrelated register that happens to be free). So it's not about liveness but about which registers are legal to rename/recolor.

Then what's the problematic situation? It was exactly my understanding that we renamed a callee-saved register defined by the restore instruction. What else could be the problem?

Yes that's what I understood as well, but I don't see how that is related to liveness information.

kparzysz added a comment.EditedApr 28 2017, 3:47 PM

Being live-on-exit will prevent renaming, so by updating the liveness we prevent renaming on the paths where it shouldn't happen. So, the problem that this patch is trying to fix will not happen on Hexagon.

Ah, I remember---it was the reverse problem: renaming other register into a restored callee-saved register.

In any case, the problem that this patch is trying to fix still doesn't apply to Hexagon. We add all of the clobbered callee-saved register as implicit uses to the return instructions.

I think that this patch tries to fix a problem where it manifests itself, not where it actually happens.

Carrot added a comment.May 2 2017, 2:55 PM

We add all callee saved registers as live-in on all paths from the restores to the exit, so this problem shouldn't affect us. Actually the anti-dep breaker was the reason why we did it in the first place.

The second loop in function AggressiveAntiDepBreaker::StartBlock tries to do the work you described. Unfortunately it doesn't work with shrink wrapping. It says:

Mark live-out callee-saved registers. In a return block this is
all callee-saved registers. In non-return this is any
// callee-saved register that is not saved in the prolog.

1 The function MFI.getPristineRegs(MF) gets all registers actually saved in the function, not necessary at the prolog position if shrink wrapping has occurred.
2 If the register restore instructions are moved to non-return BB, the logic should be same as return block, then the logic in the code is wrong.

Any suggestions on how to fix it?

MatzeB added a comment.May 2 2017, 4:25 PM

1 The function MFI.getPristineRegs(MF) gets all registers actually saved in the function, not necessary at the prolog position if shrink wrapping has occurred.

This statement is wrong! getPristineRegs() gives you all ABI callee saved registers that are not save/restored anywhere in the function (because the register was never modified or clobbered, so the save/restore so there was no need for a save/restore).

Carrot added a comment.May 2 2017, 4:56 PM

1 The function MFI.getPristineRegs(MF) gets all registers actually saved in the function, not necessary at the prolog position if shrink wrapping has occurred.

This statement is wrong! getPristineRegs() gives you all ABI callee saved registers that are not save/restored anywhere in the function (because the register was never modified or clobbered, so the save/restore so there was no need for a save/restore).

Thank you for the correction, actually I want to say

The condition !Pristine.test(Reg) can only check if the specified callee saved register has been saved in the function, but not necessary at the prolog position if shrink wrapping has occurred.

We add all callee saved registers as live-in on all paths from the restores to the exit, so this problem shouldn't affect us. Actually the anti-dep breaker was the reason why we did it in the first place.

The second loop in function AggressiveAntiDepBreaker::StartBlock tries to do the work you described. Unfortunately it doesn't work with shrink wrapping. It says:

This should be done in PEI, not in the anti-dependency breaker. This is where the CSR registers get the concrete saving/restoring code. Passes running afterwards should not need to know about what register is CSR or not (except pristine registers, but that's general enough).

MatzeB added a comment.May 3 2017, 1:20 PM

We add all callee saved registers as live-in on all paths from the restores to the exit, so this problem shouldn't affect us. Actually the anti-dep breaker was the reason why we did it in the first place.

The second loop in function AggressiveAntiDepBreaker::StartBlock tries to do the work you described. Unfortunately it doesn't work with shrink wrapping. It says:

This should be done in PEI, not in the anti-dependency breaker. This is where the CSR registers get the concrete saving/restoring code. Passes running afterwards should not need to know about what register is CSR or not (except pristine registers, but that's general enough).

Note that this is true for live-ins, however we don't explicitely safe live-out lists in the basic blocks. So for return blocks you have to take all CSR registers (depending on what you do with/without pristines). (See also https://reviews.llvm.org/D32464 where I partially rediscovered that fact).

So for return blocks you have to take all CSR registers (depending on what you do with/without pristines). (See also https://reviews.llvm.org/D32464 where I partially rediscovered that fact).

We (Hexagon) add all the non-pristine CSRs as implicit uses of the return instructions.

MatzeB added a comment.May 3 2017, 1:40 PM

So for return blocks you have to take all CSR registers (depending on what you do with/without pristines). (See also https://reviews.llvm.org/D32464 where I partially rediscovered that fact).

We (Hexagon) add all the non-pristine CSRs as implicit uses of the return instructions.

Yeah I like that way of modeling things explicitely. Unfortunately that is not how most llvm targets work today.

Yeah I like that way of modeling things explicitely. Unfortunately that is not how most llvm targets work today.

Would it be reasonable to implement that for all targets?

hfinkel edited edge metadata.May 4 2017, 7:35 AM

Yeah I like that way of modeling things explicitely. Unfortunately that is not how most llvm targets work today.

Would it be reasonable to implement that for all targets?

Could we check this in the verifier? One advantage this might have is that, for blocks ending in unreachable, we can easily realize that we don't need the restoration code.

Could we check this in the verifier? One advantage this might have is that, for blocks ending in unreachable, we can easily realize that we don't need the restoration code.

Are we adding restoration code to such blocks now? That could be checked in the verifier, but it's not incorrect. Maybe a warning could be issued...

The verifier should be able to check if the callee-saved registers are properly marked as live/used in the relevant places (return instructions, etc.)

Could we check this in the verifier? One advantage this might have is that, for blocks ending in unreachable, we can easily realize that we don't need the restoration code.

Are we adding restoration code to such blocks now? That could be checked in the verifier, but it's not incorrect. Maybe a warning could be issued...

I didn't mean checking that in the verifier. I meant checking that the return insts had the necessary implicit operands in the non-unreachable case. Looks like you say yes.

The verifier should be able to check if the callee-saved registers are properly marked as live/used in the relevant places (return instructions, etc.)

gberry added a subscriber: gberry.May 4 2017, 8:11 AM

I didn't mean checking that in the verifier. I meant checking that the return insts had the necessary implicit operands in the non-unreachable case. Looks like you say yes.

The verifier should be able to check if the callee-saved registers are properly marked as live/used in the relevant places (return instructions, etc.)

Sorry, yes, that is correct.

Yeah I like that way of modeling things explicitely. Unfortunately that is not how most llvm targets work today.

Would it be reasonable to implement that for all targets?

I think it would work and I think it would make the representation a bit cleaner. If there happened to be a target with hundreds of CSRs that would be a lot of extra operands, on the other hand that target already has hundreds of entries in the CSR list anyway.

Changing this would simplify some code in LivePhysReg, RegScavenger, DepBreaker etc. where people iterate the CSRs today. It's a small thing though I don't expect it to have a huge impact one way or another...

Could we check this in the verifier? One advantage this might have is that, for blocks ending in unreachable, we can easily realize that we don't need the restoration code.

Verifying it today would be easy. Though if we go that route it may be reasonable to just not model an explicit list of CSRs anymore after PrologEpilog inserter (which means we couldn't verify anymore, but I always like it if we can do the same with less concepts/complexity).

It would indeed fix some situations in which we cannot properly differentiate between return vs. unreachable/noreturn call anymore (imagine if-converting a return instructions that ends up in the middle of a basic block, which MachineBasicBlock::isReturnBlock() would miss).

Carrot added a comment.May 4 2017, 2:44 PM

We add all callee saved registers as live-in on all paths from the restores to the exit, so this problem shouldn't affect us. Actually the anti-dep breaker was the reason why we did it in the first place.

The second loop in function AggressiveAntiDepBreaker::StartBlock tries to do the work you described. Unfortunately it doesn't work with shrink wrapping. It says:

This should be done in PEI, not in the anti-dependency breaker. This is where the CSR registers get the concrete saving/restoring code. Passes running afterwards should not need to know about what register is CSR or not (except pristine registers, but that's general enough).

Note that this is true for live-ins, however we don't explicitely safe live-out lists in the basic blocks. So for return blocks you have to take all CSR registers (depending on what you do with/without pristines). (See also https://reviews.llvm.org/D32464 where I partially rediscovered that fact).

Can your patch in D32464 also fix this problem?

MatzeB added a comment.May 4 2017, 2:46 PM

We add all callee saved registers as live-in on all paths from the restores to the exit, so this problem shouldn't affect us. Actually the anti-dep breaker was the reason why we did it in the first place.

The second loop in function AggressiveAntiDepBreaker::StartBlock tries to do the work you described. Unfortunately it doesn't work with shrink wrapping. It says:

This should be done in PEI, not in the anti-dependency breaker. This is where the CSR registers get the concrete saving/restoring code. Passes running afterwards should not need to know about what register is CSR or not (except pristine registers, but that's general enough).

Note that this is true for live-ins, however we don't explicitely safe live-out lists in the basic blocks. So for return blocks you have to take all CSR registers (depending on what you do with/without pristines). (See also https://reviews.llvm.org/D32464 where I partially rediscovered that fact).

Can your patch in D32464 also fix this problem?

Possibly, try it out.

MatzeB added a comment.May 4 2017, 2:48 PM

We add all callee saved registers as live-in on all paths from the restores to the exit, so this problem shouldn't affect us. Actually the anti-dep breaker was the reason why we did it in the first place.

The second loop in function AggressiveAntiDepBreaker::StartBlock tries to do the work you described. Unfortunately it doesn't work with shrink wrapping. It says:

This should be done in PEI, not in the anti-dependency breaker. This is where the CSR registers get the concrete saving/restoring code. Passes running afterwards should not need to know about what register is CSR or not (except pristine registers, but that's general enough).

Note that this is true for live-ins, however we don't explicitely safe live-out lists in the basic blocks. So for return blocks you have to take all CSR registers (depending on what you do with/without pristines). (See also https://reviews.llvm.org/D32464 where I partially rediscovered that fact).

Can your patch in D32464 also fix this problem?

Possibly, try it out.

If that is the case then there probably wasn't a bug in AggressiveAntiDepBreaker but you got hit by incomplete live-in lists.

Carrot added a comment.May 4 2017, 3:34 PM

Can your patch in D32464 also fix this problem?

Possibly, try it out.

Will do, thanks a lot.

Carrot added a comment.May 5 2017, 2:22 PM

Note that this is true for live-ins, however we don't explicitely safe live-out lists in the basic blocks. So for return blocks you have to take all CSR registers (depending on what you do with/without pristines). (See also https://reviews.llvm.org/D32464 where I partially rediscovered that fact).

Can your patch in D32464 also fix this problem?

Possibly, try it out.

D32464 does fix the problem in pr32608.
Thanks a lot!

I have an initial implementation of the discussed approach in D33003.