Page MenuHomePhabricator

[mips] MIPS32R6 compact branch support
ClosedPublic

Authored by sdardis on Jan 20 2016, 4:44 AM.

Details

Summary

Summary:
MIPSR6 introduces a class of branches called compact branches. Unlike the
traditional MIPS branches which have a delay slot, compact branches do not
have a delay slot. The instruction following the compact branch is only
executed if the branch is not taken and must not be a branch.

This patch implements support for compact branches on MIPS32R6.

Generate compact branches for MIPS32R6 when the delay slot filler cannot fill
a delay slot. Inspect the generated code for forbidden slot hazards (a compact
branch with an adjacent branch or other CTI) and insert nops to clear this
hazard.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
vkalintiris added inline comments.Jan 20 2016, 5:15 PM
lib/Target/Mips/MipsDelaySlotFiller.cpp
507–559 ↗(On Diff #45374)

We should make this function a member of MipsSEInstrInfo. This is where we keep similar logic for branch instruction operations.

Can you merge the first switch statement into the second? The check for the number of explicit operands becomes unnecessary once you've identified the opcode. Also, this is where we should provide the relevant logic for branches in microMIPS mode.

557 ↗(On Diff #45374)

I think that we should return 0 here. Mips::ZERO is a positive number.

564–612 ↗(On Diff #45374)

The same functionality is implemented in MipsInstrInfo::genInstrWithNewOpc(). You can expand the logic of that function, for branches that use the $zero register.

This revision now requires changes to proceed.Jan 20 2016, 5:15 PM
sdardis updated this revision to Diff 45548.Jan 21 2016, 9:50 AM
sdardis edited edge metadata.
sdardis marked 6 inline comments as done.

Addressed comments. Shifted the $zero detection for branches into MipsInstrInfo::getInstrWithNewOpc. I've elected to keep the separate hazard pass for the moment, see my reply there.

Thanks,
Simon

Apologies, comment didn't get posted, reproduced here:

Shouldn't we just take care so that the MipsDelaySlotFiller pass does not create any hazards in the first place?

For this patch that approach could be taken but I think its better it introduce a fairly specific pass to handle forbidden slot hazards.

Handling hazards as a separate pass introduces a high level of safety as then compact branches can be produced anywhere in the compilation pipeline without the subtly incorrect reliance that the delay slot filler will fix those hazards. It's a separation of concerns issue in my view.

Aside: GCC for MIPS uses a very similar hazard strategy for forbidden slot, hi/lo and delayed value hazards.

Thoughts?

lib/Target/Mips/Mips.h
34 ↗(On Diff #45374)

Shouldn't we just take care so that the MipsDelaySlotFiller pass does not create any hazards in the first place?

For this patch that approach could be taken but I think its better it introduce a fairly specific pass to handle forbidden slot hazards.

Handling hazards as a separate pass introduces a high level of safety as then compact branches can be produced anywhere in the compilation pipeline without the subtly incorrect reliance that the delay slot filler will fix those hazards. It's a separation of concerns issue in my view.

Aside: GCC for MIPS uses a very similar hazard strategy for forbidden slot, hi/lo and delayed value hazards.

Thoughts?

Apologies, comment didn't get posted, reproduced here:

Shouldn't we just take care so that the MipsDelaySlotFiller pass does not create any hazards in the first place?

For this patch that approach could be taken but I think its better it introduce a fairly specific pass to handle forbidden slot hazards.

Handling hazards as a separate pass introduces a high level of safety as then compact branches can be produced anywhere in the compilation pipeline without the subtly incorrect reliance that the delay slot filler will fix those hazards. It's a separation of concerns issue in my view.

Aside: GCC for MIPS uses a very similar hazard strategy for forbidden slot, hi/lo and delayed value hazards.

Thoughts?

Following this approach means that the MipsDelaySlotFiller pass will no longer generate correct output code, even in the presence of correct input code. As far as I can tell, the code generator provides (or at least it should provide) correct input code to the MipsDelaySlotFiller pass as it does not generate any CTIs in any slot.

Even if we start emitting CTIs in forbidden slots for some unforeseeable reason in the future, conceptually, the MipsDelaySlotFiller pass would be the more appropriate place to remove them as it is responsbile for the handling
of slots either way.

Having said that, I wouldn't be against a dedicated hazard handling pass if we couldn't avoid generating them from the code generator, or handle them at the right pass.

sdardis updated this revision to Diff 45690.Jan 22 2016, 8:15 AM
sdardis edited edge metadata.

Shifted the hazard clearing logic into the delay slot filler.

vkalintiris added inline comments.Jan 25 2016, 5:17 AM
lib/Target/Mips/MipsDelaySlotFiller.cpp
506–510 ↗(On Diff #45690)

If you update Branch with the return value of genInstrWithNewOpc() then deleting the previous branch is as simple as: std::next(Branch)->eraseFromParent();

533–575 ↗(On Diff #45690)

I think that it would be useful to have these functions at MipsInstrInfo as we could call them from other passes too.

557 ↗(On Diff #45690)

The cast is redundant.

595–659 ↗(On Diff #45690)

There's no need to iterate over every instruction twice. We can check whether we have to add a NOP inside replaceWithCompactBranch() and bundle it together with the compact branch.

lib/Target/Mips/MipsInstrInfo.cpp
286 ↗(On Diff #45690)

We don't have to check for the number of explicit operands.

287 ↗(On Diff #45690)

You can use MipsABIInfo::GetZeroReg() to test with the right zero register. Also, the cast is unnecessary.

302–303 ↗(On Diff #45690)

Shouldn't we set ZeroOperandBranch to false in the default case?

lib/Target/Mips/MipsSEInstrInfo.cpp
438 ↗(On Diff #45690)

Redundant newline.

473–482 ↗(On Diff #45690)

We will never reach this if STI.hasMips32r6() is true. We should merge the logic of opcode selection into the first switch statement.

sdardis updated this revision to Diff 45959.Jan 26 2016, 2:58 AM
sdardis retitled this revision from [mips] MIPSR6 compact branch support to [mips] MIPS32R6 compact branch support.
sdardis updated this object.
sdardis marked 8 inline comments as done.

Addressed outstanding comments except for the one regarding Filler::clearFSHazards. See my comment there.

Added a FIXME to MipsISelLowering.cpp: MipsTargetLowering::emitAtomicBinary on an outstanding code gen issue and to MipsInstrInfo::genInstrWithNewOpc as they're related. Without the checking for Mips::ZERO and MipsABIInfo::GetZeroReg() we get more verbose asm than necessary.

Updated commit text to reflect that its for MIPS32R6 only.

sdardis updated this object.Jan 26 2016, 2:58 AM
sdardis added inline comments.
lib/Target/Mips/MipsDelaySlotFiller.cpp
595–659 ↗(On Diff #45690)

We can check whether we have to add a NOP inside replaceWithCompactBranch() and bundle it together with the compact branch.

We can't. You've missed case C in the comment.

Consider the following from function l() in the test case before delay slot filling (using assembly here to keep things clear):

      move   $16, $2
      jal      j
      bne    $16, $2, $BB0_2
# BB#1:                                 # %if.then
      addiu   $4, $zero, -2
      jal     f

For the first jal, the delay slot filler will insert the 'move' into its delay slot. The delay slot filler has no instruction to put in the delay slot of bne, so it transforms it into bnec. No nop will be inserted as a) there is no CTI following bnec in that basic block, and b) the first non-debug instruction in the physically following basic block is not a CTI.

The delay filler then considers BB#1 and schedules the addiu into the delay slot of the jal. The delay filler has now created a forbidden slot hazard. Handling this in the delay filler itself requires identifying such special cases, then inserting a nop into the basic block of the compact branch (which could be the previous basic block to the one we're working on).

This is in addition to cases A and B described in the comment. My reasoning for processing all the instructions again is that it trivializes all those cases (and possibly others) into a 'is the physical successor instruction of a compact branch a CTI?' examination rather than inserting checks where ever we could create a FS hazard. Inserting handling FS hazard logic in multiple places I think could lead to correctness issues as we'd need to ensure we cover all possible cases, rather than a simple (mini-)pass decoupled from the delay slot filling logic.

Thoughts?

lib/Target/Mips/MipsInstrInfo.cpp
287 ↗(On Diff #45690)

You can use MipsABIInfo::GetZeroReg() to test with the right zero register.

Strangely enough, this doesn't work.

MipsTargeLowering::emitAtomicBinary will generate a Mips::BEQ with Mips::ZERO as an operand on mips64 if the requested size is 4. For mips64 it appears we need to check for MipsABIInfo::GetZeroReg() and Mips::ZERO for the moment.

This appears to be an outstanding issue with emitAtomicBinary.

lib/Target/Mips/MipsSEInstrInfo.cpp
438 ↗(On Diff #45690)

clang-format inserts the newline, leave or keep it?

vkalintiris added inline comments.Jan 27 2016, 4:09 AM
lib/Target/Mips/MipsDelaySlotFiller.cpp
549–613 ↗(On Diff #45959)

The (C) case will happen when filling the delay slot of a normal branch, ie. inside the MipsDelaySlotFiller pass. We can teach the "search-backwards" part of the delay slot filler (DSF) algorithm to handle this by checking the last instruction of the BB's layout predecessor. If there are other issues that conceptually belong to the DSF pass, then we should add them too by modifying the pass accordingly.

I don't see how (B) can happen as we only emit compact branches at this pass for the time being. From my perspective, if a previous pass would like to add a compact-branch, then I can think of two options: (a) insert the compact branch only if the next instruction is not a CTI-hazard and leave the task of fixing the corner cases that happen during the filling of normal branches by the DSF, or (b) insert a NOP and just offload everything to the DSF. I can't think of any reason for a post-DSF pass to insert new/additional compact-branches or move non-CTI instructions from a forbidden slot.

Even if we find ourselves constrained by having the DSF pass handling everything, then we can add a separate/final pass that fixes every case left over from previous passes. However, with this design we would achieve the minimum number of places were our code is wrong/invalid.

If we'd want to be on the really safe side and make sure that we are aware of every possible case that we might haven't consider yet, then we can add a separate pass and enable it only for debug builds. It could assert upon finding a compact-branch that contains a CTI in its forbidden slot.

lib/Target/Mips/MipsInstrInfo.cpp
330 ↗(On Diff #45959)

Ok, if I understand correctly then we should just check for Mips::ZERO and keep the FIXME comment, ie. we can remove the Subtarget.getABI().GetZeroReg() check as it's not working at the moment.

lib/Target/Mips/MipsSEInstrInfo.cpp
438 ↗(On Diff #45959)

It doesn't insert one for me, so I suggest that we remove the newline.

sdardis added inline comments.Jan 27 2016, 5:39 AM
lib/Target/Mips/MipsDelaySlotFiller.cpp
549–613 ↗(On Diff #45959)

Even if we find ourselves constrained by having the DSF pass handling everything, then we can add a separate/final pass that fixes every case left over from previous passes. However, with this design we would achieve the minimum number of places were our code is wrong/invalid.

Rather than wiring FS hazard handling logic into the DSF, we could split it off into a separate pass and enable it after the delay slot filler, like the first diff. Thoughts dsanders? It keeps the implementation safer and simpler.

sdardis updated this revision to Diff 46283.Jan 28 2016, 9:11 AM
sdardis marked 2 inline comments as done.

Addressed comments. move hazard schedule back into it's own pass. You ok with that change Daniel?

sdardis updated this revision to Diff 46763.Feb 3 2016, 3:54 AM

Simplify MipsHazardSchedule pass according to style guide.

dsanders requested changes to this revision.Feb 16 2016, 5:39 AM
dsanders added a reviewer: dsanders.

Most of these are nits but there a a couple important ones.

Regarding testing this pass. Ideally we would be using MIR but I'm not going to require that since I haven't had chance to try it myself.

Could you put the new tests in a subdirectory so that we keep forbidden slot tests together and can migrate them to MIR tests later?

lib/Target/Mips/MipsDelaySlotFiller.cpp
512 ↗(On Diff #46763)

Mips16 uses the delay slot filler too as far as I know. If so, this could be a Mips16InstrInfo

559–623 ↗(On Diff #46763)

The separation makes sense to me given that we're only using compact branches when delay slots go unfilled.

There are a couple cases I can think of where this may not be the right thing to do:

  1. If we need a nop for both the delay slot and the forbidden slot then we need to decide which path should have the bubble. Ideally, it should be on the coldest path.
  2. Some machines may generally prefer compact branches.

For #1, we should allow the hazard scheduler to revert the branch to a delay slot branch if the BB probabilities suggest that the taken path is colder than the not-taken path.

For #2, I think it's reasonable to expect that such machine will exist in the future but I'm not aware of any yet. It shouldn't be difficult to deal with that if/when it arises.

565–566 ↗(On Diff #46763)

This could be a Mips16InstrInfo

612–615 ↗(On Diff #46763)

Could you put braces around this?

lib/Target/Mips/MipsHazardSchedule.cpp
11–43 ↗(On Diff #46763)

Could you change this into doxygen documentation? (see '\file')

86 ↗(On Diff #46763)

Don't repeat the function name in new code. Also, please use doxygen comments ('///').

97 ↗(On Diff #46763)

We shouldn't have the '|| inMicroMipsMode()' here. microMIPSR6 has forbidden slots too.

104 ↗(On Diff #46763)

Use the arrow operator instead of '(*FI).'. There's a few other cases of this below

122 ↗(On Diff #46763)

We can only have one fallthrough. We should stop iterating once we find it.

125–128 ↗(On Diff #46763)

Please factor out the duplicate code.

lib/Target/Mips/MipsInstrInfo.cpp
260–302 ↗(On Diff #46763)

We ought to tablegen-erate these.

324 ↗(On Diff #46763)

I read this as being a branch with no operands. Can you make it clearer?

330 ↗(On Diff #46763)

We'll should test for both ZERO and ZERO_64 so that this is still correct when the atomics are fixed.

lib/Target/Mips/MipsInstrInfo.h
74 ↗(On Diff #46763)

Please make this doxygen comment document the function

74–78 ↗(On Diff #46763)

Please make the arguments references since nullptr is not permitted

lib/Target/Mips/MipsSEInstrInfo.cpp
451 ↗(On Diff #46763)

This doesn't look equivalent to BEQ to me. What happens if the second operand of the comparison isn't $zero?

456 ↗(On Diff #46763)

This doesn't look equivalent to BEQ to me. What happens if the second operand of the comparison isn't $zero?

lib/Target/Mips/MipsSEInstrInfo.h
69 ↗(On Diff #46763)

The iterator should be const too.

This revision now requires changes to proceed.Feb 16 2016, 5:39 AM
sdardis updated this revision to Diff 48297.Feb 18 2016, 3:15 AM
sdardis edited edge metadata.
sdardis marked 11 inline comments as done.

Addressed nits. Instructions are now flagged as CTIs or having forbidden slots in the .td files using target specific flags. Moved getEquivalentCompactForm to MipsInstrInfo as it can actually apply to MIPS16, MIPSR6, microMIPS, microMIPSR6.

sdardis updated this revision to Diff 48308.Feb 18 2016, 6:17 AM
sdardis edited edge metadata.

Rebased to ToT.

lib/Target/Mips/MipsDelaySlotFiller.cpp
559–623 ↗(On Diff #46763)

For #1, I'm not quite following you here. We have to place the nop after the compact branch as part the same basic block. For example:

BB1:
   <no instructions to go in a delay slot>
   bne v0,L3
BB2:
   jal g
   * move a0, v0
BB3:
....

The bne here will be transformed into a compact form. Since the next instruction is unsafe in the forbidden slot, we have to put a nop in the slot. We can't put the nop in BB3 as the first instruction as that doesn't clear the hazard. The hazard is due to instruction layout, not execution.

For #2, it's somewhat trivial to block delay slot filling for branches which have corresponding delay slot forms.

lib/Target/Mips/MipsHazardSchedule.cpp
97 ↗(On Diff #46763)

microMIPSR6 doesn't have forbidden slots. The MD00582-2B-microMIPS32-AFP-06.03 says that "Any instruction, including a branch or jump, may immediately follow a branch or jump, that is, delay slot restrictions do not apply in Release 6". Confusingly MIPSR6 ISA has them.

lib/Target/Mips/MipsSEInstrInfo.cpp
451 ↗(On Diff #46763)

If the second operand of the comparison is not $zero canUseMicroMipsBranches is false as that explicitly checks for conditions that beqz16 can be used in. However, if we have microMIPSR6 we can generate beqc (long version). if both canUseMicroMipsBranches and hasMips32r6() is false we return 0 since there's no equivalent form.

I've renamed canUseMicroMipsBranches to canUseShortMMBranches to better reflect what it's doing.

456 ↗(On Diff #46763)

See above.

Re: MIR tests:

I have another ~6 patches for compact branch/jumps pending. Unfortunately every one of them modifies compact-branches.ll, I haven't submitted them upstream yet because this one is the necessary groundwork for all of them. After submitting of them all or in the last patch, I could redo test/Codegen/Mips/compactbranches/compact-branches.ll as a MIR based test.

dsanders accepted this revision.Mar 3 2016, 6:16 AM
dsanders edited edge metadata.

LGTM with a couple nits.

lib/Target/Mips/MipsDelaySlotFiller.cpp
557–624 ↗(On Diff #48308)

For #1, I'm trying to say that, in some circumstances, delay-slot branches can be a better choice than compact branches when both are available.

For example:

BB1:
  ...
  bne $2, BB3
  nop
BB2:
  jal g
BB3:
  ...

introduces a bubble on both the taken and not-taken paths while:

BB1:
  ...
  bnec $2, BB3
  nop
BB2:
  jal g
BB3:
  ...

introduces a bubble on the taken path and possibly two bubbles on the not-taken path depending on implementation.
I'm told this kind of thing occurs between delay-slot returns and compact-branch returns but we need to verify that.

My comment was intended to be something to think about rather than something to do in this patch. Forbidden slots is a correctness issue so it's important that we get a solution in place and we can expand on performance decisions in later patches.

The hazard is due to instruction layout, not execution.

That's right, but we aren't forced to choose code that has this hazard.

lib/Target/Mips/MipsHazardSchedule.cpp
98 ↗(On Diff #48308)

Ok then. I'm pretty sure it had them at one point but it doesn't now

lib/Target/Mips/MipsInstrInfo.cpp
259–260 ↗(On Diff #48308)

Repeated name in comment.

lib/Target/Mips/MipsInstrInfo.td
1118–1122 ↗(On Diff #48308)

Why move this above DEI_FT?

1321–1325 ↗(On Diff #48308)

Could you indent this since we have nested 'let' statements

lib/Target/Mips/MipsSEInstrInfo.cpp
451 ↗(On Diff #46763)

Ok, thanks.

456 ↗(On Diff #46763)

Ok, thanks

sdardis updated this revision to Diff 49944.Mar 7 2016, 3:25 AM
sdardis edited edge metadata.
sdardis marked 2 inline comments as done.

Patch updated to address nits. Could you commit on my behalf? Thanks.

lib/Target/Mips/MipsInstrInfo.td
1118–1122 ↗(On Diff #48308)

I was grouping it syscall/break/(d)eret classes so I could mark them all with one let isCTI =1 {..}. Want me to move it back?

Could you commit on my behalf?

Sure

lib/Target/Mips/MipsInstrInfo.td
1118–1122 ↗(On Diff #49944)

Ah ok. I was associating the '}' with the class rather than the let so it looked like there was no reason to move it.

sdardis updated this revision to Diff 50596.Mar 14 2016, 8:02 AM

Rebased and retested to ToT (r263428).

This revision was automatically updated to reflect the committed changes.