Page MenuHomePhabricator

[RISCV] Avoid Splitting MBB in RISCVExpandPseudo
Changes PlannedPublic

Authored by lenary on Jul 1 2020, 11:53 AM.



Since the RISCVExpandPseudo pass has been split from
RISCVExpandAtomicPseudo pass, it would be nice to run the former as
early as possible (The latter has to be run as late as possible to
ensure correctness). Running earlier means we can reschedule these pairs
as we see fit.

Running earlier in the machine pass pipeline is good, but would mean
teaching many more passes about hasLabelMustBeEmitted. Splitting the
basic blocks also pessimises possible optimisations because some
optimisations are MBB-local, and others are disabled if the block has
its address taken (which is notionally what hasLabelMustBeEmitted

This patch uses a new approach of setting the pre-instruction symbol on
the AUIPC instruction to a temporary symbol and referencing that. This
avoids splitting the basic block, but allows us to reference exactly the
instruction that we need to. Notionally, this approach seems more
correct because we do actually want to address a specific instruction.

This then allows the pass to be moved much earlier in the pass pipeline,
before both scheduling and register allocation. However, to do so we must
leave the MIR in SSA form (by not redefining registers), and so use a
virtual register for the intermediate value. By using this virtual register,
this pass now has to come before register allocation.

Diff Detail

Event Timeline

lenary created this revision.Jul 1 2020, 11:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2020, 11:53 AM

This is looking quite good. As Sam says, not splitting BBs is preferable, so while the D79635 issues could be resolved in other ways, if we can make it work like this then that's even better.
I suppose that this patch could also remove everything related to LabelMustBeEmitted.
If anyone wants to provide about the status of setPreInstrSymbol that would be helpful. Currently it's marked as FIXME: This is not fully implemented yet., although Sam indicates that it's being used by x86, so that may be just a holdover.

Thanks a lot @lenary, this seems a much better approach.

I am also in favour of removing LabelMustBeEmitted machinery which now is redundant (as only RISC-V was using it).

lenary updated this revision to Diff 274908.Jul 1 2020, 1:35 PM
  • Remove LabelMustBeEmitted API.

I have a question out to llvm-dev about the stability of PreInstrSymbol, but
given it's used by the retpoline pass, I am reasonably confident the
functionality is complete enough for our uses.

lenary marked an inline comment as done.Jul 1 2020, 1:50 PM
lenary added inline comments.

Oh this loop can be simplifed - though I'm not sure I should be incrementing MBBI if we've inserted new instructions using BuildMI, which I think also increments the iterator automatically. Guidance here would be helpful.

luismarques accepted this revision.Jul 6 2020, 2:26 PM

LGTM. Nice!


Maybe I misunderstood your issue, but I think the increment is correct:

  1. It is what AArch64 does.
  2. If you have two consecutive pseudo-instructions this code expands them both, so you aren't skipping over the second by performing the explicit increment.
  3. You don't want to iterate over the expanded instructions, to recursively expand them, since we don't emit other pseudo-instructions in the expansions.

BTW, what simplification were you considering for this loop?

This revision is now accepted and ready to land.Jul 6 2020, 2:26 PM
asb requested changes to this revision.Jul 8 2020, 10:37 PM

This is a nice simplification, thanks. My only request before committing is to split out the RISCVTargetMachine to a separate pass, as that is logically distinct.

This revision now requires changes to proceed.Jul 8 2020, 10:37 PM
lenary marked an inline comment as done.Jul 9 2020, 2:54 AM

The RISCVTargetMachine changes are not distinct. You can only use virtual registers (ScratchReg) if you are before register allocation, as I understand it.


Turning it back into a for loop like:

for (MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end(); MBBI != E; MBBI++) {
  Modified |= expandMI(MBB, MBBI)
asb accepted this revision.Jul 9 2020, 3:03 AM

Got it, thanks. In that case LGTM and please tweak the commit message (the last paragraph specifically) so it's clear that the two changes are interlinked.

This revision is now accepted and ready to land.Jul 9 2020, 3:03 AM
lenary edited the summary of this revision. (Show Details)Jul 9 2020, 3:16 AM
This revision was automatically updated to reflect the committed changes.
rogfer01 added a comment.EditedJul 13 2020, 5:48 AM

Hi @lenary,

I'm sorry to report that this approach is still problematic. In some of the testcases of the LLVM test-suite the pass when built with -fPIC -O3, pass Branch Probability Basic Block Placement decides to copy the AUIPC / LD pair but the copies carry around the same symbol. This fails with:

fatal error: error in backend: invalid symbol redefinition

I've seen other testcases in which we trigger

fatal error: error in backend: could not find corresponding %pcrel_hi
; predecessors: %bb.4

  renamable $x10 = AUIPC target-flags(riscv-got-hi) @perSCC, pre-instr-symbol <mcsymbol .Ltmp1>
  renamable $x10 = LD killed renamable $x10, target-flags(riscv-pcrel-lo) <mcsymbol .Ltmp1>
  renamable $x12 = LD killed renamable $x10, 0 :: (dereferenceable load 8 from @perSCC)
  PseudoCALL target-flags(riscv-plt) @SCCofVCG, <regmask $x1 $x3 $x4 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_d $f9_d $f18_d $f19_d $f20_d $f21_d $f22_d $f23_d $f24_d $f25_d $f26_d $f27_d $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f and 6 more...>, implicit-def dead $x1, implicit undef $x10, implicit undef $x11, implicit $x12, implicit-def $x2

; predecessors: %bb.3
  liveins: $x11
  SD $x0, killed renamable $x11, 0 :: (store 8 into %ir.4)
  renamable $x10 = AUIPC target-flags(riscv-got-hi) @perSCC, pre-instr-symbol <mcsymbol .Ltmp1>
  renamable $x10 = LD killed renamable $x10, target-flags(riscv-pcrel-lo) <mcsymbol .Ltmp1>
  renamable $x12 = LD killed renamable $x10, 0 :: (dereferenceable load 8 from @perSCC)
  PseudoCALL target-flags(riscv-plt) @SCCofVCG, <regmask $x1 $x3 $x4 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_d $f9_d $f18_d $f19_d $f20_d $f21_d $f22_d $f23_d $f24_d $f25_d $f26_d $f27_d $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f and 6 more...>, implicit-def dead $x1, implicit undef $x10, implicit undef $x11, implicit $x12, implicit-def $x2

I don't have many ideas here. I understand the root cause is that we can't really copy the instruction (unless we're somehow able to create a new symbol for the copied AUIPC and map the reference of the copied LD to the new one). A specific pass that fixes these cases doesn't seem very appealing either.

Kind regards,

Would it make sense to still expand these constructs late (e.g. renaming RISCVExpandAtomicPseudoInsts.cpp into RISCVLateExpandPseudoInsts.cpp) still using PreInstrSymbol?

Instructions have an isNotDuplicable property which can be used to stop various code duplication passes.

That said, it's not clear you want to stop the code duplication early, so you might want to move the expansion later anyway.

lenary reopened this revision.Jul 14 2020, 3:13 AM

We're close to the branch date. I'm going to revert rG97106f9d80f6 and look at this again. Sound good?

This revision is now accepted and ready to land.Jul 14 2020, 3:13 AM

fatal error: error in backend: could not find corresponding %pcrel_hi

We're also seeing this in our RISC-V Linux kernel builds:

@rogfer01 @efriedma I would love this pass to happen early in the pipeline, so that auipc/l{d,w} can be rescheduled or shared. Maybe the easy solution is to mark the auipc as not duplicatable (as that's the thing you want to share, anyway), so the l{d,w} can be duplicated if necessary?

The other solution is to work out what's going on with duplicating symbols and setPreInstrSymbol and try something there, but the issue is if a MBB has been duplicated, I cannot RAUW the symbol, because some of the instructions should refer to the old symbol, and some to the new. I think this is what Roger is alluding to with the "fixup pass", but I think this would have to be done at duplication-time, not after.

I doubt I have time to investigate why this is going wrong any time soon, and especially not before the branch tomorrow.

lenary planned changes to this revision.Jul 14 2020, 10:28 AM
lenary added a subscriber: mundaym.Jan 26 2021, 1:24 PM

@luismarques @asb @mundaym Please can one of you take over this patch if lowRISC intends to keep working on it? I do realise that we weren't sure what the scheduling/performance payoff would be from this work, and it was showing some pretty obscure bugs.

rkruppe removed a subscriber: rkruppe.Jan 27 2021, 7:59 AM