This is an archive of the discontinued LLVM Phabricator instance.

[AVR] Replace shift-to-loop IR pass with common shift code
AbandonedPublic

Authored by Patryk27 on Jun 3 2023, 3:08 AM.

Details

Reviewers
benshi001
aykevl
Summary

I'm submitting code on the behalf of aykevl which is the author of this patch -
the original commit message[1] says:

I wrote a late IR pass a while ago to replace non-constant shift
operations with a constant shift in a loop (see
https://reviews.llvm.org/D96677). This worked, but wasn't optimal and
there is a possibility later passes could again re-introduce such shift
operations. But as a stopgap, it worked well enough.

Now that constant 32-bit shift operations are optimized in a later
stage, we can use the same code to lower shift operations to loops.

This patch removes that IR pass and replaces it with one that runs
during ISel lowering, like the constant shift operations.

I did not expect this to affect binary size in any significant way, but
apparently it does. The compiler-rt library (when compiled inside
TinyGo) becomes 1.8% smaller, and picolibc becomes 0.4% smaller. That's
a nice win for what was intended to be mostly just a refactor.

Issue fixed here was discovered in rustc[2] and I can confirm that this patch
fixes the problem; using simavr I've checked the left- and right-shifts, and
the rest of integer math operations, and everything seems to work as intended.

Since aykevl mentioned that they are not currently working on the AVR
backend[1], please let me know if anything can / should be adjusted here and
I'll do my best :-)

[1] https://github.com/rust-lang/compiler-builtins/issues/523#issuecomment-1500734938

[2] https://github.com/rust-lang/compiler-builtins/issues/523, https://github.com/rust-lang/rust/issues/112140

Diff Detail

Event Timeline

Patryk27 created this revision.Jun 3 2023, 3:08 AM
Patryk27 requested review of this revision.Jun 3 2023, 3:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2023, 3:08 AM
Patryk27 edited the summary of this revision. (Show Details)Jun 3 2023, 3:09 AM
Patryk27 added a reviewer: benshi001.
This comment was removed by Patryk27.
aykevl added a subscriber: aykevl.Jun 3 2023, 4:48 AM

IIRC for the code size reductions to really work, you also need to enable sub register liveness (that's the other patch in my branch). I don't remember how big of a difference it made. It's not needed for correctness though.

IIRC for the code size reductions to really work, you also need to enable sub register liveness (that's the other patch in my branch). I don't remember how big of a difference it made. It's not needed for correctness though.

Yes, that's included in this patch as well :-)

aykevl added a comment.Jun 4 2023, 6:38 AM

Ah I see, I missed that. I split it up in a separate patch because subreg liveness is also a big change (not in patch size but in how the register allocator works) but I guess it's not too big of a problem to merge those changes in the same patch.

I'd say LGTM but it's my patch so can't really comment on it.

benshi001 added inline comments.Jun 8 2023, 6:57 AM
llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp
263

Why and with zero should be special? Could you please give an example? Or add a test .ll ?

llvm/lib/Target/AVR/AVRISelLowering.cpp
1901

It would be better to be DL than dl.

2172

DL

Also a brief commit message is needed,

  1. what is the issue
  2. how it is fixed

than current form.

aykevl added inline comments.Jun 8 2023, 8:29 AM
llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp
263

This is part of subreg liveness: https://github.com/aykevl/llvm-project/commit/8af44a965e16a16c4edd4e01143ab1c260e3d924

I don't remember exactly why this was needed but I think the machine verifier complained otherwise.

benshi001 added inline comments.Jun 8 2023, 8:21 PM
llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp
263

I see and I have reproduce the complaint. Maybe the result of Rx & 0x00 does not depend on the value of Rx, so we need to set it to undefined.

I suggest we separate this patch to two ones, one for the 32-bit shift, and another for the subreg liveness. And each one need a clear but brief commit message. We need not mention aykevl mentioned that they are not currently working on the AVR backend, just focus on the issue and code is OK enough.

benshi001 added inline comments.Jun 8 2023, 10:07 PM
llvm/lib/Target/AVR/AVRISelLowering.cpp
2192

Can we change the logic to

if (EntryBB->canFallThrough()) {
  // do the padding of the RJMP
}
ExitBB = EntryBB->splitAt(MI, false);

or

if (MI is the last instruction of EntryBB) {
  // do the padding of the RJMP
}
ExitBB = EntryBB->splitAt(MI, false);
benshi001 added inline comments.Jun 8 2023, 10:25 PM
llvm/lib/Target/AVR/AVRISelLowering.cpp
2246

Don't we need to erase the 32-bit shift pseudo instruction from EntryBB ?

ExitBB = EntryBB->splitAt(MI, false);
benshi001 added inline comments.Jun 9 2023, 12:45 AM
llvm/test/CodeGen/AVR/shift-loop.ll
22

Where is %10 defined ?

34

Where is %17 defined ?

benshi001 added inline comments.Jun 9 2023, 12:56 AM
llvm/lib/Target/AVR/AVRISelLowering.cpp
2219

Will the field second of PhiRegPairs be over written to sub_lo or sub_hi by insertMultibyteShift ?

benshi001 added inline comments.Jun 9 2023, 1:39 AM
llvm/lib/Target/AVR/AVRISelLowering.cpp
2229

After this line, how is the field second of Regs[I] set to sub_lo or sub_hi? Would it be better to

Regs[I] = PhiRegPairs[I];

?

I suggest we separate this patch to two ones, one for the 32-bit shift, and another for the subreg liveness. And each one need a clear but brief commit message. We need not mention aykevl mentioned that they are not currently working on the AVR backend, just focus on the issue and code is OK enough.

I have separated the subreg liveness part to
https://reviews.llvm.org/D152605
https://reviews.llvm.org/D152606

You can help me approve them.

Hi, thanks for reviewing - I'll try to address the comments over the weekend :-)

Patryk27 abandoned this revision.Jun 17 2023, 4:16 AM
Patryk27 marked 11 inline comments as done.

I've addressed all the comments and I'll create a new revision with them in a moment :-)

llvm/lib/Target/AVR/AVRISelLowering.cpp
2219

I'm not sure what you mean 🤔

sub_lo and sub_hi are used in insertWideShift which is not called here - and insertMultibyteShift, given ShiftAmt = 1 (like in here), reduces down to this block:

while (ShiftLeft && ShiftAmt) {
  // Shift one to the left.
  for (ssize_t I = Regs.size() - 1; I >= 0; I--) {
    Register Out = MRI.createVirtualRegister(&AVR::GPR8RegClass);
    Register In = Regs[I].first;
    Register InSubreg = Regs[I].second;
    if (I == (ssize_t)Regs.size() - 1) { // first iteration
      BuildMI(*BB, MBBI, DL, TII.get(AVR::ADDRdRr), Out)
          .addReg(In, 0, InSubreg)
          .addReg(In, 0, InSubreg);
    } else {
      BuildMI(*BB, MBBI, DL, TII.get(AVR::ADCRdRr), Out)
          .addReg(In, 0, InSubreg)
          .addReg(In, 0, InSubreg);
    }
    Regs[I] = std::pair(Out, 0);
  }
  ShiftAmt--;
}

... which does overwrite PhiRegPairs with the output register.

2229

Note that insertMultibyteShift modifies given Regs - it changes them from input-registers into output-registers (see the comment above with the related code block), and so doing:

Regs[I] = PhiRegPairs[I];

... is different from doing:

Regs[I] = std::pair(PhiRegs[I], 0);

... in the sense that PhiRegPairs there contains output-registers (with shifted values), while PhiRegs continues to refer to the input-registers, so both fragments do something else.

2246

This already happens at the end of insertWideShift:

  // Remove the pseudo instruction.
  MI.eraseFromParent();
  return BB;
}
llvm/test/CodeGen/AVR/shift-loop.ll
22

Nice catch - it seems that update_mir_test_checks.py cannot resolve cases where declaration happens after usage, i.e. the current approach generates:

body:             |
  bb.0 (%ir-block.0):
    successors: %bb.2(0x80000000)
    liveins: $r23r22, $r25r24, $r19r18
  
    %2:dregs = COPY $r19r18
    %1:dregs = COPY $r25r24
    %0:dregs = COPY $r23r22
    %4:gpr8 = COPY %2.sub_lo
    RJMPk %bb.2
  
  bb.1 (%ir-block.0):
    successors: %bb.2(0x80000000)
  
    %12:gpr8 = ADDRdRr %10, %10, implicit-def $sreg
    %13:gpr8 = ADCRdRr %9, %9, implicit-def $sreg, implicit $sreg
    %14:gpr8 = ADCRdRr %8, %8, implicit-def $sreg, implicit $sreg
    %15:gpr8 = ADCRdRr %7, %7, implicit-def $sreg, implicit $sreg
  
  bb.2 (%ir-block.0):
    successors: %bb.1(0x40000000), %bb.3(0x40000000)
  
    %7:gpr8 = PHI %1.sub_hi, %bb.0, %15, %bb.1
    %8:gpr8 = PHI %1.sub_lo, %bb.0, %14, %bb.1
    %9:gpr8 = PHI %0.sub_hi, %bb.0, %13, %bb.1
    %10:gpr8 = PHI %0.sub_lo, %bb.0, %12, %bb.1
    %16:gpr8 = PHI %4, %bb.0, %17, %bb.1
    %17:gpr8 = DECRd %16, implicit-def $sreg
    BRPLk %bb.1, implicit $sreg
  
  bb.3 (%ir-block.0):
    %6:dregs = REG_SEQUENCE %7, %subreg.sub_hi, %8, %subreg.sub_lo
    %5:dregs = REG_SEQUENCE %9, %subreg.sub_hi, %10, %subreg.sub_lo
    $r23r22 = COPY %5
    $r25r24 = COPY %6
    RET implicit $r23r22, implicit $r25r24, implicit $r1

To be fair, I'm not sure why those blocks are generated in this order - I switched it now to a simpler approach (note different block order and BRMIk):

body:             |
  bb.0 (%ir-block.0):
    successors: %bb.1(0x80000000)
    liveins: $r23r22, $r25r24, $r19r18
  
    %2:dregs = COPY $r19r18
    %1:dregs = COPY $r25r24
    %0:dregs = COPY $r23r22
    %4:gpr8 = COPY %2.sub_lo
  
  bb.1 (%ir-block.0):
    successors: %bb.2(0x40000000), %bb.3(0x40000000)
  
    %7:gpr8 = PHI %1.sub_hi, %bb.0, %15, %bb.2
    %8:gpr8 = PHI %1.sub_lo, %bb.0, %14, %bb.2
    %9:gpr8 = PHI %0.sub_hi, %bb.0, %13, %bb.2
    %10:gpr8 = PHI %0.sub_lo, %bb.0, %12, %bb.2
    %16:gpr8 = PHI %4, %bb.0, %17, %bb.2
    %17:gpr8 = DECRd %16, implicit-def $sreg
    BRMIk %bb.3, implicit $sreg
  
  bb.2 (%ir-block.0):
    successors: %bb.1(0x80000000)
  
    %12:gpr8 = ADDRdRr %10, %10, implicit-def $sreg
    %13:gpr8 = ADCRdRr %9, %9, implicit-def $sreg, implicit $sreg
    %14:gpr8 = ADCRdRr %8, %8, implicit-def $sreg, implicit $sreg
    %15:gpr8 = ADCRdRr %7, %7, implicit-def $sreg, implicit $sreg
    RJMPk %bb.1
  
  bb.3 (%ir-block.0):
    %6:dregs = REG_SEQUENCE %7, %subreg.sub_hi, %8, %subreg.sub_lo
    %5:dregs = REG_SEQUENCE %9, %subreg.sub_hi, %10, %subreg.sub_lo
    $r23r22 = COPY %5
    $r25r24 = COPY %6
    RET implicit $r23r22, implicit $r25r24, implicit $r1

... and it works correctly (+ generates the same assembly, even), so I guess we'll stick to that.

llvm/test/CodeGen/AVR/inline-asm/inline-asm3.ll