This is an archive of the discontinued LLVM Phabricator instance.

[AVR] Expand shifts during AVRISelLowering
AbandonedPublic

Authored by Patryk27 on Jun 17 2023, 4:30 AM.

Details

Reviewers
benshi001
Summary

Some passes can introduce shifts after AVRShiftExpandPass has completed;
if this happens, we panic during isel because we assume such shifts must
have been already expanded before.

This commit integrates our shift-expansion pass with isel-selection pass
so that isel doesn't get surprised by shifts of non-constant amounts
anymore.

Spotted in the wild in rustc:

Diff Detail

Event Timeline

Patryk27 created this revision.Jun 17 2023, 4:30 AM
Patryk27 requested review of this revision.Jun 17 2023, 4:30 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 17 2023, 4:30 AM
Patryk27 added inline comments.Jun 17 2023, 4:33 AM
llvm/lib/Target/AVR/AVRISelLowering.cpp
2193

Note that I've changed to code to re-arrange the generated blocks a bit, from:

body:             |
  bb.0 (%ir-block.0):
    successors: %bb.2(0x80000000)
    liveins: $r23r22, $r25r24, $r19r18
  
    %2:dregs = COPY $r19r18
    %1:dregs = COPY $r25r24
    %0:dregs = COPY $r23r22
    %4:gpr8 = COPY %2.sub_lo
    RJMPk %bb.2
  
  bb.1 (%ir-block.0):
    successors: %bb.2(0x80000000)
  
    %12:gpr8 = ADDRdRr %10, %10, implicit-def $sreg
    %13:gpr8 = ADCRdRr %9, %9, implicit-def $sreg, implicit $sreg
    %14:gpr8 = ADCRdRr %8, %8, implicit-def $sreg, implicit $sreg
    %15:gpr8 = ADCRdRr %7, %7, implicit-def $sreg, implicit $sreg
  
  bb.2 (%ir-block.0):
    successors: %bb.1(0x40000000), %bb.3(0x40000000)
  
    %7:gpr8 = PHI %1.sub_hi, %bb.0, %15, %bb.1
    %8:gpr8 = PHI %1.sub_lo, %bb.0, %14, %bb.1
    %9:gpr8 = PHI %0.sub_hi, %bb.0, %13, %bb.1
    %10:gpr8 = PHI %0.sub_lo, %bb.0, %12, %bb.1
    %16:gpr8 = PHI %4, %bb.0, %17, %bb.1
    %17:gpr8 = DECRd %16, implicit-def $sreg
    BRPLk %bb.1, implicit $sreg
  
  bb.3 (%ir-block.0):
    %6:dregs = REG_SEQUENCE %7, %subreg.sub_hi, %8, %subreg.sub_lo
    %5:dregs = REG_SEQUENCE %9, %subreg.sub_hi, %10, %subreg.sub_lo
    $r23r22 = COPY %5
    $r25r24 = COPY %6
    RET implicit $r23r22, implicit $r25r24, implicit $r1

... to:

body:             |
  bb.0 (%ir-block.0):
    successors: %bb.1(0x80000000)
    liveins: $r23r22, $r25r24, $r19r18
  
    %2:dregs = COPY $r19r18
    %1:dregs = COPY $r25r24
    %0:dregs = COPY $r23r22
    %4:gpr8 = COPY %2.sub_lo
    # fall-through instead of jumping
  
  bb.1 (%ir-block.0):
    successors: %bb.2(0x40000000), %bb.3(0x40000000)
  
    %7:gpr8 = PHI %1.sub_hi, %bb.0, %15, %bb.2
    %8:gpr8 = PHI %1.sub_lo, %bb.0, %14, %bb.2
    %9:gpr8 = PHI %0.sub_hi, %bb.0, %13, %bb.2
    %10:gpr8 = PHI %0.sub_lo, %bb.0, %12, %bb.2
    %16:gpr8 = PHI %4, %bb.0, %17, %bb.2
    %17:gpr8 = DECRd %16, implicit-def $sreg
    BRMIk %bb.3, implicit $sreg # <- reversed comparison + fallthrough
  
  bb.2 (%ir-block.0):
    successors: %bb.1(0x80000000)
  
    %12:gpr8 = ADDRdRr %10, %10, implicit-def $sreg
    %13:gpr8 = ADCRdRr %9, %9, implicit-def $sreg, implicit $sreg
    %14:gpr8 = ADCRdRr %8, %8, implicit-def $sreg, implicit $sreg
    %15:gpr8 = ADCRdRr %7, %7, implicit-def $sreg, implicit $sreg
    RJMPk %bb.1 # <- jump to the beginning
  
  bb.3 (%ir-block.0):
    %6:dregs = REG_SEQUENCE %7, %subreg.sub_hi, %8, %subreg.sub_lo
    %5:dregs = REG_SEQUENCE %9, %subreg.sub_hi, %10, %subreg.sub_lo
    $r23r22 = COPY %5
    $r25r24 = COPY %6
    RET implicit $r23r22, implicit $r25r24, implicit $r1

It looks like the generated assembly remained the same, I've also checked the actual binary through rustc + simavr.

Patryk27 added inline comments.Jun 18 2023, 4:25 AM
llvm/lib/Target/AVR/AVRISelLowering.cpp
2207

Alright, this is wrong, after all - I've just tested it on a more elaborate code in rustc and EntryBB->removeSuccessor(ExitBB); triggers an LLVM panic (presumably because EntryBB == ExitBB).

I kinda don't understand why doing something like this:

MachineBasicBlock *ExitBB = EntryBB->splitAt(MI, false);

if (EntryBB == ExitBB) {
  assert(EntryBB->canFallThrough() && "Expected a fallthrough block!");
  ExitBB = EntryBB->getFallThrough();
}

... is not sufficient, though 👀

benshi001 added inline comments.Jun 20 2023, 1:47 AM
llvm/lib/Target/AVR/AVRISelLowering.cpp
2207

Is it possible to fix the 32-bit shift issue in moderate way? for example, keep the pass in AVRShiftExpand.cpp.

Now that I think about it, there might be a simpler way!

So, the underlying issue is that sometimes LLVM spawns new shifts during the instruction selection pass - for instance given this IR¹:

define i64 @test(i64 %x, i32 %y) {
start:
  %0 = or i32 %y, 38
  %1 = zext i32 %0 to i64
  %2 = lshr i64 %x, %1
  ret i64 %2
}

... this 64-bit shift will not get considered by our shift-expansion pass due to a condition here:

https://github.com/llvm/llvm-project/blob/eaaacc3c651e5b2c23bfa9648b6b0d69aab64d00/llvm/lib/Target/AVR/AVRShiftExpand.cpp#L54

... but later, during isel, LLVM will reduce this 64-bit shift into a 32-bit shift² and then ask us to expand this brand new 32-bit shift - that causes a panic since we expect those to have been already expanded.

In order to solve this issue, I think we could simply expand all variable-shifts greater than i8 (instead of expanding only 32-bit shifts):

diff --git a/llvm/lib/Target/AVR/AVRShiftExpand.cpp b/llvm/lib/Target/AVR/AVRShiftExpand.cpp
index b7dcd860467d..4ea6d9fdb57c 100644
--- a/llvm/lib/Target/AVR/AVRShiftExpand.cpp
+++ b/llvm/lib/Target/AVR/AVRShiftExpand.cpp
@@ -51,8 +51,7 @@ bool AVRShiftExpand::runOnFunction(Function &F) {
     if (!I.isShift())
       // Only expand shift instructions (shl, lshr, ashr).
       continue;
-    if (I.getType() != Type::getInt32Ty(Ctx))
-      // Only expand plain i32 types.
+    if (I.getType() == Type::getInt8Ty(Ctx))
       continue;
     if (isa<ConstantInt>(I.getOperand(1)))
       // Only expand when the shift amount is not known.
@@ -75,7 +74,7 @@ bool AVRShiftExpand::runOnFunction(Function &F) {
 void AVRShiftExpand::expand(BinaryOperator *BI) {
   auto &Ctx = BI->getContext();
   IRBuilder<> Builder(BI);
-  Type *Int32Ty = Type::getInt32Ty(Ctx);
+  Type *InputTy = cast<Instruction>(BI)->getType();
   Type *Int8Ty = Type::getInt8Ty(Ctx);
   Value *Int8Zero = ConstantInt::get(Int8Ty, 0);
 
@@ -101,7 +100,7 @@ void AVRShiftExpand::expand(BinaryOperator *BI) {
   Builder.SetInsertPoint(LoopBB);
   PHINode *ShiftAmountPHI = Builder.CreatePHI(Int8Ty, 2);
   ShiftAmountPHI->addIncoming(ShiftAmount, BB);
-  PHINode *ValuePHI = Builder.CreatePHI(Int32Ty, 2);
+  PHINode *ValuePHI = Builder.CreatePHI(InputTy, 2);
   ValuePHI->addIncoming(BI->getOperand(0), BB);
 
   // Subtract the shift amount by one, as we're shifting one this loop
@@ -116,13 +115,13 @@ void AVRShiftExpand::expand(BinaryOperator *BI) {
   Value *ValueShifted;
   switch (BI->getOpcode()) {
   case Instruction::Shl:
-    ValueShifted = Builder.CreateShl(ValuePHI, ConstantInt::get(Int32Ty, 1));
+    ValueShifted = Builder.CreateShl(ValuePHI, ConstantInt::get(InputTy, 1));
     break;
   case Instruction::LShr:
-    ValueShifted = Builder.CreateLShr(ValuePHI, ConstantInt::get(Int32Ty, 1));
+    ValueShifted = Builder.CreateLShr(ValuePHI, ConstantInt::get(InputTy, 1));
     break;
   case Instruction::AShr:
-    ValueShifted = Builder.CreateAShr(ValuePHI, ConstantInt::get(Int32Ty, 1));
+    ValueShifted = Builder.CreateAShr(ValuePHI, ConstantInt::get(InputTy, 1));
     break;
   default:
     llvm_unreachable("asked to expand an instruction that is not a shift");
@@ -137,7 +136,7 @@ void AVRShiftExpand::expand(BinaryOperator *BI) {
   // Collect the resulting value. This is necessary in the IR but won't produce
   // any actual instructions.
   Builder.SetInsertPoint(BI);
-  PHINode *Result = Builder.CreatePHI(Int32Ty, 2);
+  PHINode *Result = Builder.CreatePHI(InputTy, 2);
   Result->addIncoming(BI->getOperand(0), BB);
   Result->addIncoming(ValueShifted, LoopBB);

Overall, this solution seems to work - I've checked it on a couple of Rust applications and the binaries behave correctly, both some simple and more complex ones.

I think the only disadvantage of this approach, as compared to expanding shifts during isel, are optimizations: expanding shifts eagerly means that we'll lose some of the optimizations we could have applied otherwise.

For instance, following that first example from Rust's standard library, we will expand that instruction into a 64-bit shift even though in principle a 32-bit shift would suffice (but we don't know that yet during shift-expansion pass).

For safety, we could implement this simpler approach first, as presented in the diff here, and maybe come back to merging shift-expansion with isel in the future - what do you think about it?

¹ minimized case from Rust's standard library - originally: _ZN4core3num7dec2flt6lemire13compute_float17hc1d4de6247502c96E
² I'm not 100% sure why, though - seems to be somehow related to this or + zext combination

ping ping, @benshi001 👀

Could you please upload your final version of this patch ? I see you have made some changes, but only mentioned in your comment.

llvm/lib/Target/AVR/AVRTargetMachine.cpp