This is an archive of the discontinued LLVM Phabricator instance.

[AVR] Custom lower 32-bit shift instructions
ClosedPublic

Authored by aykevl on Dec 22 2022, 11:13 AM.

Details

Summary

32-bit shift instructions were previously expanded using the default
SelectionDAG expander, which meant it used 16-bit constant shifts and
ORed them together. This works, but is far from optimal.

I've optimized 32-bit shifts on AVR using a custom inserter. This is
done using three new pseudo-instructions that take the upper and lower
bits of the value in two separate 16-bit registers and outputs two
16-bit registers.

This is the first commit in a series. When completed, shift instructions
will take around 31% less instructions on average for constant 32-bit
shifts, and is in all cases equal or better than the old behavior. It
also tends to match or outperform avr-gcc: the only cases where avr-gcc
does better is when it uses a loop to shift, or when the LLVM register
allocator inserts some unnecessary movs. But it even outperforms avr-gcc
in some cases where avr-gcc does not use a loop.

As a side effect, non-constant 32-bit shifts also become more efficient.

For some real-world differences: the build of compiler-rt I use in
TinyGo becomes 2.7% smaller and the build of picolibc I use becomes 0.9%
smaller. I think picolibc is a better representation of real-world code,
but even a ~1% reduction in code size is really significant.

The current patch just lays the groundwork. The result is actually a
regression in code size. Later patches will use this as a basis to
optimize these shift instructions.

Diff Detail

Event Timeline

aykevl created this revision.Dec 22 2022, 11:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 22 2022, 11:13 AM
Herald added subscribers: Jim, hiraditya. · View Herald Transcript
aykevl requested review of this revision.Dec 22 2022, 11:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 22 2022, 11:13 AM
aykevl updated this revision to Diff 484903.Dec 22 2022, 11:14 AM
aykevl edited the summary of this revision. (Show Details)

(testing a no-op update with arcanist)

aykevl updated this revision to Diff 484913.Dec 22 2022, 11:36 AM

(arcanist used the last commit instead of the intended commit, restoring the diff again)

benshi001 added inline comments.Dec 23 2022, 1:19 AM
llvm/lib/Target/AVR/AVRISelLowering.cpp
291

"Expected a constant shift amount!"

1864

Would it be better to be

for (ssize_t i = Regs.size() - 1; i >= 0; i--) {
 ...
 if (i == Regs.size() - 1) {
   ...
  } else {
   ...
  }
}
benshi001 accepted this revision.Dec 23 2022, 1:22 AM

LGTM. Thanks!

This revision is now accepted and ready to land.Dec 23 2022, 1:22 AM

Thanks, I'll fix these things before committing.

llvm/lib/Target/AVR/AVRISelLowering.cpp
291

Fixed locally.

1864

Fixed locally.

arsenm added a subscriber: arsenm.Dec 25 2022, 7:45 AM
arsenm added inline comments.
llvm/lib/Target/AVR/AVRISelLowering.cpp
291

You can't guarantee this condition. If you don't want to handle this properly, should use emitError, or at least report_fatal_error and not unreachable. This could easily be run into by users

318

Why can't you handle everything here as a normal custom lowering? Why do you need the custom inserter?

1858

const ref

1925–1929

Can directly initialize (also could use std::array?)

benshi001 added inline comments.Dec 25 2022, 7:19 PM
llvm/lib/Target/AVR/AVRISelLowering.cpp
291

Yes. This is problemetic. I guess the Ayke's intention is to let non-const shift amounts fall into the default ordinary process.

benshi001 added inline comments.Dec 25 2022, 7:51 PM
llvm/lib/Target/AVR/AVRISelLowering.cpp
1864

I saw you use captical loop variable I in your other patch, https://reviews.llvm.org/D140570, maybe it would be better to also use I instead of i here.

benshi001 requested changes to this revision.Dec 25 2022, 8:02 PM
This revision now requires changes to proceed.Dec 25 2022, 8:02 PM
aykevl added inline comments.Dec 26 2022, 8:33 AM
llvm/lib/Target/AVR/AVRISelLowering.cpp
291

You can't guarantee this condition. If you don't want to handle this properly, should use emitError, or at least report_fatal_error and not unreachable. This could easily be run into by users

How so? Non-constant shifts are expanded by an IR pass to constant shifts (in a loop), see llvm/lib/Target/AVR/AVRShiftExpand.cpp. Maybe there's a way to circumvent it but with some testing I couldn't come up with IR that hits this assert. But I'll change it to report_fatal_error in an update.

(I also have a local patch that removes the IR pass and instead handles non-constant shifts here).

318

I'm not sure what you mean? How can this be done in a different way?

If you mean building the entire thing in SelectionDAG, I think that would be a lot more complicated if not impossible because eventually I want to generate loops. I intend to introduce a later patch that lowers these shits to a loop if emitting the full sequence results in more instructions and minsize is set (avr-gcc does the same).

1858

fixed

1864

That was already fixed locally :)

1925–1929

Good idea, fixed.

arsenm added inline comments.Dec 26 2022, 8:36 AM
llvm/lib/Target/AVR/AVRISelLowering.cpp
291

There's no verifier for this, so you still need to be more diligent about error handling. A DAG combine could still choose to introduce a non-constant shift amount, or any IR pass after the point you lowered it

318

If you need to emit control flow, you don't have much choice besides the custom inserter

aykevl updated this revision to Diff 485309.Dec 26 2022, 8:36 AM
aykevl set the repository for this revision to rG LLVM Github Monorepo.
  • apply review comments
aykevl added inline comments.Dec 26 2022, 8:45 AM
llvm/lib/Target/AVR/AVRISelLowering.cpp
291

Okay.
It's a report_fatal_error now so it won't silently miscompile here anymore. If it ever happens, it must be very rare because before this patch it would have resulted in a linker error (__lshlsi3 etc) instead of a compiler error.

318

Ok, understood. Next time I do an optimization that doesn't require flow control I'll try doing this in XXXDAGToDAGISel::trySelect (I assume that's what you mean?)

aykevl updated this revision to Diff 485313.EditedDec 26 2022, 9:55 AM

Update to use opcodes instead of ShiftAmt and ArithmeticShift.

@benshi001 I updated the patch following your comment here: https://reviews.llvm.org/D138529#inline-1345614. While working on lowering the shift as a loop I found opcodes to be easier to work with. If you agree with this change, I will update all other patches as well.

This comment was removed by benshi001.
llvm/lib/Target/AVR/AVRISelLowering.cpp
1858

Can it be const DebugLoc &dl = MI.getDebugLoc(); ? In which dl is a reference than a local variable ?

Update to use opcodes instead of ShiftAmt and ArithmeticShift.

@benshi001 I updated the patch following your comment here: https://reviews.llvm.org/D138529#inline-1345614. While working on lowering the shift as a loop I found opcodes to be easier to work with. If you agree with this change, I will update all other patches as well.

Sure. Please go ahead. I do prefer the solution of exposing opcode.

aykevl updated this revision to Diff 485730.Dec 30 2022, 4:12 PM
aykevl set the repository for this revision to rG LLVM Github Monorepo.
  • change const DebugLoc dl to const DebugLoc &dl

Beside my latest two pieces of inline comment, There is failure of LLVM.CodeGen/AVR::shift.ll in the pre-merge checks on both windows and linux, I think you need to have a check.

llvm/lib/Target/AVR/AVRISelLowering.cpp
1860

can it be const bool ShiftLeft = Opc == ISD::SHL; ?

1861

const bool ArithmeticShift = Opc == ISD::SRA;

1886

This loop variable need to be I, like the above loop.

Looks like 3bb5ddd1756d4573d3104f8b86d2973dbc550402 broke the pre-merge check. I'll rebase the patch.

aykevl updated this revision to Diff 485815.Jan 1 2023, 12:58 PM
  • Special-case logical shifts of 16 bits. This fixes a number of issues: it avoids unnecessary code changes in this PR, it fixes an issue after rebasing on the main branch (as seen in the buildbot failure), and it fixes an issue I found while working on D140822.
  • Apply review feedback.

This is ready for review again.

aykevl updated this revision to Diff 485818.Jan 1 2023, 1:30 PM
  • run clang-format
benshi001 added inline comments.Jan 1 2023, 6:15 PM
llvm/lib/Target/AVR/AVRISelLowering.cpp
303

Is this necessary? Can it be covered by https://reviews.llvm.org/D140570 ?

If you think it is better to handle ShiftAmount == 16 at here, I suggest you add tests for that in current patch.

  • Special-case logical shifts of 16 bits. This fixes a number of issues: it avoids unnecessary code changes in this PR, it fixes an issue after rebasing on the main branch (as seen in the buildbot failure), and it fixes an issue I found while working on D140822.
  • Apply review feedback.

This is ready for review again.

I see your idea, it is great. My only concern is that it would be better to add tests for the 16-bit shifts in current patch. You can do that while committing.

benshi001 accepted this revision.Jan 1 2023, 6:58 PM
This revision is now accepted and ready to land.Jan 1 2023, 6:58 PM
This revision was automatically updated to reflect the committed changes.