This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Allow zextended load combines.
ClosedPublic

Authored by courbet on Nov 20 2019, 4:05 AM.

Details

Summary

or(zext(load8(base)), zext(load8(base+1)) -> zext(load16 base)

Event Timeline

courbet created this revision.Nov 20 2019, 4:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 20 2019, 4:05 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
courbet updated this revision to Diff 230231.Nov 20 2019, 4:31 AM

Rebase to show test diffs.

RKSimon added inline comments.Nov 20 2019, 9:20 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6517

NFC fix?

6748

why are you adding const? most of the time we don't bother

llvm/test/CodeGen/ARM/load-combine-big-endian.ll
637

regression

llvm/test/CodeGen/ARM/load-combine.ll
666

regression

courbet marked an inline comment as done.Nov 20 2019, 9:32 AM
courbet added inline comments.
llvm/test/CodeGen/ARM/load-combine-big-endian.ll
637

Yes, that's something I meant to highlight in thsi patch, but forgot to send the comment:

I've seen in other tests for ARM we're OK paying quite a large number of instructions to avoid loads, e.g.

define i32 @load_i32_by_i8_bswap(i32* %arg) {
; BSWAP is not supported by 32 bit target
; CHECK-LABEL: load_i32_by_i8_bswap:
; CHECK:       @ %bb.0:
; CHECK-NEXT:    ldr r0, [r0]
; CHECK-NEXT:    mov r1, #65280
; CHECK-NEXT:    mov r2, #16711680
; CHECK-NEXT:    and r1, r1, r0, lsr #8
; CHECK-NEXT:    and r2, r2, r0, lsl #8
; CHECK-NEXT:    orr r1, r1, r0, lsr #24
; CHECK-NEXT:    orr r0, r2, r0, lsl #24
; CHECK-NEXT:    orr r0, r0, r1
; CHECK-NEXT:    mov pc, lr

I'm not very familiar with these targets, where do we draw the line ?

dmgreen added inline comments.
llvm/test/CodeGen/ARM/load-combine-big-endian.ll
637

This appears to be running under a very old architecture where bswap is not legal.

Any idea why it is trying to perform this optimisation when there is no bswap to make it efficient?

courbet updated this revision to Diff 230384.Nov 21 2019, 12:29 AM
courbet marked an inline comment as done.

rebase on NFC change

courbet marked an inline comment as done.Nov 21 2019, 1:16 AM
courbet added inline comments.
llvm/test/CodeGen/ARM/load-combine-big-endian.ll
637
// Before legalize we can introduce illegal bswaps which will be later
// converted to an explicit bswap sequence. This way we end up with a single
// load and byte shuffling instead of several loads and byte shuffling.
if (NeedsBswap && LegalOperations && !TLI.isOperationLegal(ISD::BSWAP, VT))
  return SDValue();

I can always bail out on zext:

if (NeedsBswap && (LegalOperations || NeedsZext) && !TLI.isOperationLegal(ISD::BSWAP, VT))
  return SDValue();

But it's not clear to me what's worth between a load and 4 arithmetic instructions on ARM.

courbet updated this revision to Diff 230388.Nov 21 2019, 1:16 AM

Be conservative on ARM and do not introduce illegal BSWAPs when zero-extending.

courbet updated this revision to Diff 230390.Nov 21 2019, 1:19 AM

When zero-extending, check the legality of the zero-extended load.

couple of minors

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6784

Extra brackets to avoid static analyzer precedence warnings

(++ZeroExtendedBytes != (ByteWidth - static_cast<unsigned>(i)))
6789

Improve the assertion message

courbet updated this revision to Diff 230393.Nov 21 2019, 1:35 AM
courbet marked 2 inline comments as done.

Address comments.

courbet retitled this revision from Summary: [DAGCombiner] Allow zextended load combines. to [DAGCombiner] Allow zextended load combines..Nov 21 2019, 1:35 AM
dmgreen added inline comments.Nov 21 2019, 2:41 AM
llvm/test/CodeGen/ARM/load-combine-big-endian.ll
637

I think:

  • On any remotely modern ARM system we will have a rev instruction, so I'm not very worried about the regressions here. They do show that an architecture without bswap may get worse though, if that is a thing that comes up in other places (I think all the archs here will have a bswap of some sort).
  • I think counting instructions can be as good a measure as any on some of the smaller arm devices. Maybe something like 2:1 for loads to arithmetic.
  • I'm not sure why we would do this at-all if we need a bswap but don't have one. But if the VT is still an illegal type, it will likely always look illegal even if it is eventually expanded into something sane.

Can you add these two targets to the two test files:
-mtriple=thumbv6m-none-eabi
-mtriple=thumbv7m-none-eabi

If they are OK (along with the armv6 case) then I, from an ARM perspective, am happy with this.

courbet marked an inline comment as done.Nov 21 2019, 4:18 AM
courbet added inline comments.
llvm/test/CodeGen/ARM/load-combine-big-endian.ll
637

I think counting instructions can be as good a measure as any on some of the smaller arm devices. Maybe something like 2:1 for loads to arithmetic.

OK, so it looks like this was a real regression. So I've disabled illegal bswaps when zext-ending. Thanks for the information !

Can you add these two targets to the two test files:

Done.

courbet updated this revision to Diff 230422.Nov 21 2019, 4:18 AM

Rebase on thumb tests.

RKSimon added inline comments.Nov 21 2019, 5:25 AM
llvm/test/CodeGen/ARM/load-combine-big-endian.ll
3–4

@dmgreen Is this a big endian triple?

dmgreen added inline comments.Nov 21 2019, 5:38 AM
llvm/test/CodeGen/ARM/load-combine-big-endian.ll
3–4

Ah, yeah. Try thumbv6meb-none-eabi and thumbv7meb-none-eabi for big endian.

courbet marked an inline comment as done.Nov 21 2019, 6:09 AM
courbet added inline comments.
llvm/test/CodeGen/ARM/load-combine-big-endian.ll
3–4

D'oh. Sorry about that.

courbet updated this revision to Diff 230447.Nov 21 2019, 6:11 AM

Use the right triple for big-endian thumb.

dmgreen added inline comments.Nov 21 2019, 6:27 AM
llvm/test/CodeGen/ARM/load-combine-big-endian.ll
3–4

Nice one.

I'm happy, if RKSimon is happy.

RKSimon accepted this revision.Nov 21 2019, 6:42 AM

LGTM - cheers

This revision is now accepted and ready to land.Nov 21 2019, 6:42 AM

Thanks for the review.

This revision was automatically updated to reflect the committed changes.