Page MenuHomePhabricator

[DAG] Attempt to fold bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2))))
ClosedPublic

Authored by RKSimon on Feb 19 2022, 11:35 AM.

Details

Summary

If the shl is at least half the bitwidth (i.e. the lower half of the bswap source is zero), then we can reduce the shift and perform the bswap at half the bitwidth and just zero extend.

I've currently allowed any shift value >= bw/2, but we could limit this to modulo16 so that the shl is always folded away? I'll probably enforce that limit for the InstCombine variant of this fold for PR53867, but I was wondering whether we should be more relaxed in DAG.

Based off PR51391 + PR53867

Diff Detail

Event Timeline

RKSimon created this revision.Feb 19 2022, 11:35 AM
RKSimon requested review of this revision.Feb 19 2022, 11:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 19 2022, 11:35 AM
spatel added inline comments.Feb 22 2022, 10:04 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9616

Is this intentionally different than the usual "N0.hasOneUse()"?

9621

If I'm seeing it correctly, we need to either tighten this clause by removing !LegalOperations or add the "BW % 16" requirement.

As-is (and we probably want a test either way), we could end up worse. For example on AArch64:

define i32 @test_bswap32_shift17(i32 %a0) {
  %s = shl i32 %a0, 17
  %b = call i32 @llvm.bswap.i32(i32 %s)
  ret i32 %b
}

is:

	lsl	w8, w0, #17
	rev	w0, w8

but this patch would make it:

	lsl	w8, w0, #1
	rev	w8, w8
	lsr	w0, w8, #16
RKSimon added inline comments.Feb 22 2022, 11:19 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9616

I'll change it - I'm just used to writing it that way to handle binops with duplicated ops.

9621

Yeah, I figured allowing any shift amount was probably too far - I'll reduce it to (ShAmt %16) == 0 - its what I've been doing on the InstCombine variant that I'm still working on.

craig.topper added inline comments.Feb 22 2022, 10:28 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9624

Should we go through getShiftAmountTy?

RKSimon updated this revision to Diff 410757.Feb 23 2022, 2:52 AM

address feedback

spatel accepted this revision.Feb 24 2022, 8:13 AM

LGTM

This revision is now accepted and ready to land.Feb 24 2022, 8:13 AM
This revision was landed with ongoing or failed builds.Feb 24 2022, 11:34 AM
This revision was automatically updated to reflect the committed changes.

This patch causes a failed assertion when building the arm64 Linux kernel.

A reduced C reproducer:

struct mlx5_ifc_cmd_hca_cap_bits {
  char gid_table_size[16]
};
enum { MLX5_CAP_GENERAL, MLX5_CAP_NUM };
struct mlx5_hca_cap {
  int cur[0]
};
struct mlx5_core_dev {
  struct {
    struct mlx5_hca_cap *hca[MLX5_CAP_NUM]
  } caps
};
short mlx5_query_hca_port_props_1;
mlx5_query_hca_port___trans_tmp_1;
mlx5_query_hca_port() {
  struct mlx5_core_dev *dev = to_mdev();
  int rep = mlx5_query_hca_port___trans_tmp_1 = __builtin_constant_p(0);
  gid_tbl_len(
      (mlx5_query_hca_port___trans_tmp_1
           ? *dev->caps.hca[MLX5_CAP_GENERAL]->cur << 24 |
                 (*dev->caps.hca[MLX5_CAP_GENERAL]->cur & 16711680) >> 8 |
                 (*dev->caps.hca[MLX5_CAP_GENERAL]->cur & 4278190080) >> 24
           : 0) &
      (1 << sizeof((struct mlx5_ifc_cmd_hca_cap_bits *)0)->gid_table_size) - 1);
  mlx5_query_hca_port_props_1 = rep;
}
$ clang --target=aarch64-linux-gnu -O2 -c -o /dev/null main.i
...
clang: /home/nathan/cbl/src/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:979: void (anonymous namespace)::SelectionDAGLegalize::LegalizeOp(llvm::SDNode *): Assertion `(TLI.getTypeAction(*DAG.getContext(), Op.getValueType()) == TargetLowering::TypeLegal || Op.getOpcode() == ISD::TargetConstant || Op.getOpcode() == ISD::Register) && "Unexpected illegal type!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: clang --target=aarch64-linux-gnu -O2 -c -o /dev/null main.i
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'main.i'.
4.      Running pass 'AArch64 Instruction Selection' on function '@mlx5_query_hca_port'
...

A reduced LLVM IR reproducer:

target triple = "aarch64-unknown-linux-gnu"

@mlx5_query_hca_port_props_1 = external global i16

define i32 @mlx5_query_hca_port(i16* %mlx5_query_hca_port_props_1) {
entry:
  %0 = load i32, i32* null, align 4
  %1 = and i32 %0, -65536
  %2 = tail call i32 @llvm.bswap.i32(i32 %1)
  %and16 = zext i32 %2 to i64
  %call17 = tail call i32 bitcast (i32 (...)* @gid_tbl_len to i32 (i64)*)(i64 %and16)
  store i16 0, i16* %mlx5_query_hca_port_props_1, align 4
  ret i32 0
}

declare i32 @to_mdev(...)

declare i32 @gid_tbl_len(...)

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare i32 @llvm.bswap.i32(i32) #0

attributes #0 = { nofree nosync nounwind readnone speculatable willreturn }
$ llc reduced.ll
llc: /home/nathan/cbl/src/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:979: void (anonymous namespace)::SelectionDAGLegalize::LegalizeOp(llvm::SDNode *): Assertion `(TLI.getTypeAction(*DAG.getContext(), Op.getValueType()) == TargetLowering::TypeLegal || Op.getOpcode() == ISD::TargetConstant || Op.getOpcode() == ISD::Register) && "Unexpected illegal type!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llc reduced.ll
1.      Running pass 'Function Pass Manager' on module 'reduced.ll'.
2.      Running pass 'AArch64 Instruction Selection' on function '@mlx5_query_hca_port'
...

@nathanchance Thanks for the repro, I'll get this fixed over the weekend

@nathanchance Please can you confirm if rGfadd20f80d69 fixes your build?

@nathanchance Please can you confirm if rGfadd20f80d69 fixes your build?

Yes, from my preliminary tests, everything appears okay with that fix! Thank you for the quick turnaround.