This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Improve bswap optimization
ClosedPublic

Authored by austin880625 on May 2 2023, 3:27 PM.

Details

Summary

The implementation of D149577

The patch implements a helper function that matches and fold the following cases in the InstCombine pass:

  1. bswap(logic_op(x, bswap(y))) -> logic_op(bswap(x), y)
  2. bswap(logic_op(bswap(x), y)) -> logic_op(x, bswap(y))
  3. bswap(logic_op(bswap(x), bswap(y))) -> logic_op(x, y) in multiuse case, which still reduces the number of instructions.

The helper function accepts bswap and bitreverse intrinsics. This patch folds the bswap cases and remain the bitreverse optimization for the future

Diff Detail

Event Timeline

austin880625 created this revision.May 2 2023, 3:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2023, 3:27 PM
austin880625 requested review of this revision.May 2 2023, 3:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2023, 3:27 PM
austin880625 edited the summary of this revision. (Show Details)May 2 2023, 3:35 PM
goldstein.w.n added inline comments.May 2 2023, 3:59 PM
llvm/lib/Analysis/InstructionSimplify.cpp
6029 ↗(On Diff #518884)

"bswap/bitreverse" -> "bswap"

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10756 ↗(On Diff #518884)

Likewise a helper here so it can be easily extended to handle bitreverse/any other intrinsics that it applies to.
Also, there are no tests for the DAGCombiner changes at the moment.

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
70 ↗(On Diff #518884)

What is this change for?

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1334

Instead of making this a member function. Can you leave it static. To create new instructions you can add an extra argument InstCombiner::BuilderTy &Builder.

It should look like: InstCombineAndOrXor.cc::reassociateFCmps

1775
if (Instruction *crossLogicOpFold = foldBitOrderCrossLogicOp<Intrinsic::bswap>(IIOperand))
   return crossLogicOpFold;
austin880625 added inline comments.May 3 2023, 9:51 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
70 ↗(On Diff #518884)

if one of the operand has more than one use, then the transform will not reduce the number of instructions(can look at the bs_and64_multiuse* test case diff) and in DAGCombiner the same optimization also uses || instead of &&:

// Unary ops: logic_op (bswap x), (bswap y) --> bswap (logic_op x, y)
if (HandOpcode == ISD::BSWAP) {
  // If either operand has other uses, this transform is not an improvement.
  if (!N0.hasOneUse() || !N1.hasOneUse())
    return SDValue();
  SDValue Logic = DAG.getNode(LogicOpcode, DL, XVT, X, Y);
  return DAG.getNode(HandOpcode, DL, VT, Logic);                                   
}

I noticed it just because I was referencing D6407 , It's minor and not directly related to the improvement this time. I'm not sure if this kind of change can be included.

goldstein.w.n added inline comments.May 3 2023, 10:25 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
70 ↗(On Diff #518884)

if one of the operand has more than one use, then the transform will not reduce the number of instructions(can look at the bs_and64_multiuse* test case diff) and in DAGCombiner the same optimization also uses || instead of &&:

// Unary ops: logic_op (bswap x), (bswap y) --> bswap (logic_op x, y)
if (HandOpcode == ISD::BSWAP) {
  // If either operand has other uses, this transform is not an improvement.
  if (!N0.hasOneUse() || !N1.hasOneUse())
    return SDValue();
  SDValue Logic = DAG.getNode(LogicOpcode, DL, XVT, X, Y);
  return DAG.getNode(HandOpcode, DL, VT, Logic);                                   
}

I noticed it just because I was referencing D6407 , It's minor and not directly related to the improvement this time. I'm not sure if this kind of change can be included.

It should be a seperate patch.

Its generally okay for a combine to result in the same amount of instructions so if you have a case where this is useful its fine to post as another patch.

nikic added inline comments.May 3 2023, 1:01 PM
llvm/lib/Analysis/InstructionSimplify.cpp
6032 ↗(On Diff #518884)

I don't think we need this comment... This is not the kind of optimization that needs extra justification in the code.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10756 ↗(On Diff #518884)

In any case, this should be part of a separate patch, not mixed with InstCombine changes.

austin880625 updated this revision to Diff 519223.EditedMay 3 2023, 1:07 PM
austin880625 retitled this revision from [InstCombine] Improve bswap optimization to [InstCombine][DAGCombiner] Improve bswap optimization.

add the diff on the test cases for CodeGen(DAGCombiner)

move the pattern matching in DAGCombiner into a helper function that also accepts BITREVERSE operation in future improvement

austin880625 added inline comments.May 3 2023, 1:11 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10756 ↗(On Diff #518884)

oh I just see the new comment. Let me separate this again

austin880625 added inline comments.May 3 2023, 1:19 PM
llvm/lib/Analysis/InstructionSimplify.cpp
6032 ↗(On Diff #518884)

I just see some discussions where this optimization was mentioned, so thought about this comment might help whoever run into it
https://github.com/dotnet/runtime/issues/66249
https://reviews.llvm.org/D6407#change-hPSGuibm8ixX

If this is still not needed, I'll remove it together in the next version of the patch

austin880625 updated this revision to Diff 519240.EditedMay 3 2023, 1:57 PM
austin880625 retitled this revision from [InstCombine][DAGCombiner] Improve bswap optimization to [InstCombine] Improve bswap optimization.

includes changes for InstCombine only, the DAGCombiner implementation is moved to a separate rivision D149783

Can you update the summary to explain the transform.
Other than that looks good.

RKSimon added inline comments.May 4 2023, 6:04 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1347

match(X, m_OneUse(m_Intrinsic<IntrID>(m_Value(OldReorder)))) ?

1350

match(Y, m_OneUse(m_Intrinsic<IntrID>(m_Value(OldReorder)))) ?

goldstein.w.n added inline comments.May 4 2023, 8:56 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1355

In that case that both X and Y match the intrinsic, we should still do the optimization even if X/Y have multi-use.
I.e

declare i16 @llvm.bswap.i16(i16)
declare void @use.i16(i16)
define i16 @bs_and_lhs_bs16(i16 %x, i16 %y) {
  %bx = tail call i16 @llvm.bswap.i16(i16 %x)
  %by = tail call i16 @llvm.bswap.i16(i16 %y)
  %r = and i16 %bx, %by
  %br = tail call i16 @llvm.bswap.i16(i16 %r)

  call void @use.i16(i16 %bx)
  call void @use.i16(i16 %by)    
  ret i16 %br
}

Should become:

declare i16 @llvm.bswap.i16(i16)
declare void @use.i16(i16)
define i16 @bs_and_lhs_bs16(i16 %x, i16 %y) {
  %bx = tail call i16 @llvm.bswap.i16(i16 %x)
  %by = tail call i16 @llvm.bswap.i16(i16 %y)
  %r = and i16 %x, %y
  call void @use.i16(i16 %bx)
  call void @use.i16(i16 %by)    
  ret i16 %r
}

I think an initiali if statement that check for that case would work.

austin880625 edited the summary of this revision. (Show Details)May 4 2023, 12:59 PM

handles bswap(logic_op(bswap(x), bswap(y))) -> logic_op(x, y) case with multi-use

goldstein.w.n accepted this revision.May 4 2023, 8:29 PM

LGTM. Note, I'm not a maintainer so please wait a day or so before pushing so others can check it out.

Maybe follow up patch to enable for bitreverse?

This revision is now accepted and ready to land.May 4 2023, 8:29 PM
RKSimon accepted this revision.May 6 2023, 9:08 AM

LGTM with one minor

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
1334

Can we just use a static_assert here since its a template arg?

Use the static_assert for the intrinsic ID check in compile time

RKSimon accepted this revision.May 6 2023, 2:53 PM

LGTM - cheers

This revision was landed with ongoing or failed builds.May 7 2023, 6:28 AM
This revision was automatically updated to reflect the committed changes.

This patch causes an assertion failure when building the Linux kernel for PowerPC. A reduced LLVM IR reproducer (and the C reproducer it was derived from):

target datalayout = "e-m:e-Fn32-i64:64-n32:64-S128-v256:256:256-v512:512:512"
target triple = "powerpc64le-unknown-linux-gnu"

@_end = external global [0 x i8]

define void @__attribute__kexec_setup(ptr %kernel_end) {
entry:
  %0 = call i64 @llvm.bswap.i64(i64 and (i64 ptrtoint (ptr @_end to i64), i64 4095))
  store i64 %0, ptr %kernel_end, align 8
  ret void
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i64 @llvm.bswap.i64(i64) #0

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
extern char _end[];
long kernel_end;
void __attribute__kexec_setup() {
  kernel_end = __builtin_bswap64(({ (long)_end & 4095; }));
}
$ opt -O3 -disable-output reduced.ll
opt: /mnt/nvme/tmp/cvise.COqG3u2BwX/src/llvm/include/llvm/Support/Casting.h:578: decltype(auto) llvm::cast(From *) [To = llvm::BinaryOperator, From = llvm::Value]: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: opt -O3 -disable-output reduced.ll
 #0 0x00005628ee047568 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/mnt/nvme/tmp/cvise.COqG3u2BwX/install/llvm-bad/bin/opt+0x4b8b568)
 #1 0x00005628ee0452ee llvm::sys::RunSignalHandlers() (/mnt/nvme/tmp/cvise.COqG3u2BwX/install/llvm-bad/bin/opt+0x4b892ee)
 #2 0x00005628ee047d28 SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fd4ef6efab0 (/usr/lib/libc.so.6+0x39ab0)
 #4 0x00007fd4ef73f26c (/usr/lib/libc.so.6+0x8926c)
 #5 0x00007fd4ef6efa08 raise (/usr/lib/libc.so.6+0x39a08)
 #6 0x00007fd4ef6d8538 abort (/usr/lib/libc.so.6+0x22538)
 #7 0x00007fd4ef6d845c (/usr/lib/libc.so.6+0x2245c)
 #8 0x00007fd4ef6e83d6 (/usr/lib/libc.so.6+0x323d6)
 #9 0x00005628edc4a2c2 llvm::Instruction* foldBitOrderCrossLogicOp<9u>(llvm::Value*, llvm::IRBuilder<llvm::TargetFolder, llvm::IRBuilderCallbackInserter>&) InstCombineCalls.cpp:0:0
#10 0x00005628edc46269 llvm::InstCombinerImpl::visitCallInst(llvm::CallInst&) InstCombineCalls.cpp:0:0
#11 0x00005628edbec59f llvm::InstCombinerImpl::run() InstructionCombining.cpp:0:0
#12 0x00005628edbeee45 combineInstructionsOverFunction(llvm::Function&, llvm::InstructionWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::OptimizationRemarkEmitter&, llvm::BlockFrequencyInfo*, llvm::ProfileSummaryInfo*, unsigned int, llvm::LoopInfo*) InstructionCombining.cpp:0:0
#13 0x00005628edbee605 llvm::InstCombinePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/mnt/nvme/tmp/cvise.COqG3u2BwX/install/llvm-bad/bin/opt+0x4732605)
#14 0x00005628ee24841d llvm::detail::PassModel<llvm::Function, llvm::InstCombinePass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#15 0x00005628edb00004 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/mnt/nvme/tmp/cvise.COqG3u2BwX/install/llvm-bad/bin/opt+0x4644004)
#16 0x00005628ec6ba78d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) BPFTargetMachine.cpp:0:0
#17 0x00005628edb04273 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/mnt/nvme/tmp/cvise.COqG3u2BwX/install/llvm-bad/bin/opt+0x4648273)
#18 0x00005628ec6ba52d llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) BPFTargetMachine.cpp:0:0
#19 0x00005628edaff1f4 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/mnt/nvme/tmp/cvise.COqG3u2BwX/install/llvm-bad/bin/opt+0x46431f4)
#20 0x00005628ec1a08eb llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool) (/mnt/nvme/tmp/cvise.COqG3u2BwX/install/llvm-bad/bin/opt+0x2ce48eb)
#21 0x00005628ec1b0111 main (/mnt/nvme/tmp/cvise.COqG3u2BwX/install/llvm-bad/bin/opt+0x2cf4111)
#22 0x00007fd4ef6d9850 (/usr/lib/libc.so.6+0x23850)
#23 0x00007fd4ef6d990a __libc_start_main (/usr/lib/libc.so.6+0x2390a)
#24 0x00005628ec19a265 _start (/mnt/nvme/tmp/cvise.COqG3u2BwX/install/llvm-bad/bin/opt+0x2cde265)

I'm not able to reproduce the failure with the given IR. Does the failure persist if the and in the bswap are moved out as a separate instruction?
Anyway maybe we should use dyn_cast with if check

I'm not able to reproduce the failure with the given IR. Does the failure persist if the and in the bswap are moved out as a separate instruction?
Anyway maybe we should use dyn_cast with if check

I'm going to revert this for now so you can post the fix.
You can re-open this commit when you are ready.

The following:

-  if (match(V, m_OneUse(m_BitwiseLogic(m_Value(X), m_Value(Y))))) {
+  // Find bitwise logic op. Check that it is a BinaryOperator explicitly so we
+  // don't match ConstantExpr that aren't meaningful for this transform.
+  if (match(V, m_OneUse(m_BitwiseLogic(m_Value(X), m_Value(Y)))) &&
+      isa<BinaryOperator>(V)) {

works for me and leaves both:

@gp = external global [0 x i8]
+
+define void @foo(ptr %out, i64 %a) {
+  %gpi = ptrtoint ptr @gp to i64
+  %exp = and i64 %gpi, 4095
+  %res = call i64 @llvm.bswap.i64(i64 %exp)
+  store i64 %res, ptr %out, align 8
+  ret void
+}
+
+
+define void @bar(ptr %out, i64 %a) {
+  %gpi = ptrtoint ptr @gp to i64
+  %bs_gpi = call i64 @llvm.bswap.i64(i64 %gpi)
+  %exp = and i64 %bs_gpi, 4095
+  %res = call i64 @llvm.bswap.i64(i64 %exp)
+  store i64 %res, ptr %out, align 8
+  ret void
+}
+

working ideally.

Oh I reproduced it in Debug build but cannot in Release build. I might be able to troubleshoot in detail in the following day

Add explicit check for BinaryOperator and add ConstExpr test case

try to update the patch base as previous one failed

goldstein.w.n reopened this revision.May 8 2023, 10:26 PM
This revision is now accepted and ready to land.May 8 2023, 10:26 PM

try to update the patch base as previous one failed

Should work now. You need to reopen the revision.

Sorry if inappropriate, but does this patch need further actions/improvement on my side?

Sorry if inappropriate, but does this patch need further actions/improvement on my side?

Sorry! I've been dragging my feet on this. It looks good. I will test this and the DAGCombiner patch today/tonight.
If you haven't heard back in 24hr ping back.

Its not inappropriate at all to ping. roughly once a week is appropriate at all times unless there are
unaddressed feedbacks to already handle. Sometime more often is appropriate as well if its say for
an urgent bug.

goldstein.w.n accepted this revision.May 15 2023, 6:58 PM

LGTM. Do you need me to commit?

Austin, can you give me a commit message for the re-push.

Something like

'Recommit "<original tilte>" (2nd Try)

<Explain why it now works>'

Then I'll push this and the bswap patches for you.

Thanks for helping out! Here's the proposed commit message

Recommit: [InstCombine] Improve bswap optimization(2nd try)

Check the operator is BinaryOperator before cast so that we won't match ConstExpr

Thanks for helping out! Here's the proposed commit message

Recommit: [InstCombine] Improve bswap optimization(2nd try)

Check the operator is BinaryOperator before cast so that we won't match ConstExpr

Does:

Recommit "[InstCombine] Improve bswap optimization" (2nd try)

Issue was assertion failure due to an unchecked `cast`. Fix is to check the operator is
`BinaryOperator` before cast so that we won't match `ConstExpr`

Sound okay? Just to reference what the original issue was.

Thanks for helping out! Here's the proposed commit message

Recommit: [InstCombine] Improve bswap optimization(2nd try)

Check the operator is BinaryOperator before cast so that we won't match ConstExpr

Does:

Recommit "[InstCombine] Improve bswap optimization" (2nd try)

Issue was assertion failure due to an unchecked `cast`. Fix is to check the operator is
`BinaryOperator` before cast so that we won't match `ConstExpr`

Sound okay? Just to reference what the original issue was.

BTW: Austin Chang <austin880625@gmail.com>

Thanks for helping out! Here's the proposed commit message

Recommit: [InstCombine] Improve bswap optimization(2nd try)

Check the operator is BinaryOperator before cast so that we won't match ConstExpr

Does:

Recommit "[InstCombine] Improve bswap optimization" (2nd try)

Issue was assertion failure due to an unchecked `cast`. Fix is to check the operator is
`BinaryOperator` before cast so that we won't match `ConstExpr`

Sound okay? Just to reference what the original issue was.

OK, let's proceed with this one

This revision was landed with ongoing or failed builds.May 16 2023, 4:58 PM
This revision was automatically updated to reflect the committed changes.