Page MenuHomePhabricator

[X86] Use bit test instructions to optimize some logic atomic operations
AbandonedPublic

Authored by pengfei on Dec 29 2021, 7:32 AM.

Details

Summary

This is to match GCC's optimizations: https://gcc.godbolt.org/z/3odh9e7WE

Diff Detail

Unit TestsFailed

TimeTest
60,130 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vloxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -triple riscv64 -target-feature +f -target-feature +d -target-feature +zfh -target-feature +v -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vloxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vloxseg.c
60,140 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vluxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -triple riscv64 -target-feature +f -target-feature +d -target-feature +zfh -target-feature +v -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vluxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vluxseg.c
60,170 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-overloaded::vluxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -triple riscv64 -target-feature +f -target-feature +d -target-feature +v -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-overloaded/vluxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics-overloaded/vluxseg.c

Event Timeline

pengfei created this revision.Dec 29 2021, 7:32 AM
pengfei requested review of this revision.Dec 29 2021, 7:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 29 2021, 7:32 AM
craig.topper added inline comments.Dec 29 2021, 2:36 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
30332

Probably want to make sure I is in the same BB given the SelectionDAG restrictions?

30341

Is this the same as ~C1->getValue() == C2->getValue()?

31115

Why does it matter if the AND has more than one use? Or were you trying to check that the ATOMIC_LOAD has only one use of the Result 0?

31151

Use the getSETCC function from this file.

31153

Don't you need getZExtOrTrunc here? You aren't guaranteed to shift out the extended bits so you need to make sure they are 0 like the and would have done.

llvm/test/CodeGen/X86/atomic-bit-test.ll
292

Was -1 supposed to be -2?

332

Was -2 supposed to be -3?

pengfei updated this revision to Diff 396629.Dec 30 2021, 1:06 AM

Great review! Thanks Craig.

llvm/lib/Target/X86/X86ISelLowering.cpp
30332

Yes, good catch!

30341

Good idea!

31115

Yes, good catch!

31151

Good idea!

31153

Yes. I misunderstood of AnyExt. Good catch!

llvm/test/CodeGen/X86/atomic-bit-test.ll
292

Good catch!

332

Good catch!

craig.topper added inline comments.Jan 7 2022, 12:54 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
31078

Verify that operand 1 is a constant too.

31079

My big concern is that we rely on the checks done in IR holding until SelectionDAG. If that fails and a user hits this fatal error, how would they know how to fix their code?

Did you give any thought about converting the patterns into a target specific intrinsic in IR? That would probably need to be a new hook in the AtomicExpandPass or a separate X86 pass.

@RKSimon or @spatel what are your thoughts?

pengfei added inline comments.Jan 8 2022, 4:08 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
31078

This is not used to check constantency but to check the code is entered once only (C2 hasn't been assigned to a constant value).
We check the constantency through cast<ConstantSDNode>

31079

Did you give any thought about converting the patterns into a target specific intrinsic in IR? That would probably need to be a new hook in the AtomicExpandPass or a separate X86 pass.

I'm not sure if we can simply use target specific intrinsic here. Atomic instruction has attributes like ordering that need special handling.

My big concern is that we rely on the checks done in IR holding until SelectionDAG. If that fails and a user hits this fatal error, how would they know how to fix their code?

We mainly check 3 aspects here:

  1. One use, constantency and consistency of the constants.
    • I don't think they will be changed given there are no room to optimize them.
  2. AND operation.
    • It won't be changed either. The reason is the same. But we might have chance to roll back the AND if it changed.
  3. AND operation been splitted to a different BB.
    • There may be chance to happen in reality. We can always hoise it by a separate pass before SelectionDAG. I'd like to leave it till it happens.

What do you think?

This test triggers the fatal error

#include <atomic>

std::atomic<int> x;

void bar();
void baz();

void foo(int a, int b) {
  int y = std::atomic_fetch_or(&x, 8);
  if (y & 8 && a)
    bar();
  if ((y & 8) && b)
    baz();
}
pengfei updated this revision to Diff 398392.Jan 8 2022, 8:35 PM

Break AND + CMP 0 fold for atomic RMW logic instructions. This fixes the problem in Craig's example. Thanks Craig!

This test fails

%"struct.std::atomic" = type { %"struct.std::__atomic_base" }                    
%"struct.std::__atomic_base" = type { i32 }                                      
                                                                                 
@x = global %"struct.std::atomic" zeroinitializer, align 4                       
                                                                                 
define i32 @_Z3fooii(i32 %0, i32 %1) {                                           
  %3 = atomicrmw or i32* getelementptr inbounds (%"struct.std::atomic", %"struct.std::atomic"* @x, i64 0, i32 0, i32 0), i32 8 seq_cst, align 4
  %4 = tail call i32 @llvm.ctlz.i32(i32 %0, i1 false)                            
  %5 = and i32 %3, 8                                                             
  %6 = icmp ne i32 %5, 0                                                         
  %7 = icmp ne i32 %1, 0                                                         
  %8 = select i1 %6, i1 %7, i1 false                                             
  br i1 %8, label %9, label %10                                                  
                                                                                 
9:                                                ; preds = %2                   
  tail call void @_Z3bazi(i32 %4)                                                
  br label %10                                                                   
                                                                                 
10:                                               ; preds = %9, %2               
  ret i32 %4                                                                     
}                                                                                
                                                                                 
declare i32 @llvm.ctlz.i32(i32, i1 immarg)                                       
                                                                                 
declare void @_Z3bazi(i32)

CodeGenPrepare splits the basic block to despeculate the ctlz.

pengfei updated this revision to Diff 398418.Jan 9 2022, 5:07 AM

Add a new pass to hosit AND instructions that been split by other passes.
This fixes the failed example. Thanks a lot for Craig's helps!

craig.topper added inline comments.Jan 10 2022, 11:21 AM
llvm/lib/Target/X86/X86AtomicANDHoist.cpp
62

It might not be safe to move the And if the other operand isn't a constant. There's no guarantee that the atomic instruction occurs after the source of the other operand.

pengfei updated this revision to Diff 398911.Jan 11 2022, 4:43 AM
pengfei marked an inline comment as done.

Add constant check for AND operand.

llvm/lib/Target/X86/X86AtomicANDHoist.cpp
62

Makes sense. Thank you!

pre-commit the tests and rebase so this patch shows the codegen diff?

llvm/lib/Target/X86/X86ISelLowering.h
1647

Log?

reverse ping?

craig.topper added inline comments.Feb 6 2022, 11:55 AM
llvm/lib/Target/X86/X86ISelLowering.h
1647

I assumed it meant Logic

pengfei updated this revision to Diff 406313.Feb 6 2022, 6:39 PM

Rename Log to Logic;
Pre-commit the tests and rebased on it;

This breaks -O0. Fast isel consumes the And, then we fall back to SelectionDAG, but the And is gone so it triggers the fatal error.

I'm uncomfortable with the cross pass requirements this approach depends on. It seems very fragile.

This breaks -O0. Fast isel consumes the And, then we fall back to SelectionDAG, but the And is gone so it triggers the fatal error.

I'm uncomfortable with the cross pass requirements this approach depends on. It seems very fragile.

Thanks Craig! An alternative implementation D120199.

pengfei abandoned this revision.Mar 6 2022, 11:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2022, 11:51 PM