Page MenuHomePhabricator

[RFC][SelectionDAG] Add Target-Independent Compiler Barrier
Needs ReviewPublic

Authored by lenary on Dec 8 2020, 6:12 AM.

Details

Summary

While working on the RISC-V backend, we got bug reports from language frontends
that we were emitting instructions for compiler-only barriers: while emittiong
instructions wasn't incorrect in our case, it was inefficient, and contradicted
high-level language documentation which claimed that certain constructs (C++'s
std::atomic_signal_fence, C's atomic_signal_fence, Rust's
compiler_barrier) would not result in assembly instructions.

Many LLVM Target backends implement the same "MEMBARRIER" target-specific ISD
node to represent a compiler-only barrier, which we could have replicated in the
RISC-V backend, but I felt that it is now a reasonable time to have a
target-independent method for achieving the same, which backends can opt-in to
if they don't need to emit instructions.

It's not clear to me that all targets have strong enough memory models that this
should be applied across the board, but I've tried to implement it in a way that
backends can opt-in to using a target-independent barrier.

At the moment, I've modified the following backends to use the
target-independent barrier, to show how other backends could be updated:

  • X86
  • RISC-V
  • Arm
  • AArch64

I plan to write a short RFC for llvm-dev about this patch, as it has the
potential to affect many backends, including out-of-tree backends.

Diff Detail

Unit TestsFailed

TimeTest
40 msx64 debian > LLVM.CodeGen/RISCV::atomic-fence.ll
Script: -- : 'RUN: at line 2'; /mnt/disks/ssd0/agent/llvm-project/build/bin/llc -mtriple=riscv32 -verify-machineinstrs < /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/RISCV/atomic-fence.ll | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck -check-prefix=RV32I /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/RISCV/atomic-fence.ll
60 msx64 windows > LLVM.CodeGen/RISCV::atomic-fence.ll
Script: -- : 'RUN: at line 2'; c:\ws\w32-1\llvm-project\premerge-checks\build\bin\llc.exe -mtriple=riscv32 -verify-machineinstrs < C:\ws\w32-1\llvm-project\premerge-checks\llvm\test\CodeGen\RISCV\atomic-fence.ll | c:\ws\w32-1\llvm-project\premerge-checks\build\bin\filecheck.exe -check-prefix=RV32I C:\ws\w32-1\llvm-project\premerge-checks\llvm\test\CodeGen\RISCV\atomic-fence.ll
90 msx64 windows > LLVM.CodeGen/XCore::threads.ll
Script: -- : 'RUN: at line 1'; c:\ws\w32-1\llvm-project\premerge-checks\build\bin\llc.exe -march=xcore < C:\ws\w32-1\llvm-project\premerge-checks\llvm\test\CodeGen\XCore\threads.ll | c:\ws\w32-1\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w32-1\llvm-project\premerge-checks\llvm\test\CodeGen\XCore\threads.ll

Event Timeline

lenary created this revision.Dec 8 2020, 6:12 AM
lenary requested review of this revision.Dec 8 2020, 6:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2020, 6:12 AM
Herald added a subscriber: MaskRay. · View Herald Transcript
craig.topper added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
28652

Would it make sense to have a SelectionDAG::getCompilerBarrier method that would hide most of the boilerplate for creating these nodes?

lenary added inline comments.Dec 10 2020, 8:52 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
28652

Almost certainly.

One thing I did run into was I wanted to keep the ordering in the barrier around for later passes - but I ran into issues on x86 where the immediate I was using to represent it was getting allocated into a register and that caused instruction verification to fail. I could look at bringing that back if people thought it was useful (for instance, later Machine Passes may want to know if they can hoist/sink loads and stores past a barrier with more granularity than just "No").

arsenm added a subscriber: arsenm.Dec 10 2020, 8:53 AM
arsenm added inline comments.
llvm/include/llvm/Support/TargetOpcodes.def
212–215

I think the scope should also be parameterized. This also does not match the description of the defined instruction which is missing the ordering operand

arsenm added a subscriber: t-tye.Dec 10 2020, 8:53 AM
craig.topper added inline comments.Dec 10 2020, 8:57 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
28652

Can you use getTargetConstant? That should never end up in a register.

lenary added inline comments.Dec 10 2020, 9:15 AM
llvm/include/llvm/Support/TargetOpcodes.def
212–215

The documentation about ordering is out of date, I did include it before, but then ran into x86 issues, so removed it. I'm looking at bringing it back with @craig.topper's suggestions.

Because this barrier explicitly does not translate to machine code (that would be a place where you used a target-dependent fence or similar), I don't want to suggest this has any syncscope that is not singlethread, but maybe that's wrong?

llvm/lib/Target/X86/X86ISelLowering.cpp
28652

I'll try again, I cannot remember exactly what I used for that version of the code.

lenary updated this revision to Diff 310959.Dec 10 2020, 10:38 AM
  • Include Ordering Again (using getTargetConstant)
  • Add Helper for creating DAG nodes
jrtc27 added inline comments.Dec 10 2020, 10:49 AM
llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
278–279

I don't think this should be reached any more?

llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCCodeEmitter.cpp
606–607

The change to emitFunctionBody should render this dead?

llvm/test/CodeGen/X86/atomic-idempotent.ll
384

What's up with this change?

craig.topper added inline comments.Dec 10 2020, 10:57 AM
llvm/test/CodeGen/X86/atomic-idempotent.ll
384

I'm guessing that TargetOpcode::COMPILER_BARRIER is giving a different value from TargetSchedMode::computeInstrLatency() that X86::MEMBARRIER did which changed how X86PadShortFunctions.cpp computed how many NOPs to add before the return.

jrtc27 added inline comments.Dec 10 2020, 11:07 AM
llvm/test/CodeGen/X86/atomic-idempotent.ll
384

I guess the question then is "which is better?".

craig.topper added inline comments.Dec 10 2020, 11:14 AM
llvm/test/CodeGen/X86/atomic-idempotent.ll
384

I think we were counting a "compiler barrier" that doesn't generate code as having a non-zero or multiple cycle latency before. So this change is probably better.

bcain added a subscriber: bcain.Dec 10 2020, 11:59 AM
lenary updated this revision to Diff 311414.Dec 12 2020, 3:56 PM
  • Address Minor Review Comments, Method Naming issues.
jrtc27 added inline comments.Dec 12 2020, 4:01 PM
llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCCodeEmitter.cpp
607–608

"both" no longer applies

lenary updated this revision to Diff 311415.Dec 12 2020, 4:20 PM
  • Comments and Whitespace
lenary marked 5 inline comments as done.Dec 12 2020, 4:29 PM
lkail added a subscriber: lkail.Dec 17 2020, 10:58 PM

Ping

@arsenm Please may you clarify why you think the scope is still useful to represent this late?

skan added a subscriber: skan.Jan 24 2021, 11:13 PM
skan added a comment.Jan 25 2021, 1:29 AM

FYI, this patch also fix the bug that "X86::Int_MemBarrier should be a MetaInstruction".

skan added a comment.Jan 31 2021, 5:54 PM

@lenary Do you need to update something? There are three tests fail according to the log.

@lenary Do you need to update something? There are three tests fail according to the log.

I'm waiting for some confirmation that people are happy with the approach. I've had a little feedback but not much, and hardly any actual code review.

Can you link to the bug you are referencing? With a bug to point to, it'll be easier to push for this to land.

skan added a comment.Wed, Feb 17, 6:18 PM

Can you link to the bug you are referencing? With a bug to point to, it'll be easier to push for this to land.

The failed LIT tests in the remote tests are "x64 debian > LLVM.CodeGen/RISCV::atomic-fence.ll", "x64 windows > LLVM.CodeGen/RISCV::atomic-fence.ll", "x64 windows > LLVM.CodeGen/XCore::threads.ll", I think you can see them on this page.

The bug "X86::Int_MemBarrier should be a MetaInstruction" is an internal bug, I found it when I was doing some development work about debug info personally. Although I could not give a failed test, here is a way to trigger that:

define dso_local i32 @main() !dbg !7 {
entry:
  fence acquire, !dbg !9
  fence release, !dbg !9
  %retval = alloca i32, align 4
  store i32 0, i32* %retval, align 4
  ret i32 0, !dbg !9
}

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!3, !4, !5}
!llvm.ident = !{!6}

!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 11.0.0", isOptimized: false, runtimeVersion: 0, emissionKind: DebugDirectivesOnly, enums: !2, splitDebugInlining: false, nameTableKind: None)
!1 = !DIFile(filename: "none.c", directory: "/temp")
!2 = !{}
!3 = !{i32 2, !"PersonalFlag", i32 1}
!4 = !{i32 2, !"Debug Info Version", i32 3}
!5 = !{i32 1, !"wchar_size", i32 4}
!6 = !{!"clang version 11.0.0"}
!7 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 1, type: !8, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
!8 = !DISubroutineType(types: !2)
!9 = !DILocation(line: 1, column: 14, scope: !7)
llc -O0 -mtriple x86_64-linux-gnu  test.ll

IR fence acquire and fence release is lowered to X86::Int_MemBarrier, and the interface bool isMetaInstruction() const returns false for it. So if you add an assert
for this, the program will crash.