This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Add Target-Independent Compiler Barrier
AbandonedPublic

Authored by lenary on Dec 8 2020, 6:12 AM.

Details

Summary

Several different target backends end up implementing compiler-only
barrier pseudo-instructions which eventually expand into an assembly
comment.

These come from for example C's atomic_signal_fence, C++'s
std::atomic_signal_fence, or Rust's compiler_barrier which are all
documented to not result in assembly instructions. While emitting a
fence instruction (or equivalent) is not incorrect within usual memory
models, it is inefficient. Depending on the target configuration, these
usually turn into atomic_fence before SelectionDAG (with syncscope of
"singlethread"), where they are then turned first into a target-specific
ISD node, and then into a pseudo-instruction which prints as a comment.

Most LLVM Target backends seem to have copied a MEMBARRIER
target-specific ISD node, so every new LLVM target needs to replicate
all these parts to get the same behaviour. Some targets don't do this,
and end up emitting fence instructions without paying attention to the
syncscope fence argument.

LLVM, however, has target-independent machine opcodes, which have a
variety of uses in LLVM's backend code. This patch adds a
COMPILER_BARRIER target-independent opcode to represent a compiler-only
barrier, which does not generate instructions. This opcode still
contains information about the syncscope and ordering of the original
atomic_fence, so that backend-specific optimisations still have this
information.

This patch also refactors existing instances of target-specific
MEMBARRIERs to use the new target-independent COMPILER_BARRIER.

Diff Detail

Event Timeline

lenary created this revision.Dec 8 2020, 6:12 AM
lenary requested review of this revision.Dec 8 2020, 6:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2020, 6:12 AM
Herald added a subscriber: MaskRay. · View Herald Transcript
craig.topper added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
29204

Would it make sense to have a SelectionDAG::getCompilerBarrier method that would hide most of the boilerplate for creating these nodes?

lenary added inline comments.Dec 10 2020, 8:52 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
29204

Almost certainly.

One thing I did run into was I wanted to keep the ordering in the barrier around for later passes - but I ran into issues on x86 where the immediate I was using to represent it was getting allocated into a register and that caused instruction verification to fail. I could look at bringing that back if people thought it was useful (for instance, later Machine Passes may want to know if they can hoist/sink loads and stores past a barrier with more granularity than just "No").

arsenm added a subscriber: arsenm.Dec 10 2020, 8:53 AM
arsenm added inline comments.
llvm/include/llvm/Support/TargetOpcodes.def
220–223

I think the scope should also be parameterized. This also does not match the description of the defined instruction which is missing the ordering operand

arsenm added a subscriber: t-tye.Dec 10 2020, 8:53 AM
craig.topper added inline comments.Dec 10 2020, 8:57 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
29204

Can you use getTargetConstant? That should never end up in a register.

lenary added inline comments.Dec 10 2020, 9:15 AM
llvm/include/llvm/Support/TargetOpcodes.def
220–223

The documentation about ordering is out of date, I did include it before, but then ran into x86 issues, so removed it. I'm looking at bringing it back with @craig.topper's suggestions.

Because this barrier explicitly does not translate to machine code (that would be a place where you used a target-dependent fence or similar), I don't want to suggest this has any syncscope that is not singlethread, but maybe that's wrong?

llvm/lib/Target/X86/X86ISelLowering.cpp
29204

I'll try again, I cannot remember exactly what I used for that version of the code.

lenary updated this revision to Diff 310959.Dec 10 2020, 10:38 AM
  • Include Ordering Again (using getTargetConstant)
  • Add Helper for creating DAG nodes
jrtc27 added inline comments.Dec 10 2020, 10:49 AM
llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
286–287

I don't think this should be reached any more?

llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCCodeEmitter.cpp
610–611

The change to emitFunctionBody should render this dead?

llvm/test/CodeGen/X86/atomic-idempotent.ll
384

What's up with this change?

craig.topper added inline comments.Dec 10 2020, 10:57 AM
llvm/test/CodeGen/X86/atomic-idempotent.ll
384

I'm guessing that TargetOpcode::COMPILER_BARRIER is giving a different value from TargetSchedMode::computeInstrLatency() that X86::MEMBARRIER did which changed how X86PadShortFunctions.cpp computed how many NOPs to add before the return.

jrtc27 added inline comments.Dec 10 2020, 11:07 AM
llvm/test/CodeGen/X86/atomic-idempotent.ll
384

I guess the question then is "which is better?".

craig.topper added inline comments.Dec 10 2020, 11:14 AM
llvm/test/CodeGen/X86/atomic-idempotent.ll
384

I think we were counting a "compiler barrier" that doesn't generate code as having a non-zero or multiple cycle latency before. So this change is probably better.

bcain added a subscriber: bcain.Dec 10 2020, 11:59 AM
lenary updated this revision to Diff 311414.Dec 12 2020, 3:56 PM
  • Address Minor Review Comments, Method Naming issues.
jrtc27 added inline comments.Dec 12 2020, 4:01 PM
llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCCodeEmitter.cpp
612

"both" no longer applies

lenary updated this revision to Diff 311415.Dec 12 2020, 4:20 PM
  • Comments and Whitespace
lenary marked 5 inline comments as done.Dec 12 2020, 4:29 PM
lkail added a subscriber: lkail.Dec 17 2020, 10:58 PM

Ping

@arsenm Please may you clarify why you think the scope is still useful to represent this late?

skan added a subscriber: skan.Jan 24 2021, 11:13 PM
skan added a comment.Jan 25 2021, 1:29 AM

FYI, this patch also fix the bug that "X86::Int_MemBarrier should be a MetaInstruction".

skan added a comment.Jan 31 2021, 5:54 PM

@lenary Do you need to update something? There are three tests fail according to the log.

@lenary Do you need to update something? There are three tests fail according to the log.

I'm waiting for some confirmation that people are happy with the approach. I've had a little feedback but not much, and hardly any actual code review.

Can you link to the bug you are referencing? With a bug to point to, it'll be easier to push for this to land.

skan added a comment.Feb 17 2021, 6:18 PM

Can you link to the bug you are referencing? With a bug to point to, it'll be easier to push for this to land.

The failed LIT tests in the remote tests are "x64 debian > LLVM.CodeGen/RISCV::atomic-fence.ll", "x64 windows > LLVM.CodeGen/RISCV::atomic-fence.ll", "x64 windows > LLVM.CodeGen/XCore::threads.ll", I think you can see them on this page.

The bug "X86::Int_MemBarrier should be a MetaInstruction" is an internal bug, I found it when I was doing some development work about debug info personally. Although I could not give a failed test, here is a way to trigger that:

define dso_local i32 @main() !dbg !7 {
entry:
  fence acquire, !dbg !9
  fence release, !dbg !9
  %retval = alloca i32, align 4
  store i32 0, i32* %retval, align 4
  ret i32 0, !dbg !9
}

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!3, !4, !5}
!llvm.ident = !{!6}

!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 11.0.0", isOptimized: false, runtimeVersion: 0, emissionKind: DebugDirectivesOnly, enums: !2, splitDebugInlining: false, nameTableKind: None)
!1 = !DIFile(filename: "none.c", directory: "/temp")
!2 = !{}
!3 = !{i32 2, !"PersonalFlag", i32 1}
!4 = !{i32 2, !"Debug Info Version", i32 3}
!5 = !{i32 1, !"wchar_size", i32 4}
!6 = !{!"clang version 11.0.0"}
!7 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 1, type: !8, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
!8 = !DISubroutineType(types: !2)
!9 = !DILocation(line: 1, column: 14, scope: !7)
llc -O0 -mtriple x86_64-linux-gnu  test.ll

IR fence acquire and fence release is lowered to X86::Int_MemBarrier, and the interface bool isMetaInstruction() const returns false for it. So if you add an assert
for this, the program will crash.

lenary updated this revision to Diff 351798.Jun 14 2021, 2:19 AM

Updated based on review feedback:

  • Migrated more targets to COMPILER_BARRIER
  • Added syncscope argument to COMPILER_BARRIER
lenary retitled this revision from [RFC][SelectionDAG] Add Target-Independent Compiler Barrier to [SelectionDAG] Add Target-Independent Compiler Barrier.Jun 14 2021, 2:19 AM
lenary edited the summary of this revision. (Show Details)
lenary added inline comments.Jun 14 2021, 2:29 AM
llvm/include/llvm/Support/TargetOpcodes.def
220

Sorry, I missed this comment

lenary added a subscriber: simoll.

I'm looking for reviewers from the following targets for this change:

Yes, I will look at this for the XCore target.

XCore change LGTM. (Commenting on XCore only.)

llvm/include/llvm/Target/TargetSelectionDAG.td defines SDTMemBarrier. This seems to have been introduced in 9b254eed32028 for ISD::MEMBARRIER, which was removed some time ago. I can't find a use of it and I can build without it. Could SDTMemBarrier be removed and would it belong in this patch?

llvm/include/llvm/Target/TargetSelectionDAG.td defines SDTMemBarrier. This seems to have been introduced in 9b254eed32028 for ISD::MEMBARRIER, which was removed some time ago. I can't find a use of it and I can build without it. Could SDTMemBarrier be removed and would it belong in this patch?

Yes, that makes sense to me, I think I removed some target-specific TypeProfiles which were now unused, so it makes sense to remove this too.

Reverse ping. Is there any blocker issue for this patch landing?

skan added a comment.Mar 16 2022, 10:39 PM

Can you link to the bug you are referencing? With a bug to point to, it'll be easier to push for this to land.

The failed LIT tests in the remote tests are "x64 debian > LLVM.CodeGen/RISCV::atomic-fence.ll", "x64 windows > LLVM.CodeGen/RISCV::atomic-fence.ll", "x64 windows > LLVM.CodeGen/XCore::threads.ll", I think you can see them on this page.

The bug "X86::Int_MemBarrier should be a MetaInstruction" is an internal bug, I found it when I was doing some development work about debug info personally. Although I could not give a failed test, here is a way to trigger that:

define dso_local i32 @main() !dbg !7 {
entry:
  fence acquire, !dbg !9
  fence release, !dbg !9
  %retval = alloca i32, align 4
  store i32 0, i32* %retval, align 4
  ret i32 0, !dbg !9
}

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!3, !4, !5}
!llvm.ident = !{!6}

!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 11.0.0", isOptimized: false, runtimeVersion: 0, emissionKind: DebugDirectivesOnly, enums: !2, splitDebugInlining: false, nameTableKind: None)
!1 = !DIFile(filename: "none.c", directory: "/temp")
!2 = !{}
!3 = !{i32 2, !"PersonalFlag", i32 1}
!4 = !{i32 2, !"Debug Info Version", i32 3}
!5 = !{i32 1, !"wchar_size", i32 4}
!6 = !{!"clang version 11.0.0"}
!7 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 1, type: !8, scopeLine: 1, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
!8 = !DISubroutineType(types: !2)
!9 = !DILocation(line: 1, column: 14, scope: !7)
llc -O0 -mtriple x86_64-linux-gnu  test.ll

IR fence acquire and fence release is lowered to X86::Int_MemBarrier, and the interface bool isMetaInstruction() const returns false for it. So if you add an assert
for this, the program will crash.

A LIT test is provided at D121879.

Reverse ping. Is there any blocker issue for this patch landing?

iirc, I have some planned changes and haven't got back to them recently. In the meantime, hopefully @skan's patch tides you over.

Reverse ping. Is there any blocker issue for this patch landing?

iirc, I have some planned changes and haven't got back to them recently. In the meantime, hopefully @skan's patch tides you over.

Thanks! I personally prefer this patch. But yes, we can land @skan's patch first.

lenary abandoned this revision.Jan 10 2023, 3:26 PM
lenary added a subscriber: reames.

Superseded by work from @reames

Replying here mostly so I can find the information later.

Over the last few days, I have submitted a couple of changes which factor out a target independent MEMBARRIER node. At this point, all targets except ARM, AArch64, AMDGPU, and WebAssembly have been migrated. The former two are on review, the third I don't plan to touch as it uses a late expansion of atomic_fence instead, and has some target specific logic based on that which is hard to follow without architecture context I don't have. The fourth I haven't looked at closely as I just found the alternate spelling of the barrier name it uses.

The choice of MEMBARRIER vs COMPILER_BARRIER naming was arbitrary. I went with what more targets seemed to have used to minimize test churn.

I didn't find an in tree example which required the ordering on the generic instruction. Given that, I left it out. I am not opposed to adding it, but would like to see a justification before we add complexity.

As far as I can tell, the scope argument in this patch is purely spurious. A MEMBARRIER can only have a single scope - SingleThread. This is definitely true for all the targets I've looked at closely. AMDGPU *might* be a counter example, but I don't understand the semantics of it's scopes closely well enough to know for sure.

Finally, this patch contains two functional changes to the x86 backend. I did not include those. Separating those out - with tests! - would likely be valuable. I'd be happy to review if you wanted to do that.