Page MenuHomePhabricator

[X86] Improve lowering of idemptotent RMW operations
ClosedPublic

Authored by reames on Feb 25 2019, 9:22 AM.

Details

Summary

The current lowering uses an mfence. mfences are substaintially higher latency than the locked operations originally requested, but we do want to avoid contention on the original cache line. As such, use a locked instruction on a cache line assumed to be thread local.

Diff Detail

Repository
rL LLVM

Event Timeline

reames created this revision.Feb 25 2019, 9:22 AM
jfb added a comment.Feb 25 2019, 10:56 AM

Overall this looks good, but I have a few comments.

lib/Target/X86/X86ISelLowering.cpp
25482 ↗(On Diff #188200)

This comment needs an update.

25492 ↗(On Diff #188200)

This seems to need an update too.

test/CodeGen/X86/atomic-idempotent.ll
193 ↗(On Diff #188200)

Can you test monotonic as well?
Also, i8 and i16.

jfb added a subscriber: anemet.Feb 25 2019, 10:56 AM
reames planned changes to this revision.Feb 25 2019, 8:31 PM
reames marked 2 inline comments as done.
reames added inline comments.
lib/Target/X86/X86ISelLowering.cpp
25482 ↗(On Diff #188200)

I'm not sure what update you're asking for. If we fall down this path, the comments appear correct as they ever were. Something I'm missing?

test/CodeGen/X86/atomic-idempotent.ll
193 ↗(On Diff #188200)

Will do.

reames updated this revision to Diff 188385.Feb 26 2019, 8:31 AM

Add tests requested by JF

jfb accepted this revision.Feb 26 2019, 11:44 AM

Overall this LGTM, but that part of CodeGen isn't my area of expertise so it would be nice to get another person to chime in.

This revision is now accepted and ready to land.Feb 26 2019, 11:44 AM
anemet edited reviewers, added: craig.topper; removed: jfb.Feb 26 2019, 1:01 PM
This revision now requires review to proceed.Feb 26 2019, 1:01 PM
anemet added a reviewer: jfb.Feb 26 2019, 1:01 PM
craig.topper added inline comments.Apr 23 2019, 10:28 PM
lib/Target/X86/X86ISelLowering.cpp
25455 ↗(On Diff #188385)

Should that be "auto *C"? I think we try to keep the * on pointers.

25995 ↗(On Diff #188385)

Line comments up?

25999 ↗(On Diff #188385)

Shouldn't this be mi8 for a shorter encoding?

Does 64-bit really need a 64-bit access or can we just use a 32-bit access?

26081 ↗(On Diff #188385)

I think you can use isNullConstant

craig.topper added inline comments.Apr 23 2019, 10:34 PM
lib/Target/X86/X86ISelLowering.cpp
25984 ↗(On Diff #188385)

Windows 64 definitely doesn't have a red zone. I'm not sure if any 32-bit target does.

reames updated this revision to Diff 198698.May 8 2019, 10:56 AM
reames marked an inline comment as done.

Address review comments

reames marked 4 inline comments as done.May 8 2019, 10:58 AM
reames added inline comments.
lib/Target/X86/X86ISelLowering.cpp
25984 ↗(On Diff #188385)

Does my whitelist approach seem reasonable here? Or do you have a better suggestion?

25999 ↗(On Diff #188385)

For the 64 bit, we need a valid stack location. There's no guarantee that ESP will contain a valid address. (Since it's the 32 bit truncation of RSP and the stack may be outside the low 4 GB.)

jfb added inline comments.May 8 2019, 11:16 AM
lib/Target/X86/X86ISelLowering.cpp
26255 ↗(On Diff #198698)

I'd rather see something more precise here.

We have a noredzone function attribute which should be honored:
https://releases.llvm.org/3.7.0/docs/LangRef.html#id630

aarch64 has some handling for redzone too:
https://llvm.org/doxygen/AArch64FrameLowering_8cpp_source.html#l00186

So does X86FrameLowering.cpp, X86InstrInfo.cpp, and X86MachineFunctionInfo.h.

Some other architectures to this too. I think you should do the same: if the attribute isn't present, redzone away. If some targets don't have a redzone yet don't set the attribute, they've already opened themselves to pain. Maybe we need a PSA on the mailing list to make sure these targets set noredzone on all their functions.

craig.topper added inline comments.May 8 2019, 11:18 AM
lib/Target/X86/X86ISelLowering.cpp
25999 ↗(On Diff #188385)

Sorry. I meant can we use OR32mi8Locked in 64-bit mode, but still use RSP for the address ? The amount of stack space accessed isn't important right? We could read/write 4 bytes from the stack? I mainly ask because OR32mi8Locked is 1 byte shorter to encode than LOCK_OR64mi8 if none of the registers used force the use of a REX prefix.

craig.topper added inline comments.May 8 2019, 11:30 AM
lib/Target/X86/X86ISelLowering.cpp
26255 ↗(On Diff #198698)

But clang doesn't currently do that does it? Wouldn't it also create a bitcode backwards compatibility issue?

jfb added inline comments.May 8 2019, 12:46 PM
lib/Target/X86/X86ISelLowering.cpp
26255 ↗(On Diff #198698)

The above heuristic must, at a minimum, be updated to obey noredzone.

I'm saying that on top of this I think we want something more general. Yes there's potential bugs lurking and incompatibilities, hence why I think a PSA is warranted, I think the discussion will be helpful.

reames marked 2 inline comments as done.May 9 2019, 12:21 PM
reames added inline comments.
lib/Target/X86/X86ISelLowering.cpp
25999 ↗(On Diff #188385)

Ah, gotcha. When I do the next rev, I'll make this change.

26255 ↗(On Diff #198698)

Ok, I think this is snowballing a bit. From a quick dig through the code, the existing redzone code is a mix of "have we relied on the redzone", and "are we allowed to use the redzone". I'm fine trying to do a bit of cleanup there, but I would *very* much like to separate that into a distinct set of patches.

One point, I don't agree with your intepretation of the noredzone attribute. The LangRef explicit says it only matters if the ABI would otherwise allow the use of the redzone.

Can we land this w/o using the redzone? Banging on top of stack isn't great, but it's a lot better than the mfence. If everyone is okay with that, I'll rev the patch to remove the offset for the moment.

jfb added inline comments.May 9 2019, 12:24 PM
lib/Target/X86/X86ISelLowering.cpp
26255 ↗(On Diff #198698)

Works for me (as long as it doesn't special-case any ABI), maybe leave a FIXME to use redzone and check the attribute, etc.

reames updated this revision to Diff 198917.May 9 2019, 3:13 PM

Remove redzone handling (to be done in a future change)

reames updated this revision to Diff 198919.May 9 2019, 3:16 PM
reames marked 2 inline comments as done.

Reverse a test change which was only needed by the redzone handling (which has been removed)

jfb accepted this revision.May 9 2019, 3:34 PM
This revision is now accepted and ready to land.May 9 2019, 3:34 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMay 9 2019, 4:21 PM

We've seen crashes bisected to this patch with fatal error: error in backend: Cannot emit physreg copy instruction.

I've only seen it with SLH so far. Will try to reduce a testcase, but I think we'll likely need to revert this revision.

@sammccall, any chance you’ve seen it with an asserts build? I think the debug output might say which registers it was trying to copy.

@ctopper thanks for the hint! There is an assertion, hope it helps...

(Unfortunately after running with -emit-llvm I can't get opt to crash, so I'm stuck waiting for creduce)

clang-6.0: /usr/local/google/home/sammccall/src/llvm/include/llvm/CodeGen/TargetRegisterInfo.h:302: static unsigned int llvm::TargetRegisterInfo::virtReg2Index(unsigned int): Assertion `isVirtualRegister(Reg) && "Not a virtual register"' failed.
Stack dump:
0.      Program arguments: /usr/local/google/home/sammccall/llvmbuild/bin/clang-6.0 -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -disable-free -main-file-name crash-reduced.cpp -mrelocation-model static -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -target-feature +retpoline-indirect-calls -dwarf-column-info -debugger-tuning=gdb -momit-leaf-frame-pointer -coverage-notes-file /dev/null.gcno -resource-dir /usr/local/google/home/sammccall/llvmbuild/lib/clang/9.0.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8.0.1/../../../../include/c++/8.0.1 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8.0.1/../../../../include/x86_64-linux-gnu/c++/8.0.1 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8.0.1/../../../../include/x86_64-linux-gnu/c++/8.0.1 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8.0.1/../../../../include/c++/8.0.1/backward -internal-isystem /usr/local/include -internal-isystem /usr/local/google/home/sammccall/llvmbuild/lib/clang/9.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -std=gnu++17 -fdeprecated-macro -fdebug-compilation-dir /usr/local/google/home/sammccall/llvmbuild -ferror-limit 19 -fmessage-length 174 -mspeculative-load-hardening -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -target-feature +sse4.2 -o /dev/null -x c++ /tmp/crash-reduced.cpp -faddrsig
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module '/tmp/crash-reduced.cpp'.
4.      Running pass 'X86 speculative load hardening' on function '@<snip>'
 #0 0x0000000003c7d2b9 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /usr/local/google/home/sammccall/src/llvm/lib/Support/Unix/Signals.inc:494:11
 #1 0x0000000003c7d469 PrintStackTraceSignalHandler(void*) /usr/local/google/home/sammccall/src/llvm/lib/Support/Unix/Signals.inc:558:1
 #2 0x0000000003c7bd76 llvm::sys::RunSignalHandlers() /usr/local/google/home/sammccall/src/llvm/lib/Support/Signals.cpp:67:5
 #3 0x0000000003c7daeb SignalHandler(int) /usr/local/google/home/sammccall/src/llvm/lib/Support/Unix/Signals.inc:357:1
 #4 0x00007f596a05e0c0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x110c0)
 #5 0x00007f5968be8fcf raise (/lib/x86_64-linux-gnu/libc.so.6+0x32fcf)
 #6 0x00007f5968bea3fa abort (/lib/x86_64-linux-gnu/libc.so.6+0x343fa)
 #7 0x00007f5968be1e37 (/lib/x86_64-linux-gnu/libc.so.6+0x2be37)
 #8 0x00007f5968be1ee2 (/lib/x86_64-linux-gnu/libc.so.6+0x2bee2)
 #9 0x00000000013fd6d4 llvm::TargetRegisterInfo::virtReg2Index(unsigned int) /usr/local/google/home/sammccall/src/llvm/include/llvm/CodeGen/TargetRegisterInfo.h:303:12
#10 0x00000000013fd607 llvm::VirtReg2IndexFunctor::operator()(unsigned int) const /usr/local/google/home/sammccall/src/llvm/include/llvm/CodeGen/TargetRegisterInfo.h:1154:5
#11 0x00000000013fd549 llvm::IndexedMap<std::pair<llvm::PointerUnion<llvm::TargetRegisterClass const*, llvm::RegisterBank const*>, llvm::MachineOperand*>, llvm::VirtReg2IndexFunctor>::operator[](unsigned int) const /usr/local/google/home/sammccall/src/llvm/include/llvm/ADT/IndexedMap.h:51:7
#12 0x00000000013fdc09 llvm::MachineRegisterInfo::getRegClass(unsigned int) const /usr/local/google/home/sammccall/src/llvm/include/llvm/CodeGen/MachineRegisterInfo.h:627:5
#13 0x000000000285042d (anonymous namespace)::X86SpeculativeLoadHardeningPass::hardenLoadAddr(llvm::MachineInstr&, llvm::MachineOperand&, llvm::MachineOperand&, llvm::SmallDenseMap<unsigned int, unsigned int, 32u, llvm::DenseMapInfo<unsigned int>, llvm::detail::DenseMapPair<unsigned int, unsigned int> >&) /usr/local/google/home/sammccall/src/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp:2030:11
#14 0x000000000284ccf1 (anonymous namespace)::X86SpeculativeLoadHardeningPass::tracePredStateThroughBlocksAndHarden(llvm::MachineFunction&) /usr/local/google/home/sammccall/src/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp:1793:11
#15 0x000000000284902e (anonymous namespace)::X86SpeculativeLoadHardeningPass::runOnMachineFunction(llvm::MachineFunction&) /usr/local/google/home/sammccall/src/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp:549:30
#16 0x0000000002ef3527 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /usr/local/google/home/sammccall/src/llvm/lib/CodeGen/MachineFunctionPass.cpp:73:8
#17 0x00000000033c18b9 llvm::FPPassManager::runOnFunction(llvm::Function&) /usr/local/google/home/sammccall/src/llvm/lib/IR/LegacyPassManager.cpp:1648:23
#18 0x00000000033c1cff llvm::FPPassManager::runOnModule(llvm::Module&) /usr/local/google/home/sammccall/src/llvm/lib/IR/LegacyPassManager.cpp:1685:16
#19 0x00000000033c2465 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /usr/local/google/home/sammccall/src/llvm/lib/IR/LegacyPassManager.cpp:1752:23
#20 0x00000000033c1fa5 llvm::legacy::PassManagerImpl::run(llvm::Module&) /usr/local/google/home/sammccall/src/llvm/lib/IR/LegacyPassManager.cpp:1865:16
#21 0x00000000033c29e1 llvm::legacy::PassManager::run(llvm::Module&) /usr/local/google/home/sammccall/src/llvm/lib/IR/LegacyPassManager.cpp:1896:3
#22 0x0000000003fafe68 (anonymous namespace)::EmitAssemblyHelper::EmitAssembly(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/CodeGen/BackendUtil.cpp:894:3
#23 0x0000000003fac4dc clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/CodeGen/BackendUtil.cpp:1466:5
#24 0x000000000509bc7d clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/CodeGen/CodeGenAction.cpp:300:7
#25 0x00000000067cd7fe clang::ParseAST(clang::Sema&, bool, bool) /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/Parse/ParseAST.cpp:178:12
#26 0x0000000004761552 clang::ASTFrontendAction::ExecuteAction() /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/Frontend/FrontendAction.cpp:1037:1
#27 0x000000000509935f clang::CodeGenAction::ExecuteAction() /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/CodeGen/CodeGenAction.cpp:1057:1
#28 0x0000000004760f70 clang::FrontendAction::Execute() /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/Frontend/FrontendAction.cpp:938:7
#29 0x00000000046f8ae0 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/Frontend/CompilerInstance.cpp:945:7
#30 0x00000000048e4c81 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:273:8
#31 0x000000000137d3c3 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /usr/local/google/home/sammccall/src/llvm/tools/clang/tools/driver/cc1_main.cpp:225:13
#32 0x00000000013711e1 ExecuteCC1Tool(llvm::ArrayRef<char const*>, llvm::StringRef) /usr/local/google/home/sammccall/src/llvm/tools/clang/tools/driver/driver.cpp:309:5
#33 0x0000000001370571 main /usr/local/google/home/sammccall/src/llvm/tools/clang/tools/driver/driver.cpp:381:5
#34 0x00007f5968bd62b1 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b1)
#35 0x000000000136fd6a _start (/usr/local/google/home/sammccall/llvmbuild/bin/clang-6.0+0x136fd6a)
clang-6: error: unable to execute command: Aborted
clang-6: error: clang frontend command failed due to signal (use -v to see invocation)

You’ll need to use llc to crash. But I can guess the issue now. SLH is expecting the memory address to have a virtual register as a base register, but we forced RSP on this access. We need to teach SLH about this case

@craig.topper doh, I'm a frontend person and somehow forgot about llc vs opt. I'll get a repro and attach it.

@reames Yes please, reverting until we can address this symptom would be really useful. We try to integrate daily (which is how we found this), and it gets much harder the longer a breakage lasts.

It reduces to this point with the same error:

Bugpoint can reduce it further, but to a segfault that I'm not sure is the same error (--append-exit-code doesn't seem to help me):

@sammccall, can you try this patch

diff --git a/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp b/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp
index c8b740ca39e..ab8cb5250c9 100644
--- a/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp
+++ b/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp
@@ -1958,7 +1958,7 @@ void X86SpeculativeLoadHardeningPass::hardenLoadAddr(
 
   SmallVector<MachineOperand *, 2> HardenOpRegs;
 
-  if (BaseMO.isFI()) {
+  if (BaseMO.isFI() || BaseMO.getReg() == X86::RSP) {
     // A frame index is never a dynamically controllable load, so only
     // harden it if we're covering fixed address loads as well.
     LLVM_DEBUG(

@craig.topper Indeed that successfully compiles the problematic (non-reduced) file.

Whether the resulting binary works I should be able to tell you in an hour or so :-)

Yes, that fix works for us. Thanks!

It'd be great to get it landed or a revert of this patch soon if possible.

I've commited a modified version of my patch in r360475. I'll try to reduce a test case and commit that later tonight or tomorrow.