The current lowering uses an mfence. mfences are substaintially higher latency than the locked operations originally requested, but we do want to avoid contention on the original cache line. As such, use a locked instruction on a cache line assumed to be thread local.
Details
- Reviewers
craig.topper jfb - Commits
- rZORGa7e6d75cd491: [X86] Improve lowering of idemptotent RMW operations
rZORG115381d182c5: [X86] Improve lowering of idemptotent RMW operations
rGa7e6d75cd491: [X86] Improve lowering of idemptotent RMW operations
rG115381d182c5: [X86] Improve lowering of idemptotent RMW operations
rGbd588dfd5947: [X86] Improve lowering of idemptotent RMW operations
rL360393: [X86] Improve lowering of idemptotent RMW operations
Diff Detail
Event Timeline
Overall this LGTM, but that part of CodeGen isn't my area of expertise so it would be nice to get another person to chime in.
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
25984 | Windows 64 definitely doesn't have a red zone. I'm not sure if any 32-bit target does. |
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
25984 | Does my whitelist approach seem reasonable here? Or do you have a better suggestion? | |
25999 | For the 64 bit, we need a valid stack location. There's no guarantee that ESP will contain a valid address. (Since it's the 32 bit truncation of RSP and the stack may be outside the low 4 GB.) |
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
25985 | I'd rather see something more precise here. We have a noredzone function attribute which should be honored: aarch64 has some handling for redzone too: So does X86FrameLowering.cpp, X86InstrInfo.cpp, and X86MachineFunctionInfo.h. Some other architectures to this too. I think you should do the same: if the attribute isn't present, redzone away. If some targets don't have a redzone yet don't set the attribute, they've already opened themselves to pain. Maybe we need a PSA on the mailing list to make sure these targets set noredzone on all their functions. |
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
25999 | Sorry. I meant can we use OR32mi8Locked in 64-bit mode, but still use RSP for the address ? The amount of stack space accessed isn't important right? We could read/write 4 bytes from the stack? I mainly ask because OR32mi8Locked is 1 byte shorter to encode than LOCK_OR64mi8 if none of the registers used force the use of a REX prefix. |
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
25985 | But clang doesn't currently do that does it? Wouldn't it also create a bitcode backwards compatibility issue? |
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
25985 | The above heuristic must, at a minimum, be updated to obey noredzone. I'm saying that on top of this I think we want something more general. Yes there's potential bugs lurking and incompatibilities, hence why I think a PSA is warranted, I think the discussion will be helpful. |
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
25985 | Ok, I think this is snowballing a bit. From a quick dig through the code, the existing redzone code is a mix of "have we relied on the redzone", and "are we allowed to use the redzone". I'm fine trying to do a bit of cleanup there, but I would *very* much like to separate that into a distinct set of patches. One point, I don't agree with your intepretation of the noredzone attribute. The LangRef explicit says it only matters if the ABI would otherwise allow the use of the redzone. Can we land this w/o using the redzone? Banging on top of stack isn't great, but it's a lot better than the mfence. If everyone is okay with that, I'll rev the patch to remove the offset for the moment. | |
25999 | Ah, gotcha. When I do the next rev, I'll make this change. |
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
25985 | Works for me (as long as it doesn't special-case any ABI), maybe leave a FIXME to use redzone and check the attribute, etc. |
Reverse a test change which was only needed by the redzone handling (which has been removed)
We've seen crashes bisected to this patch with fatal error: error in backend: Cannot emit physreg copy instruction.
I've only seen it with SLH so far. Will try to reduce a testcase, but I think we'll likely need to revert this revision.
@sammccall, any chance you’ve seen it with an asserts build? I think the debug output might say which registers it was trying to copy.
@ctopper thanks for the hint! There is an assertion, hope it helps...
(Unfortunately after running with -emit-llvm I can't get opt to crash, so I'm stuck waiting for creduce)
clang-6.0: /usr/local/google/home/sammccall/src/llvm/include/llvm/CodeGen/TargetRegisterInfo.h:302: static unsigned int llvm::TargetRegisterInfo::virtReg2Index(unsigned int): Assertion `isVirtualRegister(Reg) && "Not a virtual register"' failed. Stack dump: 0. Program arguments: /usr/local/google/home/sammccall/llvmbuild/bin/clang-6.0 -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -disable-free -main-file-name crash-reduced.cpp -mrelocation-model static -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -target-feature +retpoline-indirect-calls -dwarf-column-info -debugger-tuning=gdb -momit-leaf-frame-pointer -coverage-notes-file /dev/null.gcno -resource-dir /usr/local/google/home/sammccall/llvmbuild/lib/clang/9.0.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8.0.1/../../../../include/c++/8.0.1 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8.0.1/../../../../include/x86_64-linux-gnu/c++/8.0.1 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8.0.1/../../../../include/x86_64-linux-gnu/c++/8.0.1 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8.0.1/../../../../include/c++/8.0.1/backward -internal-isystem /usr/local/include -internal-isystem /usr/local/google/home/sammccall/llvmbuild/lib/clang/9.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -std=gnu++17 -fdeprecated-macro -fdebug-compilation-dir /usr/local/google/home/sammccall/llvmbuild -ferror-limit 19 -fmessage-length 174 -mspeculative-load-hardening -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -target-feature +sse4.2 -o /dev/null -x c++ /tmp/crash-reduced.cpp -faddrsig 1. <eof> parser at end of file 2. Code generation 3. Running pass 'Function Pass Manager' on module '/tmp/crash-reduced.cpp'. 4. Running pass 'X86 speculative load hardening' on function '@<snip>' #0 0x0000000003c7d2b9 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /usr/local/google/home/sammccall/src/llvm/lib/Support/Unix/Signals.inc:494:11 #1 0x0000000003c7d469 PrintStackTraceSignalHandler(void*) /usr/local/google/home/sammccall/src/llvm/lib/Support/Unix/Signals.inc:558:1 #2 0x0000000003c7bd76 llvm::sys::RunSignalHandlers() /usr/local/google/home/sammccall/src/llvm/lib/Support/Signals.cpp:67:5 #3 0x0000000003c7daeb SignalHandler(int) /usr/local/google/home/sammccall/src/llvm/lib/Support/Unix/Signals.inc:357:1 #4 0x00007f596a05e0c0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x110c0) #5 0x00007f5968be8fcf raise (/lib/x86_64-linux-gnu/libc.so.6+0x32fcf) #6 0x00007f5968bea3fa abort (/lib/x86_64-linux-gnu/libc.so.6+0x343fa) #7 0x00007f5968be1e37 (/lib/x86_64-linux-gnu/libc.so.6+0x2be37) #8 0x00007f5968be1ee2 (/lib/x86_64-linux-gnu/libc.so.6+0x2bee2) #9 0x00000000013fd6d4 llvm::TargetRegisterInfo::virtReg2Index(unsigned int) /usr/local/google/home/sammccall/src/llvm/include/llvm/CodeGen/TargetRegisterInfo.h:303:12 #10 0x00000000013fd607 llvm::VirtReg2IndexFunctor::operator()(unsigned int) const /usr/local/google/home/sammccall/src/llvm/include/llvm/CodeGen/TargetRegisterInfo.h:1154:5 #11 0x00000000013fd549 llvm::IndexedMap<std::pair<llvm::PointerUnion<llvm::TargetRegisterClass const*, llvm::RegisterBank const*>, llvm::MachineOperand*>, llvm::VirtReg2IndexFunctor>::operator[](unsigned int) const /usr/local/google/home/sammccall/src/llvm/include/llvm/ADT/IndexedMap.h:51:7 #12 0x00000000013fdc09 llvm::MachineRegisterInfo::getRegClass(unsigned int) const /usr/local/google/home/sammccall/src/llvm/include/llvm/CodeGen/MachineRegisterInfo.h:627:5 #13 0x000000000285042d (anonymous namespace)::X86SpeculativeLoadHardeningPass::hardenLoadAddr(llvm::MachineInstr&, llvm::MachineOperand&, llvm::MachineOperand&, llvm::SmallDenseMap<unsigned int, unsigned int, 32u, llvm::DenseMapInfo<unsigned int>, llvm::detail::DenseMapPair<unsigned int, unsigned int> >&) /usr/local/google/home/sammccall/src/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp:2030:11 #14 0x000000000284ccf1 (anonymous namespace)::X86SpeculativeLoadHardeningPass::tracePredStateThroughBlocksAndHarden(llvm::MachineFunction&) /usr/local/google/home/sammccall/src/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp:1793:11 #15 0x000000000284902e (anonymous namespace)::X86SpeculativeLoadHardeningPass::runOnMachineFunction(llvm::MachineFunction&) /usr/local/google/home/sammccall/src/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp:549:30 #16 0x0000000002ef3527 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /usr/local/google/home/sammccall/src/llvm/lib/CodeGen/MachineFunctionPass.cpp:73:8 #17 0x00000000033c18b9 llvm::FPPassManager::runOnFunction(llvm::Function&) /usr/local/google/home/sammccall/src/llvm/lib/IR/LegacyPassManager.cpp:1648:23 #18 0x00000000033c1cff llvm::FPPassManager::runOnModule(llvm::Module&) /usr/local/google/home/sammccall/src/llvm/lib/IR/LegacyPassManager.cpp:1685:16 #19 0x00000000033c2465 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /usr/local/google/home/sammccall/src/llvm/lib/IR/LegacyPassManager.cpp:1752:23 #20 0x00000000033c1fa5 llvm::legacy::PassManagerImpl::run(llvm::Module&) /usr/local/google/home/sammccall/src/llvm/lib/IR/LegacyPassManager.cpp:1865:16 #21 0x00000000033c29e1 llvm::legacy::PassManager::run(llvm::Module&) /usr/local/google/home/sammccall/src/llvm/lib/IR/LegacyPassManager.cpp:1896:3 #22 0x0000000003fafe68 (anonymous namespace)::EmitAssemblyHelper::EmitAssembly(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/CodeGen/BackendUtil.cpp:894:3 #23 0x0000000003fac4dc clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/CodeGen/BackendUtil.cpp:1466:5 #24 0x000000000509bc7d clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/CodeGen/CodeGenAction.cpp:300:7 #25 0x00000000067cd7fe clang::ParseAST(clang::Sema&, bool, bool) /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/Parse/ParseAST.cpp:178:12 #26 0x0000000004761552 clang::ASTFrontendAction::ExecuteAction() /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/Frontend/FrontendAction.cpp:1037:1 #27 0x000000000509935f clang::CodeGenAction::ExecuteAction() /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/CodeGen/CodeGenAction.cpp:1057:1 #28 0x0000000004760f70 clang::FrontendAction::Execute() /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/Frontend/FrontendAction.cpp:938:7 #29 0x00000000046f8ae0 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/Frontend/CompilerInstance.cpp:945:7 #30 0x00000000048e4c81 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /usr/local/google/home/sammccall/src/llvm/tools/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:273:8 #31 0x000000000137d3c3 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /usr/local/google/home/sammccall/src/llvm/tools/clang/tools/driver/cc1_main.cpp:225:13 #32 0x00000000013711e1 ExecuteCC1Tool(llvm::ArrayRef<char const*>, llvm::StringRef) /usr/local/google/home/sammccall/src/llvm/tools/clang/tools/driver/driver.cpp:309:5 #33 0x0000000001370571 main /usr/local/google/home/sammccall/src/llvm/tools/clang/tools/driver/driver.cpp:381:5 #34 0x00007f5968bd62b1 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b1) #35 0x000000000136fd6a _start (/usr/local/google/home/sammccall/llvmbuild/bin/clang-6.0+0x136fd6a) clang-6: error: unable to execute command: Aborted clang-6: error: clang frontend command failed due to signal (use -v to see invocation)
You’ll need to use llc to crash. But I can guess the issue now. SLH is expecting the memory address to have a virtual register as a base register, but we forced RSP on this access. We need to teach SLH about this case
@craig.topper doh, I'm a frontend person and somehow forgot about llc vs opt. I'll get a repro and attach it.
@reames Yes please, reverting until we can address this symptom would be really useful. We try to integrate daily (which is how we found this), and it gets much harder the longer a breakage lasts.
It reduces to this point with the same error:
Bugpoint can reduce it further, but to a segfault that I'm not sure is the same error (--append-exit-code doesn't seem to help me):
@sammccall, can you try this patch
diff --git a/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp b/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp index c8b740ca39e..ab8cb5250c9 100644 --- a/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp +++ b/llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp @@ -1958,7 +1958,7 @@ void X86SpeculativeLoadHardeningPass::hardenLoadAddr( SmallVector<MachineOperand *, 2> HardenOpRegs; - if (BaseMO.isFI()) { + if (BaseMO.isFI() || BaseMO.getReg() == X86::RSP) { // A frame index is never a dynamically controllable load, so only // harden it if we're covering fixed address loads as well. LLVM_DEBUG(
@craig.topper Indeed that successfully compiles the problematic (non-reduced) file.
Whether the resulting binary works I should be able to tell you in an hour or so :-)
Yes, that fix works for us. Thanks!
It'd be great to get it landed or a revert of this patch soon if possible.
I've commited a modified version of my patch in r360475. I'll try to reduce a test case and commit that later tonight or tomorrow.
Should that be "auto *C"? I think we try to keep the * on pointers.