Page MenuHomePhabricator

[RISCV] Split the pseudo instruction splitting pass
ClosedPublic

Authored by luismarques on May 8 2020, 9:42 AM.

Details

Summary

This patch extracts the atomic pseudo-instructions' splitting from riscv-expand-pseudo / RISCVExpandPseudo into its own pass, riscv-expand-atomic-pseudo / RISCVExpandAtomicPseudo. This allows for the expansion of atomic operations to continue to happen late (the new pass is added in addPreEmitPass2, so those expansions continue to happen in the same place), while the remaining pseudo-instructions can now be expanded earlier and benefit from more optimization passes. The nonatomics pass is now added in addPreSched2.

Diff Detail

Event Timeline

luismarques created this revision.May 8 2020, 9:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 8 2020, 9:42 AM
asb accepted this revision.May 12 2020, 12:28 AM

Thanks, this looks good to me. PreSched2 is logically a better place for the standard pseudo expansions, though I'm not seeing any codegen changes at all with e.g. the GCC torture suite. But I'm happy to land this as-is.

This revision is now accepted and ready to land.May 12 2020, 12:28 AM

Macro-op fusion likely means that we'll always schedule the AUIPC with its consumer anyway, but in theory there could be pipelines that don't macro-op fuse and stall on RAW hazards, which could benefit from splitting the pair up.

jrtc27 added inline comments.May 12 2020, 12:35 AM
llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
171

Is this what clang-format enforces? I don't like this style, it makes the diffs needlessly messy when someone adds another pass here.

luismarques marked an inline comment as done.May 12 2020, 12:46 AM
luismarques added inline comments.
llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
171

It was what clang-format asked for, yes. But I suspect it only did so because it saw surrounding functions doing the same thing, and it went for consistency. I'm also not a big fan, but if we want to change that it probably should be a separate patch that changes them all.

Macro-op fusion likely means that we'll always schedule the AUIPC with its consumer anyway, but in theory there could be pipelines that don't macro-op fuse and stall on RAW hazards, which could benefit from splitting the pair up.

I have no idea how common macro-op fusion will be, and it strikes me that the RAW hazards will likely be worse on embedded cores where there is unlikely to be macro-op fusion anyway? This is why I feel it's better to get this pass as early as possible, even if we later schedule these instructions together.

My intention is eventually to get this pass as early in the machine code pipeline as possible. At the moment this is as early as it can go without modifying the auipc-introducing code, and some other machine code passes (Branch folding, at the very least).

This revision was automatically updated to reflect the committed changes.

This patch breaks compiling the Linux kernel:

$ make -skj"$(nproc)" ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- LLVM=1 LLVM_IAS=1 O=out/riscv distclean defconfig arch/riscv/kernel/cpu.o
clang: /home/nathan/cbl/github/tc-build/llvm-project/llvm/lib/CodeGen/MachineBasicBlock.cpp:65: llvm::MCSymbol *llvm::MachineBasicBlock::getSymbol() const: Assertion `getNumber() >= 0 && "cannot get label for unreachable MBB"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: clang -Wp,-MMD,arch/riscv/kernel/.cpu.o.d -nostdinc -isystem /home/nathan/cbl/github/tc-build/build/llvm/stage1/lib/clang/11.0.0/include -I/home/nathan/src/linux/arch/riscv/include -I./arch/riscv/include/generated -I/home/nathan/src/linux/include -I./include -I/home/nathan/src/linux/arch/riscv/include/uapi -I./arch/riscv/include/generated/uapi -I/home/nathan/src/linux/include/uapi -I./include/generated/uapi -include /home/nathan/src/linux/include/linux/kconfig.h -include /home/nathan/src/linux/include/linux/compiler_types.h -D__KERNEL__ -Qunused-arguments -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Wno-format-security -std=gnu89 --target=riscv64-linux-gnu --prefix= --gcc-toolchain=/ -Werror=unknown-warning-option -mabi=lp64 -march=rv64imac -mno-save-restore -DCONFIG_PAGE_OFFSET=0xffffffe000000000 -mcmodel=medany -fno-delete-null-pointer-checks -Wno-frame-address -Wno-address-of-packed-member -O2 -Wframe-larger-than=2048 -fno-stack-protector -Wno-format-invalid-specifier -Wno-gnu -mno-global-merge -Wno-unused-const-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-array-bounds -fno-strict-overflow -fno-merge-all-constants -fno-stack-check -Werror=date-time -Werror=incompatible-pointer-types -fmacro-prefix-map=/home/nathan/src/linux/= -Wno-initializer-overrides -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-pointer-to-enum-cast -Wno-tautological-constant-out-of-range-compare -I /home/nathan/src/linux/arch/riscv/kernel -I ./arch/riscv/kernel -DKBUILD_MODFILE="arch/riscv/kernel/cpu" -DKBUILD_BASENAME="cpu" -DKBUILD_MODNAME="cpu" -c -o arch/riscv/kernel/cpu.o /home/nathan/src/linux/arch/riscv/kernel/cpu.c 
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module '/home/nathan/src/linux/arch/riscv/kernel/cpu.c'.
4.	Running pass 'RISCV Assembly Printer' on function '@riscv_of_processor_hartid'
 #0 0x000000000275b2b4 PrintStackTraceSignalHandler(void*) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x275b2b4)
 #1 0x0000000002758f6e llvm::sys::RunSignalHandlers() (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x2758f6e)
 #2 0x000000000275a53d llvm::sys::CleanupOnSignal(unsigned long) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x275a53d)
 #3 0x00000000026eef63 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x26eef63)
 #4 0x00000000026ef09c CrashRecoverySignalHandler(int) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x26ef09c)
 #5 0x00007ff64bb933c0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x153c0)
 #6 0x00007ff64b65818b raise (/lib/x86_64-linux-gnu/libc.so.6+0x4618b)
 #7 0x00007ff64b637859 abort (/lib/x86_64-linux-gnu/libc.so.6+0x25859)
 #8 0x00007ff64b637729 (/lib/x86_64-linux-gnu/libc.so.6+0x25729)
 #9 0x00007ff64b648f36 (/lib/x86_64-linux-gnu/libc.so.6+0x36f36)
#10 0x0000000001ccfd29 llvm::MachineBasicBlock::getSymbol() const (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x1ccfd29)
#11 0x00000000016ee4c7 llvm::LowerRISCVMachineOperandToMCOperand(llvm::MachineOperand const&, llvm::MCOperand&, llvm::AsmPrinter const&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x16ee4c7)
#12 0x00000000016ee886 llvm::LowerRISCVMachineInstrToMCInst(llvm::MachineInstr const*, llvm::MCInst&, llvm::AsmPrinter const&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x16ee886)
#13 0x00000000016ea68a (anonymous namespace)::RISCVAsmPrinter::emitInstruction(llvm::MachineInstr const*) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x16ea68a)
#14 0x00000000032d280d llvm::AsmPrinter::emitFunctionBody() (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x32d280d)
#15 0x00000000016ea3c3 (anonymous namespace)::RISCVAsmPrinter::runOnMachineFunction(llvm::MachineFunction&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x16ea3c3)
#16 0x0000000001d1061d llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x1d1061d)
#17 0x0000000002112a04 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x2112a04)
#18 0x0000000002112ce8 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x2112ce8)
#19 0x0000000002113414 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x2113414)
#20 0x000000000295f615 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x295f615)
#21 0x0000000003146086 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x3146086)
#22 0x00000000037f3223 clang::ParseAST(clang::Sema&, bool, bool) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x37f3223)
#23 0x00000000030aac33 clang::FrontendAction::Execute() (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x30aac33)
#24 0x000000000300aed3 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x300aed3)
#25 0x00000000031407b3 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x31407b3)
#26 0x00000000016dfa3b cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x16dfa3b)
#27 0x00000000016ddbac ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x16ddbac)
#28 0x0000000002ed6a52 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool*) const::$_1>(long) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x2ed6a52)
#29 0x00000000026eee77 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x26eee77)
#30 0x0000000002ed5f0d clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool*) const (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x2ed5f0d)
#31 0x0000000002ea396b clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x2ea396b)
#32 0x0000000002ea3d57 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*> >&) const (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x2ea3d57)
#33 0x0000000002ebb0a8 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*> >&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x2ebb0a8)
#34 0x00000000016dd553 main (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x16dd553)
#35 0x00007ff64b6390b3 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b3)
#36 0x00000000016da95e _start (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-11+0x16da95e)
clang-11: error: clang frontend command failed due to signal (use -v to see invocation)
ClangBuiltLinux clang version 11.0.0 (https://github.com/llvm/llvm-project 73377c45974855a00b13974cd515e875c5605556)
Target: riscv64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/nathan/cbl/github/tc-build/build/llvm/stage1/bin
clang-11: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-11: note: diagnostic msg: /tmp/cpu-2dc70a.c
clang-11: note: diagnostic msg: /tmp/cpu-2dc70a.sh
clang-11: note: diagnostic msg: 

********************
...

cvise spits out:

$ cat cpu.i
*a;
b;
__attribute__((__cold__)) c();
e() {
  int d;
  if (f(&d)) {
    c("", d);
    return -19;
  }
  if (g()) {
    c("CPU with hartid=%d has no \"riscv,isa\" property\n", d);
    return -19;
  }
  if (b != 'r' || a[1])
    return -19;
  return d;
}

$ clang -O2 -mcmodel=medany --target=riscv64-linux-gnu -O2 -c -o /dev/null cpu.i |& grep "cannot get label for unreachable MBB"
clang: /home/nathan/cbl/github/tc-build/llvm-project/llvm/lib/CodeGen/MachineBasicBlock.cpp:65: llvm::MCSymbol *llvm::MachineBasicBlock::getSymbol() const: Assertion `getNumber() >= 0 && "cannot get label for unreachable MBB"' failed.

Full interestingness test and original preprocessed file available here: https://github.com/nathanchance/creduce-files/tree/d0b985827d38bfbacb7a5a26bb636c1aa14cc1dd/D79635

This patch breaks compiling the Linux kernel:

Thanks for the detailed bug report. I have addressed this by temporarily reverting the part of this patch that wasn't NFC. This should solve the problem for now, if not please let me know.

rogfer01 added a comment.EditedJul 1 2020, 8:20 AM

This patch breaks compiling the Linux kernel:

Thanks for the detailed bug report. I have addressed this by temporarily reverting the part of this patch that wasn't NFC. This should solve the problem for now, if not please let me know.

I took a quick look and it seems that the pass "Branch Probability Basic Block Placement" (lib/CodeGen/MachineBlockPlacement.cpp) reorders the basic blocks.

Unfortunately some of them are referenced in pc-relative addressing. In RISC-V such addressing uses two instructions and the second one has a label to the first one (so the linker can chain to the first instruction, I assume).

After the pass we end with a wrong machine operand (the %bb.-1 is wrong)

$x10 = ADDI $x10, target-flags(riscv-pcrel-lo) %bb.-1

I'm not sure why this pass does not take into account such operands (or one of the analyses it uses to do the transformation), but expanding those things later seems to hide/prevent the issue. Perhaps the new basic block created in RISCVExpandPseudo::expandAuipcInstPair needs to be marked as setHasAddressTaken or something (but I have a vague recollection that this had some other consequences, I might be wrong here).

I'm not sure why this pass does not take into account such operands (or one of the analyses it uses to do the transformation), but expanding those things later seems to hide/prevent the issue. Perhaps the new basic block created in RISCVExpandPseudo::expandAuipcInstPair needs to be marked as setHasAddressTaken or something (but I have a vague recollection that this had some other consequences, I might be wrong here).

Thanks for the nice analysis Roger. I had also traced the compilation issue to the %bb.-1 and in discussion with Sam he had also referred to that API you mention. Sam was saying that (as far as he could remember from the top of his head) that we have labels that point to places that might not be the start of an IR BB, so they didn't always have LLVM block names, and this had been previously been addressed by one of Lewis' patches, adding checks like !MBB->hasAddressTaken() && !MBB->hasLabelMustBeEmitted(), but that these checks might not have been added in all places where they were needed, or something along those lines. I'll will delve into this when possible (but if someone wants to have a shot at this let me know).

Sam was saying that (as far as he could remember from the top of his head) that we have labels that point to places that might not be the start of an IR BB, so they didn't always have LLVM block names, and this had been previously been addressed by one of Lewis' patches, adding checks like !MBB->hasAddressTaken() && !MBB->hasLabelMustBeEmitted(), but that these checks might not have been added in all places where they were needed, or something along those lines. I'll will delve into this when possible (but if someone wants to have a shot at this let me know).

Should these blocks just be manually split, that way you don't have to hack around tracking that an arbitrary instruction in the stream is a branch target?

lenary added a comment.Jul 1 2020, 9:50 AM

@luismarques yep, you've got my analysis correct.

I think, last time I looked at it, MachineBasicBlocks had some weirdness associated with hasAddressTaken, as @rogfer01 points out. They used to require the respective IR BasicBlock also had its address taken, as is not the case with these labels. Looking at llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:3030 this is no longer the case, but git-blame tells me that hasn't changed for years, so I must be misremembering something on re-investigation.

The broader point is also correct though: lots of passes which check if a MBB has its address taken, don't also check if the label must be emitted, which this code "assumes". Updating these passes isn't always going to be easy.

I think for the moment the NFC is to keep these passes together (as @luismarques has done), and later we can choose a different approach (and/that lets us move the non-atomic pseudo expansion earlier). I think MachineInstr can now have associated labels that don't require the basic block to be split, as is going on here. There are notes that API is not yet entirely complete, though.

lenary added a comment.Jul 1 2020, 9:53 AM

Should these blocks just be manually split, that way you don't have to hack around tracking that an arbitrary instruction in the stream is a branch target?

These blocks *are* manually split. The problem is later passes don't check hasLabelMustBeEmitted which can cause them to get re-merged (say if the prior block is empty), at which point the label becomes invalid.

The broader point is also correct though: lots of passes which check if a MBB has its address taken, don't also check if the label must be emitted, which this code "assumes". Updating these passes isn't always going to be easy.

I got the impression from Lewis' comments in D54143 that using LabelMustBeEmitted instead of AddressTaken was mostly to avoid the segfault issues in the AsmPrinter, so I wonder if we can just use both (or some other solution that moves more toward using AddressTaken only), instead of trying to fix all of the passes that aren't aware of LabelMustBeEmitted. Regarding the performance costs of AddressTaken, we can compare the cost of that vs the benefit of moving the expansions earlier.