Page MenuHomePhabricator

Exception support for basic block sections
Needs ReviewPublic

Authored by rahmanl on Jan 30 2020, 12:57 PM.

Details

Reviewers
skan
modimo
wenlei
Summary

This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here: https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here: https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf

This patch provides exception support for basic block sections by splitting the call-site table into call-site ranges corresponding to different basic block sections. Still all landing pads must reside in the same basic block section (which is guaranteed by the the core basic block section patch D73674 (ExceptionSection) ). Each call-site table will refer to the landing pad fragment by explicitly specifying @LPstart (which is omitted in the normal non-basic-block section case). All these call-site tables will share their action and type tables.

The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. In the case of basic block section where one section contains all the landing pads, the landing pad offset relative to LPStart could actually be zero. Thus, we avoid zero-offset landing pads by inserting a nop operation as the first non-CFI instruction in the exception section.

Background on Exception Handling in C++ ABI
https://github.com/itanium-cxx-abi/cxx-abi/blob/master/exceptions.pdf

Compiler emits an exception table for every function. When an exception is thrown, the stack unwinding library queries the unwind table (which includes the start and end of each function) to locate the exception table for that function.

The exception table includes a call site table for the function, which is used to guide the exception handling runtime to take the appropriate action upon an exception. Each call site record in this table is structured as follows:

CallSite--> Position of the call site (relative to the function entry)
CallSite length--> Length of the call site.
Landing Pad--> Position of the landing pad (relative to the landing pad fragment’s begin)
Action record offset--> Position of the first action record

The call site records partition a function into different pieces and describe what action must be taken for each callsite. The callsite fields are relative with respect to the start of the function (as captured in the unwind table).

The landing pad is a reference into the function and corresponds roughly to the catch block of a try/catch statement. When execution resumes at a landing pad, it receives an exception structure and a selector value corresponding to the type of the exception thrown, and executes similar to a switch-case statement. The landing pad field is relative to the beginning of the procedure fragment which includes all the landing pads (@LPStart). The C++ ABI requires all landing pads to be in the same fragment. Nonetheless, without basic block sections, @LPStart is the same as the function @Start (found in the unwind table) and can be omitted.

The action record offset is an index into the action table which includes information about which exception types are caught.

C++ Exceptions with Basic Block Sections
Basic block sections break the contiguity of a function fragment. Therefore, call sites must be specified relative to the beginning of the basic block section. On the other hand, the unwinding library should be able to find the corresponding callsites for each section. To do so, the .cfi_lsda directive for a section must point to the call site table for that section.

Conceptually, we emit the call site tables for sections sequentially in the exception table as if each section has its own exception table. In the example below, two sections result in the two call site tables (denoted by LSDA1 and LSDA2) placed next to each other. However, their callsite records will refer to records in the shared Action Table. We also emit the header fields (@LPStart and CallSite Table Length) for each call site table in order to place the call site tables in individual LSDAs.
Since every call site table has a single @LPStart pointer, the landing pads for all callsites in the corresponding section must reside in one section (not necessarily the same section). To make this easier, we decide to place all landing pads of the function in one section.

@LPStart---> Landing pad fragment ( LSDA1 points here)
CallSite Table Length---> Used to find the action table.
CallSites
@LPStart---> Landing pad fragment ( LSDA2 points here)
CallSite Table Length
CallSites


Action Table
Types Table

Diff Detail

Event Timeline

rahmanl created this revision.Jan 30 2020, 12:57 PM
rahmanl edited the summary of this revision. (Show Details)Jan 30 2020, 1:01 PM

Pseudo-randomly picking a few other possible reviewers - folks, if you know more authoritative/appropriate reviewers, do please delegate

llvm/include/llvm/CodeGen/AsmPrinter.h
247–250
rahmanl updated this revision to Diff 241586.Jan 30 2020, 2:47 PM

Remove the redundant else after return + sync in the changes in the parent patch.

rahmanl marked an inline comment as done.Jan 30 2020, 2:48 PM
rahmanl updated this revision to Diff 278274.Jul 15 2020, 12:05 PM
rahmanl edited the summary of this revision. (Show Details)

This syncs and updates the patch with upstream and propeller changes.

rahmanl updated this revision to Diff 279442.Jul 21 2020, 1:54 AM
rahmanl edited the summary of this revision. (Show Details)
rahmanl edited the summary of this revision. (Show Details)Jul 21 2020, 1:59 AM
rahmanl updated this revision to Diff 279630.Jul 21 2020, 2:16 PM

This update does some refactoring on the CallSiteRange struct and how it is initialized.

rahmanl updated this revision to Diff 283023.Aug 4 2020, 1:58 PM
rahmanl added a subscriber: amharc.

Fix exception handling on PIC targets (Credit to @amharc for implementing the fix).

The idea is to use pc-relative encoding for LPStart in PIC mode, as absptr is
not possible.
The existing tests are updated to exercise the PIC mode.

rahmanl updated this revision to Diff 283129.Aug 4 2020, 10:55 PM

Fix case style.

rahmanl updated this revision to Diff 286411.Aug 18 2020, 3:20 PM

The previous diff was created incorrectly. This update fixes it.

modimo added a subscriber: modimo.Fri, Aug 28, 4:25 PM
modimo added a comment.Tue, Sep 1, 4:31 PM

I'm rebased against 478eb98cd25cb0ebc01fc2c3889ae94d3f1797d3 and I'm seeing both added tests failing. They may need to be updated.

hoy added a subscriber: hoy.Wed, Sep 2, 2:52 PM
modimo added a comment.Wed, Sep 2, 3:06 PM

I think with this alongside the recently committed MachineFunctionSplitter (MFS) the MFS phase is a good place to split out EH landing pads in both profile and non-profile code. A small change in MFS will enable EH LP splits in profiling given they should always be cold. I think in non-profile mode a sub-flag can be added like -mfs-split-LP-only where EH LPs are always split out. Enabling a general static-analysis splitting could also be interesting but outside of EH LPs I'm not certain there's guaranteed gains in other scenarios.

I think with this alongside the recently committed MachineFunctionSplitter (MFS) the MFS phase is a good place to split out EH landing pads in both profile and non-profile code. A small change in MFS will enable EH LP splits in profiling given they should always be cold. I think in non-profile mode a sub-flag can be added like -mfs-split-LP-only where EH LPs are always split out. Enabling a general static-analysis splitting could also be interesting but outside of EH LPs I'm not certain there's guaranteed gains in other scenarios.

Yes @modimo .You're right that with a minimal change MachineFunctionSplitter can simply place all the EH pads into the cold section. I also see your point about splitting EH pads in non-profiled code . I believe MBFI assigns static frequencies in case profiles are missing and that can be used by MachineFunctionSplitter (@snehasish to confirm?).

wenlei added a subscriber: wenlei.Thu, Sep 3, 4:18 PM

Once this patch is in we can look into splitting ehpads out though I'm more inclined to enhance the static profile count mechanism to account for ehpads appropriately rather than adding a new flag to MFS. In general, it would be great to have MFS operate with only the knowledge of profiles counts rather than special-casing for particular types of blocks. I tried to test static count driven ehpad splitting on mysql but applying the patch to ToT fails (AsmPrinter.cpp needs to be rebased).

modimo added a comment.Wed, Sep 9, 1:07 PM

Once this patch is in we can look into splitting ehpads out though I'm more inclined to enhance the static profile count mechanism to account for ehpads appropriately rather than adding a new flag to MFS.

On this front, is there a specific reviewer that we're waiting for here to get this change through?

In general, it would be great to have MFS operate with only the knowledge of profiles counts rather than special-casing for particular types of blocks.

Splitting out EH pads in my experience is always a net gain or at least neutral in all aspects of code size/performance. I expect static and dynamic counts in general will require rounds of tuning and will still hit pathological cases on certain scenarios. Taken together enabling a scenario where only EH pads are split out is valuable since this can safely be enabled as a default optimization behavior. As a reference point MSVC does by default split out the its EH pad equivalents into cold code as part of its general -O2 optimizations.

wenlei added a comment.Wed, Sep 9, 1:46 PM

I expect static and dynamic counts in general will require rounds of tuning and will still hit pathological cases on certain scenarios.

Agreed. Specifically, even if MFS doesn't work well with sample PGO yet (could lead to perf regression due to profile quality issues?), we'd still like to have a way to enable just eh pad splitting for sample PGO. I can see that having everything driven by static or dynamic profile is cleaner, but practically the extra flexibility can be very useful.

rahmanl edited reviewers, added: modimo, wenlei; removed: majnemer, davidxl, MaskRay, jyknight.Wed, Sep 9, 2:18 PM
rahmanl added subscribers: davidxl, MaskRay.

Once this patch is in we can look into splitting ehpads out though I'm more inclined to enhance the static profile count mechanism to account for ehpads appropriately rather than adding a new flag to MFS.

On this front, is there a specific reviewer that we're waiting for here to get this change through?

Unfortunately, I haven't found a willing reviewer for this patch. Would you be so kind to review it?

Please see https://llvm.org/docs/GettingStarted.html#sending-patches . You'll need to upload the diff with full context.

llvm/test/CodeGen/X86/gcc_except_table_bb_sections.ll
13

personality/lsda encodings are difficult to remember. Worth a comment like DW_EH_PE_pcrel | DW_EH_PE_indirect | DW_EH_PE_sdata4

16

This is not sufficient. You need to check the next instruction to make sure things don't go off

20

If you use main.1 main.2 etc, you will waste a lot of space in .strtab. Better using a unary encoding or a temporary label (.L in ELF)

MaskRay added inline comments.Wed, Sep 9, 2:59 PM
llvm/lib/CodeGen/AsmPrinter/EHStreamer.h
73

The canonical name has been changed to basic block sections (not "BB section") if I remember correctly.
You can just write -fbasic-block-sections= to make readers know this is related to -fbasic-block-sections=

modimo added inline comments.Wed, Sep 9, 3:26 PM
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1133–1134

Is there a specific case that this check is necessary for or will the previous 2 suffice?

llvm/lib/CodeGen/AsmPrinter/EHStreamer.cpp
636–653

This and the its exact copy above (in if (IsSJLJ || IsWasm) {) IMO should be combined into a function/lambda to ensure they stay in sync.

llvm/lib/CodeGen/AsmPrinter/EHStreamer.h
73

This structure is also used for all cases now not just basic block sections so is the annotation necessary?

Taking a closer look at the test (haven't delved into the code yet): the PIC vs non-PIC difference of .gcc_exception_table looks strange.
.gcc_except_table (please also test its section flags "a") has an absolute relocation referencing main.2. This is not good. In all targets (except RISC-V -mno-relax), .gcc_except_table is a relocation-free section.

I think you should just use PIC handling for -fno-PIC.

modimo added a comment.Wed, Sep 9, 3:37 PM

Taking a closer look at the test (haven't delved into the code yet): the PIC vs non-PIC difference of .gcc_exception_table looks strange.
.gcc_except_table (please also test its section flags "a") has an absolute relocation referencing main.2. This is not good. In all targets (except RISC-V -mno-relax), .gcc_except_table is a relocation-free section.

By relocation-free section do you mean all relocations or absolute relocations? This needs some type of relocation in .gcc_except_table to indicate that the landing pads are not relative to the parent function since they have been moved out.

modimo added a comment.Wed, Sep 9, 3:55 PM

These changes will also apply to other Itanium C++ ABI targets (arm32/arm64/RISCV etc.) so adding testing for at least another target is good. Trying this out for arm64 hits an ICE here:

case TargetOpcode::EH_LABEL:
  if (MBB.isBeginSection() && MBB.isEHPad() &&
      ((*std::prev(MI.getIterator())).getOpcode() ==
       TargetOpcode::CFI_INSTRUCTION)) {

which is worth following up on.

rahmanl updated this revision to Diff 291092.Thu, Sep 10, 2:54 PM
  • Rebase.
  • Override X86InstrInfo::InsertNoops and use it in BasicBlockSections.cpp to emit a NOP before the first EH pad.
  • Create a lambda function for duplicated code in EHStreamer.cpp.
  • Change some references to const.
modimo added inline comments.Thu, Sep 10, 5:33 PM
llvm/lib/CodeGen/AsmPrinter/EHStreamer.h
113

What's the reason for using Optional here? Does checking for size == 0 suffice?

llvm/lib/CodeGen/AsmPrinter/WasmException.h
35

SmallVectorImpl for consistency.

llvm/lib/CodeGen/BasicBlockSections.cpp
298

Using getNoop this should look something like:

MCInst Noop;
MF.getSubtarget().getInstrInfo()->getNoop(Noop);
BuildMI(MBB, MI, DL, MF.getSubtarget().getInstrInfo()->get(Noop.getOpcode()));

I would also create a function to encapsulate this. Logically I think this fits here since you want to do this after MF.assignBeginEndSections(); but before updateBranches(MF, PreLayoutFallThroughs); however the naming of sortBasicBlocksAndUpdateBranches is now not accurate. Personally I think a rename to finalizeBasicBlocks would work but I'll let @snehasish chime in here.

llvm/lib/Target/X86/X86InstrInfo.cpp
90–91 ↗(On Diff #291092)

Use the generic TargetInstrInfo.h: virtual void getNoop(MCInst &NopInst) const;
Here so that this is extensible to other architectures.

rahmanl updated this revision to Diff 291141.Thu, Sep 10, 11:39 PM
  • Do not use Optional for the CallSiteRanges parameter.
  • Add the new function llvm::avoidZeroOffsetLandingPad(MachineFunction&) and expose it in BasicBlockSectionUtils.h.
  • Use getNoop(MCInst&) instead of insertNoop() to insert NOP.
rahmanl updated this revision to Diff 291142.Thu, Sep 10, 11:43 PM
  • clang-format.
rahmanl marked 10 inline comments as done and 2 inline comments as done.Fri, Sep 11, 12:13 AM

Taking a closer look at the test (haven't delved into the code yet): the PIC vs non-PIC difference of .gcc_exception_table looks strange.
.gcc_except_table (please also test its section flags "a") has an absolute relocation referencing main.2. This is not good. In all targets (except RISC-V -mno-relax), .gcc_except_table is a relocation-free section.

I think you should just use PIC handling for -fno-PIC.

Thanks for the closer look. I included the section flags in the test.
Actually EHStreamer::emitTypeInfos already uses absolute and relative address (via calling getTTypeGlobalReference). This can be observed in the test:

; CHECK-NON-PIC-NEXT: .long _ZTIi # TypeInfo 1
; CHECK-PIC-NEXT: [[DOT:\.Ltmp[0-9]+]]:
; CHECK-PIC-NEXT: .quad .L_ZTIi.DW.stub-[[DOT]]

llvm/lib/CodeGen/AsmPrinter/EHStreamer.h
113

Yes. It doesn't look great with Optional. I reverted the change.

llvm/lib/CodeGen/BasicBlockSections.cpp
298

Great suggestion.
I added a new function avoidZeroOffsetLandingPad which can also be used in MachineFunctionSpliter.

llvm/test/CodeGen/X86/gcc_except_table_bb_sections.ll
13

You're right, but the LSDA encoding is emitted by MCStreamer::emitCFILsda and I don't see any comment functionality there.

16

Added instruction between the two labels. After Ltmp1 comes # %bb.1 which is not really helpful.

20

Your observation is correct. However, this patch does not touch that functionality (implemented in MachineBasicBlock::getSymbol()).

@rahmanl Have you had a chance to run your added test on ARM64 as another itanium C++ ABI target to make sure it works properly? Having testing there is fine as a follow-up but wanted to make sure we're in agreement here. Otherwise changes LGTM, please let @MaskRay have a chance to provide any additional feedback.

rahmanl marked 5 inline comments as done.Wed, Sep 16, 11:25 PM

@rahmanl Have you had a chance to run your added test on ARM64 as another itanium C++ ABI target to make sure it works properly? Having testing there is fine as a follow-up but wanted to make sure we're in agreement here. Otherwise changes LGTM, please let @MaskRay have a chance to provide any additional feedback.

Thanks @modimo.
This patch relies on basic-block-sections support which is currently only tested on X86 targets (even though it might work on other targets too).
Although I can definitely work with you to diagnose any potential issues on ARM64, I suggest we don't write tests for this patch before the base feature is well-tested.
(On related note, a recent commit (rG157cd93b48a9) limited -fbasic-block-sections to X86.).

Although I can definitely work with you to diagnose any potential issues on ARM64, I suggest we don't write tests for this patch before the base feature is well-tested.
(On related note, a recent commit (rG157cd93b48a9) limited -fbasic-block-sections to X86.).

Great, that exact change was what I was going to suggest if tests were not added. Thanks for the link!