Page MenuHomePhabricator

Exception support for basic block sections
ClosedPublic

Authored by rahmanl on Jan 30 2020, 12:57 PM.

Details

Summary

This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here: https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here: https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf

This patch provides exception support for basic block sections by splitting the call-site table into call-site ranges corresponding to different basic block sections. Still all landing pads must reside in the same basic block section (which is guaranteed by the the core basic block section patch D73674 (ExceptionSection) ). Each call-site table will refer to the landing pad fragment by explicitly specifying @LPstart (which is omitted in the normal non-basic-block section case). All these call-site tables will share their action and type tables.

The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. In the case of basic block section where one section contains all the landing pads, the landing pad offset relative to LPStart could actually be zero. Thus, we avoid zero-offset landing pads by inserting a nop operation as the first non-CFI instruction in the exception section.

Background on Exception Handling in C++ ABI
https://github.com/itanium-cxx-abi/cxx-abi/blob/master/exceptions.pdf

Compiler emits an exception table for every function. When an exception is thrown, the stack unwinding library queries the unwind table (which includes the start and end of each function) to locate the exception table for that function.

The exception table includes a call site table for the function, which is used to guide the exception handling runtime to take the appropriate action upon an exception. Each call site record in this table is structured as follows:

CallSite--> Position of the call site (relative to the function entry)
CallSite length--> Length of the call site.
Landing Pad--> Position of the landing pad (relative to the landing pad fragment’s begin label)
Action record offset--> Position of the first action record

The call site records partition a function into different pieces and describe what action must be taken for each callsite. The callsite fields are relative to the start of the function (as captured in the unwind table).

The landing pad entry is a reference into the function and corresponds roughly to the catch block of a try/catch statement. When execution resumes at a landing pad, it receives an exception structure and a selector value corresponding to the type of the exception thrown, and executes similar to a switch-case statement. The landing pad field is relative to the beginning of the procedure fragment which includes all the landing pads (@LPStart). The C++ ABI requires all landing pads to be in the same fragment. Nonetheless, without basic block sections, @LPStart is the same as the function @Start (found in the unwind table) and can be omitted.

The action record offset is an index into the action table which includes information about which exception types are caught.

C++ Exceptions with Basic Block Sections
Basic block sections break the contiguity of a function fragment. Therefore, call sites must be specified relative to the beginning of the basic block section. Furthermore, the unwinding library should be able to find the corresponding callsites for each section. To do so, the .cfi_lsda directive for a section must point to the range of call-sites for that section.
This patch introduces a new CallSiteRange structure which specifies the range of call-sites which correspond to every section:

`struct CallSiteRange {
  // Symbol marking the beginning of the precedure fragment.
  MCSymbol *FragmentBeginLabel = nullptr;
  // Symbol marking the end of the procedure fragment.
  MCSymbol *FragmentEndLabel = nullptr;
  // LSDA symbol for this call-site range.
  MCSymbol *ExceptionLabel = nullptr;
  // Index of the first call-site entry in the call-site table which
  // belongs to this range.
  size_t CallSiteBeginIdx = 0;
  // Index just after the last call-site entry in the call-site table which
  // belongs to this range.
  size_t CallSiteEndIdx = 0;
  // Whether this is the call-site range containing all the landing pads.
  bool IsLPRange = false;
};`

With N basic-block-sections, the call-site table is partitioned into N call-site ranges.

Conceptually, we emit the call-site ranges for sections sequentially in the exception table as if each section has its own exception table. In the example below, two sections result in the two call site ranges (denoted by LSDA1 and LSDA2) placed next to each other. However, their call-sites will refer to records in the shared Action Table. We also emit the header fields (@LPStart and CallSite Table Length) for each call site range in order to place the call site ranges in separate LSDAs. We note that with -basic-block-sections, The CallSiteTableLength will not actually represent the length of the call site table, but rather the reference to the action table. Since the only purpose of this field is to locate the action table, correctness is guaranteed.

Finally, every call site range has one @LPStart pointer so the landing pads of each section must all reside in one section (not necessarily the same section). To make this easier, we decide to place all landing pads of the function in one section (hence the IsLPRange field in CallSiteRange).

@LPStart---> Landing pad fragment ( LSDA1 points here)
CallSite Table Length---> Used to find the action table.
CallSites
@LPStart---> Landing pad fragment ( LSDA2 points here)
CallSite Table Length
CallSites


Action Table
Types Table

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
MaskRay added inline comments.Sep 9 2020, 2:57 PM
llvm/test/CodeGen/X86/gcc_except_table_bb_sections.ll
13

personality/lsda encodings are difficult to remember. Worth a comment like DW_EH_PE_pcrel | DW_EH_PE_indirect | DW_EH_PE_sdata4

16

This is not sufficient. You need to check the next instruction to make sure things don't go off

MaskRay added inline comments.Sep 9 2020, 2:59 PM
llvm/lib/CodeGen/AsmPrinter/EHStreamer.h
73

The canonical name has been changed to basic block sections (not "BB section") if I remember correctly.
You can just write -fbasic-block-sections= to make readers know this is related to -fbasic-block-sections=

modimo added inline comments.Sep 9 2020, 3:26 PM
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1133–1134

Is there a specific case that this check is necessary for or will the previous 2 suffice?

llvm/lib/CodeGen/AsmPrinter/EHStreamer.cpp
636–653

This and the its exact copy above (in if (IsSJLJ || IsWasm) {) IMO should be combined into a function/lambda to ensure they stay in sync.

llvm/lib/CodeGen/AsmPrinter/EHStreamer.h
73

This structure is also used for all cases now not just basic block sections so is the annotation necessary?

Taking a closer look at the test (haven't delved into the code yet): the PIC vs non-PIC difference of .gcc_exception_table looks strange.
.gcc_except_table (please also test its section flags "a") has an absolute relocation referencing main.2. This is not good. In all targets (except RISC-V -mno-relax), .gcc_except_table is a relocation-free section.

I think you should just use PIC handling for -fno-PIC.

modimo added a comment.Sep 9 2020, 3:37 PM

Taking a closer look at the test (haven't delved into the code yet): the PIC vs non-PIC difference of .gcc_exception_table looks strange.
.gcc_except_table (please also test its section flags "a") has an absolute relocation referencing main.2. This is not good. In all targets (except RISC-V -mno-relax), .gcc_except_table is a relocation-free section.

By relocation-free section do you mean all relocations or absolute relocations? This needs some type of relocation in .gcc_except_table to indicate that the landing pads are not relative to the parent function since they have been moved out.

modimo added a comment.Sep 9 2020, 3:55 PM

These changes will also apply to other Itanium C++ ABI targets (arm32/arm64/RISCV etc.) so adding testing for at least another target is good. Trying this out for arm64 hits an ICE here:

case TargetOpcode::EH_LABEL:
  if (MBB.isBeginSection() && MBB.isEHPad() &&
      ((*std::prev(MI.getIterator())).getOpcode() ==
       TargetOpcode::CFI_INSTRUCTION)) {

which is worth following up on.

rahmanl updated this revision to Diff 291092.Sep 10 2020, 2:54 PM
  • Rebase.
  • Override X86InstrInfo::InsertNoops and use it in BasicBlockSections.cpp to emit a NOP before the first EH pad.
  • Create a lambda function for duplicated code in EHStreamer.cpp.
  • Change some references to const.
modimo added inline comments.Sep 10 2020, 5:33 PM
llvm/lib/CodeGen/AsmPrinter/EHStreamer.h
113

What's the reason for using Optional here? Does checking for size == 0 suffice?

llvm/lib/CodeGen/AsmPrinter/WasmException.h
35

SmallVectorImpl for consistency.

llvm/lib/CodeGen/BasicBlockSections.cpp
299

Using getNoop this should look something like:

MCInst Noop;
MF.getSubtarget().getInstrInfo()->getNoop(Noop);
BuildMI(MBB, MI, DL, MF.getSubtarget().getInstrInfo()->get(Noop.getOpcode()));

I would also create a function to encapsulate this. Logically I think this fits here since you want to do this after MF.assignBeginEndSections(); but before updateBranches(MF, PreLayoutFallThroughs); however the naming of sortBasicBlocksAndUpdateBranches is now not accurate. Personally I think a rename to finalizeBasicBlocks would work but I'll let @snehasish chime in here.

llvm/lib/Target/X86/X86InstrInfo.cpp
90–91 ↗(On Diff #291092)

Use the generic TargetInstrInfo.h: virtual void getNoop(MCInst &NopInst) const;
Here so that this is extensible to other architectures.

rahmanl updated this revision to Diff 291141.Sep 10 2020, 11:39 PM
  • Do not use Optional for the CallSiteRanges parameter.
  • Add the new function llvm::avoidZeroOffsetLandingPad(MachineFunction&) and expose it in BasicBlockSectionUtils.h.
  • Use getNoop(MCInst&) instead of insertNoop() to insert NOP.
rahmanl updated this revision to Diff 291142.Sep 10 2020, 11:43 PM
  • clang-format.
rahmanl marked 10 inline comments as done and 2 inline comments as done.Sep 11 2020, 12:13 AM

Taking a closer look at the test (haven't delved into the code yet): the PIC vs non-PIC difference of .gcc_exception_table looks strange.
.gcc_except_table (please also test its section flags "a") has an absolute relocation referencing main.2. This is not good. In all targets (except RISC-V -mno-relax), .gcc_except_table is a relocation-free section.

I think you should just use PIC handling for -fno-PIC.

Thanks for the closer look. I included the section flags in the test.
Actually EHStreamer::emitTypeInfos already uses absolute and relative address (via calling getTTypeGlobalReference). This can be observed in the test:

; CHECK-NON-PIC-NEXT: .long _ZTIi # TypeInfo 1
; CHECK-PIC-NEXT: [[DOT:\.Ltmp[0-9]+]]:
; CHECK-PIC-NEXT: .quad .L_ZTIi.DW.stub-[[DOT]]

llvm/lib/CodeGen/AsmPrinter/EHStreamer.h
113

Yes. It doesn't look great with Optional. I reverted the change.

llvm/lib/CodeGen/BasicBlockSections.cpp
299

Great suggestion.
I added a new function avoidZeroOffsetLandingPad which can also be used in MachineFunctionSpliter.

llvm/test/CodeGen/X86/gcc_except_table_bb_sections.ll
13

You're right, but the LSDA encoding is emitted by MCStreamer::emitCFILsda and I don't see any comment functionality there.

16

Added instruction between the two labels. After Ltmp1 comes # %bb.1 which is not really helpful.

20

Your observation is correct. However, this patch does not touch that functionality (implemented in MachineBasicBlock::getSymbol()).

@rahmanl Have you had a chance to run your added test on ARM64 as another itanium C++ ABI target to make sure it works properly? Having testing there is fine as a follow-up but wanted to make sure we're in agreement here. Otherwise changes LGTM, please let @MaskRay have a chance to provide any additional feedback.

rahmanl marked 5 inline comments as done.Sep 16 2020, 11:25 PM

@rahmanl Have you had a chance to run your added test on ARM64 as another itanium C++ ABI target to make sure it works properly? Having testing there is fine as a follow-up but wanted to make sure we're in agreement here. Otherwise changes LGTM, please let @MaskRay have a chance to provide any additional feedback.

Thanks @modimo.
This patch relies on basic-block-sections support which is currently only tested on X86 targets (even though it might work on other targets too).
Although I can definitely work with you to diagnose any potential issues on ARM64, I suggest we don't write tests for this patch before the base feature is well-tested.
(On related note, a recent commit (rG157cd93b48a9) limited -fbasic-block-sections to X86.).

Although I can definitely work with you to diagnose any potential issues on ARM64, I suggest we don't write tests for this patch before the base feature is well-tested.
(On related note, a recent commit (rG157cd93b48a9) limited -fbasic-block-sections to X86.).

Great, that exact change was what I was going to suggest if tests were not added. Thanks for the link!

rahmanl edited reviewers, added: MaskRay; removed: wenlei.Sep 18 2020, 12:55 PM

I have some understanding with this patch now. Continuously pondering... Here are some comments.

llvm/include/llvm/CodeGen/AsmPrinter.h
145

Move below CurrentSectionBeginSym and add a comment to make it clear this is for basic block sections.

259

Move this above, next to the other overload.

llvm/lib/CodeGen/AsmPrinter/EHStreamer.cpp
221

The -fno-function-sections case is similar to -ffunction-sections, so there is probably no need to mention function sections.

251

Seems that this comment can be merged with the next comment.

409–410

Probably worth mentioning that normally there is exactly one CallSiteRange. With -fbasic-block-sections, there is one for each cluster of basic block sections.

477–493
502

This comment does not start with a complete sentence.

510

typo: preciesly

Probably juse: The Action table follows the call-site table

561

Probably worth mentioning that there is only one basic block section having landing pads.

llvm/lib/CodeGen/BasicBlockSections.cpp
299

Where will avoidZeroOffsetLandingPad used elsewhere?

llvm/test/CodeGen/X86/gcc_except_table_bb_sections.ll
11

You can place NON-PIC together so that you can use -NEXT:

13

I mean a test file comment, not in CHECK.

81

The comment markers appear to be misaligned.

rahmanl updated this revision to Diff 294172.Sep 24 2020, 2:40 PM
rahmanl marked 10 inline comments as done.
  • Rebase.
  • Unify access to exception symbols for basic-block-sections and no basic-block-sections.
  • other nits.

Thanks for the comments @MaskRay

llvm/include/llvm/CodeGen/AsmPrinter.h
145

In light of your comment, I refactored this map to be used for both situations.
This refactoring also removed some code .

259

NA.

llvm/lib/CodeGen/AsmPrinter/EHStreamer.cpp
251

This comment was left out from old code. Thanks for catching. Removed.

477–493

This is existing code copied here. Does it matter that we make the boolean const since it's temporary and we can't take the address/reference of the temporary.

502

Thanks again.

llvm/test/CodeGen/X86/gcc_except_table_bb_sections.ll
11

Apparently, -NEXT works even without them being together, but I organized the checks.

81

Thanks. Fixed here and in the other test.

rahmanl updated this revision to Diff 294180.Sep 24 2020, 2:58 PM
  • clang-format.
modimo added inline comments.Sep 25 2020, 5:32 PM
llvm/lib/CodeGen/AsmPrinter/EHStreamer.cpp
477–493

Perf and but more importantly read-ability that this is only calculated here. IMO a good thing to get in the habit of.

rahmanl updated this revision to Diff 294771.Sep 28 2020, 11:43 AM
rahmanl marked 4 inline comments as done.
  • Rebase.
  • Add test comment about LSDA and personality encodings.
  • Make avoidZeroOffsetLandingPad static.
llvm/lib/CodeGen/BasicBlockSections.cpp
299

You're right. It's not used anywhere else right now. So I am going to make it static for now.

There is a compilation error:

llvm/lib/Target/ARM/ARMAsmPrinter.cpp:906:13: error: use of undeclared identifier 'getCurExceptionSym'
    MCSym = getCurExceptionSym();
MaskRay added inline comments.Sep 28 2020, 2:54 PM
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1768

The idiom is to use try_emplace. If a new element is inserted, replace it.

auto Res = MBBSectionExceptionSyms.try_emplace(MBB->getSectionIDNum());
if (Res.second)
  Res.first->second = createTempSymbol("exception");
return Res.first->second;
rahmanl updated this revision to Diff 294824.Sep 28 2020, 4:24 PM
rahmanl marked an inline comment as done.
  • Fix ARM AsmPrinter code.
  • Use DenseMap for MBBSectionExceptionSyms.

There is a compilation error:

llvm/lib/Target/ARM/ARMAsmPrinter.cpp:906:13: error: use of undeclared identifier 'getCurExceptionSym'
    MCSym = getCurExceptionSym();

Thanks @MaskRay. Fixed.

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1768

Yes. MapVector doesn't support emplace methods.
Thinking more about this, I decided to simply use a DenseMap for MBBSectionExceptionSyms since we don't need the deterministic order for this map (It's not iterated through). And now we can use the code snippet above.

rahmanl removed a reviewer: skan.Sep 28 2020, 4:27 PM
MaskRay added a comment.EditedSep 28 2020, 10:44 PM

Let's take gcc_except_table_bb_sections.ll as the example:

Call site sizes are incorrect

.uleb128 .Lcst_end0-.Lcst_begin0  # Call Site Size

.uleb128 .Lcst_end0-.Lcst_begin1  # Call Site Size

.uleb128 .Lcst_end0-.Lcst_begin2  # Call Site Size

I think the Call Site Sizes are incorrect. A pair of .Lcst_end0 and .Lcst_begin0 should exactly delimiter the call sites table, but the current implements uses .Lcst_end0 for all 3 basic block sections and places .Lcst_end0 after the last Call Site Table.

Action table

All these call-site tables will share their action and type tables.

I think type tables can be shared but action tables cannot.

In each call-site table, the action record offset of each call site record is relative to the action table the start of the action table. The action table follows the call-site table.

"The action table follows the call-site table in the LSDA." If there are N call-site tables, there should be N corresponding action tables. See libcxxabi/src/cxa_exceptions.cpp:scan_eh_tab for how the table is parsed (actionTableStart + (actionEntry - 1))

(Apologies that I only noticed this problem just now. I am learning exceptions while reviewing this patch...)

Let's take gcc_except_table_bb_sections.ll as the example:

Call site sizes are incorrect

.uleb128 .Lcst_end0-.Lcst_begin0  # Call Site Size

.uleb128 .Lcst_end0-.Lcst_begin1  # Call Site Size

.uleb128 .Lcst_end0-.Lcst_begin2  # Call Site Size

I think the Call Site Sizes are incorrect. A pair of .Lcst_end0 and .Lcst_begin0 should exactly delimiter the call sites table, but the current implements uses .Lcst_end0 for all 3 basic block sections and places .Lcst_end0 after the last Call Site Table.

Yes. It's a bit confusing I agree. The label difference is used to locate the action table by the exception handling runtime. It also happens to be equal to the call site table size for no-basic-block-sections.
With basic block sections, these offsets do not correspond to the call site table. We have one cst_end label and multiple cst_begin labels. However, correctness is guaranteed because each call site range explicitly includes the LSDA headers.
One alternative is to change the name of these symbols similar to type table references (cst_begin to atbaseref and cst_end to atbase where at represents Action Table) to make sure they're not incorrectly interpreted, but this will be applied to default too.

Action table

All these call-site tables will share their action and type tables.

I think type tables can be shared but action tables cannot.

In each call-site table, the action record offset of each call site record is relative to the action table the start of the action table. The action table follows the call-site table.

"The action table follows the call-site table in the LSDA." If there are N call-site tables, there should be N corresponding action tables. See libcxxabi/src/cxa_exceptions.cpp:scan_eh_tab for how the table is parsed (actionTableStart + (actionEntry - 1))

Even with basic-block sections, there is still only one call-site table, but multiple call-site ranges. This works because each call-site range explicitly specifies references to action table and type table. So actionTableStart across all call-site ranges.

(Apologies that I only noticed this problem just now. I am learning exceptions while reviewing this patch...)

No worries. Thanks for taking the time to understand and review.

rahmanl edited the summary of this revision. (Show Details)Sep 29 2020, 12:03 PM

Let's take gcc_except_table_bb_sections.ll as the example:

Call site sizes are incorrect

.uleb128 .Lcst_end0-.Lcst_begin0  # Call Site Size

.uleb128 .Lcst_end0-.Lcst_begin1  # Call Site Size

.uleb128 .Lcst_end0-.Lcst_begin2  # Call Site Size

I think the Call Site Sizes are incorrect. A pair of .Lcst_end0 and .Lcst_begin0 should exactly delimiter the call sites table, but the current implements uses .Lcst_end0 for all 3 basic block sections and places .Lcst_end0 after the last Call Site Table.

Yes. It's a bit confusing I agree. The label difference is used to locate the action table by the exception handling runtime. It also happens to be equal to the call site table size for no-basic-block-sections.
With basic block sections, these offsets do not correspond to the call site table. We have one cst_end label and multiple cst_begin labels. However, correctness is guaranteed because each call site range explicitly includes the LSDA headers.
One alternative is to change the name of these symbols similar to type table references (cst_begin to atbaseref and cst_end to atbase where at represents Action Table) to make sure they're not incorrectly interpreted, but this will be applied to default too.

Thanks for the clarification. I think I understand it now. Conceptually the 3 call site tables overlap.[[[ cst0 ] cst1 ] cst2 ] cst_end0: action_table. This point may deserve a sentence in the description. I've left an inline comment about renaming cst_end0.

The trailing bytes of cst0 and cst1 are actually un-parseable parts. This does not matter in practice because these call site tables are all comprehensive. __gxx_personality_v0 should be able to find a call site within cst0, if an exception is thrown in its range. __cxa_throw -> _Unwind_RaiseException -> find a CIE and the associated LSDA -> call __gxx_personality_v0 -> find a call site in the call site table.

Action table

All these call-site tables will share their action and type tables.

I think type tables can be shared but action tables cannot.

In each call-site table, the action record offset of each call site record is relative to the action table the start of the action table. The action table follows the call-site table.

"The action table follows the call-site table in the LSDA." If there are N call-site tables, there should be N corresponding action tables. See libcxxabi/src/cxa_exceptions.cpp:scan_eh_tab for how the table is parsed (actionTableStart + (actionEntry - 1))

Even with basic-block sections, there is still only one call-site table, but multiple call-site ranges. This works because each call-site range explicitly specifies references to action table and type table. So actionTableStart across all call-site ranges.

(Apologies that I only noticed this problem just now. I am learning exceptions while reviewing this patch...)

No worries. Thanks for taking the time to understand and review.

llvm/lib/CodeGen/AsmPrinter/EHStreamer.cpp
470–471

if CallSiteRanges.size() > 1, consider renaming cst_end to something else, e.g. action_table_start.

rahmanl updated this revision to Diff 295188.Sep 30 2020, 12:34 AM
  • Use 'action_table_base' instead of 'cst_end' with more than one call-site range.
  • Rebase.
MaskRay accepted this revision.Sep 30 2020, 12:53 AM

LGTM.

This revision is now accepted and ready to land.Sep 30 2020, 12:53 AM
This revision was landed with ongoing or failed builds.Sep 30 2020, 11:06 AM
This revision was automatically updated to reflect the committed changes.
modimo added a comment.Oct 1 2020, 4:16 PM

Glad to see this landing! I'm testing locally with my change to enable this purely for landing blocks.

As is, MachineFunctionSplitter can utilize this without basic-block-sections which means the "nop" handling doesn't get invoked and the runtime fails to find the landing pads. This would certainly be a separate change and I'd be happy to drive that.

Glad to see this landing! I'm testing locally with my change to enable this purely for landing blocks.

Me too. Thanks.

As is, MachineFunctionSplitter can utilize this without basic-block-sections which means the "nop" handling doesn't get invoked and the runtime fails to find the landing pads. This would certainly be a separate change and I'd be happy to drive that.

That is expected. Thanks for signing up.