This is an archive of the discontinued LLVM Phabricator instance.

Call Frame Information (CFI) Handling for Basic Block Sections
ClosedPublic

Authored by tmsriram on May 14 2020, 6:26 PM.

Details

Summary

Presenting this for review on behalf of @amharc who is the original author of this patch. This patch handles CFI with basic block sections, which unlike DebugInfo does not support ranges. The DWARF standard explicitly requires emitting separate CFI Frame Descriptor Entries for each contiguous fragment of a function. Thus, the CFI information for all callee-saved registers (possibly including the frame pointer, if necessary) have to be emitted along with redefining the Call Frame Address (CFA), viz. where the current frame starts.

CFI directives are emitted in FDE’s in the object file with a low_pc, high_pc specification. So, a single FDE must point to a contiguous code region unlike debug info which has the support for ranges. This is what complicates CFI for basic block sections.

Now, what happens when we start placing individual basic blocks in unique sections:

  • Basic block sections allow the linker to randomly reorder basic blocks in the address space such that a given basic block can become non-contiguous with the original function.
  • The different basic block sections can no longer share the cfi_startproc and cfi_endproc directives. So, each basic block section should emit this independently.
  • Each (cfi_startproc, cfi_endproc) directive will result in a new FDE that caters to that basic block section.
  • Now, this basic block section needs to duplicate the information from the entry block to compute the CFA as it is an independent entity. It cannot refer to the FDE of the original function and hence must duplicate all the stuff that is needed to compute the CFA on its own.
  • We are working on a de-duplication patch that can share common information in FDEs in a CIE (Common Information Entry) and we will present this as a follow up patch. This can significantly reduce the duplication overhead and is particularly useful when several basic block sections are created.
  • The CFI directives are emitted similarly for registers that are pushed onto the stack, like callee saved registers in the prologue. There are cfi directives that emit how to retrieve the value of the register at that point when the push happened. This has to be duplicated too in a basic block that is floated as a separate section.

Diff Detail

Event Timeline

tmsriram created this revision.May 14 2020, 6:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 14 2020, 6:26 PM
davidxl edited reviewers, added: wmi; removed: davidxl.May 19 2020, 9:54 AM
davidxl added a subscriber: davidxl.
wmi added inline comments.May 20 2020, 5:51 PM
llvm/lib/CodeGen/AsmPrinter/DwarfException.h
70–71

They are new functions marked as override, but there are no virtual functions added in the base class.

llvm/lib/CodeGen/CFIInstrInserter.cpp
310–316

Can we refactor it a little so the case to generate createDefCfa can be separated out like following? It is easier to understand when ForceFullCFA is true, only createDefCfa will be used to set up CFA info.

if (PrevMBBInfo->OutgoingCFAOffset != MBBInfo.IncomingCFAOffset &&
    PrevMBBInfo->OutgoingCFARegister != MBBInfo.IncomingCFARegister || ForceFullCFA) {
  MF.addFrameInst(MCCFIInstruction::createDefCfa...
  ...
} else if (...) {
...
379–381

Better move the snippet upwards and early return for ForceFullCFA in case later people add continue before the snippet since it is expected to call emitCalleeSavedFrameMoves for every BB except the first BB.

tmsriram updated this revision to Diff 265654.May 21 2020, 6:01 PM
tmsriram marked 4 inline comments as done.

Code refactoring in CFIInstrInserter.cpp

wmi added inline comments.May 28 2020, 10:23 AM
llvm/lib/MC/MCDwarf.cpp
1501 ↗(On Diff #265654)

Could you explain a little about how you decide which instructions can be deduped? I see the code is comparing the label of each CFI instruction with the label of the first CFI instruction in the frame. What is the meaning of the label? When those CFI intructions are created in CFIInstrInserter, the labels are initialized to nullptr.

tmsriram updated this revision to Diff 273599.Jun 25 2020, 11:38 PM

Split deduplication out of base CFI patch.

This patch now only contains the basic support needed to make CFI correct with basic block sections. CFI FDE deduplication will be added as a separate child patch to make it easy to review.

tmsriram edited the summary of this revision. (Show Details)Jun 25 2020, 11:42 PM
tmsriram added subscribers: sanjoy, vstefanovic.
tmsriram added inline comments.
llvm/lib/CodeGen/AsmPrinter/DwarfException.h
70–71

It is in the parent patch, D78851 in AsmPrinterHandler.h just like beginFragment.

(don't have any particular feedback on most of the patch - CFI stuff isn't something I'm especially familiar with - just a couple of optional thoughts on simplifying/clarifying the tests)

llvm/test/DebugInfo/X86/basicblock-sections-cfi.ll
4–10 ↗(On Diff #273599)

I /might/ find this simpler:

void f1();
void f2();
void f3(bool b) {
  if (b)
    f1();
  else
    f2();
}

But no big deal either way - maybe some more exposition on why 1 CIE and 4FDEs are expected? (I guess 1 CIE shares some common data across the 4 FDEs which are for the 4 basic blocks? Could it be fewer, what about "void f1(); void f2(bool b) { if (b) f1(); }" is that still adequate to test this functionality?)

llvm/test/DebugInfo/X86/basicblock-sections-cfiinstr.ll
23–30 ↗(On Diff #273599)

Seems like a surprisingly large amount of computation - is it there for a reason? needed to push some optimization or layout decisions? Could it all use the same operation (just all multiplication, for instance) or is the different operations significant? (Well, I guess they have to differ between the two branches - but could all be the same within each one?) does it need 12 parameters? Could it be fewer & use a function call?

(etc, etc - simple test case, maybe some comments describing what's significant about the features of it that are needed to demonstrate the desired behavior, etc)

tmsriram marked an inline comment as done.Jun 26 2020, 1:21 PM
tmsriram added inline comments.
llvm/test/DebugInfo/X86/basicblock-sections-cfi.ll
4–10 ↗(On Diff #273599)

@amharc

Thanks for taking a look, I will change the code.

The FDEs cannot be fewer, one is needed per section. This is because FDEs do not support ranges like debug info and it is only low-high PC. In theory, each section could be in arbitrary non-contiguous locations and hence conservatively we need four.

We have a CFI dedup that will reduce the size of each FDE and move the common parts out to the CIE. I split that out of this patch to keep this simple.

Could it be fewer, what about "void f1(); void f2(bool b) { if (b) f1(); }" is that still adequate to test this functionality?)

This will not capture the deduping well because we need atleast 4 basic blocks to have duplicates. The entry and the exit block have other CFI instructions.

dblaikie added inline comments.Jun 26 2020, 2:02 PM
llvm/test/DebugInfo/X86/basicblock-sections-cfi.ll
4–10 ↗(On Diff #273599)

We have a CFI dedup that will reduce the size of each FDE and move the common parts out to the CIE. I split that out of this patch to keep this simple.

Then I'd generally suggest keeping this test simple too - though I can understand the benefit of having the diff on the test just demonstrate how things got better in the later change (rather than introducing test changes or new tests that demonstrate the new behavior). Your call.

tmsriram marked 2 inline comments as done.Jun 26 2020, 3:37 PM
tmsriram added inline comments.Jun 26 2020, 3:37 PM
llvm/test/DebugInfo/X86/basicblock-sections-cfi.ll
4–10 ↗(On Diff #273599)

Sure, Iwill make it simple and evolve it when submitting the dedup patch.

llvm/test/DebugInfo/X86/basicblock-sections-cfiinstr.ll
23–30 ↗(On Diff #273599)

It was done so that more callee-saved registers are used and when more callee saved registers are used cfi_offset directives are needed for it. The .s looks like this for a basic block that does the computation:

_Z7computebiiiiiiiiiiii.1: # %if.then
.cfi_startproc
.cfi_def_cfa %rbp, 16
.cfi_offset %rbx, -48
.cfi_offset %r12, -40
.cfi_offset %r14, -32
.cfi_offset %r15, -24
.cfi_offset %rbp, -16

Each basic block that goes in a different section must emit cfi directives for callee-saved registers. The parameters is to make sure the caller saved registers are taken and the callee saved registers are forced so that we can check that the cfi emission indeed works for callee saved registers.

tmsriram updated this revision to Diff 274249.Jun 29 2020, 2:50 PM
tmsriram marked 2 inline comments as done.

Simplify CFI test.

tmsriram updated this revision to Diff 274324.Jun 29 2020, 8:52 PM

Simplify cfiinstr test .ll file

dblaikie added inline comments.Jul 6 2020, 8:58 PM
llvm/test/DebugInfo/X86/basicblock-sections-cfiinstr.ll
23–30 ↗(On Diff #273599)

Ah, OK - a comment might be handy to describe that?

And rather than the somewhat arbitrary computation, perhaps an opaque function call would suffice? Or would that introduce other complications for spills/saves/etc?

Maybe using a pass by value struct as the parameter type so the long parameter list doesn't have to be repeated?

wmi added inline comments.Jul 7 2020, 8:54 AM
llvm/lib/CodeGen/CFIInstrInserter.cpp
344

Should we add a continue here after setting PrevMBBInfo = &MBBInfo? We shouldn't insert any extra CSR related CFI directives based on prev MBB.

llvm/lib/Target/X86/X86FrameLowering.cpp
488–490

.cfi_offset directive for framepointer is inserted after other .cfi_offset directives for callee save registers. This is different from the .cfi_offset order inserted for prologue. I am not familiar with how the dedup cfi is implemented. A question is will the order difference reduce the chance of deduplicating cfi in prologue?

tmsriram updated this revision to Diff 276206.Jul 7 2020, 2:19 PM
tmsriram marked 3 inline comments as done.

Simplify CFI instructions test.

tmsriram added inline comments.Jul 7 2020, 2:24 PM
llvm/test/DebugInfo/X86/basicblock-sections-cfiinstr.ll
23–30 ↗(On Diff #273599)

Simplified the test and added comments. Having more than 4 integers in the struct seems to go to the stack though the ABI says upto 32 bytes.

dblaikie added inline comments.Jul 7 2020, 2:57 PM
llvm/test/DebugInfo/X86/basicblock-sections-cfiinstr.ll
23–30 ↗(On Diff #273599)

Ah - the comment's good, thanks! Not sure about the code changes - I was hoping for more uniformity (& brevity as a second-order benefit), but having a struct, then 3 struct parameters and some ints lacks the uniformity I was hoping for.

Also the arithmetic looks sort of arbitrarily complicated (& raises the question, as a reader (for me at least), why is it complicated? Is the particular sequence of arithmetic important in some way?). Is a more uniform operation (like all addition) not viable due to vectorizing or something? (is a function call inadequate because of other spill issues (eg: void f1(bool k, int a, int b.... ) { int result; if (k) { result f2(a, b, ... ); } else { result = f3(a, b, ...); } return result; })?)

tmsriram marked an inline comment as done and an inline comment as not done.Jul 7 2020, 8:07 PM
tmsriram added inline comments.
llvm/test/DebugInfo/X86/basicblock-sections-cfiinstr.ll
23–30 ↗(On Diff #273599)

@wmi who is a register allocation expert. I am not an expert but we need to use callee saved registers in some manner. Function calls would still only use caller-save registers and unless we have a *callee* where there is serious computation combined with a dearth of scratch registers (caller-save), callee saved registers will be avoided. Does this make sense :) ?

I can try to simplify the computation or spend some time to see if I can produce simpler code that can still use callee saved registers, but I dont have concrete ideas now.

tmsriram marked an inline comment as done and 2 inline comments as not done.Jul 7 2020, 8:21 PM
tmsriram added inline comments.
llvm/test/DebugInfo/X86/basicblock-sections-cfiinstr.ll
23–30 ↗(On Diff #273599)

Ah! I think you are right :), sorry about that! It does use callee saved registers for function calls. Let me simplify this along these lines

tmsriram updated this revision to Diff 276310.Jul 7 2020, 9:40 PM
tmsriram marked 6 inline comments as done.

Fix CFIInstr test and address other reviewer comments.

llvm/lib/Target/X86/X86FrameLowering.cpp
488–490

I fixed the order and checked that dedup works just fine.

wmi accepted this revision.Jul 7 2020, 9:54 PM

Thanks. LGTM. Please wait and see if David has more comments.

This revision is now accepted and ready to land.Jul 7 2020, 9:54 PM
MaskRay added a subscriber: MaskRay.Jul 7 2020, 9:58 PM
MaskRay added inline comments.
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
3080

Drop braces for simple statements

I have studied CFIInstrInserter in May. If you don't mind, please give me some time to review as well.

For basic-block-sections-cfiinstr_1.ll, have you considered places like CodeGen/X86/cfi-inserter-*? You may even create a subdirectory there.
_1 is not very common. -1 is more common.

curl -L 'https://reviews.llvm.org/D79978?download=1' does not have a/ or b/ prefix. I think that may be why arc patch D79978 cannot apply the patch.
Can you upload a diff with either arc diff, git format-patch -1 or git diff 'HEAD^'? Thanks.

llvm/test/DebugInfo/X86/basic-block-sections-cfiinstr_1.ll
6 ↗(On Diff #276310)

I think these labels may need : suffix and a # prefix to make them unique.

dblaikie added inline comments.Jul 8 2020, 11:07 AM
llvm/test/DebugInfo/X86/basic-block-sections-cfiinstr_1.ll
20–30 ↗(On Diff #276310)

this looks nicer - though I'd still like a bit more commentary on exactly how/why these constructs are here? Why two function calls with interleaved parameters rather than one, etc?

Mostly I'm hoping the test would explain why these constructs are used and which parts are relevant. (does the function need a non-void return type? or could the function calls be void-returning but conditional? (it's not like they can be optimized away, since they might have side effects anyway))

tmsriram marked 4 inline comments as done.Jul 8 2020, 5:00 PM

I have studied CFIInstrInserter in May. If you don't mind, please give me some time to review as well.

For basic-block-sections-cfiinstr_1.ll, have you considered places like CodeGen/X86/cfi-inserter-*? You may even create a subdirectory there.
_1 is not very common. -1 is more common.

Done.

curl -L 'https://reviews.llvm.org/D79978?download=1' does not have a/ or b/ prefix. I think that may be why arc patch D79978 cannot apply the patch.
Can you upload a diff with either arc diff, git format-patch -1 or git diff 'HEAD^'? Thanks.

Uploaded after git diff HEAD^. Please see if this works.

llvm/test/DebugInfo/X86/basic-block-sections-cfiinstr_1.ll
6 ↗(On Diff #276310)

I added the full name now, not sure what you mean by '#' prefix.

20–30 ↗(On Diff #276310)

I cleaned this up a bit more adding more comments and changing it to a void func. A single func call is not utilizing callee saved registers but using two calls like this is forcing it. PTAL.

tmsriram updated this revision to Diff 276603.Jul 8 2020, 5:01 PM
tmsriram marked an inline comment as done.

Address reviewer comments:

  • Rename tests.
  • SImplify tests.
  • Remove braces.

I have studied CFIInstrInserter in May. If you don't mind, please give me some time to review as well.

For basic-block-sections-cfiinstr_1.ll, have you considered places like CodeGen/X86/cfi-inserter-*? You may even create a subdirectory there.
_1 is not very common. -1 is more common.

Done.

curl -L 'https://reviews.llvm.org/D79978?download=1' does not have a/ or b/ prefix. I think that may be why arc patch D79978 cannot apply the patch.
Can you upload a diff with either arc diff, git format-patch -1 or git diff 'HEAD^'? Thanks.

Uploaded after git diff HEAD^. Please see if this works.

Your diff includes:

--- llvm/include/llvm/CodeGen/TargetFrameLowering.h
+++ llvm/include/llvm/CodeGen/TargetFrameLowering.h

git diff 'HEAD^' includes a prefix:

--- c/llvm/include/llvm/CodeGen/TargetFrameLowering.h
+++ w/llvm/include/llvm/CodeGen/TargetFrameLowering.h
MaskRay requested changes to this revision.EditedJul 8 2020, 11:27 PM

I haven't looked into the details, but the test suggests that the patch is wrong:

# basic-block-sections-cfi-1.ll
        .section        .text,"ax",@progbits,unique,2
_Z2f3b.2:                               # %if.end
        .cfi_startproc
        .cfi_def_cfa %rbp, 16    # this should be inserted after addq $16, %rsp
        .cfi_offset %rbp, -16    # this should be after .cfi_def_cfa %rbp, 16
        addq    $16, %rsp
        popq    %rbp
        .cfi_def_cfa %rsp, 8
        retq
This revision now requires changes to proceed.Jul 8 2020, 11:27 PM
wmi added a comment.Jul 9 2020, 7:58 AM

I haven't looked into the details, but the test suggests that the patch is wrong:

# basic-block-sections-cfi-1.ll
        .section        .text,"ax",@progbits,unique,2
_Z2f3b.2:                               # %if.end
        .cfi_startproc
        .cfi_def_cfa %rbp, 16    # this should be inserted after addq $16, %rsp
        .cfi_offset %rbp, -16    # this should be after .cfi_def_cfa %rbp, 16
        addq    $16, %rsp
        popq    %rbp
        .cfi_def_cfa %rsp, 8
        retq

I think the position where the cfi directives are currently inserted is correct. Those directives at the beginning of BB are not to maintain call frame information for instructions inside of BB like "addq $16, %rsp" and "popq %rbp", but to setup the call frame information correctly at the beginning of BB because the BB could be moved around.

In D79978#2141768, @wmi wrote:

I haven't looked into the details, but the test suggests that the patch is wrong:

# basic-block-sections-cfi-1.ll
        .section        .text,"ax",@progbits,unique,2
_Z2f3b.2:                               # %if.end
        .cfi_startproc
        .cfi_def_cfa %rbp, 16    # this should be inserted after addq $16, %rsp
        .cfi_offset %rbp, -16    # this should be after .cfi_def_cfa %rbp, 16
        addq    $16, %rsp
        popq    %rbp
        .cfi_def_cfa %rsp, 8
        retq

I think the position where the cfi directives are currently inserted is correct. Those directives at the beginning of BB are not to maintain call frame information for instructions inside of BB like "addq $16, %rsp" and "popq %rbp", but to setup the call frame information correctly at the beginning of BB because the BB could be moved around.

Right. If you also compile the code without fbasicblock-sections, then you will see that there is no cfi directive for addq $16, %rsp at that point where you are referring to. I use that myself to understand when bugs happen. I will double-check.

In D79978#2141768, @wmi wrote:

I haven't looked into the details, but the test suggests that the patch is wrong:

# basic-block-sections-cfi-1.ll
        .section        .text,"ax",@progbits,unique,2
_Z2f3b.2:                               # %if.end
        .cfi_startproc
        .cfi_def_cfa %rbp, 16    # this should be inserted after addq $16, %rsp
        .cfi_offset %rbp, -16    # this should be after .cfi_def_cfa %rbp, 16
        addq    $16, %rsp
        popq    %rbp
        .cfi_def_cfa %rsp, 8
        retq

I think the position where the cfi directives are currently inserted is correct. Those directives at the beginning of BB are not to maintain call frame information for instructions inside of BB like "addq $16, %rsp" and "popq %rbp", but to setup the call frame information correctly at the beginning of BB because the BB could be moved around.

Ack. Then what instructions should be placed at the top of these basic blocks? Should .cfi_def_cfa_register %rbp be placed as well? If you move these basic blocks around, .cfi_def_cfa_register %rbp is currently not tracked.

For basic-block-sections-cfiinstr_1.ll, have you considered places like CodeGen/X86/cfi-inserter-*? You may even create a subdirectory there.

This question still stands. I haven't seen other CFI tests in`DebugInfo/X86/`

wmi added a comment.Jul 9 2020, 9:28 AM
In D79978#2141768, @wmi wrote:

I haven't looked into the details, but the test suggests that the patch is wrong:

# basic-block-sections-cfi-1.ll
        .section        .text,"ax",@progbits,unique,2
_Z2f3b.2:                               # %if.end
        .cfi_startproc
        .cfi_def_cfa %rbp, 16    # this should be inserted after addq $16, %rsp
        .cfi_offset %rbp, -16    # this should be after .cfi_def_cfa %rbp, 16
        addq    $16, %rsp
        popq    %rbp
        .cfi_def_cfa %rsp, 8
        retq

I think the position where the cfi directives are currently inserted is correct. Those directives at the beginning of BB are not to maintain call frame information for instructions inside of BB like "addq $16, %rsp" and "popq %rbp", but to setup the call frame information correctly at the beginning of BB because the BB could be moved around.

Ack. Then what instructions should be placed at the top of these basic blocks? Should .cfi_def_cfa_register %rbp be placed as well? If you move these basic blocks around, .cfi_def_cfa_register %rbp is currently not tracked.

That is because .cfi_def_cfa %rbp, 16 is identical to the following:
.cfi_def_cfa_register %rbp
.cfi_def_cfa_offset 16

In D79978#2141768, @wmi wrote:

I haven't looked into the details, but the test suggests that the patch is wrong:

# basic-block-sections-cfi-1.ll
        .section        .text,"ax",@progbits,unique,2
_Z2f3b.2:                               # %if.end
        .cfi_startproc
        .cfi_def_cfa %rbp, 16    # this should be inserted after addq $16, %rsp
        .cfi_offset %rbp, -16    # this should be after .cfi_def_cfa %rbp, 16
        addq    $16, %rsp
        popq    %rbp
        .cfi_def_cfa %rsp, 8
        retq

I think the position where the cfi directives are currently inserted is correct. Those directives at the beginning of BB are not to maintain call frame information for instructions inside of BB like "addq $16, %rsp" and "popq %rbp", but to setup the call frame information correctly at the beginning of BB because the BB could be moved around.

Ack. Then what instructions should be placed at the top of these basic blocks? Should .cfi_def_cfa_register %rbp be placed as well? If you move these basic blocks around, .cfi_def_cfa_register %rbp is currently not tracked.

For basic-block-sections-cfiinstr_1.ll, have you considered places like CodeGen/X86/cfi-inserter-*? You may even create a subdirectory there.

This question still stands. I haven't seen other CFI tests in`DebugInfo/X86/`

Sure, I will move it, np.

tmsriram updated this revision to Diff 276825.Jul 9 2020, 1:27 PM

Move tests to CodeGen/X86/cfi*.

Ping, Is this alright now?

MaskRay added a comment.EditedJul 10 2020, 1:08 PM
In D79978#2141992, @wmi wrote:

Ack. Then what instructions should be placed at the top of these basic blocks? Should .cfi_def_cfa_register %rbp be placed as well? If you move these basic blocks around, .cfi_def_cfa_register %rbp is currently not tracked.

That is because .cfi_def_cfa %rbp, 16 is identical to the following:
.cfi_def_cfa_register %rbp
.cfi_def_cfa_offset 16

Honestly I am not a CFI expert but I have read enough bits of LLVM libunwind and am not completely CFI illiterate (I have fixed a very subtle negative cfiDefCfa bug). The description of the patch is still puzzling me.
I think it lacks a summary about what the patch intends to do.

Is the intention: if the entry block stays in the front of the function, while other basic blocks can be randomly shuffled => the CFI states of all basic blocks are the same no matter how non-entry basic blocks are shuffled?

Or

The entry basic block can be shuffled as well?

For either behavior, I am not sure the test covers the situations: I can only find .cfi_def_cfa_register in the entry block, not in others - so I am not confident that .cfi_def_cfa_register information is correctly retained. We need a stronger test.

For convenience of other reviewers, the diff when -basicblock-sections=all is turned on:

-       je      .LBB0_2                                                                                                                               
-# %bb.1:                                # %if.then                                                                                                   
+       je      _Z2f3b.2                                                                                                                              
+       jmp     _Z2f3b.1                                                                                                                              
+       .cfi_endproc                                                                                                                                  
+       .section        .text,"ax",@progbits,unique,1                                                                                                 
+_Z2f3b.1:                               # %if.then                                                                                                   
+       .cfi_startproc                                                                                                                                
+       .cfi_def_cfa %rbp, 16                                                                                                                         
+       .cfi_offset %rbp, -16
        callq   _Z2f1v
-.LBB0_2:                                # %if.end
+       jmp     _Z2f3b.2
+.Ltmp0:
+       .size   _Z2f3b.1, .Ltmp0-_Z2f3b.1
+       .cfi_endproc
+       .section        .text,"ax",@progbits,unique,2
+_Z2f3b.2:                               # %if.end
+       .cfi_startproc
+       .cfi_def_cfa %rbp, 16
+       .cfi_offset %rbp, -16
        addq    $16, %rsp
        popq    %rbp
        .cfi_def_cfa %rsp, 8
        retq

Note also that only rbp is described. I think we need another register to demonstrate the effect.

In D79978#2141992, @wmi wrote:

Ack. Then what instructions should be placed at the top of these basic blocks? Should .cfi_def_cfa_register %rbp be placed as well? If you move these basic blocks around, .cfi_def_cfa_register %rbp is currently not tracked.

That is because .cfi_def_cfa %rbp, 16 is identical to the following:
.cfi_def_cfa_register %rbp
.cfi_def_cfa_offset 16

Honestly I am not a CFI expert but I have read enough bits of LLVM libunwind and am not completely CFI illiterate (I have fixed a very subtle negative cfiDefCfa bug). The description of the patch is still puzzling me.

I am not a CFI expert either and that is not a problem.

I think it lacks a summary about what the patch intends to do.

Ok, I can write a more detailed summary here.

Is the intention: if the entry block stays in the front of the function, while other basic blocks can be randomly shuffled => the CFI states of all basic blocks are the same no matter how non-entry basic blocks are shuffled?

@amharc If a basic block is placed in a unique section then it can potentially be moved away from the original function. CIEs do not allow ranges unlike debug info. So, when you are the PC of this basic block how does the unwinder know where the CFA is? In order to do that, we have to replicate the cfi directives to say how to find this. This test you pointed out shows that we do this with cfi directives.

Please note that the control-flow of the program never changes even though the blocks are shuffled randomly.

Or

The entry basic block can be shuffled as well?

The entry basic block is fine as it has the symbol of the original function and the default CFI generated is correct. Its section is the same as the function's section.

For either behavior, I am not sure the test covers the situations: I can only find .cfi_def_cfa_register in the entry block, not in others - so I am not confident that .cfi_def_cfa_register information is correctly retained. We need a stronger test.

Sure, could you please tell us what we should be testing. We have a test for callee saved register cfi directives being generated too.

For convenience of other reviewers, the diff when -basicblock-sections=all is turned on:

-       je      .LBB0_2                                                                                                                               
-# %bb.1:                                # %if.then                                                                                                   
+       je      _Z2f3b.2                                                                                                                              
+       jmp     _Z2f3b.1                                                                                                                              
+       .cfi_endproc                                                                                                                                  
+       .section        .text,"ax",@progbits,unique,1                                                                                                 
+_Z2f3b.1:                               # %if.then                                                                                                   
+       .cfi_startproc                                                                                                                                
+       .cfi_def_cfa %rbp, 16                                                                                                                         
+       .cfi_offset %rbp, -16
        callq   _Z2f1v
-.LBB0_2:                                # %if.end
+       jmp     _Z2f3b.2
+.Ltmp0:
+       .size   _Z2f3b.1, .Ltmp0-_Z2f3b.1
+       .cfi_endproc
+       .section        .text,"ax",@progbits,unique,2
+_Z2f3b.2:                               # %if.end
+       .cfi_startproc
+       .cfi_def_cfa %rbp, 16
+       .cfi_offset %rbp, -16
        addq    $16, %rsp
        popq    %rbp
        .cfi_def_cfa %rsp, 8
        retq

Note also that only rbp is described. I think we need another register to demonstrate the effect.

The other test does check for callee saved registers. This test is not only to check %rbp but also to make sure cfi_startproc and cfi_endproc are generated as expected.

Ple

In D79978#2141992, @wmi wrote:

Ack. Then what instructions should be placed at the top of these basic blocks? Should .cfi_def_cfa_register %rbp be placed as well? If you move these basic blocks around, .cfi_def_cfa_register %rbp is currently not tracked.

That is because .cfi_def_cfa %rbp, 16 is identical to the following:
.cfi_def_cfa_register %rbp
.cfi_def_cfa_offset 16

Honestly I am not a CFI expert but I have read enough bits of LLVM libunwind and am not completely CFI illiterate (I have fixed a very subtle negative cfiDefCfa bug). The description of the patch is still puzzling me.
I think it lacks a summary about what the patch intends to do.

Is the intention: if the entry block stays in the front of the function, while other basic blocks can be randomly shuffled => the CFI states of all basic blocks are the same no matter how non-entry basic blocks are shuffled?

Or

The entry basic block can be shuffled as well?

We explicitly want to support the case where all BBs can be shuffled around arbitrarily.

For either behavior, I am not sure the test covers the situations: I can only find .cfi_def_cfa_register in the entry block, not in others - so I am not confident that .cfi_def_cfa_register information is correctly retained. We need a stronger test.

It is retained, because we always emit a full .cfi_def_cfa in all non-entry basic blocks - which, as @wmi has mentioned before, is equivalent to a .cfi_def_cfa_register and a .cfi_def_cfa_offset. For non-entry basic blocks, there is no previous state (neither the cfa register nor the cfa offset), so we cannot emit just .cfi_def_cfa_register or just .cfi_def_cfa_offset, because both of them preserve one part of the cfa definition (the offset for .cfi_def_cfa_register or the register for .cfi_def_cfa_offset).

Note also that only rbp is described. I think we need another register to demonstrate the effect.

rbp is the usual frame pointer register for the x86 architecture and I'm not really sure we can easily force the compiler to choose a different register to hold the frame pointer. If you know how to force a different register to be the frame pointer, please let us know - we will add a corresponding test.

Honestly I am not a CFI expert but I have read enough bits of LLVM libunwind and am not completely CFI illiterate (I have fixed a very subtle negative cfiDefCfa bug). The description of the patch is still puzzling me.
I think it lacks a summary about what the patch intends to do.

Here is an attempt and trying to help understand what needs to be addressed to get CFI right with basic block sections:

One of the main goals of CFI directives is to allow the unwinder to retrieve the Canonical Frame Address (CFA) from any PC.

CFA - Canonical Frame Address, is the address of the stack pointer just before a call instruction is executed.

FDE - Frame Descriptor Entries are unique to each section and holds the CFI information in eh_frame section

CIE - Common Information Entries that hold information that is common to multiple FDEs in eh_frame section.

Let’s first cover some basics of CFI which will help us understand what needs to change to support basic block sections. The CFI directives for a simple program without any basic block sections:

void f1(); 
void f3(bool b) {
  if (b) f1();
}

With the frame pointer using -fno-omit-frame-pointer:

	.type	_Z2f3b,@function
_Z2f3b:                                 # @_Z2f3b
	.cfi_startproc
# %bb.0:                                # %entry
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
            …
	popq	%rbp
	.cfi_def_cfa %rsp, 8
	retq
.Lfunc_end0:
	.size	_Z2f3b, .Lfunc_end0-_Z2f3b
  • All CFI directives need to be placed between a cfi_startproc and cfi_endproc directive
  • A section/function is the smallest granularity for CFI directives. All CFI directives within a section can share the same cfi_startproc and cfi_endproc. However, each function must have a unique startproc and endproc directive too even without function sections.
  • Looking through the directives in sequence, “cfi_def_cfa_offset 16” tells us that initially the CFA (Canonical Frame Address) is 16 bytes from where the stack pointer is currently because we pushed the return address and %rbp onto the stack. “cfi_offset %rbp, -16” says the rbp register’s old value is now available 16 bytes from the CFA, because of the push.
  • “.cfi_def_cfa_register %rbp” basically says that the CFA can now be computed off %rbp as we just moved the stack pointer to it, the original offset of 16 from the value of rbp is used. This is efficient as we can change the %rsp now as much as we want and we don’t have to worry about generating directives to update CFA.
  • So, just before returning from the function we pop %rbp and correctly update that CFI needs to be computed using the %rsp.
  • CFI directives are emitted in FDE’s in the object file with a low_pc, high_pc specification. So, a single FDE must point to a contiguous code region unlike debug info which has the support for ranges. This is what complicates CFI for basic block sections.

Now, what happens when we start placing individual basic blocks in unique sections:

  • Basic block sections allow the linker to randomly reorder basic blocks in the address space such that a given basic block can become non-contiguous with the original function.
  • The different basic block sections can no longer share the cfi_startproc and cfi_endproc directives. So, each basic block section should emit this independently.
  • Each (cfi_startproc, cfi_endproc) directive will result in a new FDE that caters to that basic block section.
  • Now, this basic block section needs to duplicate the information from the entry block to compute the CFA as it is an independent entity. It cannot refer to the FDE of the original function and hence must duplicate all the stuff that is needed to compute the CFA on its own.
  • We are working on a de-duplication patch that can share common information in FDEs in a CIE (Common Information Entry) and we will present this as a follow up patch. This can significantly reduce the duplication overhead and is particularly useful when several basic block sections are created.
  • The CFI directives are emitted similarly for registers that are pushed onto the stack, like callee saved registers in the prologue. There are cfi directives that emit how to retrieve the value of the register at that point when the push happened. This has to be duplicated too in a basic block that is floated as a separate section.
tmsriram updated this revision to Diff 277216.Jul 10 2020, 9:07 PM

Enhance test cfi-basic-block-sections-1.ll to also check for frame-pointer=none.

There is exactly 2 ways in which CFA can be computed:

  1. With frame pointer via %ebp
  2. Without, where %esp offset is used when frame pointer is omitted

The first test now explicitly checks both to clear any ambiguity here.

dblaikie added inline comments.Jul 10 2020, 11:22 PM
llvm/test/CodeGen/X86/cfi-inserter-basic-block-sections-callee-save-registers.ll
22–37

Thanks for this - totally makes sense to me now!
(Totally optional: I'd probably just write this with one parameter? And/or perhaps with constants for the first f1 call? - I've needed to do things like this when testing the DWARF OP_entry_values:

void f1(int);
extern bool b;
void f3(int i) {
  if (b) { // adds a basic block
    f1(1); // clobbers 'i', causing it to be spilled
    f1(i); // keeps 'i' alive
  }
}

Not to nitpick - just trying to help ensure tests are very specifically targeted - at least makes them easier for me to read, not sure about other folks? (if there's something interesting about testing more than one of these - some comments describing that'd be handy too)

What you've got is fine though, if you prefer it/find it valuable

Thanks for the general write-up of CFI and this patch - helped me understand a bit of what this patch in general, and this test in particular are all about!)

MaskRay added a comment.EditedJul 11 2020, 12:39 PM

...
Now, what happens when we start placing individual basic blocks in unique sections:

  • Basic block sections allow the linker to randomly reorder basic blocks in the address space such that a given basic block can become non-contiguous with the original function.
  • The different basic block sections can no longer share the cfi_startproc and cfi_endproc directives. So, each basic block section should emit this independently.
  • Each (cfi_startproc, cfi_endproc) directive will result in a new FDE that caters to that basic block section.
  • Now, this basic block section needs to duplicate the information from the entry block to compute the CFA as it is an independent entity. It cannot refer to the FDE of the original function and hence must duplicate all the stuff that is needed to compute the CFA on its own.

...

Thanks for the detailed write-up! I I've learned a bit from it. I think at least the quoted items should be placed in the description.

Note also that only rbp is described. I think we need another register to demonstrate the effect.

rbp is the usual frame pointer register for the x86 architecture and I'm not really sure we can easily force the compiler to choose a different register to hold the frame pointer. If you know how to force a different register to be the frame pointer, please let us know - we will add a corresponding test.

I cannot find a test with .cfa_offset or describing a non-rbp register, so the implementation is under-tested. How about leveraging inline asm to clobber callee-saved registers?

int f3(int i, int j, int k) {
  if (i == 0) { // adds a basic block
    asm("nop" : : : "rdi","rsi","rdx","rbp","r12","r13","r14","r15","memory"); // there is a .cfi_offset for each of rbp,r12,r13,r14,r15
    return j;
  }
  if (j == 0) { // adds a basic block
    asm("xchg %%ax,%%ax" : : : "rdi","rsi","rdx","rbp","r14","r15","memory");  // r12 and r13 are not clobbered but the current implementation adds .cfi_offset for both r12 and r13
    return k;
  }
  return i;
}

I get a lot of .cfi_offset with clang -S -emit-llvm -O1 a.c; llc -O0 -basicblock-sections=all < a.ll:

...
        .section        .text,"ax",@progbits,unique,1
f3.1:                                   # %if.then
        .cfi_startproc
        .cfi_def_cfa %rsp, 48
        .cfi_offset %r12, -48
        .cfi_offset %r13, -40
        .cfi_offset %r14, -32
        .cfi_offset %r15, -24
        .cfi_offset %rbp, -16
        #APP
        nop
        #NO_APP
        movl    -8(%rsp), %eax                  # 4-byte Reload
        movl    %eax, -16(%rsp)                 # 4-byte Spill
...
llvm/lib/Target/X86/X86FrameLowering.cpp
484

If the function is only used by basic block sections, please mention the fact.

I cannot find a test with .cfa_offset or describing a non-rbp register, so the implementation is under-tested. How about leveraging inline asm to clobber callee-saved registers?

I'm not exactly sure which CFI instruction this comment refers to (there is no .cfa_offset directive, see e.g. https://sourceware.org/binutils/docs/as/CFI-directives.html).

If to .cfi_def_cfa_offset - the question was answered before: we emit a full CFA definition (.cfi_def_cfa - setting both the offset and the register) which is equivalent to a .cfi_def_cfa_offset (only setting the offset) plus a .cfi_def_cfa_register (only setting the register). Moreover, for code following the SysV ABI the register used in the .cfi_def_cfa_register/.cfi_def_cfa instructions is conventionally always set to either rbp (if the frame pointer is kept explicitly) or in rsp (when it's omitted). Both cases are covered by the existing tests. We are not aware of any easy way to force a different register to be the frame pointer.

If the review comment above refers to .cfi_offset instead, please note that the cfi-basic-block-sections-1.ll test checks that .cfi_offset is emitted properly in the generated assembly and cfi-inserter-basic-block-sections-callee-save-registers.ll checks that the CFI Inserter Pass actually inserts the corresponding CFI_INSTRUCTION offset instructions for clobbered callee-saved registers. I'm not exactly sure what would be gained by using inline assembly instead.

I cannot find a test with .cfa_offset or describing a non-rbp register, so the implementation is under-tested. How about leveraging inline asm to clobber callee-saved registers?

I'm not exactly sure which CFI instruction this comment refers to (there is no .cfa_offset directive, see e.g. https://sourceware.org/binutils/docs/as/CFI-directives.html).

If to .cfi_def_cfa_offset - the question was answered before: we emit a full CFA definition (.cfi_def_cfa - setting both the offset and the register) which is equivalent to a .cfi_def_cfa_offset (only setting the offset) plus a .cfi_def_cfa_register (only setting the register). Moreover, for code following the SysV ABI the register used in the .cfi_def_cfa_register/.cfi_def_cfa instructions is conventionally always set to either rbp (if the frame pointer is kept explicitly) or in rsp (when it's omitted). Both cases are covered by the existing tests. We are not aware of any easy way to force a different register to be the frame pointer.

If the review comment above refers to .cfi_offset instead, please note that the cfi-basic-block-sections-1.ll test checks that .cfi_offset is emitted properly in the generated assembly and cfi-inserter-basic-block-sections-callee-save-registers.ll checks that the CFI Inserter Pass actually inserts the corresponding CFI_INSTRUCTION offset instructions for clobbered callee-saved registers. I'm not exactly sure what would be gained by using inline assembly instead.

I ran llc for both tests. Neither has a section with CFI describing a non-rbp register at the top, thus I say I am not sure the implementation is properly tested. My proposed inline assembly makes sure more than one registers (not just the current CFA register) are recorded. Please inspect the output.

...
Now, what happens when we start placing individual basic blocks in unique sections:

  • Basic block sections allow the linker to randomly reorder basic blocks in the address space such that a given basic block can become non-contiguous with the original function.
  • The different basic block sections can no longer share the cfi_startproc and cfi_endproc directives. So, each basic block section should emit this independently.
  • Each (cfi_startproc, cfi_endproc) directive will result in a new FDE that caters to that basic block section.
  • Now, this basic block section needs to duplicate the information from the entry block to compute the CFA as it is an independent entity. It cannot refer to the FDE of the original function and hence must duplicate all the stuff that is needed to compute the CFA on its own.

...

Thanks for the detailed write-up! I I've learned a bit from it. I think at least the quoted items should be placed in the description.

Note also that only rbp is described. I think we need another register to demonstrate the effect.

rbp is the usual frame pointer register for the x86 architecture and I'm not really sure we can easily force the compiler to choose a different register to hold the frame pointer. If you know how to force a different register to be the frame pointer, please let us know - we will add a corresponding test.

I cannot find a test with .cfa_offset or describing a non-rbp register, so the implementation is under-tested. How about leveraging inline asm to clobber callee-saved registers?

Please feel free to double-check with any body familiar with X86 , X86 has exactly one register for the frame pointer and that is %rbp, there is *no other register*. The only other option is to omit the frame pointer and *we have checked both now explicitly*.

We are already checking for callee-saved registers. That is the whole point of test 2 , there is no need to force inline assembly. @dblaikie could you please help explain here? If you want us to check the final assembly too we can, but I don't see why you are asking for inline assembly when a program could do this.

int f3(int i, int j, int k) {
  if (i == 0) { // adds a basic block
    asm("nop" : : : "rdi","rsi","rdx","rbp","r12","r13","r14","r15","memory"); // there is a .cfi_offset for each of rbp,r12,r13,r14,r15
    return j;
  }
  if (j == 0) { // adds a basic block
    asm("xchg %%ax,%%ax" : : : "rdi","rsi","rdx","rbp","r14","r15","memory");  // r12 and r13 are not clobbered but the current implementation adds .cfi_offset for both r12 and r13
    return k;
  }
  return i;
}

I get a lot of .cfi_offset with clang -S -emit-llvm -O1 a.c; llc -O0 -basicblock-sections=all < a.ll:

...
        .section        .text,"ax",@progbits,unique,1
f3.1:                                   # %if.then
        .cfi_startproc
        .cfi_def_cfa %rsp, 48
        .cfi_offset %r12, -48
        .cfi_offset %r13, -40
        .cfi_offset %r14, -32
        .cfi_offset %r15, -24
        .cfi_offset %rbp, -16

It is the same thing, it is exactly what test2 does. Do you want use to test the assembly too? We can do it without inline assembly, I personally hate unnecessary uses of inline assembly.

#APP
nop
#NO_APP
movl    -8(%rsp), %eax                  # 4-byte Reload
movl    %eax, -16(%rsp)                 # 4-byte Spill

...

I cannot find a test with .cfa_offset or describing a non-rbp register, so the implementation is under-tested. How about leveraging inline asm to clobber callee-saved registers?

I'm not exactly sure which CFI instruction this comment refers to (there is no .cfa_offset directive, see e.g. https://sourceware.org/binutils/docs/as/CFI-directives.html).

If to .cfi_def_cfa_offset - the question was answered before: we emit a full CFA definition (.cfi_def_cfa - setting both the offset and the register) which is equivalent to a .cfi_def_cfa_offset (only setting the offset) plus a .cfi_def_cfa_register (only setting the register). Moreover, for code following the SysV ABI the register used in the .cfi_def_cfa_register/.cfi_def_cfa instructions is conventionally always set to either rbp (if the frame pointer is kept explicitly) or in rsp (when it's omitted). Both cases are covered by the existing tests. We are not aware of any easy way to force a different register to be the frame pointer.

If the review comment above refers to .cfi_offset instead, please note that the cfi-basic-block-sections-1.ll test checks that .cfi_offset is emitted properly in the generated assembly and cfi-inserter-basic-block-sections-callee-save-registers.ll checks that the CFI Inserter Pass actually inserts the corresponding CFI_INSTRUCTION offset instructions for clobbered callee-saved registers. I'm not exactly sure what would be gained by using inline assembly instead.

I ran llc for both tests. Neither has a section with CFI describing a non-rbp register at the top, thus I say I am not sure the implementation is properly tested.

What is the problem here, I dont follow. Is this with or without frame pointer? If you ran with O2 frame pointer is omitted so you must explicitly say --frame-pointer=all.

Let me try a slightly different approach here. It is not clear to us what more is needed to land the patch. In the interests of resolving conflict :

  1. I will also explicitly test assembly too for callee saved registers with bb sections when they are being pushed and popped.
  2. Tracking the CFA with and without the frame pointer is explicitly tested, fine?
  3. I will fix all the minor nits mentioned.

Is there anything else? Thanks.

My apologies that I did not try cfi-inserter-basic-block-sections-callee-save-registers.ll
The generated assembly does have .cfi_offset
(I cannot apply the patch with arc patch (https://reviews.llvm.org/D79978?id=277216#2138208 )
so I used curl | patch and somehow ignored the locally modified .ll)

However, I do think inline asm with clobbered register list () has some advantage: the .cfi_offset registers
can be precisely controlled.

call void asm sideeffect "nop", "~{rbp},~{r12},~{r13},~{r14},~{r15}"()

The CHECK lines can thus be strengthened:

; CFI_INSTR:      CFI_INSTRUCTION def_cfa_offset 48
; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $r12, -48
; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $r13, -40
; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $r14, -32
; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $r15, -24
; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $rbp, -16

While the current approach cannot control the used registers.


There is another unaddressed comment.

Adding part of your write-up to the description will be helpful
https://reviews.llvm.org/D79978?id=277216#2145979

llvm/test/CodeGen/X86/cfi-basic-block-sections-1.ll
3

While --eh-frame is an alias for --debug-frame, I think using --eh-frame here is more appropriate. This tests .eh_frame, not .debug_frame.

My apologies that I did not try cfi-inserter-basic-block-sections-callee-save-registers.ll
The generated assembly does have .cfi_offset
(I cannot apply the patch with arc patch (https://reviews.llvm.org/D79978?id=277216#2138208 )
so I used curl | patch and somehow ignored the locally modified .ll)

However, I do think inline asm with clobbered register list () has some advantage: the .cfi_offset registers
can be precisely controlled.

call void asm sideeffect "nop", "~{rbp},~{r12},~{r13},~{r14},~{r15}"()

The CHECK lines can thus be strengthened:

; CFI_INSTR:      CFI_INSTRUCTION def_cfa_offset 48
; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $r12, -48
; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $r13, -40
; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $r14, -32
; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $r15, -24
; CFI_INSTR-NEXT: CFI_INSTRUCTION offset $rbp, -16

While the current approach cannot control the used registers.

Yeah, I have mixed feelings about this - not sure it's necessary to test all of these, or test them as strongly as you've suggested (overly constraining tests can make them brittle - prone to breakage for other reasons (eg: it might still be worth leaving off the specific stack offset, so that if stack layout changes these tests don't fail - other, pre-existing, tests specifically for stack layout should catch that)). And usually I'd advocate for not using inline assembly (as in the other test). But in this case, given how indirect the code is when trying to clobber the callee saved register - yeah, I think inline asm might've been more clear to me in that test. Though I think my non-asm suggestion (https://reviews.llvm.org/D79978#2145753) did manage to reduce the test down to exactly one register in a fairly understandable way... hmm, could be a bit simpler now that I understand better what's going on:

void clobber();
void sink(int);
void test(bool b, int i) {
  if (b) { // adds a basic block
    clobber(); // encourage 'i' to be in a callee-save register by clobbering caller-save registers
    sink(i); // keeps 'i' alive
  }
}

So that tests one callee save register, assuming LLVM doesn't decide to (gratuitously inefficiently - well, I guess it might not be that much less efficient) put 'i' on the stack or somewhere else instead.

Whereas @MaskRay's suggestion of using explicit register use/clobber does seem a bit more explicit/intentional, rather than having to coax a certain representation out of the compiler backend, I guess something like:

void f3(bool b) { 
  if (b) // adds a basic block
     // clobber some example callee-save registers to force them to be callee-saved and to be described by cfi_offset directives
    asm("nop" : : : "r12","r13","r14","r15");
}

(at least I'm not sure much else is required - @MaskRay - was the second inline asm needed? or the return values/parameters? Also perhaps just testing one register would be sufficient? (less ordering issues, chance for unrelated backend changes to disturb this test))

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
3087–3091

probably drop the braces here too (as above)

llvm/test/CodeGen/X86/cfi-basic-block-sections-1.ll
3

Agreed - the check on line 51 should be updated too. llvm-dwarfdump's output is actually rendering an empty debug_frame and then after that it's rendering the eh_frame - the test is currently a bit misleading down there too. (& maybe llvm-dwarfdump's output is prone to such misleading testing, unfortunately - it prints the name of every section explicitly requested (& considers eh_frame and debug_frame explicitly requested when using --eh-frame or --debug-frame) even if they're empty, so you can easily get this sort of flow-on effect)

Let me try a slightly different approach here. It is not clear to us what more is needed to land the patch. In the interests of resolving conflict :

  1. I will also explicitly test assembly too for callee saved registers with bb sections when they are being pushed and popped.

Just on this point - it's not clear to me the motivation for the difference in testing between the two tests (one testing assembly, the other testing LLVM's MIR) - was there some particular difference/detail that these different strategies enabled testing? Otherwise, yeah, I'd probably just go with only testing assembly in both cases, for consistency/simplicity? (any variation in tests always makes me wonder "what's the reason for this difference? Must be something important or they'd be the same and so I'm not understanding some important difference")

Whereas @MaskRay's suggestion of using explicit register use/clobber does seem a bit more explicit/intentional, rather than having to coax a certain representation out of the compiler backend, I guess something like:

void f3(bool b) { 
  if (b) // adds a basic block
     // clobber some example callee-save registers to force them to be callee-saved and to be described by cfi_offset directives
    asm("nop" : : : "r12","r13","r14","r15");
}

(at least I'm not sure much else is required - @MaskRay - was the second inline asm needed? or the return values/parameters? Also perhaps just testing one register would be sufficient? (less ordering issues, chance for unrelated backend changes to disturb this test))

The second inline asm in the example (in a previous comment of mine; pasted below for your convenience) is not needed. When I wrote it, I was thinking whether we can exercise more code in CFIInstrInserter, i.e. not just that (all non-entry basic blocks can inherit CFI from the entry), but also that: if a non-entry basic block has updated its CFI information, subsequent basic blocks can pick up that delta part (relative to the entry basic block)

int f3(int i, int j, int k) {
  if (i == 0) { // adds a basic block
    asm("nop" : : : "rdi","rsi","rdx","rbp","r12","r13","r14","r15","memory"); // there is a .cfi_offset for each of rbp,r12,r13,r14,r15
    return j;
  }
  if (j == 0) { // adds a basic block
    // Ideally there is some code construct here which can reliably alter CFI, so that we can test that CFIInstrInserter can handle the delta part.
    // Unfortunately this cannot be inline asm `subq 100, %rsp` as that does not generate CFI_INSTRUCTION which can be tracked by CFIInstrInserter
    asm("xchg %%ax,%%ax" : : : "rdi","rsi","rdx","rbp","r14","r15","memory");  // r12 and r13 are not clobbered but the current implementation adds .cfi_offset for both r12 and r13
    return k;
  }
  return i;
}

If we can find such a code construct, it'd give me more confidence that we are updating CFIInstrInserter correctly & it'd be more difficult to break the basic block sections code.

And yes that I used registers because I hope the order of these CFI_INSTRUCTION registers is relatively stable so CHECK-NEXT: CFI_INSTRUCTION ..... can be a stronger test. As to offsets, I don't know codegen enough to confidently say that it is stable, but I hope simple code like this (with very specific spill slots) will not cause offsets to be updated abruptly.

tmsriram updated this revision to Diff 277306.Jul 12 2020, 4:00 PM
tmsriram marked 8 inline comments as done.
tmsriram edited the summary of this revision. (Show Details)
  • Change the callee saved register test to use a simple clobber on all callee saved registers and test cfi_offset for all the callee-save registers including rbp.
  • The tests now cover the 2 issues that need special handling with sections: CFA tracking with and without frame pointer and cfi directives for callee saved registers.
  • Other reviewer mentioned changes.

Let me try a slightly different approach here. It is not clear to us what more is needed to land the patch. In the interests of resolving conflict :

  1. I will also explicitly test assembly too for callee saved registers with bb sections when they are being pushed and popped.

Just on this point - it's not clear to me the motivation for the difference in testing between the two tests (one testing assembly, the other testing LLVM's MIR) - was there some particular difference/detail that these different strategies enabled testing? Otherwise, yeah, I'd probably just go with only testing assembly in both cases, for consistency/simplicity? (any variation in tests always makes me wonder "what's the reason for this difference? Must be something important or they'd be the same and so I'm not understanding some important difference")

You are right. For reasons that are long forgotten since this was written a while go, one of us thought it would be good to directly check CFI IR instructions too. Like you note, we could have just tested assembly, changed now.

llvm/test/CodeGen/X86/cfi-basic-block-sections-1.ll
3

Fixed this, stems from my mis-understanding of the various tools output.

llvm/test/CodeGen/X86/cfi-inserter-basic-block-sections-callee-save-registers.ll
22–37

Marking it done since you wanted to go with your other suggestion with inline asm.

Ping, is this alright?

Last few nits

llvm/test/CodeGen/X86/cfi-basic-block-sections-1.ll
2

x86_64-unknown-linux-gnu -> x86_64

This tests generic ELF behavior, not specific to linux-gnu.

50

Comments tend to be full sentences (with period).

You may consider ;; for tests. IIRC update_llc_test_checks.py can sometimes drop comments. We can teach update_llc_test_checks.py to retain comments with ;;

50

Typo: You may consider ;; for comments (not RUN or CHECK lines)

llvm/test/CodeGen/X86/cfi-inserter-basic-block-sections-callee-save-registers.ll
8

You can capture the register name with [[RA:%r.+]]

and use [[RA]] below to reference the captured name.

tmsriram updated this revision to Diff 277642.Jul 13 2020, 7:43 PM
tmsriram marked 3 inline comments as done.

Address reviewer comments.

tmsriram marked an inline comment as done.Jul 13 2020, 8:15 PM
MaskRay accepted this revision.Jul 13 2020, 9:56 PM

Looks great! Thanks for bearing with my nits... In the end I just want our tests to be sufficiently strong and reliable.

llvm/test/CodeGen/X86/cfi-inserter-basic-block-sections-callee-save-registers.ll
4

excess spaces on this line

This revision is now accepted and ready to land.Jul 13 2020, 9:56 PM
dblaikie accepted this revision.Jul 14 2020, 10:15 AM

Looks good - thanks for your patience!

This revision was automatically updated to reflect the committed changes.
tmsriram marked an inline comment as done.