This is an archive of the discontinued LLVM Phabricator instance.

[X86] Use the CFA when appropriate for better variable locations around calls.
ClosedPublic

Authored by khuey on Feb 6 2023, 10:18 PM.

Details

Summary

Without frame pointers, the locations of variables on the stack are emitted
relative to the stack pointer (via the stack pointer being the value of
DW_AT_frame_base on the subprogram). If a call modifies the stack pointer
this results in the locations being wrong and the debugger displaying the
wrong values for variables.

By using DW_OP_call_frame_cfa in these situations the emitted location for
the variable will automatically handle changes in the stack pointer
(provided LLVM is emitting the correct CFI directives elsewhere, of course).
The CFA needs to be adjusted for the size of the stack frame (including the
return address) to allow the variable locations themselves to remain
unchanged by this patch.

Certain LLDB features cannot cope with DW_OP_call_frame_cfa, so this change
is heuristically limited to the cases where it's necessary for correctness
to minimize the fallout there.

Diff Detail

Event Timeline

khuey created this revision.Feb 6 2023, 10:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2023, 10:18 PM
khuey requested review of this revision.Feb 6 2023, 10:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2023, 10:18 PM
khuey added a comment.Feb 21 2023, 7:03 AM

Poke?

Pretty sure those test failures aren't me.

khuey added a comment.Apr 10 2023, 8:55 PM

Poke again? This has been sitting for two months now.

Thank you for the patch! Sorry for the long delay in review

As a bit of a side-note, I think X86 always maintains a "precise CFA" to refer to, but that may not always be true (see https://reviews.llvm.org/D14948).

I'm curious about a couple points:

  • If we do always have the CFA available, then is there any reason not to use it unconditionally?
  • If we cannot use it unconditionally, can the condition be more direct, i.e. can we remove the heuristic?
  • Is the "frame base" value ABI-stable? If not, is there a strong reason not to rework the variable expressions to instead be CFA-relative, and then just define the "frame base" to always be equal to the CFA? It would avoid the extra offset needed in the current version of the patch.
llvm/lib/CodeGen/CFIInstrInserter.cpp
153

Fairly harmless (and NFC in this case) but I don't believe Register is intended to be constructed with a dwarf register ordinal

llvm/lib/Target/X86/X86FrameLowering.cpp
3808

wrt. the type confusion here, it seems to be a bug introduced during a bulk update which replaced many instances of unsigned with Register (2481f26ac3f22)

3817–3818

Can the new X86MachineFunctionInfo field be eliminated, and this instead check the same condition the rest of the X86 code checks before creating a OpAdjustCfaOffset? It seems to consistently just be that !X86FrameLowering::hasFP(MF).

3817–3824
llvm/lib/Target/X86/X86MCInstLower.cpp
2122

I think the fact that we need this extra call is evidence that the heuristic-based approach and a field in X86MachineFunctionInfo is not ideal

khuey added a comment.Apr 11 2023, 9:12 PM

As a bit of a side-note, I think X86 always maintains a "precise CFA" to refer to, but that may not always be true (see https://reviews.llvm.org/D14948).

This is relying on the CFA being correct when it's present. CFI is not always present (see X86FrameLowering::needsDwarfCFI which at a minimum excludes some Win64 stuff) but it might be present in all cases that matter. I'm not sure.

I'm curious about a couple points:

  • If we do always have the CFA available, then is there any reason not to use it unconditionally?

In decreasing order of persuasiveness to me

  1. If the frame pointer is present (or the stack pointer remains unchanged) using it is simpler for the debug info consumer (as there's no need to consult the CFI and run through its state machine). In theory there might also be old tools out there that don't support CFI as well.
  2. If the frame pointer is present (or the stack pointer remains unchanged) using it works today. Don't fix what isn't broke.
  3. If there are any bugs out there that result in an incorrect CFA the machine registers are more likely to be correct as they're actually used by the program and thus issues are more likely to be noticed.

Although it seems the gcc maintainers were not persuaded by any of this. gcc emits DW_OP_call_frame_cfa in every circumstance I can think of (at least on x86-64).

  • If we cannot use it unconditionally, can the condition be more direct, i.e. can we remove the heuristic?

I'm not aware of any reason we cannot use the CFA unconditionally (at least on x86) if it is present. I believe it is possible to remove the setHasCFIAdjustCfa heuristic and use the CFA in place of the stack pointer unconditionally if the above arguments are not persuasive. I believe it's also possible to use the CFA in place of the frame pointer even when it's present, if desired.

  • Is the "frame base" value ABI-stable?

In the debug info? I don't see why it would be. As mentioned above gcc emits DW_AT_frame_base = DW_OP_call_frame_cfa.

If not, is there a strong reason not to rework the variable expressions to instead be CFA-relative, and then just define the "frame base" to always be equal to the CFA? It would avoid the extra offset needed in the current version of the patch.

The reason I didn't do this originally was that I expected it to require updating a massive number of tests, but it turns out git grep DW_OP_fbreg | wc -l has fewer than 100 hits in LLVM so it probably wouldn't be that bad to update the tests. Looking into it a bit more, it would require changes in DwarfExpression::addMachineRegExpression which is 1) already pretty complex and 2) shared across architectures. It's not entirely clear to me what's going on with CFI on other arches, CFIInstrInserter is definitely x86 specific but other arches seem to have at least some CFI stuff in the target specific frame lowering code. Untangling all this seems like a significantly bigger task than merely switching to using the CFA + offset.

khuey added inline comments.Apr 11 2023, 9:17 PM
llvm/lib/Target/X86/X86FrameLowering.cpp
3817–3818

The idea here was that if the frame register is the stack pointer *and* a CFA adjustment was emitted we'd switch to the CFA. This proposal would get rid of the latter, which I discussed in more detail in the top-level comment.

As a bit of a side-note, I think X86 always maintains a "precise CFA" to refer to, but that may not always be true (see https://reviews.llvm.org/D14948).

This is relying on the CFA being correct when it's present. CFI is not always present (see X86FrameLowering::needsDwarfCFI which at a minimum excludes some Win64 stuff) but it might be present in all cases that matter. I'm not sure.

I'm curious about a couple points:

  • If we do always have the CFA available, then is there any reason not to use it unconditionally?

In decreasing order of persuasiveness to me

  1. If the frame pointer is present (or the stack pointer remains unchanged) using it is simpler for the debug info consumer (as there's no need to consult the CFI and run through its state machine). In theory there might also be old tools out there that don't support CFI as well.

Very reasonable, but if GCC is already using the CFA unconditionally it seems likely that implies a pretty good support base?

  1. If the frame pointer is present (or the stack pointer remains unchanged) using it works today. Don't fix what isn't broke.

I definitely think this holds a lot of weight, but just having one concept (CFA) instead of several (stack-pointer, frame-pointer, frame-base, CFA) is also very attractive. The fact that the patch is even necessary also demonstrates something is broken, although it might be the exception that proves the rule.

  1. If there are any bugs out there that result in an incorrect CFA the machine registers are more likely to be correct as they're actually used by the program and thus issues are more likely to be noticed.

If the CFA is incorrect, unwinding seems like it is doomed to fail, which should constitute a program execution bug if e.g. exceptions are enabled, right? I don't necessarily think we will end up with more elusive debug-info bugs as a result of relying on the CFA.

Although it seems the gcc maintainers were not persuaded by any of this. gcc emits DW_OP_call_frame_cfa in every circumstance I can think of (at least on x86-64).

  • If we cannot use it unconditionally, can the condition be more direct, i.e. can we remove the heuristic?

I'm not aware of any reason we cannot use the CFA unconditionally (at least on x86) if it is present. I believe it is possible to remove the setHasCFIAdjustCfa heuristic and use the CFA in place of the stack pointer unconditionally if the above arguments are not persuasive. I believe it's also possible to use the CFA in place of the frame pointer even when it's present, if desired.

I meant to propose another middle-ground, where we use the CFA whenever !hasFP, which (IIUC) is a precondition for x86 ever adjusting the CFA. It would mean we use the CFA more than with the heuristic, but still not unconditionally.

I think I am mostly concerned with the statefulness of the heuristic, coupled with the fact that it seems to not be completely contained. The majority of it is handled by X86FrameLowering, but there is at least the one instance where it has to be explicitly managed in X86MCInstLower. If others are not as concerned with this I'm happy to concede on it, but it is the main concern I have.

  • Is the "frame base" value ABI-stable?

In the debug info? I don't see why it would be. As mentioned above gcc emits DW_AT_frame_base = DW_OP_call_frame_cfa.

If not, is there a strong reason not to rework the variable expressions to instead be CFA-relative, and then just define the "frame base" to always be equal to the CFA? It would avoid the extra offset needed in the current version of the patch.

The reason I didn't do this originally was that I expected it to require updating a massive number of tests, but it turns out git grep DW_OP_fbreg | wc -l has fewer than 100 hits in LLVM so it probably wouldn't be that bad to update the tests. Looking into it a bit more, it would require changes in DwarfExpression::addMachineRegExpression which is 1) already pretty complex and 2) shared across architectures. It's not entirely clear to me what's going on with CFI on other arches, CFIInstrInserter is definitely x86 specific but other arches seem to have at least some CFI stuff in the target specific frame lowering code. Untangling all this seems like a significantly bigger task than merely switching to using the CFA + offset.

I definitely understand the complexity in DwarfExpression::addMachineRegExpression makes any change very difficult. I am happy with continuing to apply the offset to the CFA in the frame base, and if some other brave soul wants to change that in a future patch they won't be any worse off.

My current leaning is to change to using the CFA unconditionally, but offset to maintain the same frame base value. It will result in a slightly larger expression in the default case, but I would be much more confident that the result is always correct.

I would appreciate any other opinions from @aprantl @dblaikie @probinson et al.

khuey added a comment.Apr 12 2023, 4:26 PM

Very reasonable, but if GCC is already using the CFA unconditionally it seems likely that implies a pretty good support base?

Indeed.

I meant to propose another middle-ground, where we use the CFA whenever !hasFP, which (IIUC) is a precondition for x86 ever adjusting the CFA. It would mean we use the CFA more than with the heuristic, but still not unconditionally.

!hasFP is a precondition for x86 emitting CFA offset adjustment directives solely because if there is a frame pointer LLVM simply defines the CFA to be the frame pointer. And then the frame pointer remains the same regardless of adjustments to %rsp, so the CFA doesn't need to be adjusted simultaneously. See the createDefCfaRegister call inside X86FrameLowering::emitPrologue (which is gated on, at the top level, HasFP).

I think I am mostly concerned with the statefulness of the heuristic, coupled with the fact that it seems to not be completely contained. The majority of it is handled by X86FrameLowering, but there is at least the one instance where it has to be explicitly managed in X86MCInstLower. If others are not as concerned with this I'm happy to concede on it, but it is the main concern I have.

The not-being-contained is just because X86::MOVPC32r is really weird.

I definitely understand the complexity in DwarfExpression::addMachineRegExpression makes any change very difficult. I am happy with continuing to apply the offset to the CFA in the frame base, and if some other brave soul wants to change that in a future patch they won't be any worse off.

Yeah I don't really want to do that here but I agree that folding the offset between the CFA and the stack pointer into the variable expressions rather than leaving it in the DW_AT_frame_base expression is the ideal end point.

My current leaning is to change to using the CFA unconditionally, but offset to maintain the same frame base value. It will result in a slightly larger expression in the default case, but I would be much more confident that the result is always correct.

So, to be clear, you're learning towards using a CFA-based DW_AT_frame_base in all three of these scenarios?

  1. There is a CFA, and LLVM is currently using %rsp as the DW_AT_frame_base, and %rsp changes throughout the function.
  2. There is a CFA, and LLVM is currently using %rsp as the DW_AT_frame_base, and %rsp remains constant (after the prologue).
  3. There is a CFA, and LLVM is currently using %rbp as the DW_AT_frame_base.

Where using the CFA for 1 is what this patch as originally proposed does, and 1 + 2 is the "middle ground" you suggested previously, as I understand it.

I think using the CFA for 1 + 2 + 3 makes more sense than doing just 1 + 2, so I think we more or less agree here.

Very reasonable, but if GCC is already using the CFA unconditionally it seems likely that implies a pretty good support base?

Indeed.

I meant to propose another middle-ground, where we use the CFA whenever !hasFP, which (IIUC) is a precondition for x86 ever adjusting the CFA. It would mean we use the CFA more than with the heuristic, but still not unconditionally.

!hasFP is a precondition for x86 emitting CFA offset adjustment directives solely because if there is a frame pointer LLVM simply defines the CFA to be the frame pointer. And then the frame pointer remains the same regardless of adjustments to %rsp, so the CFA doesn't need to be adjusted simultaneously. See the createDefCfaRegister call inside X86FrameLowering::emitPrologue (which is gated on, at the top level, HasFP).

I think I am mostly concerned with the statefulness of the heuristic, coupled with the fact that it seems to not be completely contained. The majority of it is handled by X86FrameLowering, but there is at least the one instance where it has to be explicitly managed in X86MCInstLower. If others are not as concerned with this I'm happy to concede on it, but it is the main concern I have.

The not-being-contained is just because X86::MOVPC32r is really weird.

OK, I may have just been overly concerned about this; my familiarity with the X86 target in LLVM is very limited.

I definitely understand the complexity in DwarfExpression::addMachineRegExpression makes any change very difficult. I am happy with continuing to apply the offset to the CFA in the frame base, and if some other brave soul wants to change that in a future patch they won't be any worse off.

Yeah I don't really want to do that here but I agree that folding the offset between the CFA and the stack pointer into the variable expressions rather than leaving it in the DW_AT_frame_base expression is the ideal end point.

My current leaning is to change to using the CFA unconditionally, but offset to maintain the same frame base value. It will result in a slightly larger expression in the default case, but I would be much more confident that the result is always correct.

So, to be clear, you're learning towards using a CFA-based DW_AT_frame_base in all three of these scenarios?

  1. There is a CFA, and LLVM is currently using %rsp as the DW_AT_frame_base, and %rsp changes throughout the function.
  2. There is a CFA, and LLVM is currently using %rsp as the DW_AT_frame_base, and %rsp remains constant (after the prologue).
  3. There is a CFA, and LLVM is currently using %rbp as the DW_AT_frame_base.

Where using the CFA for 1 is what this patch as originally proposed does, and 1 + 2 is the "middle ground" you suggested previously, as I understand it.

I think using the CFA for 1 + 2 + 3 makes more sense than doing just 1 + 2, so I think we more or less agree here.

I do think my leaning is towards using the CFA for 1 + 2 + 3, but I also appreciate that your patch as-is addresses a real bug that affects debug-info correctness. I am OK with accepting your patch with a FIXME or TODO describing the desire to eventually define the frame base as equal to the CFA.

I would just ask that you give others at least another day to respond. I recognize it has already been months, but one more day would make me a little less concerned that I missed something obvious that the more experienced LLVM debug-info devs would spot :)

This revision is now accepted and ready to land.Apr 13 2023, 11:27 AM

I'm happy to update this to do 1 + 2 + 3 too, it's not hard.

I'm happy to update this to do 1 + 2 + 3 too, it's not hard.

If you don't mind, I do think it is still my preferred approach; I'll leave it up to you

khuey planned changes to this revision.Apr 13 2023, 1:02 PM
khuey updated this revision to Diff 513760.Apr 14 2023, 3:06 PM
khuey retitled this revision from [X86] Use the CFA when appropriate for better variable locations around calls. to [X86] Use the CFA as the DWARF frame base for better variable locations around calls..
khuey edited the summary of this revision. (Show Details)

Use the CFA as the DWARF frame base whenever it's present.

As discussed, this drops the previous heuristic for determining
when we need to use the CFA for accurate results and simply uses
the CFA in all cases.

This revision is now accepted and ready to land.Apr 14 2023, 3:06 PM
khuey requested review of this revision.Apr 14 2023, 3:08 PM
khuey marked 2 inline comments as done.

Other than dbg-baseptr.ll, the test changes are all adjusting for the slightly larger DW_AT_frame_base value.

scott.linder accepted this revision.Apr 17 2023, 1:02 PM

LGTM, thank you!

This revision is now accepted and ready to land.Apr 17 2023, 1:02 PM
khuey added a comment.Apr 17 2023, 2:32 PM

Thanks. I don't have commit access so after whoever else you want to look at it is satisfied someone will need to land this :)

khuey added a comment.May 15 2023, 6:52 AM

Poke. If nobody else is going to review this can someone land it?

jryans accepted this revision.May 15 2023, 7:03 AM
jryans added a subscriber: jryans.

This looks good to me as well. Thanks for working on this! 🙂

I'll land these changes momentarily.

This appears to have broken a bunch of LLDB test — can you please investigate and potentially revert the patch until we found a solution?

https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/55133/changes#d421f5226048e4a5d88aab157d0f4d434c43f208

Thanks for the test failure report, I'll revert this for now then.

I didn't have time to look at the test failure in detail. Since it's in the frame diagnose command, maybe it just needs to be updated for this. Let me know if you need any help with figuring out a path forward!

khuey added a comment.May 15 2023, 6:16 PM

How do I run these tests? make check either doesn't include them, or they pass on my machine.

khuey added a comment.May 15 2023, 6:21 PM

Or maybe I need to explicitly enable lldb first ... I don't see an lldb binary in my build.

Yeah, you’ll need to enable it by giving cmake something like -D LLVM_ENABLE_PROJECTS="lldb". See https://llvm.org/docs/CMake.html#frequently-used-llvm-related-variables for more details.

khuey added a comment.May 16 2023, 5:49 AM

Even after enabling and building lldb make check-lldb-api says all the tests are unsupported.

If you're testing the API you might need -DLLDB_ENABLE_PYTHON=On too which enables the LLDB python bindings. That option is present in the Cmake step of the logs. (Not an LLDB dev, just passing by).

khuey added a comment.May 16 2023, 8:38 AM

Yeah, that gets some of the lldb-api tests to run.

The ones that are failing are still not running though. After looking into a bit they're all Mac only tests, and I don't have a Mac machine.

I'll read the code and see if there's anything obvious that frame diagnose is doing.

khuey added a comment.May 17 2023, 8:31 AM

So this is actually a case of what I said earlier about there possibly being old tools out there that don't know how to cope with CFI. Turns out one of them is this "frame diagnose" feature in lldb.

Based on code inspection what I think is happening is that we're getting to this point https://github.com/llvm/llvm-project/blob/d9610b4a56c532614545eef5995362e99b776535/lldb/source/Expression/DWARFExpression.cpp#L2676. DWARFExpression::MatchesOperand takes the operand from a crashing instruction (e.g. [rbp + 42]) and tries to match that up with the locations of the various variables provided in the DWARF. It does this symbolically, looking for exactly two forms of DWARF expressions in variable locations, DW_OP_regN/x and DW_OP_fbreg <offset> where DW_AT_frame_base is itself DW_OP_regN/x. Because I changed DW_AT_frame_base to be DW_OP_call_frame_cfa DW_OP_consts <offset> DW_OP_plus, nothing on the stack is recognized anymore. There's no code here that knows how to deal with DW_OP_call_frame_cfa.

I don't see any easy way to fix this. The CFA is only available at this point in value form (via StackFrame's StackID m_cfa). Despite the comments, StackFrame::GetFrameBaseExpression really does return the DW_AT_frame_base expression, not the CFA (the author of the comment describing the function seems not to have understood that the frame base and the canonical frame address are not the same thing). With the symbolic form of the CFA available we could recognize the sequence DW_AT_location = DW_OP_fbreg <offset1>, DW_AT_frame_base = DW_OP_call_frame_cfa DW_OP_consts <offset2> DW_OP_plus, CFA = rXX + offset3 and match it to e.g. mov [%rXX + offset4], %rax where offset4 = offset1 + offset2 + offset3, but that would require a bunch of work in lldb to plumb the symbolic form of the CFA up out of the unwinding layer to somewhere where it's available to CommandObjectFrameDiagnose.

Without that, the main alternative I see is to revert back from using the CFA unconditionally to only using it in the cases where it's necessary for correctness. Then most functions out there will continue to have frame pointer expressions that fit the format "frame diagnose" is expecting, and the only functions it won't be able to handle are the ones that currently have wrong locations anyways.

As an aside, is this frame diagnose feature actually used? I've never heard of it, and some cursory googling (e.g. '"frame diagnose" lldb') suggests nobody else has either. The code looks like it's been basically untouched since it was written in 2016. Is there someone who can say it's ok to break it? It's already not working with any code gcc produced in the last decade or so.

It might be worth posting your question on discourse in the LLDB subcategory for greater visibility.

Side note (no action required): I think we might be able to improve variable availability as well as correctness with this patch if we could teach LLVM to use fbreg rather than the SP/BP for location list entries too (see this issue).

Thanks for working on this!

khuey reopened this revision.May 21 2023, 10:59 AM
This revision is now accepted and ready to land.May 21 2023, 10:59 AM
khuey updated this revision to Diff 524120.May 21 2023, 11:26 AM
khuey retitled this revision from [X86] Use the CFA as the DWARF frame base for better variable locations around calls. to [X86] Use the CFA when appropriate for better variable locations around calls..
khuey edited the summary of this revision. (Show Details)

Revert back to the heuristic based approach for using the CFA only when it affects correctness.

I've realized that it's not *just* the frame diagnose tests that fail,
TestStdFunctionStepIntoCallable.py is also failing. Since not all of lldb's features can cope
with CFA-based DW_AT_frame_bases, let's go back to using the CFA only when the locations are
currently wrong. Then we won't be breaking anything that currently works.

khuey requested review of this revision.May 21 2023, 11:26 AM
scott.linder accepted this revision.May 22 2023, 4:00 PM

LGTM, thank you again for attempting the switch to doing this unconditionally! I was clearly wrong about there being no dependence if even an in-tree project is breaking

This revision is now accepted and ready to land.May 22 2023, 4:00 PM
jmorse accepted this revision.May 23 2023, 7:20 AM
jmorse added a subscriber: jmorse.

Sorry for having missed this; it sounds like a good direction to take, and will fix various variable locations.

Just to confirm my understanding, this should have no interaction with the shrink-wrapping optimisation pass because there shouldn't be any stack-stored variables before frame setup occurs, yes?

llvm/test/DebugInfo/X86/stack_adjustments_trigger_cfa_frame_base.ll
143–146

NB -- if these attributes aren't necessary for the test to operate, it's better to remove them to avoid future maintenance burdens

scott.linder added inline comments.May 23 2023, 1:26 PM
llvm/test/DebugInfo/X86/stack_adjustments_trigger_cfa_frame_base.ll
143–146

I removed these before landing!

@khuey let me know if you have concerns with dropping the attributes

khuey added inline comments.May 23 2023, 2:43 PM
llvm/test/DebugInfo/X86/stack_adjustments_trigger_cfa_frame_base.ll
143–146

Should be fine, I believe.

khuey added a comment.May 23 2023, 6:50 PM

Just to confirm my understanding, this should have no interaction with the shrink-wrapping optimisation pass because there shouldn't be any stack-stored variables before frame setup occurs, yes?

I'm not familiar with how that optimization works I don't think it will matter at all since it just changes how the frame base is computed by the debugger. Assuming there are tests for that optimization, they passed on the earlier version of this patch that made the change unconditionally.

skan added a subscriber: skan.May 23 2023, 7:24 PM