This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo][Dexter] Speculated BB presents illegal variable value to debugger
ClosedPublic

Authored by CarlosAlbertoEnciso on Sep 12 2018, 4:45 AM.

Details

Summary

For the below test:

int main() {
  volatile int foo = 4;
  int read = foo;
  int read1 = foo;

  int result = 0;
  if (read == 4) {
    result = read1 + 2;
  } else {
    result = read1 - 2;
  }

  return result;
}

Debug location information for 'result' is wrong with -O2.

When in the debugger, on the line "if (read == 4)", the value of "result"
is reported as '2', where it should be zero, as described by:

PR38763

Diff Detail

Event Timeline

couple of nitpicks.

lib/Transforms/Utils/Local.cpp
2528

I'd use for (auto &DII : DbgUsers) here. The type is obvious from the declaration of DbgUsers. This should make this line a little easier on the eyes.

2528

is DII the correct name for the iterator here? should this be DVI inline with the name of the type used in the declaration of DbgUsers?

That is : DVI for DbgVariableIntrinsic.

aprantl accepted this revision.Sep 12 2018, 8:27 AM
aprantl added inline comments.
test/CodeGen/X86/pr38763.ll
2

Sine you are testing SimplifyCFG, it would be much better to just run that one pass via opt and check the result immediately after it.

8

This is not a helpful description of the test. It would be easier to understand what the test is testing for if it said something like: "When SimplifyCFG folds two PHI instructions into a conditional instruction, the debug info becomes ambiguous..." (not sure if that is what actually happens but you get the idea.)

This revision is now accepted and ready to land.Sep 12 2018, 8:27 AM
aprantl requested changes to this revision.Sep 12 2018, 8:27 AM
This revision now requires changes to proceed.Sep 12 2018, 8:27 AM

This test confuses me. The function you are modifying is called FoldTwoEntryPHINode, but there is not a single PHI node in the input?

It still isn't clear to me why you need to drop the debug info in the first place (instead of e.g., making it point to the select, or leave it on its operands) but I hope that that will become obvious once I see the actual testcase.

probinson added inline comments.Sep 12 2018, 9:08 AM
lib/Transforms/Utils/Local.cpp
2525

The function name and the comment in the .h file made me think this is removing the !dbg metadata from the specified instruction. But that's not at all what it does.

People generally don't use the term 'metadata' to refer to the debug intrinsic instructions. Maybe dropDebugUsers would be a better name.

lib/Transforms/Utils/Local.cpp
2525

Good observation. Changing the name to dropDebugUsers.

2528

Changing the loop to for (auto &DII : DbgUsers).

2528

I can see your point.

The SmallVector<DbgVariableIntrinsic *, 1> is used in quite few other places and it seems the DII is the standard name used for the iterator.

I would prefer to follow that name convention.

dblaikie added inline comments.Sep 13 2018, 9:24 AM
lib/Transforms/Utils/Local.cpp
2528

One tweak on this - if you're deducing a pointer with 'auto', LLVM style prefers to be explicit about that, so write "auto *X" rather than "auto &X" (assuming a reference isn't actually needed here, which it seems not)

It still isn't clear to me why you need to drop the debug info in the first place (instead of e.g., making it point to the select, or leave it on its operands) but I hope that that will become obvious once I see the actual testcase.

Thanks for your feedback. I will review the test case and upload a new patch.

Address issues raised by the reviewers.

  • Change the test case to use opt and the specific 'SimplifyCFG' pass.
  • Rename 'dropDebugMetadata' as 'dropDebugUsers'.
  • Use range-based for loop.
CarlosAlbertoEnciso marked 7 inline comments as done.Sep 14 2018, 7:23 AM
CarlosAlbertoEnciso added inline comments.
lib/Transforms/Utils/Local.cpp
2528

@dblaikie,

I am sorry I missed your tweak. I am uploading a patch that includes it.

test/CodeGen/X86/pr38763.ll
2

The new test case, run only the SimplifyCFG pass via opt.

8

Updated test description.

CarlosAlbertoEnciso marked 2 inline comments as done.Sep 14 2018, 7:27 AM

Address issues raised by the reviewers.

  • Change the test case to use opt and the specific 'SimplifyCFG' pass.
  • Rename 'dropDebugMetadata' as 'dropDebugUsers'.
  • Use range-based for loop.
  • Use pointer for the iterator.
CarlosAlbertoEnciso marked 2 inline comments as done.Sep 14 2018, 7:37 AM
CarlosAlbertoEnciso marked 2 inline comments as done.Sep 14 2018, 7:42 AM
vsk added a comment.Sep 14 2018, 12:10 PM

Thanks, this is looking pretty close. A few comments inline.

include/llvm/Transforms/Utils/Local.h
446

Please omit 'machine' here, as MachineInstr and Instruction are distinct.

test/CodeGen/X86/pr38763.ll
37

For the test to be stringent, it needs to check that there these dbg.value's don't appear:

dbg.value(%add, !16, ...)
dbg.value(%sub, !16, ...)

One way to do that is to require exact next-line matches. E.g:

CHECK: %add = ...
CHECK-NOT: dbg.value(metadata i32 %add
CHECK-NEXT: %sub = ...
CHECK-NOT: dbg.value(metadata i32 %sub
CHECK-NEXT: %result.0 = select ...
CHECK-NEXT: d.v(%result.0, ...)
87

Are these lifetime markers needed to reproduce the issue?

94

Is the test still effective if you strip out all these attributes (i.e, remove '#0' etc from the file)? If so, doing that is a good way to make the test more maintainable.

Having read the real testcase now, I understand why you are dropping the debug info now. I would like to point out that we could do better. If we extended llvm.dbg.value to take more than one LLVM SSA Value, then we could produce a DIExpression that selects either %add or %sub depending on the value of %cmp. Let me know if you are interested in implementing this, it should be relatively straight forward to do and would come in useful in all sorts of other situations, too.

aprantl added inline comments.Sep 14 2018, 1:55 PM
test/CodeGen/X86/pr38763.ll
114

You can strip out all variables that aren't necessary for the test, it makes it easier to read and maintain in the future.

123

You can further simplify the testcase by just using single (or just fewer) DILocation for everything.

CarlosAlbertoEnciso marked 6 inline comments as done.Sep 17 2018, 7:16 AM

Having read the real testcase now, I understand why you are dropping the debug info now. I would like to point out that we could do better. If we extended llvm.dbg.value to take more than one LLVM SSA Value, then we could produce a DIExpression that selects either %add or %sub depending on the value of %cmp. Let me know if you are interested in implementing this, it should be relatively straight forward to do and would come in useful in all sorts of other situations, too.

That is a very good idea and I am interested in implementing it. A good opportunity to improve my knowledge in that area. I can see the associated benefits.

I would suggest if that is OK with you, to finish with the current issue. And then as a general improvement, to revisit the 'select' case as an independent issue, to implement what you have described. If you feel otherwise, I would proceed with the changes proposed to the 'llvm.dbg.value'.

In the meantime, I am updating the patch.

include/llvm/Transforms/Utils/Local.h
446

Removed 'machine'.

test/CodeGen/X86/pr38763.ll
37

Thanks for showing me how to improve the test. Adding the 'CHECK-NOT' entries.

87

Those markers are not needed. Removing them.

94

Able to reproduce the issue without those attributes.

114

Stripped out variables that are not necessary.

123

Remove unnecessary DILocation entries.

CarlosAlbertoEnciso marked 11 inline comments as done.Sep 17 2018, 7:17 AM

Address issues raised by the reviewers.

  • Improve the test case by removing unnecessary variables and DILocation entries.
  • Depending on @aprantl decision, a second step would be extending 'llvm.dbg.value' to deal with the 'select' case.

Having read the real testcase now, I understand why you are dropping the debug info now. I would like to point out that we could do better. If we extended llvm.dbg.value to take more than one LLVM SSA Value, then we could produce a DIExpression that selects either %add or %sub depending on the value of %cmp. Let me know if you are interested in implementing this, it should be relatively straight forward to do and would come in useful in all sorts of other situations, too.

That is a very good idea and I am interested in implementing it. A good opportunity to improve my knowledge in that area. I can see the associated benefits.

I would suggest if that is OK with you, to finish with the current issue. And then as a general improvement, to revisit the 'select' case as an independent issue, to implement what you have described. If you feel otherwise, I would proceed with the changes proposed to the 'llvm.dbg.value'.

Sure. Dropping the debug info is always the safe/correct option and will improve the status quo. It would be good to leave a comment in the that explains why we are dropping it and that properly handling it needs a missing feature in LLVM IR.

In the meantime, I am updating the patch.

Having read the real testcase now, I understand why you are dropping the debug info now. I would like to point out that we could do better. If we extended llvm.dbg.value to take more than one LLVM SSA Value, then we could produce a DIExpression that selects either %add or %sub depending on the value of %cmp. Let me know if you are interested in implementing this, it should be relatively straight forward to do and would come in useful in all sorts of other situations, too.

That is a very good idea and I am interested in implementing it. A good opportunity to improve my knowledge in that area. I can see the associated benefits.

Hi Carlos, I had this exact conversation with Tom Weaver last week. Extending llvm.dbg.value to handle more complex expressions will allow huge steps forward in making variable values visible (if not always modifiable). He was thinking more in terms of loop induction variables however. Maybe you two can collaborate on the IR part of this.

I also sketched an implementation over in https://bugs.llvm.org/show_bug.cgi?id=38763.

CarlosAlbertoEnciso added a comment.EditedSep 17 2018, 12:11 PM

@aprantl and @probinson,

Based on your views and comments, I am happy to "change" the current fix towards implementing the extension of 'llvm.dbg.value', as it seems it would be more beneficial for a better debug information.

The added comments (https://bugs.llvm.org/show_bug.cgi?id=38763) by @aprantl, are a good starting point.

@vsk, how do you feel about moving the current proposed patch, to be in terms of extending the 'llvm.dbg.value'?

vsk added a comment.Sep 17 2018, 2:06 PM

@aprantl and @probinson,

Based on your views and comments, I am happy to "change" the current fix towards implementing the extension of 'llvm.dbg.value', as it seems it would be more beneficial for a better debug information.

The added comments (https://bugs.llvm.org/show_bug.cgi?id=38763) by @aprantl, are a good starting point.

@vsk, how do you feel about moving the current proposed patch, to be in terms of extending the 'llvm.dbg.value'?

I expect that extending dbg.value in this way will be fairly involved. There are a couple questions worth thinking about, imo:

  • Would the extra SSA values need to be constants (or live in registers)? If not, how will their locations be represented in a TAG_variable entry?
  • What's the expected memory size overhead of adding an extra operand to dbg.value? Does it need to be counterbalanced by introducing dbg.value.simple() [for the case where there is an empty DIExpression/extended SSA stack]?
  • How will LiveDebugVariables determine where to insert DBG_VALUEs into the MI stream when there are extra SSA values? How do you ensure that the live intervals of those values aren't disjoint?

I think we should separate out the task of investigating these questions (& maybe collecting some numbers with a prototype) from driving this patch forward.

My preference would be to land this patch as-is.

test/CodeGen/X86/pr38763.ll
2

Do you need to pipe in the input ('< %s') here? I assumed that opt would write its output to %s.ll otherwise.

In D51976#1237392, @vsk wrote:

I expect that extending dbg.value in this way will be fairly involved. There are a couple questions worth thinking about, imo:

  • Would the extra SSA values need to be constants (or live in registers)? If not, how will their locations be represented in a TAG_variable entry?
  • What's the expected memory size overhead of adding an extra operand to dbg.value? Does it need to be counterbalanced by introducing dbg.value.simple() [for the case where there is an empty DIExpression/extended SSA stack]?

Metadata already knows how to do lists, I'd suggest that rather than a variable-length argument list to the intrinsic (if there aren't other considerations that make this problematic).

  • How will LiveDebugVariables determine where to insert DBG_VALUEs into the MI stream when there are extra SSA values? How do you ensure that the live intervals of those values aren't disjoint?

The values that feed into a DIExpression must dominate the instruction referencing the DIExpression (he said casually waving his hands).
Whether that's easy or hard to enforce is a different question.

I think we should separate out the task of investigating these questions (& maybe collecting some numbers with a prototype) from driving this patch forward.

And the design discussion ought to be on llvm-dev rather than buried in a code review....

My preference would be to land this patch as-is.

+1

CarlosAlbertoEnciso marked an inline comment as done.Sep 18 2018, 1:22 AM
CarlosAlbertoEnciso added inline comments.
test/CodeGen/X86/pr38763.ll
2

The pipe is not required. opt write its output to the standard output.
I had look at the test cases that use 'opt' and there are 2 ways to invoke it:

(1) RUN: opt <options> < %s

and

(2) RUN: opt <options> %s

The most common format is (1). Changing to use that format.

CarlosAlbertoEnciso updated this revision to Diff 165902.EditedSep 18 2018, 1:28 AM
CarlosAlbertoEnciso marked an inline comment as done.

The general preference seems to land this patch as-is.

This patch addresses the outstanding issues raised by the reviewers.

  • Use pipe in the opt command line.
CarlosAlbertoEnciso marked an inline comment as done.Sep 18 2018, 1:29 AM
probinson added inline comments.Sep 18 2018, 6:25 AM
test/CodeGen/X86/pr38763.ll
2

FTR we prefer < %s because that way the input path doesn't appear in the output. The input path can end up causing incorrect CHECK matches, depending on how the checks are written.

test/CodeGen/X86/pr38763.ll
2

Hi Paul,

Thanks for the explanation about the preference on < %s.

vsk accepted this revision.Sep 18 2018, 11:03 AM

Thanks, LGTM. @aprantl?

aprantl accepted this revision.Sep 18 2018, 11:18 AM
This revision is now accepted and ready to land.Sep 18 2018, 11:18 AM

Thanks very much to all the reviewers for your valuable comments and suggestions.

This revision was automatically updated to reflect the committed changes.