This is an archive of the discontinued LLVM Phabricator instance.

Move DBG_VALUE's that depend on loads to after a load if the load is moved due to the pre register allocation ld/st optimization pass
ClosedPublic

Authored by rastogishubham on Mar 2 2023, 9:04 AM.

Details

Summary

The issue here is that there can be a scenario where debug information is lost because of the pre register allocation load store optimization pass, where a load who's result describes the debug infomation for a local variable gets moved below the load and that causes the debug information for that load to get lost.

Example:

Before the Pre Register Allocation Load Store Pass
inst_a
%2 = ld ...
inst_b
DBG_VALUE %2, "x", ...
%3 = ld ...

After the Pass:
inst_a
inst_b
DBG_VALUE %2, "x", ...
%2 = ld ...
%3 = ld ...

The load has now been moved to after the DBG_VAL that uses its result and the debug info for "x" has been lost. What we want is:

inst_a
inst_b
%2 = ld ...
DBG_VALUE %2, "x", ...
%3 = ld ...

Which is what this patch addresses

Diff Detail

Event Timeline

rastogishubham created this revision.Mar 2 2023, 9:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2023, 9:04 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
rastogishubham requested review of this revision.Mar 2 2023, 9:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2023, 9:04 AM

Nice! I appreciate the detailed explanation of the algorithm. I have some suggestions for how to streamline it a bit and potentially make it easier to read inside.

llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2633

typo: DBG_VALUEs (makes grepping easier)

2637

Maybe the entire sentence could be compressed to:

When a load is sunk beyond a DBG_VALUE that is referring to it, the reference to the load in the DBG_VALUE becomes undef, resulting in a loss of debug info.
Example:

2650

// DBG_VALUE undef, "x"

2653

The code below addresses this by moving the DBG_VALUE to the position immediately after the load. Example:

2667

I would move the block detailing the maps until after introducing the general idea of the algorithm, otherwise the reader won't know what problem they're solving. It might be good to very briefly outline the idea first before diving into the implementation. I.e., what are the condition the algorithm is checking for, what is it trying to avoid and why.

2674

maybe put these comments next to the variable declarations?

2681

There's a bunch of filler words that can be dropped without loosing any contents:

The algorithm works in two phases: First RescheduleOps() populates RegisterMap with the registers that were moved.

(Q: The wording is curious because it's loads that define registers that are being moved, maybe we can integrate this)

2686

Can you clarify "we can move forward"?

2686

If the first operand of the DBG_VAL ...
Could we also handle the variadic DBG_VALUE_LIST instructions?

2803

Might consider a SmallDenseMap<8> for better performance in the average case.
If you want to be really scientific about it you could compile a large file with optimizations and print the LocalVarMap.size() at the end of the function and then see if there is a sweet spot of small element size (<16) based on that data. But using 8 is fine, too!

2803

Would a name like DbgValueSinkCandidates be more descriptive? (Not sure if that captures the semantics perfectly).

2804

Same here.

2807

Are there isDebugInstr() that are not isDebugValueLike()?

2818

micro optimization: you're looking up Op.getReg twice here.

auto RegMapEntry = RegisterMap.find(Op.getReg())
if (Op.isReg() && RegMapEntry != RegisterMap.end()) {

RegMapEntry = &MI;
2821

Maybe focus on the high level goal in the documentation here:

If the current DBG_VALUE describes the same variable as one of the in-flight DBG_VALUEs, remove the candidate from the list and set it to undef. Moving one DBG_VALUE past another would result in the variable's value going back in time when stepping through the block in the debugger.

aprantl added inline comments.Mar 3 2023, 10:15 AM
llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2828

double lookup

2830

as you already commented: double lookup

2844

!RegisterMap[Reg]
but also, double lookup

2850

Is this safe? Aren't we going to visit the instruction we just inserted next? I'm not sure how the semantics of the iterator are defined if inserting on the fly.

llvm/test/DebugInfo/ARM/move-dbg-values.mir
5

Nice. Looks like this covers all of the interesting cases?

70

You might be able to simplify the IR here, by just making up new variables with basic types and dropping most of the unnecessary debug info.

This is a good improvement, many thanks for working in this area. The overall approach is good, I've got a couple of high level comments:

  • I believe you'll need to erase elements from LocalVarMap once a DBG_VALUE is sunk -- if a later, unrelated DBG_VALUE for the same variable is encountered in the same block, then there's no need terminate the earlier sunk DBG_VALUE,
  • You should use the DebugVariable class to identify variables, as that combines inlining and fragment information into a single object. Using just DILocalVariable risks duplicate instances of the same variable from different inlining contexts being merged.

Finally, I think that rather than just sinking the DBG_VALUE that used to refer to a load, it might be more correct to duplicate the DBG_VALUE to below the load, and mark the old one as undef/$noreg. Consider:

%0 = load, !dbg !0
DBG_VALUE %0
%1 = foo, !dbg !1
%2 = bar, !dbg !2
%3 = baz, !dbg !3

If we sink the load to below baz, and sink the DBG_VALUE too, we get:

%1 = foo, !dbg !1
%2 = bar, !dbg !2
%3 = baz, !dbg !3
%0 = load, !dbg !0
DBG_VALUE %0

Which means that whatever value the variable used to have at the start of the block extends over the "foo", "bar" and "baz" instructions, which risks giving a misleading variable value on those line numbers. It might be better to produce:

DBG_VALUE $noreg
%1 = foo, !dbg !1
%2 = bar, !dbg !2
%3 = baz, !dbg !3
%0 = load, !dbg !0
DBG_VALUE %0

instead.

llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2656–2657

IMHO, describing this as a "use-before-def" would indicate to the reader exactly why the debug-info is being lost after the pass.

2807

DBG_LABEL, I believe -- isDebugValueLike should be sufficient.

2810–2813

I think there are some tests out there that deliberately give an empty variable to test things like the the MIR parser. IMHO, asserting here is acceptable as any "real" DBG_VALUE must have a variable.

rastogishubham marked 24 inline comments as done.

Addressed all the comments in the review including documentation, adding support for DBG_VALUE_LIST and small bug fixes

llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2650

This is what would happen if we didn't do what we are doing in the patch. The DBG_VALUE would remain untouched

2803

I think that name makes sense

2807

We do not need to move DBG_INSTR_REFS, so I am going to use

isDebugValue()

instead.

2810–2813

Unfortunately that is not the case, when I tried to build compiler-rt with this patch, the assert

assert(DbgVar)

tripped. There is something that is letting DBG_VALUEs have no variable in them slip. I will look into it in a future patch

2850

This is safe, the MBBI is incremented again in the for loop, and this is the end of the loop.

aprantl added inline comments.Mar 28 2023, 3:22 PM
llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2499

nit: static inline is effectively equivalent to static and thus LLVM code usually uses static alone.

Given the ratio of the parameter list and the function body, maybe this makes more sense as a lambda anyway?

2543

What happens if more than one DBG_VALUE use the same register?

int x, y;
x = y = *p;

-->

r0 = load ...
DBG_VALUE r0, 0, "x"
DBG_VALUE r0, 0, "y"
2797

It looks like you use the same pattern of for_each_reg_operand_in_dbg_intrinsic 4 times. Maybe create

forEachDbgRegOperand(MachineInstruction *MI, std::function<void(unsigned)> Fn)

helper function so you can write

forEachDbgRegOperand(MI, [&](unsigned Reg) {
   populateRegisterAndInstrMapForDebugInstr(Reg, RegisterMap, InstrMap, &MI);
 });

?

2803

I think this can just be else with no condition.

2861

if (!isLoadSingle(Opc))

continue

?

rastogishubham marked 5 inline comments as done.Mar 31 2023, 4:36 PM
rastogishubham added inline comments.
llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2499

I am still not used to writing lambdas, so they don't come to mind as a first thing, but I think it makes a lot of sense here, I also replaced

undefDbgValueList

with a lambda

2543

Good testcase, I have addressed that issue now.

2797

This is a great suggestion, thank you. There are actually 6 places :)

2861

Ah yes, llvm likes its early exists, good catch.

rastogishubham marked 4 inline comments as done.

Addressed review feedback:

  1. Added lambdas where necessary.
  1. Added support for multiple DBG_VALUEs mapping to the same virtual register.
  1. minor code formatting and style changes
rastogishubham added inline comments.Mar 31 2023, 4:38 PM
llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2499

Actually, I just removed

undefDbgValueList

I can use

forEachDbgRegOperand

to do the same thing

aprantl added inline comments.Mar 31 2023, 5:04 PM
llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2480

This comment doesn't really tell the reader much that isn't already there in the code.
Shorter version:

// Populate RegisterMap with all Registers defined by loads.
2519

Why aren't you using forEachDbgRegOperand() here?

2522

Commented out code.

2792

You're looking up Reg twice here.

auto &Entry = RegisterMap.find(Reg);
if (Entry != RegisterMap.end())
  Entry.push_back(&MI);
2826

auto &?

2831

does this loop get shorter if you implement it in terms of std::remove_if?

https://en.cppreference.com/w/cpp/algorithm/remove

2844

Maybe comment here what the else branch means, since it's far below from the condition.

2848

Would Instr->getOperand(i).setReg(0) work here and be shorter?

2884

This is the second time you use this pattern. Maybe add a constructor to DebugVariable that takes a MachineInstr as parameter, or a static DebugVariable::createFromDebugValue() method?

2890

Is it really necessary to look it up before erasing it, or can you call straight into erase, too?

rastogishubham marked 10 inline comments as done.Mar 31 2023, 7:15 PM
rastogishubham added inline comments.
llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2519

I missed this one

2826

Registers are just integers with helper functions, it doesn't matter either way, but sure

2831

It reduces it by 2 lines and for some reason it doesn't seem to work?

auto IsDbgVar = [&] (MachineInstr* I) -> bool {
              DebugVariable Var(I);
              return Var == DbgVar;
            };

            std::remove_if(InstrVec.begin(), InstrVec.end(), IsDbgVar);

Does that look right to you? It seems like remove_if doesn't work correctly with only one element?

2848

This will not work, the operand is an immediate and cannot be set to a $noreg by doing

setReg()

the opcode of the operands are different and the only way to change it is to create a new instruction with a $noreg opcode for the first operand and erase the old Instr.

setReg()

calls

getReg()

if the operand is an immediate that will assert

2890

Apparently not, when I do

DbgValueSinkCandidates.erase(DbgValueSinkCandidates.find(DbgVar));

or

auto DbgIt = DbgValueSinkCandidates.find(DbgVar);
DbgValueSinkCandidates.erase(DbgIt);

I get an assert

rastogishubham marked 5 inline comments as done.

Addressed review feedback

Used std::remove_if

aprantl accepted this revision.Apr 3 2023, 4:53 PM

I think I'm happy with how the code looks now, maybe wait a few days to see if @jmorse has any more comments.

llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2783

This might make it easier to see what this look is doing:

for (auto MBBI = MBB->begin(),  E = MBB->end(); MBBI != E; ++MBBI)
2848

Too bad. I would probably factor out a replaceOperand helper here to keep the main algorithm as readable as possible.

2873

You could combine these two lines as:

MachineBasicblock::iterator InsertPos = std::next(MBBI);
This revision is now accepted and ready to land.Apr 3 2023, 4:53 PM
rastogishubham marked 3 inline comments as done.Apr 11 2023, 2:40 PM

Addressed changes suggest by Adrian when he accepted the review

Added new lines to end of test files

This revision was landed with ongoing or failed builds.Apr 12 2023, 12:12 PM
This revision was automatically updated to reflect the committed changes.
MaskRay reopened this revision.Apr 12 2023, 12:58 PM
MaskRay added a subscriber: MaskRay.
MaskRay added inline comments.
llvm/include/llvm/IR/DebugInfoMetadata.h
23 ↗(On Diff #512935)

There is a library layering issue https://llvm.org/docs/CodingStandards.html#library-layering

LLVMCore cannot use LLVMCodeGen headers :)

This revision is now accepted and ready to land.Apr 12 2023, 12:58 PM

I spot this issue with

# curl -s https://raw.githubusercontent.com/chromium/chromium/main/tools/clang/scripts/update.py | python3 - --output-dir=~/Stable

cd utils/bazel
PATH=~/Stable/bin:$PATH bazel-5.0.0 build --config=generic_clang @llvm-project//llvm:opt
In file included from external/llvm-project/llvm/lib/IR/VectorBuilder.cpp:18:
In file included from external/llvm-project/llvm/include/llvm/IR/IntrinsicInst.h:27:
external/llvm-project/llvm/include/llvm/IR/DebugInfoMetadata.h:23:10: fatal error: 'llvm/CodeGen/MachineInstr.h' file not found
#include "llvm/CodeGen/MachineInstr.h"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
rastogishubham marked an inline comment as done.Apr 20 2023, 11:28 AM
rastogishubham added inline comments.
llvm/include/llvm/IR/DebugInfoMetadata.h
23 ↗(On Diff #512935)

Thank you for bringing that to my attention!

rastogishubham marked an inline comment as done.

Removed the include "llvm/CodeGen/MachineInstr.h" in DebugInfoMetadata.h

Added a new test to check to make sure that DBG_VALUEs that use an immediate are not zero-ed out if another DBG_VALUE is seen later

I created a static function

createDebugVariableFromMachineInstr

to create a DebugVariable, because I was not sure if I could add an include to DebugInfoMetadata.h in the MachineInstr.h file.

aprantl added inline comments.Apr 20 2023, 12:42 PM
llvm/test/DebugInfo/ARM/move-dbg-values.mir
69

How did we end up with these anonymous variables, and do we actually need any of them?

Reduced the testcases by removing uncessary debug info

Removed a bunch of DILocations that were essentially the same and could be folded into one

aprantl accepted this revision.Apr 21 2023, 12:26 PM
This revision was automatically updated to reflect the committed changes.