This is an archive of the discontinued LLVM Phabricator instance.

Move DBG_VALUE's that depend on loads to after a load if the load is moved due to the pre register allocation ld/st optimization pass
ClosedPublic

Authored by rastogishubham on Mar 2 2023, 9:04 AM.

Details

Summary

The issue here is that there can be a scenario where debug information is lost because of the pre register allocation load store optimization pass, where a load who's result describes the debug infomation for a local variable gets moved below the load and that causes the debug information for that load to get lost.

Example:

Before the Pre Register Allocation Load Store Pass
inst_a
%2 = ld ...
inst_b
DBG_VALUE %2, "x", ...
%3 = ld ...

After the Pass:
inst_a
inst_b
DBG_VALUE %2, "x", ...
%2 = ld ...
%3 = ld ...

The load has now been moved to after the DBG_VAL that uses its result and the debug info for "x" has been lost. What we want is:

inst_a
inst_b
%2 = ld ...
DBG_VALUE %2, "x", ...
%3 = ld ...

Which is what this patch addresses

Diff Detail

Event Timeline

rastogishubham created this revision.Mar 2 2023, 9:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2023, 9:04 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
rastogishubham requested review of this revision.Mar 2 2023, 9:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2023, 9:04 AM

Nice! I appreciate the detailed explanation of the algorithm. I have some suggestions for how to streamline it a bit and potentially make it easier to read inside.

llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2590

typo: DBG_VALUEs (makes grepping easier)

2594

Maybe the entire sentence could be compressed to:

When a load is sunk beyond a DBG_VALUE that is referring to it, the reference to the load in the DBG_VALUE becomes undef, resulting in a loss of debug info.
Example:

2607

// DBG_VALUE undef, "x"

2610

The code below addresses this by moving the DBG_VALUE to the position immediately after the load. Example:

2624

I would move the block detailing the maps until after introducing the general idea of the algorithm, otherwise the reader won't know what problem they're solving. It might be good to very briefly outline the idea first before diving into the implementation. I.e., what are the condition the algorithm is checking for, what is it trying to avoid and why.

2631

maybe put these comments next to the variable declarations?

2638

There's a bunch of filler words that can be dropped without loosing any contents:

The algorithm works in two phases: First RescheduleOps() populates RegisterMap with the registers that were moved.

(Q: The wording is curious because it's loads that define registers that are being moved, maybe we can integrate this)

2643

Can you clarify "we can move forward"?

2643

If the first operand of the DBG_VAL ...
Could we also handle the variadic DBG_VALUE_LIST instructions?

2760

Might consider a SmallDenseMap<8> for better performance in the average case.
If you want to be really scientific about it you could compile a large file with optimizations and print the LocalVarMap.size() at the end of the function and then see if there is a sweet spot of small element size (<16) based on that data. But using 8 is fine, too!

2760

Would a name like DbgValueSinkCandidates be more descriptive? (Not sure if that captures the semantics perfectly).

2761

Same here.

2764

Are there isDebugInstr() that are not isDebugValueLike()?

2775

micro optimization: you're looking up Op.getReg twice here.

auto RegMapEntry = RegisterMap.find(Op.getReg())
if (Op.isReg() && RegMapEntry != RegisterMap.end()) {

RegMapEntry = &MI;
2778

Maybe focus on the high level goal in the documentation here:

If the current DBG_VALUE describes the same variable as one of the in-flight DBG_VALUEs, remove the candidate from the list and set it to undef. Moving one DBG_VALUE past another would result in the variable's value going back in time when stepping through the block in the debugger.

aprantl added inline comments.Mar 3 2023, 10:15 AM
llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2785

double lookup

2787

as you already commented: double lookup

2801

!RegisterMap[Reg]
but also, double lookup

2807

Is this safe? Aren't we going to visit the instruction we just inserted next? I'm not sure how the semantics of the iterator are defined if inserting on the fly.

llvm/test/DebugInfo/ARM/move-dbg-values.mir
5

Nice. Looks like this covers all of the interesting cases?

70

You might be able to simplify the IR here, by just making up new variables with basic types and dropping most of the unnecessary debug info.

This is a good improvement, many thanks for working in this area. The overall approach is good, I've got a couple of high level comments:

  • I believe you'll need to erase elements from LocalVarMap once a DBG_VALUE is sunk -- if a later, unrelated DBG_VALUE for the same variable is encountered in the same block, then there's no need terminate the earlier sunk DBG_VALUE,
  • You should use the DebugVariable class to identify variables, as that combines inlining and fragment information into a single object. Using just DILocalVariable risks duplicate instances of the same variable from different inlining contexts being merged.

Finally, I think that rather than just sinking the DBG_VALUE that used to refer to a load, it might be more correct to duplicate the DBG_VALUE to below the load, and mark the old one as undef/$noreg. Consider:

%0 = load, !dbg !0
DBG_VALUE %0
%1 = foo, !dbg !1
%2 = bar, !dbg !2
%3 = baz, !dbg !3

If we sink the load to below baz, and sink the DBG_VALUE too, we get:

%1 = foo, !dbg !1
%2 = bar, !dbg !2
%3 = baz, !dbg !3
%0 = load, !dbg !0
DBG_VALUE %0

Which means that whatever value the variable used to have at the start of the block extends over the "foo", "bar" and "baz" instructions, which risks giving a misleading variable value on those line numbers. It might be better to produce:

DBG_VALUE $noreg
%1 = foo, !dbg !1
%2 = bar, !dbg !2
%3 = baz, !dbg !3
%0 = load, !dbg !0
DBG_VALUE %0

instead.

llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2613–2614

IMHO, describing this as a "use-before-def" would indicate to the reader exactly why the debug-info is being lost after the pass.

2764

DBG_LABEL, I believe -- isDebugValueLike should be sufficient.

2767–2770

I think there are some tests out there that deliberately give an empty variable to test things like the the MIR parser. IMHO, asserting here is acceptable as any "real" DBG_VALUE must have a variable.

rastogishubham marked 24 inline comments as done.

Addressed all the comments in the review including documentation, adding support for DBG_VALUE_LIST and small bug fixes

llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2607

This is what would happen if we didn't do what we are doing in the patch. The DBG_VALUE would remain untouched

2760

I think that name makes sense

2764

We do not need to move DBG_INSTR_REFS, so I am going to use

isDebugValue()

instead.

2767–2770

Unfortunately that is not the case, when I tried to build compiler-rt with this patch, the assert

assert(DbgVar)

tripped. There is something that is letting DBG_VALUEs have no variable in them slip. I will look into it in a future patch

2807

This is safe, the MBBI is incremented again in the for loop, and this is the end of the loop.

aprantl added inline comments.Mar 28 2023, 3:22 PM
llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2499

nit: static inline is effectively equivalent to static and thus LLVM code usually uses static alone.

Given the ratio of the parameter list and the function body, maybe this makes more sense as a lambda anyway?

2543

What happens if more than one DBG_VALUE use the same register?

int x, y;
x = y = *p;

-->

r0 = load ...
DBG_VALUE r0, 0, "x"
DBG_VALUE r0, 0, "y"
2797

It looks like you use the same pattern of for_each_reg_operand_in_dbg_intrinsic 4 times. Maybe create

forEachDbgRegOperand(MachineInstruction *MI, std::function<void(unsigned)> Fn)

helper function so you can write

forEachDbgRegOperand(MI, [&](unsigned Reg) {
   populateRegisterAndInstrMapForDebugInstr(Reg, RegisterMap, InstrMap, &MI);
 });

?

2803

I think this can just be else with no condition.

2861

if (!isLoadSingle(Opc))

continue

?

rastogishubham marked 5 inline comments as done.Mar 31 2023, 4:36 PM
rastogishubham added inline comments.
llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2499

I am still not used to writing lambdas, so they don't come to mind as a first thing, but I think it makes a lot of sense here, I also replaced

undefDbgValueList

with a lambda

2543

Good testcase, I have addressed that issue now.

2797

This is a great suggestion, thank you. There are actually 6 places :)

2861

Ah yes, llvm likes its early exists, good catch.

rastogishubham marked 4 inline comments as done.

Addressed review feedback:

  1. Added lambdas where necessary.
  1. Added support for multiple DBG_VALUEs mapping to the same virtual register.
  1. minor code formatting and style changes
rastogishubham added inline comments.Mar 31 2023, 4:38 PM
llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2499

Actually, I just removed

undefDbgValueList

I can use

forEachDbgRegOperand

to do the same thing

aprantl added inline comments.Mar 31 2023, 5:04 PM
llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2480

This comment doesn't really tell the reader much that isn't already there in the code.
Shorter version:

// Populate RegisterMap with all Registers defined by loads.
2518

Why aren't you using forEachDbgRegOperand() here?

2521

Commented out code.

2788

You're looking up Reg twice here.

auto &Entry = RegisterMap.find(Reg);
if (Entry != RegisterMap.end())
  Entry.push_back(&MI);
2822

auto &?

2827

does this loop get shorter if you implement it in terms of std::remove_if?

https://en.cppreference.com/w/cpp/algorithm/remove

2840

Maybe comment here what the else branch means, since it's far below from the condition.

2844

Would Instr->getOperand(i).setReg(0) work here and be shorter?

2880

This is the second time you use this pattern. Maybe add a constructor to DebugVariable that takes a MachineInstr as parameter, or a static DebugVariable::createFromDebugValue() method?

2886

Is it really necessary to look it up before erasing it, or can you call straight into erase, too?

rastogishubham marked 10 inline comments as done.Mar 31 2023, 7:15 PM
rastogishubham added inline comments.
llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2518

I missed this one

2822

Registers are just integers with helper functions, it doesn't matter either way, but sure

2827

It reduces it by 2 lines and for some reason it doesn't seem to work?

auto IsDbgVar = [&] (MachineInstr* I) -> bool {
              DebugVariable Var(I);
              return Var == DbgVar;
            };

            std::remove_if(InstrVec.begin(), InstrVec.end(), IsDbgVar);

Does that look right to you? It seems like remove_if doesn't work correctly with only one element?

2844

This will not work, the operand is an immediate and cannot be set to a $noreg by doing

setReg()

the opcode of the operands are different and the only way to change it is to create a new instruction with a $noreg opcode for the first operand and erase the old Instr.

setReg()

calls

getReg()

if the operand is an immediate that will assert

2886

Apparently not, when I do

DbgValueSinkCandidates.erase(DbgValueSinkCandidates.find(DbgVar));

or

auto DbgIt = DbgValueSinkCandidates.find(DbgVar);
DbgValueSinkCandidates.erase(DbgIt);

I get an assert

rastogishubham marked 5 inline comments as done.

Addressed review feedback

Used std::remove_if

aprantl accepted this revision.Apr 3 2023, 4:53 PM

I think I'm happy with how the code looks now, maybe wait a few days to see if @jmorse has any more comments.

llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
2783

This might make it easier to see what this look is doing:

for (auto MBBI = MBB->begin(),  E = MBB->end(); MBBI != E; ++MBBI)
2844

Too bad. I would probably factor out a replaceOperand helper here to keep the main algorithm as readable as possible.

2873

You could combine these two lines as:

MachineBasicblock::iterator InsertPos = std::next(MBBI);
This revision is now accepted and ready to land.Apr 3 2023, 4:53 PM
rastogishubham marked 3 inline comments as done.Apr 11 2023, 2:40 PM

Addressed changes suggest by Adrian when he accepted the review

Added new lines to end of test files

This revision was landed with ongoing or failed builds.Apr 12 2023, 12:12 PM
This revision was automatically updated to reflect the committed changes.
MaskRay reopened this revision.Apr 12 2023, 12:58 PM
MaskRay added a subscriber: MaskRay.
MaskRay added inline comments.
llvm/include/llvm/IR/DebugInfoMetadata.h
23

There is a library layering issue https://llvm.org/docs/CodingStandards.html#library-layering

LLVMCore cannot use LLVMCodeGen headers :)

This revision is now accepted and ready to land.Apr 12 2023, 12:58 PM

I spot this issue with

# curl -s https://raw.githubusercontent.com/chromium/chromium/main/tools/clang/scripts/update.py | python3 - --output-dir=~/Stable

cd utils/bazel
PATH=~/Stable/bin:$PATH bazel-5.0.0 build --config=generic_clang @llvm-project//llvm:opt
In file included from external/llvm-project/llvm/lib/IR/VectorBuilder.cpp:18:
In file included from external/llvm-project/llvm/include/llvm/IR/IntrinsicInst.h:27:
external/llvm-project/llvm/include/llvm/IR/DebugInfoMetadata.h:23:10: fatal error: 'llvm/CodeGen/MachineInstr.h' file not found
#include "llvm/CodeGen/MachineInstr.h"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
rastogishubham marked an inline comment as done.Apr 20 2023, 11:28 AM
rastogishubham added inline comments.
llvm/include/llvm/IR/DebugInfoMetadata.h
23

Thank you for bringing that to my attention!

rastogishubham marked an inline comment as done.

Removed the include "llvm/CodeGen/MachineInstr.h" in DebugInfoMetadata.h

Added a new test to check to make sure that DBG_VALUEs that use an immediate are not zero-ed out if another DBG_VALUE is seen later

I created a static function

createDebugVariableFromMachineInstr

to create a DebugVariable, because I was not sure if I could add an include to DebugInfoMetadata.h in the MachineInstr.h file.

aprantl added inline comments.Apr 20 2023, 12:42 PM
llvm/test/DebugInfo/ARM/move-dbg-values.mir
69

How did we end up with these anonymous variables, and do we actually need any of them?

Reduced the testcases by removing uncessary debug info

Removed a bunch of DILocations that were essentially the same and could be folded into one

aprantl accepted this revision.Apr 21 2023, 12:26 PM
This revision was automatically updated to reflect the committed changes.