This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Use LiveDebugValues analysis
ClosedPublic

Authored by aheejin on Nov 29 2022, 12:40 PM.

Details

Summary

This enables LiveDebugValues analysis for Wasm. DBG_VALUEs expire at
the end of a BB, and this is the analysis extends their lifetime when
possible, greatly increasing the coverage of variable debug info.

Specifically, this removes the current constraint that this analysis is
only used with physical registers, which was first introduced in D18421,
because Wasm uses only virtual registers. I don't think there's anything
inherent in this analysis that only applies to physical registers; it
was just because all targets using this analysis ran this at the end of
their compiliation pipeline, at which point all their vregs had been
allocated, and Wasm's debug info infrastructure was not really set up
yet, so it was not using it.

This adds supports to Wasm-specific target-index operands, defined in
https://github.com/llvm/llvm-project/blob/2166d9529a60d1cdedb733d2e4134c971f0969ec/llvm/lib/Target/WebAssembly/WebAssembly.h#L87-L100.
Among these, TI_LOCAL, TI_LOCAL_INDIRECT, and TI_OPERAND_STACK are
used by Wasm DBG_VALUE instructions.

This does not yet handle mutable target indices, i.e., this does not
terminate a DBG_VALUE for a local index when we encounter a new
local.set or local.tee. It will be implemented as a follow-up.

(I had posted some stats but removed them because they are inaccurate at
this point, because this is not fully implemented yet)

Diff Detail

Event Timeline

aheejin created this revision.Nov 29 2022, 12:40 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 29 2022, 12:40 PM
Herald added subscribers: pmatos, asb, wingo and 5 others. · View Herald Transcript
aheejin requested review of this revision.Nov 29 2022, 12:40 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 29 2022, 12:40 PM
aheejin edited the summary of this revision. (Show Details)Nov 29 2022, 12:41 PM
aheejin edited the summary of this revision. (Show Details)
dschuff added inline comments.Nov 29 2022, 2:43 PM
llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
695

does SpillLocation make sense when the loc is a register kind?

743

So I assume this is being removed because we are using VRegs instead of physregs, and VRegs have arbitrary large unsigned values (greater than 1 << 30).
Looking at what that might mean: It seems that LocIndex is relying on this in some way.
Aside from there, the other place register values are checked against kFirstInvalidRegLocation is in getUsedRegs, where it seems to only collect values in the valid physreg range. so it seems that getUsedRegs will never return VRegs. It seems that `getUsedRegs is only used in the case of RegMask machine operand (which IIRC wasm doesn't use and which only make sense for physregs anyway).

So... I guess this seems like overall it should work? It does seem odd that we are using LocIndex but violating one of its primary assumptions. Is there anything else that could potentially go wrong?

2111

would it also be correct to condition this on MO.getReg() being a physical rather than a virtual reg? If so, I think something like that would be clearer than the more general condition that there can't be any vregs at all.

aheejin updated this revision to Diff 478737.Nov 29 2022, 3:08 PM
aheejin marked 2 inline comments as done.

Address a few comments first

aheejin added inline comments.Nov 29 2022, 3:43 PM
llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
695

OMG, I copy-pasted this from SpillLocKind by mistake. Nevermind.

743

I deleted this thinking simply like 1. this assertion failed on Wasm and 2. Wasm used virtual registers, so the assertion failure made sense. But yeah, this doesn't feel very satisfactory after all.

I don't think this will cause any functional problems in other archs. They are all using all physical registers after all. (This analysis runs at the very end of the backend pipeline.) But the funny thing is, Wasm is NOT supposed use RegisterKind at all. What does it even mean when debug info resides within a virtual register (which doesn't really exist in Wasm physically) at the end of compilation? All usable debug info should be either in a local or a stack location, or an immediate. So encountering a (virtual) register here in Wasm basically means it is a dangling debug info, which will be ignored when written to binary.

I'm not sure we can consider this kind of dangling DBG_VALUE hanging around a critical bug to fix. While it is a bug, it can happen quite often, when a pass does not handle its corresponding DBG_VALUE when transforming/deleting/splitting a normal instruction. Preserving as much as DBG_VALUE is more like an optimization problem; not preserving some of them doesn't mean the program should crash.

So in short, I had to disable this to make it run on many Wasm programs, which have some dangling DBG_VALUEs at the end. I think maybe more tidy solution here would be to make the pass ignore register-based DBG_VALUE in case of Wasm, and re-enable this assertion..? But to do that we have to check the architecture at some point, which feels a little ugly to me, but maybe that's better than this.

WDYT?

2111

Makes sense. Changed to check the register itself.

dschuff added inline comments.Nov 29 2022, 4:09 PM
llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
743

Ah, I had forgotten that we aren't using VRegs here either (I guess I didn't realize that this pass runs after explicit locals). I think in that case it does make more sense to just say that this pass should not encounter register values with wasm at all, so the register code doesn't need to handle VRegs. Probably we should have some comment to that effect somewhere.

I guess it's a bit unfortunate that we'll have dangling register locations anyway. Is there a straightforward way to either ignore them explicitly or maybe drop them before or at the beginning of this pass?

aheejin added inline comments.Nov 29 2022, 4:23 PM
llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
743

How about dropping all of them at the end of WebAssemblyDebugFixup pass? It runs at the very end, and it also does tasks related to debug info, so adding something there doesn't seem like a stretch. Or we can add a new pass, but that pass will be tiny.

If you think doing that in WebAssemblyDebugFixup is a good idea, I'll make a CL that does that separately and will land it first before landing this, and re-enable the assertion here.

dschuff added inline comments.Nov 29 2022, 4:50 PM
llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
743

If I'm understanding you right, any DBG_VALUEs that have Register operands after explicit locals are dead and can't be encoded into the binary anyway, right? So I think it makes sense to drop them, and WebAssemblyDebugFixup seems like a reasonable place to me.

jmorse added a subscriber: jmorse.

I see from the inline comments that virtual registers in WASM at this stage may or may not be meaningful -- if we can avoid LiveDebugValues having to encounter virtual registers, that's highly preferred, as it avoids considering a whole load of scenarios that LiveDebugValues isn't designed for. The number space for LocIndex is designed only for physregs and spill slots (physregs from 1 to 2^30, spill slots from 2^30+1 to 2^31-1). Adding the number range for virtual registers would require more testing.

Stepping back a bit: as ever I'm not very familiar with WASM, but are the "target index" locations mutable? The test added covers a few scenarios that more or less amount to SSA construction, i.e. propagation of the variable locations to successor blocks and merging of locations at PHI points. The main purpose of LiveDebugValues however is to track variable locations/values as they move between phyregs and stack slots, and go out of liveness. If the target index locations are immutable, then you can just get away with using existing SSA construction tools in LLVM to work out how to propagate locations. If they're mutable though, you'll need to put logic in transferRegisterDef and similar to identify when the target-index locations are modified, so that variable locations are terminated correctly.

llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.cpp
75–79

Doing this makes sense for WASM, but check that there are no virtual registers remains an important sanity check for other targets -- mixing vregs and physregs would definitely generate broken results. Would you be able to add an assertion somewhere else for everything but WASM? (I don't think this method can know what the target arch is).

aheejin updated this revision to Diff 483442.Dec 15 2022, 11:34 PM
aheejin marked 2 inline comments as done.

Address comments + disable LiveDebugValues analysis by default

llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.cpp
75–79

Added an assertion in the beginning of LiveDebugValues::runOnMachineFunction.

llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
743

Nullified the dangling register-based DBG_VALUEs in D139675, and revived this assertion.

aheejin added a comment.EditedDec 15 2022, 11:57 PM

@jmorse Thanks for your review, and sorry for the late reply!

I see from the inline comments that virtual registers in WASM at this stage may or may not be meaningful -- if we can avoid LiveDebugValues having to encounter virtual registers, that's highly preferred, as it avoids considering a whole load of scenarios that LiveDebugValues isn't designed for. The number space for LocIndex is designed only for physregs and spill slots (physregs from 1 to 2^30, spill slots from 2^30+1 to 2^31-1). Adding the number range for virtual registers would require more testing.

I removed all register-based (= dangling) DBG_VALUEs for Wasm in D139675; WebAssemblyDebugFixup runs right before this analysis. So now we don't have any virtual registers participating here, and I re-enabled the physical register assertion, so we can maintain LocIndex only deals with physical registers.

Stepping back a bit: as ever I'm not very familiar with WASM, but are the "target index" locations mutable? The test added covers a few scenarios that more or less amount to SSA construction, i.e. propagation of the variable locations to successor blocks and merging of locations at PHI points. The main purpose of LiveDebugValues however is to track variable locations/values as they move between phyregs and stack slots, and go out of liveness. If the target index locations are immutable, then you can just get away with using existing SSA construction tools in LLVM to work out how to propagate locations. If they're mutable though, you'll need to put logic in transferRegisterDef and similar to identify when the target-index locations are modified, so that variable locations are terminated correctly.

Thanks for pointing this out. They are mutable, so I should have handled this. There are two kinds of target indices: local indices and stack operands.

  1. Locals are something similar to registers in Wasm-land. For local indices, I think we can check for local-defining instructions (local.set or local.tee).
  2. Wasm is a stack machine, so we have a value in certain Wasm value stack location, which changes whenever Wasm instructions produce or consume values. So basically any value-producing instrucion, i.e., instruction with defs, can change values in Wasm stack. But I think we don't need to worry about this here, because WebAssemblyDebugFixup, which runs right before this analysis, makes sure to insert terminating DBG_VALUE $noreg instructions whenever a stack value gets popped.

I'm working on (1) now. Would you mind if I address it as a follow-up child CL? I disabled LiveDebugValues analysis here so this not-yet-fully-implemented analysis doesn't run on Wasm pipeline, and we only check for my basic test cases here.


By the way, may I ask a separate question? How should I handle defs in memory locations? Should I use SpillLocs? (They currently can have only registers as their base though) But looking at this code, it looks not every store instruction qualifies as a spill. Also you can access the same memory address even with a different register + offset, so it kind of looks all stores can terminate any indirect DBG_VALUEs. How do other targets handle defs in indirect DBG_VALUE locations?

For Wasm, it looks we have a very small number of indirect DBG_VALUEs anyway, so maybe I can start with nullifying them and not dealing with them, as I did for DBG_VALUE_LISTs a while ago in D102999. But I'm generally curious about how memory locations should be handled. Sorry for many questions 😅

aheejin edited the summary of this revision. (Show Details)Dec 16 2022, 12:02 AM
aheejin edited the summary of this revision. (Show Details)
aheejin marked an inline comment as done.
dschuff added inline comments.Dec 16 2022, 9:10 AM
llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.cpp
75–79

I wonder if the ExplicitLocals pass should actually set the NoVRegs property? It should hold true after that pass runs, and I don't think we add any ourselves after that?

llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
2111

This change can be reverted now that we've removed the VRegs, right?

aheejin added inline comments.Dec 16 2022, 1:17 PM
llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.cpp
75–79

I don't think setting NoVRegswill affect the functionality or make the compiler crash or anything, but in the MIR sense, we still have virtual registers after ExplicitLocals, no? We just insert local.get/sets so that they can be translated to stack machine Wasm instructions later.

Before ExplicitLocals:

%0 = i32.const 3
...
use %0

After ExplicitLocals:

%0 = i32.const 3
local.set 5
...
%1 = local.get 5
use %1

So the MIR after ExplicitLocals still contains %0 and %1, which are virtual registers, even though they will not be materialized in the final Wasm code. So I'm not sure if we should set NoVRegs after that.

llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
2111

Not really, at least not without touching other parts. This function is called on all instructions, not only DBG_VALUEs, so our virtual registers enter here. They will not play any role in the rest of the pass though. Apparently DefinedRegs which are collected in this function is only used in recordEntryValue function, and we don't use entry values. But anyway without this if we will crash.

On the other hand, the other assertion I had removed and revived, is in VarLoc::insert, and only runs on registers in DBG_VALUEs, which we removed, so we can have that assertion.

dschuff accepted this revision.Dec 16 2022, 1:40 PM

This looks good to me overall with the plan for followup. Maybe we can wait to land for a bit to see if @jmorse has any followup comments.

llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.cpp
75–79

Ah, right I had forgotten that the register operands aren't removed until MCInstLower. I guess any code that runs after explicit locals just has to be careful not to mess up the relationship between the local gets/sets and their associated operands.

This revision is now accepted and ready to land.Dec 16 2022, 1:40 PM
aheejin retitled this revision from [WebAssembly] Enable LiveDebugValues analysis to [WebAssembly] Use LiveDebugValues analysis.Dec 19 2022, 3:33 PM
aheejin updated this revision to Diff 484114.Dec 19 2022, 3:33 PM

Test function name changes

I removed all register-based (= dangling) DBG_VALUEs for Wasm in D139675; WebAssemblyDebugFixup runs right before this analysis. So now we don't have any virtual registers participating here, and I re-enabled the physical register assertion, so we can maintain LocIndex only deals with physical registers.

Excellent, that keeps the pass nice and simple,

I'm working on (1) now. Would you mind if I address it as a follow-up child CL? I disabled LiveDebugValues analysis here so this not-yet-fully-implemented analysis doesn't run on Wasm pipeline, and we only check for my basic test cases here.

SGTM -- to state the obvious, it means that the analysis won't realise that over-written locals means the variable location should terminate, and they might be propagated into successor blocks. (And it should be fixed by the follow up).

In terms of this patch, I think it wants metadata node-number-capturing (see inline comment on test) but otherwise LGTM.

By the way, may I ask a separate question? How should I handle defs in memory locations? Should I use SpillLocs? (They currently can have only registers as their base though) But looking at this code, it looks not every store instruction qualifies as a spill. Also you can access the same memory address even with a different register + offset, so it kind of looks all stores can terminate any indirect DBG_VALUEs. How do other targets handle defs in indirect DBG_VALUE locations?

For Wasm, it looks we have a very small number of indirect DBG_VALUEs anyway, so maybe I can start with nullifying them and not dealing with them, as I did for DBG_VALUE_LISTs a while ago in D102999. But I'm generally curious about how memory locations should be handled. Sorry for many questions 😅

I'm very glad to answer debug-info questions -- by "defs in memory locations" do you mean that an instruction produces a value, not in a register, but at a memory location? There are some situations where this can be tracked on other targets, but it's messy, and we don't try to do it for heap memory because any pointer can alias it. There are two situations where memory transfers are tracked:

  • For local stack spill slots, the LiveDebugVariables pass receives information during register allocation indicating where variable values are going, and indirect DBG_VALUEs are inserted whenever spills/restores occur.
  • LiveDebugValues can recognise spill-and-restores instructions and transfer variables locations into spill slots, and back out, in case there are spills not recognised by LiveDebugVariables

As far as I understand it, the situation that LLVM handles very well is where (on x86 for example) a definition happens in a register that is then spilled, and then the spill is fused into the instruction, for example:

$rax = addrr64 $rax, $rdx           <---- Some arithmetic instruction,
MOV64mr $rsp, ???, 16, killed $rax  <---- A stack spill
DBG_VALUE %stack.0, 0               <---- indirect DBG_VALUE inserted by LiveDebugVariables

Is fused into:

addmr64 $rsp, ???, 16, killed $rdx  <---- Arithmetic adding into a stack slot,
DBG_VALUE %stack.0, 0               <---- indirect DBG_VALUE inserted by LiveDebugVariables

I'm not aware of any other situation that LLVM can handle where a definition happens in memory -- and critically, the above code sequence is one where the memory location used to explicitly come from a spill instruction, but is then optimised further, so it's not a piece of information that is explicitly tracked. If that fusing happens during instruction selection and the value is defined into a regular local variable on the stack, then today it isn't be tracked.

Seeing how you've got a lot of control of the wasm pipeline, I imagine it's possible to add instrumentation and (perhaps) identify scenarios where definitions to stack-like memory happen and produce indirect DBG_VALUEs. That might be prohibitively complicated though.

This might be the point where I need to plug my re-designed variable tracking scheme, a summary is in this talk: https://www.youtube.com/watch?v=yxuwfNnp064 , you can find references to "instruction referencing" on the mailing list / discourse. The core idea is to number each instruction, and each operand of that instruction, connect variable information with those numbers and leave variable information in SSA form. Then at the end of compilation to re-analyse the code and find the machine locations for values. This means that you can describe a value as "The memory operand of instruction number $n" such as in this test [0]. The analysis at the end of codegen relies on being able to "see" all memory writes, so it too only works for stack slots. If there's something analogous in wasm then perhaps it could be fitted to that.

[0] https://github.com/llvm/llvm-project/blob/416fd03708d49b66900ef629c733fbbad2a602e2/llvm/test/DebugInfo/MIR/InstrRef/memory-operand-folding.mir

llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
2111

This change won't hinder other targets, however you mention that virtual register don't participate in LiveDebugValues, so they don't need to be in the defined register collection AFAIUI. It might be cleaner to add the isPhysicalRegister condition to the if so that virtual registers are completely ignored.

llvm/test/DebugInfo/WebAssembly/live-debug-values.mir
2–3

Could you make the CHECK lines a bit more robust by capturing the metadata number and comparing against that -- if you look at other debug-info tests for dbg.values you'll see they usually capture like:

![[VAR:[0-9+]]] = DILocalVariable(name: "foo"
CHECK: dbg.value({{.*}}, ![[VAR]]...

or something. This saves us bother in the future if the IR module gets auto-upgraded, the metadata renumbers, and then the metadata node numbers don't match what's in the CHECK lines.

aheejin updated this revision to Diff 484418.Dec 20 2022, 4:30 PM
aheejin marked 2 inline comments as done.

Address comments

llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
2111

Hoisted the isPhysicalRegister condition so that it's now this

for (const MachineOperand &MO : MI.operands())
  if (MO.isReg() && MO.isDef() && MO.getReg() &&
      Register::isPhysicalRegister(MO.getReg())) {

@jmorse Thanks for the answer for the spill location tracking! Will check your dev meeting talk too. I saw x86 is using an alternative implementation of LDV that uses DBG_INSTR_REF, but hasn't had a chance to catch up on that in detail.

aheejin updated this revision to Diff 484420.Dec 20 2022, 4:37 PM

Add braces

@jmorse Do you have any remaining concerns for landing this patch? (which will be followed up by D140373)

aheejin added a comment.EditedJan 4 2023, 4:41 PM

@jmorse Oh, I found you suggested to add comments here in D140373.

Thanks for assurance about the stack-popping being addressed elsewhere -- it might be worth putting a comment in D138943 next to the definition of struct WasmLoc about that being a deliberate non-objective, to enlighten future readers.

I tried that, but I'm not sure if this is the best location to put this kind of target-specific comment. This is what I've tried:

--- a/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
+++ b/llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
@@ -306,7 +306,12 @@ private:
 
     // Target indices used for wasm-specific locations.
     struct WasmLoc {
-      int Index; // One of TargetIndex values defined in WebAssembly.h
+      // One of TargetIndex values defined in WebAssembly.h. We deal with
+      // local-related TargetIndex in this analysis (TI_LOCAL and
+      // TI_LOCAL_INDIRECT). Stack operands (TI_OPERAND_STACK) will be handled
+      // separately WebAssemblyDebugFixup pass, and we don't associate debug
+      // info with values in global operands (TI_GLOBAL_RELOC) at the moment.
+      int Index;
       int64_t Offset;
       bool operator==(const WasmLoc &Other) const {
         return Index == Other.Index && Offset == Other.Offset;

This is basically the copy of what I said in WebAssemblyInstrInfo.cpp in D140373. That file is within lib/Target/WebAssembly, so I feel like I can elaborate on as many target-specific things as I want. But I'm not so sure about VarLocBasedImpl.cpp, which is target-independent, and especially when we don't need to distinguish anything about local and stack operands within that file. (We treat all target indices in the same way in the file; the part that handles the difference is encapsulated within

if (!TII->isExplicitTargetIndexDef(MI, Index, Offset))                         
  return;

which has its own detailed comments in WebAssemblyInstrInfo.cpp. This part is added in D140373; it's not here yet.

I haven't committed this diff yet, because of this concern. Please let me know if you think this is a good idea there.

@jmorse I would like to land this soon because we have waiting users. If you have remaining comments, I'll address them as follow-ups. Thank you!

jmorse accepted this revision.Jan 10 2023, 1:47 AM

This all LGTM; if you're happy with the comments for struct WasmLoc then that's fine too.

This revision was landed with ongoing or failed builds.Jan 10 2023, 9:55 AM
This revision was automatically updated to reflect the committed changes.