This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly][LiveDebugValues] Handle target index defs
ClosedPublic

Authored by aheejin on Dec 20 2022, 12:59 AM.

Details

Summary

This adds the missing handling for defs for target index operands, as is
already done for registers.

There are two kinds of target indices: local indices and stack operands.

  • Locals are something similar to registers in Wasm-land. For local indices, we can check for local-defining instructions (local.set or local.tee).
  • Wasm is a stack machine, so we have values in certain Wasm value stack location, which change when Wasm instructions produce or consume values. So basically any value-producing instrucion, i.e., instruction with defs, can change values in the Wasm stack. But I think we don't need to worry about this here, because WebAssemblyDebugFixup, which runs right before this analysis, makes sure to insert terminating DBG_VALUE $noreg instructions whenever a stack value gets popped. After WebAssemblyDebugFixup, there shouldn't be any DBG_VALUEs for stack operands that don't have a terminating DBG_VALUE $noreg within the same BB.

So this CL only works on DBG_VALUEs for locals. When we encounter a
local.set or local.tee instructions, we delete DBG_VALUEs for
those target index locations from the open range set, so they will not
be availble in OutLocs. For example,

bb.0:
  successors: %bb.1
  DBG_VALUE target-index(wasm-local) + 2, $noreg, "var", ...
  ...
  local.set 2 ...

bb.1:
; predecessors: %bb.0
  ; We shouldn't add `DBG_VALUE target (wasm-local) + 2 here because
  ; it was killed by 'local.set' in bb.0

After disabling register coalescing at -O1, the average PC ranges covered
for Emscripten core benchmarks is currently 20.6% in the LLVM
tot. After applying D138943 and this CL, the coverage goes up to 57%.
This also enables LiveDebugValues analysis in the Wasm pipeline by
default.

Diff Detail

Event Timeline

aheejin created this revision.Dec 20 2022, 12:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 20 2022, 12:59 AM
Herald added subscribers: pmatos, asb, wingo and 6 others. · View Herald Transcript
aheejin requested review of this revision.Dec 20 2022, 12:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 20 2022, 12:59 AM
aheejin updated this revision to Diff 484178.Dec 20 2022, 1:00 AM

Remove debug printing

@jmorse I have a separate question again, if you don't mind 😅

When there's a register def, this pass doesn't insert DBG_VALUE $noreg right after it to terminate its impact; this just deletes its info from OpenRanges so that it doesn't get propagated to its OutLocs. I did the same for our Wasm target indices, because it looks that's the way this pass works. But do you know why this pass was designed that way? I think adding DBG_VALUE $noreg right after the new def would make debug info more precise within the BB.

And another thing, I didn't add anything to copy handling because Wasm doesn't have copy instructions. (This COPY instruction is a pseudo instruction and should have been lowered into other instructions at this phase.)

aheejin edited the summary of this revision. (Show Details)Dec 20 2022, 10:25 AM

And another thing, I didn't add anything to copy handling because Wasm doesn't have copy instructions. (This COPY instruction is a pseudo instruction and should have been lowered into other instructions at this phase.)

Regarding the latter bit, I think that's fine for wasm. My understanding is that COPY instructions are mostly the product of register allocation (and a few other random special cases in backends) but because wasm doesn't do register allocation the way other targets do (and in any case the set of locals is basically an infinite virtual register file we leave the part of regalloc that typically requires copies to the engine).
I think a similar explanation applies to the discussion on D138943 about memory because Wasm likewise has no spills (no callee-saved or caller-saved registers, etc)

dschuff added inline comments.Dec 20 2022, 4:11 PM
llvm/lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
1657
llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp
208

Technically this function is "wrong" because stack value defs and global.sets are also target index defs. Your commit message explains why we only care about local defs and I agree that's the behavior we want; but it is a bit confusing.
One option would be to implement the "correct" behavior here (i.e. return true for all of those), and then filter out the non-local ones in the caller. But that would be more code and slower for no good reason. The other option would be to change the name of the function to better reflect what we want it to do, but because it's technically target-independent I don't know of a good name. Another option would be to keep the name and behavior the same but put a big comment somewhere with an explanation. None of those sound particularly great, or particularly catastrophic. Thoughts?

aheejin updated this revision to Diff 484424.Dec 20 2022, 4:45 PM
aheejin marked an inline comment as done.

Fix a typo

aheejin added inline comments.Dec 20 2022, 4:54 PM
llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp
208

Doing something in the caller doesn't work because VarLocBasedImpl.cpp is under lib/CodeGen and it cannot access WebAssembly:: namespace or any of its instruction names such as WebAssembly::LOCAL_SET there. We need lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.h to access them, and this file is only accessible within lib/Target/WebAassembly. This the reason I added this utility function here in TargetInstrInfo in the first place.

Also, implementing the "correct" behavior means returning true for all instructions with a def, because they produce stack values, which I guess is more than half of Wasm instructions anyway so there's no point of doing that. Also what we want here is a target index and an operand, which only local.*** instructions have.

That leaves us only with the option of changing the function name or putting a big comment blob or both. Do you have any suggestions for a better name?

aheejin updated this revision to Diff 484692.Dec 21 2022, 3:09 PM

Rebase onto main

dschuff added inline comments.Dec 21 2022, 3:36 PM
llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp
208

Yeah it's kind of weird because VarLocBasedImpl is technically target-neutral but has wasm-specific logic; I didn't think about not even being able to access the namespace and instruction names.
But you did just call out the fundamental thing that separates the local-defs we care about from stack defs is that they have explicit operands. Maybe it's enough to call it isExplicitTargetIndexDef and for the blob comment to say that the intention is for this to cover explicit TargetIndex defs that can be tracked by their offset. (It's probably useful for the comment to say a bit more than just that, i.e. maybe also briefly describe the actual use of the function).

Thanks for assurance about the stack-popping being addressed elsewhere -- it might be worth putting a comment in D138943 next to the definition of struct WasmLoc about that being a deliberate non-objective, to enlighten future readers.

The LiveDebugValues bits look fine -- it's quite a small diff,

And another thing, I didn't add anything to copy handling because Wasm doesn't have copy instructions. (This COPY instruction is a pseudo instruction and should have been lowered into other instructions at this phase.)

If there's no movement of values with COPYs / spills / restores in webassembly then that's great, no extra logic is needed. Having to follow multiple machine locations around is an artefact of the limited space available on physical machines.

A note on the performance characteristics -- the set of "live" variable locations that are being tracked are kept in a sparse bitvector, one bit per {Variable,MachineLocation} pair. Some large projects can have hundreds (or thousands) of variables for each machine location, which creates a lot of overhead when updating the sparse bitvector. So, a couple of years ago the sparse bitvector became a coalescing bitvector that groups bits together, and the numerical space of the locations were grouped by machine location to make them coalesce better, improving compile-time performance. There's more explanation at D74986 and that stack.

As you're placing all target index locations in the kWasmLocation location, for large projects there's a risk that the tracking of variable locations won't compress very well, because the bit numbers given for them are randomly spread between variables and target indexes. If you gave each target index it's own u32_location_t, they would group together better and compress more.

This isn't actually anything to care about unless you're dealing with absolutely massive functions with lots of inlining, but I figured I'd explain in case you see poor compile-time performance in this area in the future.

When there's a register def, this pass doesn't insert DBG_VALUE $noreg right after it to terminate its impact; this just deletes its info from OpenRanges so that it doesn't get propagated to its OutLocs. I did the same for our Wasm target indices, because it looks that's the way this pass works. But do you know why this pass was designed that way? I think adding DBG_VALUE $noreg right after the new def would make debug info more precise within the BB.

That's correct -- as it stands, there's a certain amount of redundancy because the DbgEntityHistoryCalculatorclass has to re-compute that information, even though LiveDebugValues already had to do it. I don't really know how things ended up being this way, my best guess is that other parts of LLVM used to propagate variable locations between blocks and did so incorrectly, and DbgEntityHistoryCalculator was needed to find out where variable ranges ended. I think LiveDebugValues was added in 2015, to more accurately propagate variable locations, but DbgEntityHistoryCalculator wasn't removed or re-designed as it was still doing useful things (i.e., building location lists), and there was no reason to change it. So, I think the redundancy is just a case of technical debt.

A more ideal design would be one where LiveDebugValues was an analysis rather than a pass, and it directly produced variable location ranges instead of DBG_VALUE instructions. In functions with lots of inlining and lots of blocks, LiveDebugValues effectively produces the cross product of all locations for all variables in all blocks in DBG_VALUEs, which can be a massive number of instructions, and is really inefficient! I tried a while back to convert it to an analysis, but doing that changed the line tables for some reason that I couldn't pin down X_X...

llvm/include/llvm/CodeGen/TargetInstrInfo.h
2064–2065

Nit: could I suggest writing "target index" as "TargetIndex" to make it clear it's a noun, and an operand kind. That makes it very clear to readers that it's a specific data structure rather than some higher level concept applicable to all targets.

llvm/test/DebugInfo/WebAssembly/live-debug-values.mir
199
aheejin updated this revision to Diff 486427.Jan 4 2023, 4:43 PM
aheejin marked 3 inline comments as done.

Address comments + move code a bit

llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp
208
  • Changed the function name to isExplicitTargetIndexDef.
  • Expanded the comment a little more in TargetInstrInfo.h, and also added more detailed comments in this file (WebAssemblyInstrInfo.cpp), where the WebAssemblyInstrInfo::isTargetIndexDef is defined.
  • In theory we need to add global.set here too, but we don't have global indices at this point because they are relocatable and we address them by names until linking, so we don't have 'offsets' (which are used to store local/global indices) to deal with in LiveDebugValues. And AFAIK we don't really store values that have debug info in wasm globals anyway.
aheejin updated this revision to Diff 486437.Jan 4 2023, 5:12 PM

Add a TODO on kWasmLocation

aheejin added a comment.EditedJan 4 2023, 5:12 PM

@jmorse

Thanks for assurance about the stack-popping being addressed elsewhere -- it might be worth putting a comment in D138943 next to the definition of struct WasmLoc about that being a deliberate non-objective, to enlighten future readers.

I tried that but I was not sure whether it was the best location. I wrote about it in https://reviews.llvm.org/D138943#4027424.

A note on the performance characteristics -- the set of "live" variable locations that are being tracked are kept in a sparse bitvector, one bit per {Variable,MachineLocation} pair. Some large projects can have hundreds (or thousands) of variables for each machine location, which creates a lot of overhead when updating the sparse bitvector. So, a couple of years ago the sparse bitvector became a coalescing bitvector that groups bits together, and the numerical space of the locations were grouped by machine location to make them coalesce better, improving compile-time performance. There's more explanation at D74986 and that stack.

As you're placing all target index locations in the kWasmLocation location, for large projects there's a risk that the tracking of variable locations won't compress very well, because the bit numbers given for them are randomly spread between variables and target indexes. If you gave each target index it's own u32_location_t, they would group together better and compress more.

This isn't actually anything to care about unless you're dealing with absolutely massive functions with lots of inlining, but I figured I'd explain in case you see poor compile-time performance in this area in the future.

Thanks a lot for the info! I tried the current patch with various Emscripten benchmarks, which includes sizable programs, and I didn't notice pathological slowdown for them. But this is certainly possible with larger programs, and I'll probably do this as a followup if the need arises. I added a TODO describing this possibility above the definition of kWasmLocation.

When there's a register def, this pass doesn't insert DBG_VALUE $noreg right after it to terminate its impact; this just deletes its info from OpenRanges so that it doesn't get propagated to its OutLocs. I did the same for our Wasm target indices, because it looks that's the way this pass works. But do you know why this pass was designed that way? I think adding DBG_VALUE $noreg right after the new def would make debug info more precise within the BB.

That's correct -- as it stands, there's a certain amount of redundancy because the DbgEntityHistoryCalculatorclass has to re-compute that information, even though LiveDebugValues already had to do it. I don't really know how things ended up being this way, my best guess is that other parts of LLVM used to propagate variable locations between blocks and did so incorrectly, and DbgEntityHistoryCalculator was needed to find out where variable ranges ended. I think LiveDebugValues was added in 2015, to more accurately propagate variable locations, but DbgEntityHistoryCalculator wasn't removed or re-designed as it was still doing useful things (i.e., building location lists), and there was no reason to change it. So, I think the redundancy is just a case of technical debt.

A more ideal design would be one where LiveDebugValues was an analysis rather than a pass, and it directly produced variable location ranges instead of DBG_VALUE instructions. In functions with lots of inlining and lots of blocks, LiveDebugValues effectively produces the cross product of all locations for all variables in all blocks in DBG_VALUEs, which can be a massive number of instructions, and is really inefficient! I tried a while back to convert it to an analysis, but doing that changed the line tables for some reason that I couldn't pin down X_X...

Thank you for the info and the history! Yeah, I also think this would be better as an analysis. That way, subscribers to this analysis can choose the granularity with which they want to place DBG_VALUE or DBG_VALUE $noreg instructions themselves, depending on the optimization level or and function size.

dschuff accepted this revision.Jan 9 2023, 4:33 PM

It looks like all the feedback has been address for now, so LGTM.
We can always update the comments if we have more discussion.

This revision is now accepted and ready to land.Jan 9 2023, 4:33 PM

@jmorse I would like to land this soon because we have waiting users. If you have remaining comments, I'll address them as follow-ups. Thank you!

jmorse accepted this revision.Jan 10 2023, 1:55 AM

LGTM

This revision was landed with ongoing or failed builds.Jan 10 2023, 9:57 AM
This revision was automatically updated to reflect the committed changes.