This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Use TargetIndex operands in DbgValue to track WebAssembly operands locations
ClosedPublic

Authored by yurydelendik on Sep 27 2018, 2:48 PM.

Details

Summary

Extends DWARF expression language to express locals/globals locations. (via
target-index operands atm) (possible variants are: non-virtual registers
or address spaces)

The WebAssemblyExplicitLocals can replace virtual registers to targertindex operand type at the time when WebAssembly backend introduces {get,set,tee}_local instead of
corresponding virtual registers.

Diff Detail

Event Timeline

yurydelendik created this revision.Sep 27 2018, 2:48 PM

There was a question about how easy it will be to reuse or modify LLDB to use with such extension (DW_OP_WASM_location). I looked at DWARFExpression::Evaluate and, I think, it will not be a problem to add support for that. It might be even easier than try to emulate registers mapping to some WASM local or stack operand.

I added DWARF data visualization is the wasm file to the https://wasdk.github.io/wasmcodeexplorer/. Here is a comparison of before and after the patch:

Can you explain how this new operation is meant to work?

include/llvm/BinaryFormat/Dwarf.def
626 ↗(On Diff #172431)

0xf6
Did you check against https://sourceware.org/elfutils/DwarfExtensions to see if this is is available?

include/llvm/BinaryFormat/Dwarf.h
62 ↗(On Diff #172431)

Is this actually WASM-specific or is LLVM the only implementation and you could use LLVM instead?

Can you explain how this new operation is meant to work?

This particular patch is trying to "save" debug values during the WebAssemblyExplicitLocals pass. Currently they are lost because virtual registers are not replaced. The patch replaces virtual registers with target-index(), that are correspond to the introduced WebAssembly locals, and then with WebAssembly DWARF extensions operators during serialization. (This patch is a first step and it does not fix all WebAssembly passes yet.)

include/llvm/BinaryFormat/Dwarf.h
62 ↗(On Diff #172431)

Stack operands, locals and globals are specific to WebAssembly target. (These can be mapped to known primitives such as registers or specific address space, but will require more effort on producer and consumer sides IMHO)

The plan is to make it non-LLVM specific.

Change DW_OP_WASM_location op to 0xED.

0xF6 is taken per https://sourceware.org/elfutils/DwarfExtensions

yurydelendik marked 2 inline comments as done.Nov 9 2018, 1:23 PM
yurydelendik added inline comments.
include/llvm/BinaryFormat/Dwarf.def
626 ↗(On Diff #172431)

0xf6 is taken, using 0xed instead

yurydelendik marked an inline comment as done.Dec 18 2018, 6:37 AM

Can you explain how this new operation is meant to work?

@aprantl, here is some description of what we are trying to achieve: https://github.com/WebAssembly/debugging/issues/1#issuecomment-448237641

In the nutshell, the WebAssembly extensions for the DWARF expression will allow us to express WebAssembly specific locations such as for locals, globals and operands stack. The consumers of this information will be limited to LLVM tools, web browser debuggers and AOT/JIT WebAssembly compilers.

yurydelendik retitled this revision from [WIP] [WebAssembly] Add DBG_VALUE with local operands location in WebAssemblyExplicitLocals pass to [WebAssembly] Add DBG_VALUE with local operands location in WebAssemblyExplicitLocals pass.Dec 18 2018, 6:38 AM

Can you explain how this new operation is meant to work?

@aprantl, here is some description of what we are trying to achieve: https://github.com/WebAssembly/debugging/issues/1#issuecomment-448237641

In the nutshell, the WebAssembly extensions for the DWARF expression will allow us to express WebAssembly specific locations such as for locals, globals and operands stack. The consumers of this information will be limited to LLVM tools, web browser debuggers and AOT/JIT WebAssembly compilers.

Does WebAssembly have a concept of registers? You could just define three additional register numbers for WebAssembly and address a stack slot as DW_OP_breg [wasm_stack_register] + 3 etc.. This would need less support in all the involved tools. The link says that it was considered, but rejected, but there is no mention as to why.

Can you explain how this new operation is meant to work?

@aprantl, here is some description of what we are trying to achieve: https://github.com/WebAssembly/debugging/issues/1#issuecomment-448237641

In the nutshell, the WebAssembly extensions for the DWARF expression will allow us to express WebAssembly specific locations such as for locals, globals and operands stack. The consumers of this information will be limited to LLVM tools, web browser debuggers and AOT/JIT WebAssembly compilers.

Does WebAssembly have a concept of registers? You could just define three additional register numbers for WebAssembly and address a stack slot as DW_OP_breg [wasm_stack_register] + 3 etc.. This would need less support in all the involved tools. The link says that it was considered, but rejected, but there is no mention as to why.

There is no registers in wasm. There are (somewhat) unlimited numbers of locals and stack depth per function, globals per module. Maybe there will be more types of elements, e.g.table items? Yes, I entertained idea of complex encoding scheme as <register number> = <type>*3 + <index>. IHMO it is to brittle and does not carry any value, except to be decodable by tools that do not know how to handle it, and at the same time adds a burden to the tools that will know how to deal with and decode that. That's just my opinion. I also found that the tools (e.g. LLVM or LLDB) are eager to know about number of registers (which we cannot determine) to allocate internal structures.

aprantl added a comment.EditedDec 18 2018, 9:11 AM

I also found that the tools (e.g. LLVM or LLDB) are eager to know about number of registers (which we cannot determine) to allocate internal structures.

What I was thinking of was to define exactly three WASM DWARF register numbers: wasm_reg_local, wasm_reg_global, wasm_reg_stack and use DW_OP_breg regnum offset like you would use the new special operation introduced by this patch.

I also found that the tools (e.g. LLVM or LLDB) are eager to know about number of registers (which we cannot determine) to allocate internal structures.

What I was thinking of was to define exactly three WASM DWARF register numbers: wasm_reg_local, wasm_reg_global, wasm_reg_stack and use DW_OP_breg regnum offset like you would use the new special operation introduced by this patch.

Okay, let me try that. It will be sufficient to try it out, since signatures are identical, just spec defined meaning is different (it is not a memory location and offset is not bytes) from what we are trying to achieve, and that's my main concern. I'll make the changes.

Replace DW_OP_wasm_location with DW_OP_breg

@aprantl Is the advantage of your suggested approach just that we don't have to define a new expression type? Obviously the interpretation is not the same as DW_OP_breg on other targets so as you say, either way there would have to be special logic in all the tools that consume it. Is this kind of repurposing of builtin primitives common?

lib/CodeGen/AsmPrinter/DebugLocEntry.h
24 ↗(On Diff #178780)

This is very generic and doesn't really say what it's for, or what any of the fields are for. I get that this is trying to be target-independent but maybe it would be easier to understand if it wasn't. It looks like the index corresponds now to the dwarf register (e.g. to indicate that this is for a local or global), and the offset to indicate which one, but that seems backwards because we use the term "index" to refer to e.g. the index of the global or local.
Really what we have might in LLVM terms is a "kind" (i.e. a kind of location, local or stack or global) and an index (which local). "Index" is still a pretty generic term but it does match the usage of the term "index space" in the spec (and I would expect that our numbering should be the same, e.g. sharing the local index space for arguments and local variables).

@aprantl Is the advantage of your suggested approach just that we don't have to define a new expression type? Obviously the interpretation is not the same as DW_OP_breg on other targets so as you say, either way there would have to be special logic in all the tools that consume it. Is this kind of repurposing of builtin primitives common?

DWARF leaves a lot of details to the ABI of a platform so I would say it is not uncommon. But since it was mentioned that the indices aren't really offsets I see how using custom operators would be cleaner. I'm not against this approach, I was just trying to point out that an alternative less invasive encoding could be used that might be less confusing to existing consumers of debug information.

@aprantl Is the advantage of your suggested approach just that we don't have to define a new expression type? Obviously the interpretation is not the same as DW_OP_breg on other targets so as you say, either way there would have to be special logic in all the tools that consume it. Is this kind of repurposing of builtin primitives common?

I'm not against this approach, I was just trying to point out that an alternative less invasive encoding could be used that might be less confusing to existing consumers of debug information.

There will no be existing consumers of WebAssembly debug information, except probably llvm-dwarfdump. Investing time into extension of LLDB/GDB is a risk and may not pay off. The new consumers will be web browser debugger and AOT compilers, which will be familiar with the extensions.

I think there will eventually be more consumers, including traditional-style debuggers (even if just for non-web use cases).
I had initially hoped that we could get away with just using DWARF registers more-or-less directly for locals, but as Yury pointed out, there are a lot of places that assume a fixed number of architectural registers. And of course wasm globals and the wasm value stack don't really have analogs on most architectures. I still hope we'll be able to use DWARF's CFA and other stack stuff for the stack in linear memory. (Although in our case the stack pointer will be a global rather than a register. Hopefully we can make that work.)

We could experiment with the register-based approach for now since it wouldn't require coordinating any global numbers or anything.

yurydelendik marked an inline comment as done.Dec 19 2018, 10:29 AM
yurydelendik added inline comments.
lib/CodeGen/AsmPrinter/DebugLocEntry.h
24 ↗(On Diff #178780)

Here I was trying to us existing MO_TargetIndex "Target-dependent index+offset operand" as it is defined in LLVM (similar to lib/Target/AMDGPU/AMDGPU.h).

I also pondering the idea of using the Offset as "memory" offset by introducing the imaginary array/layout for e.g. locals with indices 0, 1, 2,.. will have offsets +0, +16, +32, ...

The alternative is to define MachineOperandType (e.g. MO_WasmLocation), but it will be more invasive

Come to think of it, might we actually want the possibility of constant offset in addition to the "kind" and index field, e.g. for cases where the value is actually a pointer?

Come to think of it, might we actually want the possibility of constant offset in addition to the "kind" and index field, e.g. for cases where the value is actually a pointer?

The idea was target-index() will define the (virtual?) location for the local/global/operand, so it is a reference, and it can further participate in DWARF expression calculations using the plus and deref operations if needed. Having target-index() as location/ref may pay-off when structure is stored in a local (e.g. flags) (not sure if it is the case in clang/llvm).

  • revert breg use due to incompatibility with DWARF definition
  • rebase to git llvm-project

Add simple test

Minor changes to comment; move #include

nikic added a subscriber: nikic.Jul 12 2019, 7:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2019, 7:39 AM

Let's work toward getting this landed!
@yurydelendik Your note mentioned that you reverted the breg version based on some kind of incompatibility, can you say more about that?

yurydelendik added a comment.EditedJul 13 2019, 6:59 AM

Let's work toward getting this landed!
@yurydelendik Your note mentioned that you reverted the breg version based on some kind of incompatibility, can you say more about that?

breg<i> n literally means to read register i content and add n offset (and that will be used as location address of value). What the above "breg" encoding does is it assigns reg 0 as an "virtual array" location of locals, reg 1 as a "virtual array" location of globals, etc., which makes it looks like a workaround since locals, globals or operand stack are not always stored in continues memory segments.

Incompatibilities or conflicts with the DWARF spec definitions:

  1. unclear what offset n means - is it an index or byte offset (e.g. 8 can mean 8th local or low byte of 1th local)?
  2. whole encoding scheme expresses a location address rather than a location (we can append extra DW_OP_deref (?) operation thus increasing size of encoding and risking that tools might break/optimize it)

So far there no benefits to us to choose breg encoding: it is not extensible and we cannot further extend it (e.g. beyond the single n offset parameter). Also generic DWARF (optimization?) tools has to track that they are in a WebAssembly context to choose different logic for breg.

Rebase D52634: [WebAssembly] Add DBG_VALUE with local operands location in WebAssemblyExplicitLocals pass

OK, I think we have enough out-of-tree experience now with this and the lldb side to expect that it will work, so we should really get it upstream (in this form or whatever form is acceptable).
So there are really 3 different elements to this patch.
1 is the DWARF expression type. I think we've found that this is preferable to repurposing DW_OP_breg as mentioned above. There was a question previously about whether this should be LLVM-specific or not. @yurydelendik I'm guessing that Mozilla's Rust DWARF library will be using this too, and LLVM won't be the only consumer, correct?
@aprantl Do you want to push back further or offer any other suggestions about this?

2 (the bulk of the patch) is all the target-independent infrastructure for using TargetIndex operands in DbgValue-typed MachineInstrs and using that instead of a MachineLocation to express the location. Mechanically this seems straightforward to me; @aprantl do you have any opinions on that, or on my comment on lvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp?

3 is just the straightforward application of the previous 2 to the wasm backend.

This patch should perhaps be split into a target-independent part and a wasm-specific part; or at the very least it should be renamed, since the bit about WebAssemblyExplicitLocals is really the least consequential and most-straightforward part.

llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp
579

It's a bit odd that this is a target-independent extension but we directly encode the wasm-specific expression type here. Not sure if there's a better way or not.

llvm/lib/CodeGen/AsmPrinter/DwarfExpression.h
342

Since this is a generic extension and TargetIndex are usable by any target, how about saying something like "currently this is only used for WebAssembly locals..." etc

@yurydelendik can you split and/or rename this patch?
@aprantl do you have any more feedback on the above, or any suggestions about who else might be interested?

Use TargetIndex operands in DbgValue to track WebAssembly operands locations

yurydelendik retitled this revision from [WebAssembly] Add DBG_VALUE with local operands location in WebAssemblyExplicitLocals pass to [WebAssembly] Use TargetIndex operands in DbgValue to track WebAssembly operands locations.Nov 18 2019, 10:07 AM

Ping. Any other opinions? Perhaps @dblaikie ?

Ping. The target-independent portion of this patch is very small now.

aprantl added inline comments.Dec 6 2019, 8:52 AM
llvm/include/llvm/BinaryFormat/Dwarf.def
652

LLVM's list of vendor extensions is unfortunately not complete. (Pathces very welcome!) Have you checked that this enumerator is actually available? You might want to check libdwarf and the list of GNU extensions. Generally I would recommend using an enumerator that isn't close to the other GNU values, unless you also plan to add this to binutils etc.

yurydelendik marked an inline comment as done.Dec 6 2019, 3:34 PM
yurydelendik added inline comments.
llvm/include/llvm/BinaryFormat/Dwarf.def
652

Have you checked that this enumerator is actually available?

There is no codes specified for range between 0xe1 - 0xef per https://sourceware.org/elfutils/DwarfExtensions. The 0xed was chosen to be far from edges of that range. I also checked libdwarf/dwarf.h for availability.

Can you add a test that runs the output through llvm-dwarfdump, too, so that part of the patch is tested as well?

add llvm-dwarfdump test

yurydelendik edited the summary of this revision. (Show Details)Dec 16 2019, 9:26 AM
aprantl accepted this revision.Dec 16 2019, 2:56 PM

LGTM with minor nits.

llvm/lib/CodeGen/AsmPrinter/DebugLocEntry.h
23

Can you add a doxygen comment here?

60

///

llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp
580

It's also odd to generalize something for which only a single implementation exists...

This revision is now accepted and ready to land.Dec 16 2019, 2:56 PM
dschuff added inline comments.Dec 18 2019, 11:07 AM
llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp
580

I think this is called what it is because TargetIndex is already a general operand type. But maybe at the DWARF layer it really just just be something like DwarfExpression::addWasmLocation, and then in asmPrinter/DwarfDebug.cpp we just call addWasmLocation instead. If it's possible to assert there that the target is wasm, so much the better.

dschuff added inline comments.Dec 18 2019, 11:15 AM
llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp
580

The downside to that would be that if and when there are other places we want to use a TargetIndex instead of a Register (e.g. for the frame base) then we'd have to do that check in those places too.

Fix comments

dschuff accepted this revision.Dec 19 2019, 2:25 PM

After thinking a bit more, I'd like to rename DwarfExpression::addTargetIndexLocation to DwarfExpression::addWasmLocation because basically all of the methods on DarfExpression correspond to DW_OP_foo operands, so a method for adding DW_OP_wasm_location should probably just be called addWasmLocation. If we want to later have a utility function to extract some other functionality like converting a TargetIndex, we can always do that in the future when we have another caller.
With that suggestion, this change LGTM

rename addWasmLocation

This revision was automatically updated to reflect the committed changes.