This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Add DW_OP_WASM_location_int
ClosedPublic

Authored by aardappel on Apr 2 2020, 5:32 PM.

Details

Summary

This to allow us to add reloctable global indices as a symbol.
See discussion in https://github.com/WebAssembly/debugging/issues/12

To accomplish this, we add a new DW_OP_WASM_location_int tag (to keep DW_OP_WASM_location unchanged for now for tools that rely on it), and a new R_WASM_GLOBAL_INDEX_I32 relocation type.

As discussed in the issue, the current implementation is a bit of a "hard-coded" version of what DwarfExpression does, with added support for symbols. Weaving this into DwarfExpression itself was deemed to invasive at this stage, but could be a good change in the future if use of this functionality expands. For now it serves our need of a relocatable stack pointer reference, and possibly other globals.

Diff Detail

Event Timeline

aardappel created this revision.Apr 2 2020, 5:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 2 2020, 5:32 PM

I'm still curious if something like this will work:

diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h
index c4dc53337c0..4bbff49606a 100644
--- a/llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h
+++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h
@@ -44,2 +44,3 @@ public:
       BaseTypeRef = 8,
+      WasmLocationArg = 30,
       SignBit = 0x80,
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFExpression.cpp b/llvm/lib/DebugInfo/DWARF/DWARFExpression.cpp
index e9463fd53d9..c51039c66ef 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFExpression.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFExpression.cpp
@@ -96,3 +96,3 @@ static DescVector getDescriptions() {
   Descriptions[DW_OP_WASM_location] =
-      Desc(Op::Dwarf4, Op::SizeLEB, Op::SignedSizeLEB);
+      Desc(Op::Dwarf4, Op::SizeLEB, Op::WasmLocationArg);
   Descriptions[DW_OP_GNU_push_tls_address] = Desc(Op::Dwarf3);
@@ -172,2 +172,15 @@ bool DWARFExpression::Operation::extract(DataExtractor Data,
       break;
+    case Operation::WasmLocationArg:
+      assert(Operand == 1);
+      switch (Operands.Op[0]) {
+        case 0: case 1: case 2:
+          Operands[Operand] = Data.getULEB128(&Offset);
+          break;
+        case 4: // global as uint32
+          Operands[Operand] = Data.getU32(&Offset);
+          break;
+        default:
+          return false; // Unknown Wasm location
+      }
+      break;
     case Operation::SizeBlock:
@@ -273,2 +286,10 @@ bool DWARFExpression::Operation::print(raw_ostream &OS,
         OS << format(" 0x%02x", Expr->Data.getU8(&Offset));
+    } else if (Size == Operation::WasmLocationArg) {
+      assert(Operand == 1);
+      switch (Operands[0]) {
+        case 0: case 1: case 2:
+        case 4: // global as uint32
+          OS << format(" 0x%" PRIx64, Operands[Operand]);
+        default: assert(false);
+      }
     } else {

@yurydelendik that would just take care of the encoding, the bigger question would be how to plumb symbol support into DwarfExpression (including the buffering path).

aardappel updated this revision to Diff 255431.Apr 6 2020, 12:08 PM

Fixed dwarfdump (see change to WasmObjectFile).
Fixed int32 value being unsigned (now signed, like existing tag).
Fixed accidental binary file added to patch.

aardappel updated this revision to Diff 255448.Apr 6 2020, 12:49 PM

Added new relocation type: R_WASM_GLOBAL_INDEX_I32

sbc100 added a comment.Apr 6 2020, 2:02 PM

New relocation type lgtm. I'm surprised there wasn't a need to update any lld tests. Probably means we need to add one that tests DW_AT_frame_base

aardappel updated this revision to Diff 255548.Apr 6 2020, 5:47 PM

Fixed global only used from debug section not getting flags set.
Fixed tests.

aardappel updated this revision to Diff 255682.Apr 7 2020, 8:22 AM

Fixes for LLD

@yurydelendik that would just take care of the encoding, the bigger question would be how to plumb symbol support into DwarfExpression (including the buffering path).

Is this just because DwarfExpression doesn't have access to the AsmPrinter?

llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
424–454

This should probably at least say which header this value comes from.

Is this just because DwarfExpression doesn't have access to the AsmPrinter?

It is slightly more complicated, DwarfExpression has two paths, one that directly emits to a DIE structure as it goes, the other that caches into a list (of integers) whose size can be measured before its emitted. The latter is not really set up for metadata, like "this integer is really a symbol". Either way, I am sure it can be done, but it is in shared code, so I was a bit more hesitant to rework that, especially when @yurydelendik says it's outside the scope of this patch :) My conversations with @dblaikie also seemed to indicate it would be a much larger change, only worth it if we were going to have many more complex expressions making use of this functionality. Then again, this patch already grew larger than expected.. I'd say its something we can consider if my "hack" in this patch is needed in at least one more location?

aardappel removed a subscriber: dblaikie.Apr 9 2020, 8:42 AM
aardappel updated this revision to Diff 256319.Apr 9 2020, 8:46 AM

Updated comment to refer to original header

aardappel edited the summary of this revision. (Show Details)Apr 9 2020, 11:35 AM

If there are no further comments on improvements we can carry out right now, does anyone feel like approving this for merging? :)

If there are no further comments on improvements we can carry out right now, does anyone feel like approving this for merging? :)

The adding of new OP code DW_OP_WASM_location_int seems driven by LLVM implementation detail. I feel that is not necessary. Is it possible to use the alternative encoding suggested above, since we solved plumbing symbol support?

@yurydelendik is that actually true? Concretely you are proposing that the form of DW_OP_Wasm_location be variadic instead of fixed, right? i.e. it has different fields based on the value of the first field. I don't think there is any precedent for that in DWARF anywhere, is there? e.g. there are different ops for OP_plus and OP_plus_uconst; or OP_call2 and OP_call4 could be just one op with a similar discriminator, but they are not. You could argue that even having OP_Wasm_location as it is (if it didn't need different forms) is inefficient because instead of the discriminator we could just use different ops, and make it smaller.

aprantl added subscribers: dblaikie, probinson.

@yurydelendik how would you change Descriptions[DW_OP_WASM_location] = Desc(Op::Dwarf4, Op::SizeLEB, Op::SignedSizeLEB); to make it variadic? And there may be further locations that need to support this.
We didn't "solve" plumbing, we circumvented it for the moment :)
When a location is a local, it is still being output with your existing LEB code. So we could make your suggestion happen simply by using DW_OP_WASM_location instead of DW_OP_WASM_location_int.

@dschuff Yes, I could bake the cases of the wasm::TI_ enum into multiple DW_OP_WASM_ tags, be happy to do that as part of this patch. We could then deprecate the existing DW_OP_WASM_location tag. Consumers already need to deal with new tags showing up for globals, might as well do it for locals at the same time then?

Concretely you are proposing that the form of DW_OP_Wasm_location be variadic instead of fixed, right? i.e. it has different fields based on the value of the first field. I don't think there is any precedent for that in DWARF anywhere, is there?

Closest to variadic args I see is DW_OP_implicit_value. The DWARF sees DW_OP_Wasm_location is an operator by itself (which may have its custom arguments or none). FWIW I was planing to suggest to have something like DW_OP_Wasm_location wasm-local-0, DW_OP_Wasm_location wasm-local-1, .. similar to DWARF's reg0..reg31 to save extra byte.

I argue for this design because the space for the vendor extension is pretty limited (only 32 entries and some are already taken), so we can quickly run out of IDs for DW_OP_Wasm_xxx. I don't see any requirement on amount and kinds of arguments for vendor operators. The limitation on at most 2 arguments with strict types comes from the DWARFExpression.cpp implementation.

@yurydelendik how would you change Descriptions[DW_OP_WASM_location] = Desc(Op::Dwarf4, Op::SizeLEB, Op::SignedSizeLEB); to make it variadic?

I was hoping to test the code I suggested in the comment above at https://reviews.llvm.org/D77353#1959871 , which changes the above to Descriptions[DW_OP_WASM_location] = Desc(Op::Dwarf4, Op::SizeLEB, Op::WasmLocationArg);. I'll try to test my idea in couple of days.

So far we only need 2 tags, for locals and globals, and maybe soon a third for stack-relative. I don't see us running out of 32 entries that fast :)

So far we only need 2 tags, for locals and globals, and maybe soon a third for stack-relative. I don't see us running out of 32 entries that fast :)

Per https://github.com/gcc-mirror/gcc/blob/master/include/dwarf2.def, 22 is already well-known.

Anyway suggested encoding patch can be found at https://gist.github.com/yurydelendik/de99a1f512cca6e685780f07aade416a

@yurydelendik thanks! that appears to mostly be a merge of this patch and the changes you outlined above?

@dschuff @sbc100 do you prefer this over whats in this patch? (keeping the single WASM dwarf tag in exchange for implementing a variadic argument reader/writer?)

@paolosev you may have an opinion on this, given that you're already working with this data as well?

If there really are only 32 codes available for all the user targets, then maybe we should just try to get away with only using one.

Also cc @philip.pfaffe

@paolosev you may have an opinion on this, given that you're already working with this data as well?

I also think that if possible we should try to use only one code, since there are only 10 vendor extension codes left (9 if we count our DW_OP_WASM_location).

I am not an expert here, but looking at DW_OP_implicit_value as Yury says, it should be possible to have variable-length arguments (DW_OP_implicit_value expects a ULEB128 size followed by block of that size).

@dschuff @paolosev thanks for the feedback, seems we have some consensus on going forward with the single variadic tag. I'll merge in Yury's changes.

aardappel added a comment.EditedApr 16 2020, 12:19 PM

@yurydelendik one thing about your patch: the variadic argument was previously declared as SignedSizeLEB and emitted with emitSigned, yet your new code reads it as getULEB128 and stores it in an array of uint64_t (which we can't change since its shared code). It doesn't matter much for the current use case, but we might want to be consistent about this storing signed or unsigned values? I see value in it being signed for future use cases, but given the existing DWARFExpression code it may well have to be unsigned.

@yurydelendik one thing about your patch: the variadic argument was previously declared as SignedSizeLEB and emitted with emitSigned, yet your new code reads it as getULEB128 and stores it in an array of uint64_t (which we can't change since its shared code). It doesn't matter much for the current use case, but we might want to be consistent about this storing signed or unsigned values? I see value in it being signed for future use cases, but given the existing DWARFExpression code it may well have to be unsigned.

Previously the signed LEB was chosen as universal argument type. For wasm-local, -global, -stack we just need unsigned, so I decided to switch to be unsigned for these location. I have no preference of them be signed/unsigned.

aardappel updated this revision to Diff 258153.Apr 16 2020, 1:13 PM

Changed encoding to use Yury's variadic WasmLocationArg type.

Ok, will try to make it more consistently unsigned, given that down the line they will be stored as unsigned anyway. But its still messy, given that at least thru one path they arrive as this TargetIndexLocation which is signed.
Made it explicit in the comments for TargetIndex that these are meant to be unsigned.

aardappel updated this revision to Diff 258160.Apr 16 2020, 1:31 PM

Made values more consistently unsigned thru-out.

New relocation type and handling LGTM.

If one were being meticulous one might land this as three different changes:

  1. Rename TargetIndex enum values. NFC.
  2. Introduce the new relocation type (albeit with no uses)
  3. The actual meat of the change.

Not necessary, I just noticed the logical split.

@sbc100 I'm a fan of small and focussed commits, but not a fan of new functionality without any uses/tests. Given that this commit is not particularly big, I vote to keep it together.

dschuff accepted this revision.Apr 16 2020, 3:09 PM

LGTM with one nit

llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp
238–241

I'm not exactly sure what these strings are for, but they should probably be changed to reflect the new naming of the enums?

This revision is now accepted and ready to land.Apr 16 2020, 3:09 PM
aardappel updated this revision to Diff 258199.Apr 16 2020, 4:30 PM

updated target index string descriptions

The comparison on lines 81 and 82 of InputChunks.cpp is always true.

if (rel.Type != R_WASM_GLOBAL_INDEX_LEB ||
    rel.Type != R_WASM_GLOBAL_INDEX_I32) {

This causes build failures:

InputChunks.cpp:81:45: error: overlapping comparisons always evaluate to true [-Werror,-Wtautological-overlap-compare]

@saugustine thanks, pushed a fix.

The perils of WFH developing with Visual Studio :(

Appreciate the quick fix. Thanks.

aardappel closed this revision.May 7 2020, 3:21 PM

This has already landed.