This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Generate R_WASM_FUNCTION_OFFSET relocs in debuginfo sections
ClosedPublic

Authored by dschuff on Jun 2 2021, 2:39 PM.

Details

Summary

Debug info sections need R_WASM_FUNCTION_OFFSET_I32 relocs (with FK_Data_4 fixup kinds) to refer to functions (instead of R_WASM_TABLE_INDEX as is used in data sections). Usually this is done in a convoluted way, with unnamed temp data symbols which target the start of the function, in which case WasmObjectWriter::recordRelocation converts it to use the section symbol instead. However in some cases the function can actually be undefined; in this case the dwarf generator uses the function symbol (a named undefined function symbol) instead. In that case the section-symbol transform doesn't work and we need to generate the correct reloc type a different way. In this change WebAssemblyWasmObjectWriter::getRelocType takes the fixup section type into account to choose the correct reloc type.

Fixes PR50408

Diff Detail

Event Timeline

dschuff created this revision.Jun 2 2021, 2:39 PM
dschuff updated this revision to Diff 349385.Jun 2 2021, 2:41 PM

fix diff

dschuff updated this revision to Diff 349420.Jun 2 2021, 5:11 PM
  • exclude functions from section-symbolizing
dschuff added inline comments.Jun 2 2021, 5:27 PM
llvm/lib/MC/WasmObjectWriter.cpp
507 ↗(On Diff #349420)

@sbc100 from here, testing with Andy's example I found that we are getting some fixups where SymA has type data (or no type?) but the relocation is clearly intended to be against a function symbol (it has the name of a function and is a R_WASM_FUNCTION_OFFSET reloc). Also the SECTION_OFFSET relocs with target symbols in the .debug_str section are all nameless symbols. I don't really understand yet why either of these are the way they are.

dschuff updated this revision to Diff 350738.Jun 8 2021, 4:57 PM

Add test

dschuff updated this revision to Diff 350739.Jun 8 2021, 4:58 PM

fix diff

@sbc I think this patch functionality (i.e. pass the section to be fixed-up into getRelocType, and make the reloc type be WASM_FUNCTION_OFFSET_I32 if the section is a debuginfo section) gives the reasonable-looking behavior for the object file on the problematic tests.
However the linker seems to be turning into 0xffffff

Here's a complete example using dwarf 4 rather than dwarf 5 since wasm-emscripten-finalize is choking on .debug_addr still:

ds@ancho:/s/emr/emscripten-releases/localtests (main)$ cat PR50408.cc
// Compile with -g
#if MAIN
char a();
template <typename, char b()>
void f() { b(); }
void g() { f<char, a>(); }
#else
char a() { return 'a'; }
#endif
ds@ancho:/s/emr/emscripten-releases/localtests (main)$ /s/emr/install/emscripten/em++ PR50408.cc -DMAIN -c -g -gdwarf-4 -o PR50408.o
ds@ancho:/s/emr/emscripten-releases/localtests (main)$ /s/emr/install/emscripten/em++ PR50408.cc  -c -g -gdwarf-4 -o PR50408_a.o
ds@ancho:/s/emr/emscripten-releases/localtests (main)$ /s/emr/install/emscripten/em++ -g PR50408.o PR50408_a.o -o PR50408.js
ds@ancho:/s/emr/emscripten-releases/localtests (main)$ /s/emr/install/bin/llvm-dwarfdump -debug-info PR50408.o
PR50408.o:      file format WASM

.debug_info contents:
0x00000000: Compile Unit: length = 0x00000080, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x04 (next unit at 0x00000084)

<snip>

0x00000061:     DW_TAG_template_value_parameter
                  DW_AT_type    (0x00000079 "char()*")
                  DW_AT_name    ("b")
                  DW_AT_location        (DW_OP_addr 0x0, DW_OP_stack_value)

ds@ancho:/s/emr/emscripten-releases/localtests (main)$ /s/emr/install/bin/llvm-dwarfdump -debug-info PR50408.wasm
PR50408.wasm:   file format WASM

.debug_info contents:
0x00000000: Compile Unit: length = 0x00000080, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x04 (next unit at 0x00000084)

<snip>

0x00000061:     DW_TAG_template_value_parameter
                  DW_AT_type    (0x00000079 "char()*")
                  DW_AT_name    ("b")
                  DW_AT_location        (DW_OP_addr 0xffffffff, DW_OP_stack_value)

So maybe the linker isn't properly handling that use of WASM_FUNCTION_OFFSET_I32 when used with a function instead of a section symbo.

dschuff updated this revision to Diff 359179.Jul 15 2021, 5:53 PM
  • exclude functions from section-symbolizing
  • Add test
  • fix up map-file test
  • commit random prints
  • clean up debug printfs
  • remove debuginfo-relocs
dschuff retitled this revision from Fix for PR50408 to [WebAssembly] Generate R_FUNCTION_OFFSET relocs in debuginfo sections.Jul 15 2021, 6:06 PM
dschuff edited the summary of this revision. (Show Details)
dschuff updated this revision to Diff 359182.Jul 15 2021, 6:07 PM
  • add asm test
dschuff published this revision for review.Jul 15 2021, 6:07 PM
dschuff added reviewers: sbc100, aardappel.
Herald added a project: Restricted Project. · View Herald TranscriptJul 15 2021, 6:08 PM

This is now almost ready, just a couple of issues. Primarily, as the comment on line 510 of WasmObjectWriter.cpp suggests, I don't fully understand how the existing codepath (which uses the section symbol) works. I'm not sure if I should try to understand that more or not. And secondly I figured out a way to test the new behavior (but not the existing one) with a .s file, so I'm not sure whether we should keep the .ll test or not (it would catch any changes in how the dwarf code generates this MC, but I'm not sure we care about that, if the MC code is correct).

sbc100 added inline comments.Jul 16 2021, 10:01 AM
lld/test/wasm/debuginfo-relocs.s
20 ↗(On Diff #359182)

Why delete this file? Can we just update the comment instead?

lld/test/wasm/map-file.s
33 ↗(On Diff #359182)

Why change this test?

llvm/lib/MC/WasmObjectWriter.cpp
514 ↗(On Diff #359182)

Its seem odd that does doesn't break stuff since R_WASM_SECTION_OFFSET_I32 relocations (where SymA is not a function) no longer get SymA converted to a section symbol?

630 ↗(On Diff #359182)

Can we use RelEntry.Symbol->isDefined like below? (or at least make them the same?)

llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyWasmObjectWriter.cpp
47 ↗(On Diff #359182)

I'm still not sure about the rename of this function. This function seems to return the section in which the symbol being relocated against is defined. Is that the "target" section? Or is the target section the section in which the relocation itself lives?

136 ↗(On Diff #359182)

I guess this is fine for now, but would break if we ever have metadata sections that want to refer to the table index of a function?

llvm/test/MC/WebAssembly/debuginfo-relocs.s
1 ↗(On Diff #359182)

Maybe comment here saying what we are testing? i.e. that we always generate R_WASM_FUNCTION_OFFSET_I32 in debug sections.

I wonder if it would make sense to use a relocation type in the assembly here to make this more explicit: i.e. .int32 bar@OFF? Might make this change more invasive but would be less magical perhaps? I think it would involve a new MCSymbolRefExpr::VariantKind such as VS_BYTE_OFFSET?

39 ↗(On Diff #359182)

Indentation looks off here

40 ↗(On Diff #359182)

Remove blank line

dschuff retitled this revision from [WebAssembly] Generate R_FUNCTION_OFFSET relocs in debuginfo sections to [WebAssembly] Generate R_WASM_FUNCTION_OFFSET relocs in debuginfo sections.Jul 16 2021, 1:25 PM
dschuff edited the summary of this revision. (Show Details)
dschuff added inline comments.Jul 16 2021, 1:51 PM
lld/test/wasm/debuginfo-relocs.s
20 ↗(On Diff #359182)

This test no longer tests the code it was intended to cover (see https://reviews.llvm.org/D66435) becuase that change is about handling R_WASM_TABLE_INDEX and now this test case generates R_WASM_FUNCTION_OFFSET instead. In fact, I'm not even sure it's possible anymore to get the situation that D66435 is supposed to handle (I at least couldn't think of another way to write a test for it).

lld/test/wasm/map-file.s
33 ↗(On Diff #359182)

The .debug_info section on line 40 has the same pattern that's being changed. Previously it generated R_WASM_TABLE_INDEX reloc, and therefore a table section in the linked binary (now there would be no table section). Since the point of this test is to show the link map with all the sections, I decided to go ahead and change the test so that the result would still include a table section.

llvm/lib/MC/WasmObjectWriter.cpp
514 ↗(On Diff #359182)

I think the R_WASM_SECTION_OFFSET_I32 relocations are never associated with function symbols, so they always fall into this case. (Whereas the R_WASM_FUNCTION_OFFSET relocs can be either way). This condition could also be written as

if (((Type == wasm::R_WASM_FUNCTION_OFFSET_I32 ||
     Type == wasm::R_WASM_FUNCTION_OFFSET_I64) &&
    !SymA->isFunction()) ||
    Type == wasm::R_WASM_SECTION_OFFSET_I32) {

and it also seems to work.

630 ↗(On Diff #359182)

It seems to pass the tests, although I'm not sure I understand what the difference is. isInSection is implemented as return isDefined() && !isAbsolute(); but I'm not sure whether isAbsolute is relevant here. isInSection seems more intuitive given that the condition it guards goes to actually get the section?

sbc100 added inline comments.Jul 16 2021, 2:55 PM
lld/test/wasm/debuginfo-relocs.s
20 ↗(On Diff #359182)

But why not just update the comment? This .s file is still valid right? And it should have the same content (zero) for the debug_info section?

llvm/lib/MC/WasmObjectWriter.cpp
514 ↗(On Diff #359182)

I still don't understand. Surely we want to use the section symbol for all relocations of this type not just the !SymA->isFunction() ones?

i.e. Why do we leave SymA alone one when isFunction() returns true? Why not always adjust SymA to be a section symbol?

630 ↗(On Diff #359182)

Given that we have the following code below I still think we should be consistent:

// For undefined symbols, use zero
if (!RelEntry.Symbol->isDefined())
  return 0;

All defined functions must have a section .. I think that is just as intuitive (maybe more intuitive).

dschuff updated this revision to Diff 359491.Jul 16 2021, 5:09 PM
dschuff marked 2 inline comments as done.
  • review comments
lld/test/wasm/debuginfo-relocs.s
20 ↗(On Diff #359182)

actually no, the content of the debug_info section is now 2 (because it's the offset of bar from the start of the code section, rather than a table index). And the whole point of the test originally was to test lld's behavior on table_index relocations, which it doesn't anymore. So this test doesn't test anything not covered by other tests.

llvm/lib/MC/WasmObjectWriter.cpp
514 ↗(On Diff #359182)

We can't use the section symbol for the cases in PR50408 because the function is undefined, and therefore not in any section.
It seems that for every other case the symbol is not actually a function symbol but a data symbol.
In other words, I don't know of any cases where IsFunction() and IsDefined() are both true.
If there were, I think we would have the correct behavior either way, we could just get it via 2 different codepaths. I still don't fulle understand how the section-symbol case ends up using the defined function symbol by the time the object file is written.
Another way this could be written could maybe be

if ((Type == wasm::R_WASM_FUNCTION_OFFSET_I32 ||
     Type == wasm::R_WASM_FUNCTION_OFFSET_I64 ||
     Type == wasm::R_WASM_SECTION_OFFSET_I32) &&
    SymA->isDefined()) {

That would mean that if SymA were a defined function it would go through the section-symbol codepath instead. That also passes all the tests we have (but as I said, I'm pretty sure our tests don't cover that case).

630 ↗(On Diff #359182)

OK yeah, that makes sense given the invariant.

llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyWasmObjectWriter.cpp
47 ↗(On Diff #359182)

yes, I guess it's a matter of perspective. To me the "fixup section", is the section being fixed up, which this is not. I'm open to other ideas for names though.

136 ↗(On Diff #359182)

Yes, it would. I couldn't think of any cases where we'd need that (nor a convenient way to tell the difference in that case, short of having another fixup type?)

llvm/test/MC/WebAssembly/debuginfo-relocs.s
1 ↗(On Diff #359182)

I added a comment.
That's an interesting idea about the SymbolRefExpr. I guess that would allow assembly code to explicitly select either byte offset or table index relocs. Would it be possible to use it in the compiler though?
One of the weird things I discovered working on this CL is that for the existing case (where it starts with a temp data symbol, and we convert it to use the section symbols) I couldn't think of any way to test that with assembly.

sbc100 accepted this revision.Jul 16 2021, 5:56 PM
sbc100 added inline comments.
llvm/lib/MC/WasmObjectWriter.cpp
514 ↗(On Diff #359182)

I think I would prefer this since it obvious why undefined symbols can't be replaced with the corresponding section symbol.

This revision is now accepted and ready to land.Jul 16 2021, 5:56 PM
aardappel accepted this revision.Jul 19 2021, 10:17 AM
This revision was landed with ongoing or failed builds.Jul 19 2021, 2:06 PM
This revision was automatically updated to reflect the committed changes.
dschuff marked an inline comment as done.