This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Emit clangast in custom section aligned by 4 bytes
ClosedPublic

Authored by kateinoigakukun on Feb 13 2020, 1:37 AM.

Details

Summary

Emit __clangast in custom section instead of named data segment
to find it while iterating sections.
This could be avoided if all data segements (the wasm sense) were
represented as their own sections (in the llvm sense).
This can be resolved by https://github.com/WebAssembly/tool-conventions/issues/138

And the on-disk hashtable in clangast needs to be aligned by 4 bytes,
so add paddings in name length field in custom section header.

The length of clangast section name can be represented in 1 byte
by leb128, and possible maximum pads are 3 bytes, so the section
name length won't be invalid in theory.

Fixes https://bugs.llvm.org/show_bug.cgi?id=35928

Diff Detail

Event Timeline

sbc100 added inline comments.Feb 13 2020, 9:02 AM
llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
1701 ↗(On Diff #244358)

Hmm.. it looks like this function isn't actually needed: https://reviews.llvm.org/D74565.

Does this change work if you simply return K here? I guess/hope it does.

llvm/lib/MC/WasmObjectWriter.cpp
383

I general this seems like trick that could work. However, I think this value is encoded as an LEB32 which means its max length is technically 5. I don't think a 6 byte LEB is valid .

1113

Why this part needed ? What happens if you don't specify this? I looks like removes section symbol from the symtab?

Can we get a test to see the effect of this patch?

sbc100 added inline comments.Feb 14 2020, 3:34 PM
llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
1701 ↗(On Diff #244358)

Let me know if this change is uneeded with https://reviews.llvm.org/D74565.? I'll hold of landing that until I hear back. Hopefully it will just work and simplify this patch once it lands.

kateinoigakukun marked 3 inline comments as done.Feb 16 2020, 8:42 PM
kateinoigakukun added inline comments.
llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
1701 ↗(On Diff #244358)

Thanks for suggestion. But this special treatment is necessary to put the clangast in custom section.
The section kind K come from TargetLoweringObjectFile::getKindForGlobal which is called from AsmPrinter::EmitGlobalVariable, but the getKindForGlobal infers that __clangast global variable should be put in readonly section.

Then readonly section is put in data segment for WebAssembly. But what we really want in WebAssembly context is to put clangast metadata in custom section.

Should I change getKindForGlobal to return metadata section kind for clangast? But I'm not sure how the change effects to other object formats.

llvm/lib/MC/WasmObjectWriter.cpp
383

Thanks. I see, I confirmed that you are right https://webassembly.github.io/spec/core/binary/values.html#integers

Then how should I pad section start? Should I pad with empty data segment or something else?

1113

This part prevents the clangast section symbol from putting it in symbol table.
This part is necessary to put clangast in custom section because the sym is treated as data type but the actual content is not in data section, so WasmObjectWriter raised abortion with "data symbols must live in a data section: __clang_ast" when it tries to write the sym in symtab.

Could you take a look again?

sbc100 added inline comments.Feb 20 2020, 4:08 PM
llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
1701 ↗(On Diff #244358)

I see. I'd still rather get rid of this function completely in the long run, so if I land my change first you can just do this check directly in TargetLoweringObjectFileWasm::getExplicitSectionGlobal. Is there some other way we can identify __clang_ast here other than by section name? Is it a special kind of symbol?

In the long run I think this problem will be solved by putting each symbol in its own wasm section but to do that we need to diverge from the spec a little. Would you mind added a TODO here and referencing this bug: https://github.com/WebAssembly/tool-conventions/issues/138.

I think it we fix this bug we won't need this change.

It won't solve the alignment issue though :(

llvm/lib/MC/WasmObjectWriter.cpp
383

I think the problem is that the wasm file itself doesn't have the concept of alignment of sections, since its not designed to memory mapped in that way. Its designed to be parsed. Is the loading code expecting the file offset of the section to be aligned? I'm not sure how we can work around this.

llvm/test/MC/WebAssembly/clangast.ll
29 ↗(On Diff #245051)

I'm confused by this size value. Shouldn't it be at least 27?

My colleague pointed out that one way to archive padding in the binary format would be to create a synthetic custom section that precedes that one you want to align. Its kind of horrible, but wasm files were not designed to used memory mapped like this.

kateinoigakukun marked an inline comment as done.Feb 20 2020, 7:33 PM

My colleague pointed out that one way to archive padding in the binary format would be to create a synthetic custom section that precedes that one you want to align. Its kind of horrible, but wasm files were not designed to used memory mapped like this.

Thanks! I'll try this way. My only concern is whether the order of custom sections are guaranteed.

llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
1701 ↗(On Diff #244358)

OK, I'll update after https://reviews.llvm.org/D74565 merged.

Is there some other way we can identify __clang_ast here other than by section name? Is it a special kind of symbol?

I couldn't find any other way to identify the section. I think it's a really special case to support PCH.

In the long run I think this problem will be solved by putting each symbol in its own wasm section but to do that we need to diverge from the spec a little. Would you mind added a TODO here and referencing this bug: https://github.com/WebAssembly/tool-conventions/issues/138.

I see. I agree with you. I'll add TODO comment here.

Thanks! I'll try this way. My only concern is whether the order of custom sections are guaranteed.

If you simple create the extra section right there in the WasmObjectFileWriter it will be in the right order. I wasn't recommending creating an extra section at the llvm level.

I just landed https://reviews.llvm.org/D77115 which does similar thing for the embedded bitcode sections and should simplify this CL a little. Although it doesn't try to do the file offset alignment.

If we can figure out way to do alignment I think that is the only remaining blocker for this.

llvm/lib/MC/WasmObjectWriter.cpp
1113

I think this part might no longer be needed after https://reviews.llvm.org/D77115

patcheng added inline comments.
llvm/lib/MC/WasmObjectWriter.cpp
383

BTW, here was an implementation I did way back: https://reviews.llvm.org/D42233

If I remember correct, you could spread the extra padding across multiple integer fields.

kateinoigakukun retitled this revision from [WebAssembly] Emit PCH __clang_ast in custom section to [WebAssembly] Emit clangast in custom section aligned by 4 bytes.
kateinoigakukun edited the summary of this revision. (Show Details)

I'm sorry to keep you waiting. I updated to reduce hacks in backend to emit clangast in custom section and move it to clang.
And emit paddings for alignment to fit into 5 bytes.

Herald added a project: Restricted Project. · View Herald TranscriptOct 17 2021, 11:38 PM
sbc100 added inline comments.Oct 18 2021, 9:18 AM
llvm/lib/MC/WasmObjectWriter.cpp
374

I would leave the first sentence alone which that applies to all custom sections. The second part of this sentence (the new part) is maybe not needed since you have a comment already on line 378.

Also, how about keeping the simpler form in the default case, and putting this new code in a new method. e.g.:

if (Name != "__clangast") {
  writeString(Name);
} else {
  writeStringWithAlignment(Name, 4);
}

Then the entire new method can be marked with the TODO to remove it if/when we change the section mapping.

Thank you for taking a look again. OK, I extracted writeStringWithAlignment and updated comment lines.

sbc100 accepted this revision.Oct 19 2021, 10:05 AM
This revision is now accepted and ready to land.Oct 19 2021, 10:05 AM

Thanks for reviewing! Can you merge this patch? I have no commit access.