This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly, LowerTypeTests] Fix control-flow integrity support
Needs ReviewPublic

Authored by ddcc on Sep 7 2020, 5:32 PM.

Details

Summary

Turns out this has been broken for WebAssembly since s2wasm was removed. This patch fixes it by:

  • LowerTypeTests: Increment index for imported symbols, and remove unsupported absolute symbols
  • WebAssemblyTargetStreamer: Support pre-assigned table indices, and add relative offset

Diff Detail

Event Timeline

ddcc created this revision.Sep 7 2020, 5:32 PM
ddcc requested review of this revision.Sep 7 2020, 5:32 PM
ddcc added inline comments.Sep 7 2020, 5:35 PM
llvm/lib/MC/WasmObjectWriter.cpp
622

Not sure if this is entirely correct, please take a look. My understanding is that D79462 introduced a bug where it fails to add the offset from a relative symbol. Without these changes, I get the following error from the lld side:

wasm-ld: warning: unexpected existing value for R_WASM_MEMORY_ADDR_SLEB: existing=8420 expected=8452

After some debugging, it appears that this is a relative symbol, where the +32 offset gets missed since only the base is examined.

RelEntry.Symbol->getVariableValue(false): .L__unnamed_1+32
symbol: Name=.L_ZTV3Foo.0, Kind=WASM_SYMBOL_TYPE_DATA, Flags=2, Segment=62, Offset=32, Size=20
WasmObjectWriter::writeObject: Ref = {Segment = 62, Offset = 32, Size = 20}
WasmObjectWriter::getProvisionalValue: Ref = {Segment = 62, Offset = 0, Size = 52}
ddcc updated this revision to Diff 290392.Sep 7 2020, 7:03 PM

Fix Twine issue

Hi, nice to hear from you, and thanks for the patch! I'm going to have to try to page-in all the vague knowledge I have about CFI, because I think it would be really good to get this working for real.

One thing that's happened since you did all of this originally is multiple-table support; the reference-types proposal includes multiple tables, has been standardized, and will soon be (or is already?) available in all the browsers. IIRC you originally did both a version with and without multiple-tables, but I forget how much of that was in s2wasm vs in LLVM. IMO we should just make CFI require multiple tables, and then we can dedicate as many tables for whatever purposes we like.
I assume that we'll still need the core functionality of this CL, which is having the table layout be determined at IPO time rather than being assigned by the objectWriter/linker?

Can you remind me of what the state was for multiple-table support when you originally wrote all this?

dschuff added a subscriber: aardappel.

oh, and @sbc100 and @aardappel should probably be the ones to review the MC bits.

ddcc added a comment.EditedSep 8 2020, 12:06 PM

One thing that's happened since you did all of this originally is multiple-table support; the reference-types proposal includes multiple tables, has been standardized, and will soon be (or is already?) available in all the browsers. IIRC you originally did both a version with and without multiple-tables, but I forget how much of that was in s2wasm vs in LLVM. IMO we should just make CFI require multiple tables, and then we can dedicate as many tables for whatever purposes we like.
I assume that we'll still need the core functionality of this CL, which is having the table layout be determined at IPO time rather than being assigned by the objectWriter/linker?

Can you remind me of what the state was for multiple-table support when you originally wrote all this?

This CL is still using a single table, so it relies on the LowerTypeTests pass pre-assigning table indices into !wasm.index metadata, then generating arithmetic checks at each call site based on the valid index range for each type. Later, the ASMPrinter and MCSymbol components propagate and assign those table indices during object generation.

The multiple table implementation was in D22246, which wasn't merged. I recall the design was more complicated and experimental because there were a couple of remaining issues:

  1. Depending on how we assign table indices, false negatives may occur with certain CFI checks. Suppose we put function "void F1(struct Foo*)" at index 1 in table A, function "void F2(struct void *)" at index 1 in table B, and there's an indirect call site CS that dispatches using the latter type, but may call either F1 or F2. Normally, the CFI pass would only allow CS to call F2 (because they have the same type), and not F1 (because it has a different type, even if this may be considered a false positive). But if the backend generates indices that could alias across tables, then we won't detect this and allow both to go through. The WebAssembly runtime signature check on indirect calls doesn't help, because both functions will have the same lowered type as a pointer is just an i64.
  2. It was difficult to keep track of which table should be used for which call site in LLVM IR. The Clang front-end generates calls to the type test intrinsic (which contains the type metadata) somewhere between e.g. a function pointer load and its indirect call, but there's no direct link to the indirect call in the IR. But, the LowerTypeTests pass needs to somehow find and tag all the call sites and valid call targets of the same type with a unique identifier. Both the single and multiple table design are using metadata for tagging things, but this probably isn't fully correct since metadata may be removed by intermediate optimizations?

In general, with the multiple table approach, LowerTypeTests would need to do the tagging and then remove all the type test intrinsics since we don't need additional arithmetic checks. Later, somewhere in the backend, we would need to place and generate (table, index) pairs for all valid call targets of the same type, and fixup each indirect call to refer to the correct table. I don't see support for e.g. a table reference on the call_indirect instruction, so it looks like all of this backend machinery would need to be implemented as well?

sbc100 added a comment.Sep 8 2020, 1:01 PM

So is the idea that the table indexes would be reserved also by the linker?

Right now wasm-ld completely ignores the table elements in the object files and generates contiguous table entries when if finds TABLE_INDEX relocations.

llvm/lib/MC/WasmObjectWriter.cpp
548

Would the read better if you flipped the if/else and remove the negation?

622

This looks correct to me. Is the reason we have not been seeing because we have -fdata-sections on by default so all symbols are in their own section?

I would hope we could land this separately with its own test(s). Do you have a way to trigger this bug?

Thanks @ddcc for the summary. Given that, I do think it makes sense to fix the current implementation before worrying about multiple tables.

(And yes, you are right that using metadata for this isn't strictly correct since metadata can be dropped. And I think you are also correct that we haven't added multiple table support for MC or the instruction definitions either).

ddcc added a comment.Sep 8 2020, 6:03 PM

Currently, I am seeing some false positive CFI failures that only occur with WebAssembly and not native, so I still need to look into what's causing that.

So is the idea that the table indexes would be reserved also by the linker?

Right now wasm-ld completely ignores the table elements in the object files and generates contiguous table entries when if finds TABLE_INDEX relocations.

Ah, so under LTO, the output from WasmObjectWriter is linked by lld with libc, etc, which writes the final output? I was hoping to avoid changing the object format to explicitly assign table element indices, since that would involve amending the specification, by just emitting the table elements in the correct order. Would it be sufficient to have WasmObjectWriter::recordRelocation also ensure that TABLE_INDEX relocations are inserted into CodeRelocations in the correct assigned order? Right now I only push them to the front of CodeRelocations to avoid conflicts when assigning indices with subsequent elements.

Thanks @ddcc for the summary. Given that, I do think it makes sense to fix the current implementation before worrying about multiple tables.

Sounds good, I don't think I have the time right now for the full multiple table design.

(And yes, you are right that using metadata for this isn't strictly correct since metadata can be dropped. And I think you are also correct that we haven't added multiple table support for MC or the instruction definitions either).

One option would be to mutate the function type so that it is in a separate address space, instead of using the !wasm.index metadata. But this may cause problems if we don't clean up all associated function pointers to perform additional address space casts where necessary. Another options could be using function prefix data? Otherwise, I don't see any suitable function attributes that we could reuse, but I also haven't used either of the these options before, so I don't know what would be best.

llvm/lib/MC/WasmObjectWriter.cpp
548

Ok, will do.

622

I'll take a look at extracting this part out and writing a testcase for it. Currently, it only occurs when I compile a local test application with CFI enabled using emsdk. I'm not sure where it's happening, but either Clang or LowerTypeTests is inserting references to a function inside the vtable for the Foo class.

ddcc updated this revision to Diff 290860.EditedSep 9 2020, 6:56 PM

Fix undefined arithmetic range ops in LowerTypeTests, split WebAssembly relative symbol bugfix into D87407

ddcc updated this revision to Diff 291132.EditedSep 10 2020, 10:25 PM

Switch to absolute_symbol metadata, which is undroppable, instead of wasm.index

ddcc added a comment.Sep 10 2020, 10:31 PM

So is the idea that the table indexes would be reserved also by the linker?

Right now wasm-ld completely ignores the table elements in the object files and generates contiguous table entries when if finds TABLE_INDEX relocations.

Ah, so under LTO, the output from WasmObjectWriter is linked by lld with libc, etc, which writes the final output? I was hoping to avoid changing the object format to explicitly assign table element indices, since that would involve amending the specification, by just emitting the table elements in the correct order. Would it be sufficient to have WasmObjectWriter::recordRelocation also ensure that TABLE_INDEX relocations are inserted into CodeRelocations in the correct assigned order? Right now I only push them to the front of CodeRelocations to avoid conflicts when assigning indices with subsequent elements.

@sbc100 I have a local prototype that reorders CodeRelocations such that symbols with an assigned index appear first, but it seems that there's a conflicting object format requirement that relocations appear in offset order. What would be the best way to propagate assigned indices through to wasm-ld, either implicitly or explicitly?

Just a cross-reference, table-number relocs are being added in D90948