This is an archive of the discontinued LLVM Phabricator instance.

Support CFI for WebAssembly target
ClosedPublic

Authored by ddcc on Jun 27 2016, 1:16 PM.

Details

Summary

This patch implements CFI for WebAssembly. It modifies the LowerTypeTest pass to pre-assign table indexes to functions that are called indirectly, and lowers type checks to test against the appropriate table indexes. It also modifies the WebAssembly backend to support a special ".indidx" assembly directive that propagates the table index assignments out to the linker.

Diff Detail

Event Timeline

ddcc updated this revision to Diff 62011.Jun 27 2016, 1:16 PM
ddcc retitled this revision from to [LowerTypeTests] Support CFI for WebAssembly target [WebAssembly] Propagate indirect function call index assignments out for CFI Don't break debug information with virtual registers.
ddcc changed the visibility from "Public (No Login Required)" to "All Users".Jun 27 2016, 1:16 PM
ddcc removed a subscriber: MatzeB.
jfb added a comment.Jun 27 2016, 3:19 PM

I'd like to see a few .ll tests as well.

Could you also update the patch description to explain what it's doing?

lib/CodeGen/AsmPrinter/DbgValueHistoryCalculator.cpp
167 ↗(On Diff #62011)

Would you have a stand-alone test that triggers this separately? I think you should do the changes in the file in a separate patch (it's similar to @dschuff's PEI patch to handle virtual registers).

lib/CodeGen/LiveIntervalAnalysis.cpp
1326 ↗(On Diff #62011)

Same here, do you have a bitcode file that demonstrates the issue? I'd fix separately (separate from DbgValue*.cpp above as well).

lib/Transforms/IPO/LowerTypeTests.cpp
689

Weird format here, you should run clang-format on the parts that you change.

835

static here is a bad idea. Is it just used once to fill GlobalLayout, or is the function called multiple times?

ddcc retitled this revision from [LowerTypeTests] Support CFI for WebAssembly target [WebAssembly] Propagate indirect function call index assignments out for CFI Don't break debug information with virtual registers to Support CFI for WebAssembly target.Jun 27 2016, 3:35 PM
ddcc updated this object.
ddcc added a comment.Jun 27 2016, 3:40 PM

Yeah, I should move the commit that modifies debug information and virtual register checking out of this patch. It's a gist by Yury Delendik (https://gist.github.com/yurydelendik/a37b786dae40736ed7a1b04770f9c9ed) that I picked up on this branch while debugging compilation crashes with debug information enabled.

ddcc added inline comments.Jun 27 2016, 3:43 PM
lib/Transforms/IPO/LowerTypeTests.cpp
835

This function is called once for each disjoint equivalence set. I'll move it out to the class level.

ddcc updated this revision to Diff 62029.Jun 27 2016, 4:03 PM

Fix formatting, drop gist

ddcc marked 3 inline comments as done.Jun 27 2016, 4:05 PM
jfb added inline comments.Jun 29 2016, 10:37 AM
lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
189

You can fold the above lines into one:

if (MDNode *Idx = F.getMetadata("wasm.index")) {

LLVM usually uses that idiom.

lib/Transforms/IPO/LowerTypeTests.cpp
224

A more descriptive name would be better: what is this counting?

640

This shouldn't get deleted since this is x86-specific?

653

Same, still x86-specific.

680

Same.

842

Why nullptr here?

ddcc updated this revision to Diff 62256.Jun 29 2016, 11:23 AM
ddcc marked 2 inline comments as done.

Rename counter, add more comments

ddcc marked 3 inline comments as done.Jun 29 2016, 11:24 AM
ddcc added inline comments.
lib/Transforms/IPO/LowerTypeTests.cpp
640

I moved the check higher up in buildBitSetsFromFunctions() and eliminated the duplicates.

842

Added a comment.

ddcc changed the visibility from "All Users" to "Public (No Login Required)".Jul 1 2016, 3:31 PM
sunfish added a subscriber: sunfish.Jul 1 2016, 3:33 PM
ddcc updated this revision to Diff 63564.Jul 11 2016, 1:42 PM
ddcc marked an inline comment as done.

Add tests

ddcc updated this revision to Diff 63566.Jul 11 2016, 1:47 PM

Fix some test bugs

jfb added inline comments.Jul 11 2016, 1:49 PM
test/Transforms/LowerTypeTests/function-disjoint.ll
2 ↗(On Diff #63564)

I'm not sure this will work because the wasm backend is still experimental (IIUC lit will exclude the backend-specific folders, but these files will be executed unconditionally). Is that the case?

jfb added a subscriber: pcc.Jul 11 2016, 1:50 PM
ddcc edited subscribers, added: llvm-commits; removed: pcc.
pcc edited edge metadata.Jul 11 2016, 2:11 PM

Is there any documentation for this ".indidx" directive?

ddcc added a comment.Jul 11 2016, 2:16 PM

As far as I know, the assembler format for WebAssembly is relatively undocumented. The pull request adding support for the ".indidx" directive in the WebAssembly linker is https://github.com/WebAssembly/binaryen/pull/616, and there is documentation for the overall CFI approach at https://github.com/ddcc/design/blob/cfi/Security.md#control-flow-integrity.

pcc added a comment.Jul 11 2016, 2:35 PM

Thanks, but that only raises more questions.

What is an "index"? Is it something that can be called, like a function pointer?

If that's the case, and indexes start from zero, how do you resolve conflicts between indexes in dynamically loaded modules? Or does your design assume no dynamic loading?

(Also, how do you represent null function pointers if indexes start at zero?)

ddcc added a comment.EditedJul 11 2016, 3:34 PM

Yes, in WebAssembly, most uses of memory addresses (pointers) are replaced with indexes into the appropriate section. As part of this, all functions and their types must be explicitly declared, and functions which can be called indirectly must be explicitly listed in a "table" section. The only remaining use of memory addresses is to an isolated linear memory region (for emulating a stack, etc), but this can only be used for storing data. Replacing function pointers with indexes is handled in s2wasm (compile/run process: c/c++ -> emscripten (clang -> opt -> llc -> s2wasm) -> browser) , which looks for functions that have their address taken, generates a table entry for each, and uses the table entry index as the "address". Additionally, s2wasm includes the expected type of the indirect function with the indirect function call. At runtime, two checks are performed at each indirect function call: (1) the specified index is a valid entry in the "table" section, and (2) the type signature specified at the call site matches the call site of the selected function (NULL "function pointers" would presumably fail this). If either of these is violated, then a runtime trap is generated.

However, this current approach has some downsides: it doesn't protect against code reuse attacks, and the signature check is fairly coarse grained, since it is performed at the WebAssembly type level (which only has four unique types: int/float, 32/64-bit). This patch adds support for Clang/LLVM's CFI implementation in WebAssembly programs, by pre-assigning continuous indexes in LLVM for each disjoint set, and introducing a new primitive (.indidx) to tell the linker that a function must be given an index. The corresponding patch to the linker first preallocates entries for functions with a preassigned index, then falls back to the normal behavior of generating new entries for each remaining function called indirectly. Unfortunately, this approach corrects the two drawbacks, but adds some runtime overhead. A future patch will use support for multiple tables with homogeneous type in WebAssembly to eliminate the runtime CFI overhead.

Dynamic linking support (https://github.com/WebAssembly/design/blob/master/DynamicLinking.md) has only recently been introduced into WebAssembly as of approx. two weeks ago, and the interaction between CFI and dynamic linking hasn't been decided yet. Currently, calling dlsym finds the imported function, copies it to the function and table sections of the host, and returns its index for use in an indirect call. Clearly, this doesn't work well with CFI; different implementations include requiring a forward declaration of dynamically-linked functions for placeholder entries, fixing table offsets and set boundaries at runtime, etc. Alternatively, if tables with homogeneous type are used for CFI, then the tables from two modules could just be joined directly: two tables with the same type are merged, otherwise new tables are created and table references are updated.

pcc added a comment.Jul 11 2016, 7:35 PM

Thanks for that writeup, it helped me understand what's going on here more clearly. Please also include those details in your commit message.

Also, how do you represent null function pointers if indexes start at zero?

I'm still a little unclear about this. Suppose I have some code that looks like this:

void f() {}

bool foo(void (*fp)()) {
  return fp != nullptr;
}

What result will foo(f) yield if this pass happens to assign index 0 to the function f, and why?

Currently, calling dlsym finds the imported function, copies it to the function and table sections of the host, and returns its index for use in an indirect call. Clearly, this doesn't work well with CFI; different implementations include requiring a forward declaration of dynamically-linked functions for placeholder entries, fixing table offsets and set boundaries at runtime, etc. Alternatively, if tables with homogeneous type are used for CFI, then the tables from two modules could just be joined directly: two tables with the same type are merged, otherwise new tables are created and table references are updated.

Okay, I'm happy for the design here to match the current wasm design, and for it to evolve as wasm does.

lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
191

clang-format this line.

lib/Transforms/IPO/LowerTypeTests.cpp
683–684

This comment belongs on buildBitSetsFromFunctionsX86.

687–693

Remove all braces here.

839

You could set the function metadata here instead of in the loop below.

841

Please explain *why* you are passing a null pointer here (i.e. because wasm does not use jump tables directly...)

test/Transforms/LowerTypeTests/function-disjoint.ll
3 ↗(On Diff #63566)

I think this should be fine because we aren't using the backend here.

ddcc updated this revision to Diff 63743.Jul 12 2016, 3:51 PM
ddcc marked 6 inline comments as done.
ddcc edited edge metadata.

Cleanup formatting, update comments, skip metadata for unused functions

ddcc added a comment.EditedJul 12 2016, 3:51 PM

Hmm, good question. Currently, if I compile that example with optimizations enabled, it simply gets optimized out to always return true. However, if I compile it without optimizations, it actually returns false. This is because f() gets assigned to table index 0 by s2wasm, and the check fails. Note that this is true regardless of whether the CFI pass is enabled; there are no indirect function calls being executed, so it doesn't affect the IR at all.

Now, suppose that I execute the following example code. Without CFI, it will trigger a runtime type check failure, because the indirect call table has the entries (0: f, 1: g), and fp2 is assigned index 0, but this doesn't match the expected type at the call site. Now, the behavior is different with CFI, because the indirect call table has the entries (0: g, 1: main, 2: f), and fp2 is assigned index 0, which executes successfully. This is because {g, main} are in the same type set and are assigned indexes {0, 1}, respectively, whereas f is skipped by the CFI pass because the type of f is never referenced by llvm.type.test() (since it is never explicitly called), but it is automatically assigned index 2 in s2wasm (since it has its address taken). However, this behavior is suboptimal and actually increases the attack surface by adding more indirect call targets, so I've modified this patch to skip index assignments for functions that are not used anywhere. With the new version of this patch, the indirect call table has the entries (0: g, 1: f), and fp2 is assigned index 0, which executes successfully. If NULL + 1 is assigned to fp2, then a CFI trap is triggered, as expected.

In general, the rationale for undefined behavior is discussed in the documentation; essentially undefined behavior in C or C++ (e.g. dereferencing a NULL pointer) is still a bug when compiling for WebAssembly, even when the corresponding behavior in WebAssembly itself is defined.

void f() {
}

int g() {
  return 2;
}

int main() __attribute__((optnone)) {
  int i = 0;
  void (*fp1)() = f;
  int (*fp2)() = i ? g : NULL;
  return fp2();
}
pcc added a comment.Jul 13 2016, 6:34 PM

This is because f() gets assigned to table index 0 by s2wasm, and the check fails. Note that this is true regardless of whether the CFI pass is enabled; there are no indirect function calls being executed, so it doesn't affect the IR at all.

So it sounds like maybe s2wasm and this pass should both be assigning indexes starting at 1? Although calls to null function pointers are undefined, null comparisons are well-defined in C++. Feel free to ignore this though with regard to this patch, as it seems like a pre-existing design issue.

lib/Transforms/IPO/LowerTypeTests.cpp
841

You may want to use Function::hasAddressTaken to exclude direct function calls here.

Please also document how this interacts with dynamic loading (i.e. not at all at the moment, right?)

ddcc updated this revision to Diff 64075.Jul 14 2016, 7:06 PM
ddcc updated this object.

Revise comments, use hasAddressTaken()

ddcc added a comment.Jul 20 2016, 1:35 PM

The issue with null pointer comparisons is definitely something that should be addressed in WebAssembly. I've filed a design bug for this at https://github.com/WebAssembly/spec/issues/312 .

dschuff accepted this revision.Aug 1 2016, 2:14 PM
dschuff edited edge metadata.

This looks ready to go to me, unless @pcc has remaining questions or issues?

This revision is now accepted and ready to land.Aug 1 2016, 2:14 PM
ddcc updated this revision to Diff 66391.Aug 1 2016, 3:13 PM
ddcc edited edge metadata.

Rebase

This revision was automatically updated to reflect the committed changes.