This is an archive of the discontinued LLVM Phabricator instance.

Change -fsanitize=function to place two words before the function entry
ClosedPublic

Authored by MaskRay on Apr 18 2023, 3:14 PM.

Details

Summary

The current implementation of -fsanitize=function places two words (the prolog
signature and the RTTI proxy) at the function entry, which makes the feature
incompatible with Intel Indirect Branch Tracking (IBT) that needs an ENDBR instruction
at the function entry. To allow the combination, move the two words before the
function entry, similar to -fsanitize=kcfi.

Armv8.5 Branch Target Identification (BTI) has a similar requirement.

Note: for IBT and BTI, whether a function gets a marker instruction at the entry
generally cannot be assumed (it can be disabled by a function attribute or
stronger LTO optimizations).

It is extremely unlikely for two words preceding a function entry to be
inaccessible. One way to achieve this is by ensuring that a function is
aligned at a page boundary and making the preceding page unmapped or
unreadable. This is not reasonable for application or library code.
(Think: the first text section has crt* code not instrumented by
-fsanitize=function.)

We use 0xc105cafe for all targets. .long 0xc105cafe disassembles to invalid
instructions on all architectures I have tested, except Power where it is
lfs 8, -13570(5) (Load Floating-Point with a weird offset, unlikely to be used in real code).


For the removed function in AsmPrinter.cpp, remove an assert: mdconst::extract
already asserts non-nullness.

For compiler-rt/test/ubsan/TestCases/TypeCheck/Function/function.cpp,
when the function doesn't have prolog/epilog (-O1 and above), after moving the two words,
the address of the function equals the address of ret instruction,
so symbolizing the function will additionally get a non-zero column number.
Adjust the test to allow an optional column number.

  .long   3238382334
  .long   .L__llvm_rtti_proxy-_Z1fv
_Z1fv:   // symbolizing here retrieves the line table entry from the second .loc
  .file   0 ...
  .loc    0 1 0
  .cfi_startproc
  .loc    0 2 1 prologue_end
  retq

Diff Detail

Event Timeline

MaskRay created this revision.Apr 18 2023, 3:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 18 2023, 3:14 PM
MaskRay requested review of this revision.Apr 18 2023, 3:14 PM
Herald added projects: Restricted Project, Restricted Project, Restricted Project. · View Herald TranscriptApr 18 2023, 3:14 PM
Herald added subscribers: llvm-commits, Restricted Project, cfe-commits. · View Herald Transcript
MaskRay edited the summary of this revision. (Show Details)Apr 18 2023, 3:14 PM
MaskRay updated this revision to Diff 514761.Apr 18 2023, 3:28 PM

fix clang/test/CodeGen

MaskRay edited the summary of this revision. (Show Details)Apr 19 2023, 3:41 PM
MaskRay edited the summary of this revision. (Show Details)Apr 19 2023, 3:47 PM
MaskRay updated this revision to Diff 515157.Apr 19 2023, 5:55 PM
MaskRay edited the summary of this revision. (Show Details)

remove an assert to make this feature available to all targets

MaskRay updated this revision to Diff 515166.Apr 19 2023, 6:03 PM

update test

My apologies for not responding. If I've got this right there are 4 related patches:
D148573 (approved)
D148785 Use type hashes rather than RTTI
D148827 support C
D148665 (this one)

My initial impressions is that this makes -fsanitize=function look more like -fsanitize=kcfi which makes it accessible from C and available to more targets. Is there anything that we lose in the process of making these changes? For example I would expect RTTI to have more information available than a type hash, although this might not make any functional difference.

I'll try and look over the next few days and leave some comments, apologies a bit busy at work at the moment so I can't promise anything speedy.

My apologies for not responding. If I've got this right there are 4 related patches:
D148573 (approved)
D148785 Use type hashes rather than RTTI
D148827 support C
D148665 (this one)

My initial impressions is that this makes -fsanitize=function look more like -fsanitize=kcfi which makes it accessible from C and available to more targets. Is there anything that we lose in the process of making these changes? For example I would expect RTTI to have more information available than a type hash, although this might not make any functional difference.

I'll try and look over the next few days and leave some comments, apologies a bit busy at work at the moment so I can't promise anything speedy.

Thanks! -fsanitize=function will indeed become more like -fsanitize=kcfi.

There is a big difference that -fsanitize=function instrumented code has a signature check for compatibility with object files not compiled with -fsanitize=function (and old implementations of -fsanitize=function with a difference location to place the signature).
-fsanitize=kcfi doesn't have the compatibility guarantee.

peter.smith accepted this revision.May 19 2023, 3:17 AM

Apologies for the delay LGTM. I think there is a case for setting up the signature to be target specific, but that could in theory be done on demand when a target adds a clashing instruction.

clang/lib/CodeGen/TargetInfo.h
205

Is it worth making this target specific? Defaulting to 0xc105cafe for all targets, if that gets allocated in the future on any one target we can change it without having to test against all other targets.

For example on AArch64 this is is in the decoding space of SME instructions op0 = 0b1 op1 = 0b0000. This may get allocated in the future. Although admittedly it is likely to be very rare to find a use of a SME instruction at the end of a function to cause it to get misidentified.

I suspect most targets have an explicit undefined instruction, such as UDF in AArch64 which is 0x0000???? where ? can be any value. On Arm there two separate 4-byte encodings for Arm, Thumb of UDF.

This revision is now accepted and ready to land.May 19 2023, 3:17 AM
MaskRay marked an inline comment as done.May 19 2023, 7:43 AM
MaskRay added inline comments.
clang/lib/CodeGen/TargetInfo.h
205

Thanks for the review! This is a virtual function, so a target can override as appropriate.

Thanks for informing that this is an encoding that AArch64 may allocate in the future. Since the signature is placed before the function label and is used to protect uninstrumented object files, the signature doesn't need to be out of all encoding space. As long as it is unlikely to be the last 2 instructions of a previous function, this is going to have a good defensive effect. So I expect that A32/A64 may not want to change this as well.

I assume that T32 unlikely uses 0xca 0xca (ldm r2, {r1-r7}) as one of the last two instructions of the previous function, so this seems fine as well, but no objection if T32 wants to change to a different signature :)

This revision was landed with ongoing or failed builds.May 19 2023, 7:50 AM
This revision was automatically updated to reflect the committed changes.
MaskRay marked an inline comment as done.