Page MenuHomePhabricator

[ubsan] Using metadata instead of prologue data for function sanitizer
Needs ReviewPublic

Authored by ychen on Dec 15 2021, 8:32 PM.

Details

Summary

Information in the function Prologue Data is intentionally opaque. This
is fine from the function sanitizer's perspective if function manipulations
(duplication etc.) do not change the function signature. However,
coroutine lowering needs to split one function into several functions
that have different signatures. The ideal way to solve this is to make
the sanitizer information not opaque so that LLVM passes know how to deal with
it. This patch detaches the information from function Prologue Data
and attaches it to a function metadata node.

Alternatively,
(1) function attributes do not work because it could not carry
GlobalValue.
(2) I've considered adding one additional function optional
data(D13829) for the function sanitizer. However, it requires LL/parser
changes that I want to avoid.

Diff Detail

Unit TestsFailed

TimeTest
60,160 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vloxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -triple riscv64 -target-feature +f -target-feature +d -target-feature +zfh -target-feature +v -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vloxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vloxseg.c
60,090 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vlseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -triple riscv32 -target-feature +f -target-feature +d -target-feature +v -target-feature +zfh -disable-O0-optnone -fallow-half-arguments-and-returns -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vlseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV32 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vlseg.c
60,080 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vlsegff.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -triple riscv32 -target-feature +f -target-feature +d -target-feature +v -target-feature +zfh -disable-O0-optnone -fallow-half-arguments-and-returns -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vlsegff.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV32 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vlsegff.c
60,150 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vluxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -triple riscv64 -target-feature +f -target-feature +d -target-feature +zfh -target-feature +v -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vluxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vluxseg.c
60,080 msx64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vsoxseg.c
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/clang -cc1 -internal-isystem /var/lib/buildkite-agent/builds/llvm-project/build/lib/clang/15.0.0/include -nostdsysteminc -triple riscv64 -target-feature +f -target-feature +d -target-feature +zfh -target-feature +v -disable-O0-optnone -emit-llvm /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vsoxseg.c -o - | /var/lib/buildkite-agent/builds/llvm-project/build/bin/opt -S -mem2reg | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=CHECK-RV64 /var/lib/buildkite-agent/builds/llvm-project/clang/test/CodeGen/RISCV/rvv-intrinsics/vsoxseg.c
View Full Test Results (13 Failed)

Event Timeline

ychen created this revision.Dec 15 2021, 8:32 PM
ychen requested review of this revision.Dec 15 2021, 8:32 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptDec 15 2021, 8:32 PM
ychen added a comment.Jan 18 2022, 7:07 PM

Hi @pcc, does this make sense to you? Thanks.

I verified this patch together with https://reviews.llvm.org/D116130 fix the issue https://github.com/llvm/llvm-project/issues/49689. I'm not an expert in this area. Could someone review this patch so we can move forward with the fix?

ychen added a comment.Feb 2 2022, 2:49 PM

gentle ping..

pcc added a comment.Feb 14 2022, 3:22 PM

On the bug you have:

define internal fastcc void @​_Z4callIiE4taskv.resume(%_Z4callIiE4taskv.Frame* noalias nonnull align 8 dereferenceable(24
    ) %FramePtr) #&#8203;1 prologue <{ i32, i32 }> <{ i32 846595819, i32 trunc (i64 sub (i64 ptrtoint (i8** @&#8203;1 to i64), i64 ptrtoint (void ()* @&#8203;_Z4callIiE4taskv to i64)) to i32) }> {...}

Is it possible for the C/C++ code to take the address of the function _Z4callIiE4taskv.resume and call it indirectly? If not, it seems like the right fix would be to arrange for the prologue data to be dropped on the .resume function instead of duplicating it there. I would also imagine that whatever signature you have on the .resume function would be incorrect since it appears that the coro splitting pass will use a different function signature for that function.

Note that D119296 will have the same problem.

ychen added a comment.Feb 14 2022, 3:37 PM

On the bug you have:

define internal fastcc void @&#8203;_Z4callIiE4taskv.resume(%_Z4callIiE4taskv.Frame* noalias nonnull align 8 dereferenceable(24
    ) %FramePtr) #&#8203;1 prologue <{ i32, i32 }> <{ i32 846595819, i32 trunc (i64 sub (i64 ptrtoint (i8** @&#8203;1 to i64), i64 ptrtoint (void ()* @&#8203;_Z4callIiE4taskv to i64)) to i32) }> {...}

Is it possible for the C/C++ code to take the address of the function _Z4callIiE4taskv.resume and call it indirectly?

*.resume is a compiler inserted function that is opaque to the programmer. It is called indirectly most of the time if not all the time.

If not, it seems like the right fix would be to arrange for the prologue data to be dropped on the .resume function instead of duplicating it there. I would also imagine that whatever signature you have on the .resume function would be incorrect since it appears that the coro splitting pass will use a different function signature for that function.

That is addressed by D116130. @rjmccall suggested the direction of this patch (which I agreed) https://reviews.llvm.org/D114728#3159303.

Note that D119296 will have the same problem.

Thanks for the info!

pcc added a comment.Feb 14 2022, 4:26 PM

On the bug you have:

define internal fastcc void @&#8203;_Z4callIiE4taskv.resume(%_Z4callIiE4taskv.Frame* noalias nonnull align 8 dereferenceable(24
    ) %FramePtr) #&#8203;1 prologue <{ i32, i32 }> <{ i32 846595819, i32 trunc (i64 sub (i64 ptrtoint (i8** @&#8203;1 to i64), i64 ptrtoint (void ()* @&#8203;_Z4callIiE4taskv to i64)) to i32) }> {...}

Is it possible for the C/C++ code to take the address of the function _Z4callIiE4taskv.resume and call it indirectly?

*.resume is a compiler inserted function that is opaque to the programmer. It is called indirectly most of the time if not all the time.

Yes, but not indirectly called from C/C++ but rather from compiler-generated code right? That's presumably why this patch + D116130 worked for @zhuhan0.

If not, it seems like the right fix would be to arrange for the prologue data to be dropped on the .resume function instead of duplicating it there. I would also imagine that whatever signature you have on the .resume function would be incorrect since it appears that the coro splitting pass will use a different function signature for that function.

That is addressed by D116130. @rjmccall suggested the direction of this patch (which I agreed) https://reviews.llvm.org/D114728#3159303.

Okay. It seems unfortunate to have to special-case this just because it uses relative relocations. But that's probably the best that we can do without changing the global variable initializer representation (as well as prefix/prologue data) to be blob plus relocations.

clang/lib/CodeGen/CGExpr.cpp
5171 ↗(On Diff #394735)

What happens when building with other code models? Hopefully we get an error of some sort before hitting this assertion failure, right?

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
848

What if we have both prologue data and this metadata? Should it be an error?

ychen added inline comments.Feb 15 2022, 1:08 AM
clang/lib/CodeGen/CGExpr.cpp
5171 ↗(On Diff #394735)

What happens when building with other code models?

The PCRel offset is 32bit currently regardless of the code model. The proxy variable might be out of 32-bit offset from the function (for the large model; for the medium model, only if the proxy variable is in .ldata,.lrodata which is further away), then loading some random data may give false positives.

We could fix this by making the PCRel offset 64 bit for the medium or the large model. But since no one complains all these years, I guess it is a real edge use case.

Hopefully we get an error of some sort before hitting this assertion failure, right?

We didn't. I'll add it.

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
848

It may or may not be depending on what is in the prologue data (if it just adds up a counter in the prologue, it should be fine?). How about clarifying this in Langref for this new MD_func_sanitize? If being conservative, we could error this out in the Verifier. WDYT?

ychen updated this revision to Diff 408752.Feb 15 2022, 1:10 AM
  • check code model in driver
  • document func_sanitizer in Langref
  • add a codegen test
ychen edited the summary of this revision. (Show Details)Feb 15 2022, 1:10 AM

gentle ping..

Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2022, 12:27 PM
ychen added a comment.Wed, Apr 27, 8:20 PM

gentle ping. @pcc could we please move this forward if the direction looks good to you? It has been sitting for a while...

pcc added inline comments.Thu, May 12, 11:04 AM
clang/lib/CodeGen/CodeGenModule.cpp
1843

Are these proxy variables necessary? I think that now that we have custom code generation for this you should be able to use a GOTPCREL relocation to refer to the global.

ychen added inline comments.Thu, May 12, 11:29 AM
clang/lib/CodeGen/CodeGenModule.cpp
1843

I attempted the GOTPCREL approach in a local branch. It didn't work for a reason that I couldn't remember off the top of my head. I'll find out.

I think that now that we have custom code generation for this

Sorry, I don't quite follow which custom code generation you are referring to. Do you mean the changes in AsmPrinter.cpp?

pcc added inline comments.Thu, May 12, 11:41 AM
clang/lib/CodeGen/CodeGenModule.cpp
1843

Sorry, I don't quite follow which custom code generation you are referring to. Do you mean the changes in AsmPrinter.cpp?

Yes, that's what I meant.