This is an archive of the discontinued LLVM Phabricator instance.

[ubsan] Using metadata instead of prologue data for function sanitizer
ClosedPublic

Authored by ychen on Dec 15 2021, 8:32 PM.

Details

Summary

Information in the function Prologue Data is intentionally opaque.
When a function with Prologue Data is duplicated. The self (global
value) references inside Prologue Data is still pointing to the
original function. This may cause errors like fatal error: error in backend: Cannot represent a difference across sections.

This patch detaches the information from function Prologue Data
and attaches it to a function metadata node.

This and D116130 fix https://github.com/llvm/llvm-project/issues/49689.

Diff Detail

Event Timeline

ychen created this revision.Dec 15 2021, 8:32 PM
ychen requested review of this revision.Dec 15 2021, 8:32 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptDec 15 2021, 8:32 PM
ychen added a comment.Jan 18 2022, 7:07 PM

Hi @pcc, does this make sense to you? Thanks.

I verified this patch together with https://reviews.llvm.org/D116130 fix the issue https://github.com/llvm/llvm-project/issues/49689. I'm not an expert in this area. Could someone review this patch so we can move forward with the fix?

ychen added a comment.Feb 2 2022, 2:49 PM

gentle ping..

pcc added a comment.Feb 14 2022, 3:22 PM

On the bug you have:

define internal fastcc void @​_Z4callIiE4taskv.resume(%_Z4callIiE4taskv.Frame* noalias nonnull align 8 dereferenceable(24
    ) %FramePtr) #&#8203;1 prologue <{ i32, i32 }> <{ i32 846595819, i32 trunc (i64 sub (i64 ptrtoint (i8** @&#8203;1 to i64), i64 ptrtoint (void ()* @&#8203;_Z4callIiE4taskv to i64)) to i32) }> {...}

Is it possible for the C/C++ code to take the address of the function _Z4callIiE4taskv.resume and call it indirectly? If not, it seems like the right fix would be to arrange for the prologue data to be dropped on the .resume function instead of duplicating it there. I would also imagine that whatever signature you have on the .resume function would be incorrect since it appears that the coro splitting pass will use a different function signature for that function.

Note that D119296 will have the same problem.

ychen added a comment.Feb 14 2022, 3:37 PM

On the bug you have:

define internal fastcc void @&#8203;_Z4callIiE4taskv.resume(%_Z4callIiE4taskv.Frame* noalias nonnull align 8 dereferenceable(24
    ) %FramePtr) #&#8203;1 prologue <{ i32, i32 }> <{ i32 846595819, i32 trunc (i64 sub (i64 ptrtoint (i8** @&#8203;1 to i64), i64 ptrtoint (void ()* @&#8203;_Z4callIiE4taskv to i64)) to i32) }> {...}

Is it possible for the C/C++ code to take the address of the function _Z4callIiE4taskv.resume and call it indirectly?

*.resume is a compiler inserted function that is opaque to the programmer. It is called indirectly most of the time if not all the time.

If not, it seems like the right fix would be to arrange for the prologue data to be dropped on the .resume function instead of duplicating it there. I would also imagine that whatever signature you have on the .resume function would be incorrect since it appears that the coro splitting pass will use a different function signature for that function.

That is addressed by D116130. @rjmccall suggested the direction of this patch (which I agreed) https://reviews.llvm.org/D114728#3159303.

Note that D119296 will have the same problem.

Thanks for the info!

pcc added a comment.Feb 14 2022, 4:26 PM

On the bug you have:

define internal fastcc void @&#8203;_Z4callIiE4taskv.resume(%_Z4callIiE4taskv.Frame* noalias nonnull align 8 dereferenceable(24
    ) %FramePtr) #&#8203;1 prologue <{ i32, i32 }> <{ i32 846595819, i32 trunc (i64 sub (i64 ptrtoint (i8** @&#8203;1 to i64), i64 ptrtoint (void ()* @&#8203;_Z4callIiE4taskv to i64)) to i32) }> {...}

Is it possible for the C/C++ code to take the address of the function _Z4callIiE4taskv.resume and call it indirectly?

*.resume is a compiler inserted function that is opaque to the programmer. It is called indirectly most of the time if not all the time.

Yes, but not indirectly called from C/C++ but rather from compiler-generated code right? That's presumably why this patch + D116130 worked for @zhuhan0.

If not, it seems like the right fix would be to arrange for the prologue data to be dropped on the .resume function instead of duplicating it there. I would also imagine that whatever signature you have on the .resume function would be incorrect since it appears that the coro splitting pass will use a different function signature for that function.

That is addressed by D116130. @rjmccall suggested the direction of this patch (which I agreed) https://reviews.llvm.org/D114728#3159303.

Okay. It seems unfortunate to have to special-case this just because it uses relative relocations. But that's probably the best that we can do without changing the global variable initializer representation (as well as prefix/prologue data) to be blob plus relocations.

clang/lib/CodeGen/CGExpr.cpp
5171 ↗(On Diff #394735)

What happens when building with other code models? Hopefully we get an error of some sort before hitting this assertion failure, right?

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1008

What if we have both prologue data and this metadata? Should it be an error?

ychen added inline comments.Feb 15 2022, 1:08 AM
clang/lib/CodeGen/CGExpr.cpp
5171 ↗(On Diff #394735)

What happens when building with other code models?

The PCRel offset is 32bit currently regardless of the code model. The proxy variable might be out of 32-bit offset from the function (for the large model; for the medium model, only if the proxy variable is in .ldata,.lrodata which is further away), then loading some random data may give false positives.

We could fix this by making the PCRel offset 64 bit for the medium or the large model. But since no one complains all these years, I guess it is a real edge use case.

Hopefully we get an error of some sort before hitting this assertion failure, right?

We didn't. I'll add it.

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1008

It may or may not be depending on what is in the prologue data (if it just adds up a counter in the prologue, it should be fine?). How about clarifying this in Langref for this new MD_func_sanitize? If being conservative, we could error this out in the Verifier. WDYT?

ychen updated this revision to Diff 408752.Feb 15 2022, 1:10 AM
  • check code model in driver
  • document func_sanitizer in Langref
  • add a codegen test
ychen edited the summary of this revision. (Show Details)Feb 15 2022, 1:10 AM

gentle ping..

Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2022, 12:27 PM
ychen added a comment.Apr 27 2022, 8:20 PM

gentle ping. @pcc could we please move this forward if the direction looks good to you? It has been sitting for a while...

pcc added inline comments.May 12 2022, 11:04 AM
clang/lib/CodeGen/CodeGenModule.cpp
1894

Are these proxy variables necessary? I think that now that we have custom code generation for this you should be able to use a GOTPCREL relocation to refer to the global.

ychen added inline comments.May 12 2022, 11:29 AM
clang/lib/CodeGen/CodeGenModule.cpp
1894

I attempted the GOTPCREL approach in a local branch. It didn't work for a reason that I couldn't remember off the top of my head. I'll find out.

I think that now that we have custom code generation for this

Sorry, I don't quite follow which custom code generation you are referring to. Do you mean the changes in AsmPrinter.cpp?

pcc added inline comments.May 12 2022, 11:41 AM
clang/lib/CodeGen/CodeGenModule.cpp
1894

Sorry, I don't quite follow which custom code generation you are referring to. Do you mean the changes in AsmPrinter.cpp?

Yes, that's what I meant.

ychen added inline comments.Jun 3 2022, 10:20 AM
clang/lib/CodeGen/CodeGenModule.cpp
1894

@pcc, I looked into my local branch that uses PCREL approach. It is not simpler/cleaner than the current approach due to the X86/X86-64, macho/ELF difference. These need their specific relocation types.

ychen added a comment.Jun 13 2022, 9:56 AM

gentle ping..

pcc accepted this revision.Jun 23 2022, 7:15 PM

LGTM

Yes, but not indirectly called from C/C++ but rather from compiler-generated code right? That's presumably why this patch + D116130 worked for @zhuhan0.

If the answer to my question is "yes" can you please update the commit message to not talk about changing the function signature since that's not the problem here.

clang/lib/CodeGen/CodeGenModule.cpp
1894

Okay, let's go with this then.

llvm/docs/LangRef.rst
5269

We should add !func_sanitize here as well.

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1008

Sounds good.

This revision is now accepted and ready to land.Jun 23 2022, 7:15 PM
ychen edited the summary of this revision. (Show Details)Jun 27 2022, 11:24 AM
ychen updated this revision to Diff 440329.Jun 27 2022, 11:26 AM
ychen marked an inline comment as done.

Update LangRef.

This revision was landed with ongoing or failed builds.Jun 27 2022, 12:12 PM
This revision was automatically updated to reflect the committed changes.

@ychen This patch causes 20% .o size increase (x86_64) Is this expected?

ychen added a comment.Jul 1 2022, 12:30 PM

@ychen This patch causes 20% .o size increase (x86_64) Is this expected?

No, it is not expected. Do you have a test case? I'll take a look.

@ychen This patch causes 20% .o size increase (x86_64) Is this expected?

No, it is not expected. Do you have a test case? I'll take a look.

Sure, I'll create reproducer.