This is an archive of the discontinued LLVM Phabricator instance.

[SanitizerBounds] Add support for NoSanitizeBounds function
ClosedPublic

Authored by ztong0001 on Feb 15 2022, 1:14 AM.

Details

Summary

This fix is very similar to D102772 that adds the ability to selectively
disable sanitizer pass on certain functions.
Currently adding attribute no_sanitize(bounds) isn't really disabling
-fsanitize=local-bounds (also enabled in -fsanitize=bounds).
Clang handles fsanitize=array-bounds which is already working.
LLVM(BoundsChecking pass in middle-end) handles fsanitize=local-bounds.
In this patch, if we detect no_sanitize(bounds) provided an additional
function attribute (NoSanitizeBounds) is attached to IR to let LLVM know
we want to disable local_bounds checking for this function as well.
In order to support this feature, IR is modified (similar to D102772) to make
clang able to preserve the information and let BoundsChecking pass know bounds
checking is disabled for certain function.

Diff Detail

Event Timeline

ztong0001 created this revision.Feb 15 2022, 1:14 AM
ztong0001 requested review of this revision.Feb 15 2022, 1:14 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptFeb 15 2022, 1:14 AM
ztong0001 edited the summary of this revision. (Show Details)Feb 15 2022, 1:16 AM
ztong0001 edited the summary of this revision. (Show Details)
ztong0001 edited the summary of this revision. (Show Details)Feb 15 2022, 1:19 AM
nlopes added a comment.EditedFeb 18 2022, 2:47 AM

Well, this patch is just a band-aid and a disaster waiting to happen.
If kmalloc is tagged with an __attribute__ stating the allocation size, then you can't dereference beyond that limit or risk the alias analysis do something you don't want. You are triggering UB like that.
Can't you just remove the __attribute__ from kmalloc instead to avoid triggering UB?

What about this test then: https://github.com/llvm/llvm-project/blob/b0a0df980927ca54a7840a1b0c9766e98c05039b/clang/test/CodeGen/sanitize-coverage.c#L74

Can you show an independent C reproducer where no_sanitize does not work for you?

Is there an LKML discussion?

I also think sprinkling no_sanitize for UBSAN is again the wrong solution, as it was also for [1]. UBSAN has a tendency to generate too many false positives in the kernel, and we just have to work on finding solutions that tackle the problem wholesale rather than adding more band-aids.

In this case, I think the right solution is to simply make ksize() kill the optimizer's ability to know the object size. By definition ksize() will return the "true" allocation size, and it is fair to assume once that's called, the caller wants to use the full object size for its size-class.

[1] https://lore.kernel.org/all/20211111003519.1050494-1-tadeusz.struk@linaro.org/T/#u

Right, I was able to repro this. The problem is the trap, which generally sucks that no_sanitize still leaves in the trap.

We also have -fno-sanitize-undefined-trap-on-error, which seems to have no effect either (should it?).

So I think there are 2 problems:

  1. Clang still emitting traps even though it shouldn't.
  1. The Linux kernel problem.

I think it's fine if you address problem 1 with this, as it's an oversight. But I think problem 2 wants to be solved differently as I suggested.

Well, this patch is just a band-aid and a disaster waiting to happen.
If kmalloc is tagged with an __attribute__ stating the allocation size, then you can't dereference beyond that limit or risk the alias analysis do something you don't want. You are triggering UB like that.
Can't you just remove the __attribute__ from kmalloc instead to avoid triggering UB?

I am not sure I fully get what you are saying.
What I am suggesting is something like below to avoid bounds check pass from running on this function.

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9d0388bed0c1..186fca35266d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1679,7 +1679,7 @@ EXPORT_SYMBOL(__pskb_copy_fclone);
  *     All the pointers pointing into skb header may change and must be
  *     reloaded after call to this function.
  */
-
+__attribute__((no_sanitize("bounds")))
 int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
                     gfp_t gfp_mask)
 {

Well, this patch is just a band-aid and a disaster waiting to happen.
If kmalloc is tagged with an __attribute__ stating the allocation size, then you can't dereference beyond that limit or risk the alias analysis do something you don't want. You are triggering UB like that.
Can't you just remove the __attribute__ from kmalloc instead to avoid triggering UB?

I am not sure I fully get what you are saying.
What I am suggesting is something like below to avoid bounds check pass from running on this function.

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9d0388bed0c1..186fca35266d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1679,7 +1679,7 @@ EXPORT_SYMBOL(__pskb_copy_fclone);
  *     All the pointers pointing into skb header may change and must be
  *     reloaded after call to this function.
  */
-
+__attribute__((no_sanitize("bounds")))
 int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
                     gfp_t gfp_mask)
 {

The main issue is that the kernel is wrong. It has a bug. The sanitizer's error is not a false-positive!
So what you are proposing is a band-aid. It's not a real solution and it's just masking a wider problem. LLVM knows that kmalloc(x) allocates x bytes because someone placed an __attribute__ ((alloc_size (1))) on kmalloc. That attribute is just wrong and should be removed. It allows LLVM to mark all accesses beyond kmalloc(x) + x - 1 as undefined behavior.

TL;DR: this patch is not the solution for your problems.

melver added a subscriber: kees.Feb 18 2022, 10:14 AM

Right, I was able to repro this. The problem is the trap, which generally sucks that no_sanitize still leaves in the trap.

We also have -fno-sanitize-undefined-trap-on-error, which seems to have no effect either (should it?).

So I think there are 2 problems:

  1. Clang still emitting traps even though it shouldn't.
  1. The Linux kernel problem.

I think it's fine if you address problem 1 with this, as it's an oversight. But I think problem 2 wants to be solved differently as I suggested.

I haven't tried -fno-sanitize-undefined-trap-on-error yet.

IMO trap in kernel gives a generic crash message which is... hard to tell from other cases without further investigating. If I enable KASAN kernel will print out something like

`
[ 1.197953] BUG: KASAN: use-after-free in __pci_enable_msi_range+0x234/0x320
[ 1.198327] Freed by task 1:
[ 1.198327] kfree+0x8f/0x2b0
[ 1.198327] msi_free_msi_descs_range+0xf5/0x130
`

I agree with you that there are two problems.
I think it makes sense to let optimizer aware of ksize() if the kernel API won't change dramatically in the future.

The main issue is that the kernel is wrong. It has a bug. The sanitizer's error is not a false-positive!
So what you are proposing is a band-aid. It's not a real solution and it's just masking a wider problem. LLVM knows that kmalloc(x) allocates x bytes because someone placed an __attribute__ ((alloc_size (1))) on kmalloc. That attribute is just wrong and should be removed. It allows LLVM to mark all accesses beyond kmalloc(x) + x - 1 as undefined behavior.

But isn't this something the author intended to do in order to catch an error?
ksize() case makes some exceptions out of this.

TL;DR: this patch is not the solution for your problems.

ztong0001 edited the summary of this revision. (Show Details)Feb 18 2022, 12:47 PM

Thank you all!
I have modified the summary, this patch focus on fixing non-working __attribute__((no_sanitize("bounds"))) attribute.
I will try to fix kernel ksize() related issue and -fno-sanitize-undefined-trap-on-error on separate patches.

Adding a new IR attribute comes with a whole slew of other required changes. Please see https://reviews.llvm.org/D102772 for an example.

In addition, please update the patch description to explain what the problem is exactly (remove the old kernel-specific problem, because it's only a distraction). In particular, what current no_sanitize does (because it does something for "bounds checking), and what is missing. Introducing a new IR attribute needs to be properly justified, so if you can explain why this can't be solved another way would also be useful (something like .. the local-bounds code generation happens in the middleend and not frontend unlike some other UBSAN checks).

Also, you need to add some IR tests.

Thank you @melver. I will revise patch as suggested.

ztong0001 updated this revision to Diff 411040.Feb 24 2022, 1:27 AM
ztong0001 edited the summary of this revision. (Show Details)

update patch and description as suggested by Marco

ztong0001 retitled this revision from Fix not working attribute no_sanitize bounds that affects linux kernel to [SanitizerBounds] Add support for NoSanitizeBounds function.Feb 24 2022, 1:28 AM
melver requested changes to this revision.Feb 24 2022, 9:01 AM

Thanks - this looks good so far.

clang/test/CodeGen/sanitize-coverage.c
56

This test by itself is not sufficient, it's technically only for fsanitize-coverage.

2 places with more tests would be good:

  1. extend clang/test/CodeGen/bounds-checking.c
  2. add a pure IR test in llvm/test/Instrumentation/BoundsChecking

With that, it'll also be very clear what was broken before, and what is being fixed now.

This revision now requires changes to proceed.Feb 24 2022, 9:01 AM

Thank you, Marco! I have made the following changes:

  • extend clang/test/CodeGen/bounds-checking.c to include additional test for newly added nosanitize_bounds attribute
  • added a new pure IR test in llvm/test/Instrumentation/BoundsChecking/nosanitize-bounds.ll to test if the new attribute will prevent BoundsChecking pass running on the function
melver requested changes to this revision.Feb 25 2022, 4:39 AM

Looks good. Few minor changes.

I did some more digging, and it's only fsanitize=local-bounds, so please verify this and also update the commit description. In fact, the Linux kernel already has a comment about Clang's weirdness of local-bounds vs. array-bounds here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/Kconfig.ubsan#n62

This revision now requires changes to proceed.Feb 25 2022, 4:39 AM
melver added inline comments.Feb 25 2022, 4:42 AM
clang/lib/CodeGen/CodeGenFunction.cpp
757

These 2 checks can be reduced to 1 due to SanitizerKind::Bounds including both. However, as noted, this only affects local-bounds, so I'm not sure if we want to include both -- perhaps for completeness it makes sense, but in the array-bounds only case this attribute will be a nop (AFAIK).

Also, I think we don't want to attach the attribute if bounds checking isn't enabled -- at least it seems unnecessary to do so.

See the following suggested change:

diff --git a/clang/lib/CodeGen/CodeGenFunction.cpp b/clang/lib/CodeGen/CodeGenFunction.cpp
index 842ce0245d45..c1f3a3014a19 100644
--- a/clang/lib/CodeGen/CodeGenFunction.cpp
+++ b/clang/lib/CodeGen/CodeGenFunction.cpp
@@ -740,6 +740,7 @@ void CodeGenFunction::StartFunction(GlobalDecl GD, QualType RetTy,
   } while (false);
 
   if (D) {
+    const bool SanitizeBounds = SanOpts.hasOneOf(SanitizerKind::Bounds);
     bool NoSanitizeCoverage = false;
 
     for (auto Attr : D->specific_attrs<NoSanitizeAttr>()) {
@@ -754,16 +755,15 @@ void CodeGenFunction::StartFunction(GlobalDecl GD, QualType RetTy,
         SanOpts.set(SanitizerKind::KernelHWAddress, false);
       if (mask & SanitizerKind::KernelHWAddress)
         SanOpts.set(SanitizerKind::HWAddress, false);
-      if (mask & SanitizerKind::LocalBounds)
-        Fn->addFnAttr(llvm::Attribute::NoSanitizeBounds);
-      if (mask & SanitizerKind::ArrayBounds)
-        Fn->addFnAttr(llvm::Attribute::NoSanitizeBounds);
 
       // SanitizeCoverage is not handled by SanOpts.
       if (Attr->hasCoverage())
         NoSanitizeCoverage = true;
     }
 
+    if (SanitizeBounds && !SanOpts.hasOneOf(SanitizerKind::Bounds))
+      Fn->addFnAttr(llvm::Attribute::NoSanitizeBounds);
+
     if (NoSanitizeCoverage && CGM.getCodeGenOpts().hasSanitizeCoverage())
       Fn->addFnAttr(llvm::Attribute::NoSanitizeCoverage);
   }

Looks good. Few minor changes.

I did some more digging, and it's only fsanitize=local-bounds, so please verify this and also update the commit description. In fact, the Linux kernel already has a comment about Clang's weirdness of local-bounds vs. array-bounds here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/Kconfig.ubsan#n62

Yes I can confirm this. It is indeed fsanitize=local-bounds causing the kernel crash we discussed earlier. (UBSAN_LOCAL_BOUNDS=y)

ztong0001 added inline comments.Feb 25 2022, 10:11 AM
clang/lib/CodeGen/CodeGenFunction.cpp
757

Agreed. I will revise patch and commit description.

ztong0001 added inline comments.Feb 25 2022, 10:30 AM
clang/lib/CodeGen/CodeGenFunction.cpp
757

BTW: Just to double check we are on the same page

when no_sanitize(bounds) is provided, fsanitize=array-bounds is also disabled -- right ? I can confirm this attribute is also affecting array-bounds as this is handled in clang side and we are setting llvm::Attribute::NoSanitizeBounds so that clang's array-bounds can also see this.

I will also add another test for -fsanitize=array-bounds

  • update commit description
  • In: CodeGenFunction::StartFunction(), merge two checks(SanitizerKind::LocalBounds, SanitizerKind::ArrayBounds) into one(SanitizerKind::Bounds)
  • update test: clang/test/CodeGen/bounds-checking.c to show no_sanitize("bounds") can also affect fsanitize=array-bounds (clang)
ztong0001 edited the summary of this revision. (Show Details)Feb 25 2022, 10:51 AM
melver added inline comments.Feb 25 2022, 12:15 PM
clang/lib/CodeGen/CodeGenFunction.cpp
757

Well, no_sanitize("bounds") already worked for array-bounds before this patch. But adding more tests never hurt.

ztong0001 edited the summary of this revision. (Show Details)Feb 25 2022, 12:25 PM
ztong0001 added inline comments.
clang/lib/CodeGen/CodeGenFunction.cpp
757

Got it. Thanks!

kees added a comment.Feb 25 2022, 2:11 PM

FWIW, related problems with pskb_expand_head were seen again here:
https://github.com/ClangBuiltLinux/linux/issues/1599

I have trouble reproducing it, but I think the kernel patch there solves the problem created by __alloc_size vs ksize().

FWIW, related problems with pskb_expand_head were seen again here:
https://github.com/ClangBuiltLinux/linux/issues/1599

I have trouble reproducing it, but I think the kernel patch there solves the problem created by __alloc_size vs ksize().

yes, the asm ("" : "=r" (p) : "0" (p)); really did the trick

ztong0001 marked 2 inline comments as done.Feb 25 2022, 3:58 PM
melver accepted this revision.Feb 28 2022, 8:54 AM

LGTM

Let me know if you need help landing this.

This revision is now accepted and ready to land.Feb 28 2022, 8:54 AM
ztong0001 added a comment.EditedFeb 28 2022, 9:23 AM

Hi Marco,
@melver, Could you please help me landing it? I don't have write permission to the repo.
Please use Tong Zhang<ztong0001@gmail.com>
Thanks,

  • Tong

Hi Marco,
@melver, Could you please help me landing it? I don't have write permission to the repo.
Please use Tong Zhang<ztong0001@gmail.com>
Thanks,

Tong

melver added a comment.Mar 1 2022, 9:39 AM

Hi Marco,
@melver, Could you please help me landing it? I don't have write permission to the repo.
Please use Tong Zhang<ztong0001@gmail.com>

Sure. I had already applied the patch locally to test, but then got distracted. Doing it now.

Thanks for your work.

This revision was landed with ongoing or failed builds.Mar 1 2022, 9:48 AM
This revision was automatically updated to reflect the committed changes.