This fix is very similar to D102772 that adds the ability to selectively
disable sanitizer pass on certain functions.
Currently adding attribute no_sanitize(bounds) isn't really disabling
-fsanitize=local-bounds (also enabled in -fsanitize=bounds).
Clang handles fsanitize=array-bounds which is already working.
LLVM(BoundsChecking pass in middle-end) handles fsanitize=local-bounds.
In this patch, if we detect no_sanitize(bounds) provided an additional
function attribute (NoSanitizeBounds) is attached to IR to let LLVM know
we want to disable local_bounds checking for this function as well.
In order to support this feature, IR is modified (similar to D102772) to make
clang able to preserve the information and let BoundsChecking pass know bounds
checking is disabled for certain function.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Well, this patch is just a band-aid and a disaster waiting to happen.
If kmalloc is tagged with an __attribute__ stating the allocation size, then you can't dereference beyond that limit or risk the alias analysis do something you don't want. You are triggering UB like that.
Can't you just remove the __attribute__ from kmalloc instead to avoid triggering UB?
What about this test then: https://github.com/llvm/llvm-project/blob/b0a0df980927ca54a7840a1b0c9766e98c05039b/clang/test/CodeGen/sanitize-coverage.c#L74
Can you show an independent C reproducer where no_sanitize does not work for you?
Is there an LKML discussion?
I also think sprinkling no_sanitize for UBSAN is again the wrong solution, as it was also for [1]. UBSAN has a tendency to generate too many false positives in the kernel, and we just have to work on finding solutions that tackle the problem wholesale rather than adding more band-aids.
In this case, I think the right solution is to simply make ksize() kill the optimizer's ability to know the object size. By definition ksize() will return the "true" allocation size, and it is fair to assume once that's called, the caller wants to use the full object size for its size-class.
[1] https://lore.kernel.org/all/20211111003519.1050494-1-tadeusz.struk@linaro.org/T/#u
Right, I was able to repro this. The problem is the trap, which generally sucks that no_sanitize still leaves in the trap.
We also have -fno-sanitize-undefined-trap-on-error, which seems to have no effect either (should it?).
So I think there are 2 problems:
- Clang still emitting traps even though it shouldn't.
- The Linux kernel problem.
I think it's fine if you address problem 1 with this, as it's an oversight. But I think problem 2 wants to be solved differently as I suggested.
I am not sure I fully get what you are saying.
What I am suggesting is something like below to avoid bounds check pass from running on this function.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 9d0388bed0c1..186fca35266d 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1679,7 +1679,7 @@ EXPORT_SYMBOL(__pskb_copy_fclone); * All the pointers pointing into skb header may change and must be * reloaded after call to this function. */ - +__attribute__((no_sanitize("bounds"))) int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, gfp_t gfp_mask) {
The main issue is that the kernel is wrong. It has a bug. The sanitizer's error is not a false-positive!
So what you are proposing is a band-aid. It's not a real solution and it's just masking a wider problem. LLVM knows that kmalloc(x) allocates x bytes because someone placed an __attribute__ ((alloc_size (1))) on kmalloc. That attribute is just wrong and should be removed. It allows LLVM to mark all accesses beyond kmalloc(x) + x - 1 as undefined behavior.
TL;DR: this patch is not the solution for your problems.
I haven't tried -fno-sanitize-undefined-trap-on-error yet.
IMO trap in kernel gives a generic crash message which is... hard to tell from other cases without further investigating. If I enable KASAN kernel will print out something like
`
[ 1.197953] BUG: KASAN: use-after-free in __pci_enable_msi_range+0x234/0x320
[ 1.198327] Freed by task 1:
[ 1.198327] kfree+0x8f/0x2b0
[ 1.198327] msi_free_msi_descs_range+0xf5/0x130
`
I agree with you that there are two problems.
I think it makes sense to let optimizer aware of ksize() if the kernel API won't change dramatically in the future.
But isn't this something the author intended to do in order to catch an error?
ksize() case makes some exceptions out of this.
TL;DR: this patch is not the solution for your problems.
Thank you all!
I have modified the summary, this patch focus on fixing non-working __attribute__((no_sanitize("bounds"))) attribute.
I will try to fix kernel ksize() related issue and -fno-sanitize-undefined-trap-on-error on separate patches.
Adding a new IR attribute comes with a whole slew of other required changes. Please see https://reviews.llvm.org/D102772 for an example.
In addition, please update the patch description to explain what the problem is exactly (remove the old kernel-specific problem, because it's only a distraction). In particular, what current no_sanitize does (because it does something for "bounds checking), and what is missing. Introducing a new IR attribute needs to be properly justified, so if you can explain why this can't be solved another way would also be useful (something like .. the local-bounds code generation happens in the middleend and not frontend unlike some other UBSAN checks).
Also, you need to add some IR tests.
Thanks - this looks good so far.
clang/test/CodeGen/sanitize-coverage.c | ||
---|---|---|
56 | This test by itself is not sufficient, it's technically only for fsanitize-coverage. 2 places with more tests would be good:
With that, it'll also be very clear what was broken before, and what is being fixed now. |
Thank you, Marco! I have made the following changes:
- extend clang/test/CodeGen/bounds-checking.c to include additional test for newly added nosanitize_bounds attribute
- added a new pure IR test in llvm/test/Instrumentation/BoundsChecking/nosanitize-bounds.ll to test if the new attribute will prevent BoundsChecking pass running on the function
Looks good. Few minor changes.
I did some more digging, and it's only fsanitize=local-bounds, so please verify this and also update the commit description. In fact, the Linux kernel already has a comment about Clang's weirdness of local-bounds vs. array-bounds here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/Kconfig.ubsan#n62
clang/lib/CodeGen/CodeGenFunction.cpp | ||
---|---|---|
757 | These 2 checks can be reduced to 1 due to SanitizerKind::Bounds including both. However, as noted, this only affects local-bounds, so I'm not sure if we want to include both -- perhaps for completeness it makes sense, but in the array-bounds only case this attribute will be a nop (AFAIK). Also, I think we don't want to attach the attribute if bounds checking isn't enabled -- at least it seems unnecessary to do so. See the following suggested change: diff --git a/clang/lib/CodeGen/CodeGenFunction.cpp b/clang/lib/CodeGen/CodeGenFunction.cpp index 842ce0245d45..c1f3a3014a19 100644 --- a/clang/lib/CodeGen/CodeGenFunction.cpp +++ b/clang/lib/CodeGen/CodeGenFunction.cpp @@ -740,6 +740,7 @@ void CodeGenFunction::StartFunction(GlobalDecl GD, QualType RetTy, } while (false); if (D) { + const bool SanitizeBounds = SanOpts.hasOneOf(SanitizerKind::Bounds); bool NoSanitizeCoverage = false; for (auto Attr : D->specific_attrs<NoSanitizeAttr>()) { @@ -754,16 +755,15 @@ void CodeGenFunction::StartFunction(GlobalDecl GD, QualType RetTy, SanOpts.set(SanitizerKind::KernelHWAddress, false); if (mask & SanitizerKind::KernelHWAddress) SanOpts.set(SanitizerKind::HWAddress, false); - if (mask & SanitizerKind::LocalBounds) - Fn->addFnAttr(llvm::Attribute::NoSanitizeBounds); - if (mask & SanitizerKind::ArrayBounds) - Fn->addFnAttr(llvm::Attribute::NoSanitizeBounds); // SanitizeCoverage is not handled by SanOpts. if (Attr->hasCoverage()) NoSanitizeCoverage = true; } + if (SanitizeBounds && !SanOpts.hasOneOf(SanitizerKind::Bounds)) + Fn->addFnAttr(llvm::Attribute::NoSanitizeBounds); + if (NoSanitizeCoverage && CGM.getCodeGenOpts().hasSanitizeCoverage()) Fn->addFnAttr(llvm::Attribute::NoSanitizeCoverage); } |
Yes I can confirm this. It is indeed fsanitize=local-bounds causing the kernel crash we discussed earlier. (UBSAN_LOCAL_BOUNDS=y)
clang/lib/CodeGen/CodeGenFunction.cpp | ||
---|---|---|
757 | Agreed. I will revise patch and commit description. |
clang/lib/CodeGen/CodeGenFunction.cpp | ||
---|---|---|
757 | BTW: Just to double check we are on the same page when no_sanitize(bounds) is provided, fsanitize=array-bounds is also disabled -- right ? I can confirm this attribute is also affecting array-bounds as this is handled in clang side and we are setting llvm::Attribute::NoSanitizeBounds so that clang's array-bounds can also see this. I will also add another test for -fsanitize=array-bounds |
- update commit description
- In: CodeGenFunction::StartFunction(), merge two checks(SanitizerKind::LocalBounds, SanitizerKind::ArrayBounds) into one(SanitizerKind::Bounds)
- update test: clang/test/CodeGen/bounds-checking.c to show no_sanitize("bounds") can also affect fsanitize=array-bounds (clang)
clang/lib/CodeGen/CodeGenFunction.cpp | ||
---|---|---|
757 | Well, no_sanitize("bounds") already worked for array-bounds before this patch. But adding more tests never hurt. |
clang/lib/CodeGen/CodeGenFunction.cpp | ||
---|---|---|
757 | Got it. Thanks! |
FWIW, related problems with pskb_expand_head were seen again here:
https://github.com/ClangBuiltLinux/linux/issues/1599
I have trouble reproducing it, but I think the kernel patch there solves the problem created by __alloc_size vs ksize().
Hi Marco,
@melver, Could you please help me landing it? I don't have write permission to the repo.
Please use Tong Zhang<ztong0001@gmail.com>
Thanks,
- Tong
Hi Marco,
@melver, Could you please help me landing it? I don't have write permission to the repo.
Please use Tong Zhang<ztong0001@gmail.com>
Thanks,
Tong
Sure. I had already applied the patch locally to test, but then got distracted. Doing it now.
Thanks for your work.
These 2 checks can be reduced to 1 due to SanitizerKind::Bounds including both. However, as noted, this only affects local-bounds, so I'm not sure if we want to include both -- perhaps for completeness it makes sense, but in the array-bounds only case this attribute will be a nop (AFAIK).
Also, I think we don't want to attach the attribute if bounds checking isn't enabled -- at least it seems unnecessary to do so.
See the following suggested change: