This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/docs/
-
docs/
69/136
SafeBuffers.rst
-
index.rst

Differential D136811

[-Wunsafe-buffer-usage] WIP: RFC: NFC: User documentation.
AcceptedPublic

Authored by NoQ on Oct 26 2022, 8:25 PM.

Download Raw Diff

Details

Reviewers

aaron.ballman
gribozavr2
xazax.hun
ldionne
jkorous
t-rasmud
ziqingluo-90
malavikasamak

Summary

This document accompanies the LLVM Discourse discussion in https://discourse.llvm.org/t/rfc-c-buffer-hardening/65734 as @aaron.ballman recommended writing it up early. It's intended to eventually become the user-facing documentation for the tool.

As of now the document roughly represents our current plan; it may be corrected in the future, which may or may not be a result of our discussion here! And there are a few TODOs left for me to fill in.

Of course, separate documentation will need to be provided for individual items, such as attribute documentation, but that's a completely different story.

Diff Detail

Event Timeline

NoQ created this revision.Oct 26 2022, 8:25 PM

Herald added a project: Restricted Project. · View Herald TranscriptOct 26 2022, 8:25 PM

Herald added subscribers: steakhal, martong, rnkovacs. · View Herald Transcript

NoQ requested review of this revision.Oct 26 2022, 8:25 PM

Harbormaster completed remote builds in B194548: Diff 471001.Oct 26 2022, 8:53 PM

MTC added a subscriber: MTC.Nov 1 2022, 7:25 PM

Thank you for getting a start on this documentation, this adds a lot of clarity to what was being proposed! I have some questions, and some minor things to fix.

clang/docs/SafeBuffers.rst
20	For consistency with the rest of our docs, you should make this change throughout.
26
32	Hmmmm that attribute name looks problematic (more below in that section).
48
59
64–65	It would be helpful to have some details here about why a user would pick these proposed features over use of a sanitizer like asan or hwasan, which also finds out of bounds accesses.
70–71	This makes it sound like the proposal doesn't cover C because there is no version of `std::span` there.
77
91	We should mention as part of the model description that pointer-chasing structures like linked lists, trees, etc are not covered by the model. (That said, I suspect the implementation of such things might be covered. e.g., a tree with `Node *Children[]`)
96	And what should C users do?
99	Similar to the single quote situation.
115	I would be surprised if we can find a heuristic that we'd feel confident is correct for most situations. e.g., `Foo buffer[10];` might be a flat array... or it might be a ring buffer without benefit of a wrapper class... or it may be a sparse matrix... and so on.
116	I think this would be a good idea, especially if you can use an example of a more involved (but still common) container. Like, how does someone harden a ring buffer?
129	One case I think this is going to trip up is: `this + 1` (which is not uncommon in practice: https://sourcegraph.com/search?q=context:global+-file:.test.+file:.*%5C.cpp+this+%2B+1&patternType=standard). How do users fix up that pointer arithmetic and do they ever get any safety or security benefits from it? (Note, this doesn't apply to `+=`, just `+`.)
133	`NULL` as well, I presume? (This matters for C where `NULL` often expands to `(void *)0`.)
134	Eventually we should extend this to have an exhaustive list.
139
140–141
174	Have you considered allowing a short-hand form of the pragmas that are scoped to the current block scope (or file scope if applied at global scope)? e.g., void func() { { #pragma only_safe_buffers int stuff[10] = {}; int x = stuff[1]; // Warning: you're a bad person who should feel bad } int array[10] = {}; int y = array[1]; // this_is_fine.png }
178
182–189	a custom pragma is preferable because the generic solution’s push/pop mechanism may be confusing. Er, I'm not certain how much I agree with that. push/pop semantics on this have been around for a long time. I do agree that the user can write code such that another user may struggle to understand what the state of the system is after a pop operation, but that shouldn't be a deciding factor for whether we add a new pragma to do the same thing as another pragma. I think I'd rewrite this as: Similar functionality can be achieved with ``#pragma clang diagnostic``, but such use is discouraged because it reduces code readability due to being an imperative action rather than describing a property of the annotated code. That reduced readability also adds complexity to security code audits. Use of one of the dedicated pragmas is strongly recommended. or something along these lines. WDYT?
195	That's stealing a name from the C and C++ committees because it's not behind a vendor attribute namespace. I think it should be `[[clang::unsafe_buffer_usage]]` or `unsafe_buffer_usage` (with no specific syntax) instead.
214	One thing I think is worth pointing out about these docs is that the first example effectively says "do add the attribute because the size passed in to the function could be wrong" and the second example says "don't add the attribute on the assumption that the container has the correct size information". The advice feels a bit conflicting to me because in one case we're assuming callers pass in incorrect values and in the other case we're assuming callers pass in correct values and the only distinction between them is a "container was used". But a significant portion of our users don't use libc++ (they use libstdc++ or MS STL for example). I think we should have more details on why the STL used at runtime doesn't matter, or if it still really does matter, we may want to reconsider the advice we give. Also, we don't give similar advice for use of the pragmas. Should we? (Maybe split the advice as to when to use markings in general out into its own section and reference it from both the pragma and attribute sections?)
215–216
217
255
258–260	I don't think I understand what you're trying to say here.
261–264	The deprecated attribute does not have a replacement argument, it has an argument for some kind of message to be printed when the deprecated interface is used. So I'm not certain what this is telling me either.
266	FWIW, I think a significant amount of these attribute docs will move into AttrDocs.td (which should explain all the gory details about the attribute) and we can leave the high-level description here and link to the attribute documentation (rather than duplicate information).
272
274
323
366–371	Because fixits can be applied automatically to an entire source file, we don't typically allow fix-its that can break code (or fixits where there are multiple choices for the user to pick from) unless they're attached to a note instead of a warning (or error). Are you planning to follow that same pattern when placeholders are involved? (For example, one potential use for automatic fix-its is to apply them automatically to source on commit before running precommit CI; similar to how folks sometimes run clang-format to automate "the source is different before/after the clang-format, so you should look into that".)

Thank you for the feedback Aaron! We really appreciate the effort you put into this!

clang/docs/SafeBuffers.rst
70–71	I read this as a question as: "Could an imaginary codebase benefit from the warnings if there were no fixits?". I can imagine projects where it would be very useful to have 90% of the code guaranteed to be free of "potentially dangerous pointer arithmetic" and have such operations contained to the remaining 10% of the code. I can imagine a principled solution to bounds-safety in C but even just splitting the code into safe and 9x smaller "unsafe" part could be extremely useful when it comes to debugging, audits, automated and manual testing and similar. Having clang guarantee this property reduces the scope of code that needs to be processed and a deliberate use of this model by projects can unlock new possibilities. That is why I feel that the proposal can benefit also C projects.
116	I believe that we should stick to hardening low-level primitives - all that a ring buffer implementation has to do is to store data in `std::array` from hardened libc++ and it's all set.
129	That's a fair point and if we don't find a good solution we should ignore arithmetic with `this` and a constant.
134	Absolutely! Is it ok if we populate the list as we land the implementation?
174	We discussed other options but generally prefer simple pragmas for their flexibility. Our current thinking is that for readability purposes we won't allow nesting of pragmas and would have end of scope to be explicit. While this might be more verbose it would be dead simple to reason about.
182–189	We'd love to arrive at a design that is very difficult to use incorrectly. You are absolutely right that our preference for not using the diagnostic pragmas stems from the "untyped" pop operation. Assuming this code is a starting point: #pragma clang diagnostic push warning "-Wfoo" // code #pragma clang diagnostic pop //end of -Wfoo // more code if the user adds pragmas like this: #pragma clang diagnostic push warning "-Wfoo" #pragma clang diagnostic push warning "-Wsafe-buffers-usage" // code #pragma clang diagnostic pop //end of -Wfoo // more code #pragma clang diagnostic pop //end of -Wsafe-buffers-usage they actually get this behavior: #pragma clang diagnostic push warning "-Wfoo" #pragma clang diagnostic push warning "-Wsafe-buffers-usage" // code #pragma clang diagnostic pop //end of -Wsafe-buffers-usage // more code #pragma clang diagnostic pop //end of -Wfoo We should have the patch for pragmas ready to be put for review here in not too distant future. Maybe it'd be easier to discuss there?
214	One thing I think is worth pointing out about these docs is that the first example effectively says "do add the attribute because the size passed in to the function could be wrong" and the second example says "don't add the attribute on the assumption that the container has the correct size information". The advice feels a bit conflicting to me because in one case we're assuming callers pass in incorrect values and in the other case we're assuming callers pass in correct values and the only distinction between them is a "container was used". But a significant portion of our users don't use libc++ (they use libstdc++ or MS STL for example). Our model depends on Standard Library used implementing the bounds checks. With that the "container was used" distinction is absolutely crucial - that is what adds the bounds checks and already mitigates certain classes of bugs (even if the `span` passed in still has the same wrong size). Let's assume we're starting with this buggy code: void caller() { int buffer[10]; callee(buffer, 20); } void callee(int* ptr, unsigned size) { ptr[size-1] = 42; ptr[size] = 42; ptr[100] = 42; } and transform it to (this is just a part of what we actually plan but hopefully illustrates the point): void caller() { int buffer[10]; callee(std::span<int>(buffer, 20), 20); } void callee(std::span<int> ptr, unsigned size) { ptr[size-1] = 42; // still unmitigated ptr[size] = 42; // mitigated at runtime by std::span::operator[]() in hardened libc++ ptr[100] = 42; // mitigated at runtime by std::span::operator[]() in hardened libc++ }
261–264	For better or worse the deprecated attribute is "overloaded" and this refers to the two-parameter version. https://clang.llvm.org/docs/AttributeReference.html#deprecated "When spelled as attribute((deprecated)), the deprecated attribute can have two optional string arguments. The first one is the message to display when emitting the warning; the second one enables the compiler to provide a Fix-It to replace the deprecated name with a new name." Since it seems confusing even to you - maybe we should take this as a signal that our users would get just as confused and remove the reference to deprecated? WDYT?
366–371	That's an extremely interesting question! Let me add our current thinking as a context for the discussion. Our analysis will have its limits and will not be able to provide 100% complete and guaranteed correct code transformation in every case. At the same time it sounds suboptimal both from our and our users perspective to not emit anything for such situations. Imagine we have a code transformation that affects 20 lines of code and all we need is for the user to address a single function call argument. The thought of bailing, producing just a warning and having user to manually change all 20 lines of code sounds very unsatisfactory to me. That's why we are thinking to use placeholders. The next question is - can we not break the code when using placeholders? By definition whenever we'd emit a placeholder our analysis can't provide a correct solution. That means that whatever way we'd make the code to compile can introduce a logic bug in the program. That is simply unacceptable given the overall goal of our effort. That pretty much leaves us with only using placeholders that would break the build. WDYT? I wasn't aware of the precedent related to fixits attached to notes not having to lead to compilable code and we should consider that path.

xazax.hun added inline comments.Nov 2 2022, 6:34 PM

clang/docs/SafeBuffers.rst
32	With C++ modules gaining some momentum, I wonder if it would be possible/beneficial to be able to annotate module import/export declarations with this attribute to signal that the APIs and the content in a module was not yet hardened.
37–38	Only buffer overflows, so comparing cases like: auto it1 = v1.begin(); auto it2 = v2.begin(); if (it1 == it2) { ... } are out of scope, right? The main reason I'm asking, because I am not sure how safe iterators are implemented and whether the same implementation would give opportunities for other safety checks without significant additional cost.
41–42	Isn't this overly specific? What about `string_view`s?
94	I think in many cases it is more desirable to accept a view as a view can be constructed from multiple different containers. An API that accepts a span can work with `std::vector`s and `std::array`s or even some 3rd party containers. This makes me think maybe mentioning views first and the other options later would result in a better guideline.
116	I think it would be great to link to some other resources like how to annotate your container such that address sanitizer can help you catch OOB accesses.
125	Isn't this too restrictive? How about arrays where both the index and the size of the array is known at compile time? Also, what about subscripts in `consteval` code where the compiler should diagnose OOB accesses at compile time? I believe this model can be made more ergonomic without losing any of the guarantees.
288–289	I anticipate that doing these fixits for multi-dimensional arrays will be confusing. Is that in scope?
325–327	While this might be desirable for some projects, I can imagine other projects want to avoid generating the overload (just generate the fixes in one pass, and apply all of them in a second to avoid a "fixed" header render another TU invalid). But that would require fixits for the call sites. Do you plan to support those batch migration scenarios or is that out of scope?
353	Just a small concern. While users are not allowed to use identifiers like this, there is always a chance someone has a global variable of this name and the code ends up compiling. So, if we want to be absolutely safe, we should either have logic to pick a unique name, or not generate fixits in that case. Although, this is probably rare enough that we could just hand-wave.
355–358	I see this part a bit confusing. While it is true that we cannot be sure whether the size is actually the size of the buffer, I wonder if we should at least generate something to call it out the user has further action items. It could be a `/revise me/` comment after the size, or documentation, both, or something else.

aaron.ballman added inline comments.Nov 3 2022, 10:15 AM

clang/docs/SafeBuffers.rst
32	I think it makes sense on module export (the module author knows whether they are buffer-safe or not), but what do you have in mind for import where the user may not be in a position to make that assertion validly?
70–71	I read this as a question as: "Could an imaginary codebase benefit from the warnings if there were no fixits?". Not quite. It's whether an imaginary codebase can benefit from the warnings when there are no available standardized alternatives that are a better replacement and no C standard library hardening done. Basically, it feels like enabling this warning in a C code base is not the same as a C++ code base. In C++, it's "enable the warning, start using these awesome replacements, and get to a good state" and in C, it's "enable the warning, start implementing exemptions everywhere, and hope they're correct or that your test coverage saves you from mistakes."
116	I believe that we should stick to hardening low-level primitives - all that a ring buffer implementation has to do is to store data in std::array from hardened libc++ and it's all set. That doesn't help real world projects like LLVM where we replace the low-level primitives when they're not appropriate for our domain. I think we should assume that users will implement their own containers and support that situation from the start.
134	Populating the list as we land the implementation sounds great to me.
174	We discussed other options but generally prefer simple pragmas for their flexibility. A single pragma with lexical scoping behaves like RAII, which is often a replacement for more complex acquire/release paired operations and is a pretty well-understood pattern in C++. Our current thinking is that for readability purposes we won't allow nesting of pragmas and would have end of scope to be explicit. While this might be more verbose it would be dead simple to reason about. True, it is easy to reason about. But not allowing nesting for readability purposes removes some utility, especially for folks in C where macros are much more common. e.g., #define ONLY_SAFE_BUFFERS _Pragma("only_safe_buffers") #define AWESOME_BUFFER_THING(ptr, size) ({ ONLY_SAFE_BUFFERS; (interesting work goes here) }) void func(int *ptr, size_t size) { ONLY_SAFE_BUFFERS; ... interesting code lives here ... AWESOME_BUFFER_THING(ptr, size); // Oops, this is nested. .. more interesting code lives here ... } (You can hit the same situation with paired pragmas.) Because C doesn't have constexpr or consteval support for functions or templates, function-like macros are a pretty common interface folks use in practice and macro expansion makes it somewhat hard to avoid potential surprising conflicts.
182–189	We should have the patch for pragmas ready to be put for review here in not too distant future. Maybe it'd be easier to discuss there? Certainly! I was mostly trying to reword what you had to keep the same intent but not make it sound like the basis for the pragmas is "we think these other pragmas, which do work, were not good enough" because that's not very technically compelling.
214	Our model depends on Standard Library used implementing the bounds checks. So will the model consider things like libstdc++ that don't implement the bounds checks? Or MSVC STL with Safe Library Support (https://learn.microsoft.com/en-us/cpp/standard-library/safe-libraries-cpp-standard-library?view=msvc-170) which seems to have some of the hardening you're talking about, but not all of it? void caller() { int buffer[10]; callee(std::span<int>(buffer, 20), 20); } void callee(std::span<int> ptr, unsigned size) { ptr[size-1] = 42; // still unmitigated ptr[size] = 42; // mitigated at runtime by std::span::operator[]() in hardened libc++ ptr[100] = 42; // mitigated at runtime by std::span::operator[]() in hardened libc++ } Understood; my point was that the comments in `callee()` are incorrect in other STL implementations. The model seems to presume all STLs are hardened (or will be in the relatively near future) but I don't think that's a valid presumption (for example, STLs exist which are focused solely on performance, like EASTL). So that makes me question whether the model should presume that container state is valid when it comes from an untrusted source.
261–264	Ohhhh! I see now! What confused me is that the docs spell it as `[[deprecated]]` which does not have an overload that takes two arguments. If we keep this part of the docs, you should probably change it to `__attribute__((deprecated))` and link to the attribute docs for it. Since it seems confusing even to you - maybe we should take this as a signal that our users would get just as confused and remove the reference to deprecated? Maybe we should, but because I'm still not quite clear on how this relates to the deprecated attribute yet, maybe you could reply with some code examples of deprecated and unsafe_buffer_usage and what diagnostics you get? If that helps clarify, maybe I can help reword the docs to improve clarity.
353	Agreed; we should not have a fixit that suggests use of a reserved identifier.
366–371	I wasn't aware of the precedent related to fixits attached to notes not having to lead to compilable code and we should consider that path. We document the expectations here: https://clang.llvm.org/docs/InternalsManual.html#fix-it-hints and the interesting bit is: "Fix-it hints on errors and warnings need to obey these rules: Since they are automatically applied if -Xclang -fixit is passed to the driver, they should only be used when it’s very likely they match the user’s intent. Clang must recover from errors as if the fix-it had been applied. Fix-it hints on a warning must not change the meaning of the code. However, a hint may clarify the meaning as intentional, for example by adding parentheses when the precedence of operators isn’t obvious. If a fix-it can’t obey these rules, put the fix-it on a note. Fix-its on notes are not applied automatically." I think using placeholders on warnings rather than notes will be an issue for both the first and second bullets.

NoQ mentioned this in D137346: [-Wunsafe-buffer-usage] Initial commit - Transition away from raw buffer accesses..Nov 3 2022, 11:25 AM

NoQ retitled this revision from WIP: RFC: NFC: C++ Buffer Hardening user documentation. to -Wunsafe-buffer-usage: WIP: RFC: NFC: User documentation..

NoQ added inline comments.Nov 3 2022, 11:28 AM

clang/docs/SafeBuffers.rst
115	Yeah it's very non-trivial. A much harder call than the legendary hack in the fixit for `__attribute__((fallthrough))`. I'm thinking of a combination of a clang flag and an attribute, to mark sufficiently suitable classes and ask the user to pick one of them as the new default. It's not a high priority for us, and it'll definitely require some work to formalize the precise class interface / contract we're relying on in our fixits, but I totally see us doing it eventually.

NoQ added a child revision: D137346: [-Wunsafe-buffer-usage] Initial commit - Transition away from raw buffer accesses..Nov 3 2022, 11:45 AM

Thank you for the feedback Gábor!

clang/docs/SafeBuffers.rst
37–38	Yes, this is out of scope for our diagnostics and code transformations. @ldionne Do you want to cover the safe iterator design part?
41–42	This is based on our current plans for near future. While we are positive we will use `std::span` in the FixIts it is very unlikely we'd use `string_view`. That's why having a checker for `std::span` is more important for us now.
288–289	Do you mean that the dimensions get reversed if we replace `int arr_3d[3][4][5]` with `array<array<array<int, 5>, 4>, 3> arr_3d_r`?
325–327	We currently don't have a machinery that would allow us to analyze the whole project at once. Two passes are generally not enough - that number is something like number of unique functions in the deepest call-stack. Part of our fixits is attributing functions as unsafe and application of such fixits can create new warnings in callers. Let's take an example: foo(int* ptr) { ptr[5] = 5; } bar(int* ptr) { foo(ptr); } baz(int* ptr) { bar(ptr); } Initially it is not obvious that `bar` or `baz` might get transformed. In the first iteration a fixit for `foo` is emitted and when applied we get: foo(span<int> ptr) { ptr[5] = 5; } [[clang::unsafe_buffer_usage]] foo(int* ptr); bar(int* ptr) { foo(ptr); } baz(int* ptr) { bar(ptr); } Only now we learn that `bar` (which can live in a diferent TU) should get transformed as well because it calls to unsafe function `foo`. But our analysis still won't see that `baz` will eventually have to be transformed too and it'll take another iteration.
353	Our thinking is to actually use syntactically incorrect placeholder to make sure that users can't accidentally miss it. This is related to: https://reviews.llvm.org/D136811#inline-1324542
355–358	We have accepted that our analysis will have some limits. In situations where we have a reasonable suspicion that a certain parameter has the semantics of size we can somehow communicate that to the user. But it is very likely that in some cases the analysis won't be able to tell even that.

xazax.hun added inline comments.Nov 3 2022, 2:05 PM

clang/docs/SafeBuffers.rst
32	Wouldn't `only_safe_buffers` make sense on the import side to ensure that no unhardened code is being imported from a specific module?
41–42	Does this mean you'd recommend using a `span<char>` over `string_view` in all cases? I think that might surprise some users, so I wonder if it is worth documenting.
288–289	Yeah, I can imagine some people wanting bounds safety but having strong opinions about the reversed dimensions in their code. Moreover, if there are multi-dimensional array typedefs in the nesting, that can make things more complicated.
325–327	I see, this explains it all, thanks! I'd love to have this motivating example in the design document why a 2-pass fix-it-all model does not work.
355–358	Not being able to cover all the cases sounds fine to me, and I'm glad you plan to communicate potential size params to the user when there is reasonable certainty.

NoQ added inline comments.Nov 3 2022, 3:36 PM

clang/docs/SafeBuffers.rst
32	I wonder if it would be possible/beneficial to be able to annotate module import/export declarations with this attribute to signal that the APIs and the content in a module was not yet hardened. It's usually not that valuable for us to know that outside the module. If the module wasn't hardened, there's nothing the "caller module" can do about it, the recommended approach is to harden the module first, and then use the attribute to indicate which functions were found to be unsafe in that process (especially if safe alternatives were implemented). So it's only valuable when the module will never be hardened, and the only thing the caller module can do in such case is isolate calls to that module, maybe cover it with a safe abstraction. In this case, yeah, I think it makes sense. Maybe we should consider this eventually.
37–38	I suspect libc++ will have such checks (or even already does, @ldionne pls correct me if I'm wrong). They're still out of scope for our proposal though. But they're definitely useful and may deserve mentioning.
41–42	Yeah I think we can cover string views as well!
116	Yes, so I think we want to eventually cover low-level primitives like `SmallVector`, and then ask people to implement their higher-level primitives like ring buffers in terms of the covered low-level primitives. The situation when people really need to have ultra-optimized high-level primitives that are implemented directly with raw buffers, and then optionally harden that, becomes significantly more rare. In such cases maybe it's ok not to have a clear step-by-step guide. But a ring buffer still makes a good example! I'll think more about that.
133	Hmm, I was thinking that `((void )0)` is a compile-time constant `0`. Like, I didn't say it's a literal*. Is there a better term?
195	Yes, absolutely, I'll update that as we get closer to landing the attribute.
214	One thing I think is worth pointing out about these docs is that the first example effectively says "do add the attribute because the size passed in to the function could be wrong" and the second example says "don't add the attribute on the assumption that the container has the correct size information". The advice feels a bit conflicting to me because in one case we're assuming callers pass in incorrect values and in the other case we're assuming callers pass in correct values and the only distinction between them is a "container was used". Yeah that's an important thing for us to properly formulate, let me try to elaborate on what I meant. So we trust containers and spans, like we trust smart pointers: if you simply use them "normally", they're correct by construction. You have to go out of your way to get them wrong. Like, you take an array, it knows the size, you implicitly convert it to span, it's now well-formed, you pass it from function to function, it's still well-formed as it still carries the original bounds information, you subspan it... well this needs a runtime bounds check yeah, but other than that it's hard to get it wrong. And our tool is supposed to transform the entire program to do exactly this. And once we're there, none of these operations require manual intervention from the user, so there's very little room for error. On the contrary, if bounds information is passed separately, there's a lot of room for error every single time it's passed from function to function. There's literally nothing that prevents such information from going out of sync. There's nothing that prevents the user from passing a size of data he intends to be written into the buffer, instead of the size of the buffer. So we don't trust that. Does that make sense?
258–260	I was trying to say that `[[deprecated]] is probably appropriate if the vendor of the function expects all users of the function to move away from it. But if the vendor thinks that the function is generally fine, and it's only important to move away from it when the `-Wunsafe-buffer-usage` analysis is requested by the user, then `[[unsafe_buffer_usage]]` is more appropriate as it only causes a warning to be emitted under that condition, not every time. I'll wordsmith.
261–264	Uh-oh I didn't notice there's a difference. Indeed, will fix.
288–289	Yeah it's gonna be fun, it's in scope, we'll see how deep we'll be able to get. Ideally we want to transform them in one step. Of course the procedure is drastically different in `int [][]` (array of arrays, one contiguous chunk of memory) vs. `int **` (span of spans, each span in its own heap allocation).
366–371	I like that! This way we can even emit multiple alternative fixes, eg. note: change type of variable 'ptr' to 'std::span' to add bound checks note: change type of variable 'ptr' to 'std::span::iterator' to add bound checks (I'm still not sure we'll ever going to recommend iterators as a good solution, but they're generally a better drop-in replacement for a raw pointer, so despite their increased verbosity, we're considering them at least in some cases) Another thing this can take care of is support for custom containers: note: change type of variable 'vla' to 'std::vector' to add bound checks note: change type of variable 'vla' to 'llvm::SmallVector' to add bound checks Now we can simply present all available containers to the user, and the user can select the one they like the most.

jkorous added inline comments.Nov 3 2022, 6:54 PM

clang/docs/SafeBuffers.rst
174	It is desirable to minimize the extent of code labeled as unsafe - if a single line of code is unsafe then it'd be suboptimal do label a whole scope as unsafe. Since scopes affect objects lifetime in C++ we won't be able to generally recommend adding a scope around the unsafe line either as that might change behavior of the code. In this sense associating the pragmas' semantics with scopes reduces their flexibility. Is there any benefit of doing that which would outweigh this? Separately from that - in theory the users could just end the scope before the macro: ONLY_SAFE_BUFFERS_BEGIN; ... interesting code lives here ... ONLY_SAFE_BUFFERS_END; AWESOME_BUFFER_THING(ptr, size); // Oops, this is nested. But point taken - there might be options for better ergonomics.
214	The warnings about unsafe buffer usage would be valid no matter what Standard Library implementation is used and I can imagine the FixIts to exist in different flavors. We don't plan to target any other Standard Library implementation but the machinery we'll bring up should allow that to be implemented in the future.
366–371	Yes, we should look into attaching FixIts that contain placeholders to notes instead.

aaron.ballman added inline comments.Nov 7 2022, 8:17 AM

clang/docs/SafeBuffers.rst
32	Wouldn't only_safe_buffers make sense on the import side to ensure that no unhardened code is being imported from a specific module? But the import side has no way to know that the module being imported has no unhardened code, only the module implementer knows that. That said, I could imagine wanting to mark a module import as "will never be hardened" as a blanket way to say "don't complain about interfaces in this module I have no control over". But now I'm questioning something from the docs: Pragmas `unsafe_buffer_usage `and` `only_safe_buffers`` and allow you to annotate sections of code as either already-hardened or not-to-be-hardened, It now occurs to me there's two ways to read this. `unsafe_buffer_usage` = will diagnose any unsafe buffer usage, it's already hardened. `only_safe_buffers` == should only be used with known-safe buffers, it's not going to be hardened or `unsafe_buffer_usage` == this uses unsafe buffers, it's not to be hardened `only_safe_buffers` == this uses only safe buffers, it's already been hardened I originally read it the first way from the docs in the patch but the comment from @xazax.hun reads to me like he read it the second way.
41–42	Oh, I was sort of assuming we'd be suggesting `string_view` over `span<char>` when appropriate. So agreed that we should clearly document this as it may surprise folks. Out of curiosity, why the strong preference for `span`?
133	Are you okay with constant expressions? e.g., consteval int foo() { return 0; } void func(char ptr) { char offset_ptr = ptr + foo(); // Is this fine? }
174	In this sense associating the pragmas' semantics with scopes reduces their flexibility. Is there any benefit of doing that which would outweigh this? FWIW, I wasn't suggesting we support only the scoped form, but that we also supported a scoped form. The benefit really boils down to more complex functions (generally ones in need of refactoring anyway) where you already have an appropriate scope handy and pairing the pragmas would be clutter. e.g., if (ptr) { #pragma clang unsafe_buffer_usage ptr += 12; } vs if (ptr) { #pragma clang unsafe_buffer_usage begin ptr += 12; #pragma clang unsafe_buffer_usage end } I don't see a need to write two pragmas in these kinds of cases where the compound scope is already a clear delimiter. WDYT?
214	Does that make sense? It does, but the situation I'm thinking of is where the user is migrating their unsafe code to this new programming mode and aren't able to use libc++ to get the extra hardening checks. Consider this contrived example: extern "C" void func(int array, size_t count) { // Do buffer things with array and count } The user can't (easily) modify this code to accept a span directly because that could potentially break ABI. So they wrap the data in a span: extern "C" void func(int array, size_t count) { std::span<int> sp(array, count); // Do buffer things with sp } In both cases, the user is taking it on faith that the data passed into the function is correct. But you want the user to believe the second case is more trustworthy because it uses `span` -- but more trustworthy based on what? There's no hardened checks in the STL they're using and the potentially-wrong information is the same flavor of potentially-wrong in either case. My fear is that we push users to make changes like that by telling them it's safer when it's actually functionally equivalent instead, but when they migrates their code they accidentally introduce a new bug where there wasn't one before. I'm used to thinking about this sort of stuff as a kind of taint analysis. We need to know where that data comes from in order to know whether we can trust it or not. Did the "count" come from user input on the command line... or did it come from a call to `.size()` on some other container? These have different levels of trust (to me). Btw, this brings up a different question about fix-it hints. Are you planning to add fix-it hints to interface boundaries? If so, are you planning to pay attention to things like `extern "C"` or the function linkage to try to reduce the amount of ABI breaks suggested?
258–260	I was trying to say that `[[deprecated]] is probably appropriate if the vendor of the function expects all users of the function to move away from it. Ahhhh that makes a lot more sense to me, thanks!
353	Our thinking is to actually use syntactically incorrect placeholder to make sure that users can't accidentally miss it. So long as that happens on a note, that would be fine depending on how you name the identifier. e.g., remember that we have a feature flag to support dollar signs in identifiers, we need to protect against macros, that sort of thing.

xazax.hun added inline comments.Nov 7 2022, 9:10 AM

clang/docs/SafeBuffers.rst
125	Small ping on this point. I think there are many code patterns that are completely safe (i.e., the compiler can diagnose OOB accesses), but the current model would ban. One example is converting an enum value to string using an array of string_views. In those cases, both enum consts' value and the array's size are known at compile time. I think those easy to diagnose special cases should be permitted to make programming more ergonomic. The more ergonomic the experience, the faster the adoption will be.

@aaron.ballman We'd like to start making progress on the implementation in parallel to iterating on the documentation and this is our first patch:
https://reviews.llvm.org/D137346

Since we'll have about 4 people working full-time on this it isn't reasonable to expect you to review all our patches.
My understanding is that you are interested mostly in the user model and the general user interface.
We can organize reviews for most patches among ourselves (with any extra feedback more than welcome!) and would reach out to you specifically with the things that are likely to interest you and/or where a broader consensus is desirable.
WDYT?

In D136811#3913759, @jkorous wrote:

@aaron.ballman We'd like to start making progress on the implementation in parallel to iterating on the documentation and this is our first patch:
https://reviews.llvm.org/D137346

Since we'll have about 4 people working full-time on this it isn't reasonable to expect you to review all our patches.
My understanding is that you are interested mostly in the user model and the general user interface.
We can organize reviews for most patches among ourselves (with any extra feedback more than welcome!) and would reach out to you specifically with the things that are likely to interest you and/or where a broader consensus is desirable.
WDYT?

I think it's reasonable to start making progress on the implementation; we might debate some of the finer points, but if you're fine with the risk of needing to update the implementation later, I'd say "go for it!". My biggest request is: please try to keep this document in sync with reality so that we don't introduce confusion to the reviews.

My biggest request is: please try to keep this document in sync with reality so that we don't introduce confusion to the reviews.

Yes, so we'll need the code to incrementally catch up to the document, but I'm absolutely updating this document every time the vision itself changes.

clang/docs/SafeBuffers.rst
125	Yes, we'll probably add occasional suppressions for known safe cases like this, as long as we're sure we'll diagnose if safety is lost in later code changes. And then document them here. It's probably even reasonable to suppress the warning when the index is a loop counter and the loop bound is known at compile time to be sufficiently small. Though it'll take some work to implement (re-use static analyzer's loop unrolling prerequisites code?) (uh-oh, I thought I sent this reply before the conference)

NoQ retitled this revision from -Wunsafe-buffer-usage: WIP: RFC: NFC: User documentation. to [-Wunsafe-buffer-usage] WIP: RFC: NFC: User documentation..Nov 17 2022, 8:00 PM

Overall, the document looks good to me, I like the general direction. I still see some pending comments (mostly small wording fixes) from Aaron.

This revision is now accepted and ready to land.Nov 18 2022, 2:23 PM

As I was reading I'll highlighted some typos.
compile time -> compile-time /g

By the looks of it, this document is not referenced anywhere.
I believe clang/docs/index.rst should refer to this document in some place.

Thanks for the huge effort driving this you all!

clang/docs/SafeBuffers.rst
28
39
46
57
64
89
109
128	Sphinx is picky about indentation, and it also complains about the missing new-line prior to the sub-list. The same also applies to the very next sub-list.
135
139
149	What does auto-attached mean?
177
217
322
363
370
385

Addressed review comments up until The Attribute section. Folks, thanks a lot for reading this longread so thoroughly. I'm very grateful.

Herald added a subscriber: arphaman. · View Herald TranscriptNov 28 2022, 8:54 PM

NoQ added inline comments.Nov 28 2022, 8:54 PM

clang/docs/SafeBuffers.rst
32	Yes our intent was to have #pragma unsafe_buffer_usage begin #pragma unsafe_buffer_usage end annotate a section of code that intentionally uses unsafe buffers (so, warnings are suppressed). It makes sense to have `#pragma unsafe...` mean something's unsafe. I wonder if it wouldn't have been confusing without my clumsy explanation. @jkorous WDYT about this wording problem, do you see ambiguity here?
41–42	`string_view` is a bit harder to recommend automatically because whether a certain sequence of bytes constitutes a "string" is often a matter of opinion / user's point of view. `string_view`'s methods such as `find` will not necessarily make sense over original data, or may turn out to have counter-intuitive behavior when users try to modify the code further. Additionally, `string_view` doesn't offer write access, so at best we'll be able to recommend it for `const` pointers. So I think we should gather more data before considering it.
70–71	Basically, it feels like enabling this warning in a C code base is not the same as a C++ code base. In C++, it's "enable the warning, start using these awesome replacements, and get to a good state" and in C, it's "enable the warning, start implementing exemptions everywhere, and hope they're correct or that your test coverage saves you from mistakes." Yes that's definitely true. This makes it sound like the proposal doesn't cover C because there is no version of `std::span` there. Hmm, I didn't intend to consider this possibility here. This section is mostly motivational, I wanted to explain why the default recommended setup makes sense. That's why I start this section with a "Whelp for C it'd probably be better to build something completely different, even if this proposal is better than nothing". I added a section "What About Plain C?" to discuss this once in detail, and let the rest of the proposal focus on C++.
91	Added a couple paragraphs above the list to cover that.
125	Added a clause about known sizes. I'll remove it if we decide it's not feasible.
128	Uh-oh, great catch!
129	I'll leave this unaddressed for now. It looks like these constructs typically do indicate that the remaining part of the object is a buffer. C++ doesn't seem to provide a good "safe" way to represent such objects. So, yeah, we need more data.
133	Yes sure, why not? If it ever becomes unsafe, we'll immediately warn.
149	Means they show up in fixits as suggested code. Clarified, thanks!
182–189	Reworded a bit and added the example!
214	In both cases, the user is taking it on faith that the data passed into the function is correct. But you want the user to believe the second case is more trustworthy because it uses span No, that's definitely not our intention. That's why I'm explicitly specifying that this assumption only applies to containers constructed outside the function. In your example both functions should wear the attribute. And there's no way to produce a safe alternative (which wouldn't need `[[unsafe_buffer_usage]]`) under `extern "C"` linkage. But on the C side, there's no expectation of having a safe alternative in the first place. Calls to this function become just another unsafe operation, which we can hope to isolate and reduce, but never to eliminate. So it's fine that a safe alternative isn't provided at all. Btw, this brings up a different question about fix-it hints. Are you planning to add fix-it hints to interface boundaries? If so, are you planning to pay attention to things like extern "C" or the function linkage to try to reduce the amount of ABI breaks suggested? So, yes, we want fixes on interface boundaries, but we only want fixes that don't break source or binary compatibility. So our fixes will always preserve the original interface, possibly annotate it as unsafe, and then provide a safe alternative that call sites can, but don't have to, transition to. So in this case we're aiming for something like this: [[clang::unsafe_buffer_usage]] extern "C" void func(int *array, size_t count) { func(std::span<int>(array, count)); } void func(std::span<int> sp) { // Do buffer things with sp } where the original function is preserved, the code is not duplicated, and the safe function doesn't have `extern "C"`. In this case the original function is still unsafe (wears the attribute) because the container isn't coming from outside.

Harbormaster completed remote builds in B199926: Diff 478450.Nov 28 2022, 9:28 PM

steakhal added inline comments.Nov 29 2022, 12:33 AM

clang/docs/SafeBuffers.rst
31

aaron.ballman added inline comments.Nov 29 2022, 5:54 AM

clang/docs/SafeBuffers.rst
41–42	string_view is a bit harder to recommend automatically because whether a certain sequence of bytes constitutes a "string" is often a matter of opinion / user's point of view. But why does that matter for `string_view`? It's not a null-terminated interface, so it explicitly encodes a pointer/length interface, same as `span`.
92
129	Fair enough -- it's worth documenting explicitly as an open question. FWIW, I agree that the construct is mostly used for accessing trailing objects as a buffer. The closest thing I've seen to a safe way to handle this is how LLVM does it with the `TrailingObjects` mixin.
133	Ah, C terminology is biting me here -- we use the term "constant" the same way C++ uses the term "literal", so when I read "constant 0" my brain interpreted that as literally just `0`.
214	In both cases, the user is taking it on faith that the data passed into the function is correct. But you want the user to believe the second case is more trustworthy because it uses span No, that's definitely not our intention. That's why I'm explicitly specifying that this assumption only applies to containers constructed outside the function. In your example both functions should wear the attribute. And there's no way to produce a safe alternative (which wouldn't need [[unsafe_buffer_usage]]) under extern "C" linkage. My point is that the assumption seems invalid to me -- it's a matter of garbage-in, garbage-out, but you're saying "this potentially garbage-in situation is assumed to not happen" and then building analysis checks around that assumption. I realize we have to make some heuristic decisions for the analysis, but this one seems dangerous for something attempting to improve buffer safety because nothing validates that assumption. Or do you have plans for the future to add an analysis to cover the situation of shoving garbage into the span? To be clear, this is the situation I'm concerned about: // SomeLibraryTU.cpp void func(int buffer, size_t count) { // Old interface for compatibility func(std::span<int>(buffer, count)); } void func(std::span<int> sp) { // New, safe interface if (!sp.empty()) { int data = sp.data(); // Do stuff on data (other than pointer arithmetic, only looking at first element) } } // SomeApplicationTU.cpp int main() { func(0, 1); // Oops, wrong count! } My complaint is that the library code is perfectly safe but the application code is not, and it sounds like this situation is purposefully not considered as part of this analysis... but it seems like this situation is the common case we'd want to catch.

NoQ added inline comments.Nov 29 2022, 6:36 PM

clang/docs/SafeBuffers.rst
41–42	We can probably offer both variants in different note-fixits as soon as we teach the machine to recognize the different requirements for these strategies, and we can have `string_view` go first in the list? Like, I don't want to eliminate the question of programmer's intent from this. It's ok for the programmer to prefer `span<char>` in some cases, just to make the code more expressive, and we'll have to implement it anyway, so we might as well offer it. I definitely agree that `string_view` is the best choice when dealing with strings, and humans will absolutely prefer that and we should be recommending that to humans when applicable. I'm just not sure we should completely drop `span<char>` from automatic recommendations, and I admit that it'll take some extra work to implement recommending `string_view` instead, which we may or may not rush to implement asap.
214	Or do you have plans for the future to add an analysis to cover the situation of shoving garbage into the span? We do have plans for a static analyzer checker that warns about shoving garbage into the span! - that's the last bullet point in the intro. We expect it to be disproportionally effective within this programming model because the programming model recommends constructing the span "as early as possible", pretty much immediately after allocation, which makes the problem very local and therefore easy to detect with path-sensitive analysis. We also think it's much harder to shove garbage into span in the first place. The size parameter of the span is extremely unambiguous, there's nothing else you can think of putting there. With an arbitrary function you see for the first time, it's tempting to pass eg. the amount of data you expect to be written into the destination buffer, instead of the amount of room in the destination buffer. Span eliminates this problem because you know that whatever you put into the size parameter is a property of the span itself, it has nothing to do with the function you pass it into. In a lot of cases you don't even have to specify the size, you can simply implicitly-convert your container into a span of the right size. That said, this isn't exactly why we make this distinction. What we're trying to do here is reduce the number of times a span needs to be manually constructed (and then decomposed back into pointer and size, only to be constructed again). Because our goal is to construct every span manually at most once, right next to the allocation. And when you're literally looking at the allocation, the size becomes obvious. And after that, by the virtue of conforming to the model there's no longer a need to use unsafe wrap-unwrap operations over those spans, so everything becomes correct by construction. So despite not addressing the problem of initial span construction directly, this choice decreases the amount of times the programmer needs to solve this problem, and it decreases the complexity of this problem by making the programmer solve it only at those few moments of time when it's the most convenient to solve. And, yes, a static analyzer checker on top of that. To be clear, this is the situation I'm concerned about: Hmm I'm not sure I understand this example, how does the lack of buffer operations in the new function kicks in here? It's definitely an expected false negative when a buffer of exactly one element is always used in the code and we therefore think that it's not a buffer at all. But why would you want to use a span at all in such case? And regardless of this problem, if the library vendor consciously provides a new interface in order to intentionally make the code compliant with our model, then they'd also add an attribute on the old interface (but not on the new interface) according to the requirements we're discussing. In this case a misuse of the old interface will issue a warning "please use the new interface" and a misuse of the new interface becomes more difficult due to the above considerations, so everything appears to be working as intended(?)

aaron.ballman added inline comments.Nov 30 2022, 5:19 AM

clang/docs/SafeBuffers.rst
41–42	We can probably offer both variants in different note-fixits as soon as we teach the machine to recognize the different requirements for these strategies, and we can have string_view go first in the list? I think that's reasonable enough; it doesn't have to do everything up front. So long as there's a plan (or at least an aspiration) to support multiple containers depending on context and preference, I'm happy enough. I had the (perhaps mistaken) impression this was heading towards only recommending `span`.
214	We do have plans for a static analyzer checker that warns about shoving garbage into the span! - that's the last bullet point in the intro. Ah, perfect! We also think it's much harder to shove garbage into span in the first place. The size parameter of the span is extremely unambiguous, there's nothing else you can think of putting there. Hehe, oh to be this optimistic still. :-D The scenario I expect to see garbage in span is from data that can be user-controlled (the exact stuff you're worried about, to be honest!): pulling in size data from the network or a file format, etc. Something where it's not at all accidental that it's garbage going into the span, but is actually malicious input from an attacker. To be clear, this is the situation I'm concerned about: Hmm I'm not sure I understand this example, how does the lack of buffer operations in the new function kicks in here? It doesn't have to lack buffer operations, I was just trying to make the simplest example I could and I probably obscured my point. My point is mostly that a library interface that takes buffer + size and then wraps it into a span to pass to a span interface is every bit as dangerous as not having the span interface at all. If the data going into the span is tainted, it's game over. So I worry about assuming the data in the span is valid and basing diagnostic decisions around that. However, because you're going to also have checkers specifically trying to catch the case where invalid data CAN be shoved into the span (like a taint analysis, I hope?), that basically resolves my concerns.

jkorous mentioned this in D138940: [-Wunsafe-buffer-usage] Introduce the `unsafe_buffer_usage` attribute.Nov 30 2022, 12:26 PM

NoQ added inline comments.Nov 30 2022, 7:13 PM

clang/docs/SafeBuffers.rst
214	Yes, so it'll definitely cover basic cases where the allocation size is known to be smaller than span size on a given path (as constants or symbolically), eg. int p = new int[5]; std::span<int> sp(p, 7); // warn int q = new int[N]; std::span<int> sq(q, N + 1); // warn int r = new int[N]; if (M > N) std::span<int> sr(r, M); // warn Then, yes, taint analysis could be a straightforward addition to that checker: int p = /* suppose we don't even know how it's allocated /; size_t S; scanf("%zu", &S); std::span<int> sp(p, S); // warn Though, of course, this will imply moving taint analysis out of alpha and making sure we have a way to annotate user-defined taint sources, but the checker can be taught to interact with that very easily. And such taint checker will be, again, disproporitonally effective within our programming model because spans are constructed immediately next to taint sources, so no need for heavy interprocedural analysis. And maybe we'll consider an opt-in aggressive mode that warns every time the size isn't "related" to the allocation size (in terms of symbolic expressions, so assume `M` and `N` are actually unrelated values): int q = new int[N]; std::span<int> sq(q, M); // warn where the warning can be suppressed with `assert(M <= N);` (we'll have to double-check that our constraint solver is up to the task). Which still has limited efficiency because the allocation size has to be known (even if symbolic), unlike the taint case where it's sufficient (but not necessary) to know that the span size unconstrained.

NoQ mentioned this in D138253: [-Wunsafe-buffer-usage] NFC: Implement fix-strategies and variable-use-claiming..Dec 19 2022, 2:24 PM

NoQ mentioned this in D140179: [-Wunsafe-buffer-usage] Add unsafe buffer checking opt-out pragmas.Jan 26 2023, 1:00 PM

NoQ edited child revisions, added: D143676: [-Wunsafe-buffer-usage] FixableGadget for handling stand alone pointers under UPC.; removed: D137346: [-Wunsafe-buffer-usage] Initial commit - Transition away from raw buffer accesses..Apr 7 2023, 2:58 PM

I'll move this PR to github, and I'll update it to reflect the current state of things, with the aim to have it in good shape (and, possibly, land) before the Dev Meeting. There weren't any major course corrections, but I'll need to spell out what are the things that we've actually implemented so far.

Revision Contents

Path

Size

clang/

docs/

SafeBuffers.rst

485 lines

index.rst

1 line

Diff 478450

clang/docs/SafeBuffers.rst

This file was added.

================

C++ Safe Buffers

================

.. contents::

:local:

Introduction

============

Clang can be used to harden your C++ code against buffer overflows, an otherwise

common security issue with C-based languages.

The solution described in this document is an integrated programming model:

it defines safety guidelines that restrict operations the code is allowed

to perform, it provides tools for updating the code to conform to these

guidelines, and it provides runtime mitigations for the situations when

following guidelines isn't sufficient.

aaron.ballmanUnsubmitted

Done

guidelines, and it provides runtime mitigations for the situations when

- following guidelines isn’t sufficient.

+ following guidelines isn't sufficient.

Namely, the solution consists of the following parts:

For consistency with the rest of our docs, you should make this change throughout.

aaron.ballman: For consistency with the rest of our docs, you should make this change throughout.

Namely, the solution consists of the following parts:

- ``-Wunsafe-buffer-usage`` is a family of warnings that warns you when

a potentially unsafe operation is performed in your code;

- Pragmas ``unsafe_buffer_usage`` and ``only_safe_buffers`` allow you to

annotate sections of code as opt-out of or opt-into the programming model,

aaron.ballmanUnsubmitted

Done

a potentially unsafe operation is performed in your code;

- - Pragmas ``unsafe_buffer_usage`` and ``only_safe_buffers`` and allow you to

+ - Pragmas ``unsafe_buffer_usage`` and ``only_safe_buffers`` that allow you to

annotate sections of code as either already-hardened or not-to-be-hardened,

aaron.ballman:

which enables incremental adoption and provides "escape hatches"

when unsafety is necessary;

steakhalUnsubmitted

Done

annotate sections of code as either already-hardened or not-to-be-hardened,

- enabling incremental adoption and and providing “escape hatches”

+ enabling incremental adoption and providing “escape hatches”

when unsafety is necessary;

steakhal:

- Automatic fixits provided by the warning act as a modernizer to help you

convert large amounts of old code to conform to the warning;

- Attribute ``[[unsafe_buffer_usage]]`` lets you annotate custom functions as

steakhalUnsubmitted

Not Done

convert large amounts of old code to conform to the warning;

- - Attribute ``[[unsafe_buffer_usage]]`` lets you annotate custom functions as

+ - Attribute ``[[clang::unsafe_buffer_usage]]`` lets you annotate custom functions as

unsafe, while providing a safe alternative that can often be suggested by

steakhal:

unsafe, while providing a safe alternative that can often be suggested by

aaron.ballmanUnsubmitted

Not Done

Hmmmm that attribute name looks problematic (more below in that section).

aaron.ballman: Hmmmm that attribute name looks problematic (more below in that section).

xazax.hunUnsubmitted

Not Done

With C++ modules gaining some momentum, I wonder if it would be possible/beneficial to be able to annotate module import/export declarations with this attribute to signal that the APIs and the content in a module was not yet hardened.

xazax.hun: With C++ modules gaining some momentum, I wonder if it would be possible/beneficial to be able…

aaron.ballmanUnsubmitted

Not Done

I think it makes sense on module export (the module author knows whether they are buffer-safe or not), but what do you have in mind for import where the user may not be in a position to make that assertion validly?

aaron.ballman: I think it makes sense on module export (the module author knows whether they are buffer-safe…

NoQAuthorUnsubmitted

Done

I wonder if it would be possible/beneficial to be able to annotate module import/export declarations with this attribute to signal that the APIs and the content in a module was not yet hardened.

It's usually not that valuable for us to know that outside the module. If the module wasn't hardened, there's nothing the "caller module" can do about it, the recommended approach is to harden the module first, and then use the attribute to indicate which functions were found to be unsafe in that process (especially if safe alternatives were implemented).

So it's only valuable when the module *will never be hardened*, and the only thing the caller module can do in such case is isolate calls to that module, maybe cover it with a safe abstraction. In this case, yeah, I think it makes sense. Maybe we should consider this eventually.

NoQ: > I wonder if it would be possible/beneficial to be able to annotate module import/export…

xazax.hunUnsubmitted

Not Done

Wouldn't only_safe_buffers make sense on the import side to ensure that no unhardened code is being imported from a specific module?

xazax.hun: Wouldn't `only_safe_buffers` make sense on the import side to ensure that no unhardened code is…

aaron.ballmanUnsubmitted

Not Done

Wouldn't only_safe_buffers make sense on the import side to ensure that no unhardened code is being imported from a specific module?

But the import side has no way to know that the module being imported has no unhardened code, only the module implementer knows that.

That said, I could imagine wanting to mark a module import as "will never be hardened" as a blanket way to say "don't complain about interfaces in this module I have no control over".

But now I'm questioning something from the docs:

Pragmas `unsafe_buffer_usage and only_safe_buffers` and allow you to
annotate sections of code as either already-hardened or not-to-be-hardened,

It now occurs to me there's two ways to read this.

unsafe_buffer_usage = will diagnose any unsafe buffer usage, it's already hardened.
only_safe_buffers == should only be used with known-safe buffers, it's not going to be hardened

unsafe_buffer_usage == this uses unsafe buffers, it's not to be hardened
only_safe_buffers == this uses only safe buffers, it's already been hardened

I originally read it the first way from the docs in the patch but the comment from @xazax.hun reads to me like he read it the second way.

aaron.ballman: > Wouldn't only_safe_buffers make sense on the import side to ensure that no unhardened code is…

NoQAuthorUnsubmitted

Done

Yes our intent was to have

#pragma unsafe_buffer_usage begin

#pragma unsafe_buffer_usage end

annotate a section of code that intentionally uses unsafe buffers (so, warnings are suppressed). It makes sense to have #pragma unsafe... mean something's unsafe.

I wonder if it wouldn't have been confusing without my clumsy explanation.

@jkorous WDYT about this wording problem, do you see ambiguity here?

NoQ: Yes our intent was to have ``` #pragma unsafe_buffer_usage begin #pragma unsafe_buffer_usage…

the compiler automatically;

- LLVM's own C++ standard library implementation, libc++, provides a

hardened mode where C++ classes such as ``std::vector`` and ``std::span``,

together with their respective ``iterator`` classes, are protected

at runtime against buffer overflows. This changes buffer overflows from

extremely dangerous undefined behavior to a predictable runtime crash;

xazax.hunUnsubmitted

Not Done

Only buffer overflows, so comparing cases like:

auto it1 = v1.begin();
auto it2 = v2.begin();

if (it1 == it2) { ... }

are out of scope, right?

The main reason I'm asking, because I am not sure how safe iterators are implemented and whether the same implementation would give opportunities for other safety checks without significant additional cost.

xazax.hun: Only buffer overflows, so comparing cases like: ``` auto it1 = v1.begin(); auto it2 = v2.begin…

NoQAuthorUnsubmitted

Done

I suspect libc++ will have such checks (or even already does, @ldionne pls correct me if I'm wrong).

They're still out of scope for our proposal though. But they're definitely useful and may deserve mentioning.

NoQ: I suspect libc++ will have such checks (or even already does, @ldionne pls correct me if I'm…

jkorousUnsubmitted

Not Done

Yes, this is out of scope for our diagnostics and code transformations. @ldionne Do you want to cover the safe iterator design part?

jkorous: Yes, this is out of scope for our diagnostics and code transformations. @ldionne Do you want to…

- Finally, in order to avoid bugs in newly converted code, the

steakhalUnsubmitted

Done

at runtime against buffer overflows. This changes buffer overflows from

- extremely dangerous undefined behavior to predictable runtime crash.

+ extremely dangerous undefined behavior to a predictable runtime crash.

- Finally, in order to avoid bugs in newly converted code, the

steakhal:

Clang static analyzer provides a checker to find misconstructed

span/view objects.

xazax.hunUnsubmitted

Done

Isn't this overly specific? What about string_views?

xazax.hun: Isn't this overly specific? What about `string_view`s?

NoQAuthorUnsubmitted

Done

Yeah I think we can cover string views as well!

NoQ: Yeah I think we can cover string views as well!

jkorousUnsubmitted

Done

This is based on our current plans for near future.
While we are positive we will use std::span in the FixIts it is very unlikely we'd use string_view. That's why having a checker for std::span is more important for us now.

jkorous: This is based on our current plans for near future. While we are positive we will use `std…

xazax.hunUnsubmitted

Done

Does this mean you'd recommend using a span<char> over string_view in all cases? I think that might surprise some users, so I wonder if it is worth documenting.

xazax.hun: Does this mean you'd recommend using a `span<char>` over `string_view` in all cases? I think…

aaron.ballmanUnsubmitted

Done

Oh, I was sort of assuming we'd be suggesting string_view over span<char> when appropriate. So agreed that we should clearly document this as it may surprise folks.

Out of curiosity, why the strong preference for span?

aaron.ballman: Oh, I was sort of assuming we'd be suggesting `string_view` over `span<char>` when appropriate.

NoQAuthorUnsubmitted

Done

string_view is a bit harder to recommend automatically because whether a certain sequence of bytes constitutes a "string" is often a matter of opinion / user's point of view. string_view's methods such as find will not necessarily make sense over original data, or may turn out to have counter-intuitive behavior when users try to modify the code further.

Additionally, string_view doesn't offer write access, so at best we'll be able to recommend it for const pointers.

So I think we should gather more data before considering it.

NoQ: `string_view` is a bit harder to recommend automatically because whether a certain sequence of…

aaron.ballmanUnsubmitted

Not Done

string_view is a bit harder to recommend automatically because whether a certain sequence of bytes constitutes a "string" is often a matter of opinion / user's point of view.

But why does that matter for string_view? It's not a null-terminated interface, so it explicitly encodes a pointer/length interface, same as span.

aaron.ballman: > string_view is a bit harder to recommend automatically because whether a certain sequence of…

NoQAuthorUnsubmitted

Done

We can probably offer both variants in different note-fixits as soon as we teach the machine to recognize the different requirements for these strategies, and we can have string_view go first in the list?

Like, I don't want to eliminate the question of programmer's intent from this. It's ok for the programmer to prefer span<char> in some cases, just to make the code more expressive, and we'll have to implement it anyway, so we might as well offer it. I definitely agree that string_view is the best choice when dealing with strings, and humans will absolutely prefer that and we should be recommending that to humans when applicable. I'm just not sure we should completely drop span<char> from automatic recommendations, and I admit that it'll take some extra work to implement recommending string_view instead, which we may or may not rush to implement asap.

NoQ: We can probably offer both variants in different note-fixits as soon as we teach the machine to…

aaron.ballmanUnsubmitted

Not Done

We can probably offer both variants in different note-fixits as soon as we teach the machine to recognize the different requirements for these strategies, and we can have string_view go first in the list?

I think that's reasonable enough; it doesn't have to do everything up front. So long as there's a plan (or at least an aspiration) to support multiple containers depending on context and preference, I'm happy enough. I had the (perhaps mistaken) impression this was heading towards *only* recommending span.

aaron.ballman: > We can probably offer both variants in different note-fixits as soon as we teach the machine…

Note that some of these partial solutions are useful on their own. For example,

hardened libc++ can be used as a "sanitizer" to find buffer overflow bugs in

existing code. The static analyzer checker is also universally useful, and

not tied to this programming model. The warning may be desired independently,

steakhalUnsubmitted

Done

hardened libc++ can be used as a “sanitizer” to find buffer overflow bugs in

- existing code. The static analyzer checker is also universally useful,

+ existing code. The static analyzer checker is also universally useful, and

not tied to this programming model. The warning may be desired independently,

steakhal:

even in plain C code, if you need to isolate, annotate and audit all

buffer-related code in your codebase. That said, some of these guidelines

aaron.ballmanUnsubmitted

Done

not tied to this programming model. The warning may be desired independently,

- even in plain C code, if you’re hard-bent on isolating, annotating and auditing

+ even in plain C code, if you must isolate, annotate, and audit

all buffer-related code in your codebase. That said, some of these guidelines

aaron.ballman:

don't cause your code to mitigate vulnerabilities better unless hardened

containers are in use to carry out runtime checks.

Why Not Just Harden All Pointer Operations?

-------------------------------------------

At a glance, it seems reasonable to ask the compiler to make raw pointer

operations crash on out-of-bounds accesses at runtime. And in case of plain C,

steakhalUnsubmitted

Done

===========================================

- At a glance it seems reasonable to ask the compiler to make raw pointer

+ At a glance, it seems reasonable to ask the compiler to make raw pointer

operations crash on out-of-bounds accesses at runtime. And in case of C, this

steakhal:

this may be the most practical way to achieve safety. The main problem with

such a solution lies in dealing with the fact that most pointers in the code

aaron.ballmanUnsubmitted

Done

operations crash on out-of-bounds accesses at runtime. And in case of C, this

- may be the only way to achieve safety. The main problem with such solution lies

+ may be the only way to achieve safety. The main problem with such a solution lies

in dealing with the fact that most pointers in the code don’t have bounds

aaron.ballman:

don't have bounds information associated with them. Adding such information

to the code in order to allow the compiler to perform bounds checks at runtime

would result either in a system of attributes significantly more advanced than

the current solution, or a significant change in the runtime representation

of all pointers.

steakhalUnsubmitted

Done

either in a system of attributes significantly more advanced than

- the current solution, or a significant change in runtime representation

+ the current solution, or a significant change in the runtime representation

of all pointers.

steakhal:

aaron.ballmanUnsubmitted

Done

It would be helpful to have some details here about why a user would pick these proposed features over use of a sanitizer like asan or hwasan, which also finds out of bounds accesses.

aaron.ballman: It would be helpful to have some details here about why a user would pick these proposed…

A different approach is taken by AddressSanitizer, which is amazing at detecting

some buffer overflows in the testing phase as it can be used for checking

pointer accesses with respect to a global map of all allocations. However,

it cannot be used as a security mitigation in production binaries, so it can

only help you with vulnerabilities within your test coverage, which is often

insufficient for security purposes.

aaron.ballmanUnsubmitted

Done

containers such as ``std::span``, you can achieve bounds safety by

- *simply writing good modern C++ code*. This is what this solution is designed

+ *writing better modern C++ code*. This is what this solution is designed

to help your codebase with.

The Guidelines

This makes it sound like the proposal doesn't cover C because there is no version of std::span there.

aaron.ballman: This makes it sound like the proposal doesn't cover C because there is no version of `std…

jkorousUnsubmitted

Done

I read this as a question as: "Could an imaginary codebase benefit from the warnings if there were no fixits?".

I can imagine projects where it would be very useful to have 90% of the code guaranteed to be free of "potentially dangerous pointer arithmetic" and have such operations contained to the remaining 10% of the code. I can imagine a principled solution to bounds-safety in C but even just splitting the code into safe and 9x smaller "unsafe" part could be extremely useful when it comes to debugging, audits, automated and manual testing and similar. Having clang guarantee this property reduces the scope of code that needs to be processed and a deliberate use of this model by projects can unlock new possibilities.

That is why I feel that the proposal can benefit also C projects.

jkorous: I read this as a question as: "Could an imaginary codebase benefit from the warnings if there…

aaron.ballmanUnsubmitted

Done

I read this as a question as: "Could an imaginary codebase benefit from the warnings if there were no fixits?".

Not quite. It's whether an imaginary codebase can benefit from the warnings when there are no available standardized alternatives that are a better replacement and no C standard library hardening done. Basically, it feels like enabling this warning in a C code base is not the same as a C++ code base. In C++, it's "enable the warning, start using these awesome replacements, and get to a good state" and in C, it's "enable the warning, start implementing exemptions everywhere, and hope they're correct or that your test coverage saves you from mistakes."

aaron.ballman: > I read this as a question as: "Could an imaginary codebase benefit from the warnings if there…

NoQAuthorUnsubmitted

Done

Basically, it feels like enabling this warning in a C code base is not the same as a C++ code base. In C++, it's "enable the warning, start using these awesome replacements, and get to a good state" and in C, it's "enable the warning, start implementing exemptions everywhere, and hope they're correct or that your test coverage saves you from mistakes."

Yes that's definitely true.

This makes it sound like the proposal doesn't cover C because there is no version of std::span there.

Hmm, I didn't intend to consider this possibility *here*. This section is mostly motivational, I wanted to explain why the default recommended setup makes sense. That's why I start this section with a "Whelp for C it'd probably be better to build something completely different, even if this proposal is better than nothing".

I added a section "What About Plain C?" to discuss this once in detail, and let the rest of the proposal focus on C++.

NoQ: > Basically, it feels like enabling this warning in a C code base is not the same as a C++ code…

Fortunately, in case of C++, there already is a standard solution for passing

bounds information alongside the pointer, namely the ``std::span``.

With the help of safe standard containers such as ``std::span``, you can achieve

bounds safety by *writing better modern C++ code*. The proposed C++ modernizer

is designed to help you with that.

aaron.ballmanUnsubmitted

Done

==============

- Let’s start with how your code would need to look like to comply with

+ Let’s start with what your code would need to look like to comply with

the programming model.

aaron.ballman:

But, Really, What About Plain C?

--------------------------------

While plain C does not offer a natural way to combine pointers with bounds

information, parts of this solution may prove useful. In particular,

``#pragma clang unsafe_buffer_usage`` and ``#pragma clang only_safe_buffers``

may be used for isolating code that deals with buffers. Such code can be

carefully audited and the rest of the code may be taught to only interact with

buffers through functions provided by the audited code. Additionally, attribute

``[[clang::unsafe_buffer_usage]]`` can be used for annotating functions

steakhalUnsubmitted

Done

allocated with ``new[]``; it doesn’t assume unique ownership so you can copy

- such span if you want, just like a raw pointer, but of course you’ll have to

+ such span if you want, just like a raw pointer, but of course, you’ll have to

manually ``delete[]`` it when appropriate.

steakhal:

that have bounds-safe alternatives.

aaron.ballmanUnsubmitted

Done

We should mention as part of the model description that pointer-chasing structures like linked lists, trees, etc are not covered by the model.

(That said, I suspect the *implementation* of such things might be covered. e.g., a tree with Node *Children[])

aaron.ballman: We should mention as part of the model description that pointer-chasing structures like linked…

NoQAuthorUnsubmitted

Done

Added a couple paragraphs above the list to cover that.

NoQ: Added a couple paragraphs above the list to cover that.

However, no automatic code modernizer for plain C is not provided,

aaron.ballmanUnsubmitted

Not Done

that have bounds-safe alternatives.

- However, no automatic code modernizer for plain C is not provided,

+ However, automatic code modernization for plain C is not provided,

and the hardened C++ standard library cannot benefit C code, which limits

aaron.ballman:

and the hardened C++ standard library cannot benefit C code, which limits

usefulness of the proposed integrated programming model in environments

xazax.hunUnsubmitted

Done

I think in many cases it is more desirable to accept a view as a view can be constructed from multiple different containers. An API that accepts a span can work with std::vectors and std::arrays or even some 3rd party containers. This makes me think maybe mentioning views first and the other options later would result in a better guideline.

xazax.hun: I think in many cases it is more desirable to accept a view as a view can be constructed from…

where C++ cannot be used.

aaron.ballmanUnsubmitted

Done

And what should C users do?

aaron.ballman: And what should C users do?

The Programming Model for C++

=============================

aaron.ballmanUnsubmitted

Done

a third-party library you’re using, and these interfaces require you to pass

- buffers as raw pointers, make sure you “containerize” these buffers

+ buffers as raw pointers, make sure you "containerize" these buffers

*as soon as possible*. For example, if the library returns a raw buffer

Similar to the single quote situation.

aaron.ballman: Similar to the single quote situation.

Let's start with what your C++ code would need to look like in order to

reap benefits of the safe buffers programming model.

Because the model focuses on preventing buffer overflows, other pointer-based

data structures that don't deal with buffers (such as linked lists or trees)

aren't affected by it, don't require any code changes, nor would they benefit

from additional security once the rest of the code is transformed

to comply to the model. In particular, problems like use-after-free and

iterator invalidation are out of scope.

steakhalUnsubmitted

Done

Say, ``std::vector`` or ``std::span`` may turn out to be poorly suitable

- for your needs, and this is fine. In such cases the same guidelines apply.

+ for your needs, and this is fine. In such cases, the same guidelines apply.

You may have to use the opt-out pragma on the implementation of

steakhal:

For dealing with buffers, the programming model encourages you to make your code

provide correct bounds information near every buffer access. Such information

can be consumed either by the standard library to conduct runtime bounds checks,

or by the code itself with assertions or safe idioms such as range-based

for-loops. In order to keep bounds information correct, the following

aaron.ballmanUnsubmitted

Done

I would be surprised if we can find a heuristic that we'd feel confident is correct for most situations. e.g., Foo buffer[10]; might be a flat array... or it might be a ring buffer without benefit of a wrapper class... or it may be a sparse matrix... and so on.

aaron.ballman: I would be surprised if we can find a heuristic that we'd feel confident is correct for most…

NoQAuthorUnsubmitted

Done

Yeah it's very non-trivial. A much harder call than the legendary hack in the fixit for __attribute__((fallthrough)).

I'm thinking of a combination of a clang flag and an attribute, to mark sufficiently suitable classes and ask the user to pick one of them as the new default.

It's not a high priority for us, and it'll definitely require some work to formalize the precise class interface / contract we're relying on in our fixits, but I totally see us doing it eventually.

NoQ: Yeah it's very non-trivial. A much harder call than the legendary hack in the fixit for…

guidelines should be followed:

aaron.ballmanUnsubmitted

Done

I think this would be a good idea, especially if you can use an example of a more involved (but still common) container. Like, how does someone harden a ring buffer?

aaron.ballman: I think this would be a good idea, especially if you can use an example of a more involved (but…

jkorousUnsubmitted

Done

I believe that we should stick to hardening low-level primitives - all that a ring buffer implementation has to do is to store data in std::array from hardened libc++ and it's all set.

jkorous: I believe that we should stick to hardening low-level primitives - all that a ring buffer…

xazax.hunUnsubmitted

Done

I think it would be great to link to some other resources like how to annotate your container such that address sanitizer can help you catch OOB accesses.

xazax.hun: I think it would be great to link to some other resources like how to annotate your container…

aaron.ballmanUnsubmitted

Done

I believe that we should stick to hardening low-level primitives - all that a ring buffer implementation has to do is to store data in std::array from hardened libc++ and it's all set.

That doesn't help real world projects like LLVM where we replace the low-level primitives when they're not appropriate for our domain. I think we should assume that users will implement their own containers and support that situation from the start.

aaron.ballman: > I believe that we should stick to hardening low-level primitives - all that a ring buffer…

NoQAuthorUnsubmitted

Done

Yes, so I think we want to eventually cover low-level primitives like SmallVector, and then ask people to implement their higher-level primitives like ring buffers in terms of the covered low-level primitives.

The situation when people really need to have ultra-optimized high-level primitives that are implemented directly with raw buffers, and then optionally harden that, becomes significantly more rare. In such cases maybe it's ok not to have a clear step-by-step guide. But a ring buffer still makes a good example! I'll think more about that.

NoQ: Yes, so I think we want to eventually cover low-level primitives like `SmallVector`, and then…

1. When you allocate a buffer, allocate a container instead. For example,

if you find yourself allocating a fixed-size array, use ``std::array``.

If you need to allocate a stack or heap buffer of variable length,

``std::vector`` can often act as a drop-in replacement. A vector can do

a lot more than that, but you can always resort to a fill-constructor that

preallocates the buffer of necessary size, and then never use any resizing

operations. This gives you a simple safe buffer at no extra cost other than

the cost of safety. Another good solution is ``std::span`` which you can

xazax.hunUnsubmitted

Done

Isn't this too restrictive? How about arrays where both the index and the size of the array is known at compile time?

Also, what about subscripts in consteval code where the compiler should diagnose OOB accesses at compile time?

I believe this model can be made more ergonomic without losing any of the guarantees.

xazax.hun: Isn't this too restrictive? How about arrays where both the index and the size of the array is…

xazax.hunUnsubmitted

Done

Small ping on this point. I think there are many code patterns that are completely safe (i.e., the compiler can diagnose OOB accesses), but the current model would ban. One example is converting an enum value to string using an array of string_views. In those cases, both enum consts' value and the array's size are known at compile time. I think those easy to diagnose special cases should be permitted to make programming more ergonomic. The more ergonomic the experience, the faster the adoption will be.

xazax.hun: Small ping on this point. I think there are many code patterns that are completely safe (i.e.

NoQAuthorUnsubmitted

Done

Yes, we'll probably add occasional suppressions for known safe cases like this, as long as we're sure we'll diagnose if safety is lost in later code changes. And then document them here.

It's probably even reasonable to suppress the warning when the index is a loop counter and the loop bound is known at compile time to be sufficiently small. Though it'll take some work to implement (re-use static analyzer's loop unrolling prerequisites code?)

(uh-oh, I thought I sent this reply before the conference)

NoQ: Yes, we'll probably add occasional suppressions for known safe cases like this, as long as…

NoQAuthorUnsubmitted

Done

Added a clause about known sizes. I'll remove it if we decide it's not feasible.

NoQ: Added a clause about known sizes. I'll remove it if we decide it's not feasible.

initialize with a heap pointer allocated with ``new[]``; it doesn't assume

unique ownership so you can copy such span if you want, just like

a raw pointer, but of course, you'll have to manually ``delete[]`` it

steakhalUnsubmitted

Done

- Addition or subtraction of a number to/from a raw pointer with operators

- ``+``, ``-``, ``+=``, ``-=``,

+ ``+``, ``-``, ``+=``, ``-=``,

- unless that number is a compile time constant ``0``;

Sphinx is picky about indentation, and it also complains about the missing new-line prior to the sub-list.
The same also applies to the very next sub-list.

steakhal: Sphinx is picky about indentation, and it also complains about the missing new-line prior to…

NoQAuthorUnsubmitted

Done

Uh-oh, great catch!

NoQ: Uh-oh, great catch!

when appropriate. While it may be desirable to use ``std::shared_ptr<T>``

aaron.ballmanUnsubmitted

Not Done

One case I think this is going to trip up is: this + 1 (which is not uncommon in practice: https://sourcegraph.com/search?q=context:global+-file:.*test.*+file:.*%5C.cpp+this+%2B+1&patternType=standard). How do users fix up that pointer arithmetic and do they ever get any safety or security benefits from it? (Note, this doesn't apply to +=, just +.)

aaron.ballman: One case I think this is going to trip up is: `this + 1` (which is not uncommon in practice…

jkorousUnsubmitted

Not Done

That's a fair point and if we don't find a good solution we should ignore arithmetic with this and a constant.

jkorous: That's a fair point and if we don't find a good solution we should ignore arithmetic with…

NoQAuthorUnsubmitted

Done

I'll leave this unaddressed for now. It looks like these constructs typically *do* indicate that the remaining part of the object is a buffer. C++ doesn't seem to provide a good "safe" way to represent such objects. So, yeah, we need more data.

NoQ: I'll leave this unaddressed for now. It looks like these constructs typically *do* indicate…

aaron.ballmanUnsubmitted

Not Done

Fair enough -- it's worth documenting explicitly as an open question. FWIW, I agree that the construct is mostly used for accessing trailing objects as a buffer. The closest thing I've seen to a safe way to handle this is how LLVM does it with the TrailingObjects mixin.

aaron.ballman: Fair enough -- it's worth documenting explicitly as an open question. FWIW, I agree that the…

for shared access to the buffer, note that the standard library cannot

provide hardening in such cases.

2. When you provide function or class interfaces that accept or return

raw buffers, consider using standard "view" classes such as ``std::span`` or

aaron.ballmanUnsubmitted

Done

NULL as well, I presume? (This matters for C where NULL often expands to (void *)0.)

aaron.ballman: `NULL` as well, I presume? (This matters for C where `NULL` often expands to `(void *)0`.)

NoQAuthorUnsubmitted

Done

Hmm, I was thinking that ((void *)0) is a compile-time constant 0. Like, I didn't say it's a *literal*. Is there a better term?

NoQ: Hmm, I was thinking that `((void *)0)` is a compile-time constant `0`. Like, I didn't say it's…

aaron.ballmanUnsubmitted

Done

Are you okay with constant expressions? e.g.,

consteval int foo() { return 0; }

void func(char *ptr) {
  char *offset_ptr = ptr + foo(); // Is this fine?
}

aaron.ballman: Are you okay with constant expressions? e.g., ``` consteval int foo() { return 0; } void func…

NoQAuthorUnsubmitted

Done

Yes sure, why not? If it ever becomes unsafe, we'll immediately warn.

NoQ: Yes sure, why not? If it ever becomes unsafe, we'll immediately warn.

aaron.ballmanUnsubmitted

Not Done

Ah, C terminology is biting me here -- we use the term "constant" the same way C++ uses the term "literal", so when I read "constant 0" my brain interpreted that as literally just 0.

aaron.ballman: Ah, C terminology is biting me here -- we use the term "constant" the same way C++ uses the…

``std::string_view``. Spans are particularly useful as they can be implicitly

aaron.ballmanUnsubmitted

Done

Eventually we should extend this to have an exhaustive list.

aaron.ballman: Eventually we should extend this to have an exhaustive list.

jkorousUnsubmitted

Done

Absolutely! Is it ok if we populate the list as we land the implementation?

jkorous: Absolutely! Is it ok if we populate the list as we land the implementation?

aaron.ballmanUnsubmitted

Done

Populating the list as we land the implementation sounds great to me.

aaron.ballman: Populating the list as we land the implementation sounds great to me.

constructed from arbitrary containers which keeps the interface flexible.

steakhalUnsubmitted

Done

- a number of C/C++ standard library buffer manipulation functions

- (such as std::memcpy() or std::next()) are hardcoded to act

+ (such as ``std::memcpy()`` or ``std::next()``) are hardcoded to act

as if they had the attribute.

steakhal:

Alternatively, you can pass the original container – move it or pass

by reference or use a smart pointer, but that locks the function down to

a specific container type, so this is only useful if you want to use

container-specific methods inside the function.

aaron.ballmanUnsubmitted

Not Done

The warning doesn’t warn on single pointer use in general, such as

- dereference with operations like * and -> or [0]. If such operation causes

+ dereferencing operations like * and -> or [0]. If such an operation causes

a buffer overflow, there’s probably another unsafe operation nearby,

aaron.ballman:

steakhalUnsubmitted

Not Done

The warning doesn’t warn on single pointer use in general, such as

- dereference with operations like * and -> or [0]. If such operation causes

+ dereference with operations like ``*`` and ``->`` or ``[0]``. If such operation causes

a buffer overflow, there’s probably another unsafe operation nearby,

steakhal:

3. If the interface you're providing needs to preserve compatibility with

other code already written to use the old raw buffer interface, annotate the

aaron.ballmanUnsubmitted

Not Done

dereference with operations like * and -> or [0]. If such operation causes

- a buffer overflow, there’s probably another unsafe operation nearby,

+ a buffer overflow, there’s probably another unsafe operation nearby

that the warning does warn about. Pointer-based data structures

such as linked lists or trees are allowed as they don't typically cause

aaron.ballman:

raw buffer interface function with the ``[[clang::unsafe_buffer_usage]]``

attribute. This would inform your clients that there exists a safer

alternative so that they could gracefully convert their code to comply with

the programming model as well.

4. If you deal with interfaces you cannot change, such as an interface of

a third-party library you're using, and these interfaces require you to pass

buffers as raw pointers, make sure you "containerize" these buffers

*as soon as possible*. For example, if the library returns a raw buffer

steakhalUnsubmitted

Done

What does auto-attached mean?

steakhal: What does //auto-attached// mean?

NoQAuthorUnsubmitted

Done

Means they show up in fixits as suggested code. Clarified, thanks!

NoQ: Means they show up in fixits as suggested code. Clarified, thanks!

pointer, put it into ``std::span`` in order to immediately write down

the size of that buffer for the purposes of future bounds checks.

Then keep such buffers contained this way *for as long as possible*,

especially when passing across function boundaries. Say, until you need to

pass it back to that library, you can pass ``std::span`` by value between

your functions to preserve precise size information naturally.

5. Sometimes you will find yourself implementing a custom container.

Say, ``std::vector`` or ``std::span`` may turn out to be poorly suitable

for your needs, and this is fine. In such cases, the same guidelines apply.

You may have to use the opt-out pragma on the implementation of

the container – that's exactly their purpose! Additionally, consider

implementing runtime checks in your container similar to the ones already

present in hardened libc++, because following the guidelines alone is often

insufficient without such hardening.

(TODO: Will automatic fixits be able to suggest custom containers or views?)

(TODO: Explain how to implement such checks in a custom container?)

Compiler Tooling And Enforcement

================================

Now that we know what the code needs to look like, let's talk about what can

the compiler do for you to help you with converting your code, as well as

making sure that the code stays safe after later changes.

aaron.ballmanUnsubmitted

Not Done

Have you considered allowing a short-hand form of the pragmas that are scoped to the current block scope (or file scope if applied at global scope)? e.g.,

void func() {
  {
    #pragma only_safe_buffers
    int stuff[10] = {};
    int x = stuff[1]; // Warning: you're a bad person who should feel bad
  }
  int array[10] = {};
  int y = array[1]; // this_is_fine.png
}

aaron.ballman: Have you considered allowing a short-hand form of the pragmas that are scoped to the current…

jkorousUnsubmitted

Not Done

We discussed other options but generally prefer simple pragmas for their flexibility.
Our current thinking is that for readability purposes we won't allow nesting of pragmas and would have end of scope to be explicit.
While this might be more verbose it would be dead simple to reason about.

jkorous: We discussed other options but generally prefer simple pragmas for their flexibility. Our…

aaron.ballmanUnsubmitted

Not Done

We discussed other options but generally prefer simple pragmas for their flexibility.

A single pragma with lexical scoping behaves like RAII, which is often a replacement for more complex acquire/release paired operations and is a pretty well-understood pattern in C++.

Our current thinking is that for readability purposes we won't allow nesting of pragmas and would have end of scope to be explicit.
While this might be more verbose it would be dead simple to reason about.

True, it is easy to reason about. But not allowing nesting for readability purposes removes some utility, especially for folks in C where macros are much more common. e.g.,

#define ONLY_SAFE_BUFFERS _Pragma("only_safe_buffers")

#define AWESOME_BUFFER_THING(ptr, size) ({ ONLY_SAFE_BUFFERS; (interesting work goes here) })

void func(int *ptr, size_t size) {
  ONLY_SAFE_BUFFERS;
  ... interesting code lives here ...
  AWESOME_BUFFER_THING(ptr, size); // Oops, this is nested.
 .. more interesting code lives here ...
}

(You can hit the same situation with paired pragmas.) Because C doesn't have constexpr or consteval support for functions or templates, function-like macros are a pretty common interface folks use in practice and macro expansion makes it somewhat hard to avoid potential surprising conflicts.

aaron.ballman: > We discussed other options but generally prefer simple pragmas for their flexibility. A…

jkorousUnsubmitted

Not Done

It is desirable to minimize the extent of code labeled as unsafe - if a single line of code is unsafe then it'd be suboptimal do label a whole scope as unsafe.
Since scopes affect objects lifetime in C++ we won't be able to generally recommend adding a scope around the unsafe line either as that might change behavior of the code.
In this sense associating the pragmas' semantics with scopes reduces their flexibility. Is there any benefit of doing that which would outweigh this?

Separately from that - in theory the users could just end the scope before the macro:

ONLY_SAFE_BUFFERS_BEGIN;
... interesting code lives here ...
ONLY_SAFE_BUFFERS_END;
AWESOME_BUFFER_THING(ptr, size); // Oops, this is nested.

But point taken - there might be options for better ergonomics.

jkorous: It is desirable to minimize the extent of code labeled as unsafe - if a single line of code is…

aaron.ballmanUnsubmitted

Not Done

In this sense associating the pragmas' semantics with scopes reduces their flexibility. Is there any benefit of doing that which would outweigh this?

FWIW, I wasn't suggesting we support *only* the scoped form, but that we *also* supported a scoped form. The benefit really boils down to more complex functions (generally ones in need of refactoring anyway) where you already have an appropriate scope handy and pairing the pragmas would be clutter. e.g.,

if (ptr) {
  #pragma clang unsafe_buffer_usage
  ptr += 12;
}

if (ptr) {
  #pragma clang unsafe_buffer_usage begin
  ptr += 12;
  #pragma clang unsafe_buffer_usage end
}

I don't see a need to write two pragmas in these kinds of cases where the compound scope is already a clear delimiter. WDYT?

aaron.ballman: > In this sense associating the pragmas' semantics with scopes reduces their flexibility. Is…

The Warning

-----------

steakhalUnsubmitted

Done

but also provide essential “escape hatches” when the programming model

- is undesirable for a section of code (such as tight loop in performance-critical

+ is undesirable for a section of code (such as tight loops in performance-critical

code, or implementation of a custom container). In such cases sections of code

steakhal:

The warning ``-Wunsafe-buffer-usage`` warns on the following operations:

aaron.ballmanUnsubmitted

Done

is undesirable for a section of code (such as tight loop in performance-critical

- code, or implementation of a custom container). In such cases sections of code

+ code, or implementation of a custom container). In such cases, sections of code

with unsafe buffer usage deserve being explicitly marked and easily auditable

aaron.ballman:

- Array subscript expression on raw arrays or raw pointers,

- unless the index is a compile-time constant ``0``,

- or, in case of arrays, if both the index and the array size is known

at compile time and the index is within bounds;

- Increment and decrement of a raw pointer with operators ``++`` and ``--``;

- Addition or subtraction of a number to/from a raw pointer with operators

``+``, ``-``, ``+=``, ``-=``,

aaron.ballmanUnsubmitted

Not Done

a custom pragma is preferable because the generic solution’s push/pop mechanism may be confusing.

Er, I'm not certain how much I agree with that. push/pop semantics on this have been around for a *long* time. I do agree that the user can write code such that another user may struggle to understand what the state of the system is after a pop operation, but that shouldn't be a deciding factor for whether we add a new pragma to do the same thing as another pragma. I think I'd rewrite this as:

Similar functionality can be achieved with ``#pragma clang diagnostic``, but such use is discouraged because it
reduces code readability due to being an imperative action rather than describing a property of the annotated
code. That reduced readability also adds complexity to security code audits. Use of one of the dedicated
pragmas is strongly recommended.

or something along these lines. WDYT?

aaron.ballman: > a custom pragma is preferable because the generic solution’s push/pop mechanism may be…

jkorousUnsubmitted

Not Done

We'd love to arrive at a design that is very difficult to use incorrectly.

You are absolutely right that our preference for not using the diagnostic pragmas stems from the "untyped" pop operation.

Assuming this code is a starting point:

#pragma clang diagnostic push warning "-Wfoo"
// code
#pragma clang diagnostic pop //end of -Wfoo
// more code

if the user adds pragmas like this:

#pragma clang diagnostic push warning "-Wfoo"
#pragma clang diagnostic push warning "-Wsafe-buffers-usage"
// code
#pragma clang diagnostic pop //end of -Wfoo
// more code
#pragma clang diagnostic pop //end of -Wsafe-buffers-usage

they actually get this behavior:

#pragma clang diagnostic push warning "-Wfoo"
#pragma clang diagnostic push warning "-Wsafe-buffers-usage"
// code
#pragma clang diagnostic pop //end of -Wsafe-buffers-usage
// more code
#pragma clang diagnostic pop //end of -Wfoo

We should have the patch for pragmas ready to be put for review here in not too distant future. Maybe it'd be easier to discuss there?

jkorous: We'd love to arrive at a design that is very difficult to use incorrectly. You are absolutely…

aaron.ballmanUnsubmitted

Not Done

We should have the patch for pragmas ready to be put for review here in not too distant future. Maybe it'd be easier to discuss there?

Certainly! I was mostly trying to reword what you had to keep the same intent but not make it sound like the basis for the pragmas is "we think these other pragmas, which do work, were not good enough" because that's not very technically compelling.

aaron.ballman: > We should have the patch for pragmas ready to be put for review here in not too distant…

NoQAuthorUnsubmitted

Done

Reworded a bit and added the example!

NoQ: Reworded a bit and added the example!

- unless that number is a compile time constant ``0``;

- subtraction between two pointers is also fine;

- Passing a pointer through a function parameter annotated with

attribute ``[[clang::unsafe_buffer_usage]]``,

aaron.ballmanUnsubmitted

Done

That's stealing a name from the C and C++ committees because it's not behind a vendor attribute namespace. I think it should be [[clang::unsafe_buffer_usage]] or unsafe_buffer_usage (with no specific syntax) instead.

aaron.ballman: That's stealing a name from the C and C++ committees because it's not behind a vendor attribute…

NoQAuthorUnsubmitted

Done

Yes, absolutely, I'll update that as we get closer to landing the attribute.

NoQ: Yes, absolutely, I'll update that as we get closer to landing the attribute.

- unless the pointer is a compile time constant ``0`` or ``nullptr``

(possibly a result of simple operations, such as C-style ``NULL``

that expands to ``((void) 0)`` which can be "folded" to ``0``

at compile time);

- a number of C/C++ standard library buffer manipulation functions

(such as ``std::memcpy()`` or ``std::next()``) are hardcoded to act

as if they had the attribute.

The warning doesn't warn on single pointer use in general, such as

dereferencing operations like ``*`` or ``->`` or ``[0]``. If such operation

causes a buffer overflow, there's probably another unsafe operation nearby

that the warning does warn about. Pointer-based data structures

such as linked lists or trees are allowed as they don't typically cause

buffer overflows. "Temporal" safety issues that arise from using raw pointers,

such use-after-free, null pointer dereference, dangling pointers,

reference invalidation, are out of scope for this warning.

The warning also doesn't warn every time a pointer is passed into a function,

but only when the function is annotated with the attribute. Because

aaron.ballmanUnsubmitted

Not Done

One thing I think is worth pointing out about these docs is that the first example effectively says "do add the attribute because the size passed in to the function could be wrong" and the second example says "don't add the attribute on the assumption that the container has the correct size information". The advice feels a bit conflicting to me because in one case we're assuming callers pass in incorrect values and in the other case we're assuming callers pass in correct values and the only distinction between them is a "container was used". But a significant portion of our users don't use libc++ (they use libstdc++ or MS STL for example).

I think we should have more details on why the STL used at runtime doesn't matter, or if it still really does matter, we may want to reconsider the advice we give.

Also, we don't give similar advice for use of the pragmas. Should we? (Maybe split the advice as to when to use markings in general out into its own section and reference it from both the pragma and attribute sections?)

aaron.ballman: One thing I think is worth pointing out about these docs is that the first example effectively…

jkorousUnsubmitted

Not Done

One thing I think is worth pointing out about these docs is that the first example effectively says "do add the attribute because the size passed in to the function could be wrong" and the second example says "don't add the attribute on the assumption that the container has the correct size information". The advice feels a bit conflicting to me because in one case we're assuming callers pass in incorrect values and in the other case we're assuming callers pass in correct values and the only distinction between them is a "container was used". But a significant portion of our users don't use libc++ (they use libstdc++ or MS STL for example).

Our model depends on Standard Library used implementing the bounds checks.
With that the "container was used" distinction is absolutely crucial - that is what adds the bounds checks and already mitigates certain classes of bugs (even if the span passed in still has the same wrong size).

Let's assume we're starting with this buggy code:

void caller() {
  int buffer[10];
  callee(buffer, 20);
}

void callee(int* ptr, unsigned size) {
  ptr[size-1] = 42;
  ptr[size] = 42;
  ptr[100] = 42;
}

and transform it to (this is just a part of what we actually plan but hopefully illustrates the point):

void caller() {
  int buffer[10];
  callee(std::span<int>(buffer, 20), 20);
}

void callee(std::span<int> ptr, unsigned size) {
  ptr[size-1] = 42; // still unmitigated
  ptr[size] = 42; // mitigated at runtime by std::span::operator[]() in hardened libc++
  ptr[100] = 42; // mitigated at runtime by std::span::operator[]() in hardened libc++
}

jkorous: > One thing I think is worth pointing out about these docs is that the first example…

aaron.ballmanUnsubmitted

Not Done

Our model depends on Standard Library used implementing the bounds checks.

So will the model consider things like libstdc++ that don't implement the bounds checks? Or MSVC STL with Safe Library Support (https://learn.microsoft.com/en-us/cpp/standard-library/safe-libraries-cpp-standard-library?view=msvc-170) which seems to have some of the hardening you're talking about, but not all of it?

void caller() {
  int buffer[10];
  callee(std::span<int>(buffer, 20), 20);
}

void callee(std::span<int> ptr, unsigned size) {
  ptr[size-1] = 42; // still unmitigated
  ptr[size] = 42; // mitigated at runtime by std::span::operator[]() in hardened libc++
  ptr[100] = 42; // mitigated at runtime by std::span::operator[]() in hardened libc++
}

Understood; my point was that the comments in callee() are incorrect in other STL implementations. The model seems to presume all STLs are hardened (or will be in the relatively near future) but I don't think that's a valid presumption (for example, STLs exist which are focused solely on performance, like EASTL). So that makes me question whether the model *should* presume that container state is valid when it comes from an untrusted source.

aaron.ballman: > Our model depends on Standard Library used implementing the bounds checks. So will the model…

jkorousUnsubmitted

Not Done

The warnings about unsafe buffer usage would be valid no matter what Standard Library implementation is used and I can imagine the FixIts to exist in different flavors.
We don't plan to target any other Standard Library implementation but the machinery we'll bring up should allow that to be implemented in the future.

jkorous: The warnings about unsafe buffer usage would be valid no matter what Standard Library…

NoQAuthorUnsubmitted

Done

One thing I think is worth pointing out about these docs is that the first example effectively says "do add the attribute because the size passed in to the function could be wrong" and the second example says "don't add the attribute on the assumption that the container has the correct size information". The advice feels a bit conflicting to me because in one case we're assuming callers pass in incorrect values and in the other case we're assuming callers pass in correct values and the only distinction between them is a "container was used".

Yeah that's an important thing for us to properly formulate, let me try to elaborate on what I meant.

So we trust containers and spans, like we trust smart pointers: if you simply use them "normally", they're correct by construction. You have to go out of your way to get them wrong. Like, you take an array, it knows the size, you implicitly convert it to span, it's now well-formed, you pass it from function to function, it's still well-formed as it still carries the original bounds information, you subspan it... well this needs a runtime bounds check yeah, but other than that it's hard to get it wrong. And our tool is supposed to transform the entire program to do exactly this. And once we're there, none of these operations require manual intervention from the user, so there's very little room for error.

On the contrary, if bounds information is passed separately, there's a lot of room for error every single time it's passed from function to function. There's literally nothing that prevents such information from going out of sync. There's nothing that prevents the user from passing a size of data he *intends to be written* into the buffer, instead of the size of the buffer. So we don't trust that.

Does that make sense?

NoQ: > One thing I think is worth pointing out about these docs is that the first example…

aaron.ballmanUnsubmitted

Not Done

Does that make sense?

It does, but the situation I'm thinking of is where the user is migrating their unsafe code to this new programming mode and aren't able to use libc++ to get the extra hardening checks. Consider this contrived example:

extern "C" void func(int *array, size_t count) {
  // Do buffer things with array and count
}

The user can't (easily) modify this code to accept a span directly because that could potentially break ABI. So they wrap the data in a span:

extern "C" void func(int *array, size_t count) {
  std::span<int> sp(array, count);
  // Do buffer things with sp
}

In both cases, the user is taking it on faith that the data passed into the function is correct. But you want the user to believe the second case is more trustworthy because it uses span -- but more trustworthy based on what? There's no hardened checks in the STL they're using and the potentially-wrong information is the same flavor of potentially-wrong in either case.

My fear is that we push users to make changes like that by telling them it's safer when it's actually functionally equivalent instead, but when they migrates their code they accidentally *introduce* a new bug where there wasn't one before.

I'm used to thinking about this sort of stuff as a kind of taint analysis. We need to know where that data comes from in order to know whether we can trust it or not. Did the "count" come from user input on the command line... or did it come from a call to .size() on some other container? These have different levels of trust (to me).

Btw, this brings up a different question about fix-it hints. Are you planning to add fix-it hints to interface boundaries? If so, are you planning to pay attention to things like extern "C" or the function linkage to try to reduce the amount of ABI breaks suggested?

aaron.ballman: > Does that make sense? It does, but the situation I'm thinking of is where the user is…

NoQAuthorUnsubmitted

Done

In both cases, the user is taking it on faith that the data passed into the function is correct. But you want the user to believe the second case is more trustworthy because it uses span

No, that's definitely not our intention. That's why I'm explicitly specifying that this assumption only applies to containers constructed *outside* the function. In your example both functions should wear the attribute. And there's no way to produce a safe alternative (which wouldn't need [[unsafe_buffer_usage]]) under extern "C" linkage.

But on the C side, there's no expectation of having a safe alternative in the first place. Calls to this function become just another unsafe operation, which we can hope to isolate and reduce, but never to eliminate. So it's fine that a safe alternative isn't provided at all.

Btw, this brings up a different question about fix-it hints. Are you planning to add fix-it hints to interface boundaries? If so, are you planning to pay attention to things like extern "C" or the function linkage to try to reduce the amount of ABI breaks suggested?

So, yes, we want fixes on interface boundaries, but we only want fixes that don't break source or binary compatibility. So our fixes will always preserve the original interface, possibly annotate it as unsafe, and then provide a safe alternative that call sites can, but *don't have to*, transition to. So in this case we're aiming for something like this:

[[clang::unsafe_buffer_usage]]
extern "C" void func(int *array, size_t count) {
  func(std::span<int>(array, count));
}

void func(std::span<int> sp) {
  // Do buffer things with sp
}

where the original function is preserved, the code is not duplicated, and the safe function doesn't have extern "C". In this case the original function is still unsafe (wears the attribute) because the container isn't coming from outside.

NoQ: > In both cases, the user is taking it on faith that the data passed into the function is…

aaron.ballmanUnsubmitted

Not Done

In both cases, the user is taking it on faith that the data passed into the function is correct. But you want the user to believe the second case is more trustworthy because it uses span

No, that's definitely not our intention. That's why I'm explicitly specifying that this assumption only applies to containers constructed *outside* the function. In your example both functions should wear the attribute. And there's no way to produce a safe alternative (which wouldn't need [[unsafe_buffer_usage]]) under extern "C" linkage.

My point is that the assumption seems invalid to me -- it's a matter of garbage-in, garbage-out, but you're saying "this potentially garbage-in situation is assumed to not happen" and then building analysis checks around that assumption. I realize we have to make some heuristic decisions for the analysis, but this one seems dangerous for something attempting to improve buffer safety because nothing validates that assumption. Or do you have plans for the future to add an analysis to cover the situation of shoving garbage into the span?

To be clear, this is the situation I'm concerned about:

// SomeLibraryTU.cpp
void func(int *buffer, size_t count) { // Old interface for compatibility
  func(std::span<int>(buffer, count));
}

void func(std::span<int> sp) {  // New, safe interface
  if (!sp.empty()) {
    int *data = sp.data();
    // Do stuff on data (other than pointer arithmetic, only looking at first element)
  }
}

// SomeApplicationTU.cpp
int main() {
  func(0, 1); // Oops, wrong count!
}

My complaint is that the library code is perfectly safe but the application code is not, and it sounds like this situation is purposefully not considered as part of this analysis... but it seems like this situation is the common case we'd *want* to catch.

aaron.ballman: >>In both cases, the user is taking it on faith that the data passed into the function is…

NoQAuthorUnsubmitted

Done

Or do you have plans for the future to add an analysis to cover the situation of shoving garbage into the span?

We do have plans for a static analyzer checker that warns about shoving garbage into the span! - that's the last bullet point in the intro. We expect it to be disproportionally effective within this programming model because the programming model recommends constructing the span "as early as possible", pretty much immediately after allocation, which makes the problem very local and therefore easy to detect with path-sensitive analysis.

We also think it's much harder to shove garbage into span in the first place. The size parameter of the span is extremely unambiguous, there's nothing else you can think of putting there. With an arbitrary function you see for the first time, it's tempting to pass eg. the amount of data you expect to be written into the destination buffer, instead of the amount of room in the destination buffer. Span eliminates this problem because you know that whatever you put into the size parameter is a property of the span itself, it has nothing to do with the function you pass it into. In a lot of cases you don't even *have to* specify the size, you can simply implicitly-convert your container into a span of the right size.

That said, this isn't exactly why we make this distinction. What we're trying to do here is reduce *the number of times* a span needs to be manually constructed (and then decomposed back into pointer and size, only to be constructed again). Because our goal is to construct every span manually at most once, right next to the allocation. And when you're literally looking at the allocation, the size becomes *obvious*. And after that, by the virtue of conforming to the model there's no longer a need to use unsafe wrap-unwrap operations over those spans, so everything becomes correct by construction.

So despite not addressing the problem of initial span construction directly, this choice decreases the amount of times the programmer needs to solve this problem, and it decreases the complexity of this problem by making the programmer solve it only at those few moments of time when it's the most convenient to solve.

And, yes, a static analyzer checker on top of that.

To be clear, this is the situation I'm concerned about:

Hmm I'm not sure I understand this example, how does the lack of buffer operations in the new function kicks in here? It's definitely an expected false negative when a buffer of exactly one element is always used in the code and we therefore think that it's not a buffer at all. But why would you want to use a span at all in such case?

And regardless of this problem, if the library vendor consciously provides a new interface in order to intentionally make the code compliant with our model, then they'd also add an attribute on the old interface (but not on the new interface) according to the requirements we're discussing. In this case a misuse of the old interface will issue a warning "please use the new interface" and a misuse of the new interface becomes more difficult due to the above considerations, so everything appears to be working as intended(?)

NoQ: > Or do you have plans for the future to add an analysis to cover the situation of shoving…

aaron.ballmanUnsubmitted

Not Done

We do have plans for a static analyzer checker that warns about shoving garbage into the span! - that's the last bullet point in the intro.

Ah, perfect!

We also think it's much harder to shove garbage into span in the first place. The size parameter of the span is extremely unambiguous, there's nothing else you can think of putting there.

Hehe, oh to be this optimistic still. :-D The scenario I expect to see garbage in span is from data that can be user-controlled (the exact stuff you're worried about, to be honest!): pulling in size data from the network or a file format, etc. Something where it's not at all accidental that it's garbage going into the span, but is actually malicious input from an attacker.

To be clear, this is the situation I'm concerned about:

Hmm I'm not sure I understand this example, how does the lack of buffer operations in the new function kicks in here?

It doesn't have to lack buffer operations, I was just trying to make the simplest example I could and I probably obscured my point. My point is mostly that a library interface that takes buffer + size and then wraps it into a span to pass to a span interface is every bit as dangerous as not having the span interface at all. If the data going into the span is tainted, it's game over. So I worry about *assuming* the data in the span is valid and basing diagnostic decisions around that. However, because you're going to also have checkers specifically trying to catch the case where invalid data CAN be shoved into the span (like a taint analysis, I hope?), that basically resolves my concerns.

aaron.ballman: > We do have plans for a static analyzer checker that warns about shoving garbage into the span!

NoQAuthorUnsubmitted

Done

Yes, so it'll definitely cover basic cases where the allocation size is known to be smaller than span size on a given path (as constants or symbolically), eg.

int *p = new int[5];
std::span<int> sp(p, 7); // warn

int *q = new int[N];
std::span<int> sq(q, N + 1); // warn

int *r = new int[N];
if (M > N)
  std::span<int> sr(r, M); // warn

Then, yes, taint analysis could be a straightforward addition to that checker:

int *p = /* suppose we don't even know how it's allocated */;
size_t S;
scanf("%zu", &S);
std::span<int> sp(p, S); // warn

Though, of course, this will imply moving taint analysis out of alpha and making sure we have a way to annotate user-defined taint sources, but the checker can be taught to interact with that very easily. And such taint checker will be, again, disproporitonally effective within our programming model because spans are constructed immediately next to taint sources, so no need for heavy interprocedural analysis.

And maybe we'll consider an opt-in aggressive mode that warns every time the size isn't "related" to the allocation size (in terms of symbolic expressions, so assume M and N are actually unrelated values):

int *q = new int[N];
std::span<int> sq(q, M); // warn

where the warning can be suppressed with assert(M <= N); (we'll have to double-check that our constraint solver is up to the task). Which still has limited efficiency because the allocation size has to be known (even if symbolic), unlike the taint case where it's sufficient (but not necessary) to know that the span size unconstrained.

NoQ: Yes, so it'll definitely cover basic cases where the allocation size is known to be smaller…

the attribute can be added to functions by automatic fixits, the warning

and the fixes can propagate across function boundaries. The users are also

aaron.ballmanUnsubmitted

Not Done

The attribute is NOT warranted when the function has runtime protection against

- overflows, assuming hardened libc++, assuming that containers constructed

+ overflows, assuming use of a hardened libc++ and that containers constructed

outside the function are well-formed. For example, function ``bar()`` below

doesn’t need an attribute, because assuming buf is well-formed (has size that

aaron.ballman:

encouraged to annotate their unsafe functions manually. But the warning is

aaron.ballmanUnsubmitted

Not Done

outside the function are well-formed. For example, function ``bar()`` below

- doesn’t need an attribute, because assuming buf is well-formed (has size that

+ doesn’t need an attribute, because assuming ``buf`` is well-formed (has size that

fits the original buffer it refers to), hardened libc++ protects this function

aaron.ballman:

steakhalUnsubmitted

Not Done

outside the function are well-formed. For example, function ``bar()`` below

- doesn’t need an attribute, because assuming buf is well-formed (has size that

+ doesn’t need an attribute, because assuming ``buf`` is well-formed (has size that

fits the original buffer it refers to), hardened libc++ protects this function

steakhal:

not otherwise inter-procedural.

The Pragmas

-----------

In order to aid incremental adoption of the programming model, you are

encouraged to enable/disable the warning on a file-by-file basis. Additionally,

pragmas are provided to annotate sections of the code as opt-in to the model

and activate the warnings:

.. code-block:: c++

#pragma clang only_safe_buffers begin

...

#pragma clang only_safe_buffers end

or opt-out of the model and deactivate the warnings on a section of code:

.. code-block:: c++

#pragma clang unsafe_buffer_usage end

...

#pragma clang unsafe_buffer_usage begin

Such pragmas not only enable incremental adoption with much smaller granularity,

but also provide essential "escape hatches" when the programming model

is undesirable for a section of code (such as tight loops in

performance-critical code, or implementation of a custom container). In such

cases, sections of code with unsafe buffer usage deserve being explicitly marked

and easily auditable by security researches.

Even though similar functionality can be achieved with the generic pragma

``#pragma clang diagnostic``, our specialized pragmas are preferable because

they clearly document the property of the code within them. Additionally,

it is problematic to "interleave" different instances of

``#pragma clang diagnostic push`` and ``#pragma clang diagnostic pop``

between each other, as ``pop`` always pops the last ``push`` and there's no way

aaron.ballmanUnsubmitted

Not Done

inside the function is still valuable for internal consistency.

- Attribute ``[[unsafe_buffer_usage]]`` is similar to attribute [[deprecated]]

+ Attribute ``[[unsafe_buffer_usage]]`` is similar to attribute ``[[deprecated]]``

but it has important differences:

aaron.ballman:

to pop the one you want:

.. code-block:: c++

#pragma clang diagnostic push warning "-Wfoo"

aaron.ballmanUnsubmitted

Not Done

I don't think I understand what you're trying to say here.

aaron.ballman: I don't think I understand what you're trying to say here.

NoQAuthorUnsubmitted

Done

I was trying to say that `[[deprecated]] is probably appropriate if the vendor of the function expects *all* users of the function to move away from it.

But if the vendor thinks that the function is generally fine, and it's only important to move away from it when the -Wunsafe-buffer-usage analysis is requested by the user, then [[unsafe_buffer_usage]] is more appropriate as it only causes a warning to be emitted under that condition, not every time.

I'll wordsmith.

NoQ: I was trying to say that `[[deprecated]] is probably appropriate if the vendor of the function…

aaron.ballmanUnsubmitted

Not Done

I was trying to say that `[[deprecated]] is probably appropriate if the vendor of the function expects *all* users of the function to move away from it.

Ahhhh that makes a lot more sense to me, thanks!

aaron.ballman: > I was trying to say that `[[deprecated]] is probably appropriate if the vendor of the…

#pragma clang diagnostic push warning "-Wsafe-buffers-usage"

#pragma clang diagnostic pop //end of -Wsafe-buffers-usage

#pragma clang diagnostic pop //end of -Wfoo

aaron.ballmanUnsubmitted

Not Done

The deprecated attribute does not have a replacement argument, it has an argument for some kind of message to be printed when the deprecated interface is used. So I'm not certain what this is telling me either.

aaron.ballman: The deprecated attribute does not have a replacement argument, it has an argument for some kind…

jkorousUnsubmitted

Not Done

For better or worse the deprecated attribute is "overloaded" and this refers to the two-parameter version.

https://clang.llvm.org/docs/AttributeReference.html#deprecated

"When spelled as attribute((deprecated)), the deprecated attribute can have two optional string arguments. The first one is the message to display when emitting the warning; the second one enables the compiler to provide a Fix-It to replace the deprecated name with a new name."

Since it seems confusing even to you - maybe we should take this as a signal that our users would get just as confused and remove the reference to deprecated?
WDYT?

jkorous: For better or worse the deprecated attribute is "overloaded" and this refers to the two…

aaron.ballmanUnsubmitted

Not Done

Ohhhh! I see now! What confused me is that the docs spell it as [[deprecated]] which does *not* have an overload that takes two arguments. If we keep this part of the docs, you should probably change it to __attribute__((deprecated)) and link to the attribute docs for it.

Since it seems confusing even to you - maybe we should take this as a signal that our users would get just as confused and remove the reference to deprecated?

Maybe we should, but because I'm still not quite clear on how this relates to the deprecated attribute yet, maybe you could reply with some code examples of deprecated and unsafe_buffer_usage and what diagnostics you get? If that helps clarify, maybe I can help reword the docs to improve clarity.

aaron.ballman: Ohhhh! I see now! What confused me is that the docs spell it as `[[deprecated]]` which does…

NoQAuthorUnsubmitted

Done

Uh-oh I didn't notice there's a difference. Indeed, will fix.

NoQ: Uh-oh I didn't notice there's a difference. Indeed, will fix.

versus

aaron.ballmanUnsubmitted

Not Done

FWIW, I think a significant amount of these attribute docs will move into AttrDocs.td (which should explain all the gory details about the attribute) and we can leave the high-level description here and link to the attribute documentation (rather than duplicate information).

aaron.ballman: FWIW, I think a significant amount of these attribute docs will move into AttrDocs.td (which…

.. code-block:: c++

#pragma clang diagnostic push warning "-Wfoo"

#pragma clang unsafe_buffer_usage begin

#pragma clang diagnostic pop //end of -Wfoo

#pragma clang unsafe_buffer_usage end //end of -Wunsafe-buffer-usage

aaron.ballmanUnsubmitted

Not Done

Every time your code preforms an unsafe operation that causes a

- ``-Wunsafe-buffer-usage warning`` to appear, the warning may be accompanied

+ ``-Wunsafe-buffer-usage`` warning to appear, the warning may be accompanied

by an automatic fix that changes types of one or more variables associated

aaron.ballman:

aaron.ballmanUnsubmitted

Not Done

by an automatic fix that changes types of one or more variables associated

- with the unsafe operation from raw pointer or array type to safe container type.

+ with the unsafe operation from a raw pointer or array type to a safe container type.

For example, the following function contains a local constant-size array.

aaron.ballman:

The Attribute

-------------

The attribute ``[[clang::unsafe_buffer_usage]]`` should be placed on functions

that need to be avoided as they may cause buffer overflows. It is designed to

aid automatic fixits which would replace such unsafe functions with safe

alternatives, though it can be used even when the fix can't be automated.

The attribute is warranted even if the only way a function can overflow

the buffer is by violating the function's preconditions. For example, it

would make sense to put the attribute on function ``foo()`` below because

passing an incorrect size parameter would cause a buffer overflow:

.. code-block:: c++

xazax.hunUnsubmitted

Not Done

I anticipate that doing these fixits for multi-dimensional arrays will be confusing. Is that in scope?

xazax.hun: I anticipate that doing these fixits for multi-dimensional arrays will be confusing. Is that in…

NoQAuthorUnsubmitted

Done

Yeah it's gonna be fun, it's in scope, we'll see how deep we'll be able to get. Ideally we want to transform them in one step. Of course the procedure is drastically different in int [][] (array of arrays, one contiguous chunk of memory) vs. int ** (span of spans, each span in its own heap allocation).

NoQ: Yeah it's gonna be fun, it's in scope, we'll see how deep we'll be able to get. Ideally we want…

jkorousUnsubmitted

Not Done

Do you mean that the dimensions get reversed if we replace int arr_3d[3][4][5] with array<array<array<int, 5>, 4>, 3> arr_3d_r?

jkorous: Do you mean that the dimensions get reversed if we replace `int arr_3d[3][4][5]` with…

xazax.hunUnsubmitted

Not Done

Yeah, I can imagine some people wanting bounds safety but having strong opinions about the reversed dimensions in their code. Moreover, if there are multi-dimensional array typedefs in the nesting, that can make things more complicated.

xazax.hun: Yeah, I can imagine some people wanting bounds safety but having strong opinions about the…

[[clang::unsafe_buffer_usage]]

void foo(int *buf, size_t size) {

for (size_t i = 0; i < size; ++i) {

buf[i] = i;

}

The attribute is NOT warranted when the function has runtime protection against

overflows, assuming hardened libc++, assuming that containers constructed

outside the function are well-formed. For example, function ``bar()`` below

doesn't need an attribute, because assuming buf is well-formed (has size that

fits the original buffer it refers to), hardened libc++ protects this function

from overflowing the buffer:

.. code-block:: c++

void bar(std::span<int> buf) {

for (size_t i = 0; i < buf.size(); ++i) {

buf[i] = i;

}

This corresponds to our safety precaution about keeping buffers "containerized"

in spans for as long as possible. Function ``foo()`` may have internal

consistency, but by accepting a raw buffer it requires the user to unwrap

the span, which is undesirable.

The attribute is warranted when a function accepts a raw buffer only to

immediately put it into a span:

.. code-block:: c++

[[clang::unsafe_buffer_usage]]

steakhalUnsubmitted

Not Done

}

- In spirit of our guidelines, the automatic fixit would prefer this function

+ In the spirit of our guidelines, the automatic fixit would prefer this function

to accept a span instead of raw buffer, so that the span didn't need to be

steakhal:

void baz(int *buf, size_t size) {

aaron.ballmanUnsubmitted

Not Done

In spirit of our guidelines, the automatic fixit would prefer this function

- to accept a span instead of raw buffer, so that the span didn't need to be

+ to accept a span instead of a raw buffer, so that the span didn't need to be

unwrapped. Of course, such change alone would break both source compatibility

aaron.ballman:

std::span<int> sp{ buf, size };

for (size_t i = 0; i < sp.size(); ++i) {

sp[i] = i;

}

xazax.hunUnsubmitted

Not Done

While this might be desirable for some projects, I can imagine other projects want to avoid generating the overload (just generate the fixes in one pass, and apply all of them in a second to avoid a "fixed" header render another TU invalid). But that would require fixits for the call sites. Do you plan to support those batch migration scenarios or is that out of scope?

xazax.hun: While this might be desirable for some projects, I can imagine other projects want to avoid…

jkorousUnsubmitted

Not Done

We currently don't have a machinery that would allow us to analyze the whole project at once.

Two passes are generally not enough - that number is something like number of unique functions in the deepest call-stack.

Part of our fixits is attributing functions as unsafe and application of such fixits can create new warnings in callers.

Let's take an example:

foo(int* ptr) {
  ptr[5] = 5;
}

bar(int* ptr) {
  foo(ptr);
}

baz(int* ptr) {
  bar(ptr);
}

Initially it is not obvious that bar or baz might get transformed.

In the first iteration a fixit for foo is emitted and when applied we get:

foo(span<int> ptr) {
  ptr[5] = 5;
}

[[clang::unsafe_buffer_usage]] foo(int* ptr);

bar(int* ptr) {
  foo(ptr);
}

baz(int* ptr) {
  bar(ptr);
}

Only now we learn that bar (which can live in a diferent TU) should get transformed as well because it calls to unsafe function foo. But our analysis still won't see that baz will eventually have to be transformed too and it'll take another iteration.

jkorous: We currently don't have a machinery that would allow us to analyze the whole project at once.

xazax.hunUnsubmitted

Not Done

I see, this explains it all, thanks! I'd love to have this motivating example in the design document why a 2-pass fix-it-all model does not work.

xazax.hun: I see, this explains it all, thanks! I'd love to have this motivating example in the design…

}

In this case ``baz()`` does not contain any unsafe operations, but the awkward

parameter type causes the caller to unwrap the span unnecessarily.

In such cases the attribute may never be removed.

In particular, the attribute is NOT an "escape hatch". It does not suppress

the warnings about unsafe operations in the function. Addressing warnings

inside the function is still valuable for internal consistency.

Attribute ``[[clang::unsafe_buffer_usage]]`` is similar to attribute

[[deprecated]] but it has important differences:

* Use of a function annotated by such attribute causes ``-Wunsafe-buffer-usage``

warning to appear, instead of ``-Wdeprecated``, so they can be

enabled/disabled independently as all four combinations make sense;

* The "replacement" parameter of ``[[deprecated]]``, which allows for automatic

fixits when the function has a drop-in replacement, becomes significantly more

powerful and flexible in ``[[clang::unsafe_buffer_usage]]`` where it will allow

non-trivial automatic fixes.

(TODO: Explain parameters of the attribute, how they aid automatic fixits)

Code Modernization Workflow With Semi-Automatic Fixits

------------------------------------------------------

xazax.hunUnsubmitted

Not Done

Just a small concern. While users are not allowed to use identifiers like this, there is always a chance someone has a global variable of this name and the code ends up compiling. So, if we want to be absolutely safe, we should either have logic to pick a unique name, or not generate fixits in that case. Although, this is probably rare enough that we could just hand-wave.

xazax.hun: Just a small concern. While users are not allowed to use identifiers like this, there is always…

aaron.ballmanUnsubmitted

Not Done

Agreed; we should not have a fixit that suggests use of a reserved identifier.

aaron.ballman: Agreed; we should not have a fixit that suggests use of a reserved identifier.

jkorousUnsubmitted

Not Done

Our thinking is to actually use syntactically incorrect placeholder to make sure that users can't accidentally miss it.
This is related to:
https://reviews.llvm.org/D136811#inline-1324542

jkorous: Our thinking is to actually use syntactically incorrect placeholder to make sure that users…

aaron.ballmanUnsubmitted

Not Done

Our thinking is to actually use syntactically incorrect placeholder to make sure that users can't accidentally miss it.

So long as that happens on a note, that would be fine depending on how you name the identifier. e.g., remember that we have a feature flag to support dollar signs in identifiers, we need to protect against macros, that sort of thing.

aaron.ballman: > Our thinking is to actually use syntactically incorrect placeholder to make sure that users…

Every time your code preforms an unsafe operation that causes a

``-Wunsafe-buffer-usage warning`` to appear, the warning may be accompanied

by an automatic fix that changes types of one or more variables associated

with the unsafe operation from raw pointer or array type to safe container type.

xazax.hunUnsubmitted

Not Done

I see this part a bit confusing. While it is true that we cannot be sure whether the size is actually the size of the buffer, I wonder if we should at least generate something to call it out the user has further action items. It could be a /*revise me*/ comment after the size, or documentation, both, or something else.

xazax.hun: I see this part a bit confusing. While it is true that we cannot be sure whether the size is…

jkorousUnsubmitted

Not Done

We have accepted that our analysis will have some limits. In situations where we have a reasonable suspicion that a certain parameter has the semantics of size we can somehow communicate that to the user. But it is very likely that in some cases the analysis won't be able to tell even that.

jkorous: We have accepted that our analysis will have some limits. In situations where we have a…

xazax.hunUnsubmitted

Not Done

Not being able to cover all the cases sounds fine to me, and I'm glad you plan to communicate potential size params to the user when there is reasonable certainty.

xazax.hun: Not being able to cover all the cases sounds fine to me, and I'm glad you plan to communicate…

For example, the following function contains a local constant-size array.

.. code-block:: c++

void use_array(int[] x);

steakhalUnsubmitted

Not Done

of bounds checks. Variable ``size`` does not *have* to carry the size of the

- buffer (or the size of *that* buffer) just becaused it's named "size".

+ buffer (or the size of *that* buffer) just because it's named "size".

The compiler will avoid making guesses about that.

steakhal:

void test_array() {

int x[5];

x[3] = 3; // Warning: Array indexing is unsafe operation!

use_array(x);

}

steakhalUnsubmitted

Not Done

will demonstrate incorrect runtime behavior compared to the original code.

- In an otherwise well-formed program it is always possible (and usually easy)

+ In an otherwise well-formed program, it is always possible (and usually easy)

to resolve the placeholder correctly.

steakhal:

The automatic fixit associated with the warning would transform this array

aaron.ballmanUnsubmitted

Not Done

Because fixits can be applied automatically to an entire source file, we don't typically allow fix-its that can break code (or fixits where there are multiple choices for the user to pick from) unless they're attached to a *note* instead of a warning (or error).

Are you planning to follow that same pattern when placeholders are involved?

(For example, one potential use for automatic fix-its is to apply them automatically to source on commit before running precommit CI; similar to how folks sometimes run clang-format to automate "the source is different before/after the clang-format, so you should look into that".)

aaron.ballman: Because fixits can be applied automatically to an entire source file, we don't typically allow…

jkorousUnsubmitted

Not Done

That's an extremely interesting question! Let me add our current thinking as a context for the discussion.

Our analysis will have its limits and will not be able to provide 100% complete and guaranteed correct code transformation in every case.

At the same time it sounds suboptimal both from our and our users perspective to not emit anything for such situations.
Imagine we have a code transformation that affects 20 lines of code and all we need is for the user to address a single function call argument.
The thought of bailing, producing just a warning and having user to manually change all 20 lines of code sounds very unsatisfactory to me.
That's why we are thinking to use placeholders.

The next question is - can we not break the code when using placeholders?
By definition whenever we'd emit a placeholder our analysis can't provide a correct solution. That means that whatever way we'd make the code to compile can introduce a logic bug in the program. That is simply unacceptable given the overall goal of our effort.
That pretty much leaves us with only using placeholders that would break the build.

WDYT?

I wasn't aware of the precedent related to fixits attached to notes not having to lead to compilable code and we should consider that path.

jkorous: That's an extremely interesting question! Let me add our current thinking as a context for the…

aaron.ballmanUnsubmitted

Not Done

I wasn't aware of the precedent related to fixits attached to notes not having to lead to compilable code and we should consider that path.

We document the expectations here: https://clang.llvm.org/docs/InternalsManual.html#fix-it-hints and the interesting bit is:

"Fix-it hints on errors and warnings need to obey these rules:

Since they are automatically applied if -Xclang -fixit is passed to the driver, they should only be used when it’s very likely they match the user’s intent.
Clang must recover from errors as if the fix-it had been applied.
Fix-it hints on a warning must not change the meaning of the code. However, a hint may clarify the meaning as intentional, for example by adding parentheses when the precedence of operators isn’t obvious.

If a fix-it can’t obey these rules, put the fix-it on a note. Fix-its on notes are not applied automatically."

I think using placeholders on warnings rather than notes will be an issue for both the first and second bullets.

aaron.ballman: > I wasn't aware of the precedent related to fixits attached to notes not having to lead to…

NoQAuthorUnsubmitted

Done

I like that! This way we can even emit multiple alternative fixes, eg.

note: change type of variable 'ptr' to 'std::span' to add bound checks
note: change type of variable 'ptr' to 'std::span::iterator' to add bound checks

(I'm still not sure we'll ever going to recommend iterators as a good solution, but they're generally a better drop-in replacement for a raw pointer, so despite their increased verbosity, we're considering them at least in some cases)

Another thing this can take care of is support for custom containers:

note: change type of variable 'vla' to 'std::vector' to add bound checks
note: change type of variable 'vla' to 'llvm::SmallVector' to add bound checks

Now we can simply present all available containers to the user, and the user can select the one they like the most.

NoQ: I like that! This way we can even emit multiple alternative fixes, eg. ``` note: change type of…

jkorousUnsubmitted

Not Done

Yes, we should look into attaching FixIts that contain placeholders to notes instead.

jkorous: Yes, we should look into attaching FixIts that contain placeholders to notes instead.

into an ``std::array``:

.. code-block:: c++

void use_array(int[] x);

void test_array() {

std::array<int> x[5];

x[3] = 3;

use_array(x.data());

}

Note that the use site of variable ``x`` inside the call to ``use_array()``

needed an update. The fixit considers every use site of the fixed variable

steakhalUnsubmitted

Not Done

will remain operational. A warning will be emitted if the replacement function's

- prototype diverges from the original prototype beyond recognition. In such cases

+ prototype diverges from the original prototype beyond recognition. In such cases,

an attribute can be either updated to give more manual hints to the compiler, or

steakhal:

and updates them if necessary.

In some cases an automatic fix would be problematic, so the warning will simply

highlight unsafe operations for you to consider.

In some cases a partial fix is emitted, where you'll have to fill in a few

placeholders in order to document the bounds information related to the pointer.

This is particularly common when suggesting ``std::span``.

For example, consider function ``foo()`` we've seen before:

.. code-block:: c++

void foo(int *buf, size_t size) {

for (size_t i = 0; i < size; ++i) {

buf[i] = i;

}

In spirit of our guidelines, the automatic fixit would prefer this function

to accept a span instead of raw buffer, so that the span didn't need to be

unwrapped. Of course, such change alone would break both source compatibility

and binary compatibility. In order to avoid that, the fix will provide

a compatibility overload to preserve the old functionality. The updated code

produced by the fixit will look like this:

.. code-block:: c++

[[clang::unsafe_buffer_usage]]

void foo(int *buf, size_t size) {

foo(std::span<int>{ buf, __placeholder__}, size);

}

void foo(std::span<int> buf, size_t size) {

for (size_t i = 0; i < size; ++i) {

buf[i] = i;

}

The following changes were performed automatically:

- The type of parameter ``buf`` was changed from ``int *`` to

``std::span<int>``. Use sites updated if necessary.

- A compatibility overload was autogenerated with the old prototype, with

attribute ``[[clang::unsafe_buffer_usage]]`` attached to it to encourage

the callers to switch to the new function – and possibly update the callers

automatically!

The following changes need manual intervention:

- The compatibility overload contains a ``__placeholder__`` which needs

to be populated manually. In this case ``size`` is a good candidate.

- Despite accepting a ``std::span`` which carries size information,

the fixed function still accepts ``size`` separately. It can be removed

manually, or it can be preserved, if ``size`` and ``buf.size()``

actually need to be different in your case.

Placeholders fulfill an important purpose as they attract attention to

situations where the buffer's size wasn't properly documented for the purposes

of bounds checks. Variable ``size`` does not *have* to carry the size of the

buffer (or the size of *that* buffer) just becaused it's named "size".

The compiler will avoid making guesses about that.

The fixits emitted by the warning are correct modulo placeholders. Placeholders

are the only reason why fixed code is allowed to fail to compile.

Incorrectly resolving the placeholder is the only reason why fixed code

will demonstrate incorrect runtime behavior compared to the original code.

In an otherwise well-formed program it is always possible (and usually easy)

to resolve the placeholder correctly.

Note that regardless of how ``__placeholder__`` is resolved, it does not allow

you to remove the ``[[clang::unsafe_buffer_usage]]`` annotation. The annotation

will stay forever because that function is now equivalent to function ``baz()``

we've seen before: it contains no unsafe operations, but it only offers internal

consistency. It is still possible to misuse that function by supplying an

invalid ``size`` parameter. It still requires you to unwrap ``std::span`` if you

already have it, only to wrap it back immediately. So the callers should still

be updated to use the new function, and automatic fixits will now be emitted

for the call sites to aid that.

Even if you decide to remove the ``size`` parameter, fixits at call sites

will remain operational. A warning will be emitted if the replacement function's

prototype diverges from the original prototype beyond recognition. In such cases

an attribute can be either updated to give more manual hints to the compiler, or

changed to a different variant that explicitly opts out of automatic fixits.

(TODO: Elaborate on that last point and confirm that we actually want

such behavior.)

(TODO: Cover more examples.)

The Hardened libc++

===================

(TODO: Write this section. Probably just link to its own documentation.)

Clang Static Analyzer Checks

============================

(TODO: Write this section.)

clang/docs/index.rst

Show All 19 Lines	.. toctree::
Toolchain		Toolchain
LanguageExtensions		LanguageExtensions
ClangCommandLineReference		ClangCommandLineReference
AttributeReference		AttributeReference
DiagnosticsReference		DiagnosticsReference
CrossCompilation		CrossCompilation
ClangStaticAnalyzer		ClangStaticAnalyzer
ThreadSafetyAnalysis		ThreadSafetyAnalysis
		SafeBuffers
DataFlowAnalysisIntro		DataFlowAnalysisIntro
AddressSanitizer		AddressSanitizer
ThreadSanitizer		ThreadSanitizer
MemorySanitizer		MemorySanitizer
UndefinedBehaviorSanitizer		UndefinedBehaviorSanitizer
DataFlowSanitizer		DataFlowSanitizer
LeakSanitizer		LeakSanitizer
SanitizerCoverage		SanitizerCoverage
▲ Show 20 Lines • Show All 82 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[-Wunsafe-buffer-usage] WIP: RFC: NFC: User documentation.AcceptedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 478450

clang/docs/SafeBuffers.rst

clang/docs/index.rst

[-Wunsafe-buffer-usage] WIP: RFC: NFC: User documentation.
AcceptedPublic