Page MenuHomePhabricator

Allow volatile parameters to __builtin_mem{cpy,move,set}
Needs ReviewPublic

Authored by jfb on May 1 2020, 6:01 PM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

The mem* builtins are often used (or should be used) in places where time-of-check time-of-use security issues are important (e.g. copying from untrusted buffers), because it prevents multiple reads / multiple writes from occurring at the untrusted memory location. The current builtins don't accept volatile pointee parameters in C++, and merely warn about such parameters in C, which leads to confusion. In these settings, it's useful to overload the builtin and permit volatile pointee parameters. The code generation then directly emits the existing volatile variant of the mem* builtin function call, which ensures that the affected memory location is only accessed once (thereby preventing double-reads under an adversarial memory mapping).

Side-note: yes, ToCToU avoidance is a valid use for volatile https://wg21.link/p1152r0#uses.

RFC for this patch: http://lists.llvm.org/pipermail/cfe-dev/2020-May/065385.html

Diff Detail

Event Timeline

jfb created this revision.May 1 2020, 6:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 1 2020, 6:01 PM

Most of the complexity of this patch is introduced by the decision to type-check these calls with a volatile-typed parameter, which seems like it does nothing but cause problems. If your goal is to make these functions do the right thing when given arbitrary pointer types, I think you need to give these calls special type-checking semantics. Done right, that will also let you e.g. accept pointers into arbitrary address spaces. But I'm not sure how good of an idea this actually is at base, since these builtins are typically used for direct calls to their associated library functions.

jfb added a comment.May 2 2020, 3:24 PM

Most of the complexity of this patch is introduced by the decision to type-check these calls with a volatile-typed parameter, which seems like it does nothing but cause problems. If your goal is to make these functions do the right thing when given arbitrary pointer types, I think you need to give these calls special type-checking semantics. Done right, that will also let you e.g. accept pointers into arbitrary address spaces. But I'm not sure how good of an idea this actually is at base, since these builtins are typically used for direct calls to their associated library functions.

You mean: in Builtins,def allow a ? modifier on volatile (so, D?) to denote overloading on volatile, and consume that overload directly while type checking? That seems fine to me. Just want to make sure that's what you have in mind.

In D79279#2016496, @jfb wrote:

Most of the complexity of this patch is introduced by the decision to type-check these calls with a volatile-typed parameter, which seems like it does nothing but cause problems. If your goal is to make these functions do the right thing when given arbitrary pointer types, I think you need to give these calls special type-checking semantics. Done right, that will also let you e.g. accept pointers into arbitrary address spaces. But I'm not sure how good of an idea this actually is at base, since these builtins are typically used for direct calls to their associated library functions.

You mean: in Builtins,def allow a ? modifier on volatile (so, D?) to denote overloading on volatile, and consume that overload directly while type checking? That seems fine to me. Just want to make sure that's what you have in mind.

I don't know how that ? would actually work, but more importantly, no, that's not really what I meant. The current semantics of these builtins are that they are not polymorphic. I think the right way to understand what you're trying to do is that you're trying to make them polymorphic over qualifiers. That means that we can do a __builtin_memset of a volatile void *, yes, but it also means that we can do a __builtin_memset of a volatile restrict __local void *. (Now, some qualifiers don't make sense to honor because they'd be a radical change in behavior — we wouldn't want to honor __strong by doing a bunch of ARC assignments, for example. But most qualifiers do still make sense for these operations.) And in fact there's a certain amount of precedent for this because we already honor the presumed alignment of the pointer before it's converted to void*.

Unfortunately, the only way to do this kind of polymorphism in Clang today is to give the entire builtin custom type-checking with t. And you'll need to do a bunch of work to make sure that the behavior of this builtin is consistent with an ordinary call to memcpy when e.g. the argument is a class type with overloaded conversion operators to pointer types. But I think that's what you're signing up for.

I do think this is a somewhat debatable change in the behavior of these builtins, though.

I do think this is a somewhat debatable change in the behavior of these builtins, though.

Let me put more weight on this. You need to propose this on cfe-dev.

jfb added a comment.EditedMay 2 2020, 9:22 PM

I do think this is a somewhat debatable change in the behavior of these builtins, though.

Let me put more weight on this. You need to propose this on cfe-dev.

Happy to do so. Is this more about the change in the builtin, or about spelling it __builtin_volatile_memcpy and such? I've thought about this, and when the builtin has two potentially volatile arguments I've concluded that the IR builtin really wasn't sufficient in semantics, but in practice it is sufficient today. So putting volatile in a function name (versus overloading) seems to not really be what makes sense here. I'd therefore rather overload, and as you say we could support more than just volatile in doing so. Is that the main thing you'd suggest going for in an RFC (volatile as well as address space overloads and whatever else)? Again, I'm happy to do that, but I want to make sure I reflect your feedback correctly.

In D79279#2016604, @jfb wrote:

I do think this is a somewhat debatable change in the behavior of these builtins, though.

Let me put more weight on this. You need to propose this on cfe-dev.

Happy to do so. Is this more about the change in the builtin, or about spelling it __builtin_volatile_memcpy and such? I've thought about this, and when the builtin has two potentially volatile arguments I've concluded that the IR builtin really wasn't sufficient in semantics, but in practice it is sufficient today. So putting volatile in a function name (versus overloading) seems to not really be what makes sense here. I'd therefore rather overload, and as you say we could support more than just volatile in doing so. Is that the main thing you'd suggest going for in an RFC (volatile as well as address space overloads and whatever else)? Again, I'm happy to do that, but I want to make sure I reflect your feedback correctly.

__builtin_memcpy is a library builtin, which means its primary use pattern is #define tricks in the C standard library headers that redirect calls to the memcpy library function. So doing what you're suggesting to __builtin_memcpy is also changing the semantics of memcpy, which is not something we should do lightly. If we were talking about changing a non-library builtin function, or introducing a new builtin, the considerations would be very different. If that's what you want to do now, I think that's much easier to justify, although I still think it's worth starting a thread about it to give people an opportunity to weigh in.

jfb edited the summary of this revision. (Show Details)Wed, May 6, 3:42 PM