This is an archive of the discontinued LLVM Phabricator instance.

Add overloaded versions of builtin mem* functions
Needs RevisionPublic

Authored by jfb on May 1 2020, 6:01 PM.

Details

Summary

The mem* builtins are often used (or should be used) in places where time-of-check time-of-use security issues are important (e.g. copying from untrusted buffers), because it prevents multiple reads / multiple writes from occurring at the untrusted memory location. The current builtins don't accept volatile pointee parameters in C++, and merely warn about such parameters in C, which leads to confusion. In these settings, it's useful to overload the builtin and permit volatile pointee parameters. The code generation then directly emits the existing volatile variant of the mem* builtin function call, which ensures that the affected memory location is only accessed once (thereby preventing double-reads under an adversarial memory mapping).

Instead of just handling volatile, this patch handles other overloads when they make sense, specifically around pointer qualification. It also cleans up some of the custom diagnostic handling to reduce duplication.

The patch also happens to be a decent base to implement C's memset_s from Annex K, as well as an alternate (and arguably better approach) that the C++ Committee's proposal for byte-wise atomic memcpy as described in https://wg21.link/P1478 (in particular, it lets developer own the fences, a topic discussed in https://wg21.link/p2061).

Side-note: yes, ToCToU avoidance is a valid use for volatile https://wg21.link/p1152r0#uses.

RFC for this patch: http://lists.llvm.org/pipermail/cfe-dev/2020-May/065385.html

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jfb added inline comments.Jul 2 2020, 5:29 PM
clang/lib/Sema/SemaChecking.cpp
1509 ↗(On Diff #275005)

It's to avoid weird corner cases where this check isn't super relevant, but subsequent ones are. It avoids making isVolatileQualified below sad because e.g. void makes the QualType null. That one can't be _Atomic, and it can be volatile but then the size won't match the _Atomic's size.

1527 ↗(On Diff #275005)

I don't think so here either.

rjmccall added inline comments.Jul 2 2020, 11:01 PM
clang/lib/Sema/SemaChecking.cpp
1605 ↗(On Diff #275266)

I am not a fan of this lambda style, not because I dislike lambdas, but because you've pulled a ton of code that's supporting one or two cases (that could easily be handled together) into a much wider scope.

Your helper code are doing a ton of redundant type checks and is probably not as general as you think it is. You need to call DefaultFunctionArrayLvalueConversion on the pointer arguments, after which you can just check for a pointer type. You also need to convert the size argument to a size_t as if initializing a parameter. If you do these things, the IRGen code will get much simpler because e.g. it will not need to specially handle arrays anymore. You will also start magically doing the right thing w.r.t ODR-uses of constexpr variables.

gchatelet added inline comments.Jul 3 2020, 1:00 PM
clang/include/clang/Basic/Builtins.def
485

I don't see memmove_inline being useful but memset and memcmp would make sense to add as building blocks for C++ implementations (e.g. libc memcpy)

As for this new addition, how about __builtin_memcpy_honor_qualifiers?
I fear that __builtin_memcpy_overloaded is too ambiguous.

jfb updated this revision to Diff 279896.Jul 22 2020, 11:26 AM

Follow John's suggestions

You need to add user docs for these builtins.

clang/include/clang/Basic/DiagnosticSemaKinds.td
8917 ↗(On Diff #279896)

I don't know why you're adding a bunch of new diagnostics about _Atomic.

clang/lib/CodeGen/CGBuiltin.cpp
630

Since arrays are handled separately now, this is just getPointeeType(), but I don't know why you need to support ObjC object pointer types here at all.

clang/lib/CodeGen/CGExpr.cpp
1070 ↗(On Diff #279896)

Why arrays?

clang/lib/Sema/SemaChecking.cpp
5446 ↗(On Diff #279896)

Do you ever write these back into the call?

5468 ↗(On Diff #279896)

You already know that DstTy and SrcTy are non-null here.

Why do you need to support atomic types for these operations anyway? It just seems treacherous and unnecessary.

jfb updated this revision to Diff 279984.Jul 22 2020, 5:39 PM
jfb marked 7 inline comments as done.

Address all but one of John's comments

clang/include/clang/Basic/DiagnosticSemaKinds.td
8917 ↗(On Diff #279896)

Maybe the tests clarify this? Here's my rationale for the 3 new atomic diagnostics:

  • Don't support mixing volatile and atomic, because we'd need to add IR support for it. It might be useful, but as a follow-up.
  • Overloaded memcpy figures out the atomic operation size based on the element's own size. There's a destination and a source pointer, and we can't figure out the expected atomic operation size if they differ. It's likely an unintentional error to have different sizes when doing an atomic memcpy, so instead of figuring out the largest common matching size I figure it's better to diagnose.
  • Supporting non-lock-free sizes seems fraught with peril, since it's likely unintentional. It's certainly doable (loop call the runtime support), but it's unclear if we should take the lock just once over the entire loop, or once for load+store, or once for load and once for store. I don't see a point in supporting it.
clang/lib/CodeGen/CGBuiltin.cpp
630

I'll remove ObjC handling for now, I added it because of code like what's in:
clang/test/CodeGenObjC/builtin-memfns.m

// PR13697
void cpy1(int *a, id b) {
  // CHECK-LABEL: @cpy1(
  // CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* {{.*}}, i8* {{.*}}, i64 8, i1 false)
  memcpy(a, b, 8);
}

Should we support this? It seems to me like yes, but you seem to think otherwise?

On arrays / ObjC being handled now: that's not really true... or rather, it now is for the builtins I'm adding, but not for the previously existing builtins. We can't just get the pointer argument type for this code:

// <rdar://problem/11314941>
// Make sure we don't over-estimate the alignment of fields of
// packed structs.
struct PS {
  int modes[4];
} __attribute__((packed));
struct PS ps;
void test8(int *arg) {
  // CHECK: @test8
  // CHECK: call void @llvm.memcpy{{.*}} align 4 {{.*}} align 1 {{.*}} 16, i1 false)
  __builtin_memcpy(arg, ps.modes, sizeof(struct PS));
}

Because __builtin_memcpy doesn't perform the conversion. Arguable a pre-existing bug, which I can patch here as I have, or fix in Sema if you'd rather see that? LMK.

clang/lib/Sema/SemaChecking.cpp
5468 ↗(On Diff #279896)

Leftover from the refactoring :)

It's useful to get atomic memcpy, see https://wg21.link/P1478
It's also part of "support overloaded memcpy" which is what doing more than volatile implies.

Is there a need for an atomic memcpy at all? Why is it useful to allow this operation to take on "atomic" semantics — which aren't actually atomic because the loads and stores to elements are torn — with hardcoded memory ordering and somewhat arbitrary rules about what the atomic size is?

jfb added a comment.Jul 22 2020, 8:32 PM

Is there a need for an atomic memcpy at all? Why is it useful to allow this operation to take on "atomic" semantics — which aren't actually atomic because the loads and stores to elements are torn — with hardcoded memory ordering and somewhat arbitrary rules about what the atomic size is?

Hans lays out a rationale for usefulness in his paper, but what I've implemented is more useful: it's *unordered* so you can fence as you desire around it, yet it guarantees a minimum memory access size based on the pointer parameters. For example, copying an atomic int will be 4 byte operations which are single-copy-atomic, but the accesses from one int to the next aren't performed in any guaranteed order (or observable in any guaranteed order either). I talked about this with him a while ago but IIRC he wasn't sure about implementation among other things, so when you asked me to widen my original volatile-only memcpy to also do other qualifiers, I realized that it was a neat way to do atomic as well as other qualifiers. I've talked to a few SG1 folks about this, and I believe (for other reasons too) it's where the design will end up for Hans' paper.

In D79279#2168533, @jfb wrote:

Is there a need for an atomic memcpy at all? Why is it useful to allow this operation to take on "atomic" semantics — which aren't actually atomic because the loads and stores to elements are torn — with hardcoded memory ordering and somewhat arbitrary rules about what the atomic size is?

Hans lays out a rationale for usefulness in his paper, but what I've implemented is more useful: it's *unordered* so you can fence as you desire around it, yet it guarantees a minimum memory access size based on the pointer parameters. For example, copying an atomic int will be 4 byte operations which are single-copy-atomic, but the accesses from one int to the next aren't performed in any guaranteed order (or observable in any guaranteed order either). I talked about this with him a while ago but IIRC he wasn't sure about implementation among other things, so when you asked me to widen my original volatile-only memcpy to also do other qualifiers, I realized that it was a neat way to do atomic as well as other qualifiers. I've talked to a few SG1 folks about this, and I believe (for other reasons too) it's where the design will end up for Hans' paper.

I can see the usefulness of this operation, but it seems like a odd semantic mismatch for what is basically just a memcpy where one of the pointers happens to have _Atomic type, like you're shoe-horning it into this builtin just to avoid declaring a different one.

jfb added a comment.Jul 23 2020, 8:09 AM
In D79279#2168533, @jfb wrote:

Is there a need for an atomic memcpy at all? Why is it useful to allow this operation to take on "atomic" semantics — which aren't actually atomic because the loads and stores to elements are torn — with hardcoded memory ordering and somewhat arbitrary rules about what the atomic size is?

Hans lays out a rationale for usefulness in his paper, but what I've implemented is more useful: it's *unordered* so you can fence as you desire around it, yet it guarantees a minimum memory access size based on the pointer parameters. For example, copying an atomic int will be 4 byte operations which are single-copy-atomic, but the accesses from one int to the next aren't performed in any guaranteed order (or observable in any guaranteed order either). I talked about this with him a while ago but IIRC he wasn't sure about implementation among other things, so when you asked me to widen my original volatile-only memcpy to also do other qualifiers, I realized that it was a neat way to do atomic as well as other qualifiers. I've talked to a few SG1 folks about this, and I believe (for other reasons too) it's where the design will end up for Hans' paper.

I can see the usefulness of this operation, but it seems like a odd semantic mismatch for what is basically just a memcpy where one of the pointers happens to have _Atomic type, like you're shoe-horning it into this builtin just to avoid declaring a different one.

I'm following the discussion we had here regarding overloading:

There are other qualifiers that can meaningfully contribute to the operation here besides volatile, such as restrict and (more importantly) address spaces. And again, for the copy operations these might differ between the two pointer types.

In both cases, I’d say that the logical design is to allow the pointers to be to arbitrarily-qualified types. We can then propagate that information from the builtin into the LLVM intrinsic call as best as we’re allowed. So I think you should make builtins called something like __builtin_overloaded_memcpy (name to be decided) and just have their semantics be type-directed.

Ah yes, I’d like to hear what others think of this. I hadn’t thought about it before you brought it up, and it sounds like a good idea.

As you noted earlier, for memcpy you probably want to express differences in destination and source qualification, even if today IR can't express e.g. volatile source and non-volatile destination. You were talking about volatile, but this applies to the entire combination of dst+src qualified with zero-to-five volatile, _Atomic, __unaligned, restrict, and address space. Pulling the entire combination space out into different functions would create way too many functions. Right now the implementation has a few limitations: it treats both dst and src as volatile if either are, it can't do _Atomic with volatile so we diagnose, and it ignores restrict. Otherwise it supports all combinations.

jfb updated this revision to Diff 280135.Jul 23 2020, 8:20 AM

Improve documentation

jfb updated this revision to Diff 280136.Jul 23 2020, 8:21 AM

Re-update

In D79279#2169522, @jfb wrote:
In D79279#2168533, @jfb wrote:

Is there a need for an atomic memcpy at all? Why is it useful to allow this operation to take on "atomic" semantics — which aren't actually atomic because the loads and stores to elements are torn — with hardcoded memory ordering and somewhat arbitrary rules about what the atomic size is?

Hans lays out a rationale for usefulness in his paper, but what I've implemented is more useful: it's *unordered* so you can fence as you desire around it, yet it guarantees a minimum memory access size based on the pointer parameters. For example, copying an atomic int will be 4 byte operations which are single-copy-atomic, but the accesses from one int to the next aren't performed in any guaranteed order (or observable in any guaranteed order either). I talked about this with him a while ago but IIRC he wasn't sure about implementation among other things, so when you asked me to widen my original volatile-only memcpy to also do other qualifiers, I realized that it was a neat way to do atomic as well as other qualifiers. I've talked to a few SG1 folks about this, and I believe (for other reasons too) it's where the design will end up for Hans' paper.

I can see the usefulness of this operation, but it seems like a odd semantic mismatch for what is basically just a memcpy where one of the pointers happens to have _Atomic type, like you're shoe-horning it into this builtin just to avoid declaring a different one.

I'm following the discussion we had here regarding overloading:

There are other qualifiers that can meaningfully contribute to the operation here besides volatile, such as restrict and (more importantly) address spaces. And again, for the copy operations these might differ between the two pointer types.

In both cases, I’d say that the logical design is to allow the pointers to be to arbitrarily-qualified types. We can then propagate that information from the builtin into the LLVM intrinsic call as best as we’re allowed. So I think you should make builtins called something like __builtin_overloaded_memcpy (name to be decided) and just have their semantics be type-directed.

Ah yes, I’d like to hear what others think of this. I hadn’t thought about it before you brought it up, and it sounds like a good idea.

As you noted earlier, for memcpy you probably want to express differences in destination and source qualification, even if today IR can't express e.g. volatile source and non-volatile destination. You were talking about volatile, but this applies to the entire combination of dst+src qualified with zero-to-five volatile, _Atomic, __unaligned, restrict, and address space. Pulling the entire combination space out into different functions would create way too many functions. Right now the implementation has a few limitations: it treats both dst and src as volatile if either are, it can't do _Atomic with volatile so we diagnose, and it ignores restrict. Otherwise it supports all combinations.

My point is that this has nothing to do with the ordinary semantics of _Atomic. You're basically just looking at the word "atomic" and saying that, hey, a minimum access size is sortof related to atomicity.

If you want this to be able to control the minimum access size, you should allow that to be passed in as an optional argument instead.

jfb added a comment.Jul 23 2020, 10:17 AM

My point is that this has nothing to do with the ordinary semantics of _Atomic. You're basically just looking at the word "atomic" and saying that, hey, a minimum access size is sortof related to atomicity.

If you want this to be able to control the minimum access size, you should allow that to be passed in as an optional argument instead.

OK so it sounds like you're suggesting *two* versions of the overloaded builtins:

  1. __builtin_memcpy_overloaded which overloads on volatile, restrict, __unaligned, and address spaces, but not on _Atomic qualifiers.
  2. __builtin_atomic_memcpy_overloaded which overloads on volatile (but unsupported for now), restrict, and address spaces, but not on _Atomic qualifiers (because it's implicit), and not on __unaligned because that's a constraint. This takes an extra "element size" parameter, which we hope folks don't confuse with the size parameter (I'd expect a template or macro wrapper to hide that extra parameter when actually using the builtin).

Of course, that's two versions for each of memcpy, memmove, memset, and any other *mem that we decide to add to this list of overloadable functions.

Is that correct?

I don't think any of these should allow _Atomic unless we're going to give it some sort of consistent atomic semantics (which is hard to imagine being useful), and I think you should just take an extra argument of the minimum access width on all of them uniformly if you think that's important. Builtins can have optional arguments.

jfb added a comment.Jul 23 2020, 10:29 AM

I don't think any of these should allow _Atomic unless we're going to give it some sort of consistent atomic semantics (which is hard to imagine being useful), and I think you should just take an extra argument of the minimum access width on all of them uniformly if you think that's important. Builtins can have optional arguments.

OK so: __builtin_memcpy_overloaded which overloads on volatile, restrict, __unaligned, and address spaces, but not on _Atomic qualifiers. Optionally, a 4th integer parameter can be provided to represent element_size. If provided, this becomes an unordered atomic memcpy with element size equal to or greater than the provided element_size. That value must be a power of two, and must be lock-free (what we call maximum atomic inline width in target info). If provided, then __unaligned is invalid, and volatile ought to be valid but is currently unsupported because IR can't do atomic+volatile memcpy (it would be useful, say for shared memory, but Patches Welcome).

Do you think there should be any relationship at all between dst/src pointee type's size and element_size? i.e. if I copy short* using an element size of 1 byte, is that OK? It seems like larger element sizes is always OK, but smaller might be a programmer error? If that's what they wanted, they could have done (char*)my_short. Or is this trying to be too helpful?

I think the argument is treated as if it were 1 if not given. That's all that ordinary memcpy formally guarantees, which seems to work fine (semantically, if not performance-wise) for pretty much everything today. I don't think you need any restrictions on element size. It's probably sensible to require the pointers to be dynamically aligned to a multiple of the access width, but I don't think you can enforce that statically. And of course the length needs to be a multiple of the access size.

Do you think it'd be useful to have different guarantees for different operands? I guess it could come up, but it'd be a whole lot of extra complexity that I can't imagine we'd ever support.

If one of the arguments is volatile, arguably the minimum access width (if given) needs to be exact. If we don't support that right now, it's okay to make it an error, which is basically you've already done with the _Atomic volatile diagnostic.

jfb added a comment.Jul 23 2020, 10:52 AM

I think the argument is treated as if it were 1 if not given. That's all that ordinary memcpy formally guarantees, which seems to work fine (semantically, if not performance-wise) for pretty much everything today.

I'm not sure that's true: consider a memcpy implementation which copies some bytes twice (at different access size, there's an overlap because somehow it's more efficient). That would probably violate the programmer's expectations, and I don't think volatile nor atomic memcpy allow this (but regular memcpy does).

I don't think you need any restrictions on element size. It's probably sensible to require the pointers to be dynamically aligned to a multiple of the access width, but I don't think you can enforce that statically.

Agreed, if we're given a short and told to copy 4 bytes at a time then UBSan could find the constraint violation on alignment, but generally the only way we can diagnose is if the parameter is __unaligned (because there you're explicitly telling me it's not aligned, and the constraint is that it has to be).

And of course the length needs to be a multiple of the access size.

Yeah.

Do you think it'd be useful to have different guarantees for different operands? I guess it could come up, but it'd be a whole lot of extra complexity that I can't imagine we'd ever support.

You mean, if element_size is passed then you get different guarantees? I think that's what makes sense: if you're asking for atomic memcpy then you get guarantees. If you're asking for volatile mempcy then you get others. That's why overloading (and multiple parameters) can be confusing, but at the same time I think it's better than having the combinatorial number of named functions instead.

If one of the arguments is volatile, arguably the minimum access width (if given) needs to be exact. If we don't support that right now, it's okay to make it an error, which is basically you've already done with the _Atomic volatile diagnostic.

Agreed. volatile with size makes a lot of sense, and the IR version of it, once created, ought to not be able to widen accesses. volatile without a size specified makes sense too, because you just want a single read and a single write, don't care about tearing.

In D79279#2170187, @jfb wrote:

I think the argument is treated as if it were 1 if not given. That's all that ordinary memcpy formally guarantees, which seems to work fine (semantically, if not performance-wise) for pretty much everything today.

I'm not sure that's true: consider a memcpy implementation which copies some bytes twice (at different access size, there's an overlap because somehow it's more efficient). That would probably violate the programmer's expectations, and I don't think volatile nor atomic memcpy allow this (but regular memcpy does).

Yes, that's true, if you need an only-accessed-once guarantee, that's above and beyond just a minimum access size. I agree that volatile would need to make this guarantee.

Do you think it'd be useful to have different guarantees for different operands? I guess it could come up, but it'd be a whole lot of extra complexity that I can't imagine we'd ever support.

You mean, if element_size is passed then you get different guarantees?

No, sorry, I mean different guarantees for the different pointer operands. In principle, we could allow you to say that the memcpy has to be done with 4-byte accesses from the source and 2-byte accesses to the destination. That's implementable but a lot of work.

If one of the arguments is volatile, arguably the minimum access width (if given) needs to be exact. If we don't support that right now, it's okay to make it an error, which is basically you've already done with the _Atomic volatile diagnostic.

Agreed. volatile with size makes a lot of sense, and the IR version of it, once created, ought to not be able to widen accesses. volatile without a size specified makes sense too, because you just want a single read and a single write, don't care about tearing.

Right.

jfb added a comment.Jul 23 2020, 11:30 AM

Do you think it'd be useful to have different guarantees for different operands? I guess it could come up, but it'd be a whole lot of extra complexity that I can't imagine we'd ever support.

You mean, if element_size is passed then you get different guarantees?

No, sorry, I mean different guarantees for the different pointer operands. In principle, we could allow you to say that the memcpy has to be done with 4-byte accesses from the source and 2-byte accesses to the destination. That's implementable but a lot of work.

Gotcha. Yeah I think it's useful as a niche thing, and if IR supports that for say volatile then we can honor lopsided volatile overloads (instead of treating the entire thing as volatile). I hadn't really thought about lopsided access sizes (since it fell out of _Atomic). Maybe it's useful? I was just banning unequal sizes before because it seemed like a mistake to copy to/from different types. If we wanted to support it, I suppose we could add another optional parameter, so I'm OK not doing it now, and adding later if useful.

Alright, I'll update the patch as discussed, thanks!

These new builtins should ideally support constant evaluation if possible.

clang/docs/LanguageExtensions.rst
2439–2440 ↗(On Diff #280136)

This is missing some important details:

  • What does the size parameter mean? Is it number of bytes or number of elements? If it's number of bytes, what happens if it's not a multiple of the element size, particularly in the _Atomic case?
  • What does the value parameter to memset mean? Is it splatted to the element width? Does it specify a complete element value?
  • For _Atomic, what memory order is used?
  • For volatile, what access size / type is used? Do we want to make any promises?
  • Are the loads and stores typed or untyped? (In particular, do we annotate with TBAA metadata?)
  • Do we guarantee to copy the object representation or only the value representation? (Do we preserve the values of padding bits in the source, and initialize padding bits in the destination?)

You should also document whether constant evaluation of these builtins is supported.

2446–2448 ↗(On Diff #280136)

Mixing those qualifiers doesn't seem like it will work in many cases: we don't allow mixing volatile and _Atomic (though I'm not sure why; LLVM supports volatile atomic operations), and presumably we shouldn't allow mixing __unaligned and _Atomic (although I don't see any tests for that, and maybe we should just outright disallow combining _Atomic with __unaligned in general).

clang/include/clang/Basic/Builtins.def
467

Are these really GCC builtins?

1476

The new builtins probably belong in this section of the file instead.

clang/include/clang/Basic/DiagnosticSemaKinds.td
7961–7963 ↗(On Diff #280136)

I'd prefer to keep this diagnostic separate, since it communicates more information than err_argument_needs_trivial_copy does: specifically that we need a trivial copy because we're performing an atomic operation.

8926–8934 ↗(On Diff #280136)

Please format these diagnostics consistently with the rest of the file: line break after Error<, wrap to 80 columns, don't leave blank lines between individual diagnostics in a group of related diagnostics.

clang/lib/Sema/SemaChecking.cpp
1277–1278 ↗(On Diff #280136)

There are a bunch of places in this file that do manual argument count checking and could use checkArgCount instead (search for err_typecheck_call_too_ to find them). If you want to clean this up, please do so in a separate change.

5510–5512 ↗(On Diff #280136)

Do we need this constraint?

  • If one side is atomic and the other is not, then we can do all of the operations with the atomic width.
  • If both sides are atomic, then one side is 2^N times the size of the other; we can do 2^N operations on one side for each operation on the other side.

Maybe the second case is not worth the effort, but permitting (for example) a memcpy from an _Atomic int* to a char* seems useful and there doesn't seem to be a good reason to disallow it.

jfb updated this revision to Diff 281087.Jul 27 2020, 5:24 PM
jfb marked 15 inline comments as done.

Address comments

jfb added a subscriber: dneilson.Jul 27 2020, 5:24 PM

I've addressed @rsmith @rjmccall suggestions (unless noted), thanks!
An open question: as of 6e4aa1e48138182685431c76184dfc36e620aea2 @dneilson added an assertion on CreateElementUnorderedAtomicMemCpy to check that the pointer arguments have alignments of at least the element size. That makes sense when the IR is only ever built internally to LLVM, but now that I'm adding a builtin it's more of a dynamic property. Should I also force this in the frontend (understanding that alignment isn't always well known at compile time), or should simply assume that the alignment is correct because it's a dynamic property?

I left some FIXMEs in the CodeGen test for this.

clang/docs/LanguageExtensions.rst
2439–2440 ↗(On Diff #280136)

Most of these are answered in the update.

Some of the issue is that the current documentation is silent on these points already, by saying "same as C's mem* function". I'm relying on that approach here as well.

Size is bytes.

memset value is an unsigned char.

Memory order is unordered, and accesses themselves are done in indeterminate order.

For volatile, it falls out of the new wording that we don't provide access size guarantees. We'd need to nail down IR better to do so, and I don't think it's the salient property (though as discussed above, it might be useful, and the element_size parameter make it easy to do so).

Same on TBAA, no mention because "same as C" (no TBAA annotations).

Same on copying bits as-is.

Good point on constant evaluation. I added support. Note that we don't have memset constant evaluation, so I didn't support it. Seems easy, but ought to be a separate patch.

2446–2448 ↗(On Diff #280136)

volatile and _Atomic ought to work...

For this code I didn't make it work (even if it might be useful), because we'd need IR support for it.

On mixing _Atomic __unaligned: I left a FIXME because I'm not 100% sure, given the alignment discussion on atomic in general. Let's see where we settle: if we make it a pure runtime property then __unaligned ought to be fine because it's a constraint violation if the actual pointer is truly unaligned.

clang/include/clang/Basic/Builtins.def
467

Oops, I didn't see that comment, was just copying __builtin_memcpy_inline. I'll move it too.

clang/lib/Sema/SemaChecking.cpp
1277–1278 ↗(On Diff #280136)
5510–5512 ↗(On Diff #280136)

Based on @rjmccall's feedback, I'm disallowing _Atomic qualification, and keying off the optional element_size parameter to determine atomicity. I'm also only taking in one size, not two, since as discussed it might be useful to allow two but I haven't heard that anyone actually wants it at the moment.

jfb marked an inline comment as done.Jul 27 2020, 5:29 PM
jfb added inline comments.
clang/lib/AST/ExprConstant.cpp
8793 ↗(On Diff #281087)

If we end up making alignment a runtime constraint, then I'll need to check it in consteval. Otherwise I don't think we need to check anything since Sema ought to have done all the required checks already.

rsmith added inline comments.Jul 27 2020, 7:53 PM
clang/docs/LanguageExtensions.rst
2435–2437 ↗(On Diff #281087)

What happens if byte_element_size does not divide byte_size?

2435–2437 ↗(On Diff #281087)

Did you really mean void* here? I've been pretty confused by some of the stuff happening below that seems to depend on the actual type of the passed pointer, which would make more sense if you meant QUAL T * here rather than QUAL void*. Do the builtins do different things for different argument pointee types or not?

2444–2445 ↗(On Diff #281087)

"the pointer's element size" -- do you mean "the provided element size"?

Does the element size need to be a compile-time constant? (Presumably, but you don't say so.)

2446–2447 ↗(On Diff #281087)

Presumably this means that it's an error if we don't provide lock-free atomic access for that size. Would be worth saying so.

2450–2451 ↗(On Diff #281087)

What happens if they're not? Is it UB, or is it just not guaranteed to be atomic?

2456 ↗(On Diff #281087)

*facilities

clang/include/clang/Basic/DiagnosticSemaKinds.td
8937–8938 ↗(On Diff #281087)

Given the new documentation, I would expect you don't need this any more.

8941–8943 ↗(On Diff #281087)

Presumably the number of bytes need not be a compile-time constant? It's a bit weird to produce an error rather than a warning on a case that would be valid but (perhaps?) UB if the argument were non-constant.

clang/lib/AST/ExprConstant.cpp
8793 ↗(On Diff #281087)

I don't see how you can check the alignment at compile time given a void* argument.

We presumably need to check that the element size (if given) divides the total size, assuming the outcome is UB if not.

jfb updated this revision to Diff 282281.Jul 31 2020, 11:30 AM
jfb marked 9 inline comments as done.

Address Richard's comments.

jfb added a comment.Jul 31 2020, 11:30 AM

This is almost ready I think!
There are a few things still open, I'd love feedback on them.

clang/docs/LanguageExtensions.rst
2435–2437 ↗(On Diff #281087)

Runtime constraint violation. constexpr needs to catch this too, added. Though IIUC we can't actually check alignment in constexpr, which makes sense since there's no actual address.

Similarly, I think we ought to add UBSan builtin check for this. I think it makes sense to add as an option to CreateElementUnorderedAtomicMemCpy: either assert-check at compile-time (the current default, which triggers assertions as I've annotated in the tests' FIXME), or at runtime if the sanitizer is enabled. WDYT?

I've added these two to the documentation.

2435–2437 ↗(On Diff #281087)

Oh yeah, this should be T* and U*. Fixed.

They used to key atomicity off of element size, but now that we have the extra parameter we only look at T and U for correctness (not behavior).

clang/include/clang/Basic/DiagnosticSemaKinds.td
8941–8943 ↗(On Diff #281087)

I commented below, indeed it seems like some of this ought to be relaxed.

clang/lib/Sema/SemaChecking.cpp
5567 ↗(On Diff #281087)

I'm re-thinking these checks:

if (ElSz->urem(DstElSz))
  return ExprError(
      Diag(TheCall->getBeginLoc(),
           PDiag(diag::err_atomic_builtin_ext_size_mismatches_el))
      << (int)ElSz->getLimitedValue() << DstElSz << DstValTy
      << DstOp->getSourceRange() << Arg->getSourceRange());

I'm not sure we ought to have them anymore. We know that the types are trivially copyable, it therefore doesn't really matter if you're copying with operations smaller than the type itself. For example:

struct Data {
  int a, b, c, d;
};

It ought to be fine to do 4-byte copies of Data, if whatever your algorithm is is happy with that. I therefore think I'll remove these checks based on the dst / src element types. The only thing that seems to make sense is making sure that you don't straddle object boundaries with element size.

I removed sizeless types: we'll codegen whatever you ask for.

rsmith added inline comments.Jul 31 2020, 4:51 PM
clang/docs/LanguageExtensions.rst
2430 ↗(On Diff #282281)
2454 ↗(On Diff #282281)

From the above description, I think the documentation is unclear what the types T and U are used for. I think the answer is something like:

"""
The types T and U are required to be trivially-copyable types, and byte_element_size (if specified) must be a multiple of the size of both types. dst and src are expected to be suitably aligned for T and U objects, respectively.
"""

But... we actually get the alignment information by analyzing pointer argument rather than from the types, just like we do for memcpy and memmove, so maybe the latter part is not right. (What did you intend regarding alignment for the non-atomic case?) The trivial-copyability and divisibility checks don't seem fundamentally important to the goal of the builtin, so I wonder if we could actually just use void here and remove the extra checks. (I don't really have strong views one way or the other on this, except that we should either document what T and U are used for or make the builtins not care about the pointee type beyond its qualifiers.)

clang/lib/AST/ExprConstant.cpp
8851 ↗(On Diff #282281)

We could use the same logic we use in __builtin_is_aligned here. For any object whose value the constant evaluator can reason about, we should be able to compute at least a minimal alignment (though the actual runtime alignment might of course be greater).

clang/lib/CodeGen/CGBuiltin.cpp
2543–2544

Looking through implicit conversions in getPtrArgType here will change the code we generate for cases like:

void f(volatile void *p, volatile void *q) {
  memcpy(p, q, 4);
}

... (in C, where we permit such implicit conversions) to use a volatile memcpy intrinsic. Is that an intentional change?

clang/lib/CodeGen/CGExpr.cpp
1167–1168 ↗(On Diff #282281)

Do we still need this? We should be doing the decay in Sema.

jfb updated this revision to Diff 283078.Aug 4 2020, 5:49 PM
jfb marked 6 inline comments as done.

Address Richard's comments, add UBSan support.

Herald added a project: Restricted Project. · View Herald TranscriptAug 4 2020, 5:49 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript
jfb added a comment.Aug 4 2020, 5:51 PM

Thanks for the detailed comments, I think I've addressed all of them! I also added UBSan support to check the builtin invocation. I think this patch is pretty much ready to go. A follow-up will need to add the support functions to compiler-rt (they're currently optional, as per https://reviews.llvm.org/D33240), and in cases where size is known we can inline them (as we do for memcpy and friends).

clang/docs/LanguageExtensions.rst
2454 ↗(On Diff #282281)

You're right. I've removed most treatment of T / U, and updated the documentation.

I left the trivial copy check, but void* is a usual escape hatch.

Divisibility is now only checked for size / element_size.

clang/lib/AST/ExprConstant.cpp
8851 ↗(On Diff #282281)

I think the runtime alignment is really the only thing that matters here. I played with constexpr checking based on what __builtin_is_aligned does, and it's not particularly useful IMO.

clang/lib/CodeGen/CGBuiltin.cpp
2543–2544

I'm confused... what's the difference that this makes for the pre-existing builtins? My intent was to get the QualType unconditionally, but I can conditionalize it if needed... However this ought to make no difference:

static QualType getPtrArgType(CodeGenModule &CGM, const CallExpr *E,
                              unsigned ArgNo) {
  QualType ArgTy = E->getArg(ArgNo)->IgnoreImpCasts()->getType();
  if (ArgTy->isArrayType())
    return CGM.getContext().getAsArrayType(ArgTy)->getElementType();
  if (ArgTy->isObjCObjectPointerType())
    return ArgTy->castAs<clang::ObjCObjectPointerType>()->getPointeeType();
  return ArgTy->castAs<clang::PointerType>()->getPointeeType();
}

and indeed I can't see the example you provided change in IR from one to the other. The issue I'm working around is that getting it unconditionally would make ObjC code sad when id is passed in as I outlined above.

clang/lib/Sema/SemaChecking.cpp
5567 ↗(On Diff #281087)

They're gone, we now only check that size and element size match up.

jfb added inline comments.Aug 4 2020, 5:57 PM
compiler-rt/test/ubsan/TestCases/Misc/builtins-ctz-clz.cpp
1 ↗(On Diff #283078)

Phab is confused.... I did a git rename of compiler-rt/test/ubsan/TestCases/Misc/builtins.cpp and it thinks this is new, and I deleted the other.

rsmith added a comment.Aug 4 2020, 6:24 PM

Thanks. I'd like @rjmccall to approve too, but the design of these intrinsics and the Sema and constant evaluation parts seem fine. (We don't strictly need constant evaluation to abort on library UB, so I think not catching the misalignment case is OK.)

clang/docs/LanguageExtensions.rst
2447–2449 ↗(On Diff #283078)

"runtime constraint violation" is an odd phrase; in C, "constraint violation" means a diagnostic is required. Can we instead say that it results in undefined behavior?

2454 ↗(On Diff #282281)

Please document the trivial copy check.

clang/lib/CodeGen/CGBuiltin.cpp
2543–2544

The example I gave should produce a non-volatile memcpy, and used to do so (we passed false as the fourth parameter to CreateMemCpy). With this patch, getPtrArgTypewill strip off the implicit conversion from volatile void* to void* in the argument type, so isVolatile below will be true, so I think it will now create a volatile memcpy for the same testcase. If that's not what's happening, then I'd like to understand why not :)

I'm not saying that's necessarily a bad change, but it is a change, and it's one we should make intentionally if we make it at all.

Patch looks basically okay to me, although I'll second Richard's concern that we shouldn't absent-mindedly start producing overloaded memcpys for ordinary __builtin_memcpy.

clang/docs/LanguageExtensions.rst
2446 ↗(On Diff #283078)

"*The* element size must..." But I would suggest using "access size" consistently rather than "element size".

jfb updated this revision to Diff 283121.Aug 4 2020, 9:21 PM
jfb marked 2 inline comments as done.

Update docs

jfb added a comment.Aug 4 2020, 9:21 PM

Patch looks basically okay to me, although I'll second Richard's concern that we shouldn't absent-mindedly start producing overloaded memcpys for ordinary __builtin_memcpy.

Yeah I think that's a leftover from the first patch. Should I drop it? At the same time, address spaces are currently accidentally "working", should I drop that in a patch before this change? Or leave as-is?

clang/docs/LanguageExtensions.rst
2446 ↗(On Diff #283078)

I'm being consistent with the naming for IR, which uses "element" as well. I'm not enamored with the naming, but wanted to point out the purposeful consistency to make sure you preferred "access size". Without precedent I would indeed prefer "access size", but have a slight preference for consistency here. This is extremely weakly held preference.

(I fixed "the").

2454 ↗(On Diff #282281)

Should I bubble this to the rest of the builtin in a follow-up patch? I know there are cases where that'll cause issues, but I worry that it would be a pretty noisy diagnostic (especially if we instead bubble it to C's memcpy instead).

clang/lib/CodeGen/CGBuiltin.cpp
2543–2544

Oh yes, sorry I thought you were talking about something that getPtrArgType did implicitly! Indeed the C code behaves differently in that it doesn't just strip volatile anymore.

I'm not super thrilled by the default C behavior, and I think this new behavior removes a gotcha, and is in fact what I was going for in the first iteration of the patch. Now that I've separated the builtin I agree that it's a bit odd... but it seems like the right thing to do anyways? But it no longer matches the C library function to do so.

FWIW, this currently "works as you'd expect":

void f(__attribute__((address_space(32))) void *dst, const void *src, int sz) {
    __builtin_memcpy(dst, src, sz);
}

https://godbolt.org/z/dcWxcK

and I think that's completely accidental because the C library function doesn't (and, as John pointed out earlier, the builtin is meant to act like the C function).

vsk added a subscriber: vsk.Aug 5 2020, 10:46 AM
vsk added inline comments.
compiler-rt/lib/ubsan/ubsan_handlers.cpp
659 ↗(On Diff #283121)

It looks like __ubsan_handle_invalid_builtin is meant to be recoverable, so I think this should be GET_REPORT_OPTIONS(false). Marking this unrecoverable makes it impossible to suppress redundant diagnostics at the same source location. It looks this isn't code you've added: feel free to punt this to me. If you don't mind folding in a fix, adding a test would be simple (perform UB in a loop and verify only one diagnostic is printed).

rjmccall added inline comments.Aug 5 2020, 11:29 AM
clang/docs/LanguageExtensions.rst
2446 ↗(On Diff #283078)

IR naming is generally random fluff plucked from the mind of an inspired compiler engineer. User documentation is the point where we're supposed to put our bad choices behind us and do something that makes sense to users. :)

Two observations that are new to me in this review:

  1. We already treat all builtins as being overloaded on address space.
  2. The revised patch treats __builtin_mem*_overloaded as being overloaded *only* on address space, volatility, and atomicity. (We've tuned the design to a point where the other qualifiers don't matter any more.)

So, we're adding three features here: overloading on (a) address space, (b) volatility, and (c) atomicity. (a) is already available in the non-_overloaded form, and we seem to be approaching agreement that (b) should be available in the non-_overloaded form too. So that only leaves (c), which is really not _overloaded but _atomic.

Based on those observations I'm now thinking that we might prefer a somewhat different approach (but one that should require only minimal changes to the patch in front of us). Specifically:

  1. Stop treating lib builtins (eg, plain memcpy) as overloaded on address space. That's a (pre-existing) conformance bug, at least for the Embedded C TR.
  2. Keep __builtin_ forms of lib builtins overloaded on address space. (No change.)
  3. Also overload __builtin_ forms of lib builtins on volatility where it makes sense, instead of adding new builtin names __builtin_mem*_overloaded.
  4. Add a new name for the builtin for the atomic forms of memcpy and memset (__builtin_memcpy_unordered_atomic maybe?).
  5. Convert the "trivial types" check from an error to a warning and apply it to all the mem* overloads. (Though I think we might actually already have such a check, so this might only require extending it to cover the atomic builtin.)

What do you think?

clang/docs/LanguageExtensions.rst
2434 ↗(On Diff #283121)

Does restrict really make sense here? It seems like the only difference it could possibly make would be to treat memcpy as memmove if either operand is marked restrict, but
(a) if the caller wants that, they can just use memcpy directly, and
(b) it's not correct to propagate restrict-ness from the caller to the callee anyway, because restrict-ness is really a property of the declaration of the identifier in its scope, not a property of its type:

void f(void *restrict p) {
  __builtin_memmove(p, p + 1, 4);
}

(c) a restrict qualifier on the pointee type is irrelevant to memcpy and a restrict qualifier on the pointer type isn't part of QUAL.

2435 ↗(On Diff #283121)

I don't think `__unaligned matters any more. We always take the actual alignment inferred from the pointer arguments, just like we do for non-overloaded memcpy`.

2455–2456 ↗(On Diff #283121)
clang/lib/CodeGen/CGBuiltin.cpp
2543–2544

FWIW, this currently "works as you'd expect":

void f(__attribute__((address_space(32))) void *dst, const void *src, int sz) {
    __builtin_memcpy(dst, src, sz);
}

The same is true even if you remove the __builtin_ (and add a suitable include), and that seems like a bug to me. It looks like we have special logic that treats *all* builtins taking pointers as being overloaded on address space, which seems wrong at least for lib builtins. C TR 18037:2008 is quite explicit about this. Section 5.1.4 says:

"""
The standard C library (ISO/IEC 9899:1999 clause 7 - Libraries) is unchanged; the library's
functions and objects continue to be declared only with regard to the generic address space. One
consequence is that pointers into named address spaces cannot be passed as arguments to library
functions except in the special case that the named address spaces are subsets of the generic
address space.
"""

We could retain that special rule for __builtin_-spelled variants of lib builtins. If we do, then maybe we shouldn't be adding __builtin_memcpy_overloaded at all and should only extend the behavior of __builtin_memcpy to also propagate volatility (and add a new builtin for the atomic case).

Regarding volatile, consider:

void maybe_volatile_memcpy(volatile void *dst, const volatile void *src, int sz, _Bool is_volatile) {
  if (is_volatile) {
#ifdef __clang__
    __builtin_memcpy_overloaded(dst, src, sz);
#elif __GNUC__
    // ...
#else
    // volatile char copy loop
#endif
  }
  memcpy(dst, src, sz);
}

With this patch, the above code will always perform a volatile memcpy. I think that's a surprise. A call to memcpy should follow the C semantics, even if we choose to change the semantics of __builtin_memcpy.

clang/lib/Sema/SemaChecking.cpp
5732 ↗(On Diff #283121)

You need to call Sema::isCompleteType first before asking this question, in order to trigger class instantiation when necessary in C++. (Likewise for the checks in the previous function.)

I thought part of the point of __builtin_memcpy was so that C library headers could do #define memcpy(x, y, z) __builtin_memcpy(x, y, z). If so, the conformance issue touches __builtin_memcpy as well, not just calls to the library builtin.

If that's not true, or if we're willing to ignore it, I agree that making __builtin_memcpy do the right thing for qualifiers in general is the right thing to do.

jfb updated this revision to Diff 283390.Aug 5 2020, 2:02 PM

Do UBSan change suggested by Vedant.

jfb updated this revision to Diff 283400.Aug 5 2020, 2:36 PM
jfb marked an inline comment as done.

Add loop test requested by Vedant

compiler-rt/lib/ubsan/ubsan_handlers.cpp
659 ↗(On Diff #283121)

I folded this into the patch.

jfb updated this revision to Diff 283402.Aug 5 2020, 2:42 PM
jfb marked an inline comment as done.

Use 'access size' instead of 'element size'.

clang/docs/LanguageExtensions.rst
2446 ↗(On Diff #283078)

"access size" it is :)

jfb updated this revision to Diff 283406.Aug 5 2020, 3:02 PM
jfb marked 5 inline comments as done.

Remove restrict, update docs, call isCompleteType

jfb added a comment.Aug 5 2020, 3:02 PM

Two observations that are new to me in this review:

  1. We already treat all builtins as being overloaded on address space.
  2. The revised patch treats __builtin_mem*_overloaded as being overloaded *only* on address space, volatility, and atomicity. (We've tuned the design to a point where the other qualifiers don't matter any more.)

So, we're adding three features here: overloading on (a) address space, (b) volatility, and (c) atomicity. (a) is already available in the non-_overloaded form, and we seem to be approaching agreement that (b) should be available in the non-_overloaded form too. So that only leaves (c), which is really not _overloaded but _atomic.

Based on those observations I'm now thinking that we might prefer a somewhat different approach (but one that should require only minimal changes to the patch in front of us). Specifically:

  1. Stop treating lib builtins (eg, plain memcpy) as overloaded on address space. That's a (pre-existing) conformance bug, at least for the Embedded C TR.
  2. Keep __builtin_ forms of lib builtins overloaded on address space. (No change.)
  3. Also overload __builtin_ forms of lib builtins on volatility where it makes sense, instead of adding new builtin names __builtin_mem*_overloaded.
  4. Add a new name for the builtin for the atomic forms of memcpy and memset (__builtin_memcpy_unordered_atomic maybe?).
  5. Convert the "trivial types" check from an error to a warning and apply it to all the mem* overloads. (Though I think we might actually already have such a check, so this might only require extending it to cover the atomic builtin.)

What do you think?

That's fine with me, but as John noted that's inconsistent with what he thought builtins allowed (i.e. #define memcpy(dst, src, sz) __builtin_memcpy(dst, src, sz). If that's Not A Thing then your plan works. If it is then we need to tune it a bit.

Also note that the _atomic builtin also needs to support some overloading, at least for address spaces (and maybe volatile in the future).

So, let me know what you'd both rather see, so I don't ping-pong code too much.

clang/docs/LanguageExtensions.rst
2434 ↗(On Diff #283121)

I dropped restrict.

2435 ↗(On Diff #283121)

It's still allowed as a qualifier, though.

clang/lib/Sema/SemaChecking.cpp
5732 ↗(On Diff #283121)

Before the condition, right? LMK if I added the right thing!

I thought part of the point of __builtin_memcpy was so that C library headers could do #define memcpy(x, y, z) __builtin_memcpy(x, y, z). If so, the conformance issue touches __builtin_memcpy as well, not just calls to the library builtin.

They would have to declare it as well (because C code can #undef memcpy and expect to then be able to call a real function), so the #define would be pointless. It doesn't look like glibc does anything like this; do you know of a C standard library implementation that does?

If we want to follow that path, then we'll presumably (eventually) want address-space-_overloaded versions of all lib builtins that take pointers -- looks like that's around 60 functions total. That said, I do wonder how many of the functions in question that we're implicitly overloading on address space actually support such overloading -- certainly any of them that we lower to a call to a library function is going to go wrong at runtime.

+@tstellar, who added this functionality in r233706 -- what was the intent here?

I thought part of the point of __builtin_memcpy was so that C library headers could do #define memcpy(x, y, z) __builtin_memcpy(x, y, z). If so, the conformance issue touches __builtin_memcpy as well, not just calls to the library builtin.

They would have to declare it as well (because C code can #undef memcpy and expect to then be able to call a real function), so the #define would be pointless. It doesn't look like glibc does anything like this; do you know of a C standard library implementation that does?

If we want to follow that path, then we'll presumably (eventually) want address-space-_overloaded versions of all lib builtins that take pointers -- looks like that's around 60 functions total. That said, I do wonder how many of the functions in question that we're implicitly overloading on address space actually support such overloading -- certainly any of them that we lower to a call to a library function is going to go wrong at runtime.

+@tstellar, who added this functionality in r233706 -- what was the intent here?

The goal of this patch was to avoid having to overload all the builtin with address spaces, which would be a lot of new builtins, but this functionality was added for targets that do not have a memcpy lib call, so I didn't consider the case where a libcall would be emitted.

I thought part of the point of __builtin_memcpy was so that C library headers could do #define memcpy(x, y, z) __builtin_memcpy(x, y, z). If so, the conformance issue touches __builtin_memcpy as well, not just calls to the library builtin.

They would have to declare it as well (because C code can #undef memcpy and expect to then be able to call a real function), so the #define would be pointless.

It wouldn't be pointless; it would enable optimization of direct calls to memcpy (the 99% case) without the compiler having to special-case a function by name. And you don't need an #undef, since &memcpy doesn't trigger the preprocessor when memcpy is a function-like macro. I seem to remember this being widely used in some math libraries, but it's entirely possible that it's never been used for memcpy and the like. It's also entirely possible that I'm passing around folklore.

If we just want memcpy to do the right thing when called directly, that's not ridiculous. I don't think it would actually have any conformance problems: a volatile memcpy is just a less optimizable memcpy, and to the extent that an address-space-aware memcpy is different from the standard definition, it's probably UB to use the standard definition to copy memory in non-generic address spaces.

rsmith added a comment.Aug 6 2020, 4:50 PM

I thought part of the point of __builtin_memcpy was so that C library headers could do #define memcpy(x, y, z) __builtin_memcpy(x, y, z). If so, the conformance issue touches __builtin_memcpy as well, not just calls to the library builtin.

For what it's worth, giving __builtin_memcpy broader semantics than memcpy wouldn't be unprecedented: it's already the case that __builtin_memcpy is usable in constant expressions where plain memcpy is not (but there's no macro risk in that case at least since C++ memcpy can't be a macro, and a call to memcpy is never an ICE in C).

They would have to declare it as well (because C code can #undef memcpy and expect to then be able to call a real function), so the #define would be pointless.

It wouldn't be pointless; it would enable optimization of direct calls to memcpy (the 99% case) without the compiler having to special-case a function by name.

I mean, yes, in an environment where the compiler didn't special-case the function by name anyway it wouldn't be pointless. I'm not aware of any such environment in popular usage, but that could just be ignorance on my part.

If we just want memcpy to do the right thing when called directly, that's not ridiculous. I don't think it would actually have any conformance problems: a volatile memcpy is just a less optimizable memcpy, and to the extent that an address-space-aware memcpy is different from the standard definition, it's probably UB to use the standard definition to copy memory in non-generic address spaces.

I think this is a constraint violation in C; C11 6.5.2.2/2 (a "Constraints" paragraph) requires that "Each argument shall have a type such that its value may be assigned to an object with the unqualified version of the type of its corresponding parameter." So this would be an extension, not just defining UB, and seems like the kind of extension we'd normally diagnose under -pedantic-errors, but we could make an exception. I also think it'd be mildly surprising for an explicitly-declared function to allow different argument conversions depending on whether its name is memcpy.

It seems to me that you're not entirely comfortable with making memcpy and __builtin_memcpy differ in this way, and I'm not entirely comfortable with making memcpy's behavior differ from its declared type. Meanwhile, @tstellar's patch wants __builtin_memcpy to be overloaded on address space. I don't see a way to address all three concerns at once.

I think it would be reasonable in general to guarantee that our __builtin_ functions have contracts at least as wide as the underlying C function, but allow them to have extensions, and to keep the plain C functions unextended. I had actually thought we already did that in more cases than we do (but perhaps I was thinking about the LLVM math intrinsics that guarantee to not set errno). That would mean that a C standard library implementation is still free to #define foo(x,y,z) __builtin_foo(x,y,z), but if they do, they may pick up extensions.

jfb added a comment.Aug 11 2020, 11:17 AM

I think it would be reasonable in general to guarantee that our __builtin_ functions have contracts at least as wide as the underlying C function, but allow them to have extensions, and to keep the plain C functions unextended. I had actually thought we already did that in more cases than we do (but perhaps I was thinking about the LLVM math intrinsics that guarantee to not set errno). That would mean that a C standard library implementation is still free to #define foo(x,y,z) __builtin_foo(x,y,z), but if they do, they may pick up extensions.

Alright, how about I go with what you say here, and instead of adding __builtin_*_overloaded versions I just overload the __builtin_* variants? This includes having an optional 4th parameter for access size.
Alternatively, I could overload __builtin_*, but have a separate set of functions (say __builtin_*_sized) for the atomic access size variants.

jfb updated this revision to Diff 285414.Aug 13 2020, 10:03 AM

Update overloading as discussed: on the original builtins, and separate the _sized variant, making its 4th parameter non-optional. If this looks good, I'll need to update codege for a few builtins (to handle volatile), as well as add tests for their codegen and address space (which should already work, but isn't tested).

jfb updated this revision to Diff 285479.Aug 13 2020, 1:11 PM

Fix a test.

jfb added a comment.Aug 13 2020, 2:09 PM

Actually I think any subsequent updates to tests can be done in a follow-up patch, since I'm not changing the status-quo on address space here.

jfb added a comment.Aug 24 2020, 10:59 AM

Ping, I think I've addressed all comments here.

Thanks, I'm happy with this approach.

If I understand correctly, the primary (perhaps sole) purpose of __builtin_memcpy_sized is to provide a primitive from which an atomic operation can be produced. That being the case, I wonder if the name is emphasizing the wrong thing, and a name that contains atomic would be more obvious. As a concrete alternative, __atomic_unordered_memcpy is not much longer and seems to have the right kinds of implications. WDYT?

clang/docs/LanguageExtensions.rst
2395–2397 ↗(On Diff #285479)

"can also be overloaded" -> "are also overloaded" would be clearer I think. ("Can also be overloaded" would suggest to me that it's the user of the builtin who overloads them, perhaps by declaring overloads.)

2403–2406 ↗(On Diff #285479)

Is this true in general, or only for [w]mem{cpy,move}? I thought for the other cases, we required an array of char / wchar_t?

2409 ↗(On Diff #285479)

I think this only applies to the above list minus the five functions you added to it. Given this and the previous comment, I'm not sure that merging the documentation on string builtins and memory builtins is working out well -- they seem to have more differences than similarities.

(memset is an outlier here that should be called out -- we don't seem to provide any constant evaluation support for it whatsoever.)

2451 ↗(On Diff #285479)
2452 ↗(On Diff #285479)
clang/lib/AST/ExprConstant.cpp
8870 ↗(On Diff #285479)

getLimitedValue() here seems unnecessary; urem can take an APInt.

8872 ↗(On Diff #285479)

Consider using toString instead of truncating each APSInt to uint64_t then to int. The size might reliably fit into uint64_t, but I don't think we can assume that int is large enough.

clang/lib/CodeGen/CGBuiltin.cpp
629

I'm not sure this castAs is necessarily correct. If the operand is C++11 nullptr, we could perform a null-to-pointer implicit conversion, and ArgTy could be NullPtrTy after stripping that back off here.

It seems like maybe what we want to do is strip off implicit conversions until we hit a non-pointer type, and take the pointee type we found immediately before that?

clang/lib/Sema/SemaChecking.cpp
5609 ↗(On Diff #285479)

Generally, I'm a little uncomfortable about producing an error if a type is complete but allowing the construct if the type is incomplete -- that seems like a situation where a warning would be more appropriate to me. It's surprising and largely unprecedented that providing more information about a type would change the program from valid to invalid.

Do we really need the protection of an error here rather than an enabled-by-default warning? Moreover, don't we already have a warning for memcpy of a non-trivially-copyable object somewhere? If not, then I think we should add such a thing that also applies to the real memcpy, rather than only warning on the builtin.

5732 ↗(On Diff #283121)

It would be more correct from a modules perspective to use

if (isCompleteType(Loc, T) && !T.isTriviallyCopyableType(Context))

That way, if the definition of the type is in some loaded-but-not-imported module file, we'll treat it the same as if the definition of the type is entirely unknown. (That also removes the need to check for the void case.) But given that this only allows us to accept code that is wrong in some sense, I'm not sure it really matters much.

jfb updated this revision to Diff 288131.Aug 26 2020, 4:22 PM
jfb marked 12 inline comments as done.

Address Richard's comments.

jfb added a comment.Aug 26 2020, 4:25 PM

Thanks, I'm happy with this approach.

If I understand correctly, the primary (perhaps sole) purpose of __builtin_memcpy_sized is to provide a primitive from which an atomic operation can be produced. That being the case, I wonder if the name is emphasizing the wrong thing, and a name that contains atomic would be more obvious. As a concrete alternative, __atomic_unordered_memcpy is not much longer and seems to have the right kinds of implications. WDYT?

Kinda, that's the motivating from Hans' paper which I'm following. One other use case (and the reason I assume Azul folks want it too) is when there's a GC that looks at objects. With this it knows it won't see torn objects when the GC is concurrent. It's similar, but generally atomic also implies an ordering, and here it's explicitly unordered (i.e. Bring Your Own Fences).

So I don't have a strong opinion on the name, but atomic seem slightly wrong.

Follow-ups I suggest in comments:

  • Make "must be trivially copyable" a warning throughout (not just here, but for atomics too).
  • Implement that diagnostic for mem* functions.
clang/docs/LanguageExtensions.rst
2403–2406 ↗(On Diff #285479)

This is just moving documentation that was below. Phab is confused with the diff.

2409 ↗(On Diff #285479)

[w]memset are indeed the odd ones, update says so.

clang/lib/AST/ExprConstant.cpp
8870 ↗(On Diff #285479)

Their bitwidth doesn't always match, and that asserts out.

8872 ↗(On Diff #285479)

OK I updated 2 other places as well.

clang/lib/CodeGen/CGBuiltin.cpp
629

Ah good catch! The new functions I'm adding just disallow nullptr, but the older ones allow it. I've modified the code accordingly and added a test in CodeGen for nullptr.

2543–2544

Yes, I believe that this is a pre-existing inconsistency with the non-__builtin_ variants.

clang/lib/Sema/SemaChecking.cpp
5609 ↗(On Diff #285479)

That rationale makes sense, but it's pre-existing behavior for atomic. I can change all of them in a follow-up if that's OK?

We don't have such a check for other builtins. I can do a second follow-up to then adopt these warnings for them too?

jfb added a reviewer: rsmith.Aug 29 2020, 3:52 PM
erichkeane accepted this revision.Nov 4 2020, 9:34 AM

I went through all the comments here, plus looked at the code myself. I believe all of the comments by other reviewers have been fixed/answered acceptably. I don't have any additional comments to add, therefore I think it is appropriate to accept this revision.

@jfb: Please give this a day or two before committing to give the other reviewers a chance to speak up!

This revision is now accepted and ready to land.Nov 4 2020, 9:34 AM
rsmith requested changes to this revision.Nov 4 2020, 5:43 PM

I think the documentation isn't quite right yet, but otherwise I think I'm happy. (With a couple of code change suggestions.)

In D79279#2240487, @jfb wrote:

If I understand correctly, the primary (perhaps sole) purpose of __builtin_memcpy_sized is to provide a primitive from which an atomic operation can be produced. That being the case, I wonder if the name is emphasizing the wrong thing, and a name that contains atomic would be more obvious. As a concrete alternative, __atomic_unordered_memcpy is not much longer and seems to have the right kinds of implications. WDYT?

Kinda, that's the motivating from Hans' paper which I'm following. One other use case (and the reason I assume Azul folks want it too) is when there's a GC that looks at objects. With this it knows it won't see torn objects when the GC is concurrent. It's similar, but generally atomic also implies an ordering, and here it's explicitly unordered (i.e. Bring Your Own Fences).

So I don't have a strong opinion on the name, but atomic seem slightly wrong.

I mean, I see your point, but I think not mentioning atomic at all also seems a little surprising, given how tied this feature is to atomic access (eg, rejecting access sizes larger than the inline atomic width). But this is not a hill I have any desire to die on :) If you'd prefer to keep the current name, I can live with it.

clang/docs/LanguageExtensions.rst
2403–2406 ↗(On Diff #285479)

The documentation that was moved applied to memcpy, memmove, wmemcpy, and wmemmove. In the new location, it doesn't apply to wmemcpy nor wmemmove but does now apply to memchr, memcmp, ..., for which it is incorrect.

We used to distinguish between "string builtins", which had the restriction to character types, and "memory builtins", which had the restriction to trivially-copyable types. Can you put that distinction back, or otherwise restore the wording to the old / correct state?

2409 ↗(On Diff #285479)

This feature test macro doesn't cover memcpy / memmove; this documentation is incorrect for older versions of Clang. Clang 4-7 define this feature test macro but do not support constant evaluation of memcpy / memmove.

clang/lib/CodeGen/CGBuiltin.cpp
630–637

(Just a simplification, NFC.)

clang/lib/Sema/SemaChecking.cpp
5609 ↗(On Diff #285479)

The pre-existing behavior for atomic builtins is to reject if the type is incomplete. Eg;

<stdin>:1:33: error: address argument to atomic operation must be a pointer to a trivially-copyable type ('struct A *' invalid)
struct A; void f(struct A *p) { __atomic_store(p, p, 0); }
                                ^              ~

We should do the same here. (Though I'd suggest calling RequireCompleteType instead to get a more meaningful diagnostic.) These days I think we should check for unsized types too, eg:

if (RequireCompleteSizedType(ScrOp->getBeginLoc(), SrcValTy))
  return true;
if (!SrcValTy.isTriviallyCopyableType(Context) && !SrcValTy->isVoidType())
  return ExprError(...);
This revision now requires changes to proceed.Nov 4 2020, 5:43 PM