This is an archive of the discontinued LLVM Phabricator instance.

Add overloaded versions of builtin mem* functions
Needs RevisionPublic

Authored by jfb on May 1 2020, 6:01 PM.

Details

Summary

The mem* builtins are often used (or should be used) in places where time-of-check time-of-use (ToCToU) security issues are important (e.g. copying from untrusted buffers), because a single-access copy prevents multiple reads / multiple writes from occurring at the untrusted memory location. The current builtins don't accept volatile pointee parameters in C++, and merely warn about such parameters in C, which leads to confusion. In these settings, it's useful to overload the builtin and permit volatile pointee parameters. The code generation then directly emits the existing volatile variant of the mem* builtin function call, which ensures that the affected memory location is only accessed once (thereby preventing double reads under an adversarial memory mapping).

Instead of just handling volatile, this patch handles other overloads when they make sense, specifically around pointer qualification. It also cleans up some of the custom diagnostic handling to reduce duplication.

The patch also happens to be a decent base to implement C's memset_s from Annex K, as well as an alternate (and arguably better) approach to the C++ Committee's proposal for byte-wise atomic memcpy described in https://wg21.link/P1478 (in particular, it lets developers own the fences, a topic discussed in https://wg21.link/p2061).

Side-note: yes, ToCToU avoidance is a valid use for volatile https://wg21.link/p1152r0#uses.

RFC for this patch: http://lists.llvm.org/pipermail/cfe-dev/2020-May/065385.html

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jfb added inline comments.Jul 2 2020, 5:29 PM
clang/lib/Sema/SemaChecking.cpp
1509 ↗(On Diff #275005)

It's to avoid weird corner cases where this check isn't super relevant, but subsequent ones are. It avoids making isVolatileQualified below sad because e.g. void makes the QualType null. That one can't be _Atomic, and it can be volatile but then the size won't match the _Atomic's size.

1527 ↗(On Diff #275005)

I don't think so here either.

rjmccall added inline comments.Jul 2 2020, 11:01 PM
clang/lib/Sema/SemaChecking.cpp
1605 ↗(On Diff #275266)

I am not a fan of this lambda style, not because I dislike lambdas, but because you've pulled a ton of code that's supporting one or two cases (that could easily be handled together) into a much wider scope.

Your helper code is doing a ton of redundant type checks and is probably not as general as you think it is. You need to call DefaultFunctionArrayLvalueConversion on the pointer arguments, after which you can just check for a pointer type. You also need to convert the size argument to a size_t as if initializing a parameter. If you do these things, the IRGen code will get much simpler because e.g. it will not need to specially handle arrays anymore. You will also start magically doing the right thing w.r.t. ODR-uses of constexpr variables.

gchatelet added inline comments.Jul 3 2020, 1:00 PM
clang/include/clang/Basic/Builtins.def
489 ↗(On Diff #275005)

I don't see memmove_inline being useful, but memset and memcmp would make sense to add as building blocks for C++ implementations (e.g. libc memcpy).

As for this new addition, how about __builtin_memcpy_honor_qualifiers?
I fear that __builtin_memcpy_overloaded is too ambiguous.

jfb updated this revision to Diff 279896.Jul 22 2020, 11:26 AM

Follow John's suggestions

You need to add user docs for these builtins.

clang/include/clang/Basic/DiagnosticSemaKinds.td
8917 ↗(On Diff #279896)

I don't know why you're adding a bunch of new diagnostics about _Atomic.

clang/lib/CodeGen/CGBuiltin.cpp
636 ↗(On Diff #279896)

Since arrays are handled separately now, this is just getPointeeType(), but I don't know why you need to support ObjC object pointer types here at all.

clang/lib/CodeGen/CGExpr.cpp
1070 ↗(On Diff #279896)

Why arrays?

clang/lib/Sema/SemaChecking.cpp
5446 ↗(On Diff #279896)

Do you ever write these back into the call?

5468 ↗(On Diff #279896)

You already know that DstTy and SrcTy are non-null here.

Why do you need to support atomic types for these operations anyway? It just seems treacherous and unnecessary.

jfb updated this revision to Diff 279984.Jul 22 2020, 5:39 PM
jfb marked 7 inline comments as done.

Address all but one of John's comments

clang/include/clang/Basic/DiagnosticSemaKinds.td
8917 ↗(On Diff #279896)

Maybe the tests clarify this? Here's my rationale for the 3 new atomic diagnostics:

  • Don't support mixing volatile and atomic, because we'd need to add IR support for it. It might be useful, but as a follow-up.
  • Overloaded memcpy figures out the atomic operation size based on the element's own size. There's a destination and a source pointer, and we can't figure out the expected atomic operation size if they differ. It's likely an unintentional error to have different sizes when doing an atomic memcpy, so instead of figuring out the largest common matching size I figure it's better to diagnose.
  • Supporting non-lock-free sizes seems fraught with peril, since it's likely unintentional. It's certainly doable (loop calling the runtime support), but it's unclear if we should take the lock just once over the entire loop, or once per load+store pair, or once for each load and once for each store. I don't see a point in supporting it.
clang/lib/CodeGen/CGBuiltin.cpp
636 ↗(On Diff #279896)

I'll remove ObjC handling for now, I added it because of code like what's in:
clang/test/CodeGenObjC/builtin-memfns.m

// PR13697
void cpy1(int *a, id b) {
  // CHECK-LABEL: @cpy1(
  // CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* {{.*}}, i8* {{.*}}, i64 8, i1 false)
  memcpy(a, b, 8);
}

Should we support this? It seems to me like yes, but you seem to think otherwise?

On arrays / ObjC being handled now: that's not really true... or rather, it now is for the builtins I'm adding, but not for the previously existing builtins. We can't just get the pointer argument type for this code:

// <rdar://problem/11314941>
// Make sure we don't over-estimate the alignment of fields of
// packed structs.
struct PS {
  int modes[4];
} __attribute__((packed));
struct PS ps;
void test8(int *arg) {
  // CHECK: @test8
  // CHECK: call void @llvm.memcpy{{.*}} align 4 {{.*}} align 1 {{.*}} 16, i1 false)
  __builtin_memcpy(arg, ps.modes, sizeof(struct PS));
}

Because __builtin_memcpy doesn't perform the conversion. Arguably a pre-existing bug, which I can patch here as I have, or fix in Sema if you'd rather see that. LMK.

clang/lib/Sema/SemaChecking.cpp
5468 ↗(On Diff #279896)

Leftover from the refactoring :)

It's useful to get atomic memcpy, see https://wg21.link/P1478
It's also part of "support overloaded memcpy" which is what doing more than volatile implies.

Is there a need for an atomic memcpy at all? Why is it useful to allow this operation to take on "atomic" semantics — which aren't actually atomic because the loads and stores to elements are torn — with hardcoded memory ordering and somewhat arbitrary rules about what the atomic size is?

jfb added a comment.Jul 22 2020, 8:32 PM

Is there a need for an atomic memcpy at all? Why is it useful to allow this operation to take on "atomic" semantics — which aren't actually atomic because the loads and stores to elements are torn — with hardcoded memory ordering and somewhat arbitrary rules about what the atomic size is?

Hans lays out a rationale for usefulness in his paper, but what I've implemented is more useful: it's *unordered* so you can fence as you desire around it, yet it guarantees a minimum memory access size based on the pointer parameters. For example, copying an atomic int will be 4 byte operations which are single-copy-atomic, but the accesses from one int to the next aren't performed in any guaranteed order (or observable in any guaranteed order either). I talked about this with him a while ago but IIRC he wasn't sure about implementation among other things, so when you asked me to widen my original volatile-only memcpy to also do other qualifiers, I realized that it was a neat way to do atomic as well as other qualifiers. I've talked to a few SG1 folks about this, and I believe (for other reasons too) it's where the design will end up for Hans' paper.
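A rough user-level emulation of the semantics described here (assumed behavior for illustration, not the builtin's actual lowering): each element is copied with a single-copy-atomic relaxed access, but no ordering holds between elements, leaving the caller to place fences around the whole copy as desired.

```cpp
#include <atomic>
#include <cstddef>

// Emulation sketch: per-element accesses are single-copy-atomic and
// relaxed ("unordered" at the IR level is weaker still), with no
// guaranteed order between elements.
static void unordered_atomic_copy(std::atomic<int> *dst,
                                  const std::atomic<int> *src,
                                  std::size_t n) {
  for (std::size_t i = 0; i != n; ++i)
    dst[i].store(src[i].load(std::memory_order_relaxed),
                 std::memory_order_relaxed);
}
```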

In D79279#2168533, @jfb wrote:

Is there a need for an atomic memcpy at all? Why is it useful to allow this operation to take on "atomic" semantics — which aren't actually atomic because the loads and stores to elements are torn — with hardcoded memory ordering and somewhat arbitrary rules about what the atomic size is?

Hans lays out a rationale for usefulness in his paper, but what I've implemented is more useful: it's *unordered* so you can fence as you desire around it, yet it guarantees a minimum memory access size based on the pointer parameters. For example, copying an atomic int will be 4 byte operations which are single-copy-atomic, but the accesses from one int to the next aren't performed in any guaranteed order (or observable in any guaranteed order either). I talked about this with him a while ago but IIRC he wasn't sure about implementation among other things, so when you asked me to widen my original volatile-only memcpy to also do other qualifiers, I realized that it was a neat way to do atomic as well as other qualifiers. I've talked to a few SG1 folks about this, and I believe (for other reasons too) it's where the design will end up for Hans' paper.

I can see the usefulness of this operation, but it seems like an odd semantic mismatch for what is basically just a memcpy where one of the pointers happens to have _Atomic type, like you're shoe-horning it into this builtin just to avoid declaring a different one.

jfb added a comment.Jul 23 2020, 8:09 AM
In D79279#2168533, @jfb wrote:

Is there a need for an atomic memcpy at all? Why is it useful to allow this operation to take on "atomic" semantics — which aren't actually atomic because the loads and stores to elements are torn — with hardcoded memory ordering and somewhat arbitrary rules about what the atomic size is?

Hans lays out a rationale for usefulness in his paper, but what I've implemented is more useful: it's *unordered* so you can fence as you desire around it, yet it guarantees a minimum memory access size based on the pointer parameters. For example, copying an atomic int will be 4 byte operations which are single-copy-atomic, but the accesses from one int to the next aren't performed in any guaranteed order (or observable in any guaranteed order either). I talked about this with him a while ago but IIRC he wasn't sure about implementation among other things, so when you asked me to widen my original volatile-only memcpy to also do other qualifiers, I realized that it was a neat way to do atomic as well as other qualifiers. I've talked to a few SG1 folks about this, and I believe (for other reasons too) it's where the design will end up for Hans' paper.

I can see the usefulness of this operation, but it seems like an odd semantic mismatch for what is basically just a memcpy where one of the pointers happens to have _Atomic type, like you're shoe-horning it into this builtin just to avoid declaring a different one.

I'm following the discussion we had here regarding overloading:

There are other qualifiers that can meaningfully contribute to the operation here besides volatile, such as restrict and (more importantly) address spaces. And again, for the copy operations these might differ between the two pointer types.

In both cases, I’d say that the logical design is to allow the pointers to be to arbitrarily-qualified types. We can then propagate that information from the builtin into the LLVM intrinsic call as best as we’re allowed. So I think you should make builtins called something like __builtin_overloaded_memcpy (name to be decided) and just have their semantics be type-directed.

Ah yes, I’d like to hear what others think of this. I hadn’t thought about it before you brought it up, and it sounds like a good idea.

As you noted earlier, for memcpy you probably want to express differences in destination and source qualification, even if today IR can't express e.g. a volatile source with a non-volatile destination. You were talking about volatile, but this applies to the entire combination space of dst+src qualified with zero to five of volatile, _Atomic, __unaligned, restrict, and address space. Pulling that entire combination space out into differently named functions would create way too many functions. Right now the implementation has a few limitations: it treats both dst and src as volatile if either is, it can't do _Atomic with volatile so we diagnose, and it ignores restrict. Otherwise it supports all combinations.

jfb updated this revision to Diff 280135.Jul 23 2020, 8:20 AM

Improve documentation

jfb updated this revision to Diff 280136.Jul 23 2020, 8:21 AM

Re-update

In D79279#2169522, @jfb wrote:
In D79279#2168533, @jfb wrote:

Is there a need for an atomic memcpy at all? Why is it useful to allow this operation to take on "atomic" semantics — which aren't actually atomic because the loads and stores to elements are torn — with hardcoded memory ordering and somewhat arbitrary rules about what the atomic size is?

Hans lays out a rationale for usefulness in his paper, but what I've implemented is more useful: it's *unordered* so you can fence as you desire around it, yet it guarantees a minimum memory access size based on the pointer parameters. For example, copying an atomic int will be 4 byte operations which are single-copy-atomic, but the accesses from one int to the next aren't performed in any guaranteed order (or observable in any guaranteed order either). I talked about this with him a while ago but IIRC he wasn't sure about implementation among other things, so when you asked me to widen my original volatile-only memcpy to also do other qualifiers, I realized that it was a neat way to do atomic as well as other qualifiers. I've talked to a few SG1 folks about this, and I believe (for other reasons too) it's where the design will end up for Hans' paper.

I can see the usefulness of this operation, but it seems like an odd semantic mismatch for what is basically just a memcpy where one of the pointers happens to have _Atomic type, like you're shoe-horning it into this builtin just to avoid declaring a different one.

I'm following the discussion we had here regarding overloading:

There are other qualifiers that can meaningfully contribute to the operation here besides volatile, such as restrict and (more importantly) address spaces. And again, for the copy operations these might differ between the two pointer types.

In both cases, I’d say that the logical design is to allow the pointers to be to arbitrarily-qualified types. We can then propagate that information from the builtin into the LLVM intrinsic call as best as we’re allowed. So I think you should make builtins called something like __builtin_overloaded_memcpy (name to be decided) and just have their semantics be type-directed.

Ah yes, I’d like to hear what others think of this. I hadn’t thought about it before you brought it up, and it sounds like a good idea.

As you noted earlier, for memcpy you probably want to express differences in destination and source qualification, even if today IR can't express e.g. a volatile source with a non-volatile destination. You were talking about volatile, but this applies to the entire combination space of dst+src qualified with zero to five of volatile, _Atomic, __unaligned, restrict, and address space. Pulling that entire combination space out into differently named functions would create way too many functions. Right now the implementation has a few limitations: it treats both dst and src as volatile if either is, it can't do _Atomic with volatile so we diagnose, and it ignores restrict. Otherwise it supports all combinations.

My point is that this has nothing to do with the ordinary semantics of _Atomic. You're basically just looking at the word "atomic" and saying that, hey, a minimum access size is sortof related to atomicity.

If you want this to be able to control the minimum access size, you should allow that to be passed in as an optional argument instead.

jfb added a comment.Jul 23 2020, 10:17 AM

My point is that this has nothing to do with the ordinary semantics of _Atomic. You're basically just looking at the word "atomic" and saying that, hey, a minimum access size is sortof related to atomicity.

If you want this to be able to control the minimum access size, you should allow that to be passed in as an optional argument instead.

OK so it sounds like you're suggesting *two* versions of the overloaded builtins:

  1. __builtin_memcpy_overloaded which overloads on volatile, restrict, __unaligned, and address spaces, but not on _Atomic qualifiers.
  2. __builtin_atomic_memcpy_overloaded which overloads on volatile (but unsupported for now), restrict, and address spaces, but not on _Atomic qualifiers (because it's implicit), and not on __unaligned because that's a constraint. This takes an extra "element size" parameter, which we hope folks don't confuse with the size parameter (I'd expect a template or macro wrapper to hide that extra parameter when actually using the builtin).

Of course, that's two versions for each of memcpy, memmove, memset, and any other *mem that we decide to add to this list of overloadable functions.

Is that correct?

I don't think any of these should allow _Atomic unless we're going to give it some sort of consistent atomic semantics (which is hard to imagine being useful), and I think you should just take an extra argument of the minimum access width on all of them uniformly if you think that's important. Builtins can have optional arguments.

jfb added a comment.Jul 23 2020, 10:29 AM

I don't think any of these should allow _Atomic unless we're going to give it some sort of consistent atomic semantics (which is hard to imagine being useful), and I think you should just take an extra argument of the minimum access width on all of them uniformly if you think that's important. Builtins can have optional arguments.

OK so: __builtin_memcpy_overloaded, which overloads on volatile, restrict, __unaligned, and address spaces, but not on _Atomic qualifiers. Optionally, a 4th integer parameter can be provided to represent element_size. If provided, this becomes an unordered atomic memcpy with an element size equal to or greater than the provided element_size. That value must be a power of two, and must be lock-free (what we call the maximum atomic inline width in target info). If provided, then __unaligned is invalid, and volatile ought to be valid but is currently unsupported because IR can't do an atomic+volatile memcpy (it would be useful, say for shared memory, but Patches Welcome).

Do you think there should be any relationship at all between the dst/src pointee type's size and element_size? i.e. if I copy a short* using an element size of 1 byte, is that OK? It seems like larger element sizes are always OK, but smaller might be a programmer error? If that's what they wanted, they could have done (char*)my_short. Or is this trying to be too helpful?

I think the argument is treated as if it were 1 if not given. That's all that ordinary memcpy formally guarantees, which seems to work fine (semantically, if not performance-wise) for pretty much everything today. I don't think you need any restrictions on element size. It's probably sensible to require the pointers to be dynamically aligned to a multiple of the access width, but I don't think you can enforce that statically. And of course the length needs to be a multiple of the access size.
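The dynamic constraints just described can be sketched as runtime checks (where they actually get enforced — frontend, UBSan, or nowhere — is still open in this review; the helper is illustrative):

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative dynamic validity check for an element-wise atomic copy:
// the length must be a multiple of the access size, and both pointers
// must be aligned to the access size at runtime.
static bool element_copy_args_ok(const void *dst, const void *src,
                                 std::size_t len, std::size_t elt) {
  auto aligned = [elt](const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % elt == 0;
  };
  return elt != 0 && len % elt == 0 && aligned(dst) && aligned(src);
}
```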

Do you think it'd be useful to have different guarantees for different operands? I guess it could come up, but it'd be a whole lot of extra complexity that I can't imagine we'd ever support.

If one of the arguments is volatile, arguably the minimum access width (if given) needs to be exact. If we don't support that right now, it's okay to make it an error, which is basically what you've already done with the _Atomic volatile diagnostic.

jfb added a comment.Jul 23 2020, 10:52 AM

I think the argument is treated as if it were 1 if not given. That's all that ordinary memcpy formally guarantees, which seems to work fine (semantically, if not performance-wise) for pretty much everything today.

I'm not sure that's true: consider a memcpy implementation which copies some bytes twice (at a different access size, with an overlap, because somehow it's more efficient). That would probably violate the programmer's expectations, and I don't think volatile or atomic memcpy allows this (but regular memcpy does).

I don't think you need any restrictions on element size. It's probably sensible to require the pointers to be dynamically aligned to a multiple of the access width, but I don't think you can enforce that statically.

Agreed, if we're given a short and told to copy 4 bytes at a time then UBSan could find the constraint violation on alignment, but generally the only way we can diagnose is if the parameter is __unaligned (because there you're explicitly telling me it's not aligned, and the constraint is that it has to be).

And of course the length needs to be a multiple of the access size.

Yeah.

Do you think it'd be useful to have different guarantees for different operands? I guess it could come up, but it'd be a whole lot of extra complexity that I can't imagine we'd ever support.

You mean, if element_size is passed then you get different guarantees? I think that's what makes sense: if you're asking for atomic memcpy then you get some guarantees. If you're asking for volatile memcpy then you get others. That's why overloading (and multiple parameters) can be confusing, but at the same time I think it's better than having a combinatorial number of named functions instead.

If one of the arguments is volatile, arguably the minimum access width (if given) needs to be exact. If we don't support that right now, it's okay to make it an error, which is basically what you've already done with the _Atomic volatile diagnostic.

Agreed. volatile with size makes a lot of sense, and the IR version of it, once created, ought to not be able to widen accesses. volatile without a size specified makes sense too, because you just want a single read and a single write, don't care about tearing.

In D79279#2170187, @jfb wrote:

I think the argument is treated as if it were 1 if not given. That's all that ordinary memcpy formally guarantees, which seems to work fine (semantically, if not performance-wise) for pretty much everything today.

I'm not sure that's true: consider a memcpy implementation which copies some bytes twice (at a different access size, with an overlap, because somehow it's more efficient). That would probably violate the programmer's expectations, and I don't think volatile or atomic memcpy allows this (but regular memcpy does).

Yes, that's true, if you need an only-accessed-once guarantee, that's above and beyond just a minimum access size. I agree that volatile would need to make this guarantee.

Do you think it'd be useful to have different guarantees for different operands? I guess it could come up, but it'd be a whole lot of extra complexity that I can't imagine we'd ever support.

You mean, if element_size is passed then you get different guarantees?

No, sorry, I mean different guarantees for the different pointer operands. In principle, we could allow you to say that the memcpy has to be done with 4-byte accesses from the source and 2-byte accesses to the destination. That's implementable but a lot of work.

If one of the arguments is volatile, arguably the minimum access width (if given) needs to be exact. If we don't support that right now, it's okay to make it an error, which is basically what you've already done with the _Atomic volatile diagnostic.

Agreed. volatile with size makes a lot of sense, and the IR version of it, once created, ought to not be able to widen accesses. volatile without a size specified makes sense too, because you just want a single read and a single write, don't care about tearing.

Right.

jfb added a comment.Jul 23 2020, 11:30 AM

Do you think it'd be useful to have different guarantees for different operands? I guess it could come up, but it'd be a whole lot of extra complexity that I can't imagine we'd ever support.

You mean, if element_size is passed then you get different guarantees?

No, sorry, I mean different guarantees for the different pointer operands. In principle, we could allow you to say that the memcpy has to be done with 4-byte accesses from the source and 2-byte accesses to the destination. That's implementable but a lot of work.

Gotcha. Yeah I think it's useful as a niche thing, and if IR supports that for say volatile then we can honor lopsided volatile overloads (instead of treating the entire thing as volatile). I hadn't really thought about lopsided access sizes (since it fell out of _Atomic). Maybe it's useful? I was just banning unequal sizes before because it seemed like a mistake to copy to/from different types. If we wanted to support it, I suppose we could add another optional parameter, so I'm OK not doing it now, and adding later if useful.

Alright, I'll update the patch as discussed, thanks!

These new builtins should ideally support constant evaluation if possible.

clang/docs/LanguageExtensions.rst
2439–2440

This is missing some important details:

  • What does the size parameter mean? Is it number of bytes or number of elements? If it's number of bytes, what happens if it's not a multiple of the element size, particularly in the _Atomic case?
  • What does the value parameter to memset mean? Is it splatted to the element width? Does it specify a complete element value?
  • For _Atomic, what memory order is used?
  • For volatile, what access size / type is used? Do we want to make any promises?
  • Are the loads and stores typed or untyped? (In particular, do we annotate with TBAA metadata?)
  • Do we guarantee to copy the object representation or only the value representation? (Do we preserve the values of padding bits in the source, and initialize padding bits in the destination?)

You should also document whether constant evaluation of these builtins is supported.

2446–2448

Mixing those qualifiers doesn't seem like it will work in many cases: we don't allow mixing volatile and _Atomic (though I'm not sure why; LLVM supports volatile atomic operations), and presumably we shouldn't allow mixing __unaligned and _Atomic (although I don't see any tests for that, and maybe we should just outright disallow combining _Atomic with __unaligned in general).

clang/include/clang/Basic/Builtins.def
471 ↗(On Diff #280136)

Are these really GCC builtins?

1487 ↗(On Diff #280136)

The new builtins probably belong in this section of the file instead.

clang/include/clang/Basic/DiagnosticSemaKinds.td
7961–7963 ↗(On Diff #280136)

I'd prefer to keep this diagnostic separate, since it communicates more information than err_argument_needs_trivial_copy does: specifically that we need a trivial copy because we're performing an atomic operation.

8926–8934 ↗(On Diff #280136)

Please format these diagnostics consistently with the rest of the file: line break after Error<, wrap to 80 columns, don't leave blank lines between individual diagnostics in a group of related diagnostics.

clang/lib/Sema/SemaChecking.cpp
1277–1278 ↗(On Diff #280136)

There are a bunch of places in this file that do manual argument count checking and could use checkArgCount instead (search for err_typecheck_call_too_ to find them). If you want to clean this up, please do so in a separate change.

5510–5512 ↗(On Diff #280136)

Do we need this constraint?

  • If one side is atomic and the other is not, then we can do all of the operations with the atomic width.
  • If both sides are atomic, then one side is 2^N times the size of the other; we can do 2^N operations on one side for each operation on the other side.

Maybe the second case is not worth the effort, but permitting (for example) a memcpy from an _Atomic int* to a char* seems useful and there doesn't seem to be a good reason to disallow it.

jfb updated this revision to Diff 281087.Jul 27 2020, 5:24 PM
jfb marked 15 inline comments as done.

Address comments

jfb added a subscriber: dneilson.Jul 27 2020, 5:24 PM

I've addressed @rsmith's and @rjmccall's suggestions (unless noted), thanks!
An open question: as of 6e4aa1e48138182685431c76184dfc36e620aea2, @dneilson added an assertion in CreateElementUnorderedAtomicMemCpy checking that the pointer arguments have alignments of at least the element size. That makes sense when the IR is only ever built internally to LLVM, but now that I'm adding a builtin it's more of a dynamic property. Should I also enforce this in the frontend (understanding that alignment isn't always well known at compile time), or should I simply assume that the alignment is correct because it's a dynamic property?

I left some FIXMEs in the CodeGen test for this.

clang/docs/LanguageExtensions.rst
2439–2440

Most of these are answered in the update.

Some of the issue is that the current documentation is already silent on these points, by saying "same as C's mem* functions". I'm relying on that approach here as well.

Size is bytes.

memset value is an unsigned char.

Memory order is unordered, and accesses themselves are done in indeterminate order.

For volatile, it falls out of the new wording that we don't provide access size guarantees. We'd need to nail down the IR better to do so, and I don't think it's the salient property (though as discussed above, it might be useful, and the element_size parameter makes it easy to do so).

Same on TBAA, no mention because "same as C" (no TBAA annotations).

Same on copying bits as-is.

Good point on constant evaluation. I added support. Note that we don't have memset constant evaluation, so I didn't support it. Seems easy, but ought to be a separate patch.
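The shape of constant evaluation being added can be illustrated with a small stand-in (a hypothetical helper, not the builtin itself): a typed element-wise copy that the constant evaluator can run entirely at compile time.

```cpp
#include <cstddef>

// Illustrative constexpr element-wise copy of the kind the constant
// evaluator supports for these builtins.
template <typename T>
constexpr void const_copy(T *dst, const T *src, std::size_t n) {
  for (std::size_t i = 0; i != n; ++i)
    dst[i] = src[i];
}

constexpr int copied_sum() {
  int a[3] = {1, 2, 3};
  int b[3] = {};
  const_copy(b, a, 3);
  return b[0] + b[1] + b[2];
}

static_assert(copied_sum() == 6, "evaluated at compile time");
```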

2446–2448

volatile and _Atomic ought to work...

For this code I didn't make it work (even if it might be useful), because we'd need IR support for it.

On mixing _Atomic __unaligned: I left a FIXME because I'm not 100% sure, given the alignment discussion on atomic in general. Let's see where we settle: if we make it a pure runtime property then __unaligned ought to be fine because it's a constraint violation if the actual pointer is truly unaligned.

clang/include/clang/Basic/Builtins.def
471 ↗(On Diff #280136)

Oops, I didn't see that comment, was just copying __builtin_memcpy_inline. I'll move it too.

clang/lib/Sema/SemaChecking.cpp
1277–1278 ↗(On Diff #280136)
5510–5512 ↗(On Diff #280136)

Based on @rjmccall's feedback, I'm disallowing _Atomic qualification, and keying off the optional element_size parameter to determine atomicity. I'm also only taking in one size, not two, since as discussed it might be useful to allow two but I haven't heard that anyone actually wants it at the moment.

jfb marked an inline comment as done.Jul 27 2020, 5:29 PM
jfb added inline comments.
clang/lib/AST/ExprConstant.cpp
8793 ↗(On Diff #281087)

If we end up making alignment a runtime constraint, then I'll need to check it in consteval. Otherwise I don't think we need to check anything since Sema ought to have done all the required checks already.

rsmith added inline comments.Jul 27 2020, 7:53 PM
clang/docs/LanguageExtensions.rst
2435–2437

What happens if byte_element_size does not divide byte_size?

2435–2437

Did you really mean void* here? I've been pretty confused by some of the stuff happening below that seems to depend on the actual type of the passed pointer, which would make more sense if you meant QUAL T * here rather than QUAL void*. Do the builtins do different things for different argument pointee types or not?

2444–2445

"the pointer's element size" -- do you mean "the provided element size"?

Does the element size need to be a compile-time constant? (Presumably, but you don't say so.)

2446–2447

Presumably this means that it's an error if we don't provide lock-free atomic access for that size. Would be worth saying so.

2450–2451

What happens if they're not? Is it UB, or is it just not guaranteed to be atomic?

2456

*facilities

clang/include/clang/Basic/DiagnosticSemaKinds.td
8937–8938 ↗(On Diff #281087)

Given the new documentation, I would expect you don't need this any more.

8941–8943 ↗(On Diff #281087)

Presumably the number of bytes need not be a compile-time constant? It's a bit weird to produce an error rather than a warning on a case that would be valid but (perhaps?) UB if the argument were non-constant.

clang/lib/AST/ExprConstant.cpp
8793 ↗(On Diff #281087)

I don't see how you can check the alignment at compile time given a void* argument.

We presumably need to check that the element size (if given) divides the total size, assuming the outcome is UB if not.

jfb updated this revision to Diff 282281.Jul 31 2020, 11:30 AM
jfb marked 9 inline comments as done.

Address Richard's comments.

jfb added a comment.Jul 31 2020, 11:30 AM

This is almost ready I think!
There are a few things still open, I'd love feedback on them.

clang/docs/LanguageExtensions.rst
2435–2437

Runtime constraint violation. constexpr needs to catch this too, added. Though IIUC we can't actually check alignment in constexpr, which makes sense since there's no actual address.

Similarly, I think we ought to add a UBSan builtin check for this. I think it makes sense to add it as an option to CreateElementUnorderedAtomicMemCpy: either assert-check at compile time (the current default, which triggers assertions as I've annotated in the tests' FIXME), or check at runtime if the sanitizer is enabled. WDYT?

I've added these two to the documentation.

2435–2437

Oh yeah, this should be T* and U*. Fixed.

They used to key atomicity off of element size, but now that we have the extra parameter we only look at T and U for correctness (not behavior).

clang/include/clang/Basic/DiagnosticSemaKinds.td
8941–8943 ↗(On Diff #281087)

I commented below, indeed it seems like some of this ought to be relaxed.

clang/lib/Sema/SemaChecking.cpp
5567 ↗(On Diff #281087)

I'm re-thinking these checks:

if (ElSz->urem(DstElSz))
  return ExprError(
      Diag(TheCall->getBeginLoc(),
           PDiag(diag::err_atomic_builtin_ext_size_mismatches_el))
      << (int)ElSz->getLimitedValue() << DstElSz << DstValTy
      << DstOp->getSourceRange() << Arg->getSourceRange());

I'm not sure we ought to have them anymore. We know that the types are trivially copyable, so it doesn't really matter if you're copying with operations smaller than the type itself. For example:

struct Data {
  int a, b, c, d;
};

It ought to be fine to do 4-byte copies of Data, if your algorithm is happy with that. I therefore think I'll remove these checks based on the dst / src element types. The only check that still seems to make sense is ensuring that the element size doesn't straddle object boundaries.

I removed sizeless types: we'll codegen whatever you ask for.

rsmith added inline comments.Jul 31 2020, 4:51 PM
clang/docs/LanguageExtensions.rst
2427
2451

From the above description, I think the documentation is unclear what the types T and U are used for. I think the answer is something like:

"""
The types T and U are required to be trivially-copyable types, and byte_element_size (if specified) must be a multiple of the size of both types. dst and src are expected to be suitably aligned for T and U objects, respectively.
"""

But... we actually get the alignment information by analyzing the pointer argument rather than from the types, just like we do for memcpy and memmove, so maybe the latter part is not right. (What did you intend regarding alignment for the non-atomic case?) The trivial-copyability and divisibility checks don't seem fundamentally important to the goal of the builtin, so I wonder if we could actually just use void here and remove the extra checks. (I don't really have strong views one way or the other on this, except that we should either document what T and U are used for or make the builtins not care about the pointee type beyond its qualifiers.)

clang/lib/AST/ExprConstant.cpp
8851 ↗(On Diff #282281)

We could use the same logic we use in __builtin_is_aligned here. For any object whose value the constant evaluator can reason about, we should be able to compute at least a minimal alignment (though the actual runtime alignment might of course be greater).

clang/lib/CodeGen/CGBuiltin.cpp
2639–2640 ↗(On Diff #282281)

Looking through implicit conversions in getPtrArgType here will change the code we generate for cases like:

void f(volatile void *p, volatile void *q) {
  memcpy(p, q, 4);
}

... (in C, where we permit such implicit conversions) to use a volatile memcpy intrinsic. Is that an intentional change?

clang/lib/CodeGen/CGExpr.cpp
1167–1168 ↗(On Diff #282281)

Do we still need this? We should be doing the decay in Sema.

jfb updated this revision to Diff 283078.Aug 4 2020, 5:49 PM
jfb marked 6 inline comments as done.

Address Richard's comments, add UBSan support.

jfb added a comment.Aug 4 2020, 5:51 PM

Thanks for the detailed comments, I think I've addressed all of them! I also added UBSan support to check the builtin invocation. I think this patch is pretty much ready to go. A follow-up will need to add the support functions to compiler-rt (they're currently optional, as per https://reviews.llvm.org/D33240), and in cases where size is known we can inline them (as we do for memcpy and friends).

clang/docs/LanguageExtensions.rst
2451

You're right. I've removed most treatment of T / U, and updated the documentation.

I left the trivial copy check, but void* is the usual escape hatch.

Divisibility is now only checked for size / element_size.

clang/lib/AST/ExprConstant.cpp
8851 ↗(On Diff #282281)

I think the runtime alignment is really the only thing that matters here. I played with constexpr checking based on what __builtin_is_aligned does, and it's not particularly useful IMO.

clang/lib/CodeGen/CGBuiltin.cpp
2639–2640 ↗(On Diff #282281)

I'm confused... what's the difference that this makes for the pre-existing builtins? My intent was to get the QualType unconditionally, but I can conditionalize it if needed... However this ought to make no difference:

static QualType getPtrArgType(CodeGenModule &CGM, const CallExpr *E,
                              unsigned ArgNo) {
  QualType ArgTy = E->getArg(ArgNo)->IgnoreImpCasts()->getType();
  if (ArgTy->isArrayType())
    return CGM.getContext().getAsArrayType(ArgTy)->getElementType();
  if (ArgTy->isObjCObjectPointerType())
    return ArgTy->castAs<clang::ObjCObjectPointerType>()->getPointeeType();
  return ArgTy->castAs<clang::PointerType>()->getPointeeType();
}

and indeed I can't see the example you provided change in IR from one to the other. The issue I'm working around is that getting it unconditionally would make ObjC code sad when id is passed in as I outlined above.

clang/lib/Sema/SemaChecking.cpp
5567 ↗(On Diff #281087)

They're gone, we now only check that size and element size match up.

jfb added inline comments.Aug 4 2020, 5:57 PM
compiler-rt/test/ubsan/TestCases/Misc/builtins-ctz-clz.cpp
1 ↗(On Diff #283078)

Phab is confused.... I did a git rename of compiler-rt/test/ubsan/TestCases/Misc/builtins.cpp and it thinks this is new, and I deleted the other.

rsmith added a comment.Aug 4 2020, 6:24 PM

Thanks. I'd like @rjmccall to approve too, but the design of these intrinsics and the Sema and constant evaluation parts seem fine. (We don't strictly need constant evaluation to abort on library UB, so I think not catching the misalignment case is OK.)

clang/docs/LanguageExtensions.rst
2444–2446

"runtime constraint violation" is an odd phrase; in C, "constraint violation" means a diagnostic is required. Can we instead say that it results in undefined behavior?

2451

Please document the trivial copy check.

clang/lib/CodeGen/CGBuiltin.cpp
2639–2640 ↗(On Diff #282281)

The example I gave should produce a non-volatile memcpy, and used to do so (we passed false as the fourth parameter to CreateMemCpy). With this patch, getPtrArgType will strip off the implicit conversion from volatile void* to void* in the argument type, so isVolatile below will be true, and I think it will now create a volatile memcpy for the same test case. If that's not what's happening, then I'd like to understand why not :)

I'm not saying that's necessarily a bad change, but it is a change, and it's one we should make intentionally if we make it at all.

Patch looks basically okay to me, although I'll second Richard's concern that we shouldn't absent-mindedly start producing overloaded memcpys for ordinary __builtin_memcpy.

clang/docs/LanguageExtensions.rst
2443

"*The* element size must..." But I would suggest using "access size" consistently rather than "element size".

jfb updated this revision to Diff 283121.Aug 4 2020, 9:21 PM
jfb marked 2 inline comments as done.

Update docs

jfb added a comment.Aug 4 2020, 9:21 PM

Patch looks basically okay to me, although I'll second Richard's concern that we shouldn't absent-mindedly start producing overloaded memcpys for ordinary __builtin_memcpy.

Yeah I think that's a leftover from the first patch. Should I drop it? At the same time, address spaces are currently accidentally "working", should I drop that in a patch before this change? Or leave as-is?

clang/docs/LanguageExtensions.rst
2443

I'm being consistent with the naming for IR, which uses "element" as well. I'm not enamored with the naming, but wanted to point out the purposeful consistency to make sure you preferred "access size". Without precedent I would indeed prefer "access size", but have a slight preference for consistency here. This is an extremely weakly held preference.

(I fixed "the").

2451

Should I bubble this to the rest of the builtin in a follow-up patch? I know there are cases where that'll cause issues, but I worry that it would be a pretty noisy diagnostic (especially if we instead bubble it to C's memcpy instead).

clang/lib/CodeGen/CGBuiltin.cpp
2639–2640 ↗(On Diff #282281)

Oh yes, sorry I thought you were talking about something that getPtrArgType did implicitly! Indeed the C code behaves differently in that it doesn't just strip volatile anymore.

I'm not super thrilled by the default C behavior, and I think this new behavior removes a gotcha; it's in fact what I was going for in the first iteration of the patch. Now that I've separated the builtin I agree that it's a bit odd... but it seems like the right thing to do anyway? It does mean the builtin no longer matches the C library function, though.

FWIW, this currently "works as you'd expect":

void f(__attribute__((address_space(32))) void *dst, const void *src, int sz) {
    __builtin_memcpy(dst, src, sz);
}

https://godbolt.org/z/dcWxcK

and I think that's completely accidental because the C library function doesn't (and, as John pointed out earlier, the builtin is meant to act like the C function).

vsk added a subscriber: vsk.Aug 5 2020, 10:46 AM
vsk added inline comments.
compiler-rt/lib/ubsan/ubsan_handlers.cpp
659 ↗(On Diff #283121)

It looks like __ubsan_handle_invalid_builtin is meant to be recoverable, so I think this should be GET_REPORT_OPTIONS(false). Marking this unrecoverable makes it impossible to suppress redundant diagnostics at the same source location. It looks like this isn't code you've added: feel free to punt this to me. If you don't mind folding in a fix, adding a test would be simple (perform UB in a loop and verify only one diagnostic is printed).

rjmccall added inline comments.Aug 5 2020, 11:29 AM
clang/docs/LanguageExtensions.rst
2443

IR naming is generally random fluff plucked from the mind of an inspired compiler engineer. User documentation is the point where we're supposed to put our bad choices behind us and do something that makes sense to users. :)

Two observations that are new to me in this review:

  1. We already treat all builtins as being overloaded on address space.
  2. The revised patch treats __builtin_mem*_overloaded as being overloaded *only* on address space, volatility, and atomicity. (We've tuned the design to a point where the other qualifiers don't matter any more.)

So, we're adding three features here: overloading on (a) address space, (b) volatility, and (c) atomicity. (a) is already available in the non-_overloaded form, and we seem to be approaching agreement that (b) should be available in the non-_overloaded form too. So that only leaves (c), which is really not _overloaded but _atomic.

Based on those observations I'm now thinking that we might prefer a somewhat different approach (but one that should require only minimal changes to the patch in front of us). Specifically:

  1. Stop treating lib builtins (eg, plain memcpy) as overloaded on address space. That's a (pre-existing) conformance bug, at least for the Embedded C TR.
  2. Keep __builtin_ forms of lib builtins overloaded on address space. (No change.)
  3. Also overload __builtin_ forms of lib builtins on volatility where it makes sense, instead of adding new builtin names __builtin_mem*_overloaded.
  4. Add a new name for the builtin for the atomic forms of memcpy and memset (__builtin_memcpy_unordered_atomic maybe?).
  5. Convert the "trivial types" check from an error to a warning and apply it to all the mem* overloads. (Though I think we might actually already have such a check, so this might only require extending it to cover the atomic builtin.)

What do you think?

clang/docs/LanguageExtensions.rst
2431

Does restrict really make sense here? It seems like the only difference it could possibly make would be to treat memcpy as memmove if either operand is marked restrict, but
(a) if the caller wants that, they can just use memcpy directly, and
(b) it's not correct to propagate restrict-ness from the caller to the callee anyway, because restrict-ness is really a property of the declaration of the identifier in its scope, not a property of its type:

void f(void *restrict p) {
  __builtin_memmove(p, p + 1, 4);
}

(c) a restrict qualifier on the pointee type is irrelevant to memcpy and a restrict qualifier on the pointer type isn't part of QUAL.

2432

I don't think `__unaligned` matters any more. We always take the actual alignment inferred from the pointer arguments, just like we do for non-overloaded `memcpy`.

2452–2453
clang/lib/CodeGen/CGBuiltin.cpp
2639–2640 ↗(On Diff #282281)

FWIW, this currently "works as you'd expect":

void f(__attribute__((address_space(32))) void *dst, const void *src, int sz) {
    __builtin_memcpy(dst, src, sz);
}

The same is true even if you remove the __builtin_ (and add a suitable include), and that seems like a bug to me. It looks like we have special logic that treats *all* builtins taking pointers as being overloaded on address space, which seems wrong at least for lib builtins. C TR 18037:2008 is quite explicit about this. Section 5.1.4 says:

"""
The standard C library (ISO/IEC 9899:1999 clause 7 - Libraries) is unchanged; the library's
functions and objects continue to be declared only with regard to the generic address space. One
consequence is that pointers into named address spaces cannot be passed as arguments to library
functions except in the special case that the named address spaces are subsets of the generic
address space.
"""

We could retain that special rule for __builtin_-spelled variants of lib builtins. If we do, then maybe we shouldn't be adding __builtin_memcpy_overloaded at all and should only extend the behavior of __builtin_memcpy to also propagate volatility (and add a new builtin for the atomic case).

Regarding volatile, consider:

void maybe_volatile_memcpy(volatile void *dst, const volatile void *src, int sz, _Bool is_volatile) {
  if (is_volatile) {
#ifdef __clang__
    __builtin_memcpy_overloaded(dst, src, sz);
#elif __GNUC__
    // ...
#else
    // volatile char copy loop
#endif
  }
  memcpy(dst, src, sz);
}

With this patch, the above code will always perform a volatile memcpy. I think that's a surprise. A call to memcpy should follow the C semantics, even if we choose to change the semantics of __builtin_memcpy.

clang/lib/Sema/SemaChecking.cpp
5732 ↗(On Diff #283121)

You need to call Sema::isCompleteType first before asking this question, in order to trigger class instantiation when necessary in C++. (Likewise for the checks in the previous function.)

I thought part of the point of __builtin_memcpy was so that C library headers could do #define memcpy(x, y, z) __builtin_memcpy(x, y, z). If so, the conformance issue touches __builtin_memcpy as well, not just calls to the library builtin.

If that's not true, or if we're willing to ignore it, I agree that making __builtin_memcpy do the right thing for qualifiers in general is the right thing to do.

jfb updated this revision to Diff 283390.Aug 5 2020, 2:02 PM

Do UBSan change suggested by Vedant.

jfb updated this revision to Diff 283400.Aug 5 2020, 2:36 PM
jfb marked an inline comment as done.

Add loop test requested by Vedant

compiler-rt/lib/ubsan/ubsan_handlers.cpp
659 ↗(On Diff #283121)

I folded this into the patch.

jfb updated this revision to Diff 283402.Aug 5 2020, 2:42 PM
jfb marked an inline comment as done.

Use 'access size' instead of 'element size'.

clang/docs/LanguageExtensions.rst
2443

"access size" it is :)

jfb updated this revision to Diff 283406.Aug 5 2020, 3:02 PM
jfb marked 5 inline comments as done.

Remove restrict, update docs, call isCompleteType

jfb added a comment.Aug 5 2020, 3:02 PM

Two observations that are new to me in this review:

  1. We already treat all builtins as being overloaded on address space.
  2. The revised patch treats __builtin_mem*_overloaded as being overloaded *only* on address space, volatility, and atomicity. (We've tuned the design to a point where the other qualifiers don't matter any more.)

So, we're adding three features here: overloading on (a) address space, (b) volatility, and (c) atomicity. (a) is already available in the non-_overloaded form, and we seem to be approaching agreement that (b) should be available in the non-_overloaded form too. So that only leaves (c), which is really not _overloaded but _atomic.

Based on those observations I'm now thinking that we might prefer a somewhat different approach (but one that should require only minimal changes to the patch in front of us). Specifically:

  1. Stop treating lib builtins (eg, plain memcpy) as overloaded on address space. That's a (pre-existing) conformance bug, at least for the Embedded C TR.
  2. Keep __builtin_ forms of lib builtins overloaded on address space. (No change.)
  3. Also overload __builtin_ forms of lib builtins on volatility where it makes sense, instead of adding new builtin names __builtin_mem*_overloaded.
  4. Add a new name for the builtin for the atomic forms of memcpy and memset (__builtin_memcpy_unordered_atomic maybe?).
  5. Convert the "trivial types" check from an error to a warning and apply it to all the mem* overloads. (Though I think we might actually already have such a check, so this might only require extending it to cover the atomic builtin.)

What do you think?

That's fine with me, but as John noted that's inconsistent with what he thought builtins allowed (i.e. #define memcpy(dst, src, sz) __builtin_memcpy(dst, src, sz)). If that's Not A Thing then your plan works. If it is, then we need to tune it a bit.

Note that the _atomic builtin also needs to support some overloading, at least for address spaces (and maybe volatile in the future).

So, let me know what you'd both rather see, so I don't ping-pong code too much.

clang/docs/LanguageExtensions.rst
2431

I dropped restrict.

2432

It's still allowed as a qualifier, though.

clang/lib/Sema/SemaChecking.cpp
5732 ↗(On Diff #283121)

Before the condition, right? LMK if I added the right thing!

I thought part of the point of __builtin_memcpy was so that C library headers could do #define memcpy(x, y, z) __builtin_memcpy(x, y, z). If so, the conformance issue touches __builtin_memcpy as well, not just calls to the library builtin.

They would have to declare it as well (because C code can #undef memcpy and expect to then be able to call a real function), so the #define would be pointless. It doesn't look like glibc does anything like this; do you know of a C standard library implementation that does?

If we want to follow that path, then we'll presumably (eventually) want address-space-_overloaded versions of all lib builtins that take pointers -- looks like that's around 60 functions total. That said, I do wonder how many of the functions in question that we're implicitly overloading on address space actually support such overloading -- certainly any of them that we lower to a call to a library function is going to go wrong at runtime.

+@tstellar, who added this functionality in r233706 -- what was the intent here?

I thought part of the point of __builtin_memcpy was so that C library headers could do #define memcpy(x, y, z) __builtin_memcpy(x, y, z). If so, the conformance issue touches __builtin_memcpy as well, not just calls to the library builtin.

They would have to declare it as well (because C code can #undef memcpy and expect to then be able to call a real function), so the #define would be pointless. It doesn't look like glibc does anything like this; do you know of a C standard library implementation that does?

If we want to follow that path, then we'll presumably (eventually) want address-space-_overloaded versions of all lib builtins that take pointers -- looks like that's around 60 functions total. That said, I do wonder how many of the functions in question that we're implicitly overloading on address space actually support such overloading -- certainly any of them that we lower to a call to a library function is going to go wrong at runtime.

+@tstellar, who added this functionality in r233706 -- what was the intent here?

The goal of this patch was to avoid having to overload all the builtins with address spaces, which would be a lot of new builtins. This functionality was added for targets that do not have a memcpy lib call, so I didn't consider the case where a libcall would be emitted.

I thought part of the point of __builtin_memcpy was so that C library headers could do #define memcpy(x, y, z) __builtin_memcpy(x, y, z). If so, the conformance issue touches __builtin_memcpy as well, not just calls to the library builtin.

They would have to declare it as well (because C code can #undef memcpy and expect to then be able to call a real function), so the #define would be pointless.

It wouldn't be pointless; it would enable optimization of direct calls to memcpy (the 99% case) without the compiler having to special-case a function by name. And you don't need an #undef, since &memcpy doesn't trigger the preprocessor when memcpy is a function-like macro. I seem to remember this being widely used in some math libraries, but it's entirely possible that it's never been used for memcpy and the like. It's also entirely possible that I'm passing around folklore.

If we just want memcpy to do the right thing when called directly, that's not ridiculous. I don't think it would actually have any conformance problems: a volatile memcpy is just a less optimizable memcpy, and to the extent that an address-space-aware memcpy is different from the standard definition, it's probably UB to use the standard definition to copy memory in non-generic address spaces.

rsmith added a comment.Aug 6 2020, 4:50 PM

I thought part of the point of __builtin_memcpy was so that C library headers could do #define memcpy(x, y, z) __builtin_memcpy(x, y, z). If so, the conformance issue touches __builtin_memcpy as well, not just calls to the library builtin.

For what it's worth, giving __builtin_memcpy broader semantics than memcpy wouldn't be unprecedented: it's already the case that __builtin_memcpy is usable in constant expressions where plain memcpy is not (but there's no macro risk in that case at least since C++ memcpy can't be a macro, and a call to memcpy is never an ICE in C).

They would have to declare it as well (because C code can #undef memcpy and expect to then be able to call a real function), so the #define would be pointless.

It wouldn't be pointless; it would enable optimization of direct calls to memcpy (the 99% case) without the compiler having to special-case a function by name.

I mean, yes, in an environment where the compiler didn't special-case the function by name anyway it wouldn't be pointless. I'm not aware of any such environment in popular usage, but that could just be ignorance on my part.

If we just want memcpy to do the right thing when called directly, that's not ridiculous. I don't think it would actually have any conformance problems: a volatile memcpy is just a less optimizable memcpy, and to the extent that an address-space-aware memcpy is different from the standard definition, it's probably UB to use the standard definition to copy memory in non-generic address spaces.

I think this is a constraint violation in C; C11 6.5.2.2/2 (a "Constraints" paragraph) requires that "Each argument shall have a type such that its value may be assigned to an object with the unqualified version of the type of its corresponding parameter." So this would be an extension, not just defining UB, and seems like the kind of extension we'd normally diagnose under -pedantic-errors, but we could make an exception. I also think it'd be mildly surprising for an explicitly-declared function to allow different argument conversions depending on whether its name is memcpy.

It seems to me that you're not entirely comfortable with making memcpy and __builtin_memcpy differ in this way, and I'm not entirely comfortable with making memcpy's behavior differ from its declared type. Meanwhile, @tstellar's patch wants __builtin_memcpy to be overloaded on address space. I don't see a way to address all three concerns at once.

I think it would be reasonable in general to guarantee that our __builtin_ functions have contracts at least as wide as the underlying C function, but allow them to have extensions, and to keep the plain C functions unextended. I had actually thought we already did that in more cases than we do (but perhaps I was thinking about the LLVM math intrinsics that guarantee to not set errno). That would mean that a C standard library implementation is still free to #define foo(x,y,z) __builtin_foo(x,y,z), but if they do, they may pick up extensions.

jfb added a comment.Aug 11 2020, 11:17 AM

I think it would be reasonable in general to guarantee that our __builtin_ functions have contracts at least as wide as the underlying C function, but allow them to have extensions, and to keep the plain C functions unextended. I had actually thought we already did that in more cases than we do (but perhaps I was thinking about the LLVM math intrinsics that guarantee to not set errno). That would mean that a C standard library implementation is still free to #define foo(x,y,z) __builtin_foo(x,y,z), but if they do, they may pick up extensions.

Alright, how about I go with what you say here, and instead of adding __builtin_*_overloaded versions I just overload the __builtin_* variants? This includes having an optional 4th parameter for access size.
Alternatively, I could overload __builtin_*, but have a separate set of functions (say __builtin_*_sized) for the atomic access size variants.

jfb updated this revision to Diff 285414.Aug 13 2020, 10:03 AM

Update overloading as discussed: overload the original builtins, and separate out the _sized variant, making its 4th parameter non-optional. If this looks good, I'll need to update codegen for a few builtins (to handle volatile), as well as add tests for their codegen and address-space handling (which should already work, but isn't tested).

jfb updated this revision to Diff 285479.Aug 13 2020, 1:11 PM

Fix a test.

jfb added a comment.Aug 13 2020, 2:09 PM

Actually I think any subsequent updates to tests can be done in a follow-up patch, since I'm not changing the status quo on address spaces here.

jfb added a comment.Aug 24 2020, 10:59 AM

Ping, I think I've addressed all comments here.

Thanks, I'm happy with this approach.

If I understand correctly, the primary (perhaps sole) purpose of __builtin_memcpy_sized is to provide a primitive from which an atomic operation can be produced. That being the case, I wonder if the name is emphasizing the wrong thing, and a name that contains atomic would be more obvious. As a concrete alternative, __atomic_unordered_memcpy is not much longer and seems to have the right kinds of implications. WDYT?

clang/docs/LanguageExtensions.rst
2388–2390

"can also be overloaded" -> "are also overloaded" would be clearer I think. ("Can also be overloaded" would suggest to me that it's the user of the builtin who overloads them, perhaps by declaring overloads.)

2396–2399

Is this true in general, or only for [w]mem{cpy,move}? I thought for the other cases, we required an array of char / wchar_t?

2402

I think this only applies to the above list minus the five functions you added to it. Given this and the previous comment, I'm not sure that merging the documentation on string builtins and memory builtins is working out well -- they seem to have more differences than similarities.

(memset is an outlier here that should be called out -- we don't seem to provide any constant evaluation support for it whatsoever.)

2449
2450
clang/lib/AST/ExprConstant.cpp
8870 ↗(On Diff #285479)

getLimitedValue() here seems unnecessary; urem can take an APInt.

8872 ↗(On Diff #285479)

Consider using toString instead of truncating each APSInt to uint64_t then to int. The size might reliably fit into uint64_t, but I don't think we can assume that int is large enough.

clang/lib/CodeGen/CGBuiltin.cpp
635 ↗(On Diff #285479)

I'm not sure this castAs is necessarily correct. If the operand is C++11 nullptr, we could perform a null-to-pointer implicit conversion, and ArgTy could be NullPtrTy after stripping that back off here.

It seems like maybe what we want to do is strip off implicit conversions until we hit a non-pointer type, and take the pointee type we found immediately before that?

clang/lib/Sema/SemaChecking.cpp
5609 ↗(On Diff #285479)

Generally, I'm a little uncomfortable about producing an error if a type is complete but allowing the construct if the type is incomplete -- that seems like a situation where a warning would be more appropriate to me. It's surprising and largely unprecedented that providing more information about a type would change the program from valid to invalid.

Do we really need the protection of an error here rather than an enabled-by-default warning? Moreover, don't we already have a warning for memcpy of a non-trivially-copyable object somewhere? If not, then I think we should add such a thing that also applies to the real memcpy, rather than only warning on the builtin.

5732 ↗(On Diff #283121)

It would be more correct from a modules perspective to use

if (isCompleteType(Loc, T) && !T.isTriviallyCopyableType(Context))

That way, if the definition of the type is in some loaded-but-not-imported module file, we'll treat it the same as if the definition of the type is entirely unknown. (That also removes the need to check for the void case.) But given that this only allows us to accept code that is wrong in some sense, I'm not sure it really matters much.

jfb updated this revision to Diff 288131.Aug 26 2020, 4:22 PM
jfb marked 12 inline comments as done.

Address Richard's comments.

jfb added a comment.Aug 26 2020, 4:25 PM

Thanks, I'm happy with this approach.

If I understand correctly, the primary (perhaps sole) purpose of __builtin_memcpy_sized is to provide a primitive from which an atomic operation can be produced. That being the case, I wonder if the name is emphasizing the wrong thing, and a name that contains atomic would be more obvious. As a concrete alternative, __atomic_unordered_memcpy is not much longer and seems to have the right kinds of implications. WDYT?

Kinda, that's the motivation from Hans' paper, which I'm following. One other use case (and the reason I assume Azul folks want it too) is when there's a GC that looks at objects. With this it knows it won't see torn objects when the GC is concurrent. It's similar, but generally atomic also implies an ordering, and here it's explicitly unordered (i.e. Bring Your Own Fences).

So I don't have a strong opinion on the name, but atomic seems slightly wrong.

Follow-ups I suggest in comments:

  • Make "must be trivially copyable" a warning throughout (not just here, but for atomics too).
  • Implement that diagnostic for mem* functions.
clang/docs/LanguageExtensions.rst
2396–2399

This is just moving documentation that was below. Phab is confused by the diff.

2402

[w]memset are indeed the odd ones; the update says so.

clang/lib/AST/ExprConstant.cpp
8870 ↗(On Diff #285479)

Their bitwidth doesn't always match, and that asserts out.

8872 ↗(On Diff #285479)

OK I updated 2 other places as well.

clang/lib/CodeGen/CGBuiltin.cpp
635 ↗(On Diff #285479)

Ah good catch! The new functions I'm adding just disallow nullptr, but the older ones allow it. I've modified the code accordingly and added a test in CodeGen for nullptr.

2639–2640 ↗(On Diff #282281)

Yes, I believe that this is a pre-existing inconsistency with the non-__builtin_ variants.

clang/lib/Sema/SemaChecking.cpp
5609 ↗(On Diff #285479)

That rationale makes sense, but it's pre-existing behavior for atomic. I can change all of them in a follow-up if that's OK?

We don't have such a check for other builtins. I can do a second follow-up to adopt these warnings for them too?

jfb added a reviewer: rsmith.Aug 29 2020, 3:52 PM
erichkeane accepted this revision.Nov 4 2020, 9:34 AM

I went through all the comments here, plus looked at the code myself. I believe all of the comments by other reviewers have been fixed/answered acceptably. I don't have any additional comments to add, therefore I think it is appropriate to accept this revision.

@jfb: Please give this a day or two before committing to give the other reviewers a chance to speak up!

This revision is now accepted and ready to land.Nov 4 2020, 9:34 AM
rsmith requested changes to this revision.Nov 4 2020, 5:43 PM

I think the documentation isn't quite right yet, but otherwise I think I'm happy. (With a couple of code change suggestions.)

In D79279#2240487, @jfb wrote:

If I understand correctly, the primary (perhaps sole) purpose of __builtin_memcpy_sized is to provide a primitive from which an atomic operation can be produced. That being the case, I wonder if the name is emphasizing the wrong thing, and a name that contains atomic would be more obvious. As a concrete alternative, __atomic_unordered_memcpy is not much longer and seems to have the right kinds of implications. WDYT?

Kinda, that's the motivation from Hans' paper, which I'm following. One other use case (and the reason I assume Azul folks want it too) is when there's a GC that looks at objects. With this it knows it won't see torn objects when the GC is concurrent. It's similar, but generally atomic also implies an ordering, and here it's explicitly unordered (i.e. Bring Your Own Fences).

So I don't have a strong opinion on the name, but atomic seem slightly wrong.

I mean, I see your point, but I think not mentioning atomic at all also seems a little surprising, given how tied this feature is to atomic access (eg, rejecting access sizes larger than the inline atomic width). But this is not a hill I have any desire to die on :) If you'd prefer to keep the current name, I can live with it.

clang/docs/LanguageExtensions.rst
2396–2399

The documentation that was moved applied to memcpy, memmove, wmemcpy, and wmemmove. In the new location, it doesn't apply to wmemcpy or wmemmove but does now apply to memchr, memcmp, ..., for which it is incorrect.

We used to distinguish between "string builtins", which had the restriction to character types, and "memory builtins", which had the restriction to trivially-copyable types. Can you put that distinction back, or otherwise restore the wording to the old / correct state?

2402

This feature test macro doesn't cover memcpy / memmove; this documentation is incorrect for older versions of Clang. Clang 4-7 define this feature test macro but do not support constant evaluation of memcpy / memmove.

clang/lib/CodeGen/CGBuiltin.cpp
636–643 ↗(On Diff #288131)

(Just a simplification, NFC.)

clang/lib/Sema/SemaChecking.cpp
5609 ↗(On Diff #285479)

The pre-existing behavior for atomic builtins is to reject if the type is incomplete. E.g.:

<stdin>:1:33: error: address argument to atomic operation must be a pointer to a trivially-copyable type ('struct A *' invalid)
struct A; void f(struct A *p) { __atomic_store(p, p, 0); }
                                ^              ~

We should do the same here. (Though I'd suggest calling RequireCompleteType instead to get a more meaningful diagnostic.) These days I think we should check for unsized types too, e.g.:

if (RequireCompleteSizedType(SrcOp->getBeginLoc(), SrcValTy))
  return true;
if (!SrcValTy.isTriviallyCopyableType(Context) && !SrcValTy->isVoidType())
  return ExprError(...);
This revision now requires changes to proceed.Nov 4 2020, 5:43 PM