Page MenuHomePhabricator

Implement P1007R3 `std::assume_aligned`
Needs RevisionPublic

Authored by mclow.lists on Nov 27 2018, 12:25 PM.

Details

Summary

https://wg21.link/P1007R3

This is a first cut.

  • The constexpr stuff is not tested; I don't think it's going to work. Still investigating.
  • Only GCC and clang are supported at the moment. I need to add other compilers.

Diff Detail

Event Timeline

mclow.lists created this revision.Nov 27 2018, 12:25 PM

Looks good in general, like i suspected "deceivingly simple" :)
Related: D54589

include/memory
5591

Drop noexcept?
— R2, 2018-10-08: Revised wording and removed noexcept following LWG guidance.

mclow.lists marked 3 inline comments as done.Nov 27 2018, 12:51 PM
mclow.lists added inline comments.
include/memory
5591

Nope. The standard has to leave off noexcept. Implementers can add it.

Hence the comment I left about LIBCPP_ASSERT_NOEXCEPT(std::assume_aligned<1, T>(p))
which is a libc++-specific check.

test/std/utilities/memory/ptr.align/assume_aligned.pass.cpp
29

Needs a LIBCPP_ASSERT_NOEXCEPT(std::assume_aligned<1, T>(p))

Subscribed Chandler and emailed Timur in case they want to take a look, since it's their paper.

include/memory
5590

_LIBCPP_INLINE_VISIBILITY?

test/std/utilities/memory/ptr.align/assume_aligned.pass.cpp
29

This check needs to go into test/libcxx/<...>.

mclow.lists marked 4 inline comments as done.Nov 28 2018, 7:10 AM

Chandler has said to me (via IRC) that when called at constexpr time, this should just be a return __p. I'll add that after we have std:: is_constant_evaluated

include/memory
5590

Right.

test/std/utilities/memory/ptr.align/assume_aligned.pass.cpp
29

I don't think we need an entire test just for one line.
We have that macro specifically for libc++-specific assertions.

ldionne marked an inline comment as done.Nov 28 2018, 7:26 AM
ldionne added inline comments.
test/std/utilities/memory/ptr.align/assume_aligned.pass.cpp
29

SGTM

mclow.lists marked an inline comment as done.

Updated the tests; added constexpr tests (which are currently disabled).
Added _LIBCPP_INLINE_VISIBILITY per Louis' (correct) request.
Added LIBCPP_ASSERT_NOEXCEPT

mclow.lists marked an inline comment as done.Nov 28 2018, 7:35 AM
mclow.lists added inline comments.
test/std/utilities/memory/ptr.align/assume_aligned.pass.cpp
29

A tab snuck in here. I'll detab before I commit.

ldionne accepted this revision.Nov 28 2018, 8:20 AM

Can you update the cxx2a_status page to say that we're missing constexpr support for the feature?

This revision is now accepted and ready to land.Nov 28 2018, 8:20 AM
lebedev.ri added inline comments.Nov 28 2018, 8:32 AM
include/memory
5596–5598

@mclow.lists were you planning on finishing adding support for other spellings?

ldionne added inline comments.Nov 28 2018, 9:30 AM
include/memory
5594

Also, this should probably go.

EricWF requested changes to this revision.Nov 29 2018, 10:06 PM

We don't need to use __builtin_assume_aligned to implement this. We can use __attribute__((assume_aligned(N))) instead. That solves our constexpr problem.

This revision now requires changes to proceed.Nov 29 2018, 10:06 PM
rsmith added a subscriber: rsmith.Dec 3 2018, 1:56 PM

We don't need to use __builtin_assume_aligned to implement this. We can use __attribute__((assume_aligned(N))) instead. That solves our constexpr problem.

More than that, we can't use __builtin_assume_aligned, as currently implemented in Clang and GCC, to implement this -- it is non-constant if it cannot prove that the pointer is properly-aligned. The library specification for std::assume_aligned doesn't permit that behavior, and instead appears to require that a call is a constant subexpression whenever p is properly aligned (even if that can't be determined until runtime), meaning that it must also be a constant subexpression in cases where you cannot prove alignment one way or the other. Example:

extern char p; // might be defined with `alignas(16)`
extern char *q;
char *r = q; // required to be initialized to `&p` if `p` is properly aligned, may be `&p` or `nullptr` otherwise
char *q = std::assume_aligned<16>(&p); // required to be statically initialized if `p` is properly aligned

int main() { assert((uintptr_t)&p & 15 || r == &q); }

If you use __builtin_assume_aligned to implement std::assume_aligned, the above assertion can fail, because &p can't be proven to be properly-aligned. But if you use the attribute, then no checking will be done during constant evaluation, and the assertion will always pass. (Try it out yourself: https://godbolt.org/z/YKIdWF -- try changing the alignment from 1 to 2 and back, and observe that q switches between static and dynamic initialization, and r switches from being initialized to &p to being initialized to nullptr, even though the compiler does not know that p will not be 2-byte aligned.)

It's OK that assume_aligned is a constant subexpression even when such a call would be UB; the fact that we might have undefined behavior within a constant expression evaluation of the library function std::assume_aligned is explicitly permitted by [expr.const]p4 (after the long list of bullets):

If e satisfies the constraints of a core constant expression, but evaluation of e would evaluate an operation that has undefined behavior as specified in Clause 15 through Clause 31 of this document[...], it is unspecified whether e is a core constant expression.

@chandlerc, @hfinkel: does an attribute-only implementation (with no constant evaluation enforcement) materially hurt the ability for the optimizer to use this annotation? Eg, in:

extern char k[16];
void f() {
  // the initializer of p is a constant expression and might get constant-folded to &k by the frontend
  char *p = std::assume_aligned<16>(&k);
  // ...do stuff...
}

the alignment assumption may well never be emitted as IR. Is that acceptable?

We don't need to use __builtin_assume_aligned to implement this. We can use __attribute__((assume_aligned(N))) instead. That solves our constexpr problem.

...

@chandlerc, @hfinkel: does an attribute-only implementation (with no constant evaluation enforcement) materially hurt the ability for the optimizer to use this annotation? Eg, in:

extern char k[16];
void f() {
  // the initializer of p is a constant expression and might get constant-folded to &k by the frontend
  char *p = std::assume_aligned<16>(&k);
  // ...do stuff...
}

the alignment assumption may well never be emitted as IR. Is that acceptable?

__attribute__((assume_aligned(N))) and __builtin_assume_aligned are, at the IR level, implemented in a very-similar way. For functions with that attribute on the return type, we essentially emit an alignment assumption on the return value at every call site (in CodeGenFunction::EmitCall). Thus, from the optimizer's perspective, I don't think that it makes a big difference.

rsmith added a comment.Dec 4 2018, 3:20 PM

@chandlerc, @hfinkel: does an attribute-only implementation (with no constant evaluation enforcement) materially hurt the ability for the optimizer to use this annotation? Eg, in:

extern char k[16];
void f() {
  // the initializer of p is a constant expression and might get constant-folded to &k by the frontend
  char *p = std::assume_aligned<16>(&k);
  // ...do stuff...
}

the alignment assumption may well never be emitted as IR. Is that acceptable?

__attribute__((assume_aligned(N))) and __builtin_assume_aligned are, at the IR level, implemented in a very-similar way. For functions with that attribute on the return type, we essentially emit an alignment assumption on the return value at every call site (in CodeGenFunction::EmitCall). Thus, from the optimizer's perspective, I don't think that it makes a big difference.

The point here is that you may well get no IR annotation whatsoever for the above call, because the frontend might constant-evaluate the initializer of p down to just &k, and then emit IR that just initializes p to &k with no alignment assumption. Whereas if we treated assume_aligned<N>(p) as non-constant in the cases where we cannot prove that p is suitably aligned (as __builtin_assume_aligned does), then we would emit IR for the alignment assumption, but the downside is that the initializer of p would no longer be a constant expression.

Essentially, what I'm trying to gauge here is, is it OK that you probably don't actually get an alignment assumption in a function like the f() above, because it will probably be constant-evaluated away to nothing? Or do we need the constant evaluator to have some kind of side-channel by which it can communicate back to the code generator that an alignment assumption should be applied to k? Or, indeed, should assume_aligned<N>(p) not be treated as a constant expression unless we can prove during constant expression evaluation that p is in fact suitably aligned -- as GCC and Clang currently do for __builtin_assume_aligned?

@chandlerc, @hfinkel: does an attribute-only implementation (with no constant evaluation enforcement) materially hurt the ability for the optimizer to use this annotation? Eg, in:

extern char k[16];
void f() {
  // the initializer of p is a constant expression and might get constant-folded to &k by the frontend
  char *p = std::assume_aligned<16>(&k);
  // ...do stuff...
}

the alignment assumption may well never be emitted as IR. Is that acceptable?

__attribute__((assume_aligned(N))) and __builtin_assume_aligned are, at the IR level, implemented in a very-similar way. For functions with that attribute on the return type, we essentially emit an alignment assumption on the return value at every call site (in CodeGenFunction::EmitCall). Thus, from the optimizer's perspective, I don't think that it makes a big difference.

The point here is that you may well get no IR annotation whatsoever for the above call, because the frontend might constant-evaluate the initializer of p down to just &k, and then emit IR that just initializes p to &k with no alignment assumption. Whereas if we treated assume_aligned<N>(p) as non-constant in the cases where we cannot prove that p is suitably aligned (as __builtin_assume_aligned does), then we would emit IR for the alignment assumption, but the downside is that the initializer of p would no longer be a constant expression.

Essentially, what I'm trying to gauge here is, is it OK that you probably don't actually get an alignment assumption in a function like the f() above, because it will probably be constant-evaluated away to nothing? Or do we need the constant evaluator to have some kind of side-channel by which it can communicate back to the code generator that an alignment assumption should be applied to k?

I thought about this side channel option for k, but I don't think that we can because we'd need to prove that f() function was always executed, and that's likely not generally possible. I think that the side channel would apply only to p.

Or, indeed, should assume_aligned<N>(p) not be treated as a constant expression unless we can prove during constant expression evaluation that p is in fact suitably aligned -- as GCC and Clang currently do for __builtin_assume_aligned?

From a C++ perspective, this seems suboptimal. I don't want people to duplicate code, some with assume_aligned, some without, if I want the same code with work both in a constexpr and not. A side channel would be better. It is a trade off, however, and I'd need to think more about it.

Could you please test thread_local too?

@chandlerc, @hfinkel: does an attribute-only implementation (with no constant evaluation enforcement) materially hurt the ability for the optimizer to use this annotation? Eg, in:

extern char k[16];
void f() {
  // the initializer of p is a constant expression and might get constant-folded to &k by the frontend
  char *p = std::assume_aligned<16>(&k);
  // ...do stuff...
}

the alignment assumption may well never be emitted as IR. Is that acceptable?

__attribute__((assume_aligned(N))) and __builtin_assume_aligned are, at the IR level, implemented in a very-similar way. For functions with that attribute on the return type, we essentially emit an alignment assumption on the return value at every call site (in CodeGenFunction::EmitCall). Thus, from the optimizer's perspective, I don't think that it makes a big difference.

The point here is that you may well get no IR annotation whatsoever for the above call, because the frontend might constant-evaluate the initializer of p down to just &k, and then emit IR that just initializes p to &k with no alignment assumption. Whereas if we treated assume_aligned<N>(p) as non-constant in the cases where we cannot prove that p is suitably aligned (as __builtin_assume_aligned does), then we would emit IR for the alignment assumption, but the downside is that the initializer of p would no longer be a constant expression.

Essentially, what I'm trying to gauge here is, is it OK that you probably don't actually get an alignment assumption in a function like the f() above, because it will probably be constant-evaluated away to nothing? Or do we need the constant evaluator to have some kind of side-channel by which it can communicate back to the code generator that an alignment assumption should be applied to k?

I thought about this side channel option for k, but I don't think that we can because we'd need to prove that f() function was always executed, and that's likely not generally possible. I think that the side channel would apply only to p.

After thinking more about it, I think users of p in the function f really should be able to assume the alignment, and whether they succeed at that should not be determined by whether the initialize of p happens to fail to be a constant expression for some reason.

Let me lay out my reasoning, it comes from considering a few examples.

1a) The fact that the address happens to fold to a constant and thus the initializer is a constant as well is not going to be enough to reliably optimize the uses. Just because we know the address of some very hot vector data is a global and thus a constant "relocation" that we can fold does *not* mean that we will be able to reconstruct the alignment guarantees if we need to do so. This means that we would be missing real optimization opportunities here, and indeed, the exact opportunities that std::assume_aligned was intended to open up.

1b) Indeed, I could imagine a collection of routines which all use the same constant initialized address but which make different alignment assumptions. And I could imagine code dispatching (potentially dynamically) to the correctly aligned routine. That seems like reasonable code to expect to be able to write given this facility, and yet it would be directly undermined by what you describe.

  1. Imagine well tuned code using std::assume_aligned that gets refactored slightly such that the pointer happens to become suitable for constant initialization when previously it wasn't. This would in turn cause significant regressions in generated code quality which I think would be an unacceptable surprise to users. We would never be able to explain reasonably why a particular refactoring or change would *regress* performance of some code.

When considering these kinds of situations, it really seems like this needs to be propagated. And not just locally, but as far as the constant evaluation proceeds, walking past as many constant evaluated wrappers as needed. =/

I don't really think of this as needing a side-channel so much as address in the constant evaluation needing to track alignment in some way such that these get propagated as one would expect.

Or, indeed, should assume_aligned<N>(p) not be treated as a constant expression unless we can prove during constant expression evaluation that p is in fact suitably aligned -- as GCC and Clang currently do for __builtin_assume_aligned?

From a C++ perspective, this seems suboptimal. I don't want people to duplicate code, some with assume_aligned, some without, if I want the same code with work both in a constexpr and not. A side channel would be better. It is a trade off, however, and I'd need to think more about it.

Yeah, I really don't think we want to kill constant evaluation just to preserve alignment assumptions. Instead, we want to model those assumptions in the evaluator IMO (even if we don't allow them to fold away in core constant expressions or parts of the ABI).

zoecarver added inline comments.
test/std/utilities/memory/ptr.align/assume_aligned.fail.cpp
24

This should be std::assume_aligned<4>(p);.