This is an archive of the discontinued LLVM Phabricator instance.

Revert "[Clang] Propagate guaranteed alignment for malloc and others"
ClosedPublic

Authored by jyknight on Feb 2 2022, 8:48 AM.

Details

Summary

The above change assumed that malloc (and friends) would always
allocate memory to getNewAlign(), even for allocations which have a
smaller size. This is not actually required by spec (a 1-byte
allocation may validly have 1-byte alignment).

Some real-world malloc implementations do not provide this guarantee,
and thus this optimization is breaking programs.

Fixes #53540

This reverts commit c2297544c04764237cedc523083c7be2fb3833d4.

Diff Detail

Event Timeline

jyknight requested review of this revision.Feb 2 2022, 8:48 AM
jyknight created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 2 2022, 8:48 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
xbolva00 requested changes to this revision.EditedFeb 2 2022, 9:17 AM

GCC also does (almost) same assumptions. Instead of revert, do something else like: implement alloc size check

If alloc size < 16, do not use align 16, but smaller one. With this check, we will match gcc fully.

This revision now requires changes to proceed.Feb 2 2022, 9:17 AM

It is worth noting that GCC started assuming 16-byte alignment for small objects in 2016, before N2293 was written and accepted into C2x; see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90569#c8 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90569#c9 for more recent (informal) statements from GCC developers. Other malloc implementations, such as tcmalloc (see https://google.github.io/tcmalloc/reference.html) also seem to be planning on relaxing the alignment assumptions once C2x compliance is widespread. It might be worth filing a GCC bug as well.

jyknight updated this revision to Diff 405315.Feb 2 2022, 9:41 AM

(fix git mishap: neglected to add a file to original change after conflict resolution)

GCC also does same assumptions

Looking at GCC: GCC only assumes 4-byte alignment on i386, and 8-byte alignment on x86-64, which is why it hasn't actively broken users, unlike the clang change.

However, I think that's also still a bug in GCC. It should not be assuming 8-byte alignment for a 1-byte allocation.

So better to find solution / agreement with gcc devs too.

It is worth noting that GCC started assuming 16-byte alignment for small objects in 2016, before N2293 was written and accepted into C2x; see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90569#c8 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90569#c9 for more recent (informal) statements from GCC developers.

Note that linked bug is about assumed alignment (to decide which allocator to call for C++ "new X", which uses max_align_t as part of its decision (quite reasonably). For malloc assumed-alignment, GCC uses the value MALLOC_ABI_ALIGNMENT (e.g. https://github.com/gcc-mirror/gcc/blob/c123096cf14ac87875bba51279e46cceeb18faa1/gcc/tree-ssa-ccp.cc#L2336), which is set to BITS_PER_WORD by default, unless overridden by a target config. On x86, it's not overridden, and thus is BITS_PER_WORD (so 32 or 64)

I agree that the original patch (c2297544c04764237cedc523083c7be2fb3833d4) should be reverted.

The __STDCPP_DEFAULT_NEW_ALIGNMENT__==16 (on x86-64) thing is the glibc behavior.
It should be taken with a grain of salt since many malloc implementations don't provide the guarantee for allocations with smaller size (jemalloc/mimalloc/tcmalloc just guarantee 8 alignment on 64-bit systems).
And of course, this doesn't apply to to malloc and friends.

@urnathan mentioned this on https://github.com/llvm/llvm-project/issues/53540

Notice the 'suitably-sized allocation' caveat. Either poor wording, or getNewAlign is not guaranteeing smaller alignment for smaller allocations. I have not audited all the uses of getNewAlign to see if they presume the value is a conditional guarantee or not.

GCC also does same assumptions

If GCC makes use of __STDCPP_DEFAULT_NEW_ALIGNMENT__==16 for malloc or operator new with smaller allocations, looks like it is a bug that should be fixed as well.

collares removed a subscriber: collares.Feb 2 2022, 9:54 AM
MaskRay accepted this revision.Feb 2 2022, 9:54 AM

LGTM. This should be ported to release/14.x as well.

might be worth filing a GCC bug as well

Yes, a bug report for GCC should be opened as well. @collares do you want to take that?

So better to find solution / agreement with gcc devs too.

Reverting to the previous behavior is safe for clang to do in any case, and thus I think we ought to. Subsequently landing a change that restricts the alignment assumptions based on allocation size would be fine, as well. But reverting the incorrect change doesn't need to wait for that to be written, and neither need wait for a discussion with GCC.

Or can we bailout just for gnu? and preserve this for darwin?

Or can we bailout just for gnu? and preserve this for darwin?

As is, the original patch applies the C++ getNewAlign to C calloc/malloc/strdup/etc. This is outright wrong.

For C++, I confess I have some problems interpreting this sentence:

(https://eel.is/c++draft/cpp.predefined)

__STDCPP_­DEFAULT_­NEW_­ALIGNMENT__: An integer literal of type std::size_t whose value is the alignment guaranteed by a call to operator new(std::size_t) or operator new[](std::size_t). [ Note: Larger alignments will be passed to operator new(std::size_t, std::align_val_t), etc. (8.3.4). — end note ]

It seems to suggest that a large alignment may be assumed, but practically, I know at least mimalloc uses 8-byte alignment for 64-bit Darwin, not 16-byte alignment.

Or can we bailout just for gnu? and preserve this for darwin?

If deemed useful, someone may create a subsequent patch applying a fixed/restricted form.
This needs to be very careful since __STDCPP_­DEFAULT_­NEW_­ALIGNMENT__ doesn't seem to match the practice.

xbolva00 added a comment.EditedFeb 2 2022, 11:03 AM

mimalloc uses 8-byte alignment for 64-bit Darwin, not 16-byte alignment.

Than they are doing something fishy if I understood rjmccall’s comment correctly about darwin.

Also see:
https://reviews.llvm.org/D95910

So we have here … optimistic assumptions vs broken requirements

For C++, I confess I have some problems interpreting this sentence:

(https://eel.is/c++draft/cpp.predefined)

__STDCPP_­DEFAULT_­NEW_­ALIGNMENT__: An integer literal of type std::size_t whose value is the alignment guaranteed by a call to operator new(std::size_t) or operator new[](std::size_t). [ Note: Larger alignments will be passed to operator new(std::size_t, std::align_val_t), etc. (8.3.4). — end note ]

It seems to suggest that a large alignment may be assumed, but practically, I know at least mimalloc uses 8-byte alignment for 64-bit Darwin, not 16-byte alignment.

See https://eel.is/c++draft/basic.stc.dynamic.allocation#3 (this text was moved around a bit since C++17, but it hasn't changed meaning). operator new(std::size_t) requires that "the storage is aligned for any object that does not have new-extended alignment and is of the requested size." ()That is, if you have a size 8 allocation, it is not required to be 16-byte aligned.

[...] This needs to be very careful since __STDCPP_­DEFAULT_­NEW_­ALIGNMENT__ doesn't seem to match the practice.

I don't think that's true? I believe that macro is generally being set / handled correctly. Reintroducing an optimization like this with an additional check that the allocation size is large enough should be valid everywhere.

mimalloc uses 8-byte alignment for 64-bit Darwin, not 16-byte alignment.

It uses 8-byte alignment for 8-byte allocations, you mean. It uses 16-byte alignment for 16-byte (or larger) allocations (doing otherwise would be a clear bug, given that max_align_t is 16).

Or can we bailout just for gnu? and preserve this for darwin?

This isn't a GNU-only issue, so I don't think that makes sense.

Reintroducing an optimization like this with an additional check that the allocation size is large enough should be valid everywhere.

Should not be hard, could you do it? Then LGTM.

MaskRay added a comment.EditedFeb 2 2022, 3:25 PM

Reintroducing an optimization like this with an additional check that the allocation size is large enough should be valid everywhere.

Should not be hard, could you do it? Then LGTM.

@xbolva00 While I appreciate you contribution to the optimizations, I am not sure the burden of fixing a broken optimization lays on the issue reporter(s)... (quite a few now).
The revert is the safest choice both for origin/main and the now created origin/release/14.x.
If you are still motivated to improve the situation, you can contribute after the original broken patch is reverted...

ychen added a subscriber: ychen.Feb 2 2022, 3:27 PM

I don't see why the patch is wrong... It uses the target/platform-specific NewAlign. If the platform allows customized memory allocation that assumes weak alignment, it should set the NewAlign accordingly, no?

xbolva00 added a comment.EditedFeb 2 2022, 3:40 PM

Can you also confirm that there is no similar related to op new?

Yes, we can “hide” this assumption by removing it but.. there should be consensus whether just this code is problematic or problem is bigger (as we use NewAlign..)

xbolva00 added a comment.EditedFeb 2 2022, 3:44 PM

Reintroducing an optimization like this with an additional check that the allocation size is large enough should be valid everywhere.

Should not be hard, could you do it? Then LGTM.

@xbolva00 While I appreciate you contribution to the optimizations, I am not sure the burden of fixing a broken optimization lays on the issue reporter(s)... (quite a few now).
The revert is the safest choice both for origin/main and the now created origin/release/14.x.
If you are still motivated to improve the situation, you can contribute after the original broken patch is reverted...

I still think we can spend some more minutes to discuss it than rushing with revert.

Mozilla folks reported it after 7-8 months in trunk? So we can spend day or two to discuss it properly.

I don't see why the patch is wrong... It uses the target/platform-specific NewAlign. If the platform allows customized memory allocation that assumes weak alignment, it should set the NewAlign accordingly, no?

NewAlign is set to the largest object alignment for which the compiler can call ::operator new(size_t), instead of calling ::operator new(size_t, align_t), so, no, you definitely wouldn't want to change that value.

That is: it is correctly set to 16 if ::operator new(16) will return 16-byte-aligned memory, even if ::operator new(8)` will return an 8-byte-aligned memory. That's because 8-byte-aligned memory is "suitably-aligned" for any possible object that can fit in 8 bytes.

ychen added a comment.Feb 2 2022, 4:30 PM

I don't see why the patch is wrong... It uses the target/platform-specific NewAlign. If the platform allows customized memory allocation that assumes weak alignment, it should set the NewAlign accordingly, no?

NewAlign is set to the largest object alignment for which the compiler can call ::operator new(size_t), instead of calling ::operator new(size_t, align_t), so, no, you definitely wouldn't want to change that value.

That is: it is correctly set to 16 if ::operator new(16) will return 16-byte-aligned memory, even if ::operator new(8)` will return an 8-byte-aligned memory. That's because 8-byte-aligned memory is "suitably-aligned" for any possible object that can fit in 8 bytes.

If I understand it correctly, this is assuming a malloc that could return a different alignment depending on the allocation size. Some other malloc always returns the same alignment (which the reverted patch has assumed). Yeah, then the reverted patch needs to do it in a platform-specific way (default opt-out?).

ychen accepted this revision.Feb 2 2022, 4:31 PM

LGTM

xbolva00 requested changes to this revision.Feb 3 2022, 9:25 AM
xbolva00 added inline comments.
clang/test/CodeGen/alloc-fns-alignment.c
37

These tests for should stay.

This revision now requires changes to proceed.Feb 3 2022, 9:25 AM
jyknight updated this revision to Diff 406478.Feb 7 2022, 8:32 AM

Rebase, and preserve (and modify) test-case.

xbolva00 accepted this revision.Feb 7 2022, 9:25 AM

LG.

Please wait a day in case any new comments.

This revision is now accepted and ready to land.Feb 7 2022, 9:25 AM

Do you think it should be ported to release/13.x too?

llvm 14 - yes

llvm 13.0.2 is not planned.

If you need a workaround, use -fno-builtin-malloc.

Frankly, sounds quite inconvenient. This means that the default behavior changes for LLVM 13 and then changes back for LLVM 14.
Everyone using a custom allocator with weak alignment will need to know about this caveat and be really wary of this additional flag to ensure everything works as expected. This just burdens every user of Clang+(tcmalloc|jemalloc|mimalloc) with unnecessary steps.

Is it hard to retrofit this diff into LLVM 13?

Is it hard to retrofit this diff into LLVM 13?

It would be easy to backport this change into LLVM 13, but the problem is that there are no more releases planned on the LLVM 13 branch.

Is it hard to retrofit this diff into LLVM 13?

It would be easy to backport this change into LLVM 13, but the problem is that there are no more releases planned on the LLVM 13 branch.

I see, thanks for the quick reply.

Also, __builtin_malloc(...) can be used to avoid any alignment assumptions.

rsmith added a comment.Feb 7 2022, 2:01 PM

I support this revert.

malloc, operator new, and operator new[] (by the latter two I mean the usual global allocation functions, not user-provided ones) follow these rules:

  • malloc always returns storage whose alignment is at least the largest fundamental alignment. This allows, for example, T *p = (T*)malloc(n * sizeof(T) + 1); for any non-overaligned type T, even when n is zero. (The actual alignment might depend on the allocation size, but can never be less than max(_Alignof(long double), _Alignof(long long)), even if the allocation size is lower. We could easily add an alignment assumption stating that fact if we so chose.)
  • operator new(n) always returns storage whose alignment is sufficient for any type T with sizeof(T) == n && alignof(T) <= __STDCPP_DEFAULT_NEW_ALIGNMENT__. For example, operator new(n) for any odd size n is always permitted to return 1-byte aligned storage. This makes sense because operator new is expected to power (only) new T.
  • operator new[](n) always returns storage whose alignment is sufficient for any type T with sizeof(T) <= n && alignof(T) <= __STDCPP_DEFAULT_NEW_ALIGNMENT__. This is necessary to support things like new std::byte[n] followed by putting objects into that storage. However, unlike the malloc rules, this doesn't support zero-sized arrays in general.

Note that these are three different rules and we should not confuse them. The malloc rules have nothing to do with the default new alignment.

One possible and intended use of the C++ new alignment rules is to set the default new alignment to *less* than that guaranteed by malloc. For example, using -fnew-alignment=1 can be used to pass alignment information into the allocator whenever possible, which may result in more efficient code generation. This flag and its effect on operator new are entirely unrelated to the behavior of malloc, and should not reduce the alignment that is assumed to be provided by malloc.

Providing information to LLVM about the expected alignment of memory returned by malloc -- if we have that information and LLVM does not -- seems like a good thing, but this is not the way to do it. Adding a separate, target-specific notion of malloc alignment, and using that as the default for the new alignment, might be reasonable.

I support this revert.

Having received enough support, I'll go ahead and commit, and then propose backport to llvm 14 branch.

But --

  • malloc always returns storage whose alignment is at least the largest fundamental alignment.

As has been discussed previously in this review thread, that's not true -- and the in-practice-falseness of this statement was the trigger for reverting the change.

Not only is it not true in practice, it was clarified for C2x such that it's definitely not true per standard, either (while, before, it was unclear and could've been read either way). The current wording in C2x is: "The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and size less than or equal to the size requested.", see http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2293.htm

I support this revert.

Having received enough support, I'll go ahead and commit, and then propose backport to llvm 14 branch.

But --

  • malloc always returns storage whose alignment is at least the largest fundamental alignment.

As has been discussed previously in this review thread, that's not true -- and the in-practice-falseness of this statement was the trigger for reverting the change.

Not only is it not true in practice, it was clarified for C2x such that it's definitely not true per standard, either (while, before, it was unclear and could've been read either way). The current wording in C2x is: "The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and size less than or equal to the size requested.", see http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2293.htm

As the community's WG14 representative, I can confirm this information. We discussed N2293 in Pittsburgh in 2018 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2375.pdf) and the proposal had unanimous consent from the committee. WG14 is still establishing their new DR process, so the committee had no way to flag this as a defect report that applies retroactively to prior standards, but given the purpose of the paper was to clarify the intent, I think it is reasonable for us to treat it as a defect in all C versions if we wish to do so, not just C2x.

While C2X has blessed such smaller alignments, the x86_64 ABI (in particular), has not. However, using that ABI to justify 'It. Is. 16. Bytes.', is really an exercise in reality denial at this point. just thought I'd make it clear we have conflicting standards and practicality to attend to.

While C2X has blessed such smaller alignments, the x86_64 ABI (in particular), has not. However, using that ABI to justify 'It. Is. 16. Bytes.', is really an exercise in reality denial at this point. just thought I'd make it clear we have conflicting standards and practicality to attend to.

Do we want me to report back to WG14 with information that N2293 might not suitable for adoption into C2x?

While C2X has blessed such smaller alignments, the x86_64 ABI (in particular), has not. However, using that ABI to justify 'It. Is. 16. Bytes.', is really an exercise in reality denial at this point. just thought I'd make it clear we have conflicting standards and practicality to attend to.

Do we want me to report back to WG14 with information that N2293 might not suitable for adoption into C2x?

I think N2293 is fine for C2x. It is blessing an implementation of lower alignment allocations. Putting the programmer on notice that they can no longer assume some things.

As a compiler I think we need to deal with the reality that there are non-ABI conforming [system-dependent] allocators out there, and not simply say 'But the ABI says ...'

There is already at least one thing the ABI says that is not valid on some systems [sret return behaviour and Swift], for equally good reasons. The compiler deals with that.

FWIW, although gcc's code generation has a similar bug, its C++ library is already cognizant of lower-alignment allocators -- I convinced Mr Wakely a few years ago :)

I think we need to deal with the reality that there are non-ABI conforming [system-dependent] allocators out there,

Definitely not a good thing, just ticking bomb.

While C2X has blessed such smaller alignments, the x86_64 ABI (in particular), has not. However, using that ABI to justify 'It. Is. 16. Bytes.', is really an exercise in reality denial at this point. just thought I'd make it clear we have conflicting standards and practicality to attend to.

Do we want me to report back to WG14 with information that N2293 might not suitable for adoption into C2x?

I think N2293 is fine for C2x. It is blessing an implementation of lower alignment allocations. Putting the programmer on notice that they can no longer assume some things.

Okay, that's good to know, thank you! If the consensus changes and we want me to take feedback to WG14, please let me know.

As a compiler I think we need to deal with the reality that there are non-ABI conforming [system-dependent] allocators out there, and not simply say 'But the ABI says ...'

There is already at least one thing the ABI says that is not valid on some systems [sret return behaviour and Swift], for equally good reasons. The compiler deals with that.

FWIW, although gcc's code generation has a similar bug, its C++ library is already cognizant of lower-alignment allocators -- I convinced Mr Wakely a few years ago :)

rjmccall added a comment.EditedFeb 8 2022, 10:21 AM

While C2X has blessed such smaller alignments, the x86_64 ABI (in particular), has not. However, using that ABI to justify 'It. Is. 16. Bytes.', is really an exercise in reality denial at this point. just thought I'd make it clear we have conflicting standards and practicality to attend to.

Do we want me to report back to WG14 with information that N2293 might not suitable for adoption into C2x?

I think N2293 is fine for C2x. It is blessing an implementation of lower alignment allocations. Putting the programmer on notice that they can no longer assume some things.

As a compiler I think we need to deal with the reality that there are non-ABI conforming [system-dependent] allocators out there, and not simply say 'But the ABI says ...'

If a platform's ABI guarantees something, and the platform's system allocator actually does it, then that's the required behavior there, and replacement allocators that don't live up to it are non-conformant. But if the system allocator intentionally *doesn't* do it, and the platform's ABI is a generic ABI, then yeah, we have to understand that as an intent to declare a platform-specific deviation from the generic ABI.

There is already at least one thing the ABI says that is not valid on some systems [sret return behaviour and Swift], for equally good reasons. The compiler deals with that.

Can you explain what you mean here? Swift's conventions are not really bound by the C ABI.

MaskRay added a comment.EditedFeb 8 2022, 10:55 AM

While C2X has blessed such smaller alignments, the x86_64 ABI (in particular), has not. However, using that ABI to justify 'It. Is. 16. Bytes.', is really an exercise in reality denial at this point. just thought I'd make it clear we have conflicting standards and practicality to attend to.

By x86-64 ABI, do you mean "System V Application Binary Interface AMD64 Architecture Processor Supplement"? I have searched a bit but do not find the relevant wording on memory allocation.
If there is one, you can report issues about conflicting standards on https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues (gitlab account needed).

This revision was landed with ongoing or failed builds.Feb 8 2022, 11:35 AM
This revision was automatically updated to reflect the committed changes.

Are we considering relanding this with smaller alignment for smaller mallocs (and friends)?

See: https://github.com/llvm/llvm-project/issues/54747#issuecomment-1088420860

Herald added a project: Restricted Project. · View Herald TranscriptApr 5 2022, 7:51 AM
Herald added a subscriber: StephenFan. · View Herald Transcript