This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Propagate guaranteed alignment for malloc and others
ClosedPublic

Authored by xbolva00 on Apr 20 2021, 11:54 AM.

Details

Summary

LLVM should be smarter about *known* malloc's alignment and this knowledge may enable other optimizations.

Originally started as LLVM patch - https://reviews.llvm.org/D100862 but this logic should be really in Clang.

Diff Detail

Event Timeline

xbolva00 requested review of this revision.Apr 20 2021, 11:54 AM
xbolva00 created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2021, 11:54 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

If this approach makes sense, I will update / add tests.

xbolva00 updated this revision to Diff 338950.Apr 20 2021, 11:55 AM
xbolva00 added a reviewer: jdoerfert.
xbolva00 added inline comments.
clang/lib/CodeGen/CGCall.cpp
2059

As a followup, I need to teach Clang about aligned_alloc and memalign.

lebedev.ri added inline comments.
clang/lib/CodeGen/CGCall.cpp
2060

malloc != new

xbolva00 added inline comments.Apr 20 2021, 11:59 AM
clang/lib/CodeGen/CGCall.cpp
2060

But the comment for the implementation of this value points on eg. gcc's docs:

https://clang.llvm.org/doxygen/Basic_2TargetInfo_8cpp_source.html line 67

// From the glibc documentation, on GNU systems, malloc guarantees 16-byte
// alignment on 64-bit systems and 8-byte alignment on 32-bit systems. See
// https://www.gnu.org/software/libc/manual/html_node/Malloc-Examples.html.
// This alignment guarantee also applies to Windows and Android. On Darwin,
// the alignment is 16 bytes on both 64-bit and 32-bit systems.

So this value is correct to be used for malloc and friends IMHO.

GCC already knows about this guaranteed alignment, check:
https://godbolt.org/z/sxn6K7Yq7

Alignment checks were optimized out. I can confirm that with this patch we do same.

Please addd tests, including tests that we suppress this assumption under e.g. -fno-builtin-malloc

xbolva00 updated this revision to Diff 339189.Apr 21 2021, 5:13 AM

Added tests

Please addd tests, including tests that we suppress this assumption under e.g. -fno-builtin-malloc

Done

xbolva00 updated this revision to Diff 339191.Apr 21 2021, 5:14 AM

Handle also aligned_alloc and memalign.

xbolva00 updated this revision to Diff 339192.Apr 21 2021, 5:15 AM

Fixed small typo in tests.

Harbormaster completed remote builds in B99952: Diff 339192.

Test looks good, thanks.

clang/lib/CodeGen/CGCall.cpp
2060

There's no immediate reason to have two different methods, but we should at least update the comment on getNewAlign to document that we're assuming it describes both operator new and malloc.

xbolva00 updated this revision to Diff 339912.Apr 23 2021, 12:22 AM

Updated comment for

getNewAlign()
xbolva00 marked an inline comment as done.Apr 23 2021, 12:22 AM
rjmccall accepted this revision.Apr 23 2021, 12:56 AM

Thanks, LGTM.

This revision is now accepted and ready to land.Apr 23 2021, 12:56 AM
This revision was landed with ongoing or failed builds.Apr 23 2021, 2:07 AM
This revision was automatically updated to reflect the committed changes.

Hello. Nice idea. Unfortunately this blocks tail folding, making codesize a bit bigger: https://godbolt.org/z/rncPvbh8d
I'm guessing it shouldn't? But the attribute isn't handled somewhere along the way. Any ideas where?

xbolva00 added a comment.EditedApr 24 2021, 3:28 AM

Hello. Nice idea. Unfortunately this blocks tail folding, making codesize a bit bigger: https://godbolt.org/z/rncPvbh8d
I'm guessing it shouldn't? But the attribute isn't handled somewhere along the way. Any ideas where?

Hmm.. I know probably where to fix it.

diff --git a/llvm/lib/CodeGen/Analysis.cpp b/llvm/lib/CodeGen/Analysis.cpp
index 5d1b00463dc8..425e1b4e8e2d 100644
--- a/llvm/lib/CodeGen/Analysis.cpp
+++ b/llvm/lib/CodeGen/Analysis.cpp
@@ -560,6 +560,8 @@ bool llvm::attributesPermitTailCall(const Function *F, const Instruction *I,
 
   // Following attributes are completely benign as far as calling convention
   // goes, they shouldn't affect whether the call is a tail call.
+  CallerAttrs.removeAttribute(Attribute::Alignment);
+  CalleeAttrs.removeAttribute(Attribute::Alignment);
   CallerAttrs.removeAttribute(Attribute::NoAlias);
   CalleeAttrs.removeAttribute(Attribute::NoAlias);
   CallerAttrs.removeAttribute(Attribute::NonNull);

Your testcase works for me with this modification.

Inspiration: https://github.com/llvm/llvm-project/commit/62ad2128255877ed41c714366861eee9c1da30dd

If you want you can fix it (or I can do that later today).

In https://bugzilla.mozilla.org/show_bug.cgi?id=1741454, oxalica discovered this breaks jemalloc-using code, since jemalloc does not guarantee 16-byte alignment.

In https://bugzilla.mozilla.org/show_bug.cgi?id=1741454, oxalica discovered this breaks jemalloc-using code, since jemalloc does not guarantee 16-byte alignment.

Not sure if jemalloc does the right thing here. (See comment above getNewAlign). You can force 16-byte alignment by configuring with --with-lg-quantum=4.

collares added a comment.EditedNov 20 2021, 3:36 PM

Reproducing oxalica's comment at https://github.com/NixOS/nixpkgs/pull/146817, to make sure it is not missed:

"The divergence here is about how to understand the sentence from C17, 7.22.3. There is already an proposal rephrasing it and making it clear that 8-byte allocation CAN result in only 8-byte alignment, not 16-byte-aligned. It also mentions that forcing jemalloc to have 16-byte alignment will significantly slow down its performance.
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2293.htm

This N2293 is already accepted in the latest working draft, and should be present in the next C2x standard. Clang should eventually change to support it."

xbolva00 added a comment.EditedNov 20 2021, 8:29 PM

Just curious, did you try firefox with rpmalloc? (If you care about the best performance).

“ It also mentions that forcing jemalloc to have 16-byte alignment will significantly slow down its performance.”

It would be interesting to see real world data, like firefox benchmarks. glibc and macOS defaults to 16-byte alignment and I think they have good reasons too.

Platforms are permitted to make stronger guarantees than the C standard requires, and it is totally reasonable for compilers to assume that malloc meets the target platform's documented guarantees. Arguably, libraries should not be replacing malloc if they fail to meet the platform's documented guarantees.

With that said, it's probably not unreasonable for Clang to provide an option that controls its assumptions about malloc. We should think somewhat carefully about whether there are other behaviors we'd want to include in that, though.

I think this patch was wrong -- we should not be assuming 16-byte alignment for an allocation smaller than 16 bytes.

(For background: I'm a jemalloc maintainer and wrote N2293).

Platforms are permitted to make stronger guarantees than the C standard requires, and it is totally reasonable for compilers to assume that malloc meets the target platform's documented guarantees. Arguably, libraries should not be replacing malloc if they fail to meet the platform's documented guarantees.

I agree generally that assuming platform guarantees is reasonable for the compiler in targeting that platform, but I don't think I agree that the dlmalloc alignment is one of them. Supporting malloc replacement is a first-class feature for glibc malloc, and related projects avoid relying on alignment guarantees (e.g. libstdc++ at one point considered assuming that 8-byte allocs were 16-byte aligned, and decided not to). At least one Linux distribution using clang (Alpine) uses musl, which is a weak-alignment implementation (contrary to what I claimed in N2293; I screwed up my background research).

With that said, it's probably not unreasonable for Clang to provide an option that controls its assumptions about malloc. We should think somewhat carefully about whether there are other behaviors we'd want to include in that, though.

This makes sense to me; this is broadly how clang handles operator new alignment as well.

fhahn added a subscriber: fhahn.Feb 2 2022, 7:37 AM

Someone filed https://github.com/llvm/llvm-project/issues/53540, which covers some of the issues also discussed here.