Page MenuHomePhabricator

Add builtins for aligning and checking alignment of pointers and integers
ClosedPublic

Authored by arichardson on Dec 13 2019, 4:49 PM.

Details

Summary

This change introduces three new builtins (which work on both pointers
and integers) that can be used instead of common bitwise arithmetic:
builtin_align_up(x, alignment), builtin_align_down(x, alignment) and
__builtin_is_aligned(x, alignment).

I originally added these builtins to the CHERI fork of LLVM a few years ago
to handle the slightly different C semantics that we use for CHERI [1].
Until recently these builtins (or sequences of other builtins) were
required to generate correct code. I have since made changes to the default
C semantics so that they are no longer strictly necessary (but using them
does generate slightly more efficient code). However, based on our experience
using them in various projects over the past few years, I believe that adding
these builtins to clang would be useful.

These builtins have the following benefit over bit-manipulation and casts
via uintptr_t:

  • The named builtins clearly convey the semantics of the operation. While checking alignment using builtin_is_aligned(x, 16) versus ((x & 15) == 0) is probably not a huge win in readably, I personally find builtin_align_up(x, N) a lot easier to read than (x+(N-1))&~(N-1).
  • They preserve the type of the argument (including const qualifiers). When using casts via uintptr_t, it is easy to cast to the wrong type or strip qualifiers such as const.
  • If the alignment argument is a constant value, clang can check that it is a power-of-two and within the range of the type. Since the semantics of these builtins is well defined compared to arbitraty bit-manipulation, it is possible to add a UBSAN checker that the run-time value is a valid power-of-two. I intend to add this as a follow-up to this change.
  • The builtins avoids int-to-pointer casts both in C and LLVM IR. In the future (i.e. once most optimizations handle it), we could use the new llvm.ptrmask intrinsic to avoid the ptrtoint instruction that would normally be generated.
  • They can be used to round up/down to the next aligned value for both integers and pointers without requiring two separate macros.
  • In many projects the alignment operations are already wrapped in macros (e.g. roundup2 and rounddown2 in FreeBSD), so by replacing the macro implementation with a builtin call, we get improved diagnostics for many call-sites while only having to change a few lines.

[1] In our CHERI compiler we have compilation mode where all pointers are
implemented as capabilities (essentially unforgeable 128-bit fat pointers).
In our original model, casts from uintptr_t (which is a 128-bit capability)
to an integer value returned the "offset" of the capability (i.e. the
difference between the virtual address and the base of the allocation).
This causes problems for cases such as checking the alignment: for example, the
expression if ((uintptr_t)ptr & 63) == 0 is generally used to check if the
pointer is aligned to a multiple of 64 bytes. The problem with offsets is that
any pointer to the beginning of an allocation will have an offset of zero, so
this check always succeeds in that case (even if the address is not correctly
aligned). The same issues also exist when aligning up or down. Using the
alignment builtins ensures that the address is used instead of the offset. While
I have since changed the default C semantics to return the address instead of
the offset when casting, this offset compilation mode can still be used by
passing a command-line flag.

Diff Detail

Unit TestsFailed

TimeTest
30 msClang.CodeGen::Unknown Unit Message ("")
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/lib/clang/10.0.0/include -nostdsysteminc -triple=x86_64-unknown-unknown -o - -emit-llvm /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/clang/test/CodeGen/builtin-align-array.c | /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/FileCheck /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/clang/test/CodeGen/builtin-align-array.c

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
  • Add documentation for the new alignment builtins

Unit tests: pass. 61055 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

aaron.ballman added inline comments.Dec 31 2019, 7:08 AM
clang/docs/LanguageExtensions.rst
2844

Should explicitly specify whether the arguments are expressed in bytes or bits.

2853

determined -> determine
is be not -> is not

clang/lib/AST/ExprConstant.cpp
8154

Can go with const auto * where the type is spelled out in the initialization like this (here and elsewhere).

8156–8158

No else after a return.

8173

Should drop the top-level const qualifier as that isn't an idiom we typically use for locals.

10682

Comment missing a full stop at the end.

10717

This assertion doesn't add much given the above line of code.

10731

Same here as above.

clang/lib/CodeGen/CGBuiltin.cpp
14267

Can elide braces.

clang/lib/Sema/SemaChecking.cpp
224

Comment should probably be a FIXME unless you intend to address it as part of this patch. Also, the comment is missing a full stop (please check all comments in the patch as many of them are missing one).

arichardson marked 9 inline comments as done.
  • Address feedback

Unit tests: pass. 61165 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

aaron.ballman accepted this revision.Jan 1 2020, 6:05 AM

LGTM with a few nits to be fixed.

clang/lib/AST/ExprConstant.cpp
8156–8158

You can still drop this else (and use a regular if) because of the preceding return.

10682

The comment is still missing its full stop at the end.

clang/lib/Sema/SemaChecking.cpp
246–259

There are a bunch of else after return issues in this ladder. I would rearrange to put the error cases first and the warning case last.

This revision is now accepted and ready to land.Jan 1 2020, 6:05 AM
arichardson marked an inline comment as done.
  • Address remaining comments

Unit tests: pass. 61172 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

lebedev.ri added inline comments.Jan 1 2020, 1:19 PM
clang/lib/CodeGen/CGBuiltin.cpp
14322

Until we are there, can we still emit sane IR, without inttoptr?
https://godbolt.org/z/atBtSx

lebedev.ri requested changes to this revision.Jan 1 2020, 2:33 PM
This revision now requires changes to proceed.Jan 1 2020, 2:33 PM
arichardson marked an inline comment as done.Jan 1 2020, 2:53 PM
arichardson added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
14322

It seems like avoiding the select might be better?
https://godbolt.org/z/UdPjZk

I've done that in the current version.

Address feedback: Avoid inttoptr by using ptrtoint + GEP instead.

Using two GEPs for align_up seems to generate better code than select + a single GEP.
See https://godbolt.org/z/UdPjZk

  • Also update the other codegen test

Unit tests: fail. 61172 tests passed, 1 failed and 728 were skipped.

failed: Clang.CodeGen/builtin-align-array.c

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Unit tests: pass. 61173 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Address feedback: Avoid inttoptr by using ptrtoint + GEP instead.

Using two GEPs for align_up seems to generate better code than select + a single GEP.
See https://godbolt.org/z/UdPjZk

Avoiding select is good, true. But we can do better with a single gep still:
https://godbolt.org/z/u7YHKw

lebedev.ri added inline comments.Jan 2 2020, 2:09 AM
clang/lib/CodeGen/CGBuiltin.cpp
14318

But this isn't what we are doing, negation != inversion.

arichardson marked an inline comment as done.

Improved code generation with a single GEP

The pointer case of align_up is now generates the expected add+mask assembly even without the llvm.ptrmask intrinsic.

Unit tests: pass. 61172 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

lebedev.ri accepted this revision.Jan 2 2020, 5:15 AM
lebedev.ri marked an inline comment as done.

Looks ok to me now in principle.
I have one more question about pointer variants though (see inline)

clang/lib/CodeGen/CGBuiltin.cpp
14321

I honestly wonder if (at least for align up) that would result in worse results,
but we'll see if/when such patch appears i guess..

14327

What's the semantics of these aligning builtins for pointers?
Is this GEP supposed to comply to the C17 6.5.6p8, C++ [expr.add]?
(TLDR: both the old pointer, and the newly-aligned pointer
must both point to the same array (allocated memory region))

I.e. can this GEP be inbounds? If it can, it'd be most great for optimizations.

This revision is now accepted and ready to land.Jan 2 2020, 5:15 AM

Looks ok to me now in principle.
I have one more question about pointer variants though (see inline)

I am not sure the GEP can be inbounds since I have seen some cases where aligning pointers is used to get a pointer to a different object.
I most cases it should be in-bounds (even when used to implement malloc()), but I have seen some cases where aligning pointers is used to get a pointer to a different object.
For example, some versions WebKit align pointers down by 64k to get a pointer to a structure that holds metadata for all objects allocated inside that region.

I am not sure what happens for those cases if we add inbounds (miscompilation?), so I haven't added it here.
I guess we could add it if alignment is a constant and is less than the object size, but there might already be a pass to infer if a GEP is inbounds?

lebedev.ri added a subscriber: nlopes.

(would be good for @nlopes to comment, maybe i'm overweighting this..)

Looks ok to me now in principle.
I have one more question about pointer variants though (see inline)

What i'm asking is:

  • Are these builtins designed (as per clang/docs/LanguageExtensions.rst) to only be passed pointers in-bounds to the allocated memory chunk (Logical pointer*), or any random bag of bits casted to pointer type (Physical Pointer*)?
  • If Logical pointer, are they designed to also produce Logical pointer? Or Physical Pointer?

I am not sure the GEP can be inbounds since I have seen some cases
where aligning pointers is used to get a pointer to a different object.
I most cases it should be in-bounds (even when used to implement malloc()),
but I have seen some cases where aligning pointers is used to get a pointer
to a different object.

Object as in C++ object?
I'm specifically talking about memory region/chunk, as in what is returned by malloc().

For example, some versions WebKit align pointers down by 64k to get a pointer to a structure that holds metadata for all objects allocated inside that region.

But that entire region is still a single memory region/chunk,
not just some other random memory chunk that *happens* to be close nearby?

I am not sure what happens for those cases if we add inbounds (miscompilation?), so I haven't added it here.
I guess we could add it if alignment is a constant and is less than the object size, but there might already be a pass to infer if a GEP is inbounds?

I'm pushing on this because these intrinsics are supposed to be better than hand-rolled variants
(and in their current form with a single non-inbounds GEP they already are infinitly better
than ptrtoint+inttoptr pair), but if we go with current design (return Physical pointer),
(which is less optimizer friendly than gep inbounds which would requre/produce Logical pointers),
we'll be stuck with it..

Since there is no such builtin currently (certainly not in clang,
i don't see one in GCC), we do get to dictate it's semantics.
We don't have to define it to be most lax/UB-free (ptrtoint+inttoptr or non-inbounds gep),
but we can define it to be most optimizer-friendly (gep inbounds).

What i'm asking is:

  • Are these builtins designed (as per clang/docs/LanguageExtensions.rst) to only be passed pointers in-bounds to the allocated memory chunk (Logical pointer*), or any random bag of bits casted to pointer type (Physical Pointer*)?
  • If Logical pointer, are they designed to also produce Logical pointer? Or Physical Pointer?

In my view you both input and output should be a Logical pointer. If you want random values, you should be aligning uintptr_t instead.
For CHERI we only need to support passing and creating valid pointers (i.e. with the capability tag bit set and valid bounds information).
Since the source pointer contains bounds information, it will never be possible to use the __builtin_align_up/down result to access memory from a different underlying allocation.

I am not sure the GEP can be inbounds since I have seen some cases
where aligning pointers is used to get a pointer to a different object.
I most cases it should be in-bounds (even when used to implement malloc()),
but I have seen some cases where aligning pointers is used to get a pointer
to a different object.

Object as in C++ object?
I'm specifically talking about memory region/chunk, as in what is returned by malloc().

For example, some versions WebKit align pointers down by 64k to get a pointer to a structure that holds metadata for all objects allocated inside that region.

But that entire region is still a single memory region/chunk,
not just some other random memory chunk that *happens* to be close nearby?

Since we can run that code on CHERI-MIPS, I realize it must be the same memory region.
I had look at the code again and the aligned structure is constructed in a 64K region that is created by a single call to fastMalloc(blockSize), i.e. malloc()/mmap().

I am not sure what happens for those cases if we add inbounds (miscompilation?), so I haven't added it here.
I guess we could add it if alignment is a constant and is less than the object size, but there might already be a pass to infer if a GEP is inbounds?

I'm pushing on this because these intrinsics are supposed to be better than hand-rolled variants
(and in their current form with a single non-inbounds GEP they already are infinitly better
than ptrtoint+inttoptr pair), but if we go with current design (return Physical pointer),
(which is less optimizer friendly than gep inbounds which would requre/produce Logical pointers),
we'll be stuck with it..

Since there is no such builtin currently (certainly not in clang,
i don't see one in GCC), we do get to dictate it's semantics.
We don't have to define it to be most lax/UB-free (ptrtoint+inttoptr or non-inbounds gep),
but we can define it to be most optimizer-friendly (gep inbounds).

It seems like we should be able to add inbounds to the GEP when aligning pointer values.

However, I think this means that for the downstream CHERI codegen there is one more case where we need to be careful (but that shouldn't matter for upstream).
For us uintptr_t is represented as i8 addrspace(200)*.
As uintptr_t can hold values that are plain integers and not pointers, we would need to add the inbounds only if the AST type is a pointer, and use a non-inbounds GEP for (u)intptr_t.
Does that sound correct or did I misunderstand something?

Note: Creating out-of-bounds pointers (capabilities) is fine for CHERI in hardware, they just can't be dereferenced without trapping. You can bring them back in bounds and then use them again (as long as they weren't *too* far out of bounds, since the compression scheme can't represent all possible values).

Adding @brooks in case there is a use case in the OS kernel that may need to create out-of-bounds pointers that I forgot about.

arichardson edited the summary of this revision. (Show Details)Jan 2 2020, 7:51 AM
arichardson added subscribers: jrtc27, bsdjhb.
jrtc27 added inline comments.Jan 2 2020, 8:01 AM
clang/docs/LanguageExtensions.rst
2844

This should clearly state whether an already-aligned argument is left unmodified or incremented/decremented by the alignment amount. As it stands it reads more like such arguments will be modified, but the implementation is to round, ie the former.

lebedev.ri requested changes to this revision.Jan 2 2020, 8:53 AM

(temporarily undoing my review since it looks like things may change)

...

Aha! :)

However, I think this means that for the downstream CHERI codegen
there is one more case where we need to be careful
(but that shouldn't matter for upstream).
For us uintptr_t is represented as i8 addrspace(200)*.
As uintptr_t can hold values that are plain integers and not pointers,
we would need to add the inbounds only if the AST type is a pointer,
and use a non-inbounds GEP for (u)intptr_t.

I may be missing some details, but won't we get that for free,
since we only do gep for pointers already?

Does that sound correct or did I misunderstand something?

To me this all sounds great.
Since we get to dictate^W define semantics, i'd think we can require

both input and output should be a Logical pointer.
If you want random values, you should be aligning uintptr_t instead.

This revision now requires changes to proceed.Jan 2 2020, 8:53 AM
arichardson marked 2 inline comments as done.
  • Generate inbounds GEP. Also mention that values must point to the same allocation in the documentation.
  • Clarify that correctly aligned values do not change.

(temporarily undoing my review since it looks like things may change)

...

Aha! :)

However, I think this means that for the downstream CHERI codegen
there is one more case where we need to be careful
(but that shouldn't matter for upstream).
For us uintptr_t is represented as i8 addrspace(200)*.
As uintptr_t can hold values that are plain integers and not pointers,
we would need to add the inbounds only if the AST type is a pointer,
and use a non-inbounds GEP for (u)intptr_t.

I may be missing some details, but won't we get that for free,
since we only do gep for pointers already?

I believe we would need something like this downstream instead of always creating an inbounds GEP:

if (E->getType()->isPointerType()) {
  // The result must point to the same underlying allocation. This means we
  // can use an inbounds GEP to enable better optimization.
  Result = Builder.CreateInBoundsGEP(EmitCastToVoidPtr(Args.Src),
                                     Difference, "aligned_result");
} else {
  // However, for when performing operations on __intcap_t (which is also
  // a pointer type in LLVM IR), we cannot set the inbounds flag as the
  // result could be either an arbitrary integer value or a valid pointer.
  // Setting the inbounds flag for the arbitrary integer case is not safe.
  assert(E->getType()->isIntCapType());
  Result = Builder.CreateGEP(EmitCastToVoidPtr(Args.Src), Difference,
                             "aligned_result");
}

Does that sound correct or did I misunderstand something?

To me this all sounds great.
Since we get to dictate^W define semantics, i'd think we can require

both input and output should be a Logical pointer.
If you want random values, you should be aligning uintptr_t instead.

Thank you very much for the useful suggestion!

  • Fix typos caused by dodgy n key on my keyboard.

Unit tests: pass. 61176 tests passed, 0 failed and 729 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Hmm, i keep coming up with things here, sorry :/

I think we should take one more step - explicitly call out that the result of __builtin_align_* is, well, aligned :)
For that, we need to add __attribute__((alloc_align(2))) attribute.
Despite the name, it does not imply anything about provenance/aliasing/etc, only about alignment:
https://godbolt.org/z/9bAjxK

clang/docs/LanguageExtensions.rst
2854–2855

Might be worth explicitly calling out C17 6.5.6p8, C++ [expr.add]?

clang/lib/CodeGen/CGBuiltin.cpp
14329–14330

I would suspect this should follow suit of the rest of clang codegen, i.e. do

Value* Base = EmitCastToVoidPtr(Args.Src);
if (CGF.getLangOpts().isSignedOverflowDefined())
  Result = Builder.CreateGEP(Base, Difference, "aligned_result");
else
  Result = CGF.EmitCheckedInBoundsGEP(Base, Difference,
                 /*SignedIndices=*/true, /*isSubtraction*/=false,
                 E->getExprLoc(), "aligned_result");

Unit tests: pass. 61176 tests passed, 0 failed and 729 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

For future reference, if anyone needs here's the C versions of these builtins:
https://godbolt.org/z/oHeAh-

^ looks like we are missing some backend-level folds for round-down variant, filed https://bugs.llvm.org/show_bug.cgi?id=44448

For future reference, if anyone needs here's the C versions of these builtins:
https://godbolt.org/z/oHeAh-

^ looks like we are missing some backend-level folds for round-down variant, filed https://bugs.llvm.org/show_bug.cgi?id=44448

Appled some backend codegen fixes (8dab0a4a7d691f2704f1079538e0ef29548db159, 3d492d7503d197246115eb38e7b1b61143d0c99f, 0727e2b90c7b11d5c6be55919c443628d8e2bc6e),
__builtin_align_down() codegen should now be as optimal as __builtin_align_up() is. (at least on X86)

nlopes added a comment.Jan 3 2020, 9:18 AM

@lebedev.ri I agree with you that the semantics of these alignment builtins should only return a pointer that is of the same object as the one given as input.
Otherwise, these builtins would be even worst that ptr2int/int2ptr, since their result could alias with any other pointer in the program, not just the escaped pointers.

I gave a glance through the patch, especially the documentation section, and the semantics look right to me.
Would be nice to see this using ptrmask on a subsequent patch.

  • Add a test that the new alignment is propagated to e.g. llvm.memcpy

To do this emit a llvm.assume for pointer types and mark the builtin
as having an alloc_align attribute (although that does not seem to do anything)

  • Update docs to mention C17 6.5.6p8, C++ [expr.add]
  • Use getLangOpts().isSignedOverflowDefined() in codegen.

Unit tests: pass. 61248 tests passed, 0 failed and 736 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

lebedev.ri accepted this revision.Jan 5 2020, 6:04 AM
lebedev.ri added a subscriber: erichkeane.

Nice. Thank you for working on this.
I think this finally fully looks good to me.

Not sure whether we should be really adding an attribute to AST,
or just calling EmitAlignmentAssumption() like it is done now.
I'll defer to @erichkeane on that.

This revision is now accepted and ready to land.Jan 5 2020, 6:04 AM

Land this? (branching is near..)

Land this? (branching is near..)

I was planning to wait until Sunday for @erichkeane to comment whether the AST attribute is needed and if I don't hear back until then land I would commit the current version. Does that seem sensible or is it too close to the branch point?

Otherwise I can commit it tomorrow morning and change it postcommit if necessary.
What do you think?

Sorry, I didn't realize you were waiting for me. This seems fine as it is, I don't see the attribute buying us anything.

This revision was automatically updated to reflect the committed changes.