This is an archive of the discontinued LLVM Phabricator instance.

[Builtin] Add __builtin_zero_non_value_bits.
Needs ReviewPublic

Authored by zoecarver on Sep 19 2020, 1:25 PM.

Details

Summary

Adds __builtin_zero_non_value_bits to zero all padding bits of a struct. This builtin should match the behavior of those in NVCC and GCC (and MSVC?). There are some tests in this patch but hopefully we'll also get tests from other compilers (so all builtins can be as similar as possible).

I'm planning to add support for unions, bitfields (both as members and members of sub-objects), and booleans as follow up patches.

Diff Detail

Event Timeline

zoecarver created this revision.Sep 19 2020, 1:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 19 2020, 1:25 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
zoecarver requested review of this revision.Sep 19 2020, 1:25 PM
zoecarver edited the summary of this revision. (Show Details)Sep 19 2020, 1:32 PM
zoecarver added reviewers: jfb, rsmith, Bigcheese, __simt__.
jfb added inline comments.Sep 19 2020, 4:14 PM
clang/lib/CodeGen/CGBuiltin.cpp
1682

Typo in "fields".

clang/test/CodeGenCXX/builtin-zero-non-value-bits.cpp
161

Usually CodeGen tests will use lit to check the emitted IR matches expectations. I think that's what you want to do here.

Remember to test volatile qualified pointers, as well as address spaces too.

clang/test/SemaCXX/builtin-zero-non-value-bits.cpp
12

You should also check incomplete types, vector, variable width integers, const qualified.

zoecarver added inline comments.Sep 20 2020, 1:35 PM
clang/test/CodeGenCXX/builtin-zero-non-value-bits.cpp
161

What's a good place for me to put this end-to-end test?

zoecarver retitled this revision from Summary: [Builtin] Add __builtin_zero_non_value_bits. to [Builtin] Add __builtin_zero_non_value_bits..Sep 22 2020, 9:32 AM
zoecarver updated this revision to Diff 293606.Sep 22 2020, 5:46 PM
zoecarver edited the summary of this revision. (Show Details)
  • Add more test cases.
  • Fix typo.
  • Add codegen tests.
zoecarver marked 2 inline comments as done and an inline comment as not done.Sep 22 2020, 5:48 PM
jfb added inline comments.Sep 23 2020, 10:10 AM
clang/lib/CodeGen/CGBuiltin.cpp
1647

I'd like to hear @rsmith's thoughts on this approach, in particular w.r.t. aliasing concerns. I also wonder if below the GEPs should be inbounds, depending on how they're created.

1652

You should use alignmentAtOffset here.

clang/test/CodeGenCXX/builtin-zero-non-value-bits-codegen.cpp
17

It would be helpful to have a comment with the final layout of the struct, including padding. Give each padding field a name, and reference them in the IR check below.

27

It would help read the tests if you had a comment on top of each store, for example here "padding byte X".

47

It would be useful to see a test for arrays with a type that contains tail padding.

clang/test/CodeGenCXX/builtin-zero-non-value-bits.cpp
161

I'm not sure, I don't usually add this type of test :)

zoecarver marked 3 inline comments as done.Sep 27 2020, 10:29 PM
zoecarver added inline comments.
clang/test/CodeGenCXX/builtin-zero-non-value-bits-codegen.cpp
17

Done. I've named each padding field as "PAD_X" so below it should be clear what fields are being stored without a comment.

27

See above comment.

47

Hmm, this test case doesn't seem to be working. I'll investigate further.

clang/test/CodeGenCXX/builtin-zero-non-value-bits.cpp
161

Even if it's a bit unconventional, I think it would be good to have this type of test. I think we should try to cover as many test cases as possible because it's important that this builtin both doesn't zero non-padding bits and does zero all padding bits. And it wouldn't be practical to add the 100+ test cases covered here as codegen tests.

zoecarver marked 3 inline comments as done.
  • Add UnsizedTail codegen test
zoecarver marked an inline comment as done.Sep 30 2020, 5:57 PM
zoecarver added inline comments.
clang/test/CodeGenCXX/builtin-zero-non-value-bits-codegen.cpp
47

OK, I've added that. Just to clarify, you mean a type that contains a constant array type of types with tail padding (i.e., Bar [2])?

zoecarver updated this revision to Diff 295445.Sep 30 2020, 5:58 PM
  • Support constant arrays
  • Format changes with clang-format
zoecarver added inline comments.Sep 30 2020, 6:00 PM
clang/lib/CodeGen/CGBuiltin.cpp
1652

I'm using CharUnits::One().alignmentAtOffset to here because this type will always have a size of 1 because it's an i8 ptr.

zoecarver added inline comments.Sep 30 2020, 6:08 PM
clang/lib/CodeGen/CGBuiltin.cpp
1704

Is it OK to possibly create hundreds of stores here? I assume later optimizations will catch this and turn it into a loop or a call to memset or something. But this could potentially be harmful to code size.

@jfb (and others), friendly ping. Are there any other changes (especially to the logic) that you'd like me to make?

clang/lib/CodeGen/CGBuiltin.cpp
1647

Hmm, I don't know much about it but, I think these could be inbound. Because we will never actually go beyond the size of the llvm type.

clang/test/CodeGenCXX/builtin-zero-non-value-bits-codegen.cpp
47

Let me know if there are more codegen tests you'd like me to add. Happy to add them. (Maybe one for vector types, or volatile, or one that doesn't have an non-value bits?)

jwakely added a subscriber: jwakely.EditedNov 20 2020, 7:02 AM

As of a few hours ago, GCC has __builtin_clear_padding, see https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fclear_005fpadding for the docs.

As of a few hours ago, GCC has __builtin_clear_padding, see https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fclear_005fpadding for the docs.

Great. Then I'll update this to be named __builtin_clear_padding.

jfb added a comment.Nov 20 2020, 1:49 PM

Let's make sure that we follow the same semantics that GCC does, particularly w.r.t. union, bitfields, and padding at the end of an object (whether it's in an array or not).

Let's make sure that we follow the same semantics that GCC does, particularly w.r.t. union, bitfields, and padding at the end of an object (whether it's in an array or not).

Agreed. I'm planning to run some tests tomorrow once the nightly build has updated.

I've resumed looking at the library code.

How should I check for support? Is it going to be e.g. __has_feature(__builtin_clear_padding)?

How should I check for support? Is it going to be e.g. has_feature(builtin_clear_padding)?

I'm not sure __has_feature will work but __has_builtin should work on both Clang and GCC.

@jwakely It looks like UnsizedTail causes a crash.

Other than that all the tests in this PR pass. It also looks like there's (at least) some support for unions and bitfields. This patch doesn't support those but I'm planning to add support as a follow-up.

@jwakely It looks like UnsizedTail causes a crash.

Filed https://gcc.gnu.org/PR97943 for that. Avoiding the crash is trivial, deciding what we want to do exactly for flexible array members (which are beyond the C++ standard) is harder.

Filed https://gcc.gnu.org/PR97943 for that. Avoiding the crash is trivial, deciding what we want to do exactly for flexible array members (which are beyond the C++ standard) is harder.

Yes, and we should try to do the same thing in this case. Currently, this implementation clears any padding bits that might come before the flexible array member but doesn't attempt to clear any of the array's padding bits (which I'm pretty sure wouldn't be feasible). So, what we're really deciding is whether or not to error for these types.

So, it looks like GCC already uses __builtin_clear_padding and MSVC already uses __builtin_zero_non_value_bits. This patch (obviously) is currently implementing __builtin_zero_non_value_bits but, I had planned to update it to use __builtin_clear_padding. Maybe that's not the best course of action, though.

We should all try to agree on _one_ name. CC @BillyONeal @jwakely thoughts?

So, it looks like GCC already uses __builtin_clear_padding and MSVC already uses __builtin_zero_non_value_bits. This patch (obviously) is currently implementing __builtin_zero_non_value_bits but, I had planned to update it to use __builtin_clear_padding. Maybe that's not the best course of action, though.

We should all try to agree on _one_ name. CC @BillyONeal @jwakely thoughts?

The name MSFT is already shipping in production is __builtin_zero_non_value_bits. If gcc is already shipping another name in production I think clang is stuck supporting both names, if gcc has not yet shipped their implementation perhaps we can choose one. That seems to be more on gcc than it is on clang given clang's desire to be more or less a drop in replacement for either gcc or msvc.

The MSFT STL implementation can of course use a different builtin when we detect clang.

The name MSFT is already shipping in production is __builtin_zero_non_value_bits. If gcc is already shipping another name in production I think clang is stuck supporting both names, if gcc has not yet shipped their implementation perhaps we can choose one. That seems to be more on gcc than it is on clang given clang's desire to be more or less a drop in replacement for either gcc or msvc.

Are they actually the same, with the same handling of corner cases like unions and tail padding?

There's more to this than just the name, and if they aren't the same, it seems better to have two names.

Is there a specification for __builtin_zero_non_value_bits available somewhere?

Are they actually the same, with the same handling of corner cases like unions and tail padding?
There's more to this than just the name, and if they aren't the same, it seems better to have two names.

They are both implementing the same C++ feature, with the same desired semantics of zeroing out any bits in the object representation that aren't in the value representation. If they differ, one or the other would have a bug.

Is there a specification for __builtin_zero_non_value_bits available somewhere?

I don't know if there is a formal spec for it beyond the actual C++ standard.

Are they actually the same, with the same handling of corner cases like unions and tail padding?
There's more to this than just the name, and if they aren't the same, it seems better to have two names.

They are both implementing the same C++ feature, with the same desired semantics of zeroing out any bits in the object representation that aren't in the value representation. If they differ, one or the other would have a bug.

I agree, they either need to be identical (including corner cases) or there needs to be two of them (i.e., GCC ships both __builtin_zero_non_value_bits and __builtin_clear_padding and the first is the same as MSVC, Clang, and NVCC).

Is there a specification for __builtin_zero_non_value_bits available somewhere?

I don't know if there is a formal spec for it beyond the actual C++ standard.

I think P0528 is the relevant paper but other than that, no, there's not a spec. I think that's going to be the most time sensitive part of implementing this: coming up with the spec and making sure all the tests pass on all the implementations.

I think Jonathan is asking whether there is a match in the gray areas.

The two cases people bring up most:

  1. Unions, where the padding overlaps for all the possible active members.
  2. Tail padding, up to the allocator granularity / alignment size.

If the implementation-specific builtins don't match on these, then maybe they should have different names, is his argument I think.

If the implementation-specific builtins don't match on these, then maybe they should have different names, is his argument I think.

That's a fair point. And I agree, if they don't match, maybe it would be best to have different names. I'm hoping that we can all agree on how to handle these gray areas, though.

Are they actually the same, with the same handling of corner cases like unions and tail padding?
There's more to this than just the name, and if they aren't the same, it seems better to have two names.

They are both implementing the same C++ feature, with the same desired semantics of zeroing out any bits in the object representation that aren't in the value representation. If they differ, one or the other would have a bug.

Do they support non-trivially copyable types? That isn't required for the atomic compare exchange feature, but is relevant for a feature exposed to users. What about extensions like zero-sized arrays or C99 flexible array members?

I agree, they either need to be identical (including corner cases) or there needs to be two of them (i.e., GCC ships both __builtin_zero_non_value_bits and __builtin_clear_padding and the first is the same as MSVC, Clang, and NVCC).

GCC doesn't need to support both. It only works with libstdc++ so it only needs to support the one used by libstdc++ (although there is a patch to add -stdlib=libc++ to GCC).

If libstdc++ uses __has_builtin to check what the compiler supports then Clang doesn't even need to support GCC's built-in, because libstdc++ wouldn't use it if not supported (and could use __builtin_zero_non_value_bits instead when supported).

The Intel compiler would need to support both though.

Is there a specification for __builtin_zero_non_value_bits available somewhere?

I don't know if there is a formal spec for it beyond the actual C++ standard.

I think P0528 is the relevant paper but other than that, no, there's not a spec. I think that's going to be the most time sensitive part of implementing this: coming up with the spec and making sure all the tests pass on all the implementations.

GCC has publicly available documentation describing its built-in, and publicly available tests for it. That's the kind of spec I'm looking for.

Are they actually the same, with the same handling of corner cases like unions and tail padding?
There's more to this than just the name, and if they aren't the same, it seems better to have two names.

They are both implementing the same C++ feature, with the same desired semantics of zeroing out any bits in the object representation that aren't in the value representation. If they differ, one or the other would have a bug.

Do they support non-trivially copyable types? That isn't required for the atomic compare exchange feature, but is relevant for a feature exposed to users. What about extensions like zero-sized arrays or C99 flexible array members?

As far as MSVC is concerned this isn't "exposed to users".

What about extensions like zero-sized arrays or C99 flexible array members?

We don't have those extensions at all so it's irrelevant to talk about what the builtin would do with them in our case. (And at such time we would add such extensions presumably we would match gcc's behavior since again, there's no reason for the behavior to differ here)

I agree, they either need to be identical (including corner cases) or there needs to be two of them (i.e., GCC ships both __builtin_zero_non_value_bits and __builtin_clear_padding and the first is the same as MSVC, Clang, and NVCC).

GCC doesn't need to support both. It only works with libstdc++ so it only needs to support the one used by libstdc++ (although there is a patch to add -stdlib=libc++ to GCC).

If libstdc++ uses __has_builtin to check what the compiler supports then Clang doesn't even need to support GCC's built-in, because libstdc++ wouldn't use it if not supported (and could use __builtin_zero_non_value_bits instead when supported).

The Intel compiler would need to support both though.

Is there a specification for __builtin_zero_non_value_bits available somewhere?

I don't know if there is a formal spec for it beyond the actual C++ standard.

I think P0528 is the relevant paper but other than that, no, there's not a spec. I think that's going to be the most time sensitive part of implementing this: coming up with the spec and making sure all the tests pass on all the implementations.

GCC has publicly available documentation describing its built-in, and publicly available tests for it. That's the kind of spec I'm looking for.

We don't consider it "publicly available" so there isn't going to be that kind of documentation for it. I don't see a serious problem with the gcc version of that builtin supporting a superset of the functionality of the equivalent msvc builtin.

Of course if it's already publicly documented for you the horse has presumably already left the barn which makes the discussion moot?

Of course if it's already publicly documented for you the horse has presumably already left the barn which makes the discussion moot?

It's not in a shipping release yet. But the point of documenting such built-ins partly so that other compiler implementors (and vendors of tools such as static analyzers) know what they're trying to be consistent with.

After following up with the team responsible for __builtin_zero_non_value_bits on MSVC, I have some more information to offer in conjunction with @BillyONeal's report:

  • For unions, we always assume that it has unique object representations and thus does not have any padding bytes. This allows __builtin_zero_non_value_bits to be used with types with union members, not always accurately but never destructively. (If the union has padding bytes, we don't know which member is active so we don't know exactly where the padding bytes are. To return padding bytes assuming any member risks changing the value.)
  • Our implementation does appear to include tail padding and this would include any such padding that is due to alignment requirements, etc.
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2022, 12:20 PM