This is an archive of the discontinued LLVM Phabricator instance.

[clang] Add builtin_nondeterministic_value
ClosedPublic

Authored by ManuelJBrito on Jan 23 2023, 10:48 AM.

Details

Summary

This patch adds a builtin that returns a non deterministic value of the same type as the argument.

Example:
int x = __builtin_nondeterministic_value(x);

Which is then lowered to freeze(poison).

This is a continuation of the work done in https://reviews.llvm.org/D136737, following the provided feedback.

Diff Detail

Event Timeline

ManuelJBrito created this revision.Jan 23 2023, 10:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 23 2023, 10:48 AM
ManuelJBrito requested review of this revision.Jan 23 2023, 10:48 AM
erichkeane added inline comments.Jan 23 2023, 11:04 AM
clang/include/clang/Basic/Builtins.def
658

Not a fan of the name in general, it doesn't really explain what is happening. Perhaps __builtin_nondeterministic_value or something?

I also wonder if others might have a better name :)

clang/lib/CodeGen/CGBuiltin.cpp
3064

Is this troublesome with some of the types that have a ConvertTypeForMem? That is, bool and bool vector types might need a different type/representation? https://clang.llvm.org/doxygen/CodeGenTypes_8cpp_source.html#l00091

clang/test/CodeGen/builtins-nondet.c
3 ↗(On Diff #491441)

Add some examples for 'bool' and vector-bool as well to make sure this is doing what you want.

shafik added inline comments.Jan 23 2023, 11:55 AM
clang/include/clang/Basic/Builtins.def
658

__builtin_indeterminate_value wdyt?

clang/test/CodeGen/builtins-nondet.c
3 ↗(On Diff #491441)

Does this also work for structs?

Be sure to add documentation for this to the language extensions page and mention the new builtin in the release notes.

clang/include/clang/Basic/Builtins.def
658

I think __builtin_unspecified_value() makes sense, as that's what you're getting back (there is a valid value, but you may get a different value on every call). I don't think indeterminate makes sense because that implies it may return trap values, and I'm guessing that's not behavior we want here (or am I wrong about that)?

Also, should this be usable within a constant expression? I would assume not; we should have a test case showing you can't use it in such a context.

658

Why does this require custom typechecking?

erichkeane added inline comments.Jan 23 2023, 1:18 PM
clang/include/clang/Basic/Builtins.def
658

Our issue last time with "unspecified" value was that this has a distinct meaning in C/C++, so we don't want to use that word.

ManuelJBrito added inline comments.Jan 24 2023, 6:56 AM
clang/include/clang/Basic/Builtins.def
658

I don't think indeterminate makes sense because that implies it may return trap values, and I'm guessing that's not behavior we want here (or am I wrong about that)?

The idea is for it to return some correct value of the same type as the argument, so no trap values.

Can we settle on __builtin_nondeterministic_value?

658

Why does this require custom typechecking?

I wasn't sure how to have the builtin take one argument of any type.

658

Also, should this be usable within a constant expression? I would assume not; we should have a test case showing you can't use it in such a context.

Can you shed a light on the implications of this? Or point me to some reference?

clang/lib/CodeGen/CGBuiltin.cpp
3064

The lowering seems to be as intended. I will add the tests.

clang/test/CodeGen/builtins-nondet.c
3 ↗(On Diff #491441)

Does this also work for structs?

No, i don't think so. What would be expected IR?

erichkeane added inline comments.Jan 24 2023, 6:58 AM
clang/test/CodeGen/builtins-nondet.c
3 ↗(On Diff #491441)

can freeze/posion not be of struct types? If they don't work, it should be an error to try.

Add tests with bool and bool-vector

Oops wrong diff, i'll update it.

aaron.ballman added inline comments.Jan 24 2023, 10:21 AM
clang/include/clang/Basic/Builtins.def
658

Can we settle on __builtin_nondeterministic_value?

I can live with it.

Our issue last time with "unspecified" value was that this has a distinct meaning in C/C++, so we don't want to use that word.

I guess I don't see the issue; the behavior of the builtin as I understand it matches that distinct meaning in C/C++ (and English prose).

Also, should this be usable within a constant expression? I would assume not; we should have a test case showing you can't use it in such a context.

Can you shed a light on the implications of this? Or point me to some reference?

It boils down to whether you think this should be valid:

constexpr bool hoo_boy() {
  return __builtin_nondeterministic_value() == 42;
}
constexpr bool val = hoo_boy();

(among other uses of constant expressions). My intuition is that there's nothing useful you could do with this builtin in a constant expression (my example is generally always going to be false except for the times when it's true, lol), but if all we're doing is giving you *a* value, nothing says we can't generate that value at compile time so it seems possible to support.

Why does this require custom typechecking?

I wasn't sure how to have the builtin take one argument of any type.

I was thinking it can use . to specify that the function is varargs, and then SemaChecking.cpp can validate that only one argument is passed to the call, but now that I think about that.... maybe we don't want that behavior because that would induce default argument promotions which would change the type (for example, from float to double). So on reflection, I think this is fine as-is.

ManuelJBrito retitled this revision from [clang] Add builtin_nondet to [clang] Add builtin_nondeterministic_value.
ManuelJBrito edited the summary of this revision. (Show Details)
  • Change name to __builtin_nondeterministic_value
  • Add struct support
erichkeane added inline comments.Feb 1 2023, 6:38 AM
clang/lib/CodeGen/CGBuiltin.cpp
3070

This gets REALLY complicated, you can't just create a store, this might end up hitting conversion operators/etc, and is subject to triviality/etc, and also probably needs to go through a constructor. I suspect you're going to prefer to just decide this isn't a valid builtin for structs instead of getting bogged down in that mess.

clang/lib/Sema/SemaChecking.cpp
17873

So I think choosing the sub-set of types here is a much better idea, allowing any of the user-defined types is going to make your codegen for this builtin a mess. IMO, just isBuiltinType + isVectorType is probably an acceptable list for now.

clang/test/CodeGen/builtins-nondeterministic-value.c
7

If you're going to allow this, we need tests for unions, constructors, non-trivial types, copy assignment/etc.

ManuelJBrito added inline comments.Feb 1 2023, 7:13 AM
clang/lib/CodeGen/CGBuiltin.cpp
3070

The motivation for this builtin was to match intel's definition of undefined in the lowering of intrinsics such as https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_castsi128_si256&expand=628&ig_expand=755, so struct support isn't critical.
So if everyone else agrees i'll drop struct support, do as you suggest and if there are use cases for nondeterministic values of other types add support for them later?

erichkeane added inline comments.Feb 1 2023, 7:20 AM
clang/lib/CodeGen/CGBuiltin.cpp
3070

I suspect I speak for all with my suggestion. I don't think Aaron/Shafik are concerned with struct support here.

aaron.ballman added inline comments.Feb 1 2023, 7:21 AM
clang/lib/CodeGen/CGBuiltin.cpp
3070

If we reject the code, we can relax that restriction in the future without breaking code, so I think it's fine to not support structures. We can add support for them later if we have a need to do so.

Please add a release note.

Thanks for this new builtin, as I was working on something very very similar - the implementation looks fine to me.

  • Restrict builtin to base types and vectors

Can the release note and documentation update be part of this patch or should i create a new one ?

Can the release note and documentation update be part of this patch or should i create a new one ?

They should be a part of this patch.

erichkeane added inline comments.Feb 1 2023, 10:00 AM
clang/lib/Sema/SemaChecking.cpp
17869

This should set the expression as invalid, shouldn't it? That way we don't end up depending on it too much.

  • set Invalid in type checking
  • add builtin documentation
  • add release note
erichkeane accepted this revision.Feb 1 2023, 11:35 AM

Last set of comments, docs and release notes look fine to me. feel free to fix the last comments while committing.

clang/lib/Sema/SemaChecking.cpp
17869
17871

Diag always returns true for exactly this situation, omit the {} and return true;, and just return Diag.

This revision is now accepted and ready to land.Feb 1 2023, 11:35 AM
This revision was landed with ongoing or failed builds.Feb 2 2023, 3:14 AM
This revision was automatically updated to reflect the committed changes.

Thank you all for the reviews and helping me see this patch through.

So this is getting some build failures:

I suspect it's because of the way different ABIs handle the vectors. I was thinking that if we have the variable be a local variable and also not returning the value can fix the tests.

Does this make sense?

thakis added a subscriber: thakis.Feb 2 2023, 5:11 AM

Looks like the test fails on win: http://45.33.8.238/win/74290/step_7.txt

And Mac: http://45.33.8.238/macm1/54080/step_7.txt

Please take a look and revert for now if it takes a while to fix.

So this is getting some build failures:

I suspect it's because of the way different ABIs handle the vectors. I was thinking that if we have the variable be a local variable and also not returning the value can fix the tests.

Does this make sense?

Yep, thats what it looks like, ABI differences. Feel free to update this review with the new test now that you've been reverted, and I'll look to see if it seems sufficient.

Fix vector tests

ManuelJBrito reopened this revision.Feb 2 2023, 9:30 AM
This revision is now accepted and ready to land.Feb 2 2023, 9:30 AM

Other than the newline, the test change is acceptable to me. Commit when ready :)

clang/test/CodeGen/builtins-nondeterministic-value.c
61

We do need a newline here.

This revision was landed with ongoing or failed builds.Feb 3 2023, 1:48 AM
This revision was automatically updated to reflect the committed changes.
zixuan-wu added inline comments.
clang/test/CodeGen/builtins-nondeterministic-value.c
26

hi, @ManuelJBrito , because double is 4 alignment in CSKY target, could you please update this with capture match pattern which makes it more adaptable?

ManuelJBrito added inline comments.Jun 9 2023, 8:57 AM
clang/test/CodeGen/builtins-nondeterministic-value.c
26

Hi, I'll make a patch dropping the alignment since it's not relevant for what we are testing here.

zixuan-wu added inline comments.Jun 11 2023, 7:18 PM
clang/test/CodeGen/builtins-nondeterministic-value.c
26

Thank you!

rsmith added a subscriber: rsmith.Jun 11 2023, 7:39 PM

I have a concern with the name of this builtin. People are going to assume it produces a nondeterministic value, and use it for seeding random number generators and similar, and will be surprised when the value produced is actually deterministic, and, worse, might leak information on the stack that was left behind by some previous function call. Can we give this a name that doesn't suggest that it produces its value nondeterministically?

I have a concern with the name of this builtin. People are going to assume it produces a nondeterministic value, and use it for seeding random number generators and similar, and will be surprised when the value produced is actually deterministic, and, worse, might leak information on the stack that was left behind by some previous function call. Can we give this a name that doesn't suggest that it produces its value nondeterministically?

That's a fair point; do you have a suggestion for a name? (I had suggested __builtin_unspecified_value() but perhaps that is too focused on the term of art.)

xbolva00 added a comment.EditedJun 12 2023, 10:28 AM

https://reviews.llvm.org/D136737 used 'builtin_unspecified_value' which could be okay. .. or builtin_undefined_value maybe?

Are we sure we want to expose this intrinsic as it can be easily misused (as mentioned, as random generator) and/or cause security issues? Maybe only allow it in system headers (original motivation in https://reviews.llvm.org/D136737)

xbolva00 added inline comments.Jun 12 2023, 10:29 AM
clang/docs/LanguageExtensions.rst
3087

This text really sells this new builtin as a random value generator..

I have a concern with the name of this builtin. People are going to assume it produces a nondeterministic value, and use it for seeding random number generators and similar, and will be surprised when the value produced is actually deterministic, and, worse, might leak information on the stack that was left behind by some previous function call. Can we give this a name that doesn't suggest that it produces its value nondeterministically?

That's a fair point; do you have a suggestion for a name? (I had suggested __builtin_unspecified_value() but perhaps that is too focused on the term of art.)

If this means exactly what C++ means by "unspecified value" I'm OK with that, but I don't think it does with the current design; __builtin_???(bool) can seemingly return std::bit_cast<bool>('x'), which is not an unspecified value of type bool in C++-land (because it's not a value of type bool at all). On that subject: can we restrict this to types where every object representation is valid, and rule out bool and (at least some) enumerations, as well as any _BitInt(N)s with no unused bits? Either that, or mask off the "bad" bits so that it actually produces valid values? Allowing type arguments for which any use of the builtin is just immediately UB doesn't seem like a good idea.

I don't like __builtin_undefined_value because very commonly "undefined" is used to mean something else, in ways that show up in close proximity to this builtin.

__builtin_arbitrary_value works a bit better for me, but is still quite wrong / misleading, because the value is *not* arbitrary. With either that or __builtin_nondeterminstic_value, I worry that an old and famous crypto bug will be reintroduced:

int my_random_number = __builtin_nondeterministic_value();
my_random_number ^= rand();
my_random_number ^= other_junk;

... where I would expect and hope that LLVM will optimize the resulting value of my_random_number to 0 by picking the exact right "nondeterminstic" value every time :)

I think __builtin_any_value works pretty well, and emphasizes that this can really return any value. I'd also be OK with __builtin_convenient_value, to emphasize that the compiler will pick something that's convenient for it, but I prefer __builtin_any_value.

clang/docs/LanguageExtensions.rst
3087

How about something more like this:

`__builtin_any_value` returns an arbitrary value of the same type as the provided argument.

[...]

Description:

Each call to `__builtin_any_value` returns an arbitrary value of the type given by the argument. The returned value will be chosen in a way that is convenient for the compiler, and may for example be a value left behind in a register or on the stack from a previous operation, or a value that causes a later calculation to always produce the same result. This function will not necessarily return a meaningful value if there are bit-patterns for the given type that do not correspond to meaningful values.

(I really want this to also not say it will produce a "valid value", because the valid values for a type are not in general known to the frontend. For a pointer, for example, this can return literally any bit pattern, and most of those will not be valid in any meaningful sense.)

I think __builtin_any_value works pretty well, and emphasizes that this can really return any value. I'd also be OK with __builtin_convenient_value, to emphasize that the compiler will pick something that's convenient for it, but I prefer __builtin_any_value.

I can get behind __builtin_any_value as the name.

@ManuelJBrito: would you be willing to make these modifications?

clang/docs/LanguageExtensions.rst
3087

I like this new formulation -- it makes it more clear that you can get a trap representation for the value, for example.

I think __builtin_any_value works pretty well, and emphasizes that this can really return any value. I'd also be OK with __builtin_convenient_value, to emphasize that the compiler will pick something that's convenient for it, but I prefer __builtin_any_value.

I can get behind __builtin_any_value as the name.

@ManuelJBrito: would you be willing to make these modifications?

Yes, I can change the name to __builtin_any_value .
Where do we stand in terms of further restricting the types for which we can use this? I'm a bit out of my depth in regards to all the subtleties.