This is an archive of the discontinued LLVM Phabricator instance.

[clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 5)
Needs ReviewPublic

Authored by jmciver on Sep 21 2022, 10:41 PM.

Details

Summary

This patch adds noundef metadata to scalar load instructions and is intended to gain implementation feedback before proceeding to adding noundef to other load types.

The intent is to apply noundef to scalar load instructions that are not of type:

  • char (if the target system maps to unsigned char)
  • unsigned char
  • std::byte

These types are excluded because of their different indeterminate value semantics (I think this is correct, but need to ask someone).

Feedback is greatly appreciated.

Diff Detail

Event Timeline

jmciver created this revision.Sep 21 2022, 10:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 21 2022, 10:41 PM
jmciver retitled this revision from [clang][CodeGen] Add noundef metadata to scalar load instructions to [clang][CodeGen] Add noundef metadata to scalar load instructions (preliminary).Sep 21 2022, 11:31 PM
jmciver edited the summary of this revision. (Show Details)
jmciver retitled this revision from [clang][CodeGen] Add noundef metadata to scalar load instructions (preliminary) to [clang][CodeGen] Add noundef metadata to load instructions (preliminary).Sep 21 2022, 11:35 PM
jmciver edited the summary of this revision. (Show Details)Sep 21 2022, 11:37 PM
jmciver published this revision for review.Sep 22 2022, 8:44 AM
jmciver added reviewers: nikic, efriedma, aqjune, rjmccall.
nlopes added a subscriber: nlopes.Sep 22 2022, 8:51 AM

How much code does this break? How much does this help performance? (My instinct here is that the tradeoff is not worth it, but I'm willing to be convinced otherwise.)

The regression and test-suite pass using a bootstrap build. I'll provide performance data tomorrow.

The following are results from test-suite execution. LHS is a main build (17dde371e773) and the RHS is main with patch applied. The results-subset.txt is absent of unit, micro, and torture tests. Execution runtime is comprised from three runs of the main and patched version combined using the compare.py vs argument (compare the minimums of the two sets). Compilation times are acquired from a single invocation of each type of build. I will gather more compile time runs this weekend.

vitalybuka added a comment.EditedSep 26 2022, 9:01 PM

Are these patches uploaded with arc tool?

I tried to hook this patch into MemorySanitizer and it reduces instrumented code by ~30% !

I tried to hook this patch into MemorySanitizer and it reduces instrumented code by ~30% !

Wow, amazing results!

vitalybuka added a comment.EditedSep 26 2022, 11:39 PM

I tried to hook this patch into MemorySanitizer and it reduces instrumented code by ~30% !

Wow, amazing results!

Also I applied this patch with D134698 and used on our large test set, expecting significant number of pre-existing reports. To my surprise I see not many of them.

Also I applied this patch with D134698 and used on our large test set, expecting significant number of pre-existing reports. To my surprise I see not many of them.

Something wrong with my msan experiment, I'll re-evaluate size and reports tomorrow.

Are these patches uploaded with arc tool?

Yes, the patches were uploaded with arc. The size of the patch was too large to use the web interface.

Decreased CPU loading during test-suite build did lower times in some instances, but not a lot.

jmciver updated this revision to Diff 467690.Oct 13 2022, 11:20 PM
jmciver edited the summary of this revision. (Show Details)

Updating D134410: [clang][CodeGen] Add noundef metadata to load instructions (preliminary)

Resolve merge conflicts due to the adoption of the ptr type in openmp tests.

vitalybuka added a reviewer: vitalybuka.EditedOct 15 2022, 1:20 PM

Also I applied this patch with D134698 and used on our large test set, expecting significant number of pre-existing reports. To my surprise I see not many of them.

Something wrong with my msan experiment, I'll re-evaluate size and reports tomorrow.

Finally I had a time to fix my msan experiment D134698
It's about 5.5% .text savings for Msan, and 10% for msan with "track origins", which is still pretty good.
For context, for msan it's usually cheaper to report uninitialized ASAP, than propagating and reporting it later. With this metadata it will happen immediately after load.

However cleanup looks scary. Msan reports maybe 20% of unique tests on our code base. Many a of them share root cause, but still many unique root causes.
On quick looks I see no false report. A lot of stuff like this https://stackoverflow.com/questions/60112841/copying-structs-with-uninitialized-members which is technically is UB.
I assume with this patch landed, many such cases may change code behavior. So we will need to update msan to have a tool to detect cases like this anyway.

I assume with this patch landed, many such cases may change code behavior. So we will need to update msan to have a tool to detect cases like this anyway.

I believe that updated msan should come first.

I assume with this patch landed, many such cases may change code behavior. So we will need to update msan to have a tool to detect cases like this anyway.

I believe that updated msan should come first.

Yes, we can do that.

clang/lib/CodeGen/CGExpr.cpp
1761

Taking into account potantial risks from pre-existing UB, I believe we need clang switch to be able to shutdown this branch, similar to enable_noundef_analysis.
User with large code bases maybe need some time for transition.

I have no opinion if this should be ON or OFF by default

jmciver added inline comments.Oct 18 2022, 2:19 PM
clang/lib/CodeGen/CGExpr.cpp
1761

A flag is a good idea; I'll go ahead and add that. We can determine the default later, for now I would like to default to on to see/show the effects. Anyone feel free to let me know if you have reasoning otherwise.

For the flag name: enable_noundef_loads?

jdoerfert added a comment.EditedOct 18 2022, 3:37 PM

However cleanup looks scary. Msan reports maybe 20% of unique tests on our code base. Many a of them share root cause, but still many unique root causes.
On quick looks I see no false report

While scary, this also is extremely encouraging to move forward with this.

jmciver updated this revision to Diff 471708.Oct 28 2022, 8:43 PM

Updating D134410: [clang][CodeGen] Add noundef metadata to load instructions (preliminary)

Add flag -enable-noundef-load-analysis and rebase with main. Refactored noundef
metadata functionality into anonymous function to be used by additional patches.

I'll start fixing reported test failures tomorrow.

jmciver retitled this revision from [clang][CodeGen] Add noundef metadata to load instructions (preliminary) to [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 2).Oct 28 2022, 10:11 PM
nikic added a comment.Oct 29 2022, 1:25 AM

I think adding this under a default-disabled flag is fine for evaluation purposes, but I have doubts that we will ever be able to enable this by default. There is a lot of code out there assuming that copying uninitialized data around is fine.

I think adding this under a default-disabled flag is fine for evaluation purposes, but I have doubts that we will ever be able to enable this by default. There is a lot of code out there assuming that copying uninitialized data around is fine.

Well, I think that flags that are disabled by default don't exist and they are not useful.
We wanted this patch to make us switch uninitialized loads to poison at will, since they become UB. In practice, this helps us fixing bugs in SROA and etc without perf degradation.
As long as ubsan/valgrind can detect these uninitialized loads, I think we should be ok to deploy this change. We are touching memcpys yet, at least, as those may be susceptible to handling uninit memory.

We wanted this patch to make us switch uninitialized loads to poison at will, since they become UB. In practice, this helps us fixing bugs in SROA and etc without perf degradation.

Can you elaborate on this? I don't see how this is necessary for switching uninitialized loads to poison.

As long as ubsan/valgrind can detect these uninitialized loads, I think we should be ok to deploy this change.

Valgrind cannot detect uninitialized loads, it only detects branching on uninitialized data. Valgrind works on the assembly level, and as such does not have the necessary information to tell whether a mov of uninitialized data is problematic or not.

The only sanitizer that can detect this is msan (with the patch referenced above), which is incidentally also the sanitizer that sees the least use in practice, because it is much harder to deploy than ubsan and asan.

tschuett added inline comments.
clang/lib/CodeGen/CGExpr.cpp
676

Nit: You meant static.

Almost all flags are off by default. That's why they are called flags, i.e., -Weverything.

jmciver updated this revision to Diff 471878.Oct 30 2022, 2:42 PM

Updating D134410: [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 2)

Fix AMD-GPU, ARM, PowerPC, and new OpenMP tests.

We wanted this patch to make us switch uninitialized loads to poison at will, since they become UB. In practice, this helps us fixing bugs in SROA and etc without perf degradation.

Can you elaborate on this? I don't see how this is necessary for switching uninitialized loads to poison.

It's not mandatory, it's a simple way of achieving it as !noundef already exists.

We cannot change the default behavior of load as it would break BC. An alternative is to introduce a new !poison_on_unint for loads. Clang could use that on all loads except those for bit-fields.
Our suggestion is to jump straight to !noundef.
To fully remove undef, then we need to swtich loads to return freeze poison rather than undef on uninitialized access, but only after we are able to yield poison in the common case.

jmciver updated this revision to Diff 472051.Oct 31 2022, 10:06 AM

Updating D134410: [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 2)

Refactor local linkage function, applyNoundefToLoadInst, to use static rather
than an anonymous namespace as per LLVM coding standards.

jmciver added inline comments.Oct 31 2022, 10:14 AM
clang/lib/CodeGen/CGExpr.cpp
676

Thank you for bringing that to my attention. I reviewed the LLVM Coding Standards guidance on the topic.

nikic added a comment.Nov 1 2022, 2:05 AM

We wanted this patch to make us switch uninitialized loads to poison at will, since they become UB. In practice, this helps us fixing bugs in SROA and etc without perf degradation.

Can you elaborate on this? I don't see how this is necessary for switching uninitialized loads to poison.

It's not mandatory, it's a simple way of achieving it as !noundef already exists.

We cannot change the default behavior of load as it would break BC.

FWIW, I don't agree with this. It's fine to change load semantics, as long as bitcode autoupgrade is handled correctly. I'd go even further and say that at least long term, any solution that does not have a plain load instruction return poison for uninitialized memory will be a maintenance mess.

I think the core problem that prevents us from moving in this direction is that we have no way to represent a frozen load at all (as freeze(load) will propagate poison before freezing). If I understand correctly, you're trying to solve this by making *all* loads frozen loads, and use load !noundef to allow dropping the freeze. I think this would be a very bad outcome, especially as we're going to lose that !noundef on most load speculations, and I don't think we want to be introducing freezes when speculating loads (e.g. during LICM).

I expect that the path of least resistance is going to be to support bitwise poison for load/store/phi/select and promote it to full-value poison on any other operation, allowing a freezing load to be expressed as freeze(load).

Please let me know if I completely misunderstood the scheme you have in mind...

nlopes added a comment.Nov 1 2022, 2:54 AM

We wanted this patch to make us switch uninitialized loads to poison at will, since they become UB. In practice, this helps us fixing bugs in SROA and etc without perf degradation.

Can you elaborate on this? I don't see how this is necessary for switching uninitialized loads to poison.

It's not mandatory, it's a simple way of achieving it as !noundef already exists.

We cannot change the default behavior of load as it would break BC.

FWIW, I don't agree with this. It's fine to change load semantics, as long as bitcode autoupgrade is handled correctly. I'd go even further and say that at least long term, any solution that does not have a plain load instruction return poison for uninitialized memory will be a maintenance mess.

I think the core problem that prevents us from moving in this direction is that we have no way to represent a frozen load at all (as freeze(load) will propagate poison before freezing). If I understand correctly, you're trying to solve this by making *all* loads frozen loads, and use load !noundef to allow dropping the freeze. I think this would be a very bad outcome, especially as we're going to lose that !noundef on most load speculations, and I don't think we want to be introducing freezes when speculating loads (e.g. during LICM).

I expect that the path of least resistance is going to be to support bitwise poison for load/store/phi/select and promote it to full-value poison on any other operation, allowing a freezing load to be expressed as freeze(load).

Please let me know if I completely misunderstood the scheme you have in mind...

You got it right. But the load we propose is not exactly a freezing load. It only returns freeze poison for uninit memory. It doesn't freeze stored values.
If we introduce a !uninit_is_poison flag, we can drop !noundef when hoisting and add !uninit_is_poison instead (it's implied by !noundef).

The question is what's the default for uninit memory: freeze poison or poison? But I think we need both. And since we need both (unless we add a freezing load or bitwise poison), why not keep a behavior closer to the current?
A freezing load is worse as a store+load needs to be forwarded though a freeze, as the load is not a NOP anymore.

Bitwise poison would be nice, but I don't know how to make it work. To make it work with bit-fields, we would need and/or to propagate poison bit-wise as well. But then we will break optimizations that transform between those and arithmetic. Then you start wanting that add/mul/etc also propagate poison bit-wise and then I don't know how to specify that semantics. (if you want, I can forward you a proposal we wrote a few years ago, but we never managed to make sound, so it was never presented in public)

I agree that bit-wise poison for loads sounds appealing (for bit fields, load widening, etc), but without support in the rest of the IR, it's not worth much. If it becomes value-wise in the 1st operation, then I don't think we gain any expressivity.

jmciver updated this revision to Diff 478821.Nov 29 2022, 11:40 PM

Updating D134410: [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 2)

Refactor test matrix-type-operators.c to contain the noundef attribute. This
test will be further modified in future patches.

jmciver retitled this revision from [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 2) to [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 3).Nov 30 2022, 12:20 AM
jmciver updated this revision to Diff 480265.Dec 5 2022, 3:44 PM

Updating D134410: [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 3)

The following tests have been updated:

  • avx512f-builtins.c
  • avx512fp16-builtins.c
  • matrix-type-operators.c
  • matrix-type-operators.cpp
jmciver retitled this revision from [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 3) to [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 4).Dec 6 2022, 8:01 AM
jmciver retitled this revision from [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 4) to [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 5).Dec 6 2022, 10:49 PM
nikic added a comment.Dec 9 2022, 12:40 AM

FWIW, I don't agree with this. It's fine to change load semantics, as long as bitcode autoupgrade is handled correctly. I'd go even further and say that at least long term, any solution that does not have a plain load instruction return poison for uninitialized memory will be a maintenance mess.

I think the core problem that prevents us from moving in this direction is that we have no way to represent a frozen load at all (as freeze(load) will propagate poison before freezing). If I understand correctly, you're trying to solve this by making *all* loads frozen loads, and use load !noundef to allow dropping the freeze. I think this would be a very bad outcome, especially as we're going to lose that !noundef on most load speculations, and I don't think we want to be introducing freezes when speculating loads (e.g. during LICM).

I expect that the path of least resistance is going to be to support bitwise poison for load/store/phi/select and promote it to full-value poison on any other operation, allowing a freezing load to be expressed as freeze(load).

Please let me know if I completely misunderstood the scheme you have in mind...

You got it right. But the load we propose is not exactly a freezing load. It only returns freeze poison for uninit memory. It doesn't freeze stored values.
If we introduce a !uninit_is_poison flag, we can drop !noundef when hoisting and add !uninit_is_poison instead (it's implied by !noundef).

The question is what's the default for uninit memory: freeze poison or poison? But I think we need both. And since we need both (unless we add a freezing load or bitwise poison), why not keep a behavior closer to the current?
A freezing load is worse as a store+load needs to be forwarded though a freeze, as the load is not a NOP anymore.

Bitwise poison would be nice, but I don't know how to make it work. To make it work with bit-fields, we would need and/or to propagate poison bit-wise as well. But then we will break optimizations that transform between those and arithmetic. Then you start wanting that add/mul/etc also propagate poison bit-wise and then I don't know how to specify that semantics. (if you want, I can forward you a proposal we wrote a few years ago, but we never managed to make sound, so it was never presented in public)

I agree that bit-wise poison for loads sounds appealing (for bit fields, load widening, etc), but without support in the rest of the IR, it's not worth much. If it becomes value-wise in the 1st operation, then I don't think we gain any expressivity.

I think the gain in expressiveness is that we can write something like freeze(load). For example, D138766 currently proposes to use a silly pattern like bitcast(freeze(load(<4 x i8>)) to i32) to achieve a "frozen load", because there is no other way to express it right now.

The other problem I see here is that we still need to do something about the memcpy -> load/store fold. As soon as we have poison from uninit values (either directly or via !uninit_is_poison) this will start causing miscompiles very quickly. The only way to do this right now is again with an <N x i8> vector load/store, which still optimizes terribly. This needs either load+store of integer to preserve poison, or again some form of byte type.

Something I've only recently realized is that we also run into this problem when inserting spurious load/store pairs, e.g. as done by LICM scalar promotion. If we're promoting say an i32 value to scalar that previously used a conditional store, then promotion will now introduce an unconditional load and store, which will spread poison from individual bytes. So that means that technically scalar promotion (and SimplifyCFG store speculation) also need to do some special dance to preserve poison. And without preservation of poison across load/store/phi in plain ints, this is going to be a bad optimization outcome either way: We'd have to use either a vector of i8 or a byte type for the inserted phi nodes and only cast to integer and back when manipulating the value, which would probably defeat the optimization. At that point probably best to freeze the first load (which again needs a freezing load).

(We should probably move the discussion around this patch series to discourse.)

The other problem I see here is that we still need to do something about the memcpy -> load/store fold. As soon as we have poison from uninit values (either directly or via !uninit_is_poison) this will start causing miscompiles very quickly. The only way to do this right now is again with an <N x i8> vector load/store, which still optimizes terribly. This needs either load+store of integer to preserve poison, or again some form of byte type.

Something I've only recently realized is that we also run into this problem when inserting spurious load/store pairs, e.g. as done by LICM scalar promotion. If we're promoting say an i32 value to scalar that previously used a conditional store, then promotion will now introduce an unconditional load and store, which will spread poison from individual bytes. So that means that technically scalar promotion (and SimplifyCFG store speculation) also need to do some special dance to preserve poison. And without preservation of poison across load/store/phi in plain ints, this is going to be a bad optimization outcome either way: We'd have to use either a vector of i8 or a byte type for the inserted phi nodes and only cast to integer and back when manipulating the value, which would probably defeat the optimization. At that point probably best to freeze the first load (which again needs a freezing load).

For this stuff, I think the ideal solution is the byte type. That's the only way to do this kind of raw byte copies. Doesn't spread poison between bits nor do you need to know if you are reading a pointer or something else. Nor does it require a freeze, which is annoying.

John will submit a proposal (next week?) using poison for uninit loads. I think we have converged on something cool. Thanks for insisting on that one!

As Nuno mentioned we are targeting the proposal for next week. I will update the ticket with the Discourse link once it becomes available.