This is an archive of the discontinued LLVM Phabricator instance.

Differential D134410

[clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 5)
Needs Review · Public

Authored by jmciver on Sep 21 2022, 10:41 PM.

Details

Reviewers

nikic
efriedma
aqjune
rjmccall
jdoerfert
vitalybuka

Summary

This patch adds noundef metadata to scalar load instructions and is intended to gain implementation feedback before proceeding to adding noundef to other load types.

The intent is to apply noundef to scalar load instructions that are not of type:

char (if the target system maps to unsigned char)
unsigned char
std::byte

These types are excluded because of their different indeterminate value semantics (I think this is correct, but need to ask someone).

Feedback is greatly appreciated.
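
The type exclusion described in the summary can be illustrated with a small compile-time sketch. This is not code from the patch: the helper names isExcludedByteType and isNoundefLoadCandidate are hypothetical, and treating "scalar" as std::is_scalar is an assumption made only for illustration.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper (not from the patch): true for the byte-like types
// that the summary excludes from !noundef.
template <typename T>
constexpr bool isExcludedByteType() {
  using U = std::remove_cv_t<T>;
  if (std::is_same_v<U, unsigned char> || std::is_same_v<U, std::byte>)
    return true;
  // Plain char is excluded only on targets where it is an unsigned type.
  if (std::is_same_v<U, char>)
    return std::is_unsigned_v<char>;
  return false;
}

// Hypothetical predicate: would a load of T be a candidate for !noundef?
// Assumption: "scalar" is approximated here by std::is_scalar.
template <typename T>
constexpr bool isNoundefLoadCandidate() {
  return std::is_scalar_v<T> && !isExcludedByteType<T>();
}

static_assert(isNoundefLoadCandidate<int>());
static_assert(isNoundefLoadCandidate<double *>());
static_assert(!isNoundefLoadCandidate<unsigned char>());
static_assert(!isNoundefLoadCandidate<std::byte>());
```

Whether plain char counts as a candidate depends on the target, which is why the sketch asks std::is_unsigned_v<char> instead of hard-coding an answer.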

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jmciver created this revision. Sep 21 2022, 10:41 PM

Herald added a project: Restricted Project. Sep 21 2022, 10:41 PM

Herald added subscribers: mattd, asavonic, jdoerfert, pengfei.

Harbormaster completed remote builds in B188103: Diff 462081. Sep 21 2022, 10:42 PM

jmciver retitled this revision from [clang][CodeGen] Add noundef metadata to scalar load instructions to [clang][CodeGen] Add noundef metadata to scalar load instructions (preliminary). Sep 21 2022, 11:31 PM

jmciver edited the summary of this revision.

jmciver added a parent revision: D134409: [utils][UpdateTestChecks] Add unnamed !noundef value support.

jmciver retitled this revision from [clang][CodeGen] Add noundef metadata to scalar load instructions (preliminary) to [clang][CodeGen] Add noundef metadata to load instructions (preliminary). Sep 21 2022, 11:35 PM

jmciver edited the summary of this revision. Sep 21 2022, 11:37 PM

jmciver published this revision for review. Sep 22 2022, 8:44 AM

jmciver added reviewers: nikic, efriedma, aqjune, rjmccall.

Herald added a reviewer: jdoerfert. Sep 22 2022, 8:45 AM

Herald added a project: Restricted Project.

Herald added subscribers: cfe-commits, pcwang-thead, sstefan1.

nlopes added a subscriber: nlopes. Sep 22 2022, 8:51 AM

How much code does this break? How much does this help performance? (My instinct here is that the tradeoff is not worth it, but I'm willing to be convinced otherwise.)

The regression and test-suite pass using a bootstrap build. I'll provide performance data tomorrow.

vitalybuka added a subscriber: vitalybuka. Sep 23 2022, 12:36 AM

The following are results from test-suite execution. LHS is a main build (17dde371e773) and RHS is main with the patch applied. results-subset.txt excludes the unit, micro, and torture tests. Execution runtimes come from three runs each of the main and patched versions, combined using the compare.py vs argument (which compares the minimums of the two sets). Compilation times were acquired from a single invocation of each type of build. I will gather more compile-time runs this weekend.

results-subset.txt (72 KB)

results-full.txt (666 KB)

xbolva00 added a subscriber: xbolva00. Sep 24 2022, 1:20 AM

Are these patches uploaded with arc tool?

I tried to hook this patch into MemorySanitizer and it reduces instrumented code by ~30% !

vitalybuka added a child revision: D134698: [WIP][msan] Use !noundef of load instruction. Sep 26 2022, 9:29 PM

In D134410#3816918, @vitalybuka wrote:

I tried to hook this patch into MemorySanitizer and it reduces instrumented code by ~30% !

Wow, amazing results!

In D134410#3817070, @xbolva00 wrote:

In D134410#3816918, @vitalybuka wrote:

I tried to hook this patch into MemorySanitizer and it reduces instrumented code by ~30% !

Wow, amazing results!

Also I applied this patch with D134698 and used on our large test set, expecting significant number of pre-existing reports. To my surprise I see not many of them.

Also I applied this patch with D134698 and used on our large test set, expecting significant number of pre-existing reports. To my surprise I see not many of them.

Something wrong with my msan experiment, I'll re-evaluate size and reports tomorrow.

In D134410#3816881, @vitalybuka wrote:

Are these patches uploaded with arc tool?

Yes, the patches were uploaded with arc. The size of the patch was too large to use the web interface.

Reducing the CPU load during the test-suite build did lower times in some instances, but not by much.

results-subset.txt (72 KB)

results-full.txt (666 KB)

jmciver mentioned this in D134409: [utils][UpdateTestChecks] Add unnamed !noundef value support. Oct 13 2022, 9:40 AM

Updating D134410: [clang][CodeGen] Add noundef metadata to load instructions (preliminary)

Resolve merge conflicts due to the adoption of the ptr type in OpenMP tests.

Harbormaster completed remote builds in B192124: Diff 467690. Oct 13 2022, 11:56 PM

In D134410#3817190, @vitalybuka wrote:

Also I applied this patch with D134698 and used on our large test set, expecting significant number of pre-existing reports. To my surprise I see not many of them.

Something wrong with my msan experiment, I'll re-evaluate size and reports tomorrow.

I finally had time to fix my msan experiment, D134698.
It gives about 5.5% .text savings for msan, and 10% for msan with "track origins", which is still pretty good.
For context, for msan it's usually cheaper to report an uninitialized value as soon as possible than to propagate it and report it later. With this metadata, that will happen immediately after the load.

However, the cleanup looks scary. Msan reports on maybe 20% of unique tests on our code base. Many of them share a root cause, but there are still many unique root causes.
On a quick look I see no false reports. A lot of it is stuff like this https://stackoverflow.com/questions/60112841/copying-structs-with-uninitialized-members which is technically UB.
I assume that with this patch landed, many such cases may change code behavior. So we will need to update msan to have a tool to detect cases like this anyway.
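
As an illustration of the struct-copy pattern mentioned in this comment, here is a minimal sketch. The risky form appears only as a comment so that the example itself stays well-defined. The type Pair and the helpers makeInitialized and copyBytes are hypothetical names chosen for this example; they do not come from the patch or from the linked question.

```cpp
#include <cstring>

struct Pair {
  int a;
  int b;
};

// Risky pattern (kept as a comment so this sketch stays well-defined):
//   Pair p;       // p.b is never written
//   p.a = 1;
//   Pair q = p;   // the copy also reads the indeterminate p.b

// Option 1: value-initialize so that every member has a defined value.
Pair makeInitialized(int a) {
  Pair p{};  // zero-initializes both a and b
  p.a = a;
  return p;
}

// Option 2: copy the object representation as raw bytes instead of going
// through typed member loads.
void copyBytes(Pair &dst, const Pair &src) {
  std::memcpy(&dst, &src, sizeof(Pair));
}
```

With the first option, a later plain copy such as `Pair q = makeInitialized(7);` reads only initialized members.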

I assume with this patch landed, many such cases may change code behavior. So we will need to update msan to have a tool to detect cases like this anyway.

I believe that updated msan should come first.

nlopes mentioned this in D136055: [ValueTracking] Make !range metadata imply noundef for load & call results. Oct 17 2022, 3:36 AM

In D134410#3860646, @xbolva00 wrote:

I assume with this patch landed, many such cases may change code behavior. So we will need to update msan to have a tool to detect cases like this anyway.

I believe that updated msan should come first.

Yes, we can do that.

clang/lib/CodeGen/CGExpr.cpp
1759	Taking into account potential risks from pre-existing UB, I believe we need a clang switch to be able to disable this branch, similar to enable_noundef_analysis. Users with large code bases may need some time for the transition. I have no opinion on whether this should be ON or OFF by default.

jmciver added inline comments. Oct 18 2022, 2:19 PM

clang/lib/CodeGen/CGExpr.cpp
1759	A flag is a good idea; I'll go ahead and add that. We can determine the default later; for now I would like to default to `on` to see/show the effects. Anyone, feel free to let me know if you have reasoning otherwise. For the flag name: `enable_noundef_loads`?

However, the cleanup looks scary. Msan reports on maybe 20% of unique tests on our code base. Many of them share a root cause, but there are still many unique root causes.
On a quick look I see no false reports

While scary, this is also extremely encouraging for moving forward with this.

Updating D134410: [clang][CodeGen] Add noundef metadata to load instructions (preliminary)

Add flag -enable-noundef-load-analysis and rebase with main. Refactored noundef metadata functionality into a function in an anonymous namespace to be used by additional patches.

jmciver added a child revision: D137005: [clang][CodeGen] Add noundef metadata to load insturctions (preliminary 2 of 5). Oct 28 2022, 9:00 PM

Harbormaster completed remote builds in B195055: Diff 471708. Oct 28 2022, 9:27 PM

I'll start fixing reported test failures tomorrow.

jmciver retitled this revision from [clang][CodeGen] Add noundef metadata to load instructions (preliminary) to [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 2). Oct 28 2022, 10:11 PM

I think adding this under a default-disabled flag is fine for evaluation purposes, but I have doubts that we will ever be able to enable this by default. There is a lot of code out there assuming that copying uninitialized data around is fine.

In D134410#3893918, @nikic wrote:

I think adding this under a default-disabled flag is fine for evaluation purposes, but I have doubts that we will ever be able to enable this by default. There is a lot of code out there assuming that copying uninitialized data around is fine.

Well, I think that flags that are disabled by default effectively don't exist, and they are not useful.
We wanted this patch so that we can switch uninitialized loads to poison at will, since they become UB. In practice, this helps us fix bugs in SROA, etc. without perf degradation.
As long as ubsan/valgrind can detect these uninitialized loads, I think we should be OK to deploy this change. We are not touching memcpys yet, at least, as those may be susceptible to handling uninit memory.

In D134410#3894995, @nlopes wrote:

We wanted this patch to make us switch uninitialized loads to poison at will, since they become UB. In practice, this helps us fixing bugs in SROA and etc without perf degradation.

Can you elaborate on this? I don't see how this is necessary for switching uninitialized loads to poison.

As long as ubsan/valgrind can detect these uninitialized loads, I think we should be ok to deploy this change.

Valgrind cannot detect uninitialized loads, it only detects branching on uninitialized data. Valgrind works on the assembly level, and as such does not have the necessary information to tell whether a mov of uninitialized data is problematic or not.

The only sanitizer that can detect this is msan (with the patch referenced above), which is incidentally also the sanitizer that sees the least use in practice, because it is much harder to deploy than ubsan and asan.

tschuett added a subscriber: tschuett. Oct 30 2022, 1:10 PM

tschuett added inline comments.

clang/lib/CodeGen/CGExpr.cpp
676	Nit: You meant static.

Almost all flags are off by default. That's why they are called flags, e.g., -Weverything.

Updating D134410: [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 2)

Fix AMDGPU, ARM, PowerPC, and new OpenMP tests.

Herald added subscribers: kosarev, kerbowa, jvesely. Oct 30 2022, 2:42 PM

In D134410#3895077, @nikic wrote:

In D134410#3894995, @nlopes wrote:

We wanted this patch to make us switch uninitialized loads to poison at will, since they become UB. In practice, this helps us fixing bugs in SROA and etc without perf degradation.

Can you elaborate on this? I don't see how this is necessary for switching uninitialized loads to poison.

It's not mandatory, it's a simple way of achieving it as !noundef already exists.

We cannot change the default behavior of load as it would break BC. An alternative is to introduce a new !poison_on_unint for loads. Clang could use that on all loads except those for bit-fields.
Our suggestion is to jump straight to !noundef.
To fully remove undef, we then need to switch loads to return freeze poison rather than undef on uninitialized access, but only after we are able to yield poison in the common case.

Harbormaster completed remote builds in B195179: Diff 471878. Oct 30 2022, 4:15 PM

Updating D134410: [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 2)

Refactor local linkage function, applyNoundefToLoadInst, to use static rather than an anonymous namespace as per LLVM coding standards.

jmciver added inline comments. Oct 31 2022, 10:14 AM

clang/lib/CodeGen/CGExpr.cpp
676	Thank you for bringing that to my attention. I reviewed the LLVM Coding Standards guidance on the topic.

Harbormaster completed remote builds in B195293: Diff 472051. Oct 31 2022, 11:17 AM

In D134410#3895253, @nlopes wrote:

In D134410#3895077, @nikic wrote:

In D134410#3894995, @nlopes wrote:

We wanted this patch to make us switch uninitialized loads to poison at will, since they become UB. In practice, this helps us fixing bugs in SROA and etc without perf degradation.

Can you elaborate on this? I don't see how this is necessary for switching uninitialized loads to poison.

It's not mandatory, it's a simple way of achieving it as !noundef already exists.

We cannot change the default behavior of load as it would break BC.

FWIW, I don't agree with this. It's fine to change load semantics, as long as bitcode autoupgrade is handled correctly. I'd go even further and say that at least long term, any solution that does not have a plain load instruction return poison for uninitialized memory will be a maintenance mess.

I think the core problem that prevents us from moving in this direction is that we have no way to represent a frozen load at all (as freeze(load) will propagate poison before freezing). If I understand correctly, you're trying to solve this by making *all* loads frozen loads, and use load !noundef to allow dropping the freeze. I think this would be a very bad outcome, especially as we're going to lose that !noundef on most load speculations, and I don't think we want to be introducing freezes when speculating loads (e.g. during LICM).

I expect that the path of least resistance is going to be to support bitwise poison for load/store/phi/select and promote it to full-value poison on any other operation, allowing a freezing load to be expressed as freeze(load).

Please let me know if I completely misunderstood the scheme you have in mind...
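
The difference between value-wise and byte-wise poison discussed in this comment can be sketched with a toy model. This is only an illustration under simplifying assumptions, not LLVM's actual semantics: std::nullopt stands in for poison, freeze is modeled as always choosing 0 (a real freeze may choose any value), and the names loadI32, freezeI32, and frozenLoadViaBytes are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Toy model: std::nullopt stands for a poison byte or a poison value.
using Byte = std::optional<std::uint8_t>;
using Bytes4 = std::array<Byte, 4>;

// Value-wise load: a single poison byte poisons the whole 32-bit result.
std::optional<std::uint32_t> loadI32(const Bytes4 &mem) {
  std::uint32_t v = 0;
  for (int i = 0; i < 4; ++i) {
    if (!mem[i])
      return std::nullopt;
    v |= static_cast<std::uint32_t>(*mem[i]) << (8 * i);  // little-endian
  }
  return v;
}

// Modeled freeze on a 32-bit value: poison becomes one fixed value.
// Assumption: 0 is chosen here; a real freeze may pick any value.
std::uint32_t freezeI32(std::optional<std::uint32_t> v) {
  return v.value_or(0);
}

// Byte-wise variant: freeze each byte separately, then reassemble.
// The defined bytes survive; only the poison byte is replaced.
std::uint32_t frozenLoadViaBytes(const Bytes4 &mem) {
  std::uint32_t v = 0;
  for (int i = 0; i < 4; ++i)
    v |= static_cast<std::uint32_t>(mem[i].value_or(0)) << (8 * i);
  return v;
}
```

In this model, freezing after a value-wise load has already lost the three defined bytes, whereas the byte-wise variant keeps them.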

In D134410#3898563, @nikic wrote:

In D134410#3895253, @nlopes wrote:

In D134410#3895077, @nikic wrote:

In D134410#3894995, @nlopes wrote:

We wanted this patch to make us switch uninitialized loads to poison at will, since they become UB. In practice, this helps us fixing bugs in SROA and etc without perf degradation.

Can you elaborate on this? I don't see how this is necessary for switching uninitialized loads to poison.

It's not mandatory, it's a simple way of achieving it as !noundef already exists.

We cannot change the default behavior of load as it would break BC.

FWIW, I don't agree with this. It's fine to change load semantics, as long as bitcode autoupgrade is handled correctly. I'd go even further and say that at least long term, any solution that does not have a plain load instruction return poison for uninitialized memory will be a maintenance mess.

I think the core problem that prevents us from moving in this direction is that we have no way to represent a frozen load at all (as freeze(load) will propagate poison before freezing). If I understand correctly, you're trying to solve this by making *all* loads frozen loads, and use load !noundef to allow dropping the freeze. I think this would be a very bad outcome, especially as we're going to lose that !noundef on most load speculations, and I don't think we want to be introducing freezes when speculating loads (e.g. during LICM).

I expect that the path of least resistance is going to be to support bitwise poison for load/store/phi/select and promote it to full-value poison on any other operation, allowing a freezing load to be expressed as freeze(load).

Please let me know if I completely misunderstood the scheme you have in mind...

You got it right. But the load we propose is not exactly a freezing load. It only returns freeze poison for uninit memory. It doesn't freeze stored values.
If we introduce a !uninit_is_poison flag, we can drop !noundef when hoisting and add !uninit_is_poison instead (it's implied by !noundef).

The question is what's the default for uninit memory: freeze poison or poison? But I think we need both. And since we need both (unless we add a freezing load or bitwise poison), why not keep a behavior closer to the current?
A freezing load is worse as a store+load needs to be forwarded through a freeze, as the load is not a NOP anymore.

Bitwise poison would be nice, but I don't know how to make it work. To make it work with bit-fields, we would need and/or to propagate poison bit-wise as well. But then we will break optimizations that transform between those and arithmetic. Then you start wanting that add/mul/etc also propagate poison bit-wise and then I don't know how to specify that semantics. (if you want, I can forward you a proposal we wrote a few years ago, but we never managed to make sound, so it was never presented in public)

I agree that bit-wise poison for loads sounds appealing (for bit fields, load widening, etc), but without support in the rest of the IR, it's not worth much. If it becomes value-wise in the 1st operation, then I don't think we gain any expressivity.

Updating D134410: [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 2)

Refactor test matrix-type-operators.c to contain the noundef attribute. This test will be further modified in future patches.

jmciver retitled this revision from [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 2) to [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 3). Nov 30 2022, 12:20 AM

Harbormaster completed remote builds in B200183: Diff 478821. Nov 30 2022, 12:38 AM

Updating D134410: [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 3)

The following tests have been updated:

avx512f-builtins.c
avx512fp16-builtins.c
matrix-type-operators.c
matrix-type-operators.cpp

Harbormaster completed remote builds in B201232: Diff 480265. Dec 5 2022, 10:02 PM

jmciver retitled this revision from [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 3) to [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 4). Dec 6 2022, 8:01 AM

jmciver retitled this revision from [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 4) to [clang][CodeGen] Add noundef metadata to load instructions (preliminary 1 or 5). Dec 6 2022, 10:49 PM

In D134410#3898615, @nlopes wrote:

In D134410#3898563, @nikic wrote:

FWIW, I don't agree with this. It's fine to change load semantics, as long as bitcode autoupgrade is handled correctly. I'd go even further and say that at least long term, any solution that does not have a plain load instruction return poison for uninitialized memory will be a maintenance mess.

I think the core problem that prevents us from moving in this direction is that we have no way to represent a frozen load at all (as freeze(load) will propagate poison before freezing). If I understand correctly, you're trying to solve this by making *all* loads frozen loads, and use load !noundef to allow dropping the freeze. I think this would be a very bad outcome, especially as we're going to lose that !noundef on most load speculations, and I don't think we want to be introducing freezes when speculating loads (e.g. during LICM).

I expect that the path of least resistance is going to be to support bitwise poison for load/store/phi/select and promote it to full-value poison on any other operation, allowing a freezing load to be expressed as freeze(load).

Please let me know if I completely misunderstood the scheme you have in mind...

You got it right. But the load we propose is not exactly a freezing load. It only returns freeze poison for uninit memory. It doesn't freeze stored values.
If we introduce a !uninit_is_poison flag, we can drop !noundef when hoisting and add !uninit_is_poison instead (it's implied by !noundef).

The question is what's the default for uninit memory: freeze poison or poison? But I think we need both. And since we need both (unless we add a freezing load or bitwise poison), why not keep a behavior closer to the current?
A freezing load is worse as a store+load needs to be forwarded through a freeze, as the load is not a NOP anymore.

Bitwise poison would be nice, but I don't know how to make it work. To make it work with bit-fields, we would need and/or to propagate poison bit-wise as well. But then we will break optimizations that transform between those and arithmetic. Then you start wanting that add/mul/etc also propagate poison bit-wise and then I don't know how to specify that semantics. (if you want, I can forward you a proposal we wrote a few years ago, but we never managed to make sound, so it was never presented in public)

I agree that bit-wise poison for loads sounds appealing (for bit fields, load widening, etc), but without support in the rest of the IR, it's not worth much. If it becomes value-wise in the 1st operation, then I don't think we gain any expressivity.

I think the gain in expressiveness is that we can write something like freeze(load). For example, D138766 currently proposes to use a silly pattern like bitcast(freeze(load(<4 x i8>)) to i32) to achieve a "frozen load", because there is no other way to express it right now.

The other problem I see here is that we still need to do something about the memcpy -> load/store fold. As soon as we have poison from uninit values (either directly or via !uninit_is_poison) this will start causing miscompiles very quickly. The only way to do this right now is again with an <N x i8> vector load/store, which still optimizes terribly. This needs either load+store of integer to preserve poison, or again some form of byte type.

Something I've only recently realized is that we also run into this problem when inserting spurious load/store pairs, e.g. as done by LICM scalar promotion. If we're promoting say an i32 value to scalar that previously used a conditional store, then promotion will now introduce an unconditional load and store, which will spread poison from individual bytes. So that means that technically scalar promotion (and SimplifyCFG store speculation) also need to do some special dance to preserve poison. And without preservation of poison across load/store/phi in plain ints, this is going to be a bad optimization outcome either way: We'd have to use either a vector of i8 or a byte type for the inserted phi nodes and only cast to integer and back when manipulating the value, which would probably defeat the optimization. At that point probably best to freeze the first load (which again needs a freezing load).

(We should probably move the discussion around this patch series to discourse.)
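
The scalar-promotion concern raised in this comment can be sketched at the source level. This is a hand-written illustration, not output from LICM, and the function names storeLastPositive and storeLastPositivePromoted are hypothetical. The first function writes through the pointer only conditionally. The second shows the promoted shape, with an unconditional load before the loop and an unconditional store after it.

```cpp
#include <vector>

// Before promotion: *p is written only if some element is positive, and it
// is never read.
void storeLastPositive(int *p, const std::vector<int> &xs) {
  for (int x : xs)
    if (x > 0)
      *p = x;
}

// After (hand-written) scalar promotion: the value lives in a local during
// the loop, which introduces one load and one store that always execute.
void storeLastPositivePromoted(int *p, const std::vector<int> &xs) {
  int tmp = *p;  // unconditional load, even if no element is positive
  for (int x : xs)
    if (x > 0)
      tmp = x;
  *p = tmp;      // unconditional store of whatever was loaded or assigned
}
```

When *p holds an initialized value, both versions leave the same result. The difference matters only when *p is uninitialized and no element is positive: the promoted form then loads and stores the uninitialized contents, which is the case the comment describes.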

In D134410#3983684, @nikic wrote:

The other problem I see here is that we still need to do something about the memcpy -> load/store fold. As soon as we have poison from uninit values (either directly or via !uninit_is_poison) this will start causing miscompiles very quickly. The only way to do this right now is again with an <N x i8> vector load/store, which still optimizes terribly. This needs either load+store of integer to preserve poison, or again some form of byte type.

Something I've only recently realized is that we also run into this problem when inserting spurious load/store pairs, e.g. as done by LICM scalar promotion. If we're promoting say an i32 value to scalar that previously used a conditional store, then promotion will now introduce an unconditional load and store, which will spread poison from individual bytes. So that means that technically scalar promotion (and SimplifyCFG store speculation) also need to do some special dance to preserve poison. And without preservation of poison across load/store/phi in plain ints, this is going to be a bad optimization outcome either way: We'd have to use either a vector of i8 or a byte type for the inserted phi nodes and only cast to integer and back when manipulating the value, which would probably defeat the optimization. At that point probably best to freeze the first load (which again needs a freezing load).

For this stuff, I think the ideal solution is the byte type. That's the only way to do this kind of raw byte copies. Doesn't spread poison between bits nor do you need to know if you are reading a pointer or something else. Nor does it require a freeze, which is annoying.

John will submit a proposal (next week?) using poison for uninit loads. I think we have converged on something cool. Thanks for insisting on that one!

As Nuno mentioned we are targeting the proposal for next week. I will update the ticket with the Discourse link once it becomes available.

The Discourse proposal is available: RFC Load Instruction: Uninitialized Memory Semantics

Large Diff

This large diff affects 116 files. Files without inline comments have been collapsed.

Revision Contents

Path (Size)

clang/
include/
clang/
Basic/
CodeGenOptions.def (1 line)
Driver/
Options.td (7 lines)
lib/
CodeGen/
CGExpr.cpp (17 lines)
test/
CodeGen/
X86/
avx-builtins.c (24 lines)
avx512bw-builtins.c (6 lines)
avx512f-builtins.c (12 lines)
avx512fp16-builtins.c (10 lines)
avx512vl-builtins.c (10 lines)
avx512vlbw-builtins.c (10 lines)
sse-builtins.c (10 lines)
sse2-builtins.c (20 lines)
aarch64-ls64-inline-asm.c (22 lines)
aarch64-ls64.c (76 lines)
aarch64-sve-acle-__ARM_FEATURE_SVE_VECTOR_OPERATORS.c (2 lines)
memcpy-inline-builtin.c (24 lines)
tbaa-array.cpp (12 lines)
tbaa-base.cpp (4 lines)
tbaa.cpp (6 lines)
ubsan-pass-object-size.c (4 lines)
CodeGenCUDA/
amdgpu-kernel-arg-pointer-type.cu (12 lines)
CodeGenCXX/
attr-likelihood-if-branch-weights.cpp (30 lines)
attr-likelihood-switch-branch-weights.cpp (70 lines)
builtin-bit-cast-no-tbaa.cpp (7 lines)
debug-info-line.cpp (5 lines)
pr12251.cpp (30 lines)
pragma-followup_inner.cpp (39 lines)
OpenMP/
cancel_codegen.cpp (402 lines)
cancellation_point_codegen.cpp (224 lines)
distribute_codegen.cpp (1684 lines)
distribute_parallel_for_reduction_task_codegen.cpp (480 lines)
distribute_parallel_for_simd_private_codegen.cpp (900 lines)
distribute_parallel_for_simd_proc_bind_codegen.cpp (222 lines)
distribute_simd_codegen.cpp (4444 lines)
distribute_simd_private_codegen.cpp (928 lines)
distribute_simd_reduction_codegen.cpp (550 lines)
for_reduction_task_codegen.cpp (454 lines)
irbuilder_safelen.cpp (47 lines)
irbuilder_safelen_order_concurrent.cpp (55 lines)
irbuilder_simd_aligned.cpp (76 lines)
irbuilder_simdlen.cpp (55 lines)
irbuilder_simdlen_safelen.cpp (47 lines)
master_taskloop_in_reduction_codegen.cpp (120 lines)
master_taskloop_simd_in_reduction_codegen.cpp (150 lines)
nvptx_target_parallel_reduction_codegen_tbaa_PR46146.cpp (702 lines)
ordered_codegen.cpp (1166 lines)
parallel_for_reduction_task_codegen.cpp (448 lines)
parallel_for_simd_aligned_codegen.cpp (26 lines)
parallel_for_simd_codegen.cpp (10 lines)
parallel_master_reduction_task_codegen.cpp (422 lines)
parallel_master_taskloop_codegen.cpp (510 lines)
parallel_master_taskloop_lastprivate_codegen.cpp (564 lines)
parallel_master_taskloop_simd_codegen.cpp (2024 lines)
parallel_master_taskloop_simd_lastprivate_codegen.cpp (766 lines)
parallel_reduction_task_codegen.cpp (412 lines)
parallel_sections_reduction_task_codegen.cpp (448 lines)
sections_reduction_task_codegen.cpp (454 lines)
target_defaultmap_codegen_01.cpp (40 lines)
target_in_reduction_codegen.cpp (22 lines)
target_is_device_ptr_codegen.cpp (20 lines)
target_map_codegen_00.cpp (2 lines)
target_map_codegen_01.cpp (6 lines)
target_map_codegen_02.cpp (2 lines)
target_map_codegen_04.cpp (2 lines)
target_map_codegen_05.cpp (2 lines)
target_map_codegen_07.cpp (2 lines)
target_map_codegen_11.cpp (2 lines)
target_map_codegen_13.cpp (2 lines)
target_map_codegen_14.cpp (4 lines)
target_map_codegen_15.cpp (2 lines)
target_map_codegen_17.cpp (2 lines)
target_map_codegen_26.cpp (2 lines)
target_map_codegen_27.cpp (2 lines)
target_map_codegen_29.cpp (4 lines)
target_parallel_codegen.cpp (696 lines)
target_parallel_for_codegen.cpp (2348 lines)
target_parallel_for_reduction_task_codegen.cpp (450 lines)
target_parallel_for_simd_codegen.cpp (4524 lines)
target_parallel_reduction_task_codegen.cpp (414 lines)
target_teams_codegen.cpp (924 lines)
target_teams_distribute_codegen.cpp (1624 lines)
target_teams_distribute_parallel_for_order_codegen.cpp (42 lines)
target_teams_distribute_parallel_for_reduction_task_codegen.cpp (758 lines)
target_teams_distribute_parallel_for_simd_collapse_codegen.cpp (1526 lines)
target_teams_distribute_parallel_for_simd_if_codegen.cpp (2924 lines)
target_teams_distribute_parallel_for_simd_proc_bind_codegen.cpp (222 lines)
target_teams_distribute_parallel_for_simd_reduction_codegen.cpp (962 lines)
target_teams_distribute_simd_codegen.cpp (4968 lines)
target_teams_distribute_simd_collapse_codegen.cpp (1198 lines)
target_teams_distribute_simd_reduction_codegen.cpp (652 lines)
target_update_codegen.cpp (24 lines)
task_affinity_codegen.cpp (2 lines)
task_codegen.cpp (3820 lines)
task_if_codegen.cpp (498 lines)
task_in_reduction_codegen.cpp (112 lines)
task_member_call_codegen.cpp (92 lines)
task_target_device_codegen.c (62 lines)
taskloop_in_reduction_codegen.cpp (120 lines)
taskloop_simd_in_reduction_codegen.cpp (150 lines)
teams_distribute_parallel_for_reduction_task_codegen.cpp (762 lines)
teams_distribute_parallel_for_simd_collapse_codegen.cpp (1326 lines)
teams_distribute_parallel_for_simd_proc_bind_codegen.cpp (222 lines)
teams_distribute_parallel_for_simd_reduction_codegen.cpp (990 lines)
teams_distribute_simd_codegen.cpp (1592 lines)
teams_distribute_simd_collapse_codegen.cpp (1074 lines)

teams_distribute_simd_reduction_codegen.cpp

680 lines

utils/

update_cc_test_checks/

Inputs/

basic-cplusplus.cpp.expected

10 lines

check-attributes.cpp.funcattrs.expected

2 lines

check-attributes.cpp.plain.expected

2 lines

def-and-decl.c.expected

2 lines

explicit-template-instantiation.cpp.expected

18 lines

generated-funcs-regex.c.expected

6 lines

generated-funcs.c.generated.expected

52 lines

generated-funcs.c.no-generated.expected

16 lines

mangled_names.c.expected

10 lines

mangled_names.c.funcsig.expected

10 lines

resolve-tmp-conflict.cpp.expected

2 lines

Diff 472051

clang/include/clang/Basic/CodeGenOptions.def


clang/include/clang/Driver/Options.td


clang/lib/CodeGen/CGExpr.cpp

(... 666 lines hidden ...)

bool CodeGenFunction::sanitizePerformTypeCheck() const {		bool CodeGenFunction::sanitizePerformTypeCheck() const {
return SanOpts.has(SanitizerKind::Null) ||		return SanOpts.has(SanitizerKind::Null) ||
SanOpts.has(SanitizerKind::Alignment) ||		SanOpts.has(SanitizerKind::Alignment) ||
SanOpts.has(SanitizerKind::ObjectSize) ||		SanOpts.has(SanitizerKind::ObjectSize) ||
SanOpts.has(SanitizerKind::Vptr);		SanOpts.has(SanitizerKind::Vptr);
}		}

		static void applyNoundefToLoadInst(bool enable, const clang::QualType &Ty,
		llvm::LoadInst *Load) {
		tschuett: Nit: You meant static.
		jmciver (author): Thank you for bringing that to my attention. I reviewed the LLVM Coding Standards guidance on the topic.
		if (enable) {
		if (auto TyPtr = Ty.getTypePtrOrNull()) {
		if (!(TyPtr->isSpecificBuiltinType(BuiltinType::UChar) ||
		TyPtr->isSpecificBuiltinType(BuiltinType::Char_U) ||
		TyPtr->isStdByteType())) {
		Load->setMetadata(llvm::LLVMContext::MD_noundef,
		llvm::MDNode::get(Load->getContext(), None));
		}
		}
		}
		}

void CodeGenFunction::EmitTypeCheck(TypeCheckKind TCK, SourceLocation Loc,		void CodeGenFunction::EmitTypeCheck(TypeCheckKind TCK, SourceLocation Loc,
llvm::Value *Ptr, QualType Ty,		llvm::Value *Ptr, QualType Ty,
CharUnits Alignment,		CharUnits Alignment,
SanitizerSet SkippedChecks,		SanitizerSet SkippedChecks,
llvm::Value *ArraySize) {		llvm::Value *ArraySize) {
if (!sanitizePerformTypeCheck())		if (!sanitizePerformTypeCheck())
return;		return;

(... 1,034 lines hidden; enclosing block: if (const auto *ClangVecTy = Ty->getAs<VectorType>()) { ...)
if (!CGM.getCodeGenOpts().PreserveVec3Type && VTy->getNumElements() == 3) {		if (!CGM.getCodeGenOpts().PreserveVec3Type && VTy->getNumElements() == 3) {

// Bitcast to vec4 type.		// Bitcast to vec4 type.
llvm::VectorType *vec4Ty =		llvm::VectorType *vec4Ty =
llvm::FixedVectorType::get(VTy->getElementType(), 4);		llvm::FixedVectorType::get(VTy->getElementType(), 4);
Address Cast = Builder.CreateElementBitCast(Addr, vec4Ty, "castToVec4");		Address Cast = Builder.CreateElementBitCast(Addr, vec4Ty, "castToVec4");
// Now load value.		// Now load value.
llvm::Value *V = Builder.CreateLoad(Cast, Volatile, "loadVec4");		llvm::Value *V = Builder.CreateLoad(Cast, Volatile, "loadVec4");

// Shuffle vector to get vec3.		// Shuffle vector to get vec3.
V = Builder.CreateShuffleVector(V, ArrayRef<int>{0, 1, 2}, "extractVec");		V = Builder.CreateShuffleVector(V, ArrayRef<int>{0, 1, 2}, "extractVec");
return EmitFromMemory(V, Ty);		return EmitFromMemory(V, Ty);
}		}
}		}

// Atomic operations have to be done on integral types.		// Atomic operations have to be done on integral types.
LValue AtomicLValue =		LValue AtomicLValue =
LValue::MakeAddr(Addr, Ty, getContext(), BaseInfo, TBAAInfo);		LValue::MakeAddr(Addr, Ty, getContext(), BaseInfo, TBAAInfo);
if (Ty->isAtomicType() || LValueIsSuitableForInlineAtomic(AtomicLValue)) {		if (Ty->isAtomicType() || LValueIsSuitableForInlineAtomic(AtomicLValue)) {
return EmitAtomicLoad(AtomicLValue, Loc).getScalarVal();		return EmitAtomicLoad(AtomicLValue, Loc).getScalarVal();
}		}

llvm::LoadInst *Load = Builder.CreateLoad(Addr, Volatile);		llvm::LoadInst *Load = Builder.CreateLoad(Addr, Volatile);
if (isNontemporal) {		if (isNontemporal) {
llvm::MDNode *Node = llvm::MDNode::get(		llvm::MDNode *Node = llvm::MDNode::get(
Load->getContext(), llvm::ConstantAsMetadata::get(Builder.getInt32(1)));		Load->getContext(), llvm::ConstantAsMetadata::get(Builder.getInt32(1)));
Load->setMetadata(CGM.getModule().getMDKindID("nontemporal"), Node);		Load->setMetadata(CGM.getModule().getMDKindID("nontemporal"), Node);
}		}

		applyNoundefToLoadInst(CGM.getCodeGenOpts().EnableNoundefLoadAttr, Ty, Load);
		vitalybuka: Taking into account potential risks from pre-existing UB, I believe we need a clang switch to be able to shut down this branch, similar to enable_noundef_analysis. Users with large code bases may need some time for the transition. I have no opinion on whether this should be ON or OFF by default.
		jmciver (author): A flag is a good idea; I'll go ahead and add that. We can determine the default later; for now I would like to default to `on` to see/show the effects. Anyone feel free to let me know if you have reasoning otherwise. For the flag name: `enable_noundef_loads`?
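
For readers following the thread above, here is a minimal standalone sketch of the decision `applyNoundefToLoadInst` appears to make: tag a load only when the option is enabled and the loaded type is not one of the byte-like types the patch exempts. It deliberately avoids the Clang/LLVM APIs and models the types with standard C++ traits. The names `isExemptByteType`, `wouldTagNoundef`, and `checkExamples` are hypothetical and are not part of the patch, and the mapping of `BuiltinType::Char_U` to "plain `char` on targets where it is unsigned" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper (not in the patch): true for the types the patch leaves
// untagged, i.e. unsigned char, plain char where it is unsigned, and std::byte.
template <typename T>
constexpr bool isExemptByteType() {
  using U = std::remove_cv_t<T>;
  return std::is_same_v<U, unsigned char> ||
         (std::is_same_v<U, char> && std::is_unsigned_v<char>) ||
         std::is_same_v<U, std::byte>;
}

// Hypothetical model of the gate: `enable` stands in for the code-gen option
// discussed in the thread; the result says whether the load would be tagged.
template <typename T>
constexpr bool wouldTagNoundef(bool enable) {
  return enable && !isExemptByteType<T>();
}

// Small self-check covering both the switch and the type exemption.
inline void checkExamples() {
  assert(wouldTagNoundef<int>(true));            // ordinary scalar: tagged
  assert(!wouldTagNoundef<int>(false));          // switch off: never tagged
  assert(!wouldTagNoundef<unsigned char>(true)); // exempt byte type
  assert(!wouldTagNoundef<std::byte>(true));     // exempt byte type
}
```

With the switch off the sketch never tags anything, which is the escape hatch vitalybuka asks for. The byte-type exemption stays in place regardless of the switch.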

CGM.DecorateInstructionWithTBAA(Load, TBAAInfo);		CGM.DecorateInstructionWithTBAA(Load, TBAAInfo);

if (EmitScalarRangeCheck(Load, Ty, Loc)) {		if (EmitScalarRangeCheck(Load, Ty, Loc)) {
// In order to prevent the optimizer from throwing away the check, don't		// In order to prevent the optimizer from throwing away the check, don't
// attach range metadata to the load.		// attach range metadata to the load.
// TODO: Enable range metadata for AMDGCN after issue		// TODO: Enable range metadata for AMDGCN after issue
// https://github.com/llvm/llvm-project/issues/58176 is fixed.		// https://github.com/llvm/llvm-project/issues/58176 is fixed.
} else if (CGM.getCodeGenOpts().OptimizationLevel > 0 &&		} else if (CGM.getCodeGenOpts().OptimizationLevel > 0 &&
(... 3,868 lines hidden ...)