Page MenuHomePhabricator

[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
ClosedPublic

Authored by hyeongyukim on Jun 29 2021, 9:37 PM.

Details

Summary

Turning on enable_noundef_analysis flag allows better codegen by removing freeze instructions.
I modified clang by renaming enable_noundef_analysis flag to disable-noundef-analysis and turning it off by default.

Test updates are made as a separate patch: D108453

Diff Detail

Event Timeline

hyeongyukim created this revision.Jun 29 2021, 9:37 PM
hyeongyukim requested review of this revision.Jun 29 2021, 9:37 PM
hyeongyukim edited the summary of this revision. (Show Details)

A large amount of testing will be affected, and I will update it soon.

Changed disable_noundef_args flag to enable_noundef_args to enable emitting noundef attributes on IR call arguments and return values by default.
Changed test codes will be attached at another PR.

hyeongyukim retitled this revision from [Clang/Test]: Enable enable_noundef_analysis as default to [Clang/Test]: Enable enable_noundef_analysis by default.Aug 20 2021, 3:01 AM
hyeongyukim edited the summary of this revision. (Show Details)
aqjune updated this revision to Diff 368033.Aug 22 2021, 10:38 PM

Rename disable-noundef-args to disable-noundef-analysis

aqjune retitled this revision from [Clang/Test]: Enable enable_noundef_analysis by default to [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default.
aqjune edited the summary of this revision. (Show Details)

Hello all,

I'm a messenger of @hyeongyukim's two patches that allow clang to turn on noundef analysis by default.
The noundef analysis flag was added by D81678 in the past. Its goal is to mark arguments and return values in C/C++ as noundef if legal.
Attaching noundef is beneficial because it allows quite a few optimizations that are unsound w.r.t. undef or poison (they are usually guarded with isGuaranteedNotToBeUndefOrPoison check).
Since the fact that arg/ret values are noundef is derived from the source language (C/C++)'s specification, the information is permanently lost unless explicitly attached by clang.

Previously, the flag was not activated by default because it required a lot of tests to be updated, possibly raising conflicts with downstream patches (discussed in this thread: D82317)
To handle the conflict, I'd like to update requested tests to simply adding -disable-noundef-analysis at // RUN: %clang_cc1 ... rather than fixing the whole text.
I'll talk about this more in the second patch (D108453).

kda added a subscriber: kda.Sep 1 2021, 3:14 PM
nlopes added a comment.Sep 7 2021, 2:37 AM

(repost my message from llvm-dev)

I can add one thought regarding why this direction makes sense and why it doesn't constrain optimizations.

Traditionally we don't want to mark too many things as UB as it restricts code movement and thus limits optimizations. That's why we have poison for e.g. signed arithmetic overflow rather than using UB as allowed by the C standard.
For function calls, however, optimizers are already super constrained: in general we cannot move them around because they may not return, they may throw an exception, they may modify memory, do a syscall, and so on. So adding another restriction to function calls isn't a big deal; optimizers don't touch them that much anyway.

This proposal adds more UB, as it turns undef/poison into UB on function calls, but that doesn't limit optimizations. So it seems like a good tradeoff: we learn some values can't be undef/poison "for free". Plus that UB can be dropped if needed; dropping noundef is legal! So there are really no downsides.
That's why I believe this is a good direction for clang.

From the users perspective, hopefully the sanitizers can already flag related issues so hopefully this won't lead to hard-to-debug UB bugs.

I did some experiments to confirm the benefits of adding a noundef attribute to the function parameter. (test result link)

One of the most significant advantages of this patch is that it can improve performance by removing unnecessary Freeze.
To check whether the performance is improved, first, I checked how much performance regression occurred on SPECrate2017 when we fixed the miscompilation problem of LoopUnswitch through the Freeze (D106041.)
Then, I applied this patch to see if the unnecessary Freeze was removed and performance was improved.
From the experimental results, the following facts were found.

  • Fixing only LoopUnswitch(D106041) reduces runtime performance and increases object size.
  • Freeze added that fixing the LoopUnswitch interferes with other optimizations, increasing the number of instructions. (There is still a problem with LoopUnswitch due to these performance problems)
  • Applying this patch improves runtime and makes the object size almost the same as the LLVM trunk.
  • Knowing that Function arguments are not noundef/poison, some freezes were removed, and performance and binary size have been improved.

The second sheet compares the performance of the LLVM trunk and this patch, but there is no noticeable change.
In the last sheet, about 2,000 LLVM test-suites were compiled, and the object size was compared. In most cases, the object size was improved.

I think many advantages can be obtained by applying this patch.

@hyeongyukim, thank you for the summary. This looks like a great change, and a net positive to me. The test churn in the other patch is unfortunate, but IMHO unavoidable.

If no one has any further objections,
LGTM

aqjune added a comment.Oct 7 2021, 6:16 PM

Thank you!

I'll send a mail to cfe-dev + llvm-dev to notify the change (and gather any possible concern if exists).

If there is no objection in a week, let's land these patches.

Thank you!
In the meantime, I will rebase this patch and resolve conflicts.

There has been no concern in a week, so I'd like to land this patch and D108453 in this weekend.
I'll carefully watch the buildbots and address failures if any.
@eugenis could you accept this patch and D108453 please, if you don't mind?

eugenis accepted this revision.Oct 15 2021, 9:54 AM
This revision is now accepted and ready to land.Oct 15 2021, 9:54 AM

I've bisected a crash in generated code down to this commit. The code that crashes is clean when run in ubsan. The misbehaviour happens across 4 tested architectures (i686, x86_64, armv7, aarch64).

The misbehaviour occurs in this preprocessed source, https://martin.st/temp/seek-preproc.c, compiled with clang -target x86_64-linux-gnu -c -O3 seek-preproc.c.

The whole failing testcase can be reproduced (on e.g. linux) with these steps:

git clone git://source.ffmpeg.org/ffmpeg
cd ffmpeg
./configure --cc=clang
make -j4 fate-seek-acodec-flac

(The failing object file is libavformat/seek.o.)

I can see that @ff_seek_frame_binary is the only affected function.
It introduces llvm.assume as well as !nonnull at a few places and folds null pointer checks.
Still investigating..

It seems the original code has a use of an uninitialized variable.
Line 4420 at seek-preproc.c (function ff_seek_frame_binary):

 int64_t pos_min=pos_min, pos_max=pos_max, pos, pos_limit; // pos_min and pos_max are self-assigned.
...
if (sti->index_entries) {
   ...
}
// pos_min and pos_max are used as arguments below
pos = ff_gen_search(s, stream_index, target_ts, pos_min, pos_max, pos_limit,
                     ts_min, ts_max, flags, &ts, avif->read_timestamp);

https://gist.github.com/aqjune/3bd0ea19bbc12b4744843c0c070e994c

If the branch is not taken, pos_min and pos_max are read while they are still uninitialized.

I guess the variables are self-assigned to avoid warnings?

It seems the original code has a use of an uninitialized variable.
Line 4420 at seek-preproc.c (function ff_seek_frame_binary):

 int64_t pos_min=pos_min, pos_max=pos_max, pos, pos_limit; // pos_min and pos_max are self-assigned.
...
if (sti->index_entries) {
   ...
}
// pos_min and pos_max are used as arguments below
pos = ff_gen_search(s, stream_index, target_ts, pos_min, pos_max, pos_limit,
                     ts_min, ts_max, flags, &ts, avif->read_timestamp);

https://gist.github.com/aqjune/3bd0ea19bbc12b4744843c0c070e994c

If the branch is not taken, pos_min and pos_max are read while they are still uninitialized.

I guess the variables are self-assigned to avoid warnings?

Yes, I believe so. If the branch is not taken, pos_min and pos_max are undefined when entering ff_gen_search. (I would assume that their value isn't ever used within ff_gen_search in that case.) But regardless of that, in this case, the generated code crashes around this line, https://gist.github.com/aqjune/3bd0ea19bbc12b4744843c0c070e994c#file-ff_seek_frame_binary-c-L39, before entering ff_gen_search - and within that branch, those variables are properly set before they're used.

Yes, I believe so. If the branch is not taken, pos_min and pos_max are undefined when entering ff_gen_search. (I would assume that their value isn't ever used within ff_gen_search in that case.) But regardless of that, in this case, the generated code crashes around this line, https://gist.github.com/aqjune/3bd0ea19bbc12b4744843c0c070e994c#file-ff_seek_frame_binary-c-L39, before entering ff_gen_search - and within that branch, those variables are properly set before they're used.

I see, the crash is happening at the line because SimplifyCFG removes the sti->index_entries null pointer check (which seems valid to me).
If stl->index_entries was null, it would lead to uses of uninitialized variables as function arguments, which is UB.
SimplifyCFG relies on the information and removes the stl->index_entries null check.

int64_t pos_min=pos_min, pos_max=pos_max, pos, pos_limit; // pos_min and pos_max are undef
...
if (sti->index_entries) {
   ... (dereference sti->index_entries+index)
   ... (initialize pos_min and pos_max)
}
// If sti->index_entries was NULL, UB must happen at the call below because undef is passed to ff_gen_search's noundef arg.
pos = ff_gen_search(s, stream_index, target_ts, pos_min, pos_max, pos_limit,
                     ts_min, ts_max, flags, &ts, avif->read_timestamp);

SimplifyCFG optimizes this to the code below:

int64_t pos_min=pos_min, pos_max=pos_max, pos, pos_limit;
...
if (true) { // Changed to true!
   ... (dereference sti->index_entries+index) // This can crash.
   ... (initialize pos_min and pos_max)
}

pos = ff_gen_search(s, stream_index, target_ts, pos_min, pos_max, pos_limit,
                     ts_min, ts_max, flags, &ts, avif->read_timestamp);

I think the solution is to initialize pos_min and pos_max to some value.
If sti->index_entries is null, they are never used inside ff_gen_search anyway, so initializing it into any value (e.g. 0) will work.

I see, the crash is happening at the line because SimplifyCFG removes the sti->index_entries null pointer check (which seems valid to me).
If stl->index_entries was null, it would lead to uses of uninitialized variables as function arguments, which is UB.
SimplifyCFG relies on the information and removes the stl->index_entries null check.

Thanks! That explains it. Although the actual crash is due to yet another removed condition:

int64_t pos_min=pos_min, pos_max=pos_max, pos, pos_limit; // pos_min and pos_max are undef
...
if (sti->index_entries) {
   index = av_index_search_timestamp();
   if (index >= 0) {
       ... (dereference sti->index_entries+index)
       ... (initialize pos_min and pos_max)
   }
}
// If sti->index_entries was NULL, UB must happen at the call below because undef is passed to ff_gen_search's noundef arg.
pos = ff_gen_search(s, stream_index, target_ts, pos_min, pos_max, pos_limit,
                     ts_min, ts_max, flags, &ts, avif->read_timestamp);

Both of the nested if conditions are removed:

int64_t pos_min=pos_min, pos_max=pos_max, pos, pos_limit;
...
if (true) { // Changed to true!
   index = av_index_search_timestamp();
   if (true) { // Also changed to true
       ... (dereference sti->index_entries+index) // This can crash with index = -1
       ... (initialize pos_min and pos_max)
   }
}
pos = ff_gen_search(s, stream_index, target_ts, pos_min, pos_max, pos_limit,
                     ts_min, ts_max, flags, &ts, avif->read_timestamp);

I think the solution is to initialize pos_min and pos_max to some value.
If sti->index_entries is null, they are never used inside ff_gen_search anyway, so initializing it into any value (e.g. 0) will work.

Yes, any value should be fine (and I guess 0 is the easiest to optimize for the compiler). Just as background - this is in a codebase that really tries to avoid default variable initializations unless they're proven to be necessary. But here they clearly are due to the UB rules of the language.

This change is causing a lot of failures in the address sanitiser tests on the 2-stage AArch64 buildbots. For example: https://lab.llvm.org/buildbot/#/builders/179/builds/1326

I can reproduce the failures on another AArch64 machine, they only happen with a 2-stage build, the first stage tests pass fine. I've verified that reverting this (and the related test patches) does make the failures go away.

Is there enough detail in the buildbot logs for you to investigate this, or is there any extra stuff (logs, intermediate files) I can send you to help? Would it be OK to revert this change until these failures are fixed?

This change caused libclc to fail to build on old ubuntu (using llvm-spir 10.0.0-1)
See:
https://bugs.llvm.org/show_bug.cgi?id=52200

aqjune added a comment.EditedOct 18 2021, 7:02 AM

I will revert this patch, since its fix needs some time for me to investigate.
I have a colleague who has access to an AArch64 server, so I can give it a try by myself.

I checked the reason for failure in address sanitizer tests on the 2-stage aarch64 buildbots.
The buildbot failure was occured because the internal_clone function of the compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp file is being compiled incorrectly.
The internal_clone function is a simple function that calls the clone system call of Linux. Its original return value should be the PID of the newly created process, but the actual returned value is 220 (which is the __NR_clone value.)

The aarch64 assembly changed by this patch is as follows.

// before
84: d2801b88  mov x8, #0xdc                   // #0xdc(220): system call number of clone
88: d4000001  svc #0x0                        // system call
...
a4: a9434ff4  ldp x20, x19, [sp, #48]
a8: a94257f6  ldp x22, x21, [sp, #32]
ac: a9415ff8  ldp x24, x23, [sp, #16]
b0: a8c467fe  ldp x30, x25, [sp], #64
b4: d65f03c0  ret

=========================
// after
88: d2801b88  mov x8, #0xdc                   // #0xdc(220): system call number of clone
8c: d4000001  svc #0x0                        // system call
...
a8: a9434ff4  ldp x20, x19, [sp, #48]
ac: aa0803e0  mov x0, x8                      // return value(x0) was overwritten by 0xdc(220)
b0: a94257f6  ldp x22, x21, [sp, #32]
b4: a9415ff8  ldp x24, x23, [sp, #16]
b8: a8c467fe  ldp x30, x25, [sp], #64
bc: d65f03c0  ret

Does anyone know why the internal_clone function of aarch64 is affected by this patch?

aqjune added a comment.Nov 1 2021, 8:53 PM

I am not familiar with inline assembly, but it seems the output variable (%0) is not updated because it does not appear in the code.

1382   __asm__ __volatile__(
1383                        "mov x0,x2\n" /* flags  */
1384                        "mov x2,x4\n" /* ptid  */
1385                        "mov x3,x5\n" /* tls  */
1386                        "mov x4,x6\n" /* ctid  */
1387                        "mov x8,%9\n" /* clone  */
1388 
1389                        "svc 0x0\n"
1390 
1391                        /* if (%r0 != 0)
1392                         *   return %r0;
1393                         */
1394                        "cmp x0, #0\n"
1395                        "bne 1f\n"
1396 
1397                        /* In the child, now. Call "fn(arg)". */
1398                        "ldp x1, x0, [sp], #16\n"
1399                        "blr x1\n"
1400 
1401                        /* Call _exit(%r0).  */
1402                        "mov x8, %10\n"
1403                        "svc 0x0\n"
1404                      "1:\n"
1405 
1406                        : "=r" (res)
1407                        : "i"(-EINVAL),
1408                          "r"(__fn), "r"(__stack), "r"(__flags), "r"(__arg),
1409                          "r"(__ptid), "r"(__tls), "r"(__ctid),
1410                          "i"(__NR_clone), "i"(__NR_exit)
1411                        : "x30", "memory");

Should %0 be updated?

You are absolutely right. X86 variant uses an "=a" constraint (rax register), others pin the output variable to a specific register with asm declaration. It appears we've missed it in Aarch64.

Could you check if asm("x0") in the declaration of res helps?

aqjune added a comment.Nov 3 2021, 1:40 AM

You are absolutely right. X86 variant uses an "=a" constraint (rax register), others pin the output variable to a specific register with asm declaration. It appears we've missed it in Aarch64.

Could you check if asm("x0") in the declaration of res helps?

Thank you, It worked!
Only a couple of failures in ASAN left, and we are investigating them.

diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
index ea3e5bdbc754..826c6d36e1b1 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
@@ -1360,7 +1360,7 @@ uptr internal_clone(int (*fn)(void *), void *child_stack, int flags, void *arg,
 #elif defined(__aarch64__)
 uptr internal_clone(int (*fn)(void *), void *child_stack, int flags, void *arg,
                     int *parent_tidptr, void *newtls, int *child_tidptr) {
-  long long res;
+  register long long res __asm__("x0");
   if (!fn || !child_stack)
     return -EINVAL;
   CHECK_EQ(0, (uptr)child_stack % 16);

After modifying internal_clone like this, the problem disappeared.
Is it okay to commit this change by myself?

-  long long res;
+  register long long res __asm__("x0");

Is it okay to commit this change by myself?

Yes, go ahead!

Either this or D108453 (which were committed together!) caused this assert according to my git-bisect: https://godbolt.org/z/4rqYKfW7K

NOTE that this fails in a lit-test for me, clang CodeGen/ifunc.c (though my downstream seems to run the verifier even with -emit-llvm, so you might need to just swap it to an -emit-obj to get this to repro).

If you cannot fix this quickly, let me know and I can revert it.

IR for this looks like:

[ekeane1@scsel-clx-24 llvm]$ /localdisk2/ekeane1/workspaces/xmain-web/builds/xmainefi2linux_debug/llvm/bin/clang -cc1 -internal-isystem /localdisk2/ekeane1/workspaces/xmain-web/builds/xmainefi2linux_debug/llvm/lib/clang/14.0.0/include -nostdsysteminc -triple i386-unknown-linux-gnu -emit-llvm -o - /localdisk2/ekeane1/workspaces/xmain-web/llvm/clang/test/CodeGen/ifunc.c -disable-llvm-passes
; ModuleID = '/localdisk2/ekeane1/workspaces/xmain-web/llvm/clang/test/CodeGen/ifunc.c'
source_filename = "/localdisk2/ekeane1/workspaces/xmain-web/llvm/clang/test/CodeGen/ifunc.c"
target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i386-unknown-linux-gnu"

@global = global i32 0, align 4

@foo = ifunc i32 (i32), i32 (i32)* ()* @foo_ifunc
@goo = ifunc void (), bitcast (i8* ()* @goo_ifunc to void ()* ()*)

; Function Attrs: noinline nounwind optnone
define internal i32 (i32)* @foo_ifunc() #0 {
entry:
  %0 = load i32, i32* @global, align 4
  %tobool = icmp ne i32 %0, 0
  %1 = zext i1 %tobool to i64
  %cond = select i1 %tobool, i32 (i32)* @f1, i32 (i32)* @f2
  ret i32 (i32)* %cond
}

; Function Attrs: noinline nounwind optnone
define dso_local i32 @bar() #0 {
entry:
  %call = call i32 @foo(i32 noundef 1)
  ret i32 %call
}

; Function Attrs: noinline nounwind optnone
define dso_local void @bar2() #0 {
entry:
  call void @goo()
  ret void
}

; Function Attrs: noinline nounwind optnone
define dso_local i8* @goo_ifunc() #0 {
entry:
  ret i8* null
}

; Function Attrs: noinline nounwind optnone
define internal i32 @f1(i32 noundef %i) #0 {
entry:
  %i.addr = alloca i32, align 4
  store i32 %i, i32* %i.addr, align 4
  %0 = load i32, i32* %i.addr, align 4
  %add = add nsw i32 %0, 1
  ret i32 %add
}

; Function Attrs: noinline nounwind optnone
define internal i32 @f2(i32 noundef %i) #0 {
entry:
  %i.addr = alloca i32, align 4
  store i32 %i, i32* %i.addr, align 4
  %0 = load i32, i32* %i.addr, align 4
  %add = add nsw i32 %0, 2
  ret i32 %add
}

attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+x87" }

!llvm.module.flags = !{!0, !1}
!llvm.ident = !{!2}

!0 = !{i32 1, !"NumRegisterParameters", i32 0}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{!"Intel(R) oneAPI DPC++/C++ Compiler 2022.1.0 (2022.x.0.YYYYMMDD)"}

Either this or D108453 (which were committed together!) caused this assert according to my git-bisect: https://godbolt.org/z/4rqYKfW7K

NOTE that this fails in a lit-test for me, clang CodeGen/ifunc.c (though my downstream seems to run the verifier even with -emit-llvm, so you might need to just swap it to an -emit-obj to get this to repro).

If you cannot fix this quickly, let me know and I can revert it.

IR for this looks like:

[ekeane1@scsel-clx-24 llvm]$ /localdisk2/ekeane1/workspaces/xmain-web/builds/xmainefi2linux_debug/llvm/bin/clang -cc1 -internal-isystem /localdisk2/ekeane1/workspaces/xmain-web/builds/xmainefi2linux_debug/llvm/lib/clang/14.0.0/include -nostdsysteminc -triple i386-unknown-linux-gnu -emit-llvm -o - /localdisk2/ekeane1/workspaces/xmain-web/llvm/clang/test/CodeGen/ifunc.c -disable-llvm-passes
; ModuleID = '/localdisk2/ekeane1/workspaces/xmain-web/llvm/clang/test/CodeGen/ifunc.c'
source_filename = "/localdisk2/ekeane1/workspaces/xmain-web/llvm/clang/test/CodeGen/ifunc.c"
target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i386-unknown-linux-gnu"

@global = global i32 0, align 4

@foo = ifunc i32 (i32), i32 (i32)* ()* @foo_ifunc
@goo = ifunc void (), bitcast (i8* ()* @goo_ifunc to void ()* ()*)

; Function Attrs: noinline nounwind optnone
define internal i32 (i32)* @foo_ifunc() #0 {
entry:
  %0 = load i32, i32* @global, align 4
  %tobool = icmp ne i32 %0, 0
  %1 = zext i1 %tobool to i64
  %cond = select i1 %tobool, i32 (i32)* @f1, i32 (i32)* @f2
  ret i32 (i32)* %cond
}

; Function Attrs: noinline nounwind optnone
define dso_local i32 @bar() #0 {
entry:
  %call = call i32 @foo(i32 noundef 1)
  ret i32 %call
}

; Function Attrs: noinline nounwind optnone
define dso_local void @bar2() #0 {
entry:
  call void @goo()
  ret void
}

; Function Attrs: noinline nounwind optnone
define dso_local i8* @goo_ifunc() #0 {
entry:
  ret i8* null
}

; Function Attrs: noinline nounwind optnone
define internal i32 @f1(i32 noundef %i) #0 {
entry:
  %i.addr = alloca i32, align 4
  store i32 %i, i32* %i.addr, align 4
  %0 = load i32, i32* %i.addr, align 4
  %add = add nsw i32 %0, 1
  ret i32 %add
}

; Function Attrs: noinline nounwind optnone
define internal i32 @f2(i32 noundef %i) #0 {
entry:
  %i.addr = alloca i32, align 4
  store i32 %i, i32* %i.addr, align 4
  %0 = load i32, i32* %i.addr, align 4
  %add = add nsw i32 %0, 2
  ret i32 %add
}

attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+x87" }

!llvm.module.flags = !{!0, !1}
!llvm.ident = !{!2}

!0 = !{i32 1, !"NumRegisterParameters", i32 0}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{!"Intel(R) oneAPI DPC++/C++ Compiler 2022.1.0 (2022.x.0.YYYYMMDD)"}

Hmm.. I'll revert it.

Hmm. I wonder if that's related to the problem uncovered by the verifier in https://reviews.llvm.org/D113352.

Prior to the latest revert (fd9b099906c61e46574d1ea2d99b973321fe1d21), the Linux kernel's binary verifier (objtool) points out an issue when building with ThinLTO and -fsanitize=integer-divide-by-zero. I have no idea if this is an issue with the tool or this series. A simplified reproducer:

$ cat ravb_main.i
int ravb_set_gti_ndev_rate;
unsigned int ravb_set_gti_ndev_inc;
void ravb_set_gti_ndev() {
  ravb_set_gti_ndev_inc = 1000000000;
  ravb_set_gti_ndev_inc = ravb_set_gti_ndev_inc / ravb_set_gti_ndev_rate;
  if (ravb_set_gti_ndev_inc)
    _dev_err(ravb_set_gti_ndev_inc);
}

$ clang -std=gnu89 -O2 -flto=thin -fsanitize=integer-divide-by-zero -c -o ravb_main.o ravb_main.i

$ llvm-ar cDPrsT ravb.o ravb_main.o

$ ld.lld -m elf_x86_64 -r -o ravb.lto.o --whole-archive ravb.o

$ ./objtool orc generate --no-fp --no-unreachable --retpoline --uaccess --mcount --module ravb.lto.o
ravb.lto.o: warning: objtool: .text.ravb_set_gti_ndev: unexpected end of section

With LLVM 13.0.0, there is no warning with those commands. The original and reduced .i file, interestingness test, and static objtool binary are available here.

Either this or D108453 (which were committed together!) caused this assert according to my git-bisect: https://godbolt.org/z/4rqYKfW7K

NOTE that this fails in a lit-test for me, clang CodeGen/ifunc.c (though my downstream seems to run the verifier even with -emit-llvm, so you might need to just swap it to an -emit-obj to get this to repro).

The lit-test failure of CodeGen/ifunc.c was not directly related to this patch.
emitIFuncDefinition was creating an incorrect function attribute.
It added the noundef attribute to the function even though there are no parameters (foo_ifunc function of ifunc.c), and it was fixed a few days ago.

The patch that solved this problem is D113352.

The emitIFuncDefinition fucntion incorrectly passes the GlobalDecl of the IFunc itself to the call to GetOrCreateLLVMFunction for creating the resolver, which causes it to be created with a wrong attribute list, which fails Verifier::visitFunction with "Attribute after last parameter!". You'll note that unlike the relationship between aliases and their aliasees, the resolver and the ifunc have different types - the resolver takes no parameters.

Either this or D108453 (which were committed together!) caused this assert according to my git-bisect: https://godbolt.org/z/4rqYKfW7K

NOTE that this fails in a lit-test for me, clang CodeGen/ifunc.c (though my downstream seems to run the verifier even with -emit-llvm, so you might need to just swap it to an -emit-obj to get this to repro).

The lit-test failure of CodeGen/ifunc.c was not directly related to this patch.
emitIFuncDefinition was creating an incorrect function attribute.
It added the noundef attribute to the function even though there are no parameters (foo_ifunc function of ifunc.c), and it was fixed a few days ago.

The patch that solved this problem is D113352.

The emitIFuncDefinition fucntion incorrectly passes the GlobalDecl of the IFunc itself to the call to GetOrCreateLLVMFunction for creating the resolver, which causes it to be created with a wrong attribute list, which fails Verifier::visitFunction with "Attribute after last parameter!". You'll note that unlike the relationship between aliases and their aliasees, the resolver and the ifunc have different types - the resolver takes no parameters.

Thanks for the update! Sorry for the late response, i was on vacation the last two weeks.

Prior to the latest revert (fd9b099906c61e46574d1ea2d99b973321fe1d21), the Linux kernel's binary verifier (objtool) points out an issue when building with ThinLTO and -fsanitize=integer-divide-by-zero. I have no idea if this is an issue with the tool or this series. A simplified reproducer:

$ cat ravb_main.i
int ravb_set_gti_ndev_rate;
unsigned int ravb_set_gti_ndev_inc;
void ravb_set_gti_ndev() {
  ravb_set_gti_ndev_inc = 1000000000;
  ravb_set_gti_ndev_inc = ravb_set_gti_ndev_inc / ravb_set_gti_ndev_rate;
  if (ravb_set_gti_ndev_inc)
    _dev_err(ravb_set_gti_ndev_inc);
}

$ clang -std=gnu89 -O2 -flto=thin -fsanitize=integer-divide-by-zero -c -o ravb_main.o ravb_main.i

$ llvm-ar cDPrsT ravb.o ravb_main.o

$ ld.lld -m elf_x86_64 -r -o ravb.lto.o --whole-archive ravb.o

$ ./objtool orc generate --no-fp --no-unreachable --retpoline --uaccess --mcount --module ravb.lto.o
ravb.lto.o: warning: objtool: .text.ravb_set_gti_ndev: unexpected end of section

With LLVM 13.0.0, there is no warning with those commands. The original and reduced .i file, interestingness test, and static objtool binary are available here.

I checked the reason why Objtool makes a warning.

cont.thread:                                      ; preds = %entry
  tail call void @__ubsan_handle_divrem_overflow(i8* bitcast ({ { [12 x i8]*, i32, i32 }, { i16, i16, [15 x i8] }* }* @1 to i8*), i64 1000000000, i64 0) #3, !nosanitize !8
  br label %if.then

cont:                                             ; preds = %entry
  %div = udiv i32 1000000000, %0
  store i32 %div, i32* @ravb_set_gti_ndev_inc, align 4, !tbaa !4
  %tobool.not = icmp ugt i32 %0, 1000000000
  br i1 %tobool.not, label %if.end, label %if.then

if.then:                                          ; preds = %cont.thread, %cont
  %div3 = phi i32 [ poison, %cont.thread ], [ %div, %cont ]
  %call = tail call i32 (i32, ...) bitcast (i32 (...)* @_dev_err to i32 (i32, ...)*)(i32 noundef %div3) #3
  br label %if.end

This IR code is an IR that has not passed the optimization pass completely.
This code calculates the division only if ravb_set_gti_ndev_rate is non-zero and it calls ubsan_handle_divrem_overflow function to handle UB if ravb_set_gti_ndev_rate is zero.
So far, there is no warning. But a warning occurs when this code passes the SimpleCFG optimization.

cont.thread:                                      ; preds = %entry
  tail call void @__ubsan_handle_divrem_overflow(i8* bitcast ({ { [12 x i8]*, i32, i32 }, { i16, i16, [15 x i8] }* }* @1 to i8*), i64 1000000000, i64 0) #3, !nosanitize !8
  unreachable

cont:                                             ; preds = %entry
  %div = udiv i32 1000000000, %0
  store i32 %div, i32* @ravb_set_gti_ndev_inc, align 4, !tbaa !4
  %tobool.not = icmp ugt i32 %0, 1000000000
  br i1 %tobool.not, label %if.end, label %if.then

if.then:                                          ; preds = %cont
  %call = tail call i32 (i32, ...) bitcast (i32 (...)* @_dev_err to i32 (i32, ...)*)(i32 noundef %div) #3
  br label %if.end

if.end:                                           ; preds = %if.then, %cont
  ret void
}

After it passes the SimplyCFG, the br instruction was changed to the unreachable instruction in cont.thread block.

This patch added noundef to the parameter of the _dev_err function, making the %div3 unable to be Poison.
It is impossible to jump from the cont.thread block to if.then block, so br instruction was changed to unreachable instruction.
It would be nice to remove the unreachable block, but the above IR is not wrong because it is UB when ravb_set_gti_ndev_rate is zero.

There seems to be no existing problem in clang, and I think we can bypass this warning by adding a code that checks whether the gravb_set_gti_ndev_rate is zero or not as follows.

int ravb_set_gti_ndev_rate;
unsigned int ravb_set_gti_ndev_inc;
void ravb_set_gti_ndev() {
  ravb_set_gti_ndev_inc = 1000000000;
  ravb_set_gti_ndev_inc = ravb_set_gti_ndev_inc / ravb_set_gti_ndev_rate;
  if (ravb_set_gti_ndev_rate != 0)
    if (ravb_set_gti_ndev_inc)
      _dev_err(ravb_set_gti_ndev_inc);
}

@nathanchance How about changing the existing test code as above?

@hyeongyukim I am currently offline for the evening but it seems like my reduction might have been too aggressive. It looks like this code comes from ravb_set_gti() (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/renesas/ravb_main.c?h=v5.16-rc7#n2480), which checks that rate is not zero before using it as a divisor. I will see if I can get a reproducer without any undefined behavior such as this.

@hyeongyukim I am currently offline for the evening but it seems like my reduction might have been too aggressive. It looks like this code comes from ravb_set_gti() (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/renesas/ravb_main.c?h=v5.16-rc7#n2480), which checks that rate is not zero before using it as a divisor. I will see if I can get a reproducer without any undefined behavior such as this.

Okay! Thank you for your response.

@hyeongyukim I hand reduced a couple of the translation units that I see issues in so apologies if they are a little verbose.

The full set of warnings:

drivers/net/ethernet/renesas/ravb.lto.o: warning: objtool: .text.ravb_set_gti: unexpected end of section
drivers/net/ethernet/wiznet/w5100.lto.o: warning: objtool: .text.w5100_tx_skb: unexpected end of section
drivers/net/ethernet/wiznet/w5100.lto.o: warning: objtool: .text.w5100_readbuf: unexpected end of section
drivers/video/fbdev/intelfb/intelfb.lto.o: warning: objtool: .text.intelfbhw_validate_mode: unexpected end of section

which individually can be viewed at the following links:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/renesas/ravb_main.c?h=v5.16-rc7#n2480
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/wiznet/w5100.c?h=v5.16-rc7#n798
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/wiznet/w5100.c?h=v5.16-rc7#n513
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/video/fbdev/intelfb/intelfbhw.c?h=v5.16-rc7#n313

I have a reproducer for the first two, as that is all I had time for; if you would like them for the other two, I can get those for you tomorrow.

https://github.com/nathanchance/creduce-files/tree/3699d2b725a305c0f02a7672447b30578115ba51/D105169-ravb_main

$ clang -O2 -flto=thin -fsanitize=integer-divide-by-zero -c -o ravb_main.o ravb_main.i
ravb_main.i:68:5: warning: expression result unused [-Wunused-value]
    __rem;
    ^~~~~
1 warning generated.

$ ld.lld -m elf_x86_64 -r -o ravb.lto.o --whole-archive ravb_main.o

$ ./objtool orc generate --module --no-fp --no-unreachable --uaccess --mcount ravb.lto.o
ravb.lto.o: warning: objtool: .text.ravb_set_gti: unexpected end of section

This one can also be reproduced without LTO:

$ clang -O2 -fsanitize=integer-divide-by-zero -c -o ravb_main.o ravb_main.i

$ ./objtool orc generate --module --no-fp --no-unreachable --uaccess --mcount ravb_main.o
ravb_main.o: warning: objtool: .text: unexpected end of section

As far as I can tell, that function already has a check to avoid dividing by zero.

https://github.com/nathanchance/creduce-files/tree/3699d2b725a305c0f02a7672447b30578115ba51/D105169-w5100

$ clang -O2 -flto=thin -fsanitize=integer-divide-by-zero -c -o w5100.o w5100.i

$ ld.lld -m elf_x86_64 -r -o w5100.lto.o --whole-archive w5100.o

$ w5100.lto.o: warning: objtool: .text.w5100_tx_skb: unexpected end of section

Without LTO, another warning is emitted, with a similar root cause I believe:

$ clang -O2 -fsanitize=integer-divide-by-zero -c -o w5100.o w5100.i

$ ./objtool orc generate --module --no-fp --no-unreachable --uaccess --mcount w5100.o
w5100.o: warning: objtool: w5100_tx_skb() falls through to next function w5100_probe()

In theory, w5100_writebuf() should probably have a check for s0_tx_buf_size being zero but the input is completely controlled by the kernel, as can be seen in w5100_probe() (s0_tx_buf_size cannot be zero). Most if not all kernel maintainers are not going to accept patches adding checks that are superfluous.

Great. I'll check it out.

I have a reproducer for the first two, as that is all I had time for; if you would like them for the other two, I can get those for you tomorrow.

@nathanchance I think the other two can be reproduced without difficulty. If the reproduction fails, I will request it :)

@nathanchance

I tried to reproduce the last warning (intelfbhw_validate_mode), but I failed to produce it.
I think my reproducer is correct, but it does not make any warning.
Can you tell me which part was wrong?

clang -O2 -flto=thin -fsanitize=integer-divide-by-zero -c -o intelfb.o intelfb.i
ld.lld -m elf_x86_64 -r -o intelfb.lto.o --whole-archive intelfb.o
objtool orc generate --module --no-fp --no-unreachable --uaccess --mcount intelfb.lto.o

I use these commands, and I attached the intelfb.i file.

@nathanchance

I tried to reproduce the last warning (intelfbhw_validate_mode), but I failed to produce it.
I think my reproducer is correct, but it does not make any warning.
Can you tell me which part was wrong?

clang -O2 -flto=thin -fsanitize=integer-divide-by-zero -c -o intelfb.o intelfb.i
ld.lld -m elf_x86_64 -r -o intelfb.lto.o --whole-archive intelfb.o
objtool orc generate --module --no-fp --no-unreachable --uaccess --mcount intelfb.lto.o

I use these commands, and I attached the intelfb.i file.

It looks like this particular case also needs -fsanitize-coverage=trace-pc (which comes from CONFIG_KCOV). Once I add that with your reduced reproducer, I see the initial warning.

$ clang -O2 -flto=thin -fsanitize=integer-divide-by-zero -fsanitize-coverage=trace-pc -c -o intelfb.{o,i}

$ ld.lld -m elf_x86_64 -r -o intelfb.lto.o --whole-archive intelfb.o

$ ./objtool orc generate --module --no-fp --no-unreachable --uaccess --mcount intelfb.lto.o
intelfb.lto.o: warning: objtool: .text.intelfbhw_validate_mode: unexpected end of section

@nathanchance
You were right. I succeeded in reproducing the warning with -fsanitize-coverage=trace-pc
However, this problem seems to be fixed in the latest version. (I used 57a551a)

@nathanchance Hi, I analyzed all four warnings.

Warning #1: can be handled by x86-backend. filled issue #53118
Warning #2: bug in the kernel, fixed in the next version.
Warning #3: same reason with #2
Warning #4: It was not reproduced in the latest clang.

As you see, one of this patch's advantages is that it exposed some bugs in the kernel!
All issues are either identified or will be fixed soon. Is it okay to recommit this patch?

@nathanchance Hi, I analyzed all four warnings.

Thanks a lot for doing this!

Warning #1: can be handled by x86-backend. filled issue #53118

This is the warning in drivers/net/ethernet/wiznet/w5100.lto.o if I am reading the issue, right? I would have thought that would be warnings #2 and #3.

Warning #2: bug in the kernel, fixed in the next version.
Warning #3: same reason with #2

I think this refers to drivers/net/ethernet/renesas/ravb.lto.o, which is indeed fixed by https://git.kernel.org/linus/d9f31aeaa1e5aefa68130878af3c3513d41c1e2d.

Warning #4: It was not reproduced in the latest clang.

I can still see this with clang @1441ffe6a6da90e9c4800914eb0005b1087aa5d2 and Linux @ 112450df61b7373529b0fe4c122ad13b89d80a8a built with KCFLAGS="-Xclang -enable-noundef-analysis". Your small reproducer has it as well:

$ clang --version | head -1
ClangBuiltLinux clang version 14.0.0 (https://github.com/llvm/llvm-project 1441ffe6a6da90e9c4800914eb0005b1087aa5d2)

$ clang -O2 -flto=thin -fsanitize=integer-divide-by-zero -fsanitize-coverage=trace-pc -Xclang -enable-noundef-analysis -c -o intelfb.{o,i}

$ ld.lld -m elf_x86_64 -r -o intelfb.lto.o --whole-archive intelfb.o

$ ./objtool orc generate --module --no-fp --no-unreachable --uaccess --mcount intelfb.lto.o
intelfb.lto.o: warning: objtool: .text.intelfbhw_validate_mode: unexpected end of section

As you see, one of this patch's advantages is that it exposed some bugs in the kernel!
All issues are either identified or will be fixed soon. Is it okay to recommit this patch?

I just went to go hammer on this option to how many warnings I see on the next version of Linux ("linux-next") [1] with allmodconfig (which in turn enables CONFIG_UBSAN) + CONFIG_LTO_CLANG_THIN=y . I think I see a lot more warnings there versus Linus' tree as the kernel no longer uses -fsanitize=object-size there, which potentially hides some of these?

$ echo "CONFIG_GCOV_KERNEL=n
CONFIG_KASAN=n
CONFIG_LTO_CLANG_THIN=y
CONFIG_WERROR=n" >allmod.config

$ make -skj"$(nproc)" KCFLAGS="-Xclang -enable-noundef-analysis" KCONFIG_ALLCONFIG=1 LLVM=1 allmodconfig all
...
drivers/video/fbdev/omap2/omapfb/dss/omapdss.lto.o: warning: objtool: .text.dsi_cio_timings: unexpected end of section
drivers/soc/qcom/qcom_rpmh.lto.o: warning: objtool: .text.rpmh_rsc_write_ctrl_data: unexpected end of section
net/tipc/tipc.lto.o: warning: objtool: .text.tipc_nl_compat_link_stat_dump: unexpected end of section
drivers/scsi/mpi3mr/mpi3mr.lto.o: warning: objtool: .text.mpi3mr_op_request_post: unexpected end of section
drivers/md/bcache/bcache.lto.o: warning: objtool: .text.bch_cache_show: unexpected end of section
drivers/scsi/dc395x.lto.o: warning: objtool: .text.msgin_phase0: unexpected end of section
...
fs/ocfs2/cluster/ocfs2_nodemanager.lto.o: warning: objtool: .text.o2hb_setup_one_bio: unexpected end of section
drivers/video/fbdev/intelfb/intelfb.lto.o: warning: objtool: .text.intelfbhw_validate_mode: unexpected end of section
drivers/net/ethernet/wiznet/w5100.lto.o: warning: objtool: .text.w5100_tx_skb: unexpected end of section
drivers/net/ethernet/wiznet/w5100.lto.o: warning: objtool: .text.w5100_rx_skb: unexpected end of section
drivers/gpu/drm/amd/amdgpu/amdgpu.lto.o: warning: objtool: .text.dqm_debugfs_hqds: unexpected end of section
...

Without KCFLAGS="-Xclang -enable-noundef-analysis", I only see the mpi3mr_op_request_post warning, so that is probably unrelated to this (probably some recent change regressed this, I'll see if I can hunt that down).

Without ThinLTO, I see:

$ echo CONFIG_WERROR=n >allmod.config

$ make -skj"$(nproc)" KCFLAGS="-Xclang -enable-noundef-analysis" KCONFIG_ALLCONFIG=1 LLVM=1 allmodconfig all
...
drivers/soc/qcom/rpmh-rsc.o: warning: objtool: rpmh_rsc_write_ctrl_data() falls through to next function trace_raw_output_rpmh_tx_done()
drivers/gpu/drm/radeon/sumo_dpm.o: warning: objtool: sumo_dpm_set_power_state() falls through to next function sumo_dpm_post_set_power_state()
...
drivers/scsi/mpi3mr/mpi3mr_fw.o: warning: objtool: mpi3mr_op_request_post() falls through to next function mpi3mr_check_rh_fault_ioc()
drivers/net/ethernet/wiznet/w5100.o: warning: objtool: w5100_tx_skb() falls through to next function w5100_get_drvinfo()
drivers/net/ethernet/wiznet/w5100.o: warning: objtool: w5100_rx_skb() falls through to next function w5100_mmio_probe()
...

Some of the same warnings, which likely have the same root cause as above but no LTO might make this easier to look into. Without KCFLAGS="-Xclang -enable-noundef-analysis", same as above, I only see the mpi3mr_op_request_post warning.

I can reduce all of these down for you and/or I can start an email thread with the objtool maintainers to see if there is a way to fix or avoid these warnings on the objtool side and include you in that discussion, if LLVM is not really doing anything wrong here. I am by no means an expert in this area and I don't want to delay this anymore but I want to avoid regressing our builds, as objtool regularly helps us spot compiler bugs. Perhaps @nickdesaulniers has some other thoughts?

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/

I can reduce all of these down for you and/or I can start an email thread with the objtool maintainers to see if there is a way to fix or avoid these warnings on the objtool side and include you in that discussion, if LLVM is not really doing anything wrong here. I am by no means an expert in this area and I don't want to delay this anymore but I want to avoid regressing our builds, as objtool regularly helps us spot compiler bugs. Perhaps @nickdesaulniers has some other thoughts?

LLVM is not doing anything wrong here. The issue is that once we have UB, LLVM is not required to lay out the functions nicely. For example, issue #53118 is just a "nice to fix", it's not a bug.
On the other hand, I understand that fixing objtool is likely impossible if we consider this UB thing as assembly doesn't have enough information to distinguish between a compiler bug and a UB case. I just don't know how frequent and/or useful these warnings are.

I don't think we should hold this patch anymore as it's not wrong and the side-effects are understood. We can and should try to reduce those kernel warnings to zero, but we cannot put all that burden on this patch's author.

I can reduce all of these down for you and/or I can start an email thread with the objtool maintainers to see if there is a way to fix or avoid these warnings on the objtool side and include you in that discussion, if LLVM is not really doing anything wrong here. I am by no means an expert in this area and I don't want to delay this anymore but I want to avoid regressing our builds, as objtool regularly helps us spot compiler bugs. Perhaps @nickdesaulniers has some other thoughts?

LLVM is not doing anything wrong here. The issue is that once we have UB, LLVM is not required to lay out the functions nicely. For example, issue #53118 is just a "nice to fix", it's not a bug.
On the other hand, I understand that fixing objtool is likely impossible if we consider this UB thing as assembly doesn't have enough information to distinguish between a compiler bug and a UB case. I just don't know how frequent and/or useful these warnings are.

I don't think we should hold this patch anymore as it's not wrong and the side-effects are understood. We can and should try to reduce those kernel warnings to zero, but we cannot put all that burden on this patch's author.

Fair enough. We will just try to deal with this change on the kernel side in one way or another.

This appears to have triggered some buildbot failures:

https://lab.llvm.org/buildbot/#/builders/5/builds/17461/steps/10/logs/stdio

e.g. in path-mappings.test:

"message": "Unknown argument: '-enable-noundef-analysis'",

This flag isn't being added by clangd code, I'm not sure where it's coming from... maybe somewhere in driver?

Ah, this seems to be coming from the buildbot configuration.

https://github.com/llvm/llvm-zorg/blob/38ab06456f7302b4eab53e5b6f1e2eaf4f127132/zorg/buildbot/builders/sanitizers/buildbot_functions.sh#L140-L149

@vitalybuka looks like sanitizer-buildbot7 might need a config update for this change?