Page MenuHomePhabricator

[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
ClosedPublic

Authored by hyeongyukim on Jun 29 2021, 9:37 PM.

Details

Summary

Turning on enable_noundef_analysis flag allows better codegen by removing freeze instructions.
I modified clang by renaming enable_noundef_analysis flag to disable-noundef-analysis and turning it off by default.

Test updates are made as a separate patch: D108453

Diff Detail

Event Timeline

hyeongyukim created this revision.Jun 29 2021, 9:37 PM
hyeongyukim requested review of this revision.Jun 29 2021, 9:37 PM
hyeongyukim edited the summary of this revision. (Show Details)

A large amount of testing will be affected, and I will update it soon.

Changed disable_noundef_args flag to enable_noundef_args to enable emitting noundef attributes on IR call arguments and return values by default.
Changed test codes will be attached at another PR.

hyeongyukim retitled this revision from [Clang/Test]: Enable enable_noundef_analysis as default to [Clang/Test]: Enable enable_noundef_analysis by default.Aug 20 2021, 3:01 AM
hyeongyukim edited the summary of this revision. (Show Details)
aqjune updated this revision to Diff 368033.Aug 22 2021, 10:38 PM

Rename disable-noundef-args to disable-noundef-analysis

aqjune retitled this revision from [Clang/Test]: Enable enable_noundef_analysis by default to [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default.
aqjune edited the summary of this revision. (Show Details)

Hello all,

I'm a messenger of @hyeongyukim's two patches that allow clang to turn on noundef analysis by default.
The noundef analysis flag was added by D81678 in the past. Its goal is to mark arguments and return values in C/C++ as noundef if legal.
Attaching noundef is beneficial because it allows quite a few optimizations that are unsound w.r.t. undef or poison (they are usually guarded with isGuaranteedNotToBeUndefOrPoison check).
Since the fact that arg/ret values are noundef is derived from the source language (C/C++)'s specification, the information is permanently lost unless explicitly attached by clang.

Previously, the flag was not activated by default because it required a lot of tests to be updated, possibly raising conflicts with downstream patches (discussed in this thread: D82317)
To handle the conflict, I'd like to update requested tests to simply adding -disable-noundef-analysis at // RUN: %clang_cc1 ... rather than fixing the whole text.
I'll talk about this more in the second patch (D108453).

kda added a subscriber: kda.Sep 1 2021, 3:14 PM
nlopes added a comment.Sep 7 2021, 2:37 AM

(repost my message from llvm-dev)

I can add one thought regarding why this direction makes sense and why it doesn't constrain optimizations.

Traditionally we don't want to mark too many things as UB as it restricts code movement and thus limits optimizations. That's why we have poison for e.g. signed arithmetic overflow rather than using UB as allowed by the C standard.
For function calls, however, optimizers are already super constrained: in general we cannot move them around because they may not return, they may throw an exception, they may modify memory, do a syscall, and so on. So adding another restriction to function calls isn't a big deal; optimizers don't touch them that much anyway.

This proposal adds more UB, as it turns undef/poison into UB on function calls, but that doesn't limit optimizations. So it seems like a good tradeoff: we learn some values can't be undef/poison "for free". Plus that UB can be dropped if needed; dropping noundef is legal! So there are really no downsides.
That's why I believe this is a good direction for clang.

From the users perspective, hopefully the sanitizers can already flag related issues so hopefully this won't lead to hard-to-debug UB bugs.

I did some experiments to confirm the benefits of adding a noundef attribute to the function parameter. (test result link)

One of the most significant advantages of this patch is that it can improve performance by removing unnecessary Freeze.
To check whether the performance is improved, first, I checked how much performance regression occurred on SPECrate2017 when we fixed the miscompilation problem of LoopUnswitch through the Freeze (D106041.)
Then, I applied this patch to see if the unnecessary Freeze was removed and performance was improved.
From the experimental results, the following facts were found.

  • Fixing only LoopUnswitch(D106041) reduces runtime performance and increases object size.
  • Freeze added that fixing the LoopUnswitch interferes with other optimizations, increasing the number of instructions. (There is still a problem with LoopUnswitch due to these performance problems)
  • Applying this patch improves runtime and makes the object size almost the same as the LLVM trunk.
  • Knowing that Function arguments are not noundef/poison, some freezes were removed, and performance and binary size have been improved.

The second sheet compares the performance of the LLVM trunk and this patch, but there is no noticeable change.
In the last sheet, about 2,000 LLVM test-suites were compiled, and the object size was compared. In most cases, the object size was improved.

I think many advantages can be obtained by applying this patch.

@hyeongyukim, thank you for the summary. This looks like a great change, and a net positive to me. The test churn in the other patch is unfortunate, but IMHO unavoidable.

If no one has any further objections,
LGTM

aqjune added a comment.Oct 7 2021, 6:16 PM

Thank you!

I'll send a mail to cfe-dev + llvm-dev to notify the change (and gather any possible concern if exists).

If there is no objection in a week, let's land these patches.

Thank you!
In the meantime, I will rebase this patch and resolve conflicts.

There has been no concern in a week, so I'd like to land this patch and D108453 in this weekend.
I'll carefully watch the buildbots and address failures if any.
@eugenis could you accept this patch and D108453 please, if you don't mind?

eugenis accepted this revision.Oct 15 2021, 9:54 AM
This revision is now accepted and ready to land.Oct 15 2021, 9:54 AM

I've bisected a crash in generated code down to this commit. The code that crashes is clean when run in ubsan. The misbehaviour happens across 4 tested architectures (i686, x86_64, armv7, aarch64).

The misbehaviour occurs in this preprocessed source, https://martin.st/temp/seek-preproc.c, compiled with clang -target x86_64-linux-gnu -c -O3 seek-preproc.c.

The whole failing testcase can be reproduced (on e.g. linux) with these steps:

git clone git://source.ffmpeg.org/ffmpeg
cd ffmpeg
./configure --cc=clang
make -j4 fate-seek-acodec-flac

(The failing object file is libavformat/seek.o.)

I can see that @ff_seek_frame_binary is the only affected function.
It introduces llvm.assume as well as !nonnull at a few places and folds null pointer checks.
Still investigating..

It seems the original code has a use of an uninitialized variable.
Line 4420 at seek-preproc.c (function ff_seek_frame_binary):

 int64_t pos_min=pos_min, pos_max=pos_max, pos, pos_limit; // pos_min and pos_max are self-assigned.
...
if (sti->index_entries) {
   ...
}
// pos_min and pos_max are used as arguments below
pos = ff_gen_search(s, stream_index, target_ts, pos_min, pos_max, pos_limit,
                     ts_min, ts_max, flags, &ts, avif->read_timestamp);

https://gist.github.com/aqjune/3bd0ea19bbc12b4744843c0c070e994c

If the branch is not taken, pos_min and pos_max are read while they are still uninitialized.

I guess the variables are self-assigned to avoid warnings?

It seems the original code has a use of an uninitialized variable.
Line 4420 at seek-preproc.c (function ff_seek_frame_binary):

 int64_t pos_min=pos_min, pos_max=pos_max, pos, pos_limit; // pos_min and pos_max are self-assigned.
...
if (sti->index_entries) {
   ...
}
// pos_min and pos_max are used as arguments below
pos = ff_gen_search(s, stream_index, target_ts, pos_min, pos_max, pos_limit,
                     ts_min, ts_max, flags, &ts, avif->read_timestamp);

https://gist.github.com/aqjune/3bd0ea19bbc12b4744843c0c070e994c

If the branch is not taken, pos_min and pos_max are read while they are still uninitialized.

I guess the variables are self-assigned to avoid warnings?

Yes, I believe so. If the branch is not taken, pos_min and pos_max are undefined when entering ff_gen_search. (I would assume that their value isn't ever used within ff_gen_search in that case.) But regardless of that, in this case, the generated code crashes around this line, https://gist.github.com/aqjune/3bd0ea19bbc12b4744843c0c070e994c#file-ff_seek_frame_binary-c-L39, before entering ff_gen_search - and within that branch, those variables are properly set before they're used.

Yes, I believe so. If the branch is not taken, pos_min and pos_max are undefined when entering ff_gen_search. (I would assume that their value isn't ever used within ff_gen_search in that case.) But regardless of that, in this case, the generated code crashes around this line, https://gist.github.com/aqjune/3bd0ea19bbc12b4744843c0c070e994c#file-ff_seek_frame_binary-c-L39, before entering ff_gen_search - and within that branch, those variables are properly set before they're used.

I see, the crash is happening at the line because SimplifyCFG removes the sti->index_entries null pointer check (which seems valid to me).
If stl->index_entries was null, it would lead to uses of uninitialized variables as function arguments, which is UB.
SimplifyCFG relies on the information and removes the stl->index_entries null check.

int64_t pos_min=pos_min, pos_max=pos_max, pos, pos_limit; // pos_min and pos_max are undef
...
if (sti->index_entries) {
   ... (dereference sti->index_entries+index)
   ... (initialize pos_min and pos_max)
}
// If sti->index_entries was NULL, UB must happen at the call below because undef is passed to ff_gen_search's noundef arg.
pos = ff_gen_search(s, stream_index, target_ts, pos_min, pos_max, pos_limit,
                     ts_min, ts_max, flags, &ts, avif->read_timestamp);

SimplifyCFG optimizes this to the code below:

int64_t pos_min=pos_min, pos_max=pos_max, pos, pos_limit;
...
if (true) { // Changed to true!
   ... (dereference sti->index_entries+index) // This can crash.
   ... (initialize pos_min and pos_max)
}

pos = ff_gen_search(s, stream_index, target_ts, pos_min, pos_max, pos_limit,
                     ts_min, ts_max, flags, &ts, avif->read_timestamp);

I think the solution is to initialize pos_min and pos_max to some value.
If sti->index_entries is null, they are never used inside ff_gen_search anyway, so initializing it into any value (e.g. 0) will work.

I see, the crash is happening at the line because SimplifyCFG removes the sti->index_entries null pointer check (which seems valid to me).
If stl->index_entries was null, it would lead to uses of uninitialized variables as function arguments, which is UB.
SimplifyCFG relies on the information and removes the stl->index_entries null check.

Thanks! That explains it. Although the actual crash is due to yet another removed condition:

int64_t pos_min=pos_min, pos_max=pos_max, pos, pos_limit; // pos_min and pos_max are undef
...
if (sti->index_entries) {
   index = av_index_search_timestamp();
   if (index >= 0) {
       ... (dereference sti->index_entries+index)
       ... (initialize pos_min and pos_max)
   }
}
// If sti->index_entries was NULL, UB must happen at the call below because undef is passed to ff_gen_search's noundef arg.
pos = ff_gen_search(s, stream_index, target_ts, pos_min, pos_max, pos_limit,
                     ts_min, ts_max, flags, &ts, avif->read_timestamp);

Both of the nested if conditions are removed:

int64_t pos_min=pos_min, pos_max=pos_max, pos, pos_limit;
...
if (true) { // Changed to true!
   index = av_index_search_timestamp();
   if (true) { // Also changed to true
       ... (dereference sti->index_entries+index) // This can crash with index = -1
       ... (initialize pos_min and pos_max)
   }
}
pos = ff_gen_search(s, stream_index, target_ts, pos_min, pos_max, pos_limit,
                     ts_min, ts_max, flags, &ts, avif->read_timestamp);

I think the solution is to initialize pos_min and pos_max to some value.
If sti->index_entries is null, they are never used inside ff_gen_search anyway, so initializing it into any value (e.g. 0) will work.

Yes, any value should be fine (and I guess 0 is the easiest to optimize for the compiler). Just as background - this is in a codebase that really tries to avoid default variable initializations unless they're proven to be necessary. But here they clearly are due to the UB rules of the language.

This change is causing a lot of failures in the address sanitiser tests on the 2-stage AArch64 buildbots. For example: https://lab.llvm.org/buildbot/#/builders/179/builds/1326

I can reproduce the failures on another AArch64 machine, they only happen with a 2-stage build, the first stage tests pass fine. I've verified that reverting this (and the related test patches) does make the failures go away.

Is there enough detail in the buildbot logs for you to investigate this, or is there any extra stuff (logs, intermediate files) I can send you to help? Would it be OK to revert this change until these failures are fixed?

This change caused libclc to fail to build on old ubuntu (using llvm-spir 10.0.0-1)
See:
https://bugs.llvm.org/show_bug.cgi?id=52200

aqjune added a comment.EditedOct 18 2021, 7:02 AM

I will revert this patch, since its fix needs some time for me to investigate.
I have a colleague who has access to an AArch64 server, so I can give it a try by myself.

I checked the reason for failure in address sanitizer tests on the 2-stage aarch64 buildbots.
The buildbot failure was occured because the internal_clone function of the compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp file is being compiled incorrectly.
The internal_clone function is a simple function that calls the clone system call of Linux. Its original return value should be the PID of the newly created process, but the actual returned value is 220 (which is the __NR_clone value.)

The aarch64 assembly changed by this patch is as follows.

// before
84: d2801b88  mov x8, #0xdc                   // #0xdc(220): system call number of clone
88: d4000001  svc #0x0                        // system call
...
a4: a9434ff4  ldp x20, x19, [sp, #48]
a8: a94257f6  ldp x22, x21, [sp, #32]
ac: a9415ff8  ldp x24, x23, [sp, #16]
b0: a8c467fe  ldp x30, x25, [sp], #64
b4: d65f03c0  ret

=========================
// after
88: d2801b88  mov x8, #0xdc                   // #0xdc(220): system call number of clone
8c: d4000001  svc #0x0                        // system call
...
a8: a9434ff4  ldp x20, x19, [sp, #48]
ac: aa0803e0  mov x0, x8                      // return value(x0) was overwritten by 0xdc(220)
b0: a94257f6  ldp x22, x21, [sp, #32]
b4: a9415ff8  ldp x24, x23, [sp, #16]
b8: a8c467fe  ldp x30, x25, [sp], #64
bc: d65f03c0  ret

Does anyone know why the internal_clone function of aarch64 is affected by this patch?

aqjune added a comment.Mon, Nov 1, 8:53 PM

I am not familiar with inline assembly, but it seems the output variable (%0) is not updated because it does not appear in the code.

1382   __asm__ __volatile__(
1383                        "mov x0,x2\n" /* flags  */
1384                        "mov x2,x4\n" /* ptid  */
1385                        "mov x3,x5\n" /* tls  */
1386                        "mov x4,x6\n" /* ctid  */
1387                        "mov x8,%9\n" /* clone  */
1388 
1389                        "svc 0x0\n"
1390 
1391                        /* if (%r0 != 0)
1392                         *   return %r0;
1393                         */
1394                        "cmp x0, #0\n"
1395                        "bne 1f\n"
1396 
1397                        /* In the child, now. Call "fn(arg)". */
1398                        "ldp x1, x0, [sp], #16\n"
1399                        "blr x1\n"
1400 
1401                        /* Call _exit(%r0).  */
1402                        "mov x8, %10\n"
1403                        "svc 0x0\n"
1404                      "1:\n"
1405 
1406                        : "=r" (res)
1407                        : "i"(-EINVAL),
1408                          "r"(__fn), "r"(__stack), "r"(__flags), "r"(__arg),
1409                          "r"(__ptid), "r"(__tls), "r"(__ctid),
1410                          "i"(__NR_clone), "i"(__NR_exit)
1411                        : "x30", "memory");

Should %0 be updated?

You are absolutely right. X86 variant uses an "=a" constraint (rax register), others pin the output variable to a specific register with asm declaration. It appears we've missed it in Aarch64.

Could you check if asm("x0") in the declaration of res helps?

aqjune added a comment.Wed, Nov 3, 1:40 AM

You are absolutely right. X86 variant uses an "=a" constraint (rax register), others pin the output variable to a specific register with asm declaration. It appears we've missed it in Aarch64.

Could you check if asm("x0") in the declaration of res helps?

Thank you, It worked!
Only a couple of failures in ASAN left, and we are investigating them.

diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
index ea3e5bdbc754..826c6d36e1b1 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_linux.cpp
@@ -1360,7 +1360,7 @@ uptr internal_clone(int (*fn)(void *), void *child_stack, int flags, void *arg,
 #elif defined(__aarch64__)
 uptr internal_clone(int (*fn)(void *), void *child_stack, int flags, void *arg,
                     int *parent_tidptr, void *newtls, int *child_tidptr) {
-  long long res;
+  register long long res __asm__("x0");
   if (!fn || !child_stack)
     return -EINVAL;
   CHECK_EQ(0, (uptr)child_stack % 16);

After modifying internal_clone like this, the problem disappeared.
Is it okay to commit this change by myself?

-  long long res;
+  register long long res __asm__("x0");

Is it okay to commit this change by myself?

Yes, go ahead!

Either this or D108453 (which were committed together!) caused this assert according to my git-bisect: https://godbolt.org/z/4rqYKfW7K

NOTE that this fails in a lit-test for me, clang CodeGen/ifunc.c (though my downstream seems to run the verifier even with -emit-llvm, so you might need to just swap it to an -emit-obj to get this to repro).

If you cannot fix this quickly, let me know and I can revert it.

IR for this looks like:

[ekeane1@scsel-clx-24 llvm]$ /localdisk2/ekeane1/workspaces/xmain-web/builds/xmainefi2linux_debug/llvm/bin/clang -cc1 -internal-isystem /localdisk2/ekeane1/workspaces/xmain-web/builds/xmainefi2linux_debug/llvm/lib/clang/14.0.0/include -nostdsysteminc -triple i386-unknown-linux-gnu -emit-llvm -o - /localdisk2/ekeane1/workspaces/xmain-web/llvm/clang/test/CodeGen/ifunc.c -disable-llvm-passes
; ModuleID = '/localdisk2/ekeane1/workspaces/xmain-web/llvm/clang/test/CodeGen/ifunc.c'
source_filename = "/localdisk2/ekeane1/workspaces/xmain-web/llvm/clang/test/CodeGen/ifunc.c"
target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i386-unknown-linux-gnu"

@global = global i32 0, align 4

@foo = ifunc i32 (i32), i32 (i32)* ()* @foo_ifunc
@goo = ifunc void (), bitcast (i8* ()* @goo_ifunc to void ()* ()*)

; Function Attrs: noinline nounwind optnone
define internal i32 (i32)* @foo_ifunc() #0 {
entry:
  %0 = load i32, i32* @global, align 4
  %tobool = icmp ne i32 %0, 0
  %1 = zext i1 %tobool to i64
  %cond = select i1 %tobool, i32 (i32)* @f1, i32 (i32)* @f2
  ret i32 (i32)* %cond
}

; Function Attrs: noinline nounwind optnone
define dso_local i32 @bar() #0 {
entry:
  %call = call i32 @foo(i32 noundef 1)
  ret i32 %call
}

; Function Attrs: noinline nounwind optnone
define dso_local void @bar2() #0 {
entry:
  call void @goo()
  ret void
}

; Function Attrs: noinline nounwind optnone
define dso_local i8* @goo_ifunc() #0 {
entry:
  ret i8* null
}

; Function Attrs: noinline nounwind optnone
define internal i32 @f1(i32 noundef %i) #0 {
entry:
  %i.addr = alloca i32, align 4
  store i32 %i, i32* %i.addr, align 4
  %0 = load i32, i32* %i.addr, align 4
  %add = add nsw i32 %0, 1
  ret i32 %add
}

; Function Attrs: noinline nounwind optnone
define internal i32 @f2(i32 noundef %i) #0 {
entry:
  %i.addr = alloca i32, align 4
  store i32 %i, i32* %i.addr, align 4
  %0 = load i32, i32* %i.addr, align 4
  %add = add nsw i32 %0, 2
  ret i32 %add
}

attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+x87" }

!llvm.module.flags = !{!0, !1}
!llvm.ident = !{!2}

!0 = !{i32 1, !"NumRegisterParameters", i32 0}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{!"Intel(R) oneAPI DPC++/C++ Compiler 2022.1.0 (2022.x.0.YYYYMMDD)"}

Either this or D108453 (which were committed together!) caused this assert according to my git-bisect: https://godbolt.org/z/4rqYKfW7K

NOTE that this fails in a lit-test for me, clang CodeGen/ifunc.c (though my downstream seems to run the verifier even with -emit-llvm, so you might need to just swap it to an -emit-obj to get this to repro).

If you cannot fix this quickly, let me know and I can revert it.

IR for this looks like:

[ekeane1@scsel-clx-24 llvm]$ /localdisk2/ekeane1/workspaces/xmain-web/builds/xmainefi2linux_debug/llvm/bin/clang -cc1 -internal-isystem /localdisk2/ekeane1/workspaces/xmain-web/builds/xmainefi2linux_debug/llvm/lib/clang/14.0.0/include -nostdsysteminc -triple i386-unknown-linux-gnu -emit-llvm -o - /localdisk2/ekeane1/workspaces/xmain-web/llvm/clang/test/CodeGen/ifunc.c -disable-llvm-passes
; ModuleID = '/localdisk2/ekeane1/workspaces/xmain-web/llvm/clang/test/CodeGen/ifunc.c'
source_filename = "/localdisk2/ekeane1/workspaces/xmain-web/llvm/clang/test/CodeGen/ifunc.c"
target datalayout = "e-m:e-p:32:32-p270:32:32-p271:32:32-p272:64:64-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i386-unknown-linux-gnu"

@global = global i32 0, align 4

@foo = ifunc i32 (i32), i32 (i32)* ()* @foo_ifunc
@goo = ifunc void (), bitcast (i8* ()* @goo_ifunc to void ()* ()*)

; Function Attrs: noinline nounwind optnone
define internal i32 (i32)* @foo_ifunc() #0 {
entry:
  %0 = load i32, i32* @global, align 4
  %tobool = icmp ne i32 %0, 0
  %1 = zext i1 %tobool to i64
  %cond = select i1 %tobool, i32 (i32)* @f1, i32 (i32)* @f2
  ret i32 (i32)* %cond
}

; Function Attrs: noinline nounwind optnone
define dso_local i32 @bar() #0 {
entry:
  %call = call i32 @foo(i32 noundef 1)
  ret i32 %call
}

; Function Attrs: noinline nounwind optnone
define dso_local void @bar2() #0 {
entry:
  call void @goo()
  ret void
}

; Function Attrs: noinline nounwind optnone
define dso_local i8* @goo_ifunc() #0 {
entry:
  ret i8* null
}

; Function Attrs: noinline nounwind optnone
define internal i32 @f1(i32 noundef %i) #0 {
entry:
  %i.addr = alloca i32, align 4
  store i32 %i, i32* %i.addr, align 4
  %0 = load i32, i32* %i.addr, align 4
  %add = add nsw i32 %0, 1
  ret i32 %add
}

; Function Attrs: noinline nounwind optnone
define internal i32 @f2(i32 noundef %i) #0 {
entry:
  %i.addr = alloca i32, align 4
  store i32 %i, i32* %i.addr, align 4
  %0 = load i32, i32* %i.addr, align 4
  %add = add nsw i32 %0, 2
  ret i32 %add
}

attributes #0 = { noinline nounwind optnone "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+cx8,+x87" }

!llvm.module.flags = !{!0, !1}
!llvm.ident = !{!2}

!0 = !{i32 1, !"NumRegisterParameters", i32 0}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{!"Intel(R) oneAPI DPC++/C++ Compiler 2022.1.0 (2022.x.0.YYYYMMDD)"}

Hmm.. I'll revert it.

Hmm. I wonder if that's related to the problem uncovered by the verifier in https://reviews.llvm.org/D113352.

Prior to the latest revert (fd9b099906c61e46574d1ea2d99b973321fe1d21), the Linux kernel's binary verifier (objtool) points out an issue when building with ThinLTO and -fsanitize=integer-divide-by-zero. I have no idea if this is an issue with the tool or this series. A simplified reproducer:

$ cat ravb_main.i
int ravb_set_gti_ndev_rate;
unsigned int ravb_set_gti_ndev_inc;
void ravb_set_gti_ndev() {
  ravb_set_gti_ndev_inc = 1000000000;
  ravb_set_gti_ndev_inc = ravb_set_gti_ndev_inc / ravb_set_gti_ndev_rate;
  if (ravb_set_gti_ndev_inc)
    _dev_err(ravb_set_gti_ndev_inc);
}

$ clang -std=gnu89 -O2 -flto=thin -fsanitize=integer-divide-by-zero -c -o ravb_main.o ravb_main.i

$ llvm-ar cDPrsT ravb.o ravb_main.o

$ ld.lld -m elf_x86_64 -r -o ravb.lto.o --whole-archive ravb.o

$ ./objtool orc generate --no-fp --no-unreachable --retpoline --uaccess --mcount --module ravb.lto.o
ravb.lto.o: warning: objtool: .text.ravb_set_gti_ndev: unexpected end of section

With LLVM 13.0.0, there is no warning with those commands. The original and reduced .i file, interestingness test, and static objtool binary are available here.

Either this or D108453 (which were committed together!) caused this assert according to my git-bisect: https://godbolt.org/z/4rqYKfW7K

NOTE that this fails in a lit-test for me, clang CodeGen/ifunc.c (though my downstream seems to run the verifier even with -emit-llvm, so you might need to just swap it to an -emit-obj to get this to repro).

The lit-test failure of CodeGen/ifunc.c was not directly related to this patch.
emitIFuncDefinition was creating an incorrect function attribute.
It added the noundef attribute to the function even though there are no parameters (foo_ifunc function of ifunc.c), and it was fixed a few days ago.

The patch that solved this problem is D113352.

The emitIFuncDefinition fucntion incorrectly passes the GlobalDecl of the IFunc itself to the call to GetOrCreateLLVMFunction for creating the resolver, which causes it to be created with a wrong attribute list, which fails Verifier::visitFunction with "Attribute after last parameter!". You'll note that unlike the relationship between aliases and their aliasees, the resolver and the ifunc have different types - the resolver takes no parameters.