This is an archive of the discontinued LLVM Phabricator instance.

[CVP] processSwitch: Remove default case when switch cover all possible values.
ClosedPublic

Authored by junparser on Jul 15 2021, 4:13 AM.

Details

Summary

Sometimes when switchInst lists all values of the condition, it is safe to set default destination as unreachable.
For now, we try to handle this in SimplifyCFG based on computeKnownBits which is not enough to handle some complex condition.

This patch first moves createUnreachableSwitchDefault to Local.cpp. then sets default case as unreachable in CVP pass as long as number of cases are equal to ConstantRange of the condition,

TestPlan: check-llvm

Diff Detail

Event Timeline

junparser created this revision.Jul 15 2021, 4:13 AM
junparser requested review of this revision.Jul 15 2021, 4:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 15 2021, 4:13 AM

Patch description missing - i'd expect to see legality reasoning there.

junparser edited the summary of this revision. (Show Details)Jul 15 2021, 5:53 PM

Patch description missing - i'd expect to see legality reasoning there.

@lebedev.ri summary added.

nikic added inline comments.Jul 16 2021, 10:02 AM
llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
391

What about the case where the default already has an unreachable terminator? Aren't you going to perform a no-op transform in that case?

394

This condition is wrong. Let's say your switch argument has range [0, 9] and your switch cases have range [10, 19]. The input range and the switch cases have the same size, but they don't have the same values and default destination is not dead.

junparser added inline comments.Jul 18 2021, 10:41 PM
llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
391

yep,will add later

394

you are right, i was too aggressive to relay on CVP to remove useless cases. I'll check the lower/upper bound of the cases.

Address the comments of @nikic

@eli.friedman would you help to review this patch? Thank you very much.

ha, @dmgreen, thanks!
@efriedma would you help to review this patch? Thank you very much.

spatel added inline comments.Aug 6 2021, 5:13 AM
llvm/test/Transforms/CorrelatedValuePropagation/basic.ll
380

It would help if you pre-commit this test with the baseline CHECK lines, so it is clear what changed with this patch.
I just tried this example locally without the patch, and it does not seem to be different?

spatel added a comment.Aug 6 2021, 5:15 AM

For now, we try to handle this in SimplifyCFG based on computeKnownBits which is not enough to handle some complex condition.

Can you show an example (add a reduced test) that this patch would handle, but SimplifyCFG would miss?

junparser updated this revision to Diff 364774.Aug 6 2021, 6:23 AM

@spatel, thanks for review the patch, and sorry for confusing you about the testcase.

It turns out that the last change did something wrong to get the range of switch cases.
This patch fixes it and now the testcase shows what it would handle.

spatel added a comment.Aug 6 2021, 8:15 AM

@spatel, thanks for review the patch, and sorry for confusing you about the testcase.

It turns out that the last change did something wrong to get the range of switch cases.
This patch fixes it and now the testcase shows what it would handle.

If I run the test with -simplifycfg, I see:

define i32 @switch_range(i32 %cond) {
entry:
  %s = urem i32 %cond, 2
  %s1 = add i32 %s, 1
  %s1.off = add i32 %s1, -1
  %switch = icmp ult i32 %s1.off, 2
  %. = select i1 %switch, i32 1, i32 0
  ret i32 %.
}

I think you need to expand the test with >2 valid cases to show the potential optimization/motivation better (and I still recommend that you pre-commit the baseline test at least locally, so we can see the diff in IR from this patch).

@spatel, thanks for review the patch, and sorry for confusing you about the testcase.

It turns out that the last change did something wrong to get the range of switch cases.
This patch fixes it and now the testcase shows what it would handle.

If I run the test with -simplifycfg, I see:

define i32 @switch_range(i32 %cond) {
entry:
  %s = urem i32 %cond, 2
  %s1 = add i32 %s, 1
  %s1.off = add i32 %s1, -1
  %switch = icmp ult i32 %s1.off, 2
  %. = select i1 %switch, i32 1, i32 0
  ret i32 %.
}

I think you need to expand the test with >2 valid cases to show the potential optimization/motivation better (and I still recommend that you pre-commit the baseline test at least locally, so we can see the diff in IR from this patch).

Thanks!I'll try it.

Address comment

  1. add more switch cases
  2. split baseline testcase into nfc patch

@spatel, thanks for review the patch, and sorry for confusing you about the testcase.

It turns out that the last change did something wrong to get the range of switch cases.
This patch fixes it and now the testcase shows what it would handle.

If I run the test with -simplifycfg, I see:

define i32 @switch_range(i32 %cond) {
entry:
  %s = urem i32 %cond, 2
  %s1 = add i32 %s, 1
  %s1.off = add i32 %s1, -1
  %switch = icmp ult i32 %s1.off, 2
  %. = select i1 %switch, i32 1, i32 0
  ret i32 %.
}

I think you need to expand the test with >2 valid cases to show the potential optimization/motivation better (and I still recommend that you pre-commit the baseline test at least locally, so we can see the diff in IR from this patch).

Although simplifycfg reduces the switchinst,it is suboptimization since icmp+select can be removed and return directly, that what this patch does.

And your suggestion is more senseable, Thanks very much!

junparser added inline comments.Aug 10 2021, 12:20 AM
llvm/test/Transforms/CorrelatedValuePropagation/basic.ll
380

with simplifycfg, switch_range outputs

define i32 @switch_range(i32 %cond) {
entry:
  %s = urem i32 %cond, 3
  %s1 = add i32 %s, 1
  switch i32 %s1, label %unreachable [
    i32 1, label %common.ret
    i32 2, label %exit2
    i32 3, label %common.ret
  ]

common.ret:                                       ; preds = %entry, %entry, %unreachable, %exit2
  %common.ret.op = phi i32 [ 2, %exit2 ], [ 0, %unreachable ], [ 1, %entry ], [ 1, %entry ]
  ret i32 %common.ret.op

exit2:                                            ; preds = %entry
  br label %common.ret

unreachable:                                      ; preds = %entry
  br label %common.ret
}

define i1 @arg_attribute(i8* nonnull %a) {
  %cmp = icmp eq i8* %a, null
  ret i1 %cmp
}

the default case doesnot change

@spatel, any other suggestion? Thanks.

ping, thanks

spatel added inline comments.Aug 16 2021, 8:26 AM
llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
397–399

This logic seems more confusing to me than something like this:

// If the numbered switch cases cover the entire range of the condition,
// then the default case is not reachable.  
if (SIRange.getSignedMin() == Low && SIRange.getSignedMax() == High &&
    SI->getNumCases() == High - Low + 1) {

We should have a negative test where High/Low match, but we don't have enough cases to cover all values. For example, remove case 2 in the test that is included currently.

llvm/test/Transforms/CorrelatedValuePropagation/basic.ll
382–383

Why do we create this dead block?

junparser added inline comments.Aug 16 2021, 7:15 PM
llvm/test/Transforms/CorrelatedValuePropagation/basic.ll
382–383

This is done by createUnreachableSwitchDefault which uses SplitBlockPredecessors+ SplitBlock.

junparser updated this revision to Diff 366791.Aug 16 2021, 7:50 PM

Address comments.

spatel accepted this revision.Aug 17 2021, 7:38 AM

LGTM

This revision is now accepted and ready to land.Aug 17 2021, 7:38 AM

@spatel, Thanks for the review!

This revision was landed with ongoing or failed builds.Aug 17 2021, 7:23 PM
This revision was automatically updated to reflect the committed changes.

I apologize that I do not have a ton of information (I am happy to provide whatever would be useful) but this patch causes an issue with booting an x86_64_defconfig Linux kernel build with ThinLTO (initially reported here: https://github.com/ClangBuiltLinux/linux/issues/1440).

$ make -skj"$(nproc)" LLVM=1 LLVM_IAS=1 distclean defconfig

$ scripts/config -e LTO_CLANG_THIN

$ make -skj"$(nproc)" LLVM=1 LLVM_IAS=1 olddefconfig bzImage

# from https://github.com/ClangBuiltLinux/boot-utils
$ ../boot-utils/boot-qemu.sh -a x86_64 -k . -t 30s
...
[    3.999233] jump_label: Fatal kernel bug, unexpected op at __mod_timer.llvm.2796501342197355523+0x37f/0x4d0 [ffffffff8a90587f] (eb 02 eb 29 41 != 66 90 0f 1f 00)) size:2 type:1
[    4.000759] ------------[ cut here ]------------
[    4.001232] kernel BUG at arch/x86/kernel/jump_label.c:73!
[    4.001746] invalid opcode: 0000 [#1] SMP NOPTI
[    4.002174] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.14.0-rc6+ #1
[    4.002227] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[    4.002227] Workqueue: events timer_update_keys
[    4.002227] RIP: 0010:__jump_label_patch+0x1ab/0x1c0
[    4.002227] Code: 5d 41 5e 41 5f 5d c3 48 c7 c7 2a 5f b0 8b 4c 89 fe 4c 89 fa 4c 89 f9 49 89 e8 45 89 e1 31 c0 41 56 e8 6c e8 0a 00 48 83 c4 08 <0f> 0b e8 be 54 c7 00 0f 0b 0f 0b 0f 0b 00 00 cc cc 00 00 cc cc 48
[    4.002227] RSP: 0018:ffff8e5440037d88 EFLAGS: 00010286
[    4.002227] RAX: 00000000000000a4 RBX: ffffffff8c20f964 RCX: 5fbdcdcfa2eb3900
[    4.002227] RDX: 0000000000000000 RSI: 0000000000000004 RDI: c0000000ffffbfff
[    4.002227] RBP: ffffffff8bbd5c11 R08: 0000000000000000 R09: ffff8c86de0a0000
[    4.002227] R10: 000000000000bffd R11: ffff8e5440037c40 R12: 0000000000000002
[    4.002227] R13: ffffffff8bbd5c11 R14: 0000000000000001 R15: ffffffff8a90587f
[    4.002227] FS:  0000000000000000(0000) GS:ffff8c86de200000(0000) knlGS:0000000000000000
[    4.002227] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.002227] CR2: 0000000000000000 CR3: 000000000e20c000 CR4: 0000000000350ff0
[    4.002227] Call Trace:
[    4.002227]  ? __mod_timer.llvm.2796501342197355523+0x37f/0x4d0
[    4.002227]  ? __mod_timer.llvm.2796501342197355523+0x38e/0x4d0
[    4.002227]  ? __mod_timer.llvm.2796501342197355523+0x381/0x4d0
[    4.002227]  ? arch_jump_label_transform_queue+0x26/0x60
[    4.002227]  ? __jump_label_update+0x9c/0x150
[    4.002227]  ? static_key_enable_cpuslocked+0x59/0x70
[    4.002227]  ? timer_update_keys+0x59/0x70
[    4.002227]  ? process_one_work+0x1cb/0x340
[    4.002227]  ? worker_thread+0x25d/0x490
[    4.002227]  ? kthread+0x1b6/0x1d0
[    4.002227]  ? worker_clr_flags+0x40/0x40
[    4.002227]  ? kthread_unuse_mm+0x80/0x80
[    4.002227]  ? ret_from_fork+0x22/0x30
[    4.002227] Modules linked in:
[    4.018292] ---[ end trace 15930e2291b4abfa ]---
[    4.018735] RIP: 0010:__jump_label_patch+0x1ab/0x1c0
[    4.019229] Code: 5d 41 5e 41 5f 5d c3 48 c7 c7 2a 5f b0 8b 4c 89 fe 4c 89 fa 4c 89 f9 49 89 e8 45 89 e1 31 c0 41 56 e8 6c e8 0a 00 48 83 c4 08 <0f> 0b e8 be 54 c7 00 0f 0b 0f 0b 0f 0b 00 00 cc cc 00 00 cc cc 48
[    4.021000] RSP: 0018:ffff8e5440037d88 EFLAGS: 00010286
[    4.021481] RAX: 00000000000000a4 RBX: ffffffff8c20f964 RCX: 5fbdcdcfa2eb3900
[    4.022178] RDX: 0000000000000000 RSI: 0000000000000004 RDI: c0000000ffffbfff
[    4.022878] RBP: ffffffff8bbd5c11 R08: 0000000000000000 R09: ffff8c86de0a0000
[    4.023536] R10: 000000000000bffd R11: ffff8e5440037c40 R12: 0000000000000002
[    4.024255] R13: ffffffff8bbd5c11 R14: 0000000000000001 R15: ffffffff8a90587f
[    4.024957] FS:  0000000000000000(0000) GS:ffff8c86de200000(0000) knlGS:0000000000000000
[    4.025709] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.026307] CR2: 0000000000000000 CR3: 000000000e20c000 CR4: 0000000000350ff0
...

I also see

[    0.108802] **********************************************************
[    0.109339] **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
[    0.109883] **                                                      **
[    0.110470] ** This system shows unhashed kernel memory addresses   **
[    0.111011] ** via the console, logs, and other interfaces. This    **
[    0.111568] ** might reduce the security of your system.            **
[    0.112145] **                                                      **
[    0.112710] ** If you see this message and you are not debugging    **
[    0.113281] ** the kernel, report this immediately to your system   **
[    0.113841] ** administrator!                                       **
[    0.114378] **                                                      **
[    0.114890] **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
[    0.115439] **********************************************************

in the boot logs but there are no command line or kernel changes that would cause this, leading me to believe there is something codegen-wise going wrong. I am more than happy to provide more information if need be.

The BUG() in question can be viewed on line 73 here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/jump_label.c#n45

fhahn added a comment.Aug 19 2021, 1:56 AM

There's also https://bugs.llvm.org/show_bug.cgi?id=51531 which was also tracked down to this patch. It might be a good starting point.

I apologize that I do not have a ton of information (I am happy to provide whatever would be useful) but this patch causes an issue with booting an x86_64_defconfig Linux kernel build with ThinLTO (initially reported here: https://github.com/ClangBuiltLinux/linux/issues/1440).

$ make -skj"$(nproc)" LLVM=1 LLVM_IAS=1 distclean defconfig

$ scripts/config -e LTO_CLANG_THIN

$ make -skj"$(nproc)" LLVM=1 LLVM_IAS=1 olddefconfig bzImage

# from https://github.com/ClangBuiltLinux/boot-utils
$ ../boot-utils/boot-qemu.sh -a x86_64 -k . -t 30s
...
[    3.999233] jump_label: Fatal kernel bug, unexpected op at __mod_timer.llvm.2796501342197355523+0x37f/0x4d0 [ffffffff8a90587f] (eb 02 eb 29 41 != 66 90 0f 1f 00)) size:2 type:1
[    4.000759] ------------[ cut here ]------------
[    4.001232] kernel BUG at arch/x86/kernel/jump_label.c:73!
[    4.001746] invalid opcode: 0000 [#1] SMP NOPTI
[    4.002174] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.14.0-rc6+ #1
[    4.002227] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[    4.002227] Workqueue: events timer_update_keys
[    4.002227] RIP: 0010:__jump_label_patch+0x1ab/0x1c0
[    4.002227] Code: 5d 41 5e 41 5f 5d c3 48 c7 c7 2a 5f b0 8b 4c 89 fe 4c 89 fa 4c 89 f9 49 89 e8 45 89 e1 31 c0 41 56 e8 6c e8 0a 00 48 83 c4 08 <0f> 0b e8 be 54 c7 00 0f 0b 0f 0b 0f 0b 00 00 cc cc 00 00 cc cc 48
[    4.002227] RSP: 0018:ffff8e5440037d88 EFLAGS: 00010286
[    4.002227] RAX: 00000000000000a4 RBX: ffffffff8c20f964 RCX: 5fbdcdcfa2eb3900
[    4.002227] RDX: 0000000000000000 RSI: 0000000000000004 RDI: c0000000ffffbfff
[    4.002227] RBP: ffffffff8bbd5c11 R08: 0000000000000000 R09: ffff8c86de0a0000
[    4.002227] R10: 000000000000bffd R11: ffff8e5440037c40 R12: 0000000000000002
[    4.002227] R13: ffffffff8bbd5c11 R14: 0000000000000001 R15: ffffffff8a90587f
[    4.002227] FS:  0000000000000000(0000) GS:ffff8c86de200000(0000) knlGS:0000000000000000
[    4.002227] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.002227] CR2: 0000000000000000 CR3: 000000000e20c000 CR4: 0000000000350ff0
[    4.002227] Call Trace:
[    4.002227]  ? __mod_timer.llvm.2796501342197355523+0x37f/0x4d0
[    4.002227]  ? __mod_timer.llvm.2796501342197355523+0x38e/0x4d0
[    4.002227]  ? __mod_timer.llvm.2796501342197355523+0x381/0x4d0
[    4.002227]  ? arch_jump_label_transform_queue+0x26/0x60
[    4.002227]  ? __jump_label_update+0x9c/0x150
[    4.002227]  ? static_key_enable_cpuslocked+0x59/0x70
[    4.002227]  ? timer_update_keys+0x59/0x70
[    4.002227]  ? process_one_work+0x1cb/0x340
[    4.002227]  ? worker_thread+0x25d/0x490
[    4.002227]  ? kthread+0x1b6/0x1d0
[    4.002227]  ? worker_clr_flags+0x40/0x40
[    4.002227]  ? kthread_unuse_mm+0x80/0x80
[    4.002227]  ? ret_from_fork+0x22/0x30
[    4.002227] Modules linked in:
[    4.018292] ---[ end trace 15930e2291b4abfa ]---
[    4.018735] RIP: 0010:__jump_label_patch+0x1ab/0x1c0
[    4.019229] Code: 5d 41 5e 41 5f 5d c3 48 c7 c7 2a 5f b0 8b 4c 89 fe 4c 89 fa 4c 89 f9 49 89 e8 45 89 e1 31 c0 41 56 e8 6c e8 0a 00 48 83 c4 08 <0f> 0b e8 be 54 c7 00 0f 0b 0f 0b 0f 0b 00 00 cc cc 00 00 cc cc 48
[    4.021000] RSP: 0018:ffff8e5440037d88 EFLAGS: 00010286
[    4.021481] RAX: 00000000000000a4 RBX: ffffffff8c20f964 RCX: 5fbdcdcfa2eb3900
[    4.022178] RDX: 0000000000000000 RSI: 0000000000000004 RDI: c0000000ffffbfff
[    4.022878] RBP: ffffffff8bbd5c11 R08: 0000000000000000 R09: ffff8c86de0a0000
[    4.023536] R10: 000000000000bffd R11: ffff8e5440037c40 R12: 0000000000000002
[    4.024255] R13: ffffffff8bbd5c11 R14: 0000000000000001 R15: ffffffff8a90587f
[    4.024957] FS:  0000000000000000(0000) GS:ffff8c86de200000(0000) knlGS:0000000000000000
[    4.025709] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.026307] CR2: 0000000000000000 CR3: 000000000e20c000 CR4: 0000000000350ff0
...

I also see

[    0.108802] **********************************************************
[    0.109339] **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
[    0.109883] **                                                      **
[    0.110470] ** This system shows unhashed kernel memory addresses   **
[    0.111011] ** via the console, logs, and other interfaces. This    **
[    0.111568] ** might reduce the security of your system.            **
[    0.112145] **                                                      **
[    0.112710] ** If you see this message and you are not debugging    **
[    0.113281] ** the kernel, report this immediately to your system   **
[    0.113841] ** administrator!                                       **
[    0.114378] **                                                      **
[    0.114890] **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
[    0.115439] **********************************************************

in the boot logs but there are no command line or kernel changes that would cause this, leading me to believe there is something codegen-wise going wrong. I am more than happy to provide more information if need be.

The BUG() in question can be viewed on line 73 here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/jump_label.c#n45

@samitolvanen, @nathanchance, Thanks for report this!

I have comfirmed the boot failure. And after some digging, it shows that LLVM trunk remove some default case in cfg80211_edmg_chandef_valid, which is correct tranformation. however, some operation in tools/objtool/check.c emit "vmlinux.o: warning: objtool: cfg80211_edmg_chandef_valid()+0x169: can't find jump dest instruction at .text.cfg80211_edmg_chandef_valid+0x17b", and the boot failed. When I ignore this function by hack code, there is no warning emit and boot is success. I also comment on https://github.com/ClangBuiltLinux/linux/issues/1440, FYI.

@samitolvanen, @nathanchance, I do believe this undefined behavior caused by libtool, it passed when i removed the default switch case in cfg80211_edmg_chandef_valid

There's also https://bugs.llvm.org/show_bug.cgi?id=51531 which was also tracked down to this patch. It might be a good starting point.

Thanks, I'll take a look at it.

spatel reopened this revision.Aug 19 2021, 5:49 AM

Reopening - I reverted the patch based on the examples in PR51531.
We need to check that the default block is not the same as one or more of the case destination blocks or something like that.

This revision is now accepted and ready to land.Aug 19 2021, 5:49 AM
spatel requested changes to this revision.Aug 19 2021, 5:56 AM
This revision now requires changes to proceed.Aug 19 2021, 5:56 AM

@junparser I am not convinced this is a bug in objtool. @spatel 's comment makes me wonder if one of the other switch cases is becoming just return false because LTO allows LLVM to see that one of the if (max_contiguous < ...) checks is always true then this transform removes both that case and the default case because they are equivalent, which seems to be inline with what objtool is warning about. I am more than happy to test follow up versions of this patch as I am subscribed, feel free to ping me as well.

junparser updated this revision to Diff 367747.Aug 20 2021, 2:47 AM

Using SplitEdge to deal with multiple uses of default dest.

@junparser I am not convinced this is a bug in objtool. @spatel 's comment makes me wonder if one of the other switch cases is becoming just return false because LTO allows LLVM to see that one of the if (max_contiguous < ...) checks is always true then this transform removes both that case and the default case because they are equivalent, which seems to be inline with what objtool is warning about. I am more than happy to test follow up versions of this patch as I am subscribed, feel free to ping me as well.

@spatel @nathanchance , we can fix PR51531 now and kernel boot still failed. However, When I change default: return false to default: unreachable(), the boot also failed with thinlto. @nathanchance, can we have something reduced for us to debug?

Please rebased past 5d4f37e895487408e61498ec42e545ec5f778b5b - does it help?

yes, this also works for PR51531 but not the kernel boot, @nathanchance, It seems there are someting wrong with other pass(maybe after CVP), Still it is better to get reduced case.

Please rebased past 5d4f37e895487408e61498ec42e545ec5f778b5b - does it help?

yes, this also works for PR51531 but not the kernel boot,

Thanks for checking.

@nathanchance, It seems there are someting wrong with other pass(maybe after CVP), Still it is better to get reduced case.

Yep, we need an actionable reproducer.

I'm not sure if things are updated properly here, so I pushed baseline tests:
00a50f261784

Please rebase and make sure the tests are changing as expected.

I will be offline until Sunday at the earliest, I can try to get a reproducer of the issue then. Not sure if @samitolvanen would be able to provide one sooner.

All I can go off of at this point is the objtool warning so here is a reduced C program that reproduces with this patch:

$ cat chan.i
enum { false, true };
enum {
  IEEE80211_EDMG_BW_CONFIG_4 = 4,
  IEEE80211_EDMG_BW_CONFIG_5,
  IEEE80211_EDMG_BW_CONFIG_6,
  IEEE80211_EDMG_BW_CONFIG_7,
  IEEE80211_EDMG_BW_CONFIG_8,
  IEEE80211_EDMG_BW_CONFIG_9,
  IEEE80211_EDMG_BW_CONFIG_10,
  IEEE80211_EDMG_BW_CONFIG_11,
  IEEE80211_EDMG_BW_CONFIG_12,
  IEEE80211_EDMG_BW_CONFIG_13
} cfg80211_edmg_chandef_valid_chandef_1;
cfg80211_edmg_chandef_valid_max_contiguous;
cfg80211_edmg_chandef_valid_num_of_enabled() {
  switch (cfg80211_edmg_chandef_valid_chandef_1)
  case IEEE80211_EDMG_BW_CONFIG_4:
  case IEEE80211_EDMG_BW_CONFIG_13:
    if (cfg80211_edmg_chandef_valid_max_contiguous)
      switch (cfg80211_edmg_chandef_valid_chandef_1) {
      case IEEE80211_EDMG_BW_CONFIG_4:
      case IEEE80211_EDMG_BW_CONFIG_5:
      case IEEE80211_EDMG_BW_CONFIG_6:
      case IEEE80211_EDMG_BW_CONFIG_7:
        break;
      case IEEE80211_EDMG_BW_CONFIG_8:
      case IEEE80211_EDMG_BW_CONFIG_9:
      case IEEE80211_EDMG_BW_CONFIG_10:
      case IEEE80211_EDMG_BW_CONFIG_11:
        if (cfg80211_edmg_chandef_valid_num_of_enabled < 2)
        case IEEE80211_EDMG_BW_CONFIG_12:
        case IEEE80211_EDMG_BW_CONFIG_13:
          if (cfg80211_edmg_chandef_valid_num_of_enabled < 2)
          default:
            return false;
      }
  return true;
}

$ clang -O2 -flto=thin -fsplit-lto-unit -fvisibility=hidden -c -o chan.o chan.i
...

$ llvm-ar cDPrST chan.a chan.o

$ ld.lld -m elf_x86_64 -r -o vmlinux.o --whole-archive chan.a

$ ./objtool orc generate --duplicate --vmlinux --no-fp --no-unreachable --retpoline --uaccess vmlinux.o
vmlinux.o: warning: objtool: cfg80211_edmg_chandef_valid_num_of_enabled()+0x35: can't find jump dest instruction at .text.cfg80211_edmg_chandef_valid_num_of_enabled+0x4a

The objtool binary and other files are available here.

Hopefully that is enough to figure out what is going on here, I am happy to test or provide more information to the best of my ability.

The backend can, in general, create basic blocks that don't contain any instructions, and don't fall through to another block. A jump table entry can refer to such a block. I guess certain tools could be confused by this.

If that's the issue, it should be possible to work around it using "-mllvm -trap-unreachable".

I do not think that is the issue here. Adding -mllvm -trap-unreachable to the kernel's build flags does not silence objtool nor allows the kernel to boot. I switched over to CONFIG_UNWINDER_FRAME_POINTER instead of CONFIG_UNWINDER_ORC so objtool should only be running checks, not doing any sort of binary modifications, and I see the same warning. If the warning is not valid, then I can bring that up to the tool's author but it seems valid, given that it is pointing to the exact function that @junparser already identified as problematic.

+ tools/objtool/objtool check --duplicate --vmlinux --no-unreachable --retpoline --uaccess vmlinux.o
vmlinux.o: warning: objtool: cfg80211_edmg_chandef_valid()+0x16e: can't find jump dest instruction at .text.cfg80211_edmg_chandef_valid+0x180

I do not think that is the issue here. Adding -mllvm -trap-unreachable to the kernel's build flags does not silence objtool nor allows the kernel to boot. I switched over to CONFIG_UNWINDER_FRAME_POINTER instead of CONFIG_UNWINDER_ORC so objtool should only be running checks, not doing any sort of binary modifications, and I see the same warning. If the warning is not valid, then I can bring that up to the tool's author but it seems valid, given that it is pointing to the exact function that @junparser already identified as problematic.

+ tools/objtool/objtool check --duplicate --vmlinux --no-unreachable --retpoline --uaccess vmlinux.o
vmlinux.o: warning: objtool: cfg80211_edmg_chandef_valid()+0x16e: can't find jump dest instruction at .text.cfg80211_edmg_chandef_valid+0x180

@nathanchance I can confirm that @efriedma is right. since we are using lto, we need add -mllvm -trap-unreachable to link flags as well. After that we can see ud2 at the jump dest, then there is no warning from libtool, also boot has been succeed.

The backend can, in general, create basic blocks that don't contain any instructions, and don't fall through to another block. A jump table entry can refer to such a block. I guess certain tools could be confused by this.

If that's the issue, it should be possible to work around it using "-mllvm -trap-unreachable".

Thanks for the help! I wonder if we can enable this by default after this patch? Any suggest?

The backend can, in general, create basic blocks that don't contain any instructions, and don't fall through to another block. A jump table entry can refer to such a block. I guess certain tools could be confused by this.

If that's the issue, it should be possible to work around it using "-mllvm -trap-unreachable".

Thanks for the help! I wonder if we can enable this by default after this patch? Any suggest?

Assuming that the kernel source code isn't still being miscompiled, this sounds like a bug in objtool to me?

Thank you for all of the insight so far, I can confirm that adding -mllvm -trap-unreachable to KBUILD_{C,LD}FLAGS does resolve the issue. I have brought up this discussion with the objtool maintainers here and I will report back here depending on what they say; hopefully you do not mind holding off on merging the patch until there is a response.

Thank you for all of the insight so far, I can confirm that adding -mllvm -trap-unreachable to KBUILD_{C,LD}FLAGS does resolve the issue. I have brought up this discussion with the objtool maintainers here and I will report back here depending on what they say; hopefully you do not mind holding off on merging the patch until there is a response.

LGTM, Let's hold this patch and wait your update.

Hi @junparser , thank you kindly for helping us look into this; even building and booting kernels locally! Right now, we're chasing multiple regressions (https://github.com/ClangBuiltLinux/linux/issues/1438, https://github.com/ClangBuiltLinux/linux/issues/325#issuecomment-885197956, and this) with the first currently marked a clang-13 release blocker. Unless this patch fixes a regression marked as blocking the clang-13 release, can you please extend your generosity and patience so that we have another week to sort those out and revisit this?

Hi @junparser , thank you kindly for helping us look into this; even building and booting kernels locally! Right now, we're chasing multiple regressions (https://github.com/ClangBuiltLinux/linux/issues/1438, https://github.com/ClangBuiltLinux/linux/issues/325#issuecomment-885197956, and this) with the first currently marked a clang-13 release blocker. Unless this patch fixes a regression marked as blocking the clang-13 release, can you please extend your generosity and patience so that we have another week to sort those out and revisit this?

Thanks for the explanation. and of course let's wait for your time.

Thanks for your patience! D109103 has landed, please proceed with re-landing this. Thanks for your patience while we fixed that issue this patch uncovered!

@spatel ok for relanding?

spatel accepted this revision.Sep 8 2021, 7:45 PM

@spatel ok for relanding?

Yes, sounds like we found the problems.

This revision is now accepted and ready to land.Sep 8 2021, 7:45 PM
alexfh added a subscriber: alexfh.Sep 24 2021, 2:32 PM

FYI, we see a ~5x increase in compile time on some of our code (mainly on some large protobuf-generated files) after rG8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614. We're working on an isolated test case.

FYI, we see a ~5x increase in compile time on some of our code (mainly on some large protobuf-generated files) after rG8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614. We're working on an isolated test case.

With switches with the huge number of cases?

FYI, we see a ~5x increase in compile time on some of our code (mainly on some large protobuf-generated files) after rG8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614. We're working on an isolated test case.

With switches with the huge number of cases?

The iteration of switch cases does not change, so maybe It's caused by LVI->getConstantRange(SI->getCondition(), SI); ?

FYI, we see a ~5x increase in compile time on some of our code (mainly on some large protobuf-generated files) after rG8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614. We're working on an isolated test case.

With switches with the huge number of cases?

Yes, there are switches with large number of cases. I'm still working on a reduced test case though.

While it's running, maybe this information will help: I collected profile using perf record --call-graph lbr and these are the hottest functions:

-   98.71%     0.00%  clang-after  clang-after         [.] llvm::FPPassManager::runOnFunction                                                                                                                  ◆
   - llvm::FPPassManager::runOnFunction                                                                                                                                                                        ▒
      - 98.00% llvm::MachineFunctionPass::runOnFunction                                                                                                                                                        ▒
         - 86.87% (anonymous namespace)::MachineBlockPlacement::runOnMachineFunction                                                                                                                           ▒
            - 86.85% (anonymous namespace)::MachineBlockPlacement::buildCFGChains                                                                                                                              ▒
               - 80.96% (anonymous namespace)::MachineBlockPlacement::buildLoopChains                                                                                                                          ▒
                  - 80.35% (anonymous namespace)::MachineBlockPlacement::TopFallThroughFreq                                                                                                                    ▒
                       73.34% llvm::MachineBranchProbabilityInfo::getEdgeProbability                                                                                                                           ▒
               - 5.89% (anonymous namespace)::MachineBlockPlacement::buildChain                                                                                                                                ▒
                  - 5.88% (anonymous namespace)::MachineBlockPlacement::canTailDuplicateUnplacedPreds                                                                                                          ▒
                     - 5.82% hasSameSuccessors                                                                                                                                                                 ▒
                          llvm::SmallPtrSetImplBase::FindBucketFor                                                                                                                                             ▒
         - 4.57% (anonymous namespace)::BranchFolderPass::runOnMachineFunction                                                                                                                                 ▒
            - llvm::BranchFolder::OptimizeFunction                                                                                                                                                             ▒
               - 4.18% llvm::BranchFolder::TailMergeBlocks                                                                                                                                                     ▒
                    2.32% llvm::MachineBasicBlock::hasEHPadSuccessor                                                                                                                                           ▒
                    1.77% llvm::MachineBasicBlock::mayHaveInlineAsmBr                                                                                                                                          ▒
         - 3.75% (anonymous namespace)::PHIElimination::runOnMachineFunction                                                                                                                                   ▒
              3.43% llvm::LiveVariables::isLiveOut                                                                                                                                                             ▒
         - 0.65% (anonymous namespace)::TailDuplicateBase::runOnMachineFunction                                                                                                                                ▒
            - llvm::TailDuplicator::tailDuplicateBlocks                                                                                                                                                        ▒
                 0.64% llvm::TailDuplicator::tailDuplicateAndUpdate                                                                                                                                            ▒
        0.68% (anonymous namespace)::CodeGenPrepare::runOnFunction                                                                                                                                             ▒
...
-   73.82%    71.11%  clang-after  clang-after         [.] llvm::MachineBranchProbabilityInfo::getEdgeProbability                                                                                              ▒
   - 70.99% llvm::FPPassManager::runOnFunction                                                                                                                                                                 ▒
        llvm::MachineFunctionPass::runOnFunction                                                                                                                                                               ▒
        (anonymous namespace)::MachineBlockPlacement::runOnMachineFunction                                                                                                                                     ▒
      - (anonymous namespace)::MachineBlockPlacement::buildCFGChains                                                                                                                                           ▒
         - 70.99% (anonymous namespace)::MachineBlockPlacement::buildLoopChains                                                                                                                                ▒
            - 70.65% (anonymous namespace)::MachineBlockPlacement::TopFallThroughFreq                                                                                                                          ▒
                 llvm::MachineBranchProbabilityInfo::getEdgeProbability                                                                                                                                        ▒
     2.71% llvm::MachineBranchProbabilityInfo::getEdgeProbability                                                                                                                                              ▒

Reduced test case:

$ time ./clang-before -O3 -std=c++17 -c test.cc

real    0m0.549s
user    0m0.461s
sys     0m0.088s
$ time ./clang-after -O3 -std=c++17 -c test.cc

real    0m10.661s
user    0m10.497s
sys     0m0.163s

We'd appreciate a prompt fix or a workaround. If neither is possible, please revert the commit while working on a proper solution.

We'd appreciate a prompt fix or a workaround. If neither is possible, please revert the commit while working on a proper solution.

OK, I'll take a look. Thanks!

We'd appreciate a prompt fix or a workaround. If neither is possible, please revert the commit while working on a proper solution.

Well, it is not "common" code, switch 1000 cases. But easy fix is probably to introduce some reasonable limit when to bail out.

We'd appreciate a prompt fix or a workaround. If neither is possible, please revert the commit while working on a proper solution.

Well, it is not "common" code, switch 1000 cases. But easy fix is probably to introduce some reasonable limit when to bail out.

We'd appreciate a prompt fix or a workaround. If neither is possible, please revert the commit while working on a proper solution.

Well, it is not "common" code, switch 1000 cases. But easy fix is probably to introduce some reasonable limit when to bail out.

switch 1000 cases is not the issue, use loop induction variable as condition is. It cause LVI recursive call SolveBlockValue.

I've hit this issue in Jumpthreading pass. Maybe we should add one parameter for LVI::solve

We'd appreciate a prompt fix or a workaround. If neither is possible, please revert the commit while working on a proper solution.

Well, it is not "common" code, switch 1000 cases. But easy fix is probably to introduce some reasonable limit when to bail out.

We'd appreciate a prompt fix or a workaround. If neither is possible, please revert the commit while working on a proper solution.

Well, it is not "common" code, switch 1000 cases. But easy fix is probably to introduce some reasonable limit when to bail out.

switch 1000 cases is not the issue, use loop induction variable as condition is. It cause LVI recursive call SolveBlockValue.

@xbolva00 I was wrong. The compiler time regression is caused by CodeGen Passes rather than CVP or LVI.

We'd appreciate a prompt fix or a workaround. If neither is possible, please revert the commit while working on a proper solution.

Well, it is not "common" code, switch 1000 cases. But easy fix is probably to introduce some reasonable limit when to bail out.

switch 1000 cases is not the issue, use loop induction variable as condition is. It cause LVI recursive call SolveBlockValue.

@xbolva00 I was wrong. The compiler time regression is caused by CodeGen Passes rather than CVP or LVI.

This conclusion corresponds with the profile in my earlier comment (e.g. this line: 80.96% (anonymous namespace)::MachineBlockPlacement::buildLoopChains).

We'd appreciate a prompt fix or a workaround. If neither is possible, please revert the commit while working on a proper solution.

Well, it is not "common" code, switch 1000 cases. But easy fix is probably to introduce some reasonable limit when to bail out.

switch 1000 cases is not the issue, use loop induction variable as condition is. It cause LVI recursive call SolveBlockValue.

@xbolva00 I was wrong. The compiler time regression is caused by CodeGen Passes rather than CVP or LVI.

This conclusion corresponds with the profile in my earlier comment (e.g. this line: 80.96% (anonymous namespace)::MachineBlockPlacement::buildLoopChains).

However, i did not hit the regression with aarch64 target. I'll revert this first.

junparser reopened this revision.Sep 27 2021, 5:47 AM
This revision is now accepted and ready to land.Sep 27 2021, 5:47 AM

@alexfh Turns out it was caused by EarlyTailDuplicatePass, would you try with -mllvm -disable-early-taildup=true?

lkail added a subscriber: lkail.Sep 28 2021, 6:00 PM

@alexfh Turns out it was caused by EarlyTailDuplicatePass, would you try with -mllvm -disable-early-taildup=true?

-mllvm -disable-early-taildup=true reduces the compilation time significantly:

$ time ./clang-after -O3  -c -xc++ history/test.cc.reduced -o after2.o

real    0m10.626s
user    0m10.480s
sys     0m0.146s

$ time ./clang-after -O3 -mllvm -disable-early-taildup=true -c -xc++ history/test.cc.reduced -o after.o

real    0m0.340s
user    0m0.262s
sys     0m0.077s

What's your plan here? It looks like the issue in EarlyTailDuplicatePass should be fixed before re-landing this patch.

junparser closed this revision.Nov 1 2021, 12:35 AM

FYI, the last re-commit of this patch introduced another compile time regression. See https://reviews.llvm.org/rGc93f93b2e3f28997f794265089fb8138dd5b5f13 for more information.

rnk added a subscriber: rnk.Nov 22 2021, 11:18 AM