This is an archive of the discontinued LLVM Phabricator instance.

[X86] Do not emit JCC to __x86_indirect_thunk
ClosedPublic

Authored by joaomoreira on Sep 29 2022, 3:22 PM.

Details

Summary

Clang may optimize conditional tailcall blocks with the following layout:

cmp <condition>
je tailcall_target
ret

When retpoline is in place, indirect calls are converted into direct calls to a retpoline thunk. When these indirect calls are tail calls, they may be subject to the above described optimization (there is no indirect JCC, but since now the jump is direct it can be made conditional). The above layout is non-ideal for the Linux kernel scenario because the branches into thunks may be patched back into indirect branches during runtime depending on the underlying CPU features, what would not be feasible if the binary is emitted with the optimized layout above.

Thus, prevent clang from emitting this it if CodeModel is Kernel.

Feature request from the respective kernel mailing list: https://lore.kernel.org/llvm/Yv3uI%2FMoJVctmBCh@worktop.programming.kicks-ass.net/

Diff Detail

Event Timeline

joaomoreira created this revision.Sep 29 2022, 3:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 29 2022, 3:22 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
joaomoreira requested review of this revision.Sep 29 2022, 3:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 29 2022, 3:22 PM
pengfei added inline comments.Sep 29 2022, 11:50 PM
llvm/lib/Target/X86/X86InstrInfo.cpp
2953

Do we need to consider the 32-bit case?

Thanks for the patch!

llvm/lib/Target/X86/X86InstrInfo.cpp
2950

const MachineOperand &

llvm/test/CodeGen/X86/jcc-indirect-thunk-kernel.ll
2

If you don't have multiple run lines with different check-prefixes, then check prefixes are unnecessary. Please drop this flag, then just use the implicit default ; CHECK: below.

2

Consider using update_llc_test_checks.py llvm/test/CodeGen/X86/jcc-indirect-thunk-kernel.ll process this file.

2

does -x86-indirect-branch-tracking make a difference to this test?

10

delete this line.

25

can these attributes be reduced?

26

delete this attribute

Please update the subject of this revision in phab to prefix it with [X86] .

joaomoreira retitled this revision from Do not emit JCC to __x86_indirect_thunk to [X86] Do not emit JCC to __x86_indirect_thunk.
joaomoreira marked 7 inline comments as done.
joaomoreira added inline comments.
llvm/lib/Target/X86/X86InstrInfo.cpp
2953

Good question. I'll figure and I'll update the diff again soon.

Looking good, please mark comments as "Done" in phabricator UI so that reviewers know what's left to be addressed.

llvm/test/CodeGen/X86/jcc-indirect-thunk-kernel.ll
1

Compile for performance...

5

I think if you remove one layer of indirection then you might be able to drop the load instruction below?

21

delete

28

but optimize this fn for size? Are either/any of these necessary?

joaomoreira marked an inline comment as done.
joaomoreira added inline comments.
llvm/test/CodeGen/X86/jcc-indirect-thunk-kernel.ll
28

yes -- while experimenting what I noticed is that optsize is what ends up forcing the sequence:

cmp
je 1f
jmp indirect_thunk
1:
ret

to be converted in:

cmp
jne indirect_thunk
ret

Also, we are only preventing the above for jcc tail calls that end up targeting the external thunk, thus we want the indirect call to be converted into a direct call to the thunk. Thus we want the retpoline-external-thunk too. I was under the impression that retpoline-indirect-calls was also necessary at first, but just re-tested it and it seems that this one can go away.

joaomoreira marked an inline comment as done.Sep 30 2022, 12:17 PM
joaomoreira marked an inline comment as done.Sep 30 2022, 12:28 PM
joaomoreira added inline comments.
llvm/test/CodeGen/X86/jcc-indirect-thunk-kernel.ll
5

This is another weird one -- without the indirection, the jcc pattern is not generated, because the conditional (JE) is generated before the assignment of the function address to r11 and such assignment only happens if the conditional is met. With the indirection, the load takes the function address from memory and already places it on r11 before the condition is tested, what later makes room for fusing the pattern JE; JMP into JNE.

joaomoreira added inline comments.Sep 30 2022, 12:36 PM
llvm/test/CodeGen/X86/jcc-indirect-thunk-kernel.ll
1

what do you mean exactly?

llvm/test/CodeGen/X86/jcc-indirect-thunk-kernel.ll
1

ah, nvm. It looks like llc doesn't accept -Os; that -O2 and optsize is how we get -Os in llc. TIL.

5

It looks like those 2 cases diverge during

Control Flow Optimizer (branch-folder)

due to block placement. Not particularly sure why, but seems like it may be brittle. I guess this test case must come from:

void foo (void (**something)(void)) {
  if (*something)
    (*something)();
}

I guess we'll keep the additional indirection for now...

6

should just be an opaque ptr

13

Now consider either adding a comment that "the intent of the test is that we do not generate conditional tail calls to the thunk" or an explicit CHECK-NOT: jne __x86_indirect_thunk_r11 line (triple check that would not get removed by update_llc_checks) or both.

17–18

I'm surprised to see typed pointers, since we've enabled opaque pointers. Was this test case generated from an older release of llvm? I'd have expected

%0 = load ptr, ptr %something, align 8
%tobool.not = icmp eq ptr %0, null

no void()* or void()**. Please change this, and triple check if you're using an older version of llvm for development, or have something wrong in your cmakecache. We'd like to eliminate non-opaque ptrs from the codebase, IIUC.

21

This isn't done; just the context I highlighted has moved. Please remove the dangling reference to attribute #1 from the tail call.

joaomoreira added inline comments.Sep 30 2022, 2:32 PM
llvm/test/CodeGen/X86/jcc-indirect-thunk-kernel.ll
17–18

The test was not fully generated out of a kernel use case. After generating the IR for the test, I tweaked a lot by hand. I wasn't up to date with the changes (in fact, I very rarely touch IR as I'm normally working on the backend), thanks for updating me. I'll fix this.

joaomoreira marked an inline comment as done.Sep 30 2022, 2:39 PM
joaomoreira added inline comments.
llvm/lib/Target/X86/X86InstrInfo.cpp
2953

Just got told that 32bit is not relevant for the kernel scenario where this matters.

joaomoreira marked 3 inline comments as done.
nickdesaulniers accepted this revision.Sep 30 2022, 3:44 PM

Thanks for the patch. I don't see any conditional jmp's to __x86_indirect_thunk_r11 in defconfig or defconfig+thinLTO. Booted just fine.

I then tested those two with https://lore.kernel.org/all/20220817185410.1174782-1-nathan@kernel.org/ applied on top, which turns the jmp/calls into cs prefixed versions.

This revision is now accepted and ready to land.Sep 30 2022, 3:44 PM
pengfei accepted this revision.Sep 30 2022, 9:20 PM
pengfei added inline comments.
llvm/lib/Target/X86/X86InstrInfo.cpp
2953

Thanks!

(Sorry, I'm still working on testing the kernel with this as per directions in https://lore.kernel.org/llvm/Yv3uI%2FMoJVctmBCh@worktop.programming.kicks-ass.net/)

This revision was automatically updated to reflect the committed changes.