This is an archive of the discontinued LLVM Phabricator instance.

Reland "[X86][ABI] Don't preserve return regs for preserve_all/preserve_most CCs""
ClosedPublic

Authored by AntonBikineev on Feb 6 2023, 11:09 AM.

Details

Summary

The original change mistakenly excluded parameter registers from the
list of callee-saved-registers. This reland fixes it - it only excludes
the return registers for these calling conventions.

Phoebe, could you please take a look? This would be highly appreciated.

Original description:

Currently both calling conventions preserve registers that are used to
store a return value. This causes the returned value to be lost:

define i32 @bar() {
  %1 = call preserve_mostcc i32 @foo()
  ret i32 %1
}

define preserve_mostcc i32 @foo() {
  ret i32 2
  ; preserve_mostcc will restore %rax,
  ; whatever it was before the call.
}

This contradicts the current documentation (preserve_allcc "behaves
identical to the C calling conventions on how arguments and return
values are passed") and also breaks [[clang::preserve_most]].

This change makes CSRs be preserved iff they are not used to store a
return value (e.g. %rax for scalars, {%rax:%rdx} for __int128, %xmm0
for double). For void functions no additional registers are
preserved, i.e. the behaviour is backward compatible with existing
code.

Diff Detail

Event Timeline

AntonBikineev created this revision.Feb 6 2023, 11:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2023, 11:09 AM
AntonBikineev requested review of this revision.Feb 6 2023, 11:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2023, 11:09 AM

Add a test case for the previous fail?

goldstein.w.n added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
4936–4940

Do you need shouldDisableRetRegFromCSR here? It seems shouldDisableRetRegFromCSR is a superset.

Thanks for the comments!

Add a test case for the previous fail?

I've changed the test functions to take parameters, e.g.

define preserve_allcc RETTYPE @preserve_allcc1(i64, i64, double, double)

This covers the broken scenario.

llvm/lib/Target/X86/X86ISelLowering.cpp
4936–4940

Technically it's not needed, but I added it for readability, since we do need to create a separate RegMask for preserve_* calls.

(slightly edited code in LowerCall for readability)

This covers the broken scenario.

Can you explain it in more detail? I checked the test locally but found no difference in codegen with or without the new change.

This covers the broken scenario.

Can you explain it in more detail? I checked the test locally but found no difference in codegen with or without the new change.

Did you include the original (broken) CL in the baseline? The CL was reverted, so you'd have to apply it to see the codegen diff. But the diff is basically this:

  _preserve_mostcc1:                      ## @preserve_mostcc1
  ## %bb.0:                               ## %entry
        pushq   %r10
        pushq   %r9
        pushq   %r8
+       pushq   %rdi
+       pushq   %rsi
        pushq   %rdx
        pushq   %rcx
        pushq   %rax
        pushq   %rbp
        pushq   %r15
        pushq   %r14
        pushq   %r13
        pushq   %r12
        pushq   %rbx
        ## InlineAsm Start
        ## InlineAsm End
        popq    %rbx
        popq    %r12
        popq    %r13
        popq    %r14
        popq    %r15
        popq    %rbp
        popq    %rax
        popq    %rcx
        popq    %rdx
+       popq    %rsi
+       popq    %rdi
        popq    %r8
        popq    %r9
        popq    %r10
        retq

The original (broken) CL didn't preserve parameter regs (%rdi, %rsi).

This covers the broken scenario.

Can you explain it in more detail? I checked the test locally but found no difference in codegen with or without the new change.

Did you include the original (broken) CL in the baseline? The CL was reverted, so you'd have to apply it to see the codegen diff. But the diff is basically this:

No, I compared between commenting out if (ShouldDisableArgRegs) { and not, and didn't see any difference. Or I misunderstood something?

  _preserve_mostcc1:                      ## @preserve_mostcc1
  ## %bb.0:                               ## %entry
        pushq   %r10
        pushq   %r9
        pushq   %r8
+       pushq   %rdi
+       pushq   %rsi
        pushq   %rdx
        pushq   %rcx
        pushq   %rax
        pushq   %rbp
        pushq   %r15
        pushq   %r14
        pushq   %r13
        pushq   %r12
        pushq   %rbx
        ## InlineAsm Start
        ## InlineAsm End
        popq    %rbx
        popq    %r12
        popq    %r13
        popq    %r14
        popq    %r15
        popq    %rbp
        popq    %rax
        popq    %rcx
        popq    %rdx
+       popq    %rsi
+       popq    %rdi
        popq    %r8
        popq    %r9
        popq    %r10
        retq

The original (broken) CL didn't preserve parameter regs (%rdi, %rsi).

That meets my expectation, I just didn't confirm it.

This covers the broken scenario.

Can you explain it in more detail? I checked the test locally but found no difference in codegen with or without the new change.

Did you include the original (broken) CL in the baseline? The CL was reverted, so you'd have to apply it to see the codegen diff. But the diff is basically this:

No, I compared between commenting out if (ShouldDisableArgRegs) { and not, and didn't see any difference. Or I misunderstood something?

Indeed, the tests didn't cover this case. Thanks! I added new tests dynamic-regmask-preserve-*.ll to check exactly the preserved RegMask on the caller side. Now the difference between the broken commit is much clearer on the caller side. For example, commenting out if (ShouldDisableArgRegs) { would remove rdi, rsi, rdx, rcx, r8 and their subregisters from the CustomRegMask in the call:

CALL64pcrel32 @callee1, CustomRegMask($bh,$bl,$bp,$bph,$bpl,$bx,$ch,$cl,$cx,$dh,$di,$dih,$dil,$dl,$dx,$ebp,$ebx,$ecx,$edi,$edx,$esi,$hbp,$hbx,$hcx,$hdi,$hdx,$hsi,$rbp,$rbx,$rcx,$rdi,$rdx,$rsi,$si,$sih,$sil,$r8,$r9,$r10,$r12,$r13,$r14,$r15,$r8b,$r9b,$r10b,$r12b,$r13b,$r14b,$r15b,$r8bh,$r9bh,$r10bh,$r12bh,$r13bh,$r14bh,$r15bh,$r8d,$r9d,$r10d,$r12d,$r13d,$r14d,$r15d,$r8w,$r9w,$r10w,$r12w,$r13w,$r14w,$r15w,$r8wh,$r9wh,$r10wh,$r12wh,$r13wh,$r14wh,$r15wh), implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit $rcx, implicit $r8, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
    ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp

The updated revision also contains a slight functional change: for functions with the sret parameter (function returning a struct by value) I still preserve %rax. The standard SysV CC requires the sret (i.e. %rdi) be copied into %rax, but for the preserve_* CCs this is redundant and is not actually used, since it's always the same as the preserved %rdi.

(Add tests for checking RegMask and don't preserve %rax for sret functions)

This covers the broken scenario.

Can you explain it in more detail? I checked the test locally but found no difference in codegen with or without the new change.

Did you include the original (broken) CL in the baseline? The CL was reverted, so you'd have to apply it to see the codegen diff. But the diff is basically this:

No, I compared between commenting out if (ShouldDisableArgRegs) { and not, and didn't see any difference. Or I misunderstood something?

Indeed, the tests didn't cover this case. Thanks! I added new tests dynamic-regmask-preserve-*.ll to check exactly the preserved RegMask on the caller side. Now the difference between the broken commit is much clearer on the caller side. For example, commenting out if (ShouldDisableArgRegs) { would remove rdi, rsi, rdx, rcx, r8 and their subregisters from the CustomRegMask in the call:

CALL64pcrel32 @callee1, CustomRegMask($bh,$bl,$bp,$bph,$bpl,$bx,$ch,$cl,$cx,$dh,$di,$dih,$dil,$dl,$dx,$ebp,$ebx,$ecx,$edi,$edx,$esi,$hbp,$hbx,$hcx,$hdi,$hdx,$hsi,$rbp,$rbx,$rcx,$rdi,$rdx,$rsi,$si,$sih,$sil,$r8,$r9,$r10,$r12,$r13,$r14,$r15,$r8b,$r9b,$r10b,$r12b,$r13b,$r14b,$r15b,$r8bh,$r9bh,$r10bh,$r12bh,$r13bh,$r14bh,$r15bh,$r8d,$r9d,$r10d,$r12d,$r13d,$r14d,$r15d,$r8w,$r9w,$r10w,$r12w,$r13w,$r14w,$r15w,$r8wh,$r9wh,$r10wh,$r12wh,$r13wh,$r14wh,$r15wh), implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit $rdx, implicit $rcx, implicit $r8, implicit-def $rsp, implicit-def $ssp, implicit-def $rax
    ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp

That makes sense to me, thanks for the test!

The updated revision also contains a slight functional change: for functions with the sret parameter (function returning a struct by value) I still preserve %rax. The standard SysV CC requires the sret (i.e. %rdi) be copied into %rax, but for the preserve_* CCs this is redundant and is not actually used, since it's always the same as the preserved %rdi.

I doubt about it, given it is ABI specified. Are there existing callers use RAX to access date rather than rdi?

llvm/test/CodeGen/X86/dynamic-regmask-preserve-most.ll
2

Why choose machine-sink as the check point?

4

What's this RUN checking for?

AntonBikineev marked 2 inline comments as done.Feb 8 2023, 9:27 AM

The updated revision also contains a slight functional change: for functions with the sret parameter (function returning a struct by value) I still preserve %rax. The standard SysV CC requires the sret (i.e. %rdi) be copied into %rax, but for the preserve_* CCs this is redundant and is not actually used, since it's always the same as the preserved %rdi.

I doubt about it, given it is ABI specified. Are there existing callers use RAX to access date rather than rdi?

For these CCs there shouldn't be any callers expecting %rax to change since it's anyway been preserved. In that sense the change doesn't break any existing users that may use sret with these CCs.

llvm/test/CodeGen/X86/dynamic-regmask-preserve-most.ll
2

Hm, I just copied the code from dynamic-regmask.ll. I guess we can stop after finalize-isel to check CSRs and masks.

4

(removed)

AntonBikineev marked 2 inline comments as done.

(fix RUN: commands for tests)

should there be a test with avx512 for xmm16-31?

llvm/lib/Target/X86/X86ISelLowering.cpp
3388

CallConv == CallingConv::X86_RegCall

should there be a test with avx512 for xmm16-31?

Actually, preserve_all doesn't preserve them, which is apparently missing, since the CCs were introduced short after AVX512 was proposed. It would probably make sense to introduce

def CSR_64_RT_AllRegs_AVX512 : CalleeSavedRegs<(add CSR_64_RT_MostRegs,
                                               (sequence "ZMM%u", 0, 31))>;

but I believe it should go to a separate change.

llvm/lib/Target/X86/X86ISelLowering.cpp
3388

This would also need || MF.getFunction().hasFnAttribute("no_caller_saved_registers"), so I don't think it'd be more readable overall.

pengfei accepted this revision.Feb 8 2023, 6:28 PM

LGTM.

This revision is now accepted and ready to land.Feb 8 2023, 6:28 PM

Thanks a lot for taking a look!