Page MenuHomePhabricator

[X86][VARARG] Avoid spilling xmm registers for va_start.
Changes PlannedPublic

Authored by avl on May 18 2020, 2:24 PM.

Details

Summary

That review is extracted from D69372.
It fixes https://bugs.llvm.org/show_bug.cgi?id=42219 bug.

For the noimplicitfloat mode, the compiler mustn't generate
floating-point code if it was not asked directly to do so.
This rule does not work with variable function arguments currently.
Though compiler correctly guards block of code, which copies xmm vararg
parameters with a check for %al, it does not protect spills for xmm registers.
Thus, such spills are generated in non-protected areas and could break code,
which does not expect floating-point data. The problem happens in -O0
optimization mode. With this optimization level there is used
FastRegisterAllocator, which spills virtual registers at basic block boundaries.
Register Allocator does not protect spills with additional control-flow modifications.
Thus to resolve that problem, it is suggested to not copy incoming physical
registers into virtual registers. Instead, store incoming physical xmm registers
into the memory from scratch.

Diff Detail

Event Timeline

avl created this revision.May 18 2020, 2:24 PM
aeubanks resigned from this revision.May 26 2020, 10:14 AM

not knowledgeable enough on this

RKSimon added a subscriber: efriedma.

Do we need better i686- triple testing?

I don't know much about this area - @craig.topper @efriedma any thoughts?

llvm/lib/Target/X86/X86ExpandPseudo.cpp
506

Any reason not to deal with YMM/ZMM as well?

llvm/test/CodeGen/X86/x32-va_start.ll
3–5

Regenerate with update_llc ?

avl added a comment.Aug 16 2020, 2:49 PM

Do we need better i686- triple testing?

Will rebase and retest, including clang bootstrap and clang bootstrap with address sanitizer.

llvm/lib/Target/X86/X86ExpandPseudo.cpp
506

The reason is to not complicate the patch. YMM/ZMM is not supported for varargs currently. So YMM/ZMM support needs its own patch independent from noimplicitfloat problem.

llvm/test/CodeGen/X86/x32-va_start.ll
3–5

will do that way. thanks.

avl added a comment.Aug 17 2020, 5:33 AM

Do we need better i686- triple testing?

It looks like I misunderstood the question. This patch should affect X86-64(AMD64 abi) only and should not affect 32-bit abi. Thus I would add checking for i386,i686 triples into the llvm/test/CodeGen/X86/x32-va_start.ll. That test would show that 32-bit abi is not affected(as it currently is checked for -mtriple=x86_64-linux-gnux32 -mattr=-sse).

@RKSimon The general approach of using a pseudo to hide the control flow until after register allocation seems fine. I'm not going to try to review the exact instruction sequences.

avl updated this revision to Diff 290184.Sep 6 2020, 11:17 PM

rebased,
retested(bootstrap with check-all),
added i386/i686 triples into X86/x32-va_start.ll,
regenerated tests with update_llc_test_checks.py.

avl added a comment.Sep 14 2020, 3:01 AM

@RKSimon @craig.topper Would you mind to take a look at this review, please?

craig.topper added inline comments.Sep 14 2020, 8:40 PM
llvm/lib/Target/X86/X86ExpandPseudo.cpp
531

Why do we only add them ass liveins this late? Shouldn't they have been livein when the block was created in the custom inserter?

llvm/lib/Target/X86/X86ISelLowering.cpp
31654

Is RegIdx used here?

31657

I'm not familiar with InternalRead. The documentation says it means it was defined by the same instruction. Is that case here?

avl added inline comments.Sep 15 2020, 4:51 AM
llvm/lib/Target/X86/X86ExpandPseudo.cpp
531

Things were done that way(fill liveins when the block was created
in the custom inserter) in the first version of the fix.
But it creates new liveness rule for physical registers
(newly added block should have physical registers in its liveins
but it is not allowed before regalloc):

https://reviews.llvm.org/D69372?id=227838#change-TJhqJRPUjpDM

So there was a request to do things in other way, not requiring
to change liveness rule:

https://reviews.llvm.org/D69372#1734756

The solution done in that patch creates a block with SAVE_VARARG_XMM_REGS
pseudo instruction. That pseudo instruction defines xmm registers
(used by varargs). After register allocator, SAVE_VARARG_XMM_REGS pseudo
instruction is expanded to the real copy instructions and put xmm registers
into the block liveins.

llvm/lib/Target/X86/X86ISelLowering.cpp
31654

no. will delete it.

31657

that is how the solution works: SAVE_VARARG_XMM_REGS pseudo declares
itself as an instruction which reads the registers defined by itself. And later
(when SAVE_VARARG_XMM_REGS is expanded) the correct definitions
would be put into the liveins.

avl added inline comments.Sep 15 2020, 11:00 AM
llvm/lib/Target/X86/X86ExpandPseudo.cpp
531

But... Reading this recent comment https://reviews.llvm.org/D80163#2224926 I think I implemented not exactly what was requested. Implementation from this patch does not hide control flow. Would do implementation which will hide control flow.

avl planned changes to this revision.Oct 22 2020, 2:59 AM