Page MenuHomePhabricator

[AArch64] Match the windows canonical callee saved register order [alternative implementation]
AbandonedPublic

Authored by mstorsjo on Thu, Oct 1, 2:09 AM.

Details

Summary

On windows, the callee saved registers in a canonical prologue are ordered starting from a lower register number at a lower stack address (with the possible gap for aligning the stack at the top); this is the opposite order that llvm normally produces.

To achieve this, make computeCalleeSaveRegisterPairs lay registers out in the reverse direction, bottom up.

This keeps the logic mostly straightforward compared to before, but gives some amount of mismatch between the stack objects generated by PrologEpilogInserter, which are laid out from top to bottom in the original register order.

This is an alternative to D88543.

Diff Detail

Unit TestsFailed

TimeTest
1,130 mslinux > libarcher.parallel::parallel-nosuppression.c
Script: -- : 'RUN: at line 15'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/parallel/parallel-nosuppression.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-nosuppression.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0' /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-nosuppression.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-nosuppression.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/parallel/parallel-nosuppression.c

Event Timeline

mstorsjo created this revision.Thu, Oct 1, 2:09 AM
mstorsjo requested review of this revision.Thu, Oct 1, 2:10 AM

The mismatch with generated stack objects can be observed in the new wineh-frame-scavenge.mir testcase. The full output of the tested command contains this:

- { id: 0, name: '', type: default, offset: -4, size: 4, alignment: 4, 
    stack-id: default, callee-saved-register: '', callee-saved-restored: true, 
    local-offset: -4, debug-info-variable: '', debug-info-expression: '', 
    debug-info-location: '' }
- { id: 1, name: '', type: spill-slot, offset: -16, size: 8, alignment: 16, 
    stack-id: default, callee-saved-register: '$x19', callee-saved-restored: true, 
    debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
- { id: 2, name: '', type: spill-slot, offset: -24, size: 8, alignment: 8, 
    stack-id: default, callee-saved-register: '$x20', callee-saved-restored: true, 
    debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
- { id: 3, name: '', type: spill-slot, offset: -32, size: 8, alignment: 8, 
    stack-id: default, callee-saved-register: '$x21', callee-saved-restored: true, 
    debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
- { id: 4, name: '', type: spill-slot, offset: -40, size: 8, alignment: 8, 
    stack-id: default, callee-saved-register: '$x22', callee-saved-restored: true, 
    debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
- { id: 5, name: '', type: spill-slot, offset: -48, size: 8, alignment: 8, 
    stack-id: default, callee-saved-register: '$x23', callee-saved-restored: true, 
    debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }

While the actual generated prolog looks like this:

early-clobber $sp = frame-setup STPXpre killed $x19, killed $x20, $sp, -6 :: (store 8 into %stack.4), (store 8 into %stack.5) 
frame-setup SEH_SaveRegP_X 19, 20, -48
frame-setup STPXi killed $x21, killed $x22, $sp, 2 :: (store 8 into %stack.2), (store 8 into %stack.3)
frame-setup SEH_SaveRegP 21, 22, 16
frame-setup STRXui killed $x23, $sp, 4 :: (store 8 into %stack.1)
frame-setup SEH_SaveReg 23, 32
frame-setup SEH_PrologEnd

or more readably in the final assembly form like this:

	stp	x19, x20, [sp, #-48]!           ; 16-byte Folded Spill
	.seh_save_regp_x x19, 48
	stp	x21, x22, [sp, #16]             ; 16-byte Folded Spill
	.seh_save_regp x21, 16
	str	x23, [sp, #32]                  ; 8-byte Folded Spill
	.seh_save_reg x23, 32
	.seh_endprologue

Note how the MIR seems to think that x23 is saved at stack offset -48, while in reality, x19 is saved there.

This keeps the logic mostly straightforward compared to before, but gives some amount of mismatch between the stack objects generated by PrologEpilogInserter, which are laid out from top to bottom in the original register order.

It seems worth adding a little extra code to avoid this.

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
2092

When I said "iterate backwards" in the other review, I was more thinking of changing this for loop to iterate over CSI starting from Count - 1, not reversing the stores relative to the frame objects.

mstorsjo abandoned this revision.Thu, Oct 1, 11:35 PM

Superseded by D88677.