Page MenuHomePhabricator

Merge memtag instructions with adjacent stack slots.
ClosedPublic

Authored by eugenis on Nov 14 2019, 5:58 PM.

Details

Summary

Detect a run of memory tagging instructions for adjacent stack frame slots,
and replace them with a shorter instruction sequence

  • replace STG + STG with ST2G
  • replace STGloop + STGloop with STGloop

This code needs to run when stack slot offsets are already known, but before
FrameIndex operands in STG instructions are eliminated; that's the
reason for the new hook in PrologueEpilogue.

This change modifies STGloop and STZGloop pseudos to take the size as an
immediate integer operand, and base address as a FI operand when
possible. This is needed to simplify recognizing an STGloop instruction
as operating on a stack slot post-regalloc.

This improves memtag code size by ~0.25%, and it looks like an additional ~0.1%
is possible by rearranging the stack frame such that consecutive STG
instructions reference adjacent slots (patch pending).

Diff Detail

Event Timeline

eugenis created this revision.Nov 14 2019, 5:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 14 2019, 5:58 PM
Herald added a subscriber: hiraditya. · View Herald Transcript

Ping.

Most of the smaller changes in this patch (almost everything outside of AArch64FrameLowering.cpp) are needed to handle FrameIndex operand in STGloop. It is so special because STGloop has a writeback constraint, which regalloc has no way of honoring in the case when the input operand is FrameIndex; therefore eliminateFrameIndex has to be smart about the choice of the scratch register there.

eugenis planned changes to this revision.Dec 17 2019, 11:09 AM
eugenis updated this revision to Diff 235973.Jan 2 2020, 4:54 PM

Cleaned up the code and extended it to allow gaps in a sequence of memory tagging instructions (in which case subsequences before and after the gap would be optimized separatly).

Unit tests: pass. 61177 tests passed, 0 failed and 729 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

eugenis updated this revision to Diff 235977.Jan 2 2020, 5:20 PM

clang-format

Unit tests: pass. 61177 tests passed, 0 failed and 729 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

ostannard added inline comments.Jan 6 2020, 9:40 AM
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
2731

Instructions must be added in the ascending order of Offset, and have to be adjacent.

Could we add an assertion to verify this? Either TagStores must be empty, or this instruction must be adjacent to the last one.

2867

Why do we want to return an empty list if any one input instruction doesn't have a memoperand?

2895

Where does the number 176 come from?

3025

This can be Instrs.back()

llvm/lib/Target/AArch64/AArch64InstrInfo.td
1540–1541

This comment needs updating to mention the size operand.

llvm/lib/Target/AArch64/AArch64RegisterInfo.h
127 ↗(On Diff #235977)

Whitespace change.

eugenis updated this revision to Diff 236465.Jan 6 2020, 2:56 PM
eugenis marked 6 inline comments as done.

Addressed review comments and most of the clang-tidy findings.

eugenis added inline comments.Jan 6 2020, 3:08 PM
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
2867

Added a comment. No memoperand wins over any memoperand.
See MachineInstr::mayAlias, MachineInstr::cloneMergedMemRefs.

2895

It's approximately the breaking point between loop and unrolled STG sequence in terms of code size. Anything <176 bytes can be tagged with 5 instructions or less; the loop body itself is 3, but it usually needs one or two instructions to prepare the counter and the base pointer.

We can tune it for performance later when the hardware is available.

Unit tests: pass. 61260 tests passed, 0 failed and 736 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

This revision is now accepted and ready to land.Jan 8 2020, 1:59 AM
This revision was automatically updated to reflect the committed changes.
eugenis marked an inline comment as done.Jan 8 2020, 2:39 PM
eugenis added inline comments.
llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
457

Hmm, there is actually a check against this in the verifier:

I've reverted the change for now.

eugenis reopened this revision.Jan 8 2020, 2:39 PM
This revision is now accepted and ready to land.Jan 8 2020, 2:39 PM
eugenis updated this revision to Diff 236939.Jan 8 2020, 5:20 PM

Fixed the tied use of a non-register problem by introducing _untied versions to STGloop pseudos for use with a FrameIndex operand.

Unit tests: pass. 61321 tests passed, 0 failed and 736 were skipped.

clang-tidy: pass.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

ostannard added inline comments.Jan 9 2020, 2:38 AM
llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
460

Based on the name of the function, I wouldn't expect this to modify the instruction passed in. I think it would be better to hoist this code out to the call-site, or rename the function to something like createScratchRegisterForInstruction, and note that it might do this in a comment.

eugenis updated this revision to Diff 237188.Jan 9 2020, 1:46 PM

Updated a comment.
Renamed "old" STGloop instructions to STGloop_wback, and new ones to STGloop.

Unit tests: pass. 61728 tests passed, 0 failed and 779 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

friendly ping

This revision was automatically updated to reflect the committed changes.